
Journal of Petroleum Science and Engineering 200 (2021) 108178


Application of Machine Learning Techniques in Mineral Classification for Scanning Electron Microscopy - Energy Dispersive X-Ray Spectroscopy (SEM-EDS) Images
Chunxiao Li, Dongmei Wang, Lingyun Kong *
Harold Hamm School of Geology and Geological Engineering, University of North Dakota, United States

Keywords: Mineral segmentation; Machine learning; SEM-EDS; U-Net; The Bakken Formation

Abstract

Mineral classification and segmentation is time-consuming in geological image processing. The development of machine learning methods shows promise as a technique for replacing manual classification. In this study, the performances of five shallow machine-learning classification algorithms and a deep learning algorithm were compared for the goal of pixel-level mineral classification of Scanning Electron Microscopy - Energy Dispersive X-Ray Spectroscopy (SEM-EDS) images. Five machine learning models, including Logistic Regression (LR), Linear Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Random Forest (RF), and Artificial Neuron Networks (ANN), and a deep learning CNN U-Net model were used. Thirteen mineral phases were classified on SEM-EDS images of a shale sample taken from the Bakken Formation. Hyperparameters of the models were tuned using a grid-search method. A randomly selected balanced dataset was used for the shallow models, while the original cropped images were used for the U-Net.

The experimental results showed that all classification algorithms achieved high F1 scores, ranging from 0.86 to 0.92. The RF demonstrated the best performance among the five machine learning models, with an F1 score of 0.92. Additionally, sensitivity analysis on the size of the dataset demonstrated that the LR algorithm and the SVM were less sensitive to dataset reduction, while the k-NN, RF, and ANN models were more influenced. Sensitivity analysis of noise suggested that noise added to the elements Silicon, Aluminum, Magnesium, Calcium, Potassium, and Iron decreases the performance of the RF. Furthermore, noise in Silicon had the greatest effect on the prediction result compared to the other elements. In addition, the non-linear classifiers showed a larger performance-score drop when noise was simultaneously included in all the element densities. Though the U-Net showed poor performance on the segmentation of minority classes, due to the negative effect of the imbalanced dataset, the U-Net model still outperformed the RF model when it comes to unseen shale samples.

1. Introduction

A significant approach in image analysis for microstructure visualization and quantification of rock samples is the use of scanning electron microscopy (SEM) (Klaver et al., 2012; Kelly et al., 2016; Sun et al., 2019). Pore structure information and mineralogy obtained from SEM image analysis are the basis for digital rock analysis and related computational simulation work (Kong et al., 2019). Therefore, identifying the mineral phases in SEM images is intimately related to the subsequent analyses. Image segmentation involves partitioning and clustering the image into different continuous and homogeneous regions (Haralick and Shapiro, 1985). Segmenting the pore space and organic matter is the relatively easier part, due to their obviously lower (darker) grayscale values on SEM images (Wu et al., 2019) (Fig. 1). This can be done by a thresholding method, where all pixels with grayscale values above (or below) the threshold are assigned to a particular class (Andrew, 2018). However, this thresholding technique fails when the other common minerals in shales, such as quartz, feldspar, carbonate minerals, and clay minerals, are present, because of their similar white-gray color in backscattered SEM images (Fig. 1). Therefore, in order to classify these minerals from a backscattered SEM image, more information, such as Energy Dispersive X-Ray Spectroscopy (EDS), needs to be incorporated. The most commonly used methods to classify mineralogy in 2D images are QEMSCAN (Quantitative Evaluation of Minerals by Scanning Electron Microscopy) (Pirrie et al., 2004) and MAPS (Modular Automated Processing System) (Saif et al., 2017). The principles behind these two methods are similar: phase assemblage maps of a specimen surface are created by scanning a high-energy accelerated electron beam along a predefined raster pattern. Low-count energy-dispersive X-ray spectra are generated and provide information on the element composition at each measurement point. The element composition, in combination with back-scattered electron (BSE) brightness and X-ray count rate information, is converted into mineral phases. However, the key process of converting element composition information to mineral phases either needs to be processed by software with a large element composition-to-mineral database or must be conducted by experienced expert technicians (Knaup et al., 2019). Quantitatively classifying minerals within SEM-EDS images is a labor-intensive and time-consuming job.

Fig. 1. Grayscale SEM image, showing that pixels of pores, cracks, organic matter, and pyrite are obviously darker or brighter in grayscale, while the other minerals are difficult to distinguish by a pixel thresholding method (modified from Li et al., 2018).

At the same time, machine learning-based methods for classification have attracted much attention recently. This approach can excavate inherent rules and correlations based on a significant amount of data analysis. In the petroleum industry, machine learning methods have been widely used in reservoir characterization (Jung et al., 2018; Luo, 2019), reservoir simulation (Esmaeilzadeh et al., 2019, 2020; Temirchev et al., 2019, 2020), drilling optimization (Al-Obaidi et al., 2018), fluid transport analysis (Ulker and Sorgun, 2016), petrophysical property estimation (Li and Misra, 2019), as well as rock typing (Gupta et al., 2018). A variety of machine learning algorithms, including Logistic Regression (LR), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Random Forest (RF), and Artificial Neuron Networks (ANN), are popularly employed for classification tasks.

In this study, both shallow classifying models and a deep segmenting model are used for the purpose of identifying different mineral phases. Mineral identification on a 2D image can either be treated as a pixel-level classification task (classify each pixel into a mineral class) or as a segmentation task (partitioning and clustering the image into different continuous and homogeneous regions). As a pixel-level classification task, the procedure of determining the class label for a pixel is not influenced by neighboring pixels or location information. In contrast, for segmenting an image, the deep model takes the location information and the adjacent pixels into account in the process of partitioning. This allows the segmented regions to be continuous and homogeneous compared with the results of pixel-level classification. Deep learning using Convolutional Neural Networks (CNN) is the state-of-the-art approach in complicated image segmentation. The application of a CNN to a specific problem needs a large number of training samples, and the bottleneck for applying CNNs to geoscience problems is the limited available data. Efforts have been made to reduce the training dataset required for CNN learning. The U-Net, developed for biomedical image segmentation, has shown great performance, using fewer images while producing more precise segmentation maps (Ronneberger et al., 2015).

Researchers have reported many applications of machine learning algorithms for mineral classification and segmentation of geological images. Generally, there are two main categories of rock image segmentation: mineral phase segmentation and microstructure (pores/cracks) segmentation (Guntoro et al., 2019; Misra et al., 2019). Mineral segmentation involves partitioning the 2D SEM-EDS images into several continuous and homogeneous mineral phases, while microstructure segmentation is to identify the pore spaces or cracks in more magnified SEM images. Marmo et al. (2005) applied the ANN method to classify minerals in thin-section images. Izadi et al. (2015) also reported mineral segmentation in thin sections based on an incremental clustering algorithm and twelve color features (six RGB color features and six Hue-Saturation-Intensity color features). Izadi et al. (2017) continued this work and applied the ANN method to plane- and cross-polarized light thin-section images, reporting a 93.21% accuracy in recognizing igneous minerals on the test dataset (Izadi et al., 2017). Most of these mineral segmentations were based on thin-section images. Tang and Spikes (2017) used element images from SEM as input features of an ANN to distinguish five mineral phases and properties, including quartz, calcite, and feldspar, as well as TOC and clay/pores. Knaup et al. (2019) extracted features from EDS images for segmenting images and reported prediction accuracies in the range of 64%-87%. Wu et al. (2019) proposed an SEM segmentation workflow that involves feature extraction followed by random forest classification to locate organic matter and pores in samples (Wu et al., 2019).

This study systematically evaluates and compares the application of five shallow supervised classification algorithms and the U-Net model for pixel-level mineral phase classification/segmentation on SEM-EDS images. To our knowledge, it is the first time that the task of distinguishing mineral phases within 2D SEM-EDS maps is solved through both pixel-level classification by shallow machine learning models and image segmentation by a deep learning model. For the shallow machine learning models, optimal hyperparameters were tuned through grid search. Additionally, the sensitivity of the algorithms' prediction performance to input noise and to the size of the training data was analyzed. The performance of the shallow machine learning models and the deep learning CNN U-Net model were compared with regard to pixel-level mineral classification/segmentation. Intelligent segmentation is promising for accelerating the acquisition of SEM data as well as reducing the need for post-process filtration. This study will help geologists obtain the mineralogy and microstructure of shale samples from SEM-EDS images under conditions where a mineral map is not available.

2. Sample, data and methodology

2.1. Samples

Two shale samples, labeled Sample 1 and Sample 2, taken from the Bakken Formation, Williston Basin, North Dakota, U.S., were used in this study. The Bakken Formation is a thin rock unit with an interval thickness in the range of 70-100 ft, which can be further divided into three lithologic members: the Upper, Middle, and Lower members. The Upper and Lower members are organic-rich black shales, while the Middle member is carbonate-rich sandstones and siltstones (Smith and Bustin, 1996). Data acquired from Sample 1 were used for training and validation, and the trained model was then deployed on Sample 2's EDS maps to classify/segment the mineral phases.

Fig. 2. Element intensity images for Sample 1. Intensity is shown in grayscale.

2.2. Data collection

2.2.1. Element images acquisition


The feed-in data were extracted from the element intensity maps (Fig. 2). The first step of data collection is acquiring these element images, which was carried out by EDS. Regarding the working principle of EDS, a high-energy electron beam ejects an electron from an inner shell, and an electron jumping from an outer, higher-energy shell to the lower-energy shell releases energy in the form of an X-ray. The number and energy of the X-rays emitted from a specimen can be measured by an energy-dispersive spectrometer. Since the energies of the X-rays are characteristic of the atomic structure of the emitting element, EDS allows the acquisition of the element composition of the specimen (Saif et al., 2017). In a 2D section of a specimen, each analysis point is examined by the EDS X-ray detector, and an X-ray spectrum is generated that provides information about which elements are present and in what quantities. Element images are built up by raster scanning the 2D area, taking a spectrum at each point. An energy window is defined for each element of interest, and the counts detected in that window set the brightness of the corresponding pixel in the X-ray map of that element. Therefore, the brightness/intensity of a pixel on the grayscale reflects the difference in the electron density of the target element.

Fig. 3. Mineral map of Sample 1. The mineral type at each pixel was used as the labeled data for the models. The dominant minerals are quartz, feldspar, illite, and dolomite (revised from Li et al., 2019).

A total of 12 element maps were captured on Sample 1 using the FEI Quanta FEG 650 SEM instrument, covering Aluminum (Al), Calcium (Ca), Iron (Fe), Potassium (K), Magnesium (Mg), Manganese (Mn), Sodium (Na), Phosphorus (P), Silicon (Si), Sulfur (S), Titanium (Ti), and Zirconium (Zr) (Fig. 2). The area of the element images is 1024 μm × 1024 μm, with a resolution of 1 μm per pixel, which sets the size of the maps at 1024 × 1024 pixels. Additionally, the images are shown in grayscale, where the brightness of a pixel reflects the intensity of its corresponding element (Fig. 2).

2.2.2. Mineralogy image acquisition


Fig. 4. Schematic illustration of the procedure for acquiring the mineralogy map through MAPS (modified from Saif et al., 2017).

Fig. 5. The workflow of generating the dataset for the shallow models.

The labeled ground truth data was extracted from the mineralogy map (Fig. 3). The acquisition of the mineral information was carried out by the software MAPS, an automated mineral mapping package commercialized by FEI. A mineral map of shale Sample 1, scanned over the same area as the element maps, was generated by MAPS. The mapping of mineralogy is performed and captured in the following steps, as illustrated in Fig. 4 (Saif et al., 2017): a) first, the sample area is divided into multiple tiles; b) the electron beam scans each tile to produce a BSE image; c) each analysis point is examined by the EDS X-ray detector and an X-ray spectrum is acquired; d) phase classification matches the observed spectrum at each point with known phases in a mineral database; and e) pixels are assigned to a mineral composition and a mineral map is generated.

A total of 13 main mineral phases, including K-feldspar, Quartz, Dolomite, Illite, Pyrite, Albite, Muscovite, Calcite, Annite, Organic Matter, Anorthite, Ankerite, and Chamosite, were recognized in the sample. Some minor minerals were also recognized by the software in this sample; they are labeled as 'unknown' in Fig. 3 and were ignored and not used in this study. Different mineral phases are labeled with different colors in the mineral map (Fig. 3).

2.3. Feature extraction and dataset establishing

The five machine learning models share the same input dataset, while the deep learning CNN U-Net model uses a different one. For the shallow models, the input is the pixel-level grayscale values extracted from the element intensity maps, and the label data is the mineral class at each pixel. The U-Net model, in contrast, was trained end-to-end, where the input is images cut from the element maps and the output is the corresponding mineral map.

2.3.1. Feature extraction and dataset of machine learning models
The workflow of feature extraction for the shallow learning models is illustrated in Fig. 5. The input data are the intensity matrices extracted from the 12 element images. The brightness/intensity of each element at each pixel was extracted from the grayscale maps (Fig. 2), where the brightness or intensity values are integers ranging from 0 (black) to 255 (white), meaning the brighter the pixel, the higher the intensity of the corresponding element. Each element intensity map was normalized to the range 0-1 by dividing the pixel values by 255. The densities of the 12 elements at the same pixel location were then extracted as a vector containing 12 element feature values (Fig. 5).
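A minimal sketch of this per-pixel feature extraction, assuming the 12 element maps and the MAPS mineral map are available as files on disk (the file names and array layout below are illustrative assumptions, not the authors' code):

```python
# Sketch of the per-pixel feature extraction described above.
# Assumes 12 grayscale element maps (1024 x 1024) saved as PNG files;
# the file names and the mineral-label array are hypothetical placeholders.
import numpy as np
from PIL import Image

ELEMENTS = ["Al", "Ca", "Fe", "K", "Mg", "Mn", "Na", "P", "Si", "S", "Ti", "Zr"]

# Stack the element maps into a (1024, 1024, 12) array, scaled to 0-1.
maps = np.stack(
    [np.asarray(Image.open(f"sample1_{el}.png"), dtype=np.float64) / 255.0
     for el in ELEMENTS],
    axis=-1,
)

# Flatten to one row of 12 element intensities per pixel.
X = maps.reshape(-1, len(ELEMENTS))               # shape (1048576, 12)

# The MAPS mineral map provides the ground-truth class of each pixel.
mineral_map = np.load("sample1_mineral_map.npy")  # shape (1024, 1024), ints 0-12
y = mineral_map.reshape(-1)                       # shape (1048576,)
```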


Fig. 6. Correlation matrix for input variables of machine learning models.

The correlation matrix of these elemental intensities shows that the correlations between Al and K, between Mg and Ca, and between S and Fe are strongly positive, while Si and Ca, and Si and Mg, are strongly negatively related (Fig. 6). This is easy to understand given that each mineral has a fixed chemical formula. For example, the chemical formula of K-feldspar is KAlSi3O8, and therefore the presence of the element Al is strongly related to the presence of the element K. Meanwhile, only one mineral can be present at each pixel, which accounts for the strong negative relationships between elements that do not occur within the same mineral. Additionally, the mineral classes at the corresponding pixels in the mineral map are the labeled ground truth data (Fig. 5), resulting in 13 mineral classes.

The mineral class distribution shown in Fig. 3 is imbalanced. The percentage of each mineral class was calculated to quantify the level of the imbalance. Table 1 shows the percentage of each mineral class for Sample 1. The sample comprises a majority of K-feldspar, Quartz, Dolomite, and Illite. To build a balanced dataset for training and validation, 1200 pixels per class were randomly selected, which resulted in 15,600 total pixels (1200 pixels per class × 13 classes) (Fig. 5). A matrix of size 15,600 × 12 was used as input, where 15,600 denotes the total number of samples and 12 denotes the 12 element density features extracted at each data point/pixel. Additionally, the ground truth data is a vector of size 15,600 × 1 which contains the mineral class labels (Fig. 5).

Table 1
Mineral distribution in Sample 1.

Mineral/Class | Percentage (%)
Albite | 1.14
K-feldspar | 38.87
Quartz | 28.25
Illite | 12.91
Dolomite | 14.14
Pyrite | 2.24
Calcite | 0.81
Muscovite | 0.72
Annite | 0.32
Ankerite | 0.18
Anorthite | 0.13
OM | 0.16
Chamosite | 0.13

The dataset of 15,600 data points from Sample 1 was split into training and validation sets, where the cross-validation set was 25% of the entire dataset and the training process comprised 4-fold cross-validation. In the end, after training and validation, the best selected model was used on Sample 2.

2.3.2. Data processing and dataset for U-Net
In contrast to the machine learning models, whose input data are the element intensity matrices extracted from the element intensity maps, the training of the U-Net is end-to-end. The element maps (Fig. 2) and the mineral map (Fig. 3) are sliced into training samples (cropped images) with a size of 128 × 128 pixels, which are also flipped in both the vertical and horizontal directions to generate sufficient training samples through a data augmentation procedure. Each input sample has a total of 12 channels (elements), and the output has 13 channels (mineral classes). After data augmentation, there are a total of 147 samples for training and testing, which were further divided into training and model validation sets, with fractions of 75% and 25%.
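A sketch of this patch slicing and flip augmentation, assuming `maps` (1024 × 1024 × 12 normalized element intensities) and a one-hot `labels` array (1024 × 1024 × 13); the function name and tiling scheme are illustrative assumptions:

```python
# Slice the element maps and mineral map into 128 x 128 training patches
# and augment with vertical/horizontal flips, as described in the text.
import numpy as np

def make_unet_dataset(maps, labels, patch=128):
    """maps: (H, W, 12) float array; labels: (H, W, 13) one-hot array."""
    xs, ys = [], []
    h, w = maps.shape[:2]
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            x = maps[i:i + patch, j:j + patch]
            y = labels[i:i + patch, j:j + patch]
            for flip in (lambda a: a,            # original patch
                         lambda a: a[::-1, :],   # vertical flip
                         lambda a: a[:, ::-1]):  # horizontal flip
                xs.append(flip(x))
                ys.append(flip(y))
    return np.stack(xs), np.stack(ys)
```

An 8 × 8 tiling of a 1024 × 1024 map gives 64 patches, tripled by the two flips; this is the same order of magnitude as the 147 samples the authors report, though their exact cropping and flip combination may differ.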
2.4. Performance metric

To evaluate the prediction performance of the different models, the multiclass version of the F1 score was used. The following equations define the F1 score (Müller et al., 2016):

f1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \quad (1)

Precision = \frac{TP}{TP + FP} \quad (2)

Recall = \frac{TP}{TP + FN} \quad (3)


Fig. 7. Illustration of logistic regression.

where TP = true positive predictions, FP = false positive predictions, and FN = false negative predictions. To calculate the multiclass version of the F1 score, a binary F1 score for each class was adopted by taking that class as the positive class while the others account for the negative classes. The multiclass F1 score was then calculated by averaging the binary F1 scores of all classes. There are three strategies for averaging the F1 scores across classes (Müller et al., 2016):

• F1_micro: computes the total numbers of FP, FN, and TP over all classes, and then calculates the F1 score from these numbers. The 'micro' strategy treats each sample equally.
• F1_weighted: computes the weighted mean of the per-class F1 scores. The weight of each F1 score is the ratio of the number of samples in that class to the total number of samples.
• F1_macro: directly averages the per-class F1 scores. This strategy treats each class equally.

For a balanced dataset, the three types of F1 scores are very similar (in particular, F1_weighted = F1_macro, since there is an equal quantity of samples in each class), while these scores differ considerably for an imbalanced dataset.
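These three averaging strategies map directly onto scikit-learn's `f1_score`; a minimal illustration with toy labels (not the paper's data):

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 1, 2]   # toy multiclass labels
y_pred = [0, 0, 1, 1, 1, 0]

for avg in ("micro", "weighted", "macro"):
    print(avg, f1_score(y_true, y_pred, average=avg))
# On a balanced dataset the three scores are close; under class
# imbalance, 'macro' penalizes poor minority-class performance the most.
```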
3. Classification algorithms

Five machine learning models and a deep learning model were used in this study. The LR, SVM, k-NN, and RF were implemented from the Scikit-learn library (Pedregosa et al., 2011), and the ANN model and U-Net are from the TensorFlow library (Abadi et al., 2016).

3.1. Machine learning models

Five classification machine learning algorithms were implemented for a comparative study of their performance on mineral segmentation. Logistic Regression and Linear Support Vector Machine are linear classifiers, while k-Nearest Neighbors, Random Forest, and Artificial Neuron Network are non-linear classification algorithms. The theories behind these algorithms are briefly described in this section.

3.1.1. Logistic Regression
Logistic Regression is a probabilistic classification model, a linear method with the prediction transformed using the logistic function (Dreiseitl and Ohno-Machado, 2002; Hosmer et al., 2013). The logistic function, also known as the sigmoid function, is an S-shaped curve that transforms a real-valued input into an output number between 0 and 1, interpreted as the probability that the input object belongs to the positive class, given its input features (x_0, x_1, \dots, x_n) (Fig. 7). The following equations describe logistic regression (Hosmer et al., 2013):

\hat{y} = \mathrm{logistic}\left(\hat{b} + \hat{w}_1 x_1 + \dots + \hat{w}_n x_n\right) \quad (4)

\hat{y} = \frac{1}{1 + \exp\left[-\left(\hat{b} + \hat{w}_1 x_1 + \dots + \hat{w}_n x_n\right)\right]} \quad (5)

The standard form of logistic regression is used for binary classification. In this study, multi-class classification with logistic regression is achieved by changing the loss function to the cross-entropy loss, whose equation is (Müller et al., 2016):

J = -\frac{1}{N} \sum_{i=1}^{N} p(x)_i \cdot \log(q(x)_i) \quad (6)

where p(x)_i is the true distribution of the i-th sample and q(x)_i is the estimated distribution of the i-th sample. For a single sample i, the ground truth p(x)_i assigns all probability to the specific class and the other values are zero; for example, the ground truth is p(x) = [1, 0, 0, 0, 0] for a sample labeled as the 1st class among five total classes.

3.1.2. Support Vector Machine
The Support Vector Machine (SVM) algorithm is a discriminative classifier. Using the SVM algorithm, the objective is to find a hyperplane in the feature space (Suykens and Vandewalle, 1999; Wang, 2005). In two-dimensional space, the decision boundary of an SVM is a curve dividing a plane into two parts. The process is to choose the most confident decision boundary among multiple hyperplanes, the one with the maximum margin (defined as the maximum width the decision boundary area can be increased to before hitting a data point). Kernels are needed for nonlinearly separable problems; they transform the input data points into a higher-dimensional space, converting non-separable problems into separable ones. This study used the linear SVM, based on the fact that kernelized SVMs are computationally expensive and time-consuming for large datasets (Tang et al., 2008).

3.1.3. K-Nearest Neighbors
The k-Nearest Neighbors (k-NN) algorithm is widely used for classification problems in industry. The idea behind the k-NN algorithm is that, to find the label (class) of an unknown sample, it takes votes from a group of k labeled samples that are nearest to the unknown sample. The label of the unknown sample is obtained by calculating the average label attributes of the k known samples (Cover and Hart, 1967). Therefore, the performance of this classification algorithm significantly depends on k, the key parameter for k-NN. In this study, k values ranging from 1 to 20 were tested to find the optimal value for the best prediction result.

Fig. 8. Illustration of the random forest for segmenting mineral phases.

3.1.4. Random Forest
The Random Forest algorithm works effectively on a variety of problems. It is an ensemble of multiple decision trees (Breiman, 2001; Liaw et al., 2002). The idea is to solve the issue that an individual decision tree may be prone to overfit a portion of the data. By combining different individual decision trees into an ensemble (Fig. 8), a random forest can average out the individual mistakes to reduce the risk of overfitting (Breiman, 2001). A random forest creates tens to hundreds of individual decision trees on a training set, and each individual decision tree is constructed by introducing random variation (Liaw et al., 2002). This random variation during tree building happens in two ways. First, the data used to train an individual tree in the forest ensemble, referred to as the sub-sample or bootstrap sample, is selected randomly. Second, in an individual decision tree, the best feature to split a node is picked from a randomly selected subset of features, instead of from all possible features. Randomizing these two processes guarantees that all the decision trees, and hence the random forests, will be different. For the classification problem, once the random forest is trained, each individual tree makes a prediction for the target classes, and the overall prediction is based on a weighted vote across all trees (Breiman, 2001). An advantage of the random forest is that the data do not need to be preprocessed. However, to achieve good performance, it is critical to tune the important hyperparameters, which include the maximum depth of the trees and the maximum number of features.

3.1.5. Artificial Neuron Network
Among the various machine learning algorithms, one of the most popular is the Artificial Neural Network (ANN). This algorithm can extract both implicit and complex data correlations from large amounts of training data. The typical structure of an ANN model consists of a series of layers. Each layer contains a number of "neuron" units and carries out a calculation of the weighted input plus a bias term, followed by a non-linear transformation (Zurada, 1992). The results obtained by the above procedures are then fed into the next layer. The training process involves minimizing the difference between the true values and the predicted values. During the process, the weights and biases of each layer are iteratively updated by backpropagation algorithms. Due to the nonlinear activation functions and hidden neurons, deep neural networks can be established to deal with situations where the input-output mappings are extensively complex.
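A sketch of such an ensemble with scikit-learn, using the hyperparameter names discussed later in Section 4.1.3; the values shown are the ones the authors ultimately selected, and `X_train`/`y_train` are the dataset of Section 2.3:

```python
from sklearn.ensemble import RandomForestClassifier

# Bootstrap sampling and per-split feature subsetting are built in;
# the hyperparameters mirror those tuned in Section 4.1.3.
rf = RandomForestClassifier(
    n_estimators=100,      # number of decision trees
    max_features=5,        # features considered at each split
    min_samples_leaf=4,    # 'pruned tree' lower limit
    random_state=0,
)
rf.fit(X_train, y_train)        # balanced pixel dataset from Section 2.3
y_pred = rf.predict(X_valid)    # aggregated vote across the trees
```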

Fig. 9. A schematic of U-Net architecture for segmentation.


Fig. 10. Effect of regularization parameter on performance of (a) logistic regression classifier and (b) linear SVM.

3.2. U-Net

In addition to the above shallow models, the U-Net, which is a fully convolutional CNN, was also used in this study for mineral segmentation. The U-Net architecture looks like a 'U' and comprises three sections: the contracting section (left part), the bottleneck (middle bottom), and the expansive section (right part), as shown in Fig. 9. In the contracting path on the left side, each block takes an input and applies two 3 × 3 convolutional kernels, followed by a rectified linear transformation and a max-pooling operation with a stride of 2 × 2. The feature channels in the contracting path are doubled at each step, increasing from 16 to 256. This allows the network to learn complex structure effectively by propagating context information to the higher resolutions. Additionally, the bottom-most layer connects the contraction section and the expansion section.

Similar to the contraction layers, the expansion section is also comprised of several expansion blocks. Each block passes the input through two convolutional layers followed by a 2 × 2 up-sampling layer. In contrast to the contraction section, after each block the number of channels in the CNN layers is cut in half to maintain symmetry. In the end, the feature maps pass through a 1 × 1 CNN layer with the number of feature maps equal to the number of segments desired.
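A minimal Keras sketch consistent with this description (12-channel 128 × 128 inputs, encoder filters 16 to 256, 3 × 3 convolutions, 2 × 2 pooling/up-sampling, and a 1 × 1 output layer for the 13 classes); this is an illustration under those assumptions, not the authors' published code:

```python
# U-Net-style network matching the description in Section 3.2.
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with ReLU, as in each U-Net block.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

inputs = layers.Input(shape=(128, 128, 12))

# Contracting path: feature channels double at each step (16 -> 128).
skips, x = [], inputs
for filters in (16, 32, 64, 128):
    x = conv_block(x, filters)
    skips.append(x)
    x = layers.MaxPooling2D(2)(x)

x = conv_block(x, 256)  # bottleneck connecting the two sections

# Expansive path: up-sample, halve the channels, and merge skip features.
for filters, skip in zip((128, 64, 32, 16), reversed(skips)):
    x = layers.UpSampling2D(2)(x)
    x = layers.Concatenate()([x, skip])
    x = conv_block(x, filters)

# 1x1 convolution maps the features to the 13 mineral classes per pixel.
outputs = layers.Conv2D(13, 1, activation="softmax")(x)
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```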
4. Results and discussion

4.1. The effects of tuning parameters on the shallow classifiers' performance

Hyperparameter tuning is the process of choosing a set of optimal hyperparameters for the learning algorithms. In this section, the hyperparameter selection for the LR, SVM, k-NN, and RF classifiers is discussed. The performance of each classifier was evaluated through cross-validation.

4.1.1. Regularization parameters for LR and linear SVM classifier
L2 regularization was introduced in the LR to prevent overfitting. The regularization constant, C, was tuned to obtain the optimal performance. The results show that the classifier performance increased as the value of C increased from 0.001 to 10. When C continued increasing to 100, 1000, and 10,000, the performance of the classifier showed only slight changes. Based on these observations, C = 100 was chosen for the logistic classifier (Fig. 10a).

Likewise to the LR, the only parameter that requires tuning in the linear SVM is the regularization parameter, C. The optimal value of C was searched from the following list: [0.001, 0.01, 0.1, 1, 10, 100], by which the optimal performance was observed at C = 10 (Fig. 10b).
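A sketch of this search with scikit-learn's `GridSearchCV` (the grid shown is the SVM list from the text; the paper's LR search extended to 10,000, and the scoring choice here is an assumption):

```python
# Grid search over the regularization constant C, as described above.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}

for clf in (LogisticRegression(max_iter=1000), LinearSVC()):
    search = GridSearchCV(clf, param_grid, cv=4, scoring="f1_micro")
    search.fit(X_train, y_train)
    print(type(clf).__name__, search.best_params_, search.best_score_)
```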
Fig. 11. The relationship between n_neighbor and mean F1 scores on test data.

4.1.2. k-NN classifier
In the k-NN classifier, the class attributes of the unknown samples are decided from the average response of the k nearest neighbors. Therefore, the number of neighbor samples (n_neighbor) plays an important role in the performance of the algorithm. In this study, neighbor values ranging from 1 to 20 were tested using the cross-validation method. The F1 scores of the model on the test datasets are a function of the number of neighbors, and the results are given in Fig. 11. As observed, when a small number of neighbors was used, the classifier overfit and the prediction results on the testing dataset were poor. As this number increased, the test performance improved. It was found that n_neighbor = 5 was the optimal value, where the test accuracy was highest.

4.1.3. RF classifier
As reported in previous studies, the key parameters that need to be tuned in the Random Forest are the number of decision trees (n_estimator) and the maximum number of features (max_feature) at each split node (Probst et al., 2019). When the parameter for the minimum number of samples in an individual leaf (min_sample_leaf) is set to 1, the decision tree is called a 'full' tree.


Using a full branch tree showed significant overfitting on the training dataset (Fig. 12a). This led to trees built by the base algorithm being prone to overfitting, as they became extremely large and complex. The model can be simplified by setting a lower limit on the minimum number of samples in an individual leaf (min_sample_leaf); the simplified model is referred to as a 'pruned tree'. In this paper, we tested the performance of the RF classifier on both the training and validation datasets by cross-validation and examined the effect of max_feature and min_sample_leaf on the prediction results. The outcome showed that as min_sample_leaf increased, the score of the classifier on the training dataset decreased, while the prediction performance on the validation data improved as the parameter increased from 1 to 2. Only a slight difference in performance was observed when the parameter was in the range of 2-4, with an F1 score ranging from 0.918 to 0.920. The optimal parameters max_feature = 5 and min_sample_leaf = 4 were chosen for the RF classifier in this study.

Additionally, the effect of the number of trees in the forest (n_estimator) on the classifier performance was evaluated as well. The results show that n_estimator = 50 provided the best performance on the training dataset, while n_estimator = 100 provided the best performance on the validation dataset (Fig. 12b). In this case, n_estimator = 100 was chosen.

Fig. 12. The effect of (a) minimum samples at each leaf, and (b) number of decision trees and maximum features, on the performance of the RF classifier.

4.1.4. Hidden layers in ANN
The architecture of an ANN was shown to affect the prediction performance (Zurada, 1992). Through the prediction experiments, it was observed that when the ANN was applied with two or more hidden layers, the performance did not increase over the use of a single hidden layer. Therefore, in this study, an ANN with one single hidden layer with 13 nodes was used.
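A sketch of that single-hidden-layer network in Keras (12 element-intensity inputs, 13 hidden nodes, 13 output classes); the activation functions, optimizer, and epoch count are assumptions for illustration:

```python
# Single-hidden-layer ANN with 13 nodes, as selected in Section 4.1.4.
from tensorflow.keras import layers, models

ann = models.Sequential([
    layers.Input(shape=(12,)),               # 12 element intensities
    layers.Dense(13, activation="relu"),     # one hidden layer, 13 nodes
    layers.Dense(13, activation="softmax"),  # 13 mineral classes
])
ann.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
ann.fit(X_train, y_train, epochs=50, validation_data=(X_valid, y_valid))
```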

4.2. Comparison of the prediction results

For the shallow machine learning models, the validation dataset was used to measure the prediction performance of each classifier. The performance results obtained by cross-validation are shown in Table 2. The F1 score results show only slight contrast, with the exception of the linear SVM, when applying the different averaging strategies. The Random Forest classifier, with a micro F1 score of 0.9238, performed the best among the five shallow models, followed by the k-NN, LR, ANN, and Linear SVM. Moreover, the scores calculated by the different averaging strategies show only slight differences, meaning that models trained on a balanced dataset can perform well regardless of the distribution of mineral phases.

Table 2
Prediction performance of different classifiers.

Learning models | Classifier | F1-micro | F1-weighted | F1-macro
Machine learning models | Logistic regression | 0.8705 | 0.8691 | 0.8691
 | Linear SVM | 0.7980 | 0.7778 | 0.7778
 | k-Nearest Neighbors | 0.9150 | 0.9139 | 0.9139
 | Random Forest | 0.9238 | 0.9215 | 0.9215
 | ANN | 0.8869 | 0.8873 | 0.8873
Deep learning model | U-Net | 0.8832 | 0.8784 | 0.7301

For the U-Net model, the following scores were observed on the validation datasets: F1-micro (0.8832), F1-weighted (0.8784), and F1-macro (0.7301). The results indicate that the F1 score averaged over all classes (F1-macro) is much lower than the score averaged over all pixels (F1-micro), meaning that the U-Net model showed poor performance when classifying the minor classes.

Fig. 13. Comparison of performance of different models on various dataset sizes.


Fig. 14. Comparison of the performance of the Random Forest when 10%, 20%, and 30% Gaussian noise is added to the input element densities one at a time.

This is due to the negative effect of the imbalanced dataset (Miao et al., 2019). As shown in Table 1, the shale sample mainly comprises K-feldspar, quartz, illite, and dolomite, while the other minerals occupy only a small portion. The imbalance of the mineral classes results in the F1-macro being much lower than the F1-micro.

4.3. Sensitivity analysis of the machine learning models

4.3.1. Sensitivity of the dataset size
Sensitivity analysis of the machine learning models to the size of the dataset was conducted by training the models on datasets of different sizes. As mentioned in the previous section, 1200 pixels per class were randomly selected to create the balanced dataset. For the purpose of a comparative study, three further datasets with reduced sizes were used: 600 pixels per class, 300 pixels per class, and 100 pixels per class were randomly selected to build the second, third, and fourth datasets, respectively. The micro-averaged F1 scores were compared among the different sample sizes and models. The results show that LR and Linear SVM were less sensitive to dataset size, while k-NN, Random Forest, and ANN were more affected by the reduction of the size of the training dataset (Fig. 13). The dataset with 100 pixels per class was sufficient for the linear algorithms to learn the relationship, due to the simple nature of linear algorithms. As a result, the performance of logistic regression and Linear SVM did not improve much even when more data were available for training. In contrast, the performance of k-NN, RF, and ANN improved significantly as the size of the dataset increased, and the improvement was most obvious when the dataset was enlarged from 100 pixels per class to 300 pixels per class. This is because more training data helped to build robust nonlinear models in these specific methods.
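A sketch of this size-sensitivity experiment, reusing the balanced-sampling idea from Section 2.3 (the classifier dictionary and `X`, `y` arrays are illustrative placeholders):

```python
# Dataset-size sensitivity: rebuild balanced datasets of shrinking size
# and compare micro-averaged F1 across models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=100),
}

def balanced_subset(X, y, n_per_class, rng):
    # Draw n_per_class pixels from every mineral class without replacement.
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), n_per_class, replace=False)
        for c in np.unique(y)
    ])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
for n in (1200, 600, 300, 100):
    Xs, ys = balanced_subset(X, y, n, rng)
    for name, clf in classifiers.items():
        f1 = cross_val_score(clf, Xs, ys, cv=4, scoring="f1_micro").mean()
        print(n, name, round(f1, 3))
```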
4.3.2. Effect of noise on the prediction
Two noise-adding scenarios were conducted to evaluate the effect of noise on the performance of the different classifier models. First, the Random Forest model, which showed the best performance, was measured. Gaussian noise with a standard deviation of 10%, 20%, and 30% was added separately to each input element density. At each data point, the magnitude of the added noise was randomly drawn from the Gaussian distribution. The operation of adding noise and running the prediction was repeated 10 times, with the averaged results given in Fig. 14. Adding noise to the element Silicon showed a remarkable effect on the prediction performance of the Random Forest method, followed by the elements Al, Mg, Ca, K, and Fe. For these elements, the performance scores significantly decreased when additional noise was added. The reason these elements play a more important role than the others is that they are the most common elements within the minerals shown in Fig. 3. For example, the element Silicon is the basic constituent of the silicate minerals, such as Quartz and K-feldspar, and it is present in the clay minerals as well. Adding noise to the elements Mn, Na, P, S, Ti, and Zr did not produce an obvious drop in the F1 score, indicating their lesser importance in mineral phase segmentation. In general, adding noise to one element at a time did not yield a severe impact on the RF model; for example, only a 2% score drop occurred when 30% Gaussian noise was added to Si, demonstrating the robustness of the RF model.

Fig. 15. Comparison of (a) F1 score, and (b) F1 score drop percentage, of various models when 10% and 20% Gaussian noise was simultaneously added into all input element densities.

The second noise sensitivity test added 10% and 20% Gaussian noise simultaneously to all input element densities. The noisy dataset was then fed into the five models to test their performance. As observed, adding noise decreased the prediction performance: the more noise was added, the more the F1 score decreased (Fig. 15a).

Fig. 16. Mineral maps with (a) ground truth labels, and predicted maps from (b) RF and (c) U-Net.

Even with the most noise present in the input data, the Random Forest algorithm still outperforms the others. However, when comparing the drop in prediction score, it was found that the score drop of the non-linear classifiers was larger than that of the linear classifiers (Fig. 15b). Compared to the relatively simple linear classifiers, when trained on the dataset in the presence of noise, the k-NN, RF, and ANN models tended to overfit the noisy data, resulting in larger performance score drops.
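A sketch of the per-element noise test, assuming a trained `rf` model, the normalized validation arrays `X_valid`/`y_valid`, and the `ELEMENTS` list from Section 2.3 (all illustrative names):

```python
# Noise sensitivity: perturb one normalized element column at a time with
# Gaussian noise and measure the resulting F1 drop.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
base_f1 = f1_score(y_valid, rf.predict(X_valid), average="micro")

for j, element in enumerate(ELEMENTS):
    drops = []
    for _ in range(10):                # repeat and average, as in the text
        X_noisy = X_valid.copy()
        X_noisy[:, j] += rng.normal(0.0, 0.10, size=len(X_noisy))  # 10% std
        noisy_f1 = f1_score(y_valid, rf.predict(X_noisy), average="micro")
        drops.append(base_f1 - noisy_f1)
    print(element, np.mean(drops))
```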
4.4. An example of applying RF classifier and U-Net on an unseen sample

In this section, an example of applying the trained RF classifier and U-Net to mineral segmentation of SEM-EDS images of a totally unseen sample is discussed. SEM-EDS images of size 128 × 128 pixels, a total of 16,384 pixels, were fed into the trained RF model and U-Net for the purpose of comparison.

Table 3
Prediction results of the RF classifier on the unseen sample.

Ground Truth Label | Predicted Label | Pixel Number | Relative Error
K-feldspar | Quartz | 1520 | 9.28%
K-feldspar | Illite | 486 | 2.97%
Illite | K-feldspar | 113 | 0.69%
Illite | Quartz | 88 | 0.54%
K-feldspar | Albite | 34 | 0.21%
Quartz | K-feldspar | 27 | 0.16%

The ground truth map and the mineral maps predicted by RF and U-Net are shown in Fig. 16a-c, respectively. The U-Net model outperforms the RF on the unseen sample, with F1 scores for RF and U-Net of 0.85 and 0.92, respectively.


When comparing the predicted results of the RF with the ground truth map, most of the pixels were correctly predicted by this model. In terms of mineral classes, the major minerals, including quartz, illite, pyrite, dolomite, and calcite, were correctly recognized. However, when looking at the wrong predictions, it was observed that most of these errors are related to K-feldspar (Table 3). It was predicted as either Quartz, Illite, or Albite. A potential reason for this is that these minerals share similar constituent elements. For example, comparing the chemical formulas of K-feldspar, KAlSi3O8, and Illite, K0.65Al2.0[Al0.65Si3.35O10](OH)2, they both contain the elements Al, K, Si, O, and H. It appears that the close intensities of these elements at a pixel can lead the classifier to fail to distinguish one mineral from another. Further efforts are needed to improve the RF model's performance regarding this issue. Moreover, some discrete K-feldspar pixels were wrongly predicted as Quartz; the potential reason is that the elemental intensity obtained from the X-rays of these isolated K-feldspar pixels is prone to interference from the surrounding Quartz pixels, and therefore they tend to be identified as Quartz.

Compared to the RF model, the mineral map predicted by the U-Net model presents better performance on the isolated small particles, because the U-Net model takes into account not only the intensity information but also the location information of the input data. Another characteristic of the U-Net predictions is that the shapes of the particles turn out to be rounder compared to the ground truth map (Fig. 16a), as the U-Net was originally designed for separating cells in medical research. Furthermore, the U-Net failed to predict the minor mineral classes at some pixels; for example, it missed several Muscovite particles shown in brown. This is mainly due to the shortage of minority-class samples in the imbalanced training dataset for the U-Net model.

5. Conclusion

In this study, mineral segmentation of SEM-EDS images using five shallow machine learning algorithms and a deep learning CNN U-Net model was implemented and compared. For the shallow learning models, balanced datasets with different training sample sizes were evaluated. Additionally, a sensitivity analysis of the effect of noise on each model was also examined. Finally, the trained RF model and U-Net were implemented on an unseen sample to compare their performance on mineral segmentation.

Results demonstrate that all classification algorithms show a high overall score, ranging from 86% to 92%. Random Forest demonstrates the best performance among the five shallow models, with an F1 score of 0.92. Sensitivity analysis on dataset size shows that Logistic Regression and Linear SVM were less sensitive to dataset size, while k-Nearest Neighbors, Random Forest, and ANN were more sensitive to the reduction of the size of the training dataset. Sensitivity analysis of noise indicates that noise added to the elements Silicon, Aluminum, Magnesium, Calcium, Potassium, and Iron would decrease the performance of the RF, due to their wider distribution in the element maps. When it comes to unseen shale samples, although the U-Net shows relatively poor performance on the segmentation of minor mineral classes due to the negative effect of an imbalanced dataset, it still outperformed the RF model in terms of correctly classifying more pixels (higher F1 score) and better performance on segmenting small isolated particles.

Credit author statement

Chunxiao Li: Conceptualization, Methodology, Data Analytics, Programming, Writing - Original draft preparation. Dongmei Wang: Supervision, Reviewing and Editing. Lingyun Kong: Programming for U-Net Model, Critical Revision, Language Improvement, and Proofreading.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors would like to thank the North Dakota Geological Survey and Core Library for allowing us access to the shale sample, particularly Jeffrey Bader, state geologist and director, as well as Kent Hollands, laboratory technician.

Nomenclature

SEM  Scanning electron microscopy
LR   Logistic Regression
SVM  Linear Support Vector Machine
k-NN k-Nearest Neighbor
RF   Random Forest
ANN  Artificial Neuron Networks
EDS  Energy Dispersive X-ray Spectroscopy
MAPS Modular Automated Processing System
TP   True positive prediction
FP   False positive prediction
FN   False negative prediction

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.petrol.2020.108178.

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., others, 2016. Tensorflow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265-283.
Al-Obaidi, M., Heidari, Z., Casey, B., Williams, R., Spath, J., 2018. Automatic well-log-based fabric-oriented rock classification for optimizing landing spots and completion intervals in the midland basin. In: Presented at the SPWLA 59th Annual Logging Symposium. Society of Petrophysicists and Well-Log Analysts.
Andrew, M., 2018. A quantified study of segmentation techniques on synthetic geological XRM and FIB-SEM images. Comput. Geosci. 22, 1503-1512. https://doi.org/10.1007/s10596-018-9768-y.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5-32.
Cover, T., Hart, P., 1967. Nearest neighbor pattern classification. IEEE Trans. Inf. Theor. 13, 21-27.
Dreiseitl, S., Ohno-Machado, L., 2002. Logistic regression and artificial neural network classification models: a methodology review. J. Biomed. Inf. 35, 352-359. https://doi.org/10.1016/S1532-0464(03)00034-0.
Esmaeilzadeh, S., Salehi, A., Hetz, G., Olalotiti-lawal, F., Darabi, H., Castineira, D., 2020. Multiscale modeling of compartmentalized reservoirs using a hybrid clustering-based non-local approach. J. Petrol. Sci. Eng. 184, 106485. https://doi.org/10.1016/j.petrol.2019.106485.
Esmaeilzadeh, S., Salehi, A., Hetz, G., Olalotiti-lawal, F., Darabi, H., Castineira, D., 2019. A general spatio-temporal clustering-based non-local formulation for multiscale modeling of compartmentalized reservoirs. In: SPE Western Regional Meeting. https://doi.org/10.2118/195329-MS.
Guntoro, P.I., Tiu, G., Ghorbani, Y., Lund, C., Rosenkranz, J., 2019. Application of machine learning techniques in mineral phase segmentation for X-ray microcomputed tomography (μCT) data. Miner. Eng. 142, 105882. https://doi.org/10.1016/j.mineng.2019.105882.
Gupta, I., Rai, C., Sondergeld, C.H., Devegowda, D., 2018. Rock typing in eagle ford, barnett, and woodford formations. SPE Reservoir Eval. Eng. 21, 654-670. https://doi.org/10.2118/189968-PA.
Haralick, R.M., Shapiro, L.G., 1985. Image segmentation techniques. Comput. Vis. Graph Image Process 29, 100-132.
Hosmer Jr., D.W., Lemeshow, S., Sturdivant, R.X., 2013. Applied Logistic Regression. John Wiley & Sons.
Izadi, H., Sadri, J., Bayati, M., 2017. An intelligent system for mineral identification in thin sections based on a cascade approach. Comput. Geosci. 99, 37-49. https://doi.org/10.1016/j.cageo.2016.10.010.
Izadi, H., Sadri, J., Mehran, N.-A., 2015. A new intelligent method for minerals segmentation in thin sections based on a novel incremental color clustering. Comput. Geosci. 81, 38-52. https://doi.org/10.1016/j.cageo.2015.04.008.
Jung, H., Jo, H., Kim, S., Lee, K., Choe, J., 2018. Geological model sampling using PCA-assisted support vector machine for reliable channel reservoir characterization. J. Petrol. Sci. Eng. 167, 396-405. https://doi.org/10.1016/j.petrol.2018.04.017.


Kelly, S., El-Sobky, H., Torres-Verdín, C., Balhoff, M.T., 2016. Assessing the utility of FIB-SEM images for shale digital rock physics. Adv. Water Resour. Pore Scale Model. Exp. 95, 302-316. https://doi.org/10.1016/j.advwatres.2015.06.010.
Klaver, J., Desbois, G., Urai, J.L., Littke, R., 2012. BIB-SEM study of the pore space morphology in early mature Posidonia Shale from the Hils area, Germany. Int. J. Coal Geol. 103, 12-25. https://doi.org/10.1016/j.coal.2012.06.012.
Knaup, A., Jernigen, J., Curtis, M., Sholeen, J., Borer, J.I., Sondergeld, C., Rai, C., 2019. Unconventional reservoir microstructural analysis using SEM and machine learning. In: Presented at the SPE/AAPG/SEG Unconventional Resources Technology Conference. https://doi.org/10.105530/urtec-2019-638.
Kong, L., Ostadhassan, M., Hou, X., Mann, M., Li, C., 2019. Microstructure characteristics and fractal analysis of 3D-printed sandstone using micro-CT and SEM-EDS. J. Petrol. Sci. Eng. 175, 1039-1048.
Li, C., Ostadhassan, M., Abarghani, A., Fogden, A., Kong, L., 2019. Multi-scale evaluation of mechanical properties of the Bakken shale. J. Mater. Sci. 54 (3), 2133-2151.
Li, C., Ostadhassan, M., Guo, S., Gentzis, T., Kong, L., 2018. Application of PeakForce tapping mode of atomic force microscope to characterize nanomechanical properties of organic matter of the Bakken Shale. Fuel 233, 894-910. https://doi.org/10.1016/j.fuel.2018.06.021.
Li, H., Misra, S., 2019. Long short-term memory and variational autoencoder with convolutional neural networks for generating NMR T2 distributions. Geosci. Rem. Sens. Lett. IEEE 16, 192-195. https://doi.org/10.1109/LGRS.2018.2872356.
Liaw, A., Wiener, M., others, 2002. Classification and regression by randomForest. R. News 2, 18-22.
Luo, X., 2019. Ensemble-based kernel learning for a class of data assimilation problems with imperfect forward simulators. PloS One 14, 1-40. https://doi.org/10.1371/journal.pone.0219247.
Marmo, R., Amodio, S., Tagliaferri, R., Ferreri, V., Longo, G., 2005. Textural identification of carbonate rocks by image processing and neural network: methodology proposal and examples. Comput. Geosci. 31, 649-659. https://doi.org/10.1016/j.cageo.2004.11.016.
Miao, X., Wang, J., Wang, Z., Sui, Q., Gao, Y., Jiang, P., 2019. Automatic recognition of highway tunnel defects based on an improved U-net model. IEEE Sensor. J. 1-1. https://doi.org/10.1109/JSEN.2019.2934897.
Misra, S., Li, H., He, J., 2019. Machine Learning for Subsurface Characterization. Gulf Professional Publishing.
Müller, A.C., Guido, S., others, 2016. Introduction to Machine Learning with Python: a Guide for Data Scientists. O'Reilly Media, Inc.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825-2830.
Pirrie, D., Butcher, A.R., Power, M.R., Gottlieb, P., Miller, G.L., 2004. Rapid quantitative mineral and phase analysis using automated scanning electron microscopy (QemSCAN); potential applications in forensic geoscience. Geol. Soc. Lond. Spec. Publ. 232, 123-136. https://doi.org/10.1144/GSL.SP.2004.232.01.12.
Probst, P., Wright, M.N., Boulesteix, A.-L., 2019. Hyperparameters and tuning strategies for random forest. WIREs Data Mining Knowledge Disc. 9, e1301. https://doi.org/10.1002/widm.1301.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (Eds.), Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 234-241. https://doi.org/10.1007/978-3-319-24574-4_28.
Saif, T., Lin, Q., Butcher, A.R., Bijeljic, B., Blunt, M.J., 2017. Multi-scale multi-dimensional microstructure imaging of oil shale pyrolysis using X-ray micro-tomography, automated ultra-high resolution SEM, MAPS Mineralogy and FIB-SEM. Appl. Energy 202, 628-647. https://doi.org/10.1016/j.apenergy.2017.05.039.
Smith, M.G., Bustin, R.M., 1996. Lithofacies and paleoenvironments of the upper devonian and lower mississippian Bakken Formation, Williston Basin. Bull. Can. Petrol. Geol. 44, 495-507.
Sun, W., Zuo, Y., Wu, Z., Liu, H., Xi, S., Shui, Y., Wang, J., Liu, R., Lin, J., 2019. Fractal analysis of pores and the pore structure of the Lower Cambrian Niutitang shale in northern Guizhou province: investigations using NMR, SEM and image analyses. Mar. Petrol. Geol. 99, 416-428. https://doi.org/10.1016/j.marpetgeo.2018.10.042.
Suykens, J.A., Vandewalle, J., 1999. Least squares support vector machine classifiers. Neural Process. Lett. 9, 293-300.
Tang, D., Spikes, K., 2017. Segmentation of shale SEM images using machine learning. In: SEG Technical Program Expanded Abstracts 2017. Society of Exploration Geophysicists, Houston, Texas, pp. 3898-3902. https://doi.org/10.1190/segam2017-17738502.1.
Tang, Y., Zhang, Y.-Q., Chawla, N.V., Krasser, S., 2008. SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 39, 281-288.
Temirchev, P., Gubanova, A., Kostoev, R., Gryzlov, A., Voloskov, D., Koroteev, D., Simonov, M., Akhmetov, A., Margarit, A., Ershov, A., 2019. Reduced order reservoir simulation with neural-network based hybrid model. In: Presented at the SPE Russian Petroleum Technology Conference. Society of Petroleum Engineers. https://doi.org/10.2118/196864-MS.
Temirchev, P., Simonov, M., Kostoev, R., Burnaev, E., Oseledets, I., Akhmetov, A., Margarit, A., Sitnikov, A., Koroteev, D., 2020. Deep neural networks predicting oil movement in a development unit. J. Petrol. Sci. Eng. 184, 106513. https://doi.org/10.1016/j.petrol.2019.106513.
Ulker, E., Sorgun, M., 2016. Comparison of computational intelligence models for cuttings transport in horizontal and deviated wells. J. Petrol. Sci. Eng. 146, 832-837. https://doi.org/10.1016/j.petrol.2016.07.022.
Wang, L., 2005. Support Vector Machines: Theory and Applications. Springer Science & Business Media.
Wu, Y., Misra, S., Sondergeld, C., Curtis, M., Jernigen, J., 2019. Machine learning for locating organic matter and pores in scanning electron microscopy images of organic-rich shales. Fuel 253, 662-676. https://doi.org/10.1016/j.fuel.2019.05.017.
Zurada, J.M., 1992. Introduction to Artificial Neural Systems. West St. Paul.