Enhancing Mineral Processing With Deep Learning: Automated Quartz Identification Using Thin Section Images
Abstract: The precise identification of quartz minerals is crucial in mineralogy and geology due to their widespread occurrence and industrial significance. Traditional methods of quartz identification in thin sections are labor-intensive and require significant expertise, and they are often complicated by the coexistence of other minerals. This study presents a novel approach leveraging deep learning techniques combined with hyperspectral imaging to automate the identification of quartz minerals. The use of four advanced deep learning models (PSPNet, U-Net, FPN, and LinkNet) yielded significant advancements in efficiency and accuracy. Among these models, PSPNet exhibited superior performance, achieving the highest intersection over union (IoU) scores and demonstrating exceptional reliability in segmenting quartz minerals, even in complex scenarios. The study involved a comprehensive dataset of 120 thin sections, encompassing 2470 hyperspectral images prepared from 20 rock samples. Expert-reviewed masks were used for model training, ensuring robust segmentation results. This automated approach not only expedites the recognition process but also enhances reliability, providing a valuable tool for geologists and advancing the field of mineralogical analysis.
Keywords: quartz mineral identification; deep learning; hyperspectral imaging; deep learning in geology
SAM has been applied to hyperspectral mineral identification but has limitations in performance when identifying fine-grained minerals in complex datasets [16]. Agrawal et al. [3] applied random forest and support vector machines for mineral identification using hyperspectral data and noted that these methods, while effective for certain minerals, struggle with generalization across diverse mineral types. Including a comparison with these approaches could further emphasize the robustness of deep learning, especially for quartz identification. PCA has been a common method for dimensionality reduction in hyperspectral data analysis, but it often sacrifices the granularity needed for precise mineral identification [17].
In recent years, the advancement of deep learning techniques and hyperspectral imaging technology has presented new opportunities for the automatic recognition of quartz minerals [14,17]. Hyperspectral images can measure the reflections of objects across numerous spectral bands, providing a rich source for mineral recognition. Deep learning serves as an effective approach for analyzing this vast dataset and recognizing minerals [5,18–21]. Consequently, a deep learning-based approach for the automatic identification of quartz minerals offers an alternative to traditional methods that require both time and expertise.
This study introduces a deep learning-based approach developed to enhance the automatic recognition of quartz minerals from hyperspectral images. This approach leverages the advantages of using hyperspectral images and expedites the recognition of quartz minerals. Additionally, experimental results will be presented to evaluate the accuracy and reliability of this method [1,22–23]. Regarding the contributions of this study, foremost, it has the potential to expedite the automatic recognition of quartz minerals, enabling geologists to access more data in less time. Furthermore, the accuracy and reliability of this deep learning-based approach surpass traditional methods. Therefore, this study can be considered a significant step in the recognition of quartz minerals [15–16,24].
By integrating four advanced semantic segmentation models, namely PSPNet, U-Net, Feature Pyramid Network (FPN), and LinkNet, we systematically analyze and compare their performance in accurately recognizing quartz minerals. The preparation of geological thin sections, the microscopic examination, and the data processing techniques are meticulously detailed, ensuring a robust foundation for the deep learning models. Extensive experimental results are presented, highlighting the superior accuracy and reliability of these models in segmenting quartz from complex geological samples.
This research fills a critical gap in the literature by demonstrating how cutting-edge technologies can be creatively applied to overcome the limitations of traditional methods. By offering a scalable, accurate, and automated solution to mineral identification, the study paves the way for future advancements in geological research and automated analytical techniques.
2. Methodology
The identification of quartz minerals holds significant importance in the fields of mineralogy and geology. This section presents the deep learning methodology and data utilized for the detection of cross-sectional areas of quartz minerals. Specifically, it involves four models: PSPNet, U-Net, FPN, and LinkNet. All processing is conducted using Python with NumPy [25], and visualization is achieved through Matplotlib [26]. Additionally, well-known libraries such as Keras–TensorFlow (deep learning) [27] are employed for the implementation.
2.1. Preparation of geological thin sections and identification of quartz minerals
The production of geological thin sections and the accurate identification of minerals within them are of paramount importance in the fields of mineralogy and geology. In this article, we will delve into the process of creating geological thin sections, the methods for mineral identification using a light microscope, and a comparative analysis of these processes. The production of geological thin sections commences with the collection of samples from a specific geological site. These samples are then prepared in a laboratory setting for subsequent examination. Rocks are first cut into specific dimensions and then sliced into thin layers. These thin sections are mounted onto prepared slides for optical analysis.
Geological thin sections are examined by using a light microscope, a vital tool for observing the optical properties of minerals. The identification of quartz minerals is particularly intriguing. Quartz is characteristically transparent and is not confined to a specific color. However, when viewed under specific light polarizations, quartz exhibits distinct shapes and colors. This feature serves as a key criterion for distinguishing quartz from other minerals when using a light microscope.
Nevertheless, quartz is often found intermingled with other minerals, complicating the identification process. Therefore, geologists must consider the presence of other minerals when making identifications. Another challenge faced by geologists is the time-consuming nature of traditional methods in making accurate mineral identifications. The manual examination of each part of a thin section and the manual identification of minerals require significant time and expertise.
In conclusion, the preparation of geological thin sections and the process of mineral identification using a light microscope hold great significance in the fields of mineralogy and geology. These processes are fundamental tools for examining and accurately identifying the structures of rocks and minerals. While quartz’s distinct color and shape can be readily identified by using a light microscope, the potential for intermingling with other minerals necessitates careful consideration. Additionally, the time-intensive nature of these processes has spurred the exploration of automated identification methods, such as deep learning.
2.1.1. Preparation of rock thin sections
For the identification of quartz minerals to be taught to the deep learning system, 120 thin sections were prepared from
G. Külekçi et al., Enhancing mineral processing with deep learning: automated quartz identification using thin ... 3
Fig. 3. Some of the photographs containing quartz taken from thin sections.
Fig. 4. Some of the photographs (256 px × 192 px) containing quartz for (a) images and (b) masks.
3. Semantic segmentation models
In recent years, deep learning (DL) techniques have increasingly been applied to image segmentation tasks, with semantic segmentation becoming crucial in fields such as medical imaging and disaster management. The objective of semantic segmentation is to label each pixel in an image according to the object class it belongs to. This task is particularly challenging due to variations in object shapes, sizes, orientations, and the potential for low-quality or occluded images in disaster scenarios.
To address these challenges, several DL architectures have been developed [28]. This study employs four encoder–decoder-based semantic segmentation models (SSMs) for segmenting quartz minerals in thin-section images: PSPNet (pyramid scene parsing network) [29], U-Net (u-shaped network) [30], FPN (feature pyramid network) [31], and LinkNet (link network) [32]. U-Net features an encoder–decoder structure with skip connections that improve segmentation outcomes. LinkNet is similar to U-Net but utilizes residual blocks in its encoder and decoder. FPN, akin to U-Net, uses a 1×1 convolution layer and combines features differently. PSPNet incorporates a pyramid pooling module for global context aggregation and an auxiliary loss [33]. Various encoders were employed for feature extraction, chosen based on their performance in prior studies and suitability for this task, including different variations of networks [34].
3.1. PSPNet
PSPNet [29] is a deep learning architecture designed for semantic segmentation that excels in capturing context information at various scales. The key feature of PSPNet is its pyramid pooling module, which performs pooling operations at four different scales to enhance the global representation of features. As shown in Fig. 5, the pyramid pooling module processes feature maps at different scales, capturing global context information effectively. In PSPNet, the final feature map P is defined by Eq. (1):
Fig. 5. Architecture overview of PSPNet: (a) input image; (b) feature map; (c) pyramid pooling module; (d) final prediction.
$$P(x) = \mathrm{Concat}\big(\mathrm{Up}(x_1), \mathrm{Up}(x_2), \mathrm{Up}(x_3), \mathrm{Up}(x_4), x\big) \tag{1}$$
where $x$ denotes the original feature map, $x_1$, $x_2$, $x_3$, and $x_4$ are the feature maps pooled at varying scales, $\mathrm{Up}(\cdot)$ is the upsampling function, and $\mathrm{Concat}(\cdot)$ is the concatenation function. This design allows PSPNet to integrate contextual information effectively, making it highly suitable for complex scene parsing tasks.
In Fig. 5, CNN represents a convolutional neural network used for initial feature extraction, CONV denotes a convolution operation applied to refine feature maps, POOL stands for pooling, which reduces spatial dimensions to capture multi-scale features, and CONCAT refers to the concatenation operation that combines upsampled feature maps from different pooling scales into a unified representation. Considering Fig. 5, input images are typically greater than (256,256). Using transfer learning and dilated convolutions, the network constructs feature maps. Smaller kernels gather information over larger areas, with the number of feature maps N as a tunable hyperparameter. The pyramid pooling module performs average pooling at scales such as global average pooling and (2 × 2) to segment varying object sizes. For instance, N = 512 maps and n = 4 pooling sizes yield N/n = 128 feature maps per level. Module B contains three layers of residual blocks, outputting 256 feature maps; Module C implements pooling to reduce pooled feature maps to 64, totaling 512 maps; and Module D, a convolution layer, outputs maps sized (256,256,3), flattened to 196608 for further processing.
3.2. U-Net
U-Net is a powerful convolutional neural network designed for biomedical image segmentation, characterized by its unique architecture that combines a contracting path for feature extraction and an expansive path for precise localization. The model operates on an input image and progressively reduces its dimensionality while capturing context through convolutional layers and max pooling. At the bottleneck, the network maintains critical information, and during the expansive phase, it upscales and concatenates feature maps from the contracting path. This architecture allows U-Net to produce high-quality segmentation results, effectively distinguishing fine details in complex images [30].
Let $X \in \mathbb{R}^{H \times W \times C}$, where H and W are height and width, respectively, and C is the number of channels. The contracting path consists of n convolutional layers defined as Eq. (2). Here, k indexes the convolutional layer, with k = 1, 2, ..., n.
$$F_k = \mathrm{ReLU}(Z_k * X + b_k) \tag{2}$$
where $Z_k$ are the convolutional filters, $b_k$ is the bias, and $*$ denotes convolution. Max pooling reduces dimensions by a factor of 2. At the bottleneck, the feature maps are processed as Eq. (3).
$$F_n = \mathrm{Conv}(F_{n-1}) \tag{3}$$
The expanding path upscales the feature maps using transposed convolution:
$$F' = \mathrm{Conv}_{\mathrm{transpose}}(F_n) \tag{4}$$
followed by concatenation with corresponding feature maps from the contracting path:
$$F_{\mathrm{concat}} = \mathrm{concat}(F', F_k) \tag{5}$$
The final segmentation output is computed as:
$$Y = \mathrm{Softmax}(\mathrm{Conv}(F_{\mathrm{concat}})) \tag{6}$$
The architecture ensures that both high-level features and spatial context are preserved, allowing for precise segmentation of complex images.
Crucial to U-Net’s effectiveness are its skip connections, which concatenate feature maps from the encoder directly to the decoder, enhancing feature propagation and enabling precise pixel classification. This architecture is particularly adept at capturing detailed spatial hierarchies necessary for high-accuracy segmentation, making it ideal for tasks requiring detailed localization such as medical imaging. Fig. 6 depicts the U-Net structure, consisting of two feature encoding and decoding steps.
3.3. FPN
FPN, shown in Fig. 7, is a robust architecture for semantic segmentation that enhances multi-scale feature learning. It builds on a backbone network, such as ResNet, by constructing a pyramid of feature maps at various scales. The key strength of FPN lies in its top-down pathway, where high-level semantic features from deeper layers are upsampled and combined with corresponding lower-level features through lateral connections. This fusion of high-resolution spatial features with semantic-rich layers allows FPN to perform accurate segmentation, particularly for detecting objects of various sizes across an image [31].
FPN is a widely used architecture for multi-scale feature extraction, particularly in object detection and semantic seg-
6 Int. J. Miner. Metall. Mater.
[Fig. 6. U-Net structure. Legend: Conv 3 × 3 with ReLU; max pool 2 × 2; up-conv 2 × 2; concatenate; Conv 1 × 1. Input size: 256 px × 192 px.]
[Fig. 7. FPN structure. Labels: lateral connection (1 × 1 conv), 2× upsampling, predict.]
tion on the feature map at level k. This ensures the dimensional consistency before merging feature maps. The final feature maps are processed to predict pixel-wise segmentation. Each $P'_k$ is upsampled to match the dimensions of $P_2$, concatenated, and passed through a final convolutional layer for segmentation.
Let’s assume we’re classifying each pixel into $N_{\mathrm{class}}$ categories. The final prediction map would have dimensions $\mathbb{R}^{H \times W \times N_{\mathrm{class}}}$. The softmax activation function is applied to produce probability distributions across all classes for each pixel. The model is typically trained using a loss function such as pixel-wise cross-entropy:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \lg\left(\hat{y}_{i,c}\right) \tag{12}$$
where $y_{i,c}$ is the true label for pixel i and class c, and $\hat{y}_{i,c}$ is the predicted probability.
By incorporating these multi-scale features and top-down pathway refinement, FPN achieves better segmentation performance, especially for detecting objects across different scales.
3.4. LinkNet
LinkNet is designed for efficient semantic segmentation using a streamlined encoder–decoder architecture. Its defining feature is the direct linkage between each encoder and decoder block through shortcut connections, which facilitate the retention and restoration of spatial and feature information lost during down-sampling. As shown in Fig. 8, it employs an encoder–decoder framework, uniquely integrated with link connections that facilitate the flow of feature maps from the encoder directly to the decoder.
Let $I \in \mathbb{R}^{H \times W \times C}$ be the input image. The encoder generates feature maps $E_k$, where k represents the level:
$$E_k = f_k \tag{13}$$
Each level consists of convolution, batch normalization, and activation operations. For the decoder, the feature maps from the encoder are gradually upsampled and combined via skip connections. The decoder maps are denoted as:
$$D_k = \mathrm{Upsample}(E_{k+1}) + E_k \tag{14}$$
where the feature map from the next level $E_{k+1}$ is upsampled and added to $E_k$, ensuring spatial detail preservation.
The final output $O \in \mathbb{R}^{H \times W \times C_{\mathrm{out}}}$ is achieved by refining the decoder’s output through convolutional layers to predict segmentation masks, where O represents the final output feature map or segmentation mask produced by the network and $C_{\mathrm{out}}$ refers to the number of segmentation classes. LinkNet is optimized for computational efficiency, allowing real-time segmentation with fewer parameters than traditional encoder–decoder networks like U-Net.
[Fig. 8. LinkNet structure. Labels: input layer [224, 224, 3]; encoder blocks with downsampling from [112, 112, 64] to [7, 7, 512]; decoder with upsampling from [7, 7, 512] through [14, 14, 256]; additive (+) link connections.]
3.5. Evaluation metrics
In image segmentation, evaluating the performance of models is crucial, and several metrics are commonly utilized for this purpose: accuracy, loss, specificity, sensitivity, precision, recall, F1 score, intersection over union (IoU), Dice coefficient, and area under the curve (AUC). These metrics assess the effectiveness of segmentation models in distinguishing between distinct image regions or objects, providing insights into the quality of segmentation.
Accuracy is the most intuitive performance measure, and it is simply a ratio of correctly predicted observations to the total observations. It is suitable for binary and multiclass classification problems. Accuracy is defined as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Specificity, also known as the true negative rate, measures the proportion of actual negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition):
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{16}$$
Sensitivity, also known as the true positive rate, measures the proportion of actual positives that are correctly identified (e.g., the percentage of sick people who are correctly diagnosed):
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{17}$$
Precision measures the model’s accuracy in identifying
ing a balanced measure of a model’s performance, particularly useful in scenarios with imbalanced datasets:
$$F1 = \text{Dice coefficient} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2TP}{2TP + FP + FN} \tag{20}$$
IoU quantifies the overlap between the predicted and ground truth regions, serving as a measure of similarity:
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{21}$$
Lastly, AUC measures the overall performance of binary classifiers across various thresholds by plotting the true positive rate against the false positive rate, providing a comprehensive assessment of classifier effectiveness.
These metrics are integral to selecting the most appropriate segmentation models for specific applications, as they highlight different aspects of model performance, from accuracy to the balance between precision and recall.
4. Results and discussion
4.1. Performance analyses
The comparative analysis of the four models (PSPNet, U-Net, FPN, and LinkNet) is shown in Figs. 9–12. In Fig. 9, the performance metrics for PSPNet show its robustness across all measured parameters. Both training and validation accuracy curves demonstrate strong performance, with training accuracy nearing 0.95 and validation accuracy slightly above 0.9. This suggests that PSPNet maintains high accuracy on both seen and unseen data. The training loss decreases to about 0.3, and validation loss converges around 0.4, indicating good generalization with minimal overfitting. The IoU scores are impressive, with training IoU close to 0.8 and validation IoU also around 0.75, indicating effective segmentation performance. The Dice coefficient curves show similar trends, with training reaching 0.85 and validation around 0.8, further supporting the model’s robust performance.
The performance metrics for U-Net are illustrated in four plots in Fig. 10. The accuracy for both training and validation sets improves steadily over the epochs, with the training accuracy approaching 0.98 and validation accuracy stabilizing around 0.85. This indicates a strong learning capability with some overfitting. The training loss decreases significantly, stabilizing around 0.2, while the validation loss levels off around 0.75. The discrepancy between training and validation loss suggests overfitting. The IoU score for training data approaches 0.9, whereas the validation IoU remains around 0.64, indicating that while the model performs well on training data, it generalizes less effectively on unseen data. The Dice coefficient follows a similar pattern, with training exceeding 0.9 and validation stabilizing around 0.75, further highlighting the overfitting issue.
In Fig. 11, the FPN model exhibits steady learning progress but shows some overfitting, similar to U-Net. The training accuracy approaches 0.98, while the validation accuracy stabilizes around 0.86. This significant gap suggests overfitting. Training loss decreases to below 0.2, while validation loss levels off around 0.6, reinforcing the overfitting observation. The IoU for training data reaches about 0.95, but validation IoU remains around 0.68, indicating less effective generalization. The Dice coefficient for training nears 0.98, while validation remains around 0.78, indicating similar overfitting issues.
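The accuracy, IoU, and Dice values discussed in this section all derive from pixel-level confusion counts, as in Eqs. (15)–(17), (20), and (21). The following is a minimal sketch of that computation in plain Python, assuming binary masks flattened to lists of 0/1 values (1 = quartz). The function names and the toy masks are illustrative assumptions, not part of the original implementation, and the precision formula used is the standard TP / (TP + FP).

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN for two flattened binary masks (values must be 0 or 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("masks must have the same number of pixels")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["TP"] += 1
        elif t == 0 and p == 0:
            counts["TN"] += 1
        elif t == 0 and p == 1:
            counts["FP"] += 1
        else:  # t == 1 and p == 0 for binary input
            counts["FN"] += 1
    return counts


def _ratio(num: float, den: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def segmentation_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    tp, tn, fp, fn = c["TP"], c["TN"], c["FP"], c["FN"]
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),  # Eq. (15)
        "specificity": _ratio(tn, tn + fp),              # Eq. (16)
        "sensitivity": _ratio(tp, tp + fn),              # Eq. (17)
        "precision": _ratio(tp, tp + fp),                # standard definition
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),        # Eq. (20)
        "iou": _ratio(tp, tp + fp + fn),                 # Eq. (21)
    }


# Toy 2 x 4 masks, flattened row by row (illustrative values only).
actual = [1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a single mask pair, Eqs. (20) and (21) imply Dice = 2·IoU / (1 + IoU), so the two scores always rank one prediction the same way. Averages taken over many images need not satisfy this identity exactly.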
Fig. 9. The performance metrics for PSPNet: (a) accuracy; (b) loss; (c) IoU; (d) Dice coefficient.
Fig. 10. The performance metrics for U-Net: (a) accuracy; (b) loss; (c) IoU; (d) Dice coefficient.
Fig. 11. The performance metrics for FPN: (a) accuracy; (b) loss; (c) IoU; (d) Dice coefficient.
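The overfitting argument in this section rests on the gap between training and validation scores. As a rough illustration, the sketch below tabulates that gap from the approximate end-of-training values quoted in the text for Figs. 9–12 (the LinkNet values are those quoted for Fig. 12). The numbers are approximate readings as stated in the prose, not exact results, and the helper name `generalization_gaps` is a hypothetical choice.

```python
# Approximate (training, validation) values as quoted in the text for Figs. 9-12.
# They are rough readings of the curves, so small differences are not meaningful.
reported = {
    "PSPNet":  {"accuracy": (0.95, 0.90), "iou": (0.80, 0.75), "dice": (0.85, 0.80)},
    "U-Net":   {"accuracy": (0.98, 0.85), "iou": (0.90, 0.64), "dice": (0.90, 0.75)},
    "FPN":     {"accuracy": (0.98, 0.86), "iou": (0.95, 0.68), "dice": (0.98, 0.78)},
    "LinkNet": {"accuracy": (0.95, 0.85), "iou": (0.90, 0.65), "dice": (0.95, 0.78)},
}


def generalization_gaps(values):
    """Return training-minus-validation gaps per model and metric, rounded to 2 decimals."""
    return {
        model: {metric: round(train - val, 2) for metric, (train, val) in scores.items()}
        for model, scores in values.items()
    }


gaps = generalization_gaps(reported)

# Order models from smallest to largest IoU gap (a smaller gap suggests less overfitting).
ranking = sorted(gaps, key=lambda model: gaps[model]["iou"])

for model in ranking:
    print(model, gaps[model])
```

On these quoted values, PSPNet has the smallest gap on every metric, while the other three models cluster closely together, so their relative order should be read with caution.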
In Fig. 12, LinkNet also shows a consistent performance with some overfitting, though it performs better in some aspects compared to FPN and U-Net. Training accuracy approaches 0.95, while validation accuracy stabilizes around 0.85, showing better generalization compared to FPN. Training loss decreases to about 0.3, while validation loss levels off around 0.5, indicating moderate overfitting. Training IoU reaches about 0.90, with validation IoU around 0.65, showing better segmentation performance compared to U-Net and FPN. The Dice coefficient for training approaches 0.95, while validation stabilizes around 0.78, suggesting reasonable generalization capability.
The comparative analysis of the four models (PSPNet, U-Net, FPN, and LinkNet) reveals that PSPNet consistently delivers the most robust performance across all metrics, indicating effective generalization and high accuracy. U-Net
[Fig. 12. The performance metrics for LinkNet: (a) accuracy; (b) loss; (c) IoU; (d) Dice coefficient.]
In conclusion, our comparative analysis underscores PSPNet’s overall dominance across multiple metrics, establishing it as the preferable model for high-stakes segmentation tasks. However, the choice of model should still be tailored to specific application needs, considering the trade-offs highlighted between sensitivity and precision among the models evaluated.
4.3. Experimental results
Figs. 14–17 comprehensively compare the performance of four different segmentation models (PSPNet, U-Net, FPN, and LinkNet) in detecting quartz minerals. IoU scores of these models were evaluated along with their prediction masks, real masks and segmentation images. The overall performance of each model in different scenarios and important observations are presented below.
In Fig. 14, PSPNet achieved the highest IoU score of 0.736. The predicted mask closely aligns with the actual mask, demonstrating high accuracy in segmenting the quartz mineral region. This result highlights PSPNet’s robustness in accurately identifying mineral boundaries. U-Net attained an IoU score of 0.669. The predicted mask, while reasonably accurate, shows slight discrepancies compared to PSPNet. This suggests that U-Net is effective but less precise in segmenting quartz minerals under certain conditions. FPN recorded an IoU score of 0.674, comparable to U-Net. The predicted mask effectively delineates the quartz mineral, indicating FPN’s capability to perform accurate segmentation. LinkNet displayed an IoU score of 0.672, slightly lower than FPN but still demonstrating effective segmentation. The performance is robust, although marginally less accurate than PSPNet.
Fig. 15 shows that PSPNet delivered an exceptional IoU score of 0.983, with the predicted mask nearly perfectly matching the actual mask. This underscores PSPNet’s superior accuracy in quartz mineral segmentation. U-Net matched PSPNet with an IoU score of 0.983, indicating equally high performance in this specific example. This demonstrates U-Net’s potential to achieve precise segmentation in optimal conditions. FPN achieved a slightly lower IoU score of 0.968, with minor inaccuracies in the predicted mask. Despite this, FPN shows strong performance in segmenting quartz minerals. LinkNet scored 0.974, demonstrating high accuracy in segmentation, comparable to PSPNet and U-Net.
[Fig. 14. Image, actual mask, predicted mask, and segmented image for each model. IoU: PSPNet 0.736; U-Net 0.669; FPN 0.674; LinkNet 0.672.]
[Fig. 15. Image, actual mask, predicted mask, and segmented image for each model. IoU: PSPNet 0.983; U-Net 0.983; FPN 0.968; LinkNet 0.974.]
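The per-image IoU scores quoted in this section for Figs. 14–17 can be summarized per model to make the comparison easier to follow. The sketch below does so for the four example images only. The mean, minimum, and range are simple descriptive summaries chosen here for illustration, not statistics reported by the study, and four images are far too few to support general conclusions.

```python
from statistics import mean

# IoU scores quoted in the text for the example images of Figs. 14, 15, 16, and 17.
iou_by_model = {
    "PSPNet":  [0.736, 0.983, 0.847, 0.907],
    "U-Net":   [0.669, 0.983, 0.435, 0.836],
    "FPN":     [0.674, 0.968, 0.707, 0.855],
    "LinkNet": [0.672, 0.974, 0.690, 0.780],
}

# Per-model summary: average score, worst case, and spread across the four examples.
summary = {
    model: {
        "mean": round(mean(scores), 3),
        "min": min(scores),
        "range": round(max(scores) - min(scores), 3),
    }
    for model, scores in iou_by_model.items()
}

best_on_average = max(summary, key=lambda model: summary[model]["mean"])
most_consistent = min(summary, key=lambda model: summary[model]["range"])

for model, stats in summary.items():
    print(model, stats)
print("highest mean IoU:", best_on_average)
print("smallest IoU range:", most_consistent)
```

On these four examples PSPNet has both the highest mean and the smallest spread, which is consistent with the qualitative assessment given in the text.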
This indicates LinkNet’s effectiveness in identifying mineral regions accurately.
In Fig. 16, PSPNet achieved an IoU score of 0.847. Despite the generally high performance, the predicted mask shows some inaccuracies, indicating challenges in segmenting under less ideal conditions. U-Net exhibited a significantly lower IoU score of 0.435, struggling with accurate segmentation in this instance. The results suggest that U-Net may face difficulties in complex scenarios. FPN recorded an IoU score of 0.707, performing better than U-Net. The predicted mask identifies the quartz region, although with notable inaccuracies. LinkNet displayed an IoU score of 0.690, indicating moderate performance. The predicted mask is less accurate compared to PSPNet, reflecting the challenges in low-performance scenarios.
In Fig. 17, PSPNet demonstrated a high IoU score of 0.907, accurately identifying the quartz mineral despite the challenging conditions. This reaffirms PSPNet’s robustness in diverse scenarios. U-Net attained an IoU score of 0.836, showing good performance but with some inaccuracies in the predicted mask. This indicates U-Net’s potential, though it may require further refinement. FPN achieved an IoU score of 0.855, indicating effective segmentation with minor inaccuracies. This result underscores FPN’s reliability in varying conditions. LinkNet recorded an IoU score of 0.780, demonstrating reasonable accuracy but lower performance compared to PSPNet and FPN. This suggests that while LinkNet is effective, there is room for improvement.
The comparative analysis of segmentation models for quartz mineral identification reveals PSPNet as the most robust and reliable model, consistently delivering high IoU scores and accurate segmentation masks. U-Net, FPN, and LinkNet also show potential, particularly in high-performance examples, but face challenges in more complex scenarios. Future work could explore hybrid approaches or model improvements to enhance segmentation accuracy across diverse conditions.
Fig. 18 presents examples where all four models (PSPNet, U-Net, FPN, and LinkNet) exhibited low performance, highlighting their inadequacies in challenging segmentation tasks. Although generally demonstrating high performance, PSPNet achieved a low IoU score in this instance, failing to accurately segment the quartz mineral. This indicates that PSPNet can struggle under certain conditions. U-Net encountered significant difficulties with complex and low-con-
G. Külekçi et al., Enhancing mineral processing with deep learning: automated quartz identification using thin ... 13
Segmented image
Image Actual mask Predicted mask
IoU: 0.847
PSPNet
Segmented image
Predicted mask
IoU: 0.435
U-Net
Segmented image
Predicted mask
IoU: 0.707
FPN
Segmented image
Predicted mask
IoU: 0.69
LinkNet
trast images, resulting in the second lowest IoU scores ob- 5. Conclusions
served. This highlights U-Net’s limitations in challenging
segmentation scenarios. FPN performed slightly better than This study presents a novel approach for the automatic
the other models but still failed to accurately delineate miner- identification of quartz minerals using deep learning tech-
al boundaries, as evidenced by its relatively low IoU score. niques combined with hyperspectral imaging. The results
This suggests that FPN has room for improvement in hand- demonstrate the significant advancements in the efficiency
ling difficult cases. LinkNet recorded the lowest IoU score, and accuracy of mineral recognition brought about by this
making it the least effective model for mineral segmentation method. By integrating four advanced semantic segmenta-
in this example. The predicted mask was highly inaccurate, tion models—PSPNet, U-Net, FPN, and LinkNet—this re-
demonstrating LinkNet’s struggles with this task. search offers a comprehensive analysis and comparison of
These results illustrate that the performance of segmenta- their performance in accurately recognizing quartz minerals.
tion models can vary significantly under different conditions The innovative aspect of this work lies in the application
and that there is a need for further improvement. The model’s of deep learning and hyperspectral imaging to automate a tra-
lower performance under these scenarios can be attributed to ditionally manual and expertise-driven process. Unlike con-
several factors. In low-contrast and complex images, PSPNet ventional optical methods, our approach leverages modern
may struggle to differentiate subtle mineral boundaries, espe- AI techniques to expedite and enhance mineral identification,
cially when the feature extraction process encounters noise or setting a new standard in the field of mineralogical analysis.
lacks distinguishing characteristics. This can result in mis- This study utilized a dataset comprising 120 thin sections
classification or incomplete segmentation. Future research prepared from 20 rock samples, producing 2470 images,
will investigate more advanced techniques to address these which were divided into training and testing sets. Expert-
issues, such as incorporating attention mechanisms, improv- reviewed images were masked and organized for model
ing multi-scale feature extraction, and using more diverse training, providing a robust foundation for deep learning ap-
training datasets to increase robustness. plications.
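The division of the 2470 images into training and testing sets can be sketched as follows. The split ratio is not stated in this section, so the 80/20 ratio, the fixed seed, the helper name `split_pairs`, and the file-name pattern are all assumptions chosen only for illustration. Each image is kept together with its mask so that the pair lands in the same subset.

```python
import random


def split_pairs(pairs, test_fraction=0.2, seed=42):
    """Shuffle (image, mask) pairs and split them into train and test lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(pairs)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names; the real dataset layout is not described here.
pairs = [(f"images/{i:04d}.png", f"masks/{i:04d}.png") for i in range(2470)]
train_pairs, test_pairs = split_pairs(pairs)
print(len(train_pairs), len(test_pairs))
```

Splitting by image is the simplest choice. If images cut from the same thin section are very similar, splitting by thin section or by rock sample instead would give a stricter test of generalization.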
[Fig. 17 panels: image, actual mask, predicted mask, and segmented image for each model. IoU: PSPNet 0.907, U-Net 0.836, FPN 0.855, LinkNet 0.780.]
The experimental results highlight the superior performance of the PSPNet model, which consistently outperformed the other models across multiple metrics, including accuracy, specificity, and IoU scores. PSPNet demonstrated exceptional accuracy and reliability in segmenting quartz from complex geological samples, achieving the highest IoU scores in various performance scenarios. The U-Net, FPN, and LinkNet models also showed potential, particularly in high-performance examples, but faced challenges in more complex scenarios, indicating the need for further refinement.

Despite the overall success of these models, certain conditions revealed limitations in their segmentation accuracy, especially in challenging and low-contrast images. This suggests that while the current models provide a substantial improvement over traditional methods, there is still room for enhancement to ensure more reliable and accurate outcomes under diverse conditions.

In conclusion, this study significantly contributes to the field of automated mineralogical analysis by demonstrating the practical utility and high performance of deep learning models. Future work should focus on addressing the observed limitations, exploring hybrid approaches, and refining model architectures to improve segmentation accuracy, particularly in challenging scenarios. Such work could focus on techniques such as data augmentation and regularization to improve the performance of these models. Data augmentation, including random transformations like cropping, flipping, and brightness adjustments, would introduce greater variability to the dataset, which helps reduce overfitting. Additionally, regularization techniques such as L2 regularization or dropout can further enhance model generalization. Another avenue to explore would be the implementation of early stopping to avoid overfitting during the training phase. Moreover, expanding the dataset with diverse and complex examples would improve robustness and lead to more generalized model performance, which can be critical in ensuring accurate segmentation across different applications.

The continued development and application of these advanced techniques will further enhance the efficiency and reliability of mineral identification, providing valuable tools for geologists and advancing the broader field of geology.

Conflict of Interest

The authors have no competing interests to declare.
[Fig. 18 panels: image, actual mask, predicted mask, and segmented image for each model. IoU: PSPNet 0.121, U-Net 0.098, FPN 0.163, LinkNet 0.053.]
Fig. 18. Automatic identification examples of quartz minerals demonstrating low performance.
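The IoU values printed on the figure panels compare each predicted mask with the actual mask. A minimal sketch of that calculation, together with the accuracy and specificity metrics mentioned in the Conclusions, is given below. It assumes binary masks stored as NumPy arrays in which 1 marks quartz. The function name `mask_metrics` and the toy arrays are illustrative and are not taken from the authors' code.

```python
import numpy as np


def mask_metrics(actual, predicted):
    """Return IoU, accuracy, and specificity for two binary masks."""
    actual = np.asarray(actual).astype(bool)
    predicted = np.asarray(predicted).astype(bool)
    if actual.shape != predicted.shape:
        raise ValueError("masks must have the same shape")

    tp = np.count_nonzero(actual & predicted)    # quartz predicted as quartz
    fp = np.count_nonzero(~actual & predicted)   # background predicted as quartz
    fn = np.count_nonzero(actual & ~predicted)   # quartz that was missed
    tn = np.count_nonzero(~actual & ~predicted)  # background kept as background

    union = tp + fp + fn
    # Assumption: two empty masks are treated as a perfect match.
    iou = tp / union if union else 1.0
    accuracy = (tp + tn) / actual.size
    specificity = tn / (tn + fp) if (tn + fp) else 1.0
    return {"iou": iou, "accuracy": accuracy, "specificity": specificity}


# Toy 4x4 example (hypothetical data, not from the paper).
actual = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])
predicted = np.array([[1, 1, 1, 0],
                      [1, 0, 0, 0],
                      [0, 0, 0, 0],
                      [0, 0, 0, 0]])
scores = mask_metrics(actual, predicted)
print(scores)
```

Background pixels count toward accuracy and specificity but not toward IoU, so a mask can score high accuracy while its IoU stays low. This makes IoU the more demanding of the three metrics when quartz covers only a small part of the image.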
way forward for mining sector, Artif. Intell. Rev., 53(2020), No. [24] W.L. Chen, C.N. Ji, D. Xu, and N. Srinil, Wake patterns of
8, p. 6025. freely vibrating side-by-side circular cylinders in laminar flows,
[15] T. Long, Z.B. Zhou, G. Hancke, Y. Bai, and Q. Gao, A review J. Fluids Struct., 89(2019), p. 82.
of artificial intelligence technologies in mineral identification: [25] T.E. Oliphant, Guide to NumPy, [2024–08–20], https://csc.uc-
Classification and visualization, J. Sens. Actuator Network, davis.edu/~chaos/courses/nlp/Software/NumPyBook.pdf
11(2022), No. 3, art. No. 50. [26] J.D. Hunter, Matplotlib: A 2D graphics environment, Comput.
[16] T. Sun, H. Li, K.X. Wu, F. Chen, Z. Zhu, and Z.J. Hu, Data- Sci. Eng., 9(2007), No. 3, p. 90.
driven predictive modelling of mineral prospectivity using ma- [27] Keras-Resources, GitHub [2024–08–20], https://github.com/
chine learning and deep learning methods: A case study from fchollet/keras-resources.
southern Jiangxi province, China, Minerals, 10(2020), No. 2, [28] Segmentation Models, GitHub [2024–08–20], https://github.
art. No. 102. com/qubvel/segmentation_models.
[17] H.J. Zhao, K.W. Deng, N. Li, Z.W. Wang, and W. Wei, Hier- [29] H.S. Zhao, J.P. Shi, X.J. Qi, X.G. Wang, and J.Y. Jia, Pyramid
archical spatial-spectral feature extraction with long short term scene parsing network, [in] 2017 IEEE Conference on Com-
memory (LSTM) for mineral identification using hyperspectral puter Vision and Pattern Recognition (CVPR), Honolulu, 2017,
imagery, Sensors, 20(2020), No. 23, art. No. 6854. p. 6230.
[18] N. Agrawal and H. Govil, A deep residual convolutional neural [30] O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional
network for mineral classification, Adv. Space Res., 71(2023), networks for biomedical image segmentation, [in] Medical Im-
No. 8, p. 3186. age Computing and Computer-assisted Intervention—MICCAI
[19] Y.S. Chen, Z.H. Lin, X. Zhao, G. Wang, and Y.F. Gu, Deep 2015: 18th International Conference, Munich, 2015, p. 234.
learning-based classification of hyperspectral data, IEEE J. Sel. [31] T.Y. Lin, P. Dollár, R. Girshick, K.M. He, B. Hariharan, and S.
Top. Appl. Earth Obs. Remote. Sens., 7(2014), No. 6, p. 2094. Belongie, Feature pyramid networks for object detection, [in]
[20] D.Y. Li, Z.D. Liu, Q.Q. Zhu, C.X. Zhang, P. Xiao, and J.Y. Ma, 2017 IEEE Conference on Computer Vision and Pattern Recog-
Quantitative identification of mesoscopic failure mechanism in nition (CVPR), Honolulu, 2017, p. 936.
granite by deep learning method based on SEM images, Rock [32] A. Chaurasia and E. Culurciello, LinkNet: Exploiting encoder
Mech. Rock Eng., 56(2023), No. 7, p. 4833. representations for efficient semantic segmentation, [in] 2017
[21] Z.D. Liu, D.Y. Li, Q.Q. Zhu, C.X. Zhang, J.Y. Ma, and J.J. IEEE Visual Communications and Image Processing (VCIP),
Zhao, Intelligent method to experimentally identify the fracture St. Petersburg, 2017, p. 1.
mechanism of red sandstone, Int. J. Miner. Metall. Mater., [33] J.X. Hu, L. Li, Y.J. Lin, F.G. Wu, and J.S. Zhao, A comparison
30(2023), No. 11, p. 2134. and strategy of semantic segmentation on remote sensing im-
[22] E.J.Y. Koh, E. Amini, G.J. McLachlan, and N. Beaton, Util- ages, [in] 15th International Conference on Natural Computa-
ising convolutional neural networks to perform fast automated tion, Fuzzy Systems and Knowledge Discovery, Kunming, 2019,
modal mineralogy analysis for thin-section optical microscopy, p. 21.
Miner. Eng., 173(2021), art. No. 107230. [34] K.M. He, X.Y. Zhang, S.Q. Ren, and J. Sun, Deep residual
[23] H. Liu, Y.L. Ren, X. Li, et al., Rock thin-section analysis and learning for image recognition, [in] 2016 IEEE Conference on
identification based on artificial intelligent technique, Pet. Sci., Computer Vision and Pattern Recognition (CVPR), Las Vegas,
19(2022), No. 4, p. 1605. 2016, p. 770.