
Leveraging Machine Learning For Lithology Discrimination

This document describes the use of machine learning techniques to build a model that classifies rock lithologies (rock types). Ten petrophysical attributes from a dataset of 14,405 observations were used to train models that distinguish five lithologies: shale, siltstone, sandstone, limestone, and dolomite. Random forest, logistic regression, KNN, SVM, and discriminant analysis models were evaluated on a validation set. The random forest model achieved a perfect accuracy of 1.000 with no misclassified validation observations, indicating that it is well suited to lithology discrimination with the selected attributes.

In the world of rocks and earth sciences, being precise is essential. This study takes you on a journey to discover the incredible "Lithology Discriminator". It is like a super-smart tool we have built using machine learning.

Unveiling the Lithology Discriminator

Our data is like a treasure chest full of numbers and facts. We used this data to guess what kind of rocks are hiding underground. Our aim? To be really, really good at it.

[Figure: Lift curves for Siltstone, Limestone, and Sandstone (x-axis: Percentage of Sample, 0% to 100%; y-axis: Lift)]
Leveraging Machine Learning for Lithology Discrimination

Abstract:
This report presents a study that applies several machine learning techniques to build a discriminative model for lithology classification. The study used a dataset of key petrophysical attributes (PE, DT, GR, NPHI, RHOB, RT10_l10, RT20_l10, RT30_l10, RT60_l10, and RT90_l10) to distinguish between five lithologies: Shale, Siltstone, Sandstone, Limestone, and Dolomite.

Introduction
The classification of lithologies in the field of geology and petrophysics is a fundamental task that has traditionally relied on expert interpretation. However, the application of machine learning and artificial intelligence techniques has shown promising results in automating and enhancing this process. In this study, we explore the application of various machine-learning algorithms to create a discriminative model for lithology classification.


Methods:
We have chosen to use the following machine learning methods in our study:
1. Logistic Regression
2. Random Forests
3. K Nearest Neighbors
4. Support Vector Machine (SVM)
5. Discriminant Analysis
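Of the methods listed, K Nearest Neighbors is the simplest to illustrate. The following is a minimal, hypothetical sketch in plain Python, not the implementation used in this study: the function name `knn_predict`, the Euclidean distance, the value of k, and the toy GR/RHOB rows are all illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=5):
    """Hypothetical helper: label `query` by majority vote of its k nearest
    training rows, using Euclidean distance."""
    # Pair every training row's distance to the query with its label.
    distances = [(math.dist(row, query), label) for row, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest = [label for _, label in distances[:k]]
    # most_common(1) returns the most frequent label; on a tie, the label
    # seen first among the distance-sorted neighbours wins.
    return Counter(nearest).most_common(1)[0][0]

# Invented toy rows (GR, RHOB), not taken from the study's dataset.
toy_X = [(20.0, 2.65), (25.0, 2.64), (120.0, 2.45), (130.0, 2.50), (110.0, 2.48)]
toy_y = ["Sandstone", "Sandstone", "Shale", "Shale", "Shale"]

print(knn_predict(toy_X, toy_y, (115.0, 2.47), k=3))  # Shale
print(knn_predict(toy_X, toy_y, (22.0, 2.66), k=3))   # Sandstone
```

Because GR spans a much wider numeric range than RHOB, unscaled GR dominates the distance in this toy example; distance-based methods usually work on standardized attributes.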

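Dataset and Split:
Our dataset consists of petrophysical well-log attributes: PE (photoelectric factor), DT (sonic transit time), GR (gamma ray), NPHI (neutron porosity), RHOB (bulk density), and five resistivity curves (RT10_l10, RT20_l10, RT30_l10, RT60_l10, RT90_l10). The data was randomly split into two samples: 80% of the observations were used for model training, and the remaining 20% were reserved for model validation. The validation sample contains 3,601 observations, the total of correct and incorrect classifications for every model in the results table.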
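A random 80/20 split like the one described can be sketched in a few lines of standard-library Python. The report does not say which tool or random seed was used, so the function name `train_validation_split`, the seed, and the toy row count below are assumptions for illustration only.

```python
import random

def train_validation_split(rows, train_fraction=0.8, seed=42):
    """Hypothetical helper: shuffle a copy of `rows` and split it into
    a training list and a validation list."""
    rng = random.Random(seed)      # fixed seed makes the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Toy example with 1,000 row indices standing in for observations.
train, valid = train_validation_split(range(1000))
print(len(train), len(valid))  # 800 200
```

With only 44 Dolomite observations in the summary statistics, a purely random split can leave very few Dolomite rows in the validation sample; a stratified split (sampling 80% within each lithology) keeps class proportions stable. The source does not state whether stratification was used.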
Results:
Our machine learning techniques were applied to the lithological dataset. On the validation sample, accuracy ranged from 0.372 (Discriminant Analysis) to 1.000 (Random forests), with logistic regression (0.995) and K Nearest Neighbors (0.977) also performing well. Clear patterns emerged, indicating the potential benefits of artificial intelligence and machine learning in lithology discrimination.

Performance metrics     Logistic regression  Random forests  K Nearest Neighbors    SVM  Discriminant Analysis
Accuracy                              0.995           1.000                0.977  0.690                  0.372
Precision                             0.954           1.000                0.784  0.471                  0.198
Recall                                0.956           1.000                0.700  0.440                  0.197
Correct classification                 3583            3601                 3519   2486                   1341
Misclassification                        18               0                   82   1115                   2260
F-score                               0.955           1.000                0.739  0.455                  0.198
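The best model according to the misclassification count on the validation sample is Random forests, with zero misclassified observations (highlighted in green in the original table). Every model was evaluated on the same 3,601 validation observations (correct plus misclassified).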
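The accuracy row can be reproduced from the counts: for logistic regression, 3583 / (3583 + 18) ≈ 0.995. The precision and recall values appear to be macro-averaged across the five lithologies, and the reported F-scores match the harmonic mean of those two averages (for logistic regression, 2 × 0.954 × 0.956 / (0.954 + 0.956) ≈ 0.955). The sketch below assumes those definitions; the function names and the toy confusion matrix are hypothetical.

```python
def accuracy_from_counts(correct, misclassified):
    """Share of validation observations that were classified correctly."""
    return correct / (correct + misclassified)

def macro_metrics(confusion):
    """Metrics from a square confusion matrix where confusion[i][j] counts
    observations of true class i predicted as class j."""
    n_classes = len(confusion)
    precisions, recalls = [], []
    for c in range(n_classes):
        true_pos = confusion[c][c]
        predicted_c = sum(confusion[r][c] for r in range(n_classes))  # column total
        actual_c = sum(confusion[c])                                   # row total
        precisions.append(true_pos / predicted_c if predicted_c else 0.0)
        recalls.append(true_pos / actual_c if actual_c else 0.0)
    # Macro averaging: every class counts equally, however rare it is.
    precision = sum(precisions) / n_classes
    recall = sum(recalls) / n_classes
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    correct = sum(confusion[c][c] for c in range(n_classes))
    total = sum(sum(row) for row in confusion)
    return {"accuracy": correct / total, "precision": precision,
            "recall": recall, "f_score": f_score}

# Logistic regression column of the results table: 3583 correct, 18 wrong.
print(round(accuracy_from_counts(3583, 18), 3))  # 0.995

# Invented 2-class confusion matrix, only to exercise macro_metrics.
print(macro_metrics([[5, 1], [0, 4]]))
```

Because macro averaging weights rare classes such as Dolomite and Limestone equally with Shale, a few errors on them can pull precision and recall well below accuracy, which may explain why K Nearest Neighbors reaches 0.977 accuracy but only 0.784 precision.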

Discussion:
The results indicate that the Random Forests model outperforms the other techniques, achieving perfect accuracy (1.000) and no misclassifications on the 3,601-observation validation sample. Logistic regression follows closely (accuracy 0.995, 18 misclassifications). This suggests that Random Forests are well suited to lithology discrimination based on the selected dataset.


Conclusion:
This study highlights the potential of machine learning techniques in lithology discrimination. The application of Random Forests, in particular, demonstrates the capability to automate the classification of lithologies with a high degree of accuracy. As the field of petrophysics and geology continues to evolve, these techniques offer a valuable tool for efficient lithological analysis.
Some obtained results
Dataset for training
Dataset for testing
Modeling Results
Variable  Observations  Obs. with missing data  Obs. without missing data  Minimum  Maximum    Mean  Std. deviation
PE               14405                       0                      14405    1.950   14.315   4.040           1.453
DT               14405                       0                      14405   47.680  107.337  72.158           9.400
GR               14405                       0                      14405    5.854  205.206  78.553          39.293
NPHI             14405                       0                      14405   -1.152   38.857  13.109           7.486
RHOB             14405                       0                      14405    2.026    2.923   2.524           0.117
RT10_l10         14405                       0                      14405   -0.650    2.593   0.701           0.554
RT20_l10         14405                       0                      14405   -0.636    2.597   0.660           0.558
RT30_l10         14405                       0                      14405   -0.685    2.591   0.642           0.569
RT60_l10         14405                       0                      14405   -0.712    2.601   0.629           0.584
RT90_l10         14405                       0                      14405   -0.724    2.597   0.627           0.592

Summary statistics (Quantitative data)
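The attributes in this table sit on very different numeric scales (GR ranges from 5.854 to 205.206, while RHOB ranges from 2.026 to 2.923). Distance-based methods such as K Nearest Neighbors and SVM are sensitive to such differences, so attributes are commonly standardized first. The report does not say whether any scaling was applied; the sketch below is a hypothetical z-score helper (the names `fit_scaler` and `apply_scaler` and the toy rows are assumptions), with statistics estimated on the training sample only and then reused for the validation sample.

```python
import statistics

def fit_scaler(rows):
    """Per-column mean and sample standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stdevs = [statistics.stdev(col) or 1.0 for col in columns]
    return means, stdevs

def apply_scaler(rows, means, stdevs):
    """Convert each value to a z-score: (value - mean) / stdev."""
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]

# Invented (GR, RHOB) rows for illustration only.
train_rows = [(20.0, 2.65), (120.0, 2.45), (80.0, 2.55), (60.0, 2.60)]
valid_rows = [(100.0, 2.50)]

means, stdevs = fit_scaler(train_rows)
print(apply_scaler(train_rows, means, stdevs))
print(apply_scaler(valid_rows, means, stdevs))  # scaled with training statistics
```

Reusing the training statistics for the validation rows keeps any information about the validation sample out of the preprocessing step.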

Variable   Categories  Counts  Frequencies       %
Lithology  Dolomite        44           44   0.305
           Limestone      193          193   1.340
           Sandstone     5110         5110  35.474
           Shale         6644         6644  46.123
           Siltstone     2414         2414  16.758

Summary statistics (Qualitative data)
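The percentages in this table follow directly from the counts, and they also give a simple reference point: a classifier that always predicted the most frequent class, Shale, would be right about 46.1% of the time on data with these proportions. Assuming the validation sample has roughly the same class mix (which a random split only approximately guarantees), the Discriminant Analysis accuracy of 0.372 in the results table falls below this majority-class baseline. The variable names below are illustrative.

```python
# Lithology counts from the qualitative summary table.
counts = {"Dolomite": 44, "Limestone": 193, "Sandstone": 5110,
          "Shale": 6644, "Siltstone": 2414}

total = sum(counts.values())
percentages = {name: round(100 * n / total, 3) for name, n in counts.items()}

# Accuracy of always predicting the most frequent lithology.
majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / total

print(total)                                        # 14405
print(percentages)
print(majority_class, round(baseline_accuracy, 3))  # Shale 0.461
```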


The data we selected was randomly split into two samples: 80% of the observations were used to train the models and 20% for validation.
We chose the following methods:
- Logistic regression
- Random forests
- K Nearest Neighbors
- SVM
- Discriminant Analysis
Below is a summary table of several indices, computed on the validation sample, for each of these methods, followed by detailed results for each method.

Performance metrics     Logistic regression  Random forests  K Nearest Neighbors    SVM  Discriminant Analysis
Accuracy                              0.995           1.000                0.977  0.690                  0.372
Precision                             0.954           1.000                0.784  0.471                  0.198
Recall                                0.956           1.000                0.700  0.440                  0.197
Correct classification                 3583            3601                 3519   2486                   1341
Misclassification                        18               0                   82   1115                   2260
F-score                               0.955           1.000                0.739  0.455                  0.198

Summary table: The best model according to the misclassification count on the validation sample is Random forests (highlighted in green in the original table).
Logistic regression
Classification for the training sample (Variable Lithology):
Classification for the validation sample (Variable Lithology):
Random forests
Misclassification rate:

Confusion matrix (OOB sample):

Confusion matrix (Validation sample):


Importance chart:
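The out-of-bag (OOB) confusion matrix refers to a standard feature of random forests: each tree is trained on a bootstrap sample drawn with replacement from the training data, and the rows a tree never drew are its out-of-bag rows, which can evaluate that tree without touching the validation sample. The following minimal sketch, with the hypothetical helper name `bootstrap_split`, shows how one tree's in-bag and OOB rows arise; it is not a full random forest and not the software used in this study.

```python
import random

def bootstrap_split(n_rows, rng):
    """Draw one bootstrap sample of row indices (with replacement) and
    return it together with the indices that were never drawn (OOB)."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag

rng = random.Random(0)
n_rows = 10_000
in_bag, out_of_bag = bootstrap_split(n_rows, rng)
# For large n, roughly (1 - 1/n)**n ≈ 1/e ≈ 36.8% of rows end up out-of-bag.
print(len(in_bag), len(out_of_bag) / n_rows)
```

In a forest, each observation's OOB prediction is the majority vote of only those trees for which it was out-of-bag; tallying those predictions against the true labels gives the OOB confusion matrix.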
Discriminant Analysis
Eigenvalues:
ROC Curve & Cumulative gains curve
Lift Curve:
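The lift curves in this report (for example, Lift Curve (Siltstone)) plot Lift against Percentage of Sample. A common one-vs-rest construction, sketched below under that assumption, ranks observations by the model's predicted probability for one lithology and, at each depth, divides the share of that lithology in the top-ranked rows by its share in the whole sample, so lift equals 1 at 100% of the sample. The function name `lift_curve`, the step count, and the toy scores are hypothetical; the software used in the study may bin the curve differently.

```python
def lift_curve(scores, is_target, steps=10):
    """Return (fraction of sample, lift) pairs for one lithology.

    scores    -- predicted probability of the target lithology per row
    is_target -- True where the row's true lithology is the target
    """
    ranked = sorted(zip(scores, is_target), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    n_target = sum(1 for _, hit in ranked if hit)
    if n_target == 0:
        raise ValueError("the target class does not occur in the sample")
    base_rate = n_target / n
    points = []
    for step in range(1, steps + 1):
        depth = max(1, round(n * step / steps))   # rows in the top slice
        hits = sum(1 for _, hit in ranked[:depth] if hit)
        points.append((depth / n, (hits / depth) / base_rate))
    return points

# Invented scores for 10 rows, 3 of which belong to the target lithology.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_target = [True, True, False, True, False, False, False, False, False, False]
for fraction, lift in lift_curve(scores, is_target, steps=5):
    print(f"{fraction:.0%}  lift {lift:.2f}")
```

The cumulative gains curve uses the same ranking but plots hits / n_target at each depth instead of the ratio to the base rate.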
Thank you for taking the time to read our report. If you have any questions or require further information, please do not hesitate to contact us by email. Additionally, if you would like to receive the Excel sheet containing the results we obtained, kindly provide us with your email address, and we will promptly send it to you.