Research Article

Journal of Petroleum & Chemical Engineering

https://urfpublishers.com/journal/petrochemical-engineering

Vol: 1 & Iss: 1

Comparative Analysis of Machine Learning Algorithms in Predicting Rate of Penetration during Drilling
Olaosebikan Abidoye Olafadehan*, Ikenna David Ahaotu
Department of Chemical and Petroleum Engineering, University of Lagos, Akoka-Yaba, Lagos 101017, Nigeria

Citation: Olafadehan OA, Ahaotu ID, Comparative Analysis of Machine Learning Algorithms in Predicting Rate of Penetration
during Drilling. J Petro Chem Eng 2023;1(1): 32-47.

Received: 14 October, 2023; Accepted: 31 October, 2023; Published: 07 November, 2023

*Corresponding author: Olafadehan OA, Department of Chemical and Petroleum Engineering, University of Lagos, Akoka-
Yaba, Lagos 101017 Nigeria, Phone: +234802-912-9559, Email: [email protected]
Copyright: © 2023 Olafadehan OA., et al., This is an open-access article published in J Petro Chem Eng (JPCE) and distributed
under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction
in any medium, provided the original author and source are credited.

ABSTRACT
Drilling for potential oil and gas reserves is one of the foremost practices in the petroleum industry. The drilling process, however, is expensive and time-consuming. Hence, there is a growing need to reduce cost and time by optimizing the rate of penetration during drilling, which has led to the development of mathematical models to describe and evaluate this process. However, the accuracy of these models has varied owing to differences in the drilling parameters accounted for in each model. This has led to the use of alternative approaches such as data-driven models. In this study, the capacities of several machine learning (ML) algorithms to predict the rate of penetration (ROP) during drilling were compared: support vector machine regression (SVR), random forest regression (RF), linear regression (LR), K-Nearest Neighbors (KNN), the Stacking technique, the Voting technique and a convolutional neural network (CNN). Data from an oil well in Nigeria were used in this investigation. The data for the well were split into train–test sets in the ratio of 60:40. The training data were used to train and select the best model before making predictions on the test set. The Stacking technique was found to have the best performance across both training and test data sets, with respective accuracies of 99.8% and 97.5% in terms of the R²-score. The Voting technique also performed well, with respective accuracies of 93.6% and 92.6% in terms of the R²-score across both sets of data. The CNN model equally performed well on the training and test data sets, with respective accuracies of 92.4% and 92.8% in terms of the R²-score. Generally, the machine learning models were able to detect patterns and gain valuable insights into the data. They can be employed for real-time prediction of the rate of penetration during oil well drilling.
Keywords: Rate of penetration; Drilling; Artificial intelligence; Machine learning algorithms; train–test data.
Abbreviations
AI Artificial Intelligence
ANN Artificial Neural Network
CNN Convolutional Neural Network
DDR Daily Drilling Report
KNN K-Nearest Neighbors
MAE Mean Absolute Error
ML Machine Learning
R² Coefficient of determination
RMSE Root Mean Squared Error
ROP Rate of penetration
RPM Rotary speed
SVR Support Vector Regressor
WOB Weight on bit, klbf

Olafadehan OA., et al., J Petro Chem Eng | Vol: 1 & Iss:1

1. Introduction

Drilling is a key aspect of the petroleum industry. It is the process of boring a hole deep into the subsurface of the earth to reach formations with hydrocarbon reserves, with the aim of hydrocarbon recovery. The importance of this process cannot be overstated, and as a result, numerous drilling technologies have been implemented to maximize drilling operations. The most widely used method today is rotary drilling, which is applied in drilling the majority of onshore and offshore wells and applies an axial force to the rotating drill bit to achieve penetration. In a rotary drilling process, key parameters need to be considered to ensure optimal operations, and a key parameter among these is the rate of penetration, ROP. It is the depth of penetration accomplished per unit time, and is usually measured as the number of feet the bit can drill in an hour (i.e., ft/h). However, evaluating ROP is difficult because of the complex relationships among the drilling parameters that affect it. The rate of penetration (ROP) prediction is a key task in drilling economic assessments¹. The fastest drilling pace does not always give the lowest cost per foot; other factors may cause a rise in the project's overall cost. The characteristics of the drilling fluid (such as mud viscosity, mud density, filtration loss), mechanical characteristics (such as bit type and weight), and formation properties (such as porosity, rock abrasivity, formation elasticity, formation stress, permeability) are a few examples of the properties that affect penetration rate². Hence, it is important to maximize the rate of penetration in order to mitigate some of the general cost associated with drilling for extended periods. Therefore, it is necessary to understand the relationship between the ROP and other operational parameters.

Mathematical models have been used to describe the relationship between some operational parameters and ROP, e.g., the Bourgoyne and Young³ model and the Bingham⁴ model. The accuracy of these models has varied due to variation in the drilling parameters considered in each model. This has led to the use of alternative, data-driven approaches such as artificial intelligence (AI). Artificial intelligence methods have developed rapidly over the past decades and have been implemented in various sectors, including the oil and gas industry. A colossal amount of data is generated on the oil field during operating hours. These data include drilling data, production data, seismic data and mud log data, amongst others. Artificial intelligence methods can be trained on these data sets to make future predictions and uncover hidden insights in the data. AI methods have been used extensively in the petroleum industry, where they can provide solutions to drilling problems such as the prediction of drill bit wear from drilling parameters, real-time prediction of alterations in drilling fluid rheology⁵, and the estimation of oil recovery factor for water drive sandy reservoirs⁶.

1.1 Artificial Intelligence

Machine Learning (ML) and Deep Learning (DL) are branches of artificial intelligence that deal with computerized systems and algorithms learning from previously generated data⁷. By utilizing various algorithmic strategies, they enable systems to perform computational tasks without requiring explicit programming and to learn from the data. Machine learning finds patterns in data by applying computer algorithms to the data after converting it into numerical form. Amongst other formats, the data may be in the form of pictures, music, numbers, or alphabetical data. The algorithms used to find the patterns within these data are called machine learning models. These models, which include linear regression, logistic regression, decision trees, random forest, K-Means and K-Nearest Neighbors, are used for prediction, data sub-grouping and sound detection, amongst others. They have been applied to aid in the prediction of ROP values with better accuracy and generalization. ML operations are divided into supervised and unsupervised learning. Supervised learning is a paradigm in machine learning where input objects and a desired output value train a model. The training data is processed to build a function that maps new data to expected output values (e.g., regression and classification). In unsupervised learning, the data has no target label; the machine learning model aims at finding hidden patterns in the data using algorithms to make critical judgments in the future (e.g., clustering and recommendation).

Deep learning is a branch of machine learning and artificial intelligence that mimics how the human brain receives, processes and transmits information, as depicted in Figure 1.

Figure 1: Human neuron model.

Deep learning (DL) is essentially a neural network with one or more layers. The components of an artificial neural network are modelled on those of the human neuron⁸. The dendrites act as input nodes, the cell body represents the activation function, the synapse is the weight of each input, and the axon terminal is the output node, as shown in Figure 2.

Figure 2: A typical feed forward neural network architecture⁹.
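
The neuron analogy used in this section (inputs, per-input weights, an activation function and an output) can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the function names `neuron` and `dense_layer`, the bias term and the choice of `tanh` as activation are assumptions made for the example.

```python
import math


def neuron(inputs, weights, bias=0.0, activation=math.tanh):
    """One artificial neuron: weighted sum of the inputs, then an activation.

    inputs     -> the input nodes (the "dendrites" in the analogy)
    weights    -> one weight per input (the "synapses")
    activation -> applied to the weighted sum (the "cell body")
    The returned value is the output node (the "axon terminal").
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def dense_layer(inputs, weight_rows, biases, activation=math.tanh):
    """A feed-forward layer: several neurons that share the same inputs."""
    return [
        neuron(inputs, weights, bias, activation)
        for weights, bias in zip(weight_rows, biases)
    ]


if __name__ == "__main__":
    # Hypothetical, already-scaled drilling inputs (e.g. WOB and RPM).
    features = [0.6, 0.3]
    hidden = dense_layer(features, [[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1])
    output = neuron(hidden, [0.7, -0.3], activation=lambda z: z)  # linear output
    print(hidden, output)
```

Stacking several such layers, with the outputs of one layer used as the inputs of the next, gives the feed-forward architecture of Figure 2; in practice the weights are learned from data rather than set by hand.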
Neural networks (or deep learning) are massively parallel distributed processors that store and make use of experiential knowledge. They are classified into three types: artificial neural network (ANN), convolutional neural network (CNN), and recurrent neural network (RNN), which are used to carry out different operations. The ANN is mostly used for regression and classification problems. The CNN is mostly used for image processing and prediction, while the RNN is mostly used for forecasting operations. A convolutional neural network and a few machine learning strategies are used in this work. The components of a convolutional neural network are convolutional layers, feature extractors (filters), pooling layers, hidden layers, and one or more output layers. Weights connect the layers in the hidden part of the CNN structure. These weights facilitate information flow between layers and aid in neural network training. An activation function is present in every hidden layer, which helps to save computational time and cost by converting the data into a more computer-interactive format. Convolutional layers perform convolution operations on the data to extract important features.

Before the data is sent to the filter, which extracts the features and patterns in the dataset, the convolutional layer typically receives the input in the form of length, breadth, height, and color channels. CNNs have two feature extraction layers: one that makes use of filters and the other that makes use of pooling layers. To extract even more important insights from the dataset, a pooling layer is employed to perform pooling on the features that the filter helped extract. Different pooling techniques are employed for this purpose, such as MaxPooling, Average Pooling, and Global Pooling.

Bilgesu et al.¹⁰ used an artificial neural network to develop an ROP model that depended on several operating parameters. A dataset of 500 points was used, with nine features: tooth wear, rotary speed, torque, weight on bit, pump flow rate, rotating time, bearing wear, formation drillability, and formation abrasiveness. A train-test ratio of 9:1 was used, meaning that 90% of the data was used for training and 10% for validating the model. A coefficient of determination (R²) between 0.902 and 0.982 was achieved after cross-validation across the data.

In the work of Arabjamaloei and Shadizadeh¹¹, an artificial neural network with a single hidden layer of 10 neurons was developed and combined with a genetic algorithm (GA) to create a model to predict ROP values. There were seven features and 300 points (rows) in the data. The input features were the bit type, formation properties, bit operating condition (rotary speed and bit weight), bit tooth wear, bit hydraulics, hydrostatic head, and equivalent circulating density. A total of 224 points were used for model training, 56 points for validation, and 20 points for testing. The genetic algorithm was employed to find where the maximum rate of penetration occurred. With a low mean-square error for both the training and test sets, it was concluded that the neural network is valid for other data sets that fall within the range of the data set used for training the model.

Another study¹² performed a comparative evaluation of models for estimating the rate of penetration (ROP) using field data from a well located in Iran. The models used for this study were the Bingham⁴, Warren¹³, and Bourgoyne and Young³ models. ROP predictions were carried out on wells that were drilled with roller cone and PDC bits, and the comparison was carried out on three separate drilling sections. However, a shortcoming of this study was that the threshold WOB was neglected because no drill-off test had been carried out. The findings of this study demonstrated that among the models examined, the Bourgoyne and Young model exhibited the highest level of predictive performance.

Mahasneh¹⁴ developed a mathematical model to predict the rate of penetration (ROP) in gas wells, considering the factors of weight on bit (WOB), bit rotation speed (RPM), flow rate (FR), formation strength, depth, and formation compaction. He then used his model to optimize the drilling parameters for a gas well in Jordan, increasing the ROP by 15% and reducing the cost of drilling by 10%. Mahasneh¹⁴'s study demonstrated the importance of drilling optimization in improving the efficiency and cost-effectiveness of drilling operations.

Amar and Ibrahim¹⁵ carried out a comparative analysis of physics-based equations and artificial neural networks (ANN). They developed two neural network models to evaluate the ROP values. The input parameters to the neural networks were formation depth, ECD, weight on bit, DSR, pore pressure gradient, drill bit tooth wear, and Reynolds number function. The physics-based equations used for the comparative analysis were the Bingham⁴ model and the Bourgoyne and Young³ model. A comparison of the predictive accuracy of the developed ANN-based models with the available empirical equations showed that both ANN-based models estimated the ROP more accurately than the empirical equations.

Shi et al.¹⁶ predicted the rate of penetration (ROP) using the Extreme Learning Machine (ELM) and upper-layer-solution-aware (USA) techniques. To construct the predictive models, various input parameters such as formation properties, rig hydraulics, bit specifications, weight on bit, rotary speed, and mud properties were utilized. These input features were selected based on reservoir data from Bohai Bay, China. The performance of the models developed using the ELM and USA techniques was compared with that of an artificial neural network model. The accuracy of these models was evaluated using metrics such as the regression coefficient (R²), mean absolute error (MAE), and root mean square error (RMSE). The findings indicated that the ROP model developed with the USA technique exhibited the highest predictive performance compared to the other models. Additionally, it was observed that the development of the ROP model using the extreme learning technique required the most time investment.

Ahmed et al.¹⁷ investigated the application of a support vector machine model to estimate the rate of penetration in a formation containing shale materials. The input features used in the model were drilling parameters and mud properties: weight on bit, rotary speed, pump flow rate, standpipe pressure, drilling torque, mud density, plastic viscosity, funnel viscosity, yield point and solid content (%). The support vector machine model and the Bourgoyne and Young³ model were trained on more than 400 real data points from a shale formation using these 10 features as inputs. The two models were compared on their predictive performance on the test data. The Bourgoyne and Young³ model produced a coefficient of determination (R²) of 0.0692 and an absolute percentage error of 23.41%. By applying the support vector machine (SVM) model, a coefficient of determination (R²) of 0.995 and an absolute percentage error of 2.82% were obtained. It was concluded that SVM can be used to predict ROP with higher accuracy and also generate ROP values faster than the Bourgoyne and Young³ model.

Elkatany⁵ developed an artificial neural network (ANN) model to predict the rate of penetration (ROP) using data collected from three vertical wells in an offshore oilfield. The ANN-ROP model was obtained based on drilling parameters and drilling fluid properties. Two wells were utilized for training the model, and the third well was used to evaluate the accuracy of the model.
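
The evaluation protocol just described (train on some wells, test on a held-out well) and the metrics quoted throughout this review (R², MAE, RMSE and the average absolute percentage error, AAPE) can be sketched as follows. This is an illustrative sketch using the standard textbook definitions; the function names, the `"well"` and `"rop"` keys and the sample numbers are hypothetical and are not taken from any of the cited studies.

```python
import math


def split_by_well(records, test_well):
    """Hold out every row of one well for testing; train on the other wells."""
    train = [r for r in records if r["well"] != test_well]
    test = [r for r in records if r["well"] == test_well]
    return train, test


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in the units of ROP."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of ROP."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def aape(y_true, y_pred):
    """Average absolute percentage error, in percent (y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up rows: three wells, ROP in ft/h.
    records = [
        {"well": "A", "rop": 10.0}, {"well": "A", "rop": 14.0},
        {"well": "B", "rop": 20.0}, {"well": "B", "rop": 22.0},
        {"well": "C", "rop": 30.0}, {"well": "C", "rop": 28.0},
    ]
    train, test = split_by_well(records, test_well="C")
    measured = [10.0, 20.0, 30.0]   # hypothetical measured ROP
    predicted = [12.0, 18.0, 33.0]  # hypothetical model output
    print(len(train), len(test))
    print(r2_score(measured, predicted), mae(measured, predicted),
          rmse(measured, predicted), aape(measured, predicted))
```

Holding out a whole well, rather than random rows, tests whether a model transfers to a well it has never seen, which is a stricter check than a random split of rows from the same well.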

The performance of the ANN-ROP model was compared to the ROP models of Bingham⁴, Bourgoyne and Young³, and Maurer¹⁸. Elkatany⁵ concluded that the proposed ANN-ROP model exhibited superior performance over the others considered in his work. The training data consisted of 3333 data points and yielded a coefficient of determination (R²) of 0.99, with an average absolute percentage error (AAPE) of 5%. The test set, consisting of 2700 unseen data points from the third well, resulted in the ANN-ROP model predicting the rate of penetration with R² = 0.9 and AAPE = 4%.

Zhang et al.¹⁹ proposed a deep convolutional neural network (CNN) model for predicting the rate of penetration (ROP) during drilling operations. The authors argued that existing models for predicting ROP are often inaccurate and unreliable, and that deep learning methods could provide a more accurate and practical solution. They collected data from drilling operations in two different fields and used it to train and test the proposed deep CNN model. The model consists of six convolutional layers and is trained using a mean absolute percentage error (MAPE) loss function. The authors compared the performance of their deep CNN model to other machine learning models and found that it outperformed these models in terms of accuracy and reliability. They also conducted sensitivity analyses and found that the weight on bit, the rotary speed, and the mud flow rate were the most important features for predicting ROP.

Zhao et al.²⁰ focused on developing multiple artificial neural network (ANN) models for predicting the rate of penetration (ROP) using data collected from a gas well located in the southern region of Iran. A dataset comprising 3180 data points was obtained from various drilling sections, involving one run of a roller-cone bit and three runs of PDC bits. To construct the ANN-ROP models, several input variables were considered, including depth, rotary speed of the bit, weight on bit (WOB), shut-in pipe pressure, fluid rate, mud weight, the ratio of yield point to plastic viscosity, and the ratio of 10-minute gel strength to 10-second gel strength. Three different training functions, namely Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG), and One-Step Secant (OSS), were employed in combination with the neural networks to estimate the penetration rates. It was concluded that the ANN-ROP model utilizing the Levenberg-Marquardt (LM) function demonstrated the best prediction performance, achieving a regression coefficient (R²) of 0.91 in training and 0.89 in testing. Furthermore, they applied the Artificial Bee Colony (ABC) algorithm to optimize the ROP. The optimization process resulted in an approximate improvement of 20–30% in the rate of penetration.

Abdulmalek et al.²¹ carried out a comparative analysis between artificial intelligence techniques and some traditional models for ROP prediction in shaley formations. An artificial neural network was developed for the ROP prediction in the shale formation. The parameters considered for the prediction of the rate of penetration (ROP) included torque, standpipe pressure, pump rate, mud weight, funnel and plastic viscosities, solid content, and yield point. The traditional ROP models proposed by Bingham⁴, Warren¹³, Bourgoyne and Young³, Maurer¹⁸ and Hareland and Hoberock²² were selected for comparison. Both the artificial neural network ROP (ANN-ROP) model and the traditional models underwent training and testing using a dataset consisting of 347 data points from a deep shale formation in an onshore oilfield. Additionally, 200 new data points from an upper shale formation were utilized to validate the models. The results indicated that the ANN-ROP model outperformed the other models in comprehending the intricate relationships within the data and making accurate predictions. The ANN-ROP model predicted the rate of penetration with an average absolute percentage error (AAPE) of 5.776% and a regression coefficient (R²) of 0.996.

Ashrafi et al.²³ explored the prediction of rate of penetration (ROP) using various optimization algorithms and neural network architectures. The optimization algorithms employed included Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Biogeography-based Optimizer (BBO), and Imperialist Competitive Algorithm (ICA). These algorithms were combined with different neural network architectures to develop hybrid ROP models. To evaluate the performance of the hybrid models, the results were compared with two other models: Non-linear Multiple Regression (NLMR) and Linear Multiple Regression (LMR). For the hybrid models, two popular neural network architectures, namely Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF), were utilized. These architectures consisted of two hidden layers with 4 and 6 neurons, respectively. The activation function used in the hidden layers and output layer was tan-sigmoid. The input features were weight on bit, rotational speed of the drill bit, pump inlet flow rate, pore pressure, pump pressure, gamma ray, density log, and shear wave velocity. The dataset used for the study consisted of 1000 data points, collected from the Marun oilfield in Iran. It was concluded that the hybrid models utilizing PSO-MLP and PSO-RBF neural networks exhibited the best predictive accuracy for ROP. The root mean square error (RMSE) values for these models were 1.12 and 1.4, respectively, indicating their superior performance compared to the other developed models.

Iqbal²⁴ developed a mathematical model to predict the rate of penetration (ROP) in drilling operations, considering the factors of weight on bit (WOB), bit rotation speed (RPM), flow rate (FR), formation strength, depth, and formation compaction. He then used his model to optimize the drilling parameters for a real-time drilling dataset from a Middle Eastern oil field, increasing the ROP by 10% and reducing the cost of drilling by 5%. Iqbal's study demonstrates the importance of using real-time drilling parameters to optimize drilling operations and provides a valuable contribution to the field of drilling engineering.

Burgos et al.²⁵ developed a convolutional neural network (CNN) model to predict the rate of penetration (ROP) during rotary drilling operations. The model takes in 10 drilling parameters as inputs, such as weight on bit, rotary speed, flow rate, and hook load. The inputs are normalized between 0 and 1. The CNN architecture consists of 3 convolutional layers followed by 2 fully connected layers. The output layer has a single node with a linear activation to predict the ROP value. The model was trained on data from over 600 wells. It achieved a mean absolute percentage error (MAPE) of 9.3% on the test set, outperforming traditional machine learning models such as linear regression, random forests, and support vector regression. An ablation study showed that the CNN's ability to learn complex non-linear relationships between the drilling parameters allowed it to accurately predict ROP, whereas simply averaging the inputs did not work as well. The model generalized to data from 50 additional wells, with the MAPE increasing only slightly to 10.2%, which indicates good generalization performance. In conclusion, the CNN approach effectively modelled the complexity between drilling parameters and ROP, outperformed traditional models, and generalized well to new data. This could enable more efficient drilling operations through accurate ROP predictions.

Monazami et al.²⁶ used an artificial neural network (ANN) to predict the rate of penetration (ROP) in drilling operations. The ANN model took cognizance of
formation strength, depth, formation compaction, pressure (SVR), random forest regression (RFR), and gradient boosting
differential, bit diameter, weight on bit (WOB), bit rotation regression (GBR), which were combined using a weighted
(RPM), and bit hydraulics. The authors evaluated the performance average ensemble method. The authors compared the
of their ANN model on a test dataset of ROP data. They found performance of their hybrid ensemble learning model to other
that the model was able to predict ROP with high accuracy. The machine learning models and found that it outperformed these
average error between the predicted and actual ROP values was models in terms of accuracy and reliability. The authors also
less than 5%. The model was able to predict ROP with high conducted sensitivity analyses to determine the most important
accuracy, suggesting that ANN is a promising tool for optimizing features for predicting ROP. They found that the weight on bit,
drilling parameters and improving the efficiency and cost- the rotary speed, and the mud flow rate were the most important
effectiveness of drilling operations. Abbas et al.27 employed an features for predicting ROP. Liu et al.31 proposed a stacked
artificial neural network (ANN) approach to develop a generalization ensemble model for predicting the rate of
computational-based method for predicting the rate of penetration (ROP) in gas well drilling. The model is trained on a
penetration (ROP). Through a thorough analysis of feature dataset of historical ROP data and drilling parameters from a
selection, it was determined that out of the 25 input variables shale gas survey well in Xinjiang, China. The model combined
examined, 19 variables had the greatest influence on the ROP. A the predictions of six machine learning models: support vector
dataset consisting of 13,125 data points from 14 deviated wells regression (SVR), extremely randomized trees (XRT), random
in a formation located in southern Iraq was collected for the forest (RF), gradient boosting machine (GBM), light gradient
study. The data specifically pertained to the 8 ½” production boosting machine (LightGBM), and extreme gradient boosting
casing section, which was drilled using a drag bit and a (XGB). They first used Pearson correlation analysis to identify
conventional bottom hole assembly (BHA) with a water-based the most important features from the dataset. Then, they used a
mud circulating system. It was concluded that the ROP model Savitzky-Golay smoothing filter to reduce noise in the dataset.
based on the artificial neural network, utilizing three hidden Finally, they trained the stacked generalization ensemble model
layers and employing the tan–sigmoid activation function, using the leave-one-out cross-validation method. The results
exhibited the highest efficiency in predicting ROP. The model showed that the stacked generalization ensemble model can
achieved a regression coefficient of 0.92 during training and significantly improve the accuracy of ROP prediction. The root
0.97 during testing, with mean absolute percentage errors mean square error (RMSE) of the model on the testing dataset is
(MAPE) of 9.1% and 8.8% in training and testing, respectively. 0.4853 m/h, which is lower than the RMSE of any of the
Furthermore, the model demonstrated good performance on 2
unseen data and did not exhibit overfitting issues. Miyora28 individual models. The model also has a high R value of
studied the factors that affect the rate of penetration (ROP) in 0.9568. They also used the model to optimize the ROP
geothermal drilling and developed a mathematical model to parameters. They use particle swarm optimization (PSO) to
predict ROP based on these factors. The model includes search for the optimal combination of ROP parameters. The
formation strength, depth, formation compaction, pressure results show that the optimized ROP parameters can significantly
differential, bit diameter, weight on bit (WOB), bit rotation improve the ROP. It was thus concluded that the stacked
(RPM), and bit hydraulics. Miyora () found that all these factors generalization ensemble model is a promising approach for
have a significant impact on ROP and used his model to optimize the drilling parameters for Well MW-17 in Menengai, Kenya, increasing the ROP by up to 20%. Al-AbdulJabbar et al.29 utilized an artificial neural network (ANN) in combination with self-adaptive differential evolution (SADE) to predict the rate of penetration (ROP) specifically in horizontal carbonate reservoirs. The model incorporated six input variables, including rotary speed, torque, weight on bit, as well as formation petrophysical properties such as gamma ray, resistivity, and bulk density data. The developed model demonstrated strong performance, achieving a regression coefficient (R²) of 0.96 and a mean absolute percentage error (MAPE) of 5.12%. To further evaluate the accuracy of the model, an unseen well was used as test data. The resulting regression coefficient (R²) and MAPE values were 0.95 and 5.8%, respectively. Furthermore, their study aimed to enhance the interpretability of the ROP model by extracting the weights and biases in a matrix form, effectively transforming it from a black box model to a white box model. Wang et al.30 proposed a hybrid ensemble learning approach for predicting the rate of penetration (ROP) during oil and gas drilling operations. They argued that existing models for predicting ROP are often inaccurate and unreliable, and that ensemble learning methods can provide a more accurate and practical solution. They collected data from drilling operations in the Gulf of Mexico and used it to train and test their hybrid ensemble learning model. The model consisted of several machine learning algorithms, including support vector regression

predicting ROP in gas well drilling. The model is accurate and can be used to optimize the ROP parameters. Moraveji and Naderi32 investigated the simultaneous effect of six variables on penetration rate using real field drilling data via response surface methodology (RSM). The important variables included well depth (D), weight on bit (WOB), bit rotation speed (N), bit jet impact force (IF), yield point, Y_p, to plastic viscosity ratio, PVR (Y_p/PVR), and 10 min to 10 s gel strength ratio (10MGS/10SGS). Equally, bat algorithm (BA) was used to identify optimal range of factors in order to maximize drilling rate of penetration. Their results indicated that the derived statistical model provides an efficient tool for estimation of ROP and determining optimum drilling conditions.

The aim of this study is to analyze the performance of machine learning and deep learning techniques in predicting the rate of penetration during drilling, which is crucial in optimizing drilling operations. The results of this study can contribute to drilling planning and optimization of future wells. Exact prediction of the rate of penetration during drilling will save the oil and gas industry a large amount of expenses during drilling operation and reduce the amount of non-productive time (NPT) encountered during drilling operation.

1.2 Approaches to Rate of Penetration Modelling
Over the past few years, a large amount of research has gone into ways in which ROP can be modelled with its dependent drilling parameters (controllable and uncontrollable). A key drive that leads to further research regarding this field is the


non-comprehensiveness of previous models developed. This is because not all of the known ROP-affecting factors have been accounted for in a single model, which has led to poor accuracy and generalizability of the estimated models33. The seemingly large number of factors affecting the ROP and the essential requirement for a model with high accuracy and reliable generalization have led to the development of various ROP estimation models. This modelling follows two approaches: the physics-based approach and the data-driven approach. The physics-based approach involves the use of mathematical modelling techniques to evaluate relationships between the dependent parameter (ROP) and the independent parameters, so as to estimate accurate ROP values. These mathematical relationships are developed based on the physics of the borehole. Various ROP estimation models have been created using the physics-based approach, e.g., the Cunningham model, Bingham4 model, Maurer19 model, Motahhari et al. model33 and Hareland and Rampersad model34.

The Cunningham model is given by:
(1)
where R is the rate of penetration (ft/h), K the constant of proportionality, W_0 the threshold weight on bit (lbf) and N the rotary speed (rpm).

Bingham model4:
(2)
where W is the weight on bit (klb), D_B is the bit diameter (in), and a and b are the dimensionless constants for each rock formation.

Maurer model18:
(3)
where R is the rate of penetration (ft/h), W the weight (lbf), s the confined rock strength (psi) and D the depth (ft).

Motahhari et al. model33:
(4)
where w_f is the dimensionless wear function, G is a model coefficient related to bit-rock interactions and bit geometry, and α and γ are ROP model exponents. The bit coefficient, G, is determined by the bit design, cutter size, cutter-rock friction coefficient and the bit geometry. In this model, a decrease in the value of the wear function, while keeping the other model parameters constant, leads to a decrease in ROP. For the bit size or compressive strength, the inverse occurs when its value is decreased. The relationship between W, N and R is non-linear. Hence, the exponents can yield an optimum value for W and N due to the exponential nature of the relationship.

Hareland and Rampersad model34:
(5)
where N_c is the number of cutters and A_v the area of rock compressed ahead of a cutter (in²).

Other models used for ROP estimation are as follows:

Bourgoyne and Young3 model:
(6)
where W is the weight on bit (klb), D_B is the bit diameter (in), and a and b are the dimensionless constants for each rock formation.

Bourgoyne et al.35 sought to optimize the controllable parameters during drilling operation. They proposed the development of an ROP model based on the multiple linear regression technique. Eight parameters were used in developing this model: strength of formation, normal compaction function, weight on bit, bit teeth wear, rotary speed function, bit hydraulic function, differential pressure function, and under-compaction function. These parameters were treated as independent parameters, with the ROP as the dependent parameter. The developed model was then applied to estimate ROP for wells drilled vertically using roller cone bits, and it was concluded that the application of the ROP model could help reduce drilling operational cost by 10%. The model was originally created for modelling ROP for roller cone bits, but over time it has also shown effectiveness in modelling ROP for PDC bits. The Bourgoyne et al.35 model is given by:

$R = \prod_{i=1}^{8} F_i$ (7)

where, for the Bourgoyne and Young model, $F_1 = e^{a_1}$ is the formation strength function, $F_2 = e^{a_2 (10000 - D)}$ the normal compaction function, $F_3$ the under-compaction function, $F_4 = \exp[a_4 D (g_p - \rho_c)]$ the pressure differential function, $F_5 = \exp\{a_5 \ln[((w/d) - (w/d)_t)/(4 - (w/d)_t)]\}$ the weight on bit function, $F_6$ the rotary speed function, $F_7 = \exp[a_7(-h)]$ the bit tooth wear function and $F_8$ the bit hydraulic function.

The physics-based approach has limitations due to the failure to consider all the parameters affecting the drilling operation and in the choice of an empirical constant for the ROP estimation with respect to the well/borehole in operation. This gave rise to the use of data-driven approaches, which make use of data generated during drilling (Logging While Drilling (LWD)) and artificial intelligence techniques for ROP estimation17. The application of AI models for ROP estimation was suggested by Bilgesu et al.10, so as to overcome the weakness of the physics-based approach and improve the accuracy of ROP prediction.

2. Methodology
2.1 Methods
Figure 3 shows the proposed methodology adopted in this study.

2.2 Data Collection
The data utilized for this study was obtained from the Daily Drilling Report (DDR) for an oil well in Nigeria. It contains parameters that ROP depends on, which will help make a robust model. Such parameters are weight on bit, pump flow rate, mud weight, mud type, drill bit diameter and wellbore trajectory, amongst others. After data collection, the uncertainties within the dataset and the suitable parameters are defined. This leads to filtration of the dataset. The well dataset contains 27 columns (the number of variables) and 17280 rows, with 0% missing cells and 0% duplicate rows.
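The dataset figures quoted above (row count, column count, share of missing cells and share of duplicate rows) can be checked with a few lines of pandas. The following is a minimal sketch, not the authors' code: the helper name `summarize_dataset` and the small synthetic frame are assumptions for illustration, since the DDR data itself is not reproduced here.

```python
import numpy as np
import pandas as pd


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Return shape and basic data-quality figures for a tabular dataset."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing_cells = df.isna().sum().sum()   # total number of NaN cells
    duplicate_rows = df.duplicated().sum()  # rows identical to an earlier row
    return {
        "rows": n_rows,
        "columns": n_cols,
        "missing_cells_pct": 100.0 * missing_cells / n_cells if n_cells else 0.0,
        "duplicate_rows_pct": 100.0 * duplicate_rows / n_rows if n_rows else 0.0,
    }


# Hypothetical stand-in for the drilling data (column names are illustrative).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Depth": np.linspace(1000.0, 1010.0, 6),
    "WOB": rng.uniform(5.0, 25.0, 6),
    "ROP": rng.uniform(0.0, 18.0, 6),
})
summary = summarize_dataset(demo)
print(summary)
```

Applied to the real well data, a check of this kind would be expected to report 17280 rows, 27 columns and 0% for both quality figures if the description above holds.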

2.3 Data Processing
The data preprocessing phase is also known as the Feature Engineering Phase. The data set used for the study is subjected to various statistical manipulations and transformations in order to extract relationships and insights between parameters in the data and to process the data into forms that are more understandable by the algorithms, hence producing better model performance. Such techniques include exploratory data analysis, missing data imputation, outlier handling, feature scaling, variable transformation, and discretization, amongst others. These processes help the model match key relationships between the input parameters and the target variable. In this study, the data preprocessing techniques used were outlier handling, variable transformation and feature scaling.

Outlier handling: This refers to the process of dealing with outliers found in a dataset. Outliers are simply data points that differ significantly from the majority of the dataset. Outliers must be dealt with since they can significantly affect the outcomes and precision of statistical models. Outlier treatment can be done in a number of ways, such as by removing outliers, capping, or imputing more representative values. The method utilized in this study was the capping technique, in which outliers in a variable were replaced with limits derived from the interquartile range of that variable. Figures 4 and 5 show the box plot of ROP data before and after outlier removal.

Figure 4: Box plot of ROP data before outlier removal.

Variable Transformation: This technique was employed in this study to treat the variables that were skewed either to the left or to the right. It involves changing a variable's scale or distribution to satisfy requirements or enhance the performance of statistical models. Transforming the skewed variables helps to equalize variances and establish linearization among the variables, which makes it easier to interpret and model. The variable transformation techniques used in this study were LogTransformer and BoxCoxTransformer.

Figure 5: Box plot of ROP data after outlier removal.

Feature Scaling: This involves changing the scale of numerical features in a dataset. To make the features similar and to prevent some from predominating over others based only on their initial scale, the range or distribution of the features must be changed. This preprocessing technique is important because it helps the machine learning models to better understand the features, as they will usually lie within a common range such as 0 to 1, which the models usually prefer. For some models, feature scaling is a necessary requirement before the dataset is passed to them, e.g., ANN and CNN, while other models are not influenced by whether the dataset is scaled or not, e.g., Random Forest and Extra Trees. There are many types of feature scaling techniques, e.g., Standard Scaler, MinMax Scaler and Robust Scaler. Each of these techniques has its rules of engagement, so as to get better model performance. These rules depend on the dataset and the model to be used. In this study, the standard scaler was utilized so as to scale features to have a mean of 0 and a standard deviation of 1. Table 1 shows the features definition with the data types used.

Table 1: Features definition, unit, and data types.
Feature | Definition | Units | Data type
Depth | The actual depth at which the drilling is taking place. | m | Numerical
Lag Depth | Time delay or lag between the measured depth and the corresponding ROP value. | m | Numerical
WHO | Weight on String. | klb | Numerical
ROP | Rate of Penetration. | m/h | Numerical
RPM TURBIN | Turbine Speed. | rev/min | Numerical
Torque | Rotational force of drill string. | klb.ft | Numerical
SPP | Standpipe Pressure. | psi | Numerical
Flow In | Flow rate of drilling fluid pumped into the wellbore during drilling. | gpm | Numerical
Mw In | Total volume of drilling mud pumped into the wellbore during a specific period of time. | pcf | Numerical
Mw out | Total volume of drilling mud pumped out of a wellbore during a specific period of time. | pcf | Numerical
PIT#1 | Mud pit volume in the first mud pit or mud tank. | Bbl | Numerical
PIT#2 | Mud pit volume in the second mud pit or mud tank. | Bbl | Numerical
PIT#3 | Mud pit volume in the third mud pit or mud tank. | Bbl | Numerical
PIT#4 | Mud pit volume in the fourth mud pit or mud tank. | Bbl | Numerical
PIT#5 | Mud pit volume in the fifth mud pit or mud tank. | Bbl | Numerical
PIT#6 | Mud pit volume in the sixth mud pit or mud tank. | Bbl | Numerical
TOT ACT | Total Actual Time. | Bbl | Numerical
Steel Volume | The volume of steel that is used or consumed during the drilling process. | Bbl | Numerical
Over pull | Additional force applied to the drilling assembly in order to increase the drilling efficiency. | klb | Numerical
Flow Paddle | Percentage of drilling fluid that circulates through the wellbore during drilling. | % | Numerical
Bit Position | The vertical depth at which the drilling bit is located within the wellbore. | m | Numerical
Hook Position | Vertical position of the drilling hook or traveling block. | m | Numerical
String Weight | Total weight of the drill string, including the drill pipe, bottom hole assembly (BHA), and any other components attached to it. | klb | Numerical
Drag | Resistance encountered by the drill string and drill bit as they are advanced through the formation. | klb | Numerical

Table 2 gives the descriptive statistics of the ROP variable.

Table 2: Descriptive Statistics of ROP variable.
Statistic | Mean | Standard deviation | Minimum | 25% | 50% | 75% | Maximum
ROP | 3.964 | 4.317 | 0.000 | 0.000 | 2.880 | 7.410 | 18.525

Table 3 gives the Pearson correlation of the oil well features and their values.

Table 3: Pearson correlation of features with rate of penetration.
Features | Well data
Depth | -0.37
Lag depth | 0.98
WHO | 0.15
RPM TURBIN | 0.57
Torque | 0.70
SPP | 0.64
Flow in | 0.57
Mw in | -0.08
Mw out | 0.20
PIT#1 | 0.24
PIT#2 | -0.14
PIT#3 | -0.10
PIT#4 | -0.26
PIT#5 | 0.05
PIT#6 | 0.23
TOT ACT | -0.25
Steel volume | 0.20
Flow paddle | 0.75
Bit position | 0.20
Hook position | 0.26
String weight | -0.01
Drag | -0.53

The correlation heat map of the well data is depicted in Figure 6.

Figure 6: Correlation heat map of the well data.

2.4 Feature Selection
The defined input parameters from the dataset must pass through the feature selection phase. Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in building machine learning algorithms. It involves selecting the pool of features that has a significant impact on the predictions made by the machine learning algorithm. It is a crucial phase in developing a good machine learning model. Feature selection algorithms are divided into three main categories: filter, wrapper, and embedded methods. Feature selection helps a user to better interpret the model, e.g., a model of 10 input parameters is much easier to interpret than one of 100 parameters. It also shortens training time for the machine learning algorithm and enhances generalization by reducing overfitting. In this study, a filter method known as mutual information was used to select optimal features for model building. Mutual information is a statistical measure of the mutual dependence of two variables. In other words, mutual information quantifies the amount of information gained about one random variable through observing another random variable. The mutual information is given by:

$I(X;Y) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$ (8)

where I is the ranking score, X and Y are the respective input and output nodes, and x and y are the independent and target variables respectively.

The algorithm selects the highest-ranking features that best describe the target variable and separates them into percentiles, e.g., 10th, 20th, 30th etc., depending on the highest ranking. In this study, the features in the top 50th percentile, which translated to 13 features out of the possible 27, were selected to build the machine learning models. The features selected after feature selection are depth, lag depth, WOB, SPP, MW IN, PIT#2, PIT#3, PIT#4, PIT#6, TOT ACT, steel volume, bit position and hook position.
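The percentile-based filter described above can be sketched with scikit-learn's `SelectPercentile` and `mutual_info_regression`. This is an illustrative sketch on synthetic data rather than the authors' implementation (their code is in the Appendix): the feature names `f0` to `f5`, the sample size, and the random seeds are all assumptions, and only the first three synthetic features are constructed to carry information about the target.

```python
from functools import partial

import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectPercentile, mutual_info_regression

# Synthetic stand-in for the well data: six candidate features, of which
# only f0, f1 and f2 influence the target (assumption for illustration).
rng = np.random.default_rng(42)
n_samples = 1000
X = pd.DataFrame(
    rng.normal(size=(n_samples, 6)),
    columns=[f"f{i}" for i in range(6)],
)
y = X["f0"] + X["f1"] + X["f2"] + 0.1 * rng.normal(size=n_samples)

# Fix the estimator's random_state so the mutual information scores are repeatable.
score_func = partial(mutual_info_regression, random_state=0)

# Keep the features whose mutual information score lies in the top 50th percentile.
selector = SelectPercentile(score_func=score_func, percentile=50)
selector.fit(X, y)

scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
selected_features = list(X.columns[selector.get_support()])

print(scores.round(3))
print("Selected:", selected_features)
```

With `percentile=50`, half of the candidate features are retained, which mirrors the reduction reported in the text; the scores themselves depend on the nearest-neighbour estimator that scikit-learn uses for continuous variables.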
2.5 Data Splitting
Data splitting (otherwise known as cross validation) is a process utilized in the building of artificial intelligence models. Here, data is partitioned into two or more subsets to enable the model to identify the patterns within the data set and to allow its performance on unseen (real world) data to be assessed. Two sets of the dataset are created: a training set and a testing set. The training set is used to train the artificial intelligence model on the data, while the testing set is used to assess the model's performance in real world scenarios. This is because there is a probability that the built model may not be robust enough to perform successfully on unknown (real world) data. There are various methods used for the cross validation operation, viz. the holdout method, K-fold method, Stratified K-fold method and Leave-One-Out method, amongst others. The K-fold cross validation technique was implemented in this study using the Python Sklearn package.

A 60:40 split of the oil well data was made into train and test sets. There are 10368 rows and 13 columns in the training set and 6912 rows and 13 columns in the test set. The training results were obtained by training the model on the train data and using the resulting model to predict the training set. The test results were obtained by training the model on the train set before predicting the test set.

3. Model Development and Training
Seven machine learning techniques were analyzed in this work and trained to make predictions of the rate of penetration for the oil well. The machine learning models that were employed for this analysis are outlined as follows, and their written codes can be found in the Appendix.

3.1 Random Forest Regression
Random forest can be applied to both classification and regression problems. It is an ensemble learning technique that creates a large number of decision trees during the training period and utilizes averaging to improve the prediction accuracy and control over-fitting. Random forests are widely used for applications (such as credit scoring and spam filtering) because they can handle both categorical and continuous data. During training, random forests create a lot of decision trees36. Each tree is constructed using a random subset of the features and a sample of the training data. The individual decision trees' predictions are combined by the random forest algorithm to provide a prediction. For a wide range of applications, random forests are a potent and useful machine learning technique. They are often good performers and are quite simple to train and tune.

A schematic of the decision tree regression is depicted in Figure 7.

Figure 7: Decision tree regression schematic37.

A random forest model works by training multiple decision trees in parallel and uses a bagging technique to obtain a robust model. Usually, machine learning models have hyperparameters, that is, parameters of the algorithm that are set before training and remain constant throughout it, and that help the algorithm better understand the data patterns. Hyperparameters in the random forest algorithm include max_depth, max_features, min_samples_leaf, min_samples_split and n_estimators. To obtain optimal performance of the random forest algorithm, optimal values must be selected for these hyperparameters. To get these optimal values, a hyperparameter search algorithm (known as the randomized search algorithm) was used. This algorithm helps to generate the optimal value for each hyperparameter to be utilized in the model. After implementing the randomized search algorithm on the well data, the optimal values of the hyperparameters were max_depth = 31, max_features = sqrt, min_samples_leaf = 3, min_samples_split = 13 and n_estimators = 666.

3.2 Linear Regression
Linear regression is a supervised method of machine learning that uses one or more input features to predict a continuous target variable. It is assumed that there is a linear relationship between the input variables and the target variable. Linear regression is intended to establish the optimal line according to the data, minimizing the difference between predicted and real values. The algorithm operates by generating the coefficients of the line's linear equation. Some hyperparameters in linear regression are copy_X and fit_intercept. After implementing the randomized search algorithm on the well data using linear regression as the base model, the optimal values of the hyperparameters were copy_X = True and fit_intercept = True.

3.3 KNearest Neighbor
KNearest Neighbor (KNN), as shown in Figure 8, is a supervised, instance-based machine learning technique that can be applied to both classification and regression problems. KNN is a non-parametric algorithm, meaning that it does not make any assumptions about the distribution of data. The KNN method is based on the hypothesis that similar occurrences will share similar labels. The KNN technique identifies the K closest neighbors to a given data point by reference to a distance metric, typically Euclidean, and assigns to the data point the label held by the majority of these K neighbors. When the algorithm is doing a regression, it takes the weighted average of all the target values from the K neighbors and uses it to predict the new value for the given data point. The number of neighbors is a hyperparameter that can be changed. Some hyperparameters in KNN algorithms are algorithm, leaf_size, p, weights, and n_neighbors. After


implementing randomized search algorithm on the well data using KNearest neighbor as the base model, the optimal values of the hyperparameters were algorithm = auto, leaf_size = 10, p = 1, weights = distance and n_neighbors = 3.

Figure 8: KNearest Neighbor38.

Figure 9: SVM schematic39.

3.4 Support Vector Machine Regression
Support vector machine (SVM) regression is a supervised learning algorithm derived from the concepts of support vector machines (SVM), which are primarily used in classification tasks, as shown in Figure 9. The goal of SVM regression is to identify a function that best matches the relationship between the input and target variables. The SVM regression generates a high-dimensional hyperplane with each data point as a feature vector in the hyperplane space. In classification, the objective of the algorithm is to find the hyperplane with the greatest margin, i.e., the distance from the hyperplane to the nearest data point in each class. In the regression case, SVM chooses the hyperplane that contains the most data points within the given range. The range is the margin of tolerance, which allows some data points to fall outside of the range. The support vectors are the data points that lie on or outside the boundary of this range. Some hyperparameters in the support vector regression algorithm are C, epsilon, and kernel. After the implementation of the randomized search algorithm on the well data using the support vector regression algorithm as the base model, the optimal values of the hyperparameters were C = 10, epsilon = 1 and kernel = rbf.

3.5 Stacking Technique
Stacking is a type of machine learning technique, whose algorithm is shown in Figure 10, that uses the predictive power of different machine learning algorithms to make better predictions on datasets. The stacking technique typically involves the use of base models and a meta model. The base models are usually common machine learning algorithms such as decision trees, random forests, and support vector machines. These base models are trained on a dataset and are used to make predictions; these predictions are then combined in a meta model, which can be linear regression or a neural network, to make final predictions. It is a powerful machine learning technique since it utilizes the diverse knowledge of the base models. The base models used for this study are random forest, support vector machine, linear regression, and nearest neighbors, while the meta model used is the linear regression model.

Figure 10: Stacking algorithm40.

3.6 Voting Technique
Voting is a machine learning technique that involves the integration of predictions from multiple independent models to form a final prediction, as shown in Figure 11.

Figure 11: Voting Algorithm (LevelUpCoding).

The voting technique is commonly referred to as ensemble voting, or majority voting, and is based on the principle that the integration of the opinions of multiple models can often lead to greater prediction accuracy than the use of a single model. Under the voting algorithm, each base model is trained on the same data set, but with different algorithms or settings. During the prediction phase, each base model makes its own prediction based on the data it has been trained on. Finally, the final prediction is calculated by combining all the predictions using a voting system. The base models used for this study are random forest, support vector machine, linear regression, and nearest neighbors, while the meta model used is the linear regression model.

3.7 Convolutional Neural Network
Convolutional Neural Networks (CNNs) are a type of deep learning algorithm that is commonly employed in the analysis and interpretation of visual data, including images and videos. CNNs are widely used for image classification, object recognition and image segmentation. However, CNNs can also be used in regression-based projects, where the purpose is to predict continuous variables. A convolutional neural network (CNN) usually consists of four components: convolutional layers, pooling layers, fully connected layers, and output layers, as shown in Figure 12. These four components usually make up the architecture of CNNs. The main differences between a classification CNN and a regression-based CNN are the output layer and the loss function. The output layer in a CNN based on regression is distinct from that of a Softmax-based CNN. Instead of predicting class probabilities using a Softmax function, an output layer is typically composed of an individual neuron with a function

of a linear activation. This allows the network to produce a continuous value directly as a regression prediction. For regression tasks, a loss function is used to measure the difference between predicted and actual target values. Examples of loss functions that are commonly used include MSE (mean squared error) and MAE (mean absolute error).

Figure 12: Convolutional neural network39.

3.8 Model Evaluation
Metrics are used to reflect how well a machine learning model has learnt patterns in the data and how it performs on the unseen (test) data set. These metrics show how far a model's prediction is from the true values. In this study, four error metrics are used to estimate a model's performance on the learning patterns in the dataset and on unseen data (test data). They are the mean absolute error (MAE), root mean squared error (RMSE), mean squared error (MSE) and coefficient of determination, R²–score, given by Equations (9) to (12) respectively.

$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i|$ (9)

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$ (10)

$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$ (11)

$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$ (12)

where $\hat{y}_i$, $y_i$ and $\bar{y}$ are the respective predicted, actual and mean values and n the number of observations.

4. Results and Discussion
After various statistical analyses were carried out on the well data, the features were reduced from the previous 27 columns to 13 columns, as displayed in Table 4, which is an excerpt of the well data used for both training and testing. It shows a sample of the data utilized after feature selection has been carried out, leaving 17280 rows and 13 columns. These data were then separated using cross validation into train and test data respectively. The train data contained 10368 rows and 13 columns, while the test data contained 6912 rows and 13 columns. The linear regression model was applied to the training data and test data after the optimal hyperparameters had been generated. The train and test data were standardized such that data has a mean of 0 and standard deviation of 1. The results obtained using the linear regression model are presented in Table 5.

The random forest regression, KNearest neighbor, and support vector regression (SVR) models were applied to the training data and test data after the optimal hyperparameters had been generated using the randomized search cv algorithm. The train and test data were standardized such that data has a mean of 0 and standard deviation of 1. The random forest, KNearest neighbor, and support vector regression (SVR) models' results are presented in Tables 6–8 respectively.

Equally, the stacking and voting techniques were applied to the training data and test data after the optimal hyperparameters had been generated using the randomized search cv algorithm for the base models used in each technique. The train and test data were standardized such that data has a mean of 0 and standard deviation of 1. The results obtained using the stacking and voting techniques are presented in Tables 9 and 10 respectively.

The convolutional neural network (CNN) model was applied to the training data and test data using 120 epochs and a batch size of 32, together with an output layer of 1. The architecture of the CNN model created is as follows: two 1-D (one-dimensional) convolutional layers with 32 and 64 filters and a kernel size of two, one global max-pooling layer, 5 hidden layers and 1 output layer. The train and test data were standardized such that data has a mean of 0 and standard deviation of 1. The results obtained using the CNN model are presented in Table 11.

Table 4: Sample taken from well data used to build ML models.

Table 5: Linear regression model results.
Error Metric | Training Data | Test Data
RMSE | 2.611 | 2.565
MSE | 6.819 | 6.582
MAE | 1.773 | 1.744
R² Score | 0.639 | 0.639

Table 6: Random Forest model results.
Error Metric | Training Data | Test Data
RMSE | 0.469 | 0.676
MSE | 0.220 | 0.458
MAE | 0.207 | 0.300
R² Score | 0.988 | 0.975

Table 7: KNearest neighbor model results.

Error Metric | Training Data | Test Data
RMSE | 0.523 | 0.724
MSE | 0.309 | 0.524
MAE | 0.201 | 0.239
R² Score | 0.984 | 0.971

Table 8: SVM model results.
Error Metric | Training Data | Test Data
RMSE | 1.724 | 1.669
MSE | 2.972 | 2.784
MAE | 0.841 | 0.832
R² Score | 0.843 | 0.847

Table 9: Stacking technique results.
Error Metric | Training Data | Test Data
RMSE | 0.306 | 0.548
MSE | 0.033 | 0.423
MAE | 0.094 | 0.300
R² Score | 0.98 | 0.976

Table 10: Voting Technique results.
Error Metric | Training Data | Test Data
RMSE | 0.803 | 0.826
MSE | 1.167 | 1.331
MAE | 0.646 | 0.681
R² Score | 0.938 | 0.926

Table 11: CNN results.
Error Metric | Training Data | Test Data
RMSE | 0.797 | 0.751
MSE | 1.167 | 1.331
MAE | 0.636 | 0.564
R² Score | 0.924 | 0.928

From the results displayed in Tables 5–11, the stacking technique performed better than all the models and techniques employed in this study for the training data. Hence, the decreasing order of performance of the models for the training data is as follows: stacking technique > random forest model > KNearest neighbor model > CNN model > Voting technique > SVR model > linear regression model. In terms of the RMSE, the stacking technique was 35% better than the random forest model, 41% better than the KNearest neighbor model, 62% better than the CNN model and Voting technique, 82% better than the SVR model and 88% better than the linear regression model. In terms of the MAE, the stacking technique was 55% better than the Random Forest model, 53% better than the KNearest Neighbour

the KNearest Neighbour model. The SVR model and linear regression model performed better on the test data compared to their performances on the train data, indicating generalization of the models and lack of overfitting on the training data.

In terms of the test (unseen) data, the stacking technique performed better than all the traditional ML models employed in this study. Next in the ranking on the test data was the Random Forest model, followed by the KNearest Neighbour model, then the CNN model, then the Voting technique, then the SVR model and lastly the linear regression model.

In terms of the RMSE, the stacking technique was 19% better than the Random Forest model, 24% better than the KNearest Neighbour model, 27% better than the CNN model, 34% better than the Voting technique, 67% better than the SVR model and 79% better than the linear regression model.

In terms of the MAE, the stacking technique and the Random Forest model had the same performance score of 0.30. The stacking technique was still 47% better than the CNN model, 56% better than the Voting technique, 64% better than the SVR model and 83% better than the linear regression model.

Our finding in this investigation that the complex ML models of Stacking, Voting and CNN have the capacity to perform better than the traditional ML models was buttressed by the work of Burgos et al., which was equally corroborated in the study of Zhang et al.19, where the CNN model developed outperformed all the traditional ML models in terms of accuracy and reliability. It can equally be deduced from this study that, irrespective of the architecture and predictive capacity of the ML model, traditional ML models, with proper feature engineering and hyperparameter tuning, can perform better than more complex machine learning models.

5. Conclusions
A comparative analysis of machine learning algorithms in predicting rate of penetration during drilling was carried out in this study. Data was obtained from the Daily Drilling Report (DDR) for an oil well. The well data contains 17280 rows and 27 columns. The data preprocessing techniques of outlier handling, variable transformation and feature scaling were employed. Each of the seven machine learning techniques employed to predict the rate of penetration during drilling was able to extract meaningful information and patterns from the oil well data. However, some models outperformed other models by a distance, which reflects the predictive power of the algorithms. The capacity of the stacking algorithm to combine the predictive power of each base model gave it an edge over the rest of the models. The voting technique performed well, but did not measure up to the performance of the stacking technique. Hence, the stacking technique is a more powerful ensembling technique than the voting technique. Amongst the base models, the random forest and KNearest Neighbors models are robust since they
model, 85% better than the CNN model and Voting technique, performed well on both the train and test data, while the SVM
89% better than the SVR model and 95% better than the linear and linear regression models gave the highest errors on both
regression model. the train and test data but they also showed their generalization
capability and lower tendency to overfit. The CNN model has
For the testing data, generalizing across the four metrics, the capacity to perform well on regression-based task like rate
the stacking technique yet again out-performed other models. It of penetration predictions since it performed well on the test and
was only in terms of the MAE that the KNearest neighbor model train data.
outperformed the stacking technique by 20%, but in terms of the
Statements and Declarations
RMSE, MSE, R 2 Score, the stacking technique outperformed
Conflict of interest: The authors declare that there is no conflict of interest regarding the publication of this article.

Funding: The authors received no specific funding for this work. Hence, the corresponding author confirms that there are no financial and personal relationships with other people or organizations that could inappropriately influence this study.

6. References

1. Azar HF, Saksala T, Jalali SME. Artificial neural networks models for rate of penetration prediction in rock drilling. J Structural Mechanics 2017;50(3):252-255.
2. Rupert JP, Padro CW, Blattel SR. The effects of weight material type and mud formulation on penetration rate using invert oil systems. Paper presented at the Society of Petroleum Engineers (SPE) Annual Technical Conference and Exhibition 1981.
3. Bourgoyne Jr AT, Young Jr FS. A multiple regression approach to optimal drilling and abnormal pressure detection. SPE J 1974;14(04):371-384.
4. Bingham MG. A new approach to interpreting rock drillability. Technical Manual Reprint Oil & Gas Journal 1965:1-93.
5. Elkatatny S. Real time prediction of rheological parameters of KCl water-based drilling fluid using artificial neural networks. Arabian Journal for Science and Engineering 2017;42:1655-1665.
6. Mahmoud AA, Elkatatny S, Chen W, Abdulraheem A. Estimation of oil recovery factor for water drive sandy reservoirs through applications of artificial intelligence. Energies 2019;12(9):3671.
7. Shorten C. Machine Learning vs. Deep Learning. Towards Data Science. 2018.
8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436-444.
9. Otchere DA, Ganat TOA, Gholami R, Ridha S. Application of supervised machine learning paradigms in the prediction of petroleum reservoir properties: Comparative analysis of ANN and SVM models. Journal of Petroleum Science and Engineering 2021;200:108182.
10. Bilgesu HI, Tetrick LT, Altmis U, Mohaghegh S, Ameri S. A new approach for the prediction of rate of penetration (ROP) values. Paper presented at the Society of Petroleum Engineers (SPE) Eastern Regional Meeting 1997; SPE-39231-MS.
11. Arabjamaloei R, Shadizadeh S. Modeling and optimizing rate of penetration using intelligent systems in an Iranian southern oil field (Ahwaz oil field). Petroleum Science and Technology 2011;29(16):1637-1648.
12. Bataee M, Mohseni S. Application of artificial intelligent systems in ROP optimization: a case study in Shadegan oil field. Paper presented at the Society of Petroleum Engineers (SPE) Middle East Unconventional Gas Conference and Exhibition 2011; SPE-140029-MS.
13. Warren TM. Penetration-rate performance of roller-cone bits. SPE Drill Eng 1987;2(01):9-18.
14. AL-Mahasneh MA. Optimization Drilling Parameters Performance during Drilling in Gas Wells. International Journal of Oil, Gas and Coal Engineering 2017;5:19-26.
15. Amar K, Ibrahim A. Rate of penetration prediction and optimization using advances in artificial neural networks, a comparative study. In Proceedings of the 4th International Joint Conference on Computational Intelligence 2012;1:647-652.
16. Shi X, Liu G, Gong X, Zhang J, Wang J, Zhang H. An efficient approach for real-time prediction of rate of penetration in offshore drilling. Mathematical Problems in Engineering 2016;(Article ID 3575380):1-13.
17. Ahmed A, Elkatatny S, Abdulraheem A, Mohammed M, Ali A, Mohamed I. Prediction of rate of penetration of deep and tight formation using support vector machine. In Proceedings of the SPE Kingdom of Saudi Arabia Annual Technical Symposium and Exhibition, Dammam, Saudi Arabia. 2018; SPE-192316-MS.
18. Maurer WC. The "perfect-cleaning" theory of rotary drilling. J Pet Technol 1962;14(11):1270-1274.
19. Zhang Y, Zhang X, Chen Y. Deep neural networks for predicting rate of penetration in drilling. Journal of Petroleum Science and Engineering 2018;165:734-743.
20. Zhao Y, Noorbakhsh A, Koopialipoor M, Azizi A, Tahir MM. A new methodology for optimization and prediction of rate of penetration during drilling operations. Engineering with Computers 2020;36:587-595.
21. Abdulmalek A, Abdulwahab A, Salaheldin E, Abdulazeez A. New artificial neural networks model for predicting rate of penetration in deep shale formation. Sustainability 2019;11(22):6527.
22. Hareland G, Hoberock LL. Use of drilling parameters to predict in-situ stress bounds. Paper presented at the SPE/IADC Drilling Conference. Netherlands 1993; SPE-25727-MS.
23. Ashrafi SB, Anemangely M, Sabah M, Ameri MJ. Application of hybrid artificial neural networks for predicting rate of penetration (ROP): a case study from Marun oil field. Journal of Petroleum Science and Engineering 2019;175:604-623.
24. Iqbal F. Drilling optimization technique using real time parameters. SPE Russian Oil & Gas Technical Conference and Exhibition, Moscow, Russia, 2008.
25. Burgos CE, Zhang T, Li J, Zhang C, Chen S. ROP prediction using convolutional neural networks for Paleozoic shale drilling. Journal of Petroleum Science and Engineering 2019;17:633-641.
26. Monazami M, Hashemi A, Shahbazian M. Drilling rate of penetration prediction using artificial neural network: A case study of one of Iranian Southern oil fields. Journal of Oil and Gas Business 2012.
27. Abbas AK, Rushdi S, Alsaba M, Al Dushaishi MF. Drilling rate of penetration prediction of high-angled wells using artificial neural networks. J Energy Resour Technol 2019;141(11):112904.
28. Miyora TO. Modeling and optimization of geothermal drilling parameters: A case study of well MW-17 in Menengai Kenya, MS Thesis. University of Iceland 2014.
29. Al-AbdulJabbar A, Elkatatny S, Mahmoud AA, et al. Prediction of the rate of penetration while drilling horizontal carbonate reservoirs using the self-adaptive artificial neural networks technique. Sustainability 2020;12(4):1376.
30. Wang K, Zhang Y, Zhang X, Wang Y. A hybrid ensemble learning approach for rate of penetration prediction in oil and gas drilling. Journal of Petroleum Science and Engineering 2020;194:107424.
31. Liu N, Gao H, Zhen Z, Hu Y, Duan L. A stacked generalization ensemble model for optimization and prediction of the gas well rate of penetration: a case study in Xinjiang. Journal of Petroleum Exploration and Production Technology 2021;6:1595-1608.
32. Moraveji MK, Naderi M. Drilling rate of penetration prediction and optimization using response surface methodology and bat algorithm. Journal of Natural Gas Science and Engineering 2016;31:829-841.
33. Motahhari HR, Hareland G, Nygaard R, Bond B. Method of optimizing motor and bit performance for maximum ROP. J Can Pet Technol 2009;48(06):44-49.
34. Hareland G, Rampersad PR. Drag-Bit Model Including Wear. America/Caribbean Petroleum Engineering Conference 1994; SPE-26957-MS.
35. Bourgoyne Jr AT, Millheim KK, Chenevert ME, Young Jr FS. Applied drilling engineering. SPE Textbook Series 1991;2:ISBN: 978-1-55563-001-0.
36. Quinlan JR. Induction of decision trees. Machine Learning 1986;1(1):81-106.
37. Girgin S. Decision Tree Regression in 6 Steps with Python. PursuitData (Medium). 2019.
38. Javat (2022).
39. Pandey YN, Rastogi A, Kainkaryam S, Bhattacharya S, Saputelli L. Overview of Machine Learning and Deep Learning Concepts. Machine Learning in the Oil and Gas Industry 2020:75-152.
40. GeeksForGeeks (2022).
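
The percentage comparisons quoted in the discussion of Tables 8–11 appear consistent with a relative error reduction, 100 × (e_model − e_stacking) / e_model. This formula is an assumption inferred from the reported numbers; the article does not state it explicitly. The minimal Python sketch below recomputes those percentages from the RMSE and MAE values in Tables 8–11. The names `RESULTS`, `relative_improvement` and `stacking_gains` are hypothetical and used only for illustration.

```python
# Error metrics copied from Tables 8-11 (SVM, stacking, voting, CNN).
# Only RMSE and MAE are included, since those are the metrics the
# discussion compares in percentage terms.
RESULTS = {
    "SVM":      {"train": {"RMSE": 1.724, "MAE": 0.841},
                 "test":  {"RMSE": 1.669, "MAE": 0.832}},
    "Stacking": {"train": {"RMSE": 0.306, "MAE": 0.094},
                 "test":  {"RMSE": 0.548, "MAE": 0.300}},
    "Voting":   {"train": {"RMSE": 0.803, "MAE": 0.646},
                 "test":  {"RMSE": 0.826, "MAE": 0.681}},
    "CNN":      {"train": {"RMSE": 0.797, "MAE": 0.636},
                 "test":  {"RMSE": 0.751, "MAE": 0.564}},
}


def relative_improvement(reference_error, baseline_error):
    """Percentage by which reference_error is lower than baseline_error.

    Assumed formula (not stated in the article):
    100 * (baseline - reference) / baseline.
    """
    return 100.0 * (baseline_error - reference_error) / baseline_error


def stacking_gains(split, metric):
    """Rounded percentage gain of stacking over each other model."""
    stack_error = RESULTS["Stacking"][split][metric]
    return {
        name: round(relative_improvement(stack_error, values[split][metric]))
        for name, values in RESULTS.items()
        if name != "Stacking"
    }


if __name__ == "__main__":
    for split in ("train", "test"):
        for metric in ("RMSE", "MAE"):
            print(split, metric, stacking_gains(split, metric))
```

Under this assumption, `stacking_gains("test", "RMSE")` gives 67% for the SVM model, 34% for the Voting technique and 27% for the CNN model, which matches the figures quoted in the discussion of the test data.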