Prediction Using Machine Learning
ARTICLE INFO
Keywords: Directional drilling; Machine learning; Continuous learning; Inclination prediction
ABSTRACT
Machine Learning adoption within drilling is often impaired by the necessity to train the model on data collected from wells analogous in lithology and equipment used to the well where the model is meant to be deployed. Lithology information is not always well documented and fast-paced development of drilling equipment complicates the challenge even further, as a model would likely become obsolete and inaccurate when new technologies are deployed. To bypass this problem a training-while-drilling method utilizing neural networks that are capable of modelling dynamic behaviour is proposed. It is a continuous learning approach where a data-driven model is developed while the well is being drilled, on data that is received as a continuous stream of information coming from various sensors. The novelty in the presented approach is the use of Recurrent Neural Network elements to capture the dynamic behaviour present in data. Such a model takes into account not only values of the adjacent data, but also patterns existing in the data series. Moreover, results are presented with a focus on the continuous learning aspect of the method, which was sparsely researched to date. A case study is presented where inclination data is predicted ahead of the inclination sensor in a directional drilling scenario. Our model architecture starts to provide accurate results after only 180 m of training data. Method, architecture, results, and benchmarking against a classical approach are discussed; the full dataset with complete source code is shared on GitHub.
Credit statement
Andrzej Tunkiel: Conceptualization, Methodology, Software, Data curation. Dan Sui: Formal analysis, Writing - review & editing, Supervision. Tomasz Wiktorski: Writing - review & editing, Supervision.
1. Introduction
Lack of adequate training data is one of the major issues preventing machine learning model deployment within the petroleum industry. While in general it is relatively easy to develop data-driven models for problems like rate of penetration (ROP) prediction, such models will be valid only for wells where geology, equipment, and general design closely match the training dataset. This is further corroborated by the lack of published general-purpose data-driven ROP prediction models. All machine learning models face this challenge; if an algorithm is trained to detect cats, but the dataset contains only cats indoors, it will struggle to classify pictures taken outdoors. This problem was explored in practice when a neural network was trained to discern dogs from wolves. The training dataset was made flawed on purpose, with pictures of dogs taken on grass and pictures of wolves in the snow. This led to the classifier using snow as the key feature, and subsequently to poor model performance (Ribeiro et al., 2016).
To solve this underlying issue, continuous learning (Liu, 2017) methods could be used, where a model is continuously retrained while the well is being drilled. Data collected from a drilled section are used to train a model that can be applied to the further section of the same well. When an additional section of a well is drilled, the process is repeated to create an updated model. Advances in computational power make data-driven model training time negligible in comparison to the time required to drill a well, making the training-while-drilling approach feasible. Model training is often fast enough to be completed in the short breaks in the drilling process, such as adding a stand to the drillstring. An additional benefit of a dynamically trained model is that any
* Corresponding author.
E-mail address: [email protected] (A.T. Tunkiel).
URL: http://www.ux.uis.no/%7Eatunkiel/ (A.T. Tunkiel).
https://doi.org/10.1016/j.petrol.2020.108128
Received 24 June 2020; Received in revised form 24 September 2020; Accepted 9 November 2020
Available online 14 November 2020
0920-4105/© 2020 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
A.T. Tunkiel et al. Journal of Petroleum Science and Engineering 196 (2021) 108128
discrepancies between predictions and incoming data are used as feedback to improve the subsequent iteration of the model.
There is limited previous research related to such an approach in drilling. Data-driven rate of penetration (ROP) prediction models are abundant in the latest literature: (Ahmed et al., 2019a, 2019b; Hegde and Gray, 2017, 2018; Hegde et al., 2015; Soares and Gray, 2019a; Sabah et al., 2019; Han et al., 2019; Shi et al., 2016; Mantha and Samuel, 2016; Eren and Ozbayoglu, 2010; Soares et al., 2016; Amar and Ibrahim, 2012; Eren and Ozbayoglu, 2010, 2010; Yi et al., 2014; Jiang and Samuel, 2016). There is however limited work related to a continuously expanding dataset. Only a few papers were identified where the training to testing ratio was explored, showing improvement over analytical methods even at the smallest training datasets (Hegde et al., 2017). Continuous expansion of the training dataset was researched in other papers as well, such as (Hegde and Gray, 2017), applying a random forest algorithm to again predict the ROP, and (Soares and Gray, 2019b), where an expanding dataset was used for ROP prediction, implemented as a changing train/test ratio, evaluating random forest, support vector machines and neural networks, and comparing them to analytical models such as Bingham, and Bourgoyne and Young. No other analysis of continuous learning in a drilling environment was identified.
To expand on this existing work, a novel model was developed that uses not only the real-time attributes as inputs from a specific time and space, but also utilizes previous values; this is what this paper refers to as dynamic behaviour. It means that the model is aware of not only the current state, but also of the previous values and how they change along the data series, be it space or time, identifying the dynamics of the local environment. This is achieved through the use of a Recurrent Neural Network (RNN) (Rumelhart et al., 1986), where attribute values are fed to the network from multiple steps along the data series.
To the best of our knowledge, no drilling-related continuous learning research was done that utilized recurrent neural networks the way this paper proposes. This paper also performs a thorough analysis of how the models' performance changes as the data is continuously acquired; we were unable to identify any drilling-related paper that would discuss this aspect in comparable detail.
While our novel approach does not produce results from the first meter drilled, it requires a relatively small dataset to start working reliably. A case study is presented where lagging inclination data is predicted in a directional drilling scenario using a bent sub. It was selected because the problem is sparsely explored in the existing research, and the way that data behaves makes it a good candidate for a neural network model with recurrent elements. The applied model is based on our earlier work (Tunkiel et al., 2020a) where the problem of predicting lagging inclination data was first explored. In this paper, accuracy along the depth of the well is explored to evaluate the method's usefulness and applicability in real-life situations.
1.1. Motivation
In recent years, directional drilling became one of the common drilling methods, especially in relation to shale developments (Wang et al., 2018). Precise well placement is an important factor when it comes to the future well performance. The directional driller depends on the values from downhole sensors to know where the well is being placed. One of the challenges is that, due to space constraints, those sensors are at a significant distance from the bit, often tens of meters. This in turn creates a blind zone, a section of a well that is drilled, but the driller does not know where exactly it is, potentially leading to delayed corrective actions.
As the sensor data is delayed, decisions taken based on these sensors' readings are delayed as well, leading to suboptimal well placement. With pay zones only 5–15 m thick, as in the case of the Bakken field (Zou, 2017), minimizing that delay distance in the directional readings is critical. The goal of this case study is to predict such continuous inclination readings that are yet to be made, predicting the well direction between the sensor and the bit.
1.2. Innovation
There are a number of innovative elements in the presented paper. Only one prior published study was identified discussing the recreation of sensor data using machine learning methods, apart from parts of the proposed method presented by the authors at the 39th International Conference on Ocean, Offshore & Arctic Engineering in August 2020 (Tunkiel et al., 2020a). A presentation was given (Koryabkin et al., 2019) on a similar topic, applying basic regression algorithms (lasso, ridge, random forest and gradient boosting) to predict a number of sensor values lagging behind the bit. Achieved results showed relative error of less than 16% for 80% of the tested data. Our research uses a more advanced network architecture and is considered within a continuous learning environment. Applying machine learning allows the method to be deployed without prior specific knowledge of the bit steering mechanism. Such exact information is, on the other hand, needed to follow recently published analytical approaches, such as those of (Wang, 2017; Wang et al., 2020), where a beam bending model is developed based on exact bottom hole assembly geometry and function.
Another key innovative element presented in the case study is the application of continuous learning. This concept is related to lifelong machine learning (Liu, 2017), where a continuously expanding training dataset is used to evaluate samples from the immediate future. While there is significant research related to data stratification, i.e. the split ratio between training and testing datasets, such as (Anifowose et al, 2011, 2017), it must be noted that this is a similar, yet different topic. Continuous learning mimics real-life learning, where the immediate future is predicted using all the past experiences, while stratification studies consider a fixed dataset and the best way to split it. The presented case study focuses on the models' performance in the continuous learning scenario in detail, which we were unable to identify in the literature.
Lastly, inclusion of past values as inputs via the use of recurrent neural networks is also a topic sparsely explored in research related to drilling. Publications related to flow rate estimation (Chhantyal et al., 2018) utilized generic recurrent neural networks, while newer work on kick detection (Osarogiagbon et al., 2020) utilized the newer architecture of Long Short-Term Memory. Our work expands on this by utilizing Gated Recurrent Units, an RNN cell first discussed in 2014 (Chung et al., 2014), in a continuous learning scenario, a combination that we were unable to identify in the literature related to drilling.
The proposed solution is fast to deploy, requires no proprietary software and can be run using any modern consumer-grade Graphics Processing Unit (GPU), making the necessary investment very low. A properly set up system automatically adapts to available data through dimensionality reduction techniques discussed in the further chapters. Data from a single well is required to validate the method for a given use case. Given the auxiliary nature of the generated results, there is little to no risk in deploying the presented method to the field. The accuracy of the method can be continuously monitored, since true values are measured with a 23 m lag relative to the prediction.
1.3. Machine learning methods used
Machine learning can be applied in various ways. Generally speaking, an algorithm learns the correlations between inputs and outputs that can later be exploited for prediction purposes. One of the methods of implementing this is to use data from a given moment in time to predict a different, unknown parameter. For example, weight on bit (WOB) and drill bit's rpm can be correlated with rate of penetration, so that optimization can be done on the developed model to maximize the ROP. That correlation can be captured using various algorithms, such as linear regression, decision trees, neural networks, gradient boosting and others. This approach will however not capture any dynamic behaviour
of the model. This can be rectified partially by calculating derivatives of inputs, but it will have a very limited impact. To fully capture the dynamic behaviour of a given model, Recurrent Neural Networks (Rumelhart et al., 1986) are used. This is an architecture suited for data series, such as speech, language processing, or drilling logs. Its internal structure is well suited to take inputs both from the current state as well as a number of previous states. It contains a connection that feeds the output from step t-1 to step t. The basic principle is shown in Fig. 1 on the left hand side. In practice this type of network is implemented in an unfolded form, seen on the right hand side. Input x0 generates output h0. At the next step, the network is fed both input x1 as well as the output h0, generating new output h1. The actual model of the case study uses Gated Recurrent Units (Cho et al., 2014) as its RNN component. This architecture was found to perform well on relatively small datasets (Chung et al., 2014), which is a key requirement for the training-while-drilling approach, where the dataset gradually grows from empty while the well is drilled.
Another important aspect of developing a machine learning model is how the training and testing datasets are created. This is especially important in work related to drilling, where logs are data series. The most common way of creating a train/test split is random sampling, where a percentage of a dataset is randomly selected to be a part of training or testing. This is a method that cannot be used for predictive models in drilling, since spurious correlations will inflate the testing result. The correct approach is to split the data into continuous sections, where the first n% of a well is used as training, and the remainder is used as testing. This is the most common way of performing a data split in research related to drilling.
A relatively new approach is continuous learning (Liu, 2017), where the training dataset is continuously growing, and predictions are done based on training on all previous data. This approach fits field deployment particularly well, because it is equivalent to how data is collected while drilling. In this approach initial results are poor due to the small size of the dataset, but the assumption is that as the dataset expands the model will outperform models created on data from offset wells, as it better represents the drilling currently at hand. Fig. 2 is meant to visually explain the data split strategies discussed above.
2. Case study and model design
The case study data from the open Volve dataset (Equinor, 2018; Tunkiel et al., 2020b) was used, specifically the well F9A. It was chosen as it contained a relatively long section of the well without any data issues in its depth-based log. It contains a curved section drilled with a bent sub motor, where inclination rises and falls in waves, as is characteristic of this method, see Fig. 3 for reference. The sensor lag is introduced artificially in the data and is equal to 23 m, a value that is in the range of a typical BHA configuration. This was necessary as the log in question contained already depth-corrected data, an operation that is performed after the well is drilled, hence a reverse operation was needed for a case study. What the model predicts is the continuous inclination data between the sensor and the bit location of each sample. Real-time attributes are the input to the model, including Rate of Penetration and Weight on Bit from all the locations behind the bit, hence overlapping with the continuous inclination prediction. Inclination from the locations behind the sensor is used as an input to the RNN portion of the network. This is explained in detail in further sections.
2.1. Data preparation
Raw data from the real-time drilling logs are rarely useable as-is. A
number of processes were applied to increase their quality. First, a section of the well without any missing data was identified, in this case, between 500 and 848 m measured depth. Since our approach relies on neighbouring data in the model as an input, depth-steps in the data series had to be made even. RadiusNeighborsRegressor, part of scikit-learn (Pedregosa et al., 2011), was used to re-sample the data at even depth intervals of 0.230876 m, the median distance between datapoints in the original dataset. Attributes that have missing data after the resampling process are considered not complete enough and disregarded. If a section of the well is missing some attributes, these attributes are discarded from future predictions. Alternatively, one can develop a system where such a section of the well is ignored completely in the process, to retain certain attributes in the model when they come back on-line.
Fig. 3. Case study well inclination profile.
To include the past values information a windowing process was applied. Referring to Fig. 4, a single input sample contains inclination data from behind the sensor (already measured inclination values), as well as real-time attributes from behind and ahead of the sensor. In the presented case study, the distance between the sensor and the bit is 23 m, divided into 100 discrete measurements. The distance behind the sensor taken as an input to the model is also 23 m, divided into 100 discrete measurements. The output of the model is inclination values between the sensor and the bit, also 23 m and 100 discrete values. Referring back to Fig. 4, distances p and b are equal to 23 m; the number of discrete steps, both n and m, is equal to 100. The distance between steps is even and approximately 0.23 m. This setup creates a model with a high number of inputs and outputs. Each included real-time attribute adds 200 inputs, since there are 100 values before and 100 values after the sensor. There are also 100 inputs related to inclination values. The presented case study has 51 useable real-time attributes. These are however reduced to 3 attributes through principal component analysis (PCA), described in a further subsection, resulting in practice in 3 × (100 + 100) + 100 = 700 inputs to the machine learning algorithm itself.
2.2. PCA transformation
In relation to input attributes, to simplify the selection process and ease field deployment, all instantaneously available attributes are used. Inclination data is stored separately, while all other data is compressed using Principal Component Analysis (Pearson, 1901), a dimensionality reduction method. Note that this reduces only the dimensions that the machine learning algorithm is exposed to, as the input to the complete setup still takes all attributes. Resampled data is first normalized to a range (0,1), fed through a PCA algorithm that reduces it to the prescribed amount of components, and normalized again to a range of (0,1). The number of output attributes and how it affects the prediction was evaluated and it was found that a reduction to 3 components from the initial 51 attributes¹ generates the best results (Mean Square Error (MSE) = 0.035) in terms of prediction error. The study for determining the optimal number of PCA components was performed through a complete training-while-drilling exercise, from 15% to 80% of available data, with 1% increments, a process explained in detail further in the paper. Results were on average better than selecting all the attributes without PCA dimensionality reduction (MSE = 0.041). PCA-based results were also better than manual selection of attributes based on engineering judgement, an approach applied to a related case study before (Tunkiel et al., 2020a) (MSE = 0.048), where average surface torque, average rotary speed, and rate of penetration were selected as inputs. Data from the PCA dimension evaluation results are shown in Fig. 5, where mean square training error is plotted against the number of PCA components used, plus the reference values. The best solution, with 3 components, explains 88% of the total variance. It is worth noting that standardization of data was not performed. This process, which typically subtracts the mean from sample values, was tried using RobustScaler, a scaler from the scikit-learn package (Pedregosa et al., 2011) that centres on the median and scales by the interquartile range, and it produced overall inferior results.
The reason why dimensionality reduction decreases the error of a model is most likely tied to overfitting and spurious correlations. As explained earlier, inclusion of each real-time attribute in our case study increases the number of inputs by 200. This results in 10 000 inputs if 50 attributes were to be used. Such a high number of inputs in a dataset as (relatively) small as ours is bound to cause overfitting to some extent.
It must be noted that no prior attribute selection was performed. No correlation matrices were calculated, nor was any other approach applied. This is connected to the expected deployment of the method, where the decision on which attributes will be available during the drilling operation is not always known much in advance. Attribute selection is not trivial, and methods such as the mentioned correlation analysis are difficult to implement to work automatically; furthermore, the basic correlation methods will uncover only linear relationships. Therefore using all the available parameters through the PCA transformation is proposed as a solution that can be done fully automatically without manual intervention.
2.2.1. Nominal and incremental inclination data
Preparation of inclination data was different than for other parameters. It is not immediately obvious if the best results will be achieved while predicting inclination data itself, or change in inclination (incremental value, first derivative), therefore both approaches were evaluated in parallel. Use of inclination change is simpler, as it can be used directly with (0,1) normalization. Use of actual inclination data is more difficult,
¹ These are attributes such as Weight on Bit (kkgf), Average Standpipe Pressure (kPa), Average Surface Torque (kN.m), Rate of Penetration (m/h), etc.
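The windowing described in Section 2.1 can be illustrated with a short sketch. This is not the authors' code: the function names `build_sample` and `input_count`, the list-based data layout, and the convention that the sensor sits at the first unmeasured grid index are all assumptions made for illustration. It only shows how one sample is assembled on an even depth grid and how the input count of 700 arises for 3 attributes.

```python
def build_sample(inclination, attributes, sensor_idx, n=100, m=100):
    """Assemble one input/target pair around the sensor position.

    inclination: inclination values on an even depth grid
    attributes:  list of real-time attribute series on the same grid
    sensor_idx:  grid index of the sensor (assumed convention: values
                 before this index are already measured)
    n, m:        steps behind the sensor and between sensor and bit
    """
    if sensor_idx - n < 0 or sensor_idx + m > len(inclination):
        raise ValueError("window does not fit inside the log")
    # n already-measured inclination values behind the sensor
    inputs = list(inclination[sensor_idx - n:sensor_idx])
    # every real-time attribute contributes n + m values (behind and ahead)
    for series in attributes:
        inputs.extend(series[sensor_idx - n:sensor_idx + m])
    # m inclination values between the sensor and the bit are the targets
    targets = list(inclination[sensor_idx:sensor_idx + m])
    return inputs, targets


def input_count(n_attributes, n=100, m=100):
    """Number of model inputs for a given number of attributes."""
    return n_attributes * (n + m) + n


# Hypothetical toy log: 400 grid points, 3 attributes (e.g. PCA components)
toy_inclination = [float(i) for i in range(400)]
toy_attributes = [[0.0] * 400 for _ in range(3)]
x, y = build_sample(toy_inclination, toy_attributes, sensor_idx=200)
print(len(x), len(y), input_count(3))
```

With n = m = 100 and three attributes the sample has 3 × (100 + 100) + 100 = 700 inputs and 100 targets, matching the arithmetic in the text.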
Fig. 6. Neural network architecture.
² https://github.com/AndrzejTunkiel.
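The recurrent part of such an architecture reuses its previous output at every step, as described in Section 1.3 (input x0 produces h0, then x1 and h0 produce h1). The sketch below is a deliberately simplified scalar recurrent unit, not the Gated Recurrent Unit used in the paper; the weights `w_x`, `w_h` and `bias` are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def recurrent_step(x_t, h_prev, w_x=0.5, w_h=0.8, bias=0.0):
    """One step of a plain recurrent unit: new output from input and previous output."""
    return math.tanh(w_x * x_t + w_h * h_prev + bias)


def run_unfolded(inputs, h_init=0.0):
    """Apply the same unit along the data series (the 'unfolded' form)."""
    outputs = []
    h = h_init
    for x_t in inputs:
        h = recurrent_step(x_t, h)  # h carries information from earlier steps
        outputs.append(h)
    return outputs


# The second output differs even though the second input is identical (0.0),
# because the earlier input is carried forward through h.
print(run_unfolded([1.0, 0.0]))
print(run_unfolded([0.0, 0.0]))
```

A GRU adds gates that control how much of the previous output is kept or overwritten, but the step-to-step feedback shown here is the same basic idea.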
smaller sample size, hence a starting point of 15% was selected. Training and validation subsets are continuous and the validation data borders the recent end of the data; alternative strategies were tested for locating the validation data, and the best results were achieved when it was placed at the end. This split is necessary to implement early stopping, another method crucial for avoiding overfitting. The validation data are not used in the backpropagation part of the training process itself, but they are continuously evaluated while the model is trained. Typically validation error drops together with training error along the training epochs, but at the point where overfitting begins, it starts to increase. This is the point where training is stopped and the model with the best validation score is retained. Data consisting of the future 20 percent of the dataset is set aside for testing of the model from the current iteration. 20% is relatively big, and it was chosen to be indicative of a wider model performance. It is also important to mention that the PCA dimensionality reduction model is fit only on the available data within an iteration, and not on the testing data, as it is considered not available at the time of training. In other words, the PCA transformation rules (calculating the data covariance) are established only on the part of the dataset that is considered known. Subsequent transformation is done on the dataset that contains the testing data. The inclination values are not a part of the PCA transformation. The PCA model is later used for model evaluation, as the input data have to be processed with the same PCA model that was used for training.
The training process is repeated ten times to increase accuracy, with two competing strategies evaluated: a lottery ticket approach (Frankle and Carbin, 2018), where the model with the best validation score is later used for testing, and an average of all ten models. Results from both approaches are elaborated on in the results section. Next, the percentage of the well assumed to be drilled is increased by one percentage point and the complete training process is repeated. Increments can in practice be either shorter or longer. New models can be trained continuously and there is no underlying reason to artificially increase the intervals.
Our implementation uses TensorFlow 2.1.0 with the integrated Keras library and Python 3.7. Model training was performed on an Intel Core i7-8850H CPU, 32 GB of RAM and an NVIDIA Quadro P2000 GPU with 5 GB of GDDR5 memory providing Peak Single Precision FP32 Performance at 3 TFLOPS. Model training required 2–15 min (2–15 m of drilled well at ROP of 60 m/h), depending on the simulated percentage size of the well drilled. Predictions based on the trained model are for all intents and purposes calculated instantaneously.
2.5. Hyperparameter tuning
Hyperparameter tuning is a process of adjusting various settings in the machine learning algorithm to increase its performance and is done utilizing the training and validation datasets. This poses a problem as our proposed method assumes no prior access to data. Performing hyperparameter tuning on a similar dataset and with the same goals can be done to overcome this issue. Such an approach is utilized in other areas of machine learning; for instance, a neural network detecting cats and dogs will not call for new hyperparameter tuning when detection classes are expanded to birds and rabbits, since the problem at hand is technically identical from the perspective of the neural network. This is not to be confused with the requirement of training a model on a similar well. This process is much more generic and likely not sensitive to geology or equipment used. Hyperparameters found to be working well for our case study are likely to provide good results when reused in model application to any bent sub directional drilling around the world. We were regrettably unable to evaluate and confirm this assumption due to lack of access to a suitable dataset.
In our case study, due to available data being limited to one well, hyperparameter tuning was performed on the same well that was later used for method evaluation. This was limited to layer size, dropout size, learning rate, kernel initialization variants, Gaussian noise levels and batch size, hence should not artificially increase the performance of the evaluated case study. Hyperparameters stay constant throughout the complete drilling operation and all the iterations of the model generated as new data becomes available.
Tuning of these parameters was done using a Bayesian optimization algorithm (Nogueira, 0000). Best parameters vary between the nominal inclination and change in inclination approaches, with the dropout layer at 50 percent, the Gaussian noise layer at a standard deviation equivalent to 0.2 percent of full scale, and approximately 350 neurons in the RNN layer and 10 and 100 neurons in the final dense layer, depending on the prediction approach. Specific values can be found in the source code provided. All tuning was done with early stopping, with patience at 25 epochs and saving only the best model.
Three datapoints were selected, with 30, 55 and 80 percent of the dataset used in the case study for training and validation, as a basis for the hyperparameter tuning exercise. The average loss of these three points was used when evaluating changing performance. Alternative methods are possible, such as evaluation based on the worst score, or evaluation based only on the most difficult sections, i.e. those with little data. Method selection should be driven by the specific objectives of the network under development. In our case study average overall performance was chosen as the key factor and the method selected accordingly. Only three percentage points were selected to limit the time required for hyperparameter tuning, which is notorious for being time consuming. Note that PCA dimensionality reduction was not a part of the final hyperparameter tuning. It was decided that this is a critical aspect of the model and therefore analysis of component quantity from 1 to 20 was performed separately, as shown before in Fig. 5.
In the future, as computational power increases, hyperparameter tuning prior to model deployment may not be necessary. As it is required to evaluate hundreds of alternative hyperparameter configurations in the tuning process, even models that are trained in mere minutes take hours to become optimized. This time has to be significantly reduced, by two orders of magnitude, to perform it during the drilling operation itself. Considering current progress in the discipline this is unlikely to happen in the next 10 years, unless new, more efficient algorithms are discovered.
2.6. Overfitting
The proposed method was optimized for small datasets to provide useful results as fast as possible. Small datasets are often prone to overfitting, where a machine learning algorithm memorizes specific datapoints instead of creating a method capable of generalizing. A number of methods were applied to tackle this problem. Typical approaches to overfitting are a dropout layer, where neurons are randomly dropped while training, a Gaussian noise layer, where artificial noise is added to the signal, and an architecture minimizing the number of neurons. Another approach to overfitting reduction is an ensemble of models, which is explained in detail in the results section of this paper.
3. Results
Results from a single sample can be visualized by plotting the past inclination data, predicted inclination data and ground-truth target values. The same method is used regardless of using nominal inclination data or incremental inclination data. This gives a good representation of the task at hand in terms of practical results that can be achieved. One sample of such a chart is shown in Fig. 8. Note how the inclination prediction follows the same pattern and values relatively close to the actual data. The rotating portion of the bent sub drilling, where inclination is temporarily constant, is also well represented. Note that the complete cycle of build-hold-build takes approximately 20 m in our case study, and the prediction window used is 23 m. The y-axis refers to a local coordinate system of a sample, where the first, oldest inclination datapoint is moved to zero.
There are multiple ways of describing the error between the
Fig. 8. Sample result, 60% of the dataset for training and validation, Measured Depth ca. 700 m, incremental inclination model.
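The "60% of the dataset for training and validation" in the caption refers to one iteration of the expanding training-while-drilling split. A minimal sketch of that split is given below under stated assumptions: the function names are hypothetical, and the validation share (`val_pct=20`) is an assumed placeholder, since this section does not state the value used. The 15% to 80% range, the 1% increment, the placement of validation data at the recent end, and the future 20% test block follow the text.

```python
def split_known_data(n_samples, known_pct, val_pct=20, test_pct=20):
    """Index ranges for one iteration of the expanding split.

    known_pct: percentage of the well assumed to be drilled so far
    val_pct:   share of the known data held out for validation
               (assumed value, not taken from the paper)
    test_pct:  size of the future block used for testing
    """
    known_end = n_samples * known_pct // 100
    val_len = known_end * val_pct // 100
    test_end = min(n_samples, n_samples * (known_pct + test_pct) // 100)
    train = range(0, known_end - val_len)
    validation = range(known_end - val_len, known_end)  # recent end of known data
    test = range(known_end, test_end)                   # not yet "drilled"
    return train, validation, test


def training_while_drilling(n_samples, start_pct=15, stop_pct=80, step_pct=1):
    """Yield the split for every simulated drilling stage."""
    for known_pct in range(start_pct, stop_pct + 1, step_pct):
        yield known_pct, split_known_data(n_samples, known_pct)


train, validation, test = split_known_data(1000, known_pct=60)
print(train, validation, test)
```

In a full pipeline any preprocessing that is fitted to data, such as the PCA model, would be fitted on the `train` and `validation` ranges only and then applied to `test`, in line with the procedure described in the text.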
Although the incremental model shows significantly lower MAE
Fig. 13. Comparison between proposed approach and XGBoost.
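For reference, Mean Absolute Error (MAE) and the two ways of combining repeated training runs mentioned in the paper (picking the run with the best validation score, and averaging all runs) can be sketched as follows. The function names and the toy numbers are illustrative assumptions; the sketch operates on plain lists of predictions rather than on trained networks.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and true values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def lottery_ticket(predictions, validation_errors):
    """Return the prediction of the run with the lowest validation error."""
    best = min(range(len(predictions)), key=lambda i: validation_errors[i])
    return predictions[best]


def ensemble_average(predictions):
    """Average the predictions of all runs point by point."""
    return [sum(values) / len(values) for values in zip(*predictions)]


# Toy example with two hypothetical runs predicting two inclination values
runs = [[1.0, 2.0], [3.0, 4.0]]
validation_errors = [0.5, 0.2]
truth = [2.5, 3.5]
print(mean_absolute_error(lottery_ticket(runs, validation_errors), truth))
print(mean_absolute_error(ensemble_average(runs), truth))
```

Which strategy gives the lower MAE depends on the data; the toy numbers above are not meant to reproduce the results reported in the paper.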
While developing the method it became clear that the results often vary
between good and bad, even when calculated for identical data and differing
only in the random seed. This is because the training process has
stochastic elements, such as the initialization of weights and biases. The
distribution of MAE values for repeated runs is presented in Fig. 15 for
both the method predicting nominal inclination and the method predicting
inclination change. A standard deviation of approximately 5 percent of the
mean MAE suggests that it is possible to improve the accuracy. This is in
line with recent research discussing the lottery ticket hypothesis, that a
network initialization may simply be lucky and achieve better performance
(Frankle and Carbin, 2018). Alternatively, an average of models, otherwise
known as an ensemble, is often used to increase prediction performance,
which is especially common in climate research (Goerss, 2000; Najafi and
Moradkhani, 2015). Both approaches were evaluated by training the model 10
separate times, in one scenario selecting the model with the best
validation score, and in the other taking the average prediction of all the
models. Repeating model training ten times was chosen as a balance between
increased performance and increased model training time; increasing this
value would yield progressively smaller improvements. Additionally, both
the variant predicting inclination and the variant predicting inclination
change were tested, resulting in a total of four different models. Results
are shown in Fig. 16. When using the nominal inclination model, the mean
MAE dropped from 1.12 to 1.07, an improvement of approximately 5% for the
lottery ticket method. The inclination change method also showed a 5%
improvement for the lottery ticket method, with the ensemble method
actually increasing MAE, although the standard deviation was significantly
reduced. This suggests that the lottery ticket brings tangible, modest
improvements, and should be used. Approximately 40 simulations were run for
each ensemble and lottery ticket variant to find the distribution of the
results.
The color refers to Mean Absolute Error; the scale was set such that the
yellow color is equivalent to 0.6° of average absolute error, a value
tentatively deemed acceptable. Note that over the course of the predicted
23 m, inclination can change in value between minus 0.4° and plus 5°. What
is worth highlighting is that some red areas are off the scale, above the
1.2° error visible as the deep red color. All predictions with error above
that threshold were considered useless.
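
The two error levels named above (0.6° deemed acceptable, 1.2° as the top of the color scale) can be expressed as a small classification sketch. The label names, the inclusive boundaries, and the sample grid are illustrative assumptions, not part of the published method.

```python
ACCEPTABLE_MAE_DEG = 0.6  # yellow on the color scale, per the text
SCALE_MAX_MAE_DEG = 1.2   # deep red; above this a prediction is treated as useless


def classify_mae(mae_deg):
    """Map a Mean Absolute Error in degrees to a coarse quality label.

    The label names and inclusive boundaries are illustrative assumptions.
    """
    if mae_deg < 0:
        raise ValueError("MAE cannot be negative")
    if mae_deg <= ACCEPTABLE_MAE_DEG:
        return "acceptable"
    if mae_deg <= SCALE_MAX_MAE_DEG:
        return "poor"
    return "useless"


# Hypothetical MAE grid: rows are measured depths, columns are prediction distances.
mae_grid = [
    [0.2, 0.5, 0.9],
    [0.1, 0.7, 1.5],
]
labels = [[classify_mae(value) for value in row] for row in mae_grid]
```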
Fig. 16. Lottery ticket and ensemble performance evaluation, with subplots a) and b) showing the nominal inclination method, and subplots c) and d) the incremental inclination method.
Lottery ticket approach is shown in subplots a) and c), and ensemble in subplots b) and d). The x-axis is the prediction distance, here 23 m, and the y-axis refers to
measured depth drilled.
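
The difference between the two strategies (keeping the single run with the best validation score versus averaging the predictions of all runs) can be sketched as follows. This is a toy illustration under stated assumptions: `train_model` is a hypothetical stand-in that fits a one-parameter linear model from a seed-dependent starting weight, not the recurrent network used in the case study, and all data values are made up.

```python
import random


def train_model(seed, xs, ys, steps=5, lr=0.01):
    """Toy stand-in for network training: fit y = w * x by gradient descent
    on the mean squared error, from a seed-dependent random starting weight."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


def predict(w, xs):
    return [w * x for x in xs]


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data following y = 0.5 * x.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 1.5, 2.0]
val_x, val_y = [5.0, 6.0], [2.5, 3.0]
test_x = [7.0, 8.0]

# Ten independent runs that differ only in the random seed.
models = [train_model(seed, train_x, train_y) for seed in range(10)]

# "Lottery ticket" selection: keep the run with the best validation score.
val_scores = [mae(val_y, predict(w, val_x)) for w in models]
best_model = models[val_scores.index(min(val_scores))]
lottery_prediction = predict(best_model, test_x)

# Ensemble: average the predictions of all runs, point by point.
all_predictions = [predict(w, test_x) for w in models]
ensemble_prediction = [sum(col) / len(col) for col in zip(*all_predictions)]
```

Selection needs only one model at prediction time, while the ensemble has to keep and evaluate all ten; both cost ten training runs.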
Multiple alternative models were discussed in this paper, namely two
including dynamic behaviour, nominal and incremental, and two standard
regression models, MLP and XGBoost. All of these approaches seem to have
strengths and weaknesses related to how much training is necessary and how
far ahead the prediction can be made with acceptable results. To indicate
which one performs best in which area, a figure was created identifying the
best out of the four models.
For each point relating to a specific distance drilled and prediction
distance, the best performing model was selected and plotted with an
individual color. Additionally, areas with Mean Absolute Error above 0.6°
were truncated, indicating that none of the explored methods worked
sufficiently well. Results are shown in Fig. 17. Note that the marker size
decreases with rising error, giving an additional visual clue about the
performance. The area of the chart is overwhelmingly occupied by both
proposed models with dynamic behaviour, with simpler alternatives occupying
very small portions of it, especially early in the well. This again
confirms the previously stated conclusion that the simpler models learn
faster, but as the training set expands, the more complex ones prevail.

Fig. 17. Mapping best models.

5. Conclusion

The presented method, tailored for continuous learning, shows good
performance in the case study of predicting sensor data during directional
drilling with a bent motor. With existing methods being able to predict
only the nearest 7 m while keeping the mean absolute error under 0.6°, our
proposed method achieves that goal for 23 m of prediction most of the time.
With multiple inputs decomposed to only three via the PCA method, the model
can be applied with little analysis in terms of available attributes,
significantly reducing the workflow related to hyperparameter tuning.
Further work is needed to verify the method's applicability to predicting
sensor readings of other attributes, such as gamma ray, neutron
measurement, and others; and to fully quantify its potential in drilling.
The presented method may also find applications in non-petroleum areas such
as weather forecasting and motion capture technologies, creating models
through continuous learning and filling in data for failed sensors,
obscured markers, and data delayed for other reasons.

Declaration of competing interest

The authors declare that they have no known competing financial interests
or personal relationships that could have appeared to influence the work
reported in this paper.

Acknowledgements

We would like to express gratitude to Equinor for providing funding for
this research through Equinor Akademia Program.
References

Ahmed, A., Ali, A., Elkatatny, S., Abdulraheem, A., 2019a. New artificial neural networks model for predicting rate of penetration in deep shale formation. Sustainability 11, 6527. https://doi.org/10.3390/su11226527.
Ahmed, O.S., Adeniran, A.A., Samsuri, A., 2019b. Computational intelligence based prediction of drilling rate of penetration: a comparative study. J. Petrol. Sci. Eng. 172, 1–12. https://doi.org/10.1016/j.petrol.2018.09.027.
Amar, K., Ibrahim, A., 2012. Rate of penetration prediction and optimization using advances in artificial neural networks, a comparative study. In: IJCCI 2012 - Proceedings of the 4th International Joint Conference on Computational Intelligence, pp. 647–652. https://doi.org/10.5220/0004172506470652.
Anifowose, F., Khoukhi, A., Abdulraheem, A., 2011. Impact of training-testing stratification percentage on artificial intelligence techniques: a case study of porosity and permeability prediction. 5th Global Conference on Power Control and Optimization.
Anifowose, F., Khoukhi, A., Abdulraheem, A., 2017. Investigating the effect of training–testing data stratification on the performance of soft computing techniques: an experimental study. J. Exp. Theor. Artif. Intell. 29, 517–535.
Chen, T., Guestrin, C., 2016. XGBoost: a scalable tree boosting system. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, USA, pp. 785–794. https://doi.org/10.1145/2939672.2939785.
Chhantyal, K., Hoang, M., Viumdal, H., Mylvaganam, S., 2018. Flow rate estimation using dynamic artificial neural networks with ultrasonic level measurements. In: Proceedings of the 9th EUROSIM Congress on Modelling and Simulation, EUROSIM 2016, the 57th SIMS Conference on Simulation and Modelling SIMS 2016. Linköping University Electronic Press, pp. 561–567.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing. Proceedings of the Conference 1724–1734. https://doi.org/10.3115/v1/d14-1179.
Chollet, F., others, 2015. Keras. https://keras.io.
Chung, J., Gulcehre, C., Cho, K., Bengio, Y., 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. http://arxiv.org/abs/1412.3555.
Equinor, 2018. Volve field data (CC BY-NC-SA 4.0). https://www.equinor.com/en/news/14jun2018-disclosing-volve-data.html.
Eren, T., Ozbayoglu, M.E., 2010. Real time optimization of drilling parameters during drilling operations. In: SPE Oil and Gas India Conference and Exhibition. Society of Petroleum Engineers. https://doi.org/10.2118/129126-MS.
Frankle, J., Carbin, M., 2018. The lottery ticket hypothesis: finding sparse, trainable neural networks.
Goerss, J.S., 2000. Tropical cyclone track forecasts using an ensemble of dynamical models. Mon. Weather Rev. 128, 1187–1193.
Han, J., Sun, Y., Zhang, S., 2019. A data driven approach of ROP prediction and drilling performance estimation. International Petroleum Technology Conference. https://doi.org/10.2523/iptc-19430-ms.
Hegde, C., Daigle, H., Millwater, H., Gray, K., 2017. Analysis of rate of penetration (ROP) prediction in drilling using physics-based and data-driven models. J. Petrol. Sci. Eng. 159, 295–306. https://doi.org/10.1016/j.petrol.2017.09.020.
Hegde, C., Gray, K., 2018. Evaluation of coupled machine learning models for drilling optimization. J. Nat. Gas Sci. Eng. 56, 397–407. https://doi.org/10.1016/j.jngse.2018.06.006.
Hegde, C., Gray, K.E., 2017. Use of machine learning and data analytics to increase drilling efficiency for nearby wells. J. Nat. Gas Sci. Eng. 40, 327–335. https://doi.org/10.1016/j.jngse.2017.02.019.
Hegde, C., Wallace, S., Gray, K., 2015. Using trees, bagging, and random forests to predict rate of penetration during drilling. In: Society of Petroleum Engineers - SPE Middle East Intelligent Oil and Gas Conference and Exhibition. Society of Petroleum Engineers. https://doi.org/10.2118/176792-MS.
Jain, A.K., Mao, J., Mohiuddin, K.M., 1996. Artificial neural networks: a tutorial. Computer 29, 31–44.
Jiang, W., Samuel, R., 2016. Optimization of rate of penetration in a convoluted drilling framework using ant colony optimization. In: SPE/IADC Drilling Conference. Proceedings, Society of Petroleum Engineers (SPE). https://doi.org/10.2118/178847-ms.
Koryabkin, V., Semenikhin, A., Baybolov, T., Gruzdev, A., Simonov, Y., Chebuniaev, I., Karpenko, M., Vasilyev, V., 2019. Advanced data-driven model for drilling bit position and direction determination during well deepening. https://doi.org/10.2118/196458-MS.
Liu, B., 2017. Lifelong machine learning: a paradigm for continuous learning. Front. Comput. Sci. 11, 359–361. https://doi.org/10.1007/s11704-016-6903-6.
Mantha, B., Samuel, R., 2016. ROP optimization using artificial intelligence techniques with statistical regression coupling. In: Proceedings - SPE Annual Technical Conference and Exhibition. Society of Petroleum Engineers (SPE). https://doi.org/10.2118/181382-ms.
Najafi, M.R., Moradkhani, H., 2015. Multi-model ensemble analysis of runoff extremes for climate change impact assessments. J. Hydrol. 525, 352–361.
Nogueira, F. Open source constrained global optimization tool for Python. https://github.com/fmfn/BayesianOptimization.
Osarogiagbon, A., Muojeke, S., Venkatesan, R., Khan, F., Gillard, P., 2020. A New Methodology for Kick Detection during Petroleum Drilling Using Long Short-Term Memory Recurrent Neural Network. Process Safety and Environmental Protection.
Pearson, K., 1901. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2, 559–572. https://doi.org/10.1080/14786440109462720.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Ribeiro, M.T., Singh, S., Guestrin, C., 2016. "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-propagating errors. Nature 323, 533–536. https://doi.org/10.1038/323533a0.
Sabah, M., Talebkeikhah, M., Wood, D.A., Khosravanian, R., Anemangely, M., Younesi, A., 2019. A machine learning approach to predict drilling rate using petrophysical and mud logging data. Earth Sci. India 12, 319–339. https://doi.org/10.1007/s12145-019-00381-4.
Shi, X., Liu, G., Gong, X., Zhang, J., Wang, J., Zhang, H., 2016. An efficient approach for real-time prediction of rate of penetration in Offshore drilling. Math. Probl Eng. 2016, 3575380. https://doi.org/10.1155/2016/3575380.
Soares, C., Daigle, H., Gray, K., 2016. Evaluation of PDC bit ROP models and the effect of rock strength on model coefficients. J. Nat. Gas Sci. Eng. 34, 1225–1236. https://doi.org/10.1016/j.jngse.2016.08.012.
Soares, C., Gray, K., 2019a. Real-time predictive capabilities of analytical and machine learning rate of penetration (ROP) models. J. Petrol. Sci. Eng. 172, 934–959. https://doi.org/10.1016/j.petrol.2018.08.083.
Soares, C., Gray, K., 2019b. Real-time predictive capabilities of analytical and machine learning rate of penetration (ROP) models. J. Petrol. Sci. Eng. 172, 934–959. https://doi.org/10.1016/j.petrol.2018.08.083.
Tunkiel, A.T., Wiktorski, T., Sui, D., 2020a. Continuous drilling sensor data reconstruction and prediction via recurrent neural networks. In: Submitted to Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering - OMAE.
Tunkiel, A.T., Wiktorski, T., Sui, D., 2020b. Drilling dataset exploration, processing and interpretation using Volve field data. Submitted to Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering - OMAE.
Wang, G., Long, S., Ju, Y., Huang, C., Peng, Y., 2018. Application of horizontal wells in three-dimensional shale reservoir modeling: a case study of Longmaxi–Wufeng shale in fuling gas field, Sichuan basin. AAPG (Am. Assoc. Pet. Geol.) Bull. 102, 2333–2354. https://doi.org/10.1306/05111817144.
Wang, H., 2017. Drilling trajectory prediction model for push-the-bit rotary steerable bottom hole assembly. Int. J. Eng. 30, 1800–1806.
Wang, M., Li, X., Wang, G., Huang, W., Fan, Y., Luo, W., Zhang, J., Zhang, J., Shi, X., 2020. Prediction model of build rate of push-the-bit rotary steerable system. Math. Probl Eng. 2020, 4673759. https://doi.org/10.1155/2020/4673759.
Yi, P., Kumar, A., Samuel, R., 2014. Real-time rate of penetration optimization using the shuffled frog leaping algorithm (SFLA). In: Society of Petroleum Engineers - SPE Intelligent Energy International 2014. Society of Petroleum Engineers (SPE), pp. 116–125. https://doi.org/10.2118/167824-ms.
Zou, C., 2017. Unconventional Petroleum Geology. Elsevier.