SPE-196428-MS
Mustafa A. Al-Alwani, Missouri University of Science and Technology; Larry K. Britt, NSI Fracturing;
Shari Dunn-Norman, Husam H. Alkinani, and Abo Taleb T. Al-Hameedi, Missouri University of Science and
Technology; Atheer M. Al-Attar, Enterprise Products; Mohammed M. Alkhamis, Missouri University of Science and
Technology; Waleed H. Al-Bazzaz, Kuwait Institute For Scientific Research
This paper was prepared for presentation at the SPE/IATMI Asia Pacific Oil & Gas Conference and Exhibition held in Bali, Indonesia, 29-31 October 2019.
This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents
of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect
any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract
Big data has become a major topic in many industries. Most recently, the oil and gas industry has taken a
special interest in data science as a result of the increasing availability of public-domain and commercial
databases. Utilizing and processing such data can help in making better future decisions. The aim of this
work is to provide an example and demonstrate methodologies for collecting and utilizing big data to help
in making better future decisions in the oil and gas industry.
After reviewing a substantial number of papers and books on the applications of data analysis in the oil and gas
industry and in other industries, and given that data analysis is the authors' area of expertise,
this paper was written to demonstrate real examples of data processing and validation workflows. This work
is intended to cover the gaps in the literature, where many publications only discuss the importance
of data-driven analytics.
This paper provides an overview of the diverse sources that generate bulk data in the oil and gas
industry, from the exploration phase to the end of the well lifecycle. It provides an example of
utilizing a public-domain database (FracFocus) and demonstrates a step-by-step workflow for collecting
and processing the data based on the objective of the analytics. Two real examples of descriptive and predictive
analytics are also presented to show the power of having diverse databases drawn from multiple sources.
A framework of data validation and preparation is also shown to illustrate data quality checks
combined with best practices for data cleansing and outlier detection.
This paper provides a clear methodology for successfully applying data analysis, which can serve as
a guide for future data analysis applications in the oil and gas industry.
Introduction
The common saying that data is knowledge is not necessarily true. Data are generally records of events
as they take place. Processing and analyzing the data yields knowledge. With that knowledge, an
understanding of why things are happening can be gained and then applied in the desired direction to
optimize the outcomes. Data analysis studies must go through several phases, starting with
collecting the data and understanding the data parameters, then passing through data visualization and descriptive
analysis, especially for large and complex datasets. Visualizing the data helps in conveying insights
and in comprehending the common trends buried within the data. Good data visualization also
helps non-specialist or non-technical individuals understand technical and specialized data.
After the data are understood and the trends investigated, modeling and hypothesis testing follow
as the phase of predictive analytics, which leads to prescriptive analytics. The goal of data analytics is to
Drilling Phase
Most modern drilling rigs used to drill oil and gas wells are equipped with many sensors
that continuously record all operations from the time the well is spudded until the well is completed and ready
for the production phase. Safety and performance are the two main key performance indicators
that drilling engineers and operators actively try to achieve. Data such as the weight on
bit (WOB), the rotary speed of the bit (RPM), pumping pressure (psi), and torque (lb.ft) are constantly
monitored and studied to optimize the drilling time and reduce the non-productive time (NPT). Multiple
sources of data are also recorded and collected by service companies on location, and all the data are gathered
and stored for each drilled well. Well logs are run several times during the drilling phase to collect formation
and fluid information; open-hole logs can be acquired using wireline logging or logging while
drilling techniques. Cement bond logs are run to check the quality of the cement around the casing
and to make sure zonal isolation is achieved. Stress and geomechanical tests generate a large amount of data
associated with each well. Well control and safety monitoring data are also an important part of data
collection and monitoring. Data such as the percentage of gas in the mud while drilling through an active
reservoir formation and the types of gases coming to the surface with the drilling mud (gas chromatography)
are constantly monitored and modeled to prevent accidents. The amount of drilling
data that is generated, stored, and interpreted is increasing dramatically with time, and as regulations
require closer monitoring and the capture of more data, data analytics is becoming more important.
Drilling data can be utilized to minimize cost and NPT to achieve an optimum drilling
operation (Alkinani et al., 2018a; Alkinani et al., 2018b; Alkinani et al., 2018c; Alkinani et al., 2019; Al-
Hameedi et al., 2017a; Al-Hameedi et al., 2017b; Al-Hameedi et al., 2017c; Al-Hameedi et al., 2018).
Production Phase
Production data are very important and are always coupled with the other phases. The objective of many
reservoir management techniques and of all drilling and completion techniques is to improve the productivity of the wells.
Production data are used in data science as a gauging parameter (response) to other parameters that the
Operations Phase
Operations encompass a large stream of structured and unstructured data. The data are varied and consist of a
wide range of formats, from complex 3D models to sensor data. The speed at which data accumulate is also challenging
as a result of the enormous number of sensors incorporated in most operations. Applying
big data in the oil and gas industry helps reduce well shutdowns and productivity impairment. Big
data analytics also helps extend the operational lifetime of equipment by predicting failure rates
and suggesting condition-based preventive maintenance. Real-time data streaming enables worldwide
operational support, which improves the quality of the jobs while minimizing crew size in remote locations
such as offshore environments. Service providers monitor the operational condition of their downhole tools,
such as mud motors and logging while drilling (LWD) tools, by continuously analyzing data from several sensors
that measure temperature, pressure, and vibration to predict failure and provide a surface warning to
change the operational parameters, and hence prevent potential tool failure and the consequent NPT and loss
of money. Overall, in all phases of the oil and gas industry, data analytics is used to support data-based
decision making in all operations and processes.
In all the E&P phases, the dramatic growth in data generation is not useful by itself. The ability to utilize
and integrate the diverse data sources to seek useful insights and provide data-driven actions and decisions
is the main target of using big data approaches in the oil and gas industry.
to search and assemble the data needed for their projects (BruléGroup, 2015). In a survey of oil and gas
executives conducted in 2018 by General Electric and Accenture, 81% of the participants indicated
that big data and data acquisition were among their top three priorities for 2018 (Mehta, 2018).
To start a data collection project, the first step is to identify the scope of the project and search for
potential sources. The state and local governments in the United States regulate the oil and gas industry
and require operators to report their data to the state; these data are also made publicly available to the
community. The Texas Railroad Commission website (2019) is one example of a public data source that contains
Data Validation
Data validation is defined as the process of verifying whether the value of a data point comes from
a known finite or infinite set of defined acceptable values (UNECE, 2013). Data validation is also defined
as the process of ensuring that the final dataset complies with several predetermined quality characteristics
(Simon, 2013). Di Zio et al. (2013) adopted a definition that considers the communication between the data
records at the variable level and at the field domain level, which defines data validation as the process of
verifying whether or not a combination of values in the dataset belongs to a set of acceptable combinations.
Level | Check | Description or example
1 | Checking within the dataset element, using statistical information included in the file itself | This stage implements ad hoc rules, e.g., is the measured depth always greater than or equal to the true vertical depth? Is the flow value always positive?
2 | Checking the integrity with all similar files from a statistical point of view | In this check level, the files are checked to determine whether they are similar or different revisions of each other.
3 | Checking the integrity with all similar files from a statistical point of view but from a different data source | In this check level, the files are checked to determine whether they are similar or different revisions of each other but from a different data source.
4 | Checking data that describe the same phenomenon but from different data sources or domains | Checking the well TVD from FracFocus and from DrillingInfo.
5 | The consistency of the data within different providers | For example, the gross perforated interval calculated from one dataset as MD of bottom perf - MD of top perf should come close to the gross perforated interval reported in a different dataset.
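The first and fifth check levels lend themselves to simple rule-based code. The following is a minimal sketch and not the authors' implementation; the field names (`md_ft`, `tvd_ft`, `flow_rate`), the 10 ft tolerance, and the sample records are hypothetical choices made only for illustration.

```python
def level1_violations(record):
    """Apply ad hoc level-1 rules to one well record and list any violations."""
    violations = []
    # Measured depth should never be less than true vertical depth.
    if record["md_ft"] < record["tvd_ft"]:
        violations.append("MD is less than TVD")
    # Flow values are expected to be strictly positive.
    if record["flow_rate"] <= 0:
        violations.append("flow rate is not positive")
    return violations


def gross_interval_consistent(top_perf_md_ft, bottom_perf_md_ft,
                              reported_gross_ft, tolerance_ft=10.0):
    """Level-5 check: compare the computed gross perforated interval
    (MD of bottom perf - MD of top perf) with a value reported elsewhere.
    The tolerance is an assumed, user-chosen threshold."""
    computed_gross_ft = bottom_perf_md_ft - top_perf_md_ft
    return abs(computed_gross_ft - reported_gross_ft) <= tolerance_ft


# Hypothetical records for illustration only.
good = {"md_ft": 14500.0, "tvd_ft": 7200.0, "flow_rate": 850.0}
bad = {"md_ft": 6900.0, "tvd_ft": 7200.0, "flow_rate": -5.0}

print(level1_violations(good))
print(level1_violations(bad))
print(gross_interval_consistent(8000.0, 14000.0, 6005.0))
```

Returning a list of violations rather than a single pass/fail flag keeps each rule traceable, which is useful when rules are added case by case.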
and in fact, they most likely are not. Nobakht and Mattar (2009) mentioned that error sources related
to production and injection data include:
i. Averaging the rates of a specific well from an adjacent group of wells.
ii. Incorrect assumptions (for example, assuming single-phase flow while the GOR is increasing).
iii. Incorrect location of the pressure gauge (i.e., it is downstream while the choke is not fully
opened).
Box Plot
Box plots are a useful and easy visualization tool for detecting outliers. There are five elements of a box plot:
the maximum, minimum, median, and first and third quartiles, as shown in Figure 5. The difference
between the first and the third quartiles is called the interquartile range (IQR). Data points that fall outside the
maximum or the minimum of the box plot whiskers are potential outliers (Kirkman, 1992).
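As a minimal sketch of this rule, the whisker limits can be computed directly from the quartiles. The example assumes the common Tukey convention of placing the whiskers 1.5 × IQR beyond the first and third quartiles, which is not specified in the text above; the function name and the sample values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    # method="inclusive" interpolates linearly between the observed data points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


# Hypothetical measurements; the last value mimics a data-entry error.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower, upper, flagged = iqr_outliers(sample)
print(lower, upper, flagged)
```

Points flagged this way are candidates for review rather than automatic deletion, since an extreme value can also be a genuine measurement.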
Where:
d(A, B) is the distance between A and B, i.e., |Y(A) - Y(B)|, and
B ∈ N_k(A), i.e., the reachability distance between two points A and B is the true distance d(A, B), but at least
the k-distance of B.
The local reachability density lrd is defined as:
(3)
(4)
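Because the distance used here is one-dimensional, these quantities are straightforward to compute directly. The following is a minimal sketch that assumes the standard local outlier factor formulation: the reachability distance is max(k-distance(B), d(A, B)), lrd is the inverse of the mean reachability distance from A to its k nearest neighbors, and the outlier factor is the mean ratio of the neighbors' lrd to that of A. The function name, the choice k = 2, and the sample values are illustrative assumptions.

```python
def local_outlier_factor(values, k):
    """Compute LOF scores for 1-D data using d(A, B) = |Y(A) - Y(B)|."""
    n = len(values)

    def d(a, b):
        return abs(values[a] - values[b])

    k_distance = []   # distance from each point to its k-th nearest neighbor
    neighbors = []    # N_k(A): every point within the k-distance of A
    for a in range(n):
        others = sorted((d(a, b), b) for b in range(n) if b != a)
        kd = others[k - 1][0]
        k_distance.append(kd)
        neighbors.append([b for dist, b in others if dist <= kd])

    def reach_dist(a, b):
        # The true distance, but never smaller than the k-distance of B.
        return max(k_distance[b], d(a, b))

    lrd = []
    for a in range(n):
        mean_reach = sum(reach_dist(a, b) for b in neighbors[a]) / len(neighbors[a])
        # Guard against more than k duplicate values (zero mean reachability).
        lrd.append(1.0 / mean_reach if mean_reach > 0 else float("inf"))

    lof = []
    for a in range(n):
        mean_neighbor_lrd = sum(lrd[b] for b in neighbors[a]) / len(neighbors[a])
        lof.append(mean_neighbor_lrd / lrd[a])
    return lof


# Hypothetical values: five clustered points and one isolated point.
data = [10.0, 11.0, 12.0, 13.0, 14.0, 100.0]
scores = local_outlier_factor(data, k=2)
print([round(s, 2) for s in scores])
```

Scores near 1 indicate a point whose local density is similar to that of its neighbors, while scores well above 1 indicate a potential outlier.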
biocide, corrosion inhibitor, surfactant, and clay control. Figure 9 shows the predicted versus actual
initial gas production in the Marcellus using PLS.
Conclusion
Data analytics has become important for the oil and gas industry due to the large volume of data available from
exploration, drilling, production, and operations. Utilizing the available data will help to make better
future decisions. However, there are many challenges in collecting, formatting, validating,
managing, and analyzing the data that require close attention from the people who work with the data. The
following conclusions were made based on this study:
• Big data analytics and the revolution of datafication have helped companies and public administrations
to better understand the data, find previously unnoticeable patterns, and provide better solutions
for existing and future operations.
• There is a substantial transition in the oil and gas industry towards data-driven operations.
• Many operators have incorporated data science as part of their organizational structure and are
training the next generation of engineers to be hybrid engineers who are experts in both their area of
specialization and data science.
• In all the E&P phases, the dramatic growth in data generation is not useful by itself. The ability to
utilize and integrate the diverse data sources to seek useful insights and provide data-driven actions
and decisions is the main target of using big data approaches in the oil and gas industry.
• To join two different databases, a common entity must exist between the two databases to be
identified as the joining parameter.
• Datasets, especially oil and gas datasets, are case-specific when it comes to data cleaning and
transformation.
• Although data validation seems to be a very ad hoc process, many aspects
of data validation remain valid and applicable across all types of datasets in oil and gas.
It is also recommended that a walkthrough framework be implemented, especially for oil and gas
applications.
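The conclusion about joining two databases on a common entity can be illustrated with a minimal sketch. The key name `api_number`, the column names, and the sample rows are hypothetical and are not taken from the datasets used in this work; the sketch also assumes the key is unique within the second table.

```python
def inner_join(left_rows, right_rows, key):
    """Join two lists of dict records on a shared key, keeping only matches."""
    # Index the right-hand table by the joining parameter (assumed unique).
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


# Hypothetical completion and production records sharing a well identifier.
completions = [
    {"api_number": "A-001", "proppant_lb": 9.5e6},
    {"api_number": "A-002", "proppant_lb": 1.2e7},
]
production = [
    {"api_number": "A-002", "initial_gas_mcfd": 8200.0},
    {"api_number": "A-003", "initial_gas_mcfd": 4100.0},
]

merged = inner_join(completions, production, "api_number")
print(merged)
```

Only wells present in both tables survive the join, so comparing the row counts before and after is a quick check on how well the two sources overlap.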
References
Al-Hameedi AT, Alkinani HH, Dunn-Norman S, Flori RE, Hilgedick SA, Amer AS (2017a) Limiting Key Drilling
Parameters to Avoid or Mitigate Mud Losses in the Hartha Formation, Rumaila Field, Iraq. J Pet Environ Biotechnol
8: 345. doi:10.4172/2157-7463.1000345.
Al-Hameedi, A. T. T., Alkinani, H. H., Dunn-Norman, S., Flori, R. E., Hilgedick, S. A., Alkhamis, M. M., Alsaba, M. T.
(2018, August 16). Predictive Data Mining Techniques for Mud Losses Mitigation. Society of Petroleum Engineers.
doi:10.2118/192182-MS.
Al-Hameedi, A. T., Dunn-Norman, S., Alkinani, H. H., Flori, R. E., & Hilgedick, S. A. (2017b, August 28). Limiting
Simon A. (2013). Definition of validation levels and other related concepts v01307.
Working document. Available from https://webgate.ec.europa.eu/fpfis/mwikis/essvalidserv/images/3/30/Eurostat__definition_validation_levels_and_other_related_concepts_v01307.doc.
Texas Railroad Commission, website link: https://www.rrc.state.tx.us/, retrieved on March 2, 2019.
UNECE (2013). Glossary of terms on statistical data editing. Retrieved March 24, 2019, from http://www1.unece.org/stat/platform/display/kbase/Glossary.
Vega-Gorgojo, Guillermo & Fjellheim, Roar & Roman, Dumitru & Akerkar, Rajendra & Waaler, Arild. (2016). Big Data
in the Oil & Gas Upstream Industry - A Case Study on the Norwegian Continental Shelf. Oil Gas European Magazine.