Proceedings of the ASME 2022 14th International Pipeline Conference

IPC2022
September 26-30, 2022, Calgary, Alberta, Canada

IPC2022-87872

STRUCTURED, SYSTEMATIC THREAT BASED APPROACH TO EVALUATE AND IMPROVE
DATA QUALITY TO FACILITATE DIGITAL TRANSFORMATION

Pushpendra Tomar, Dynamic Risk, The Woodlands, TX
Betsy Kruse, Enbridge, Houston, TX
Samah Hasan, Dynamic Risk, Calgary, Alberta
Sergiy Kondratyuk, Dynamic Risk, Toronto, Ontario

© 2022 by ASME

ABSTRACT
Pipeline operators are rapidly and increasingly moving towards digital transformation in order to harvest efficiencies and achieve higher levels of reliability and safety. Fueled by advances in technology such as cloud computing and machine learning, data is considered a key asset, and pipeline operations are increasingly driven by information and analytics. However, successfully achieving a digital transformation toward reliable and high-quality data requires mature processes for obtaining, managing, evaluating, and continuously improving data quality.

During a review of pipeline risk assessment results, a pipeline operator (Operator) found that risk results for a particular pipeline were driven by the mainline coating type being listed as "un-coated." However, further review of the records showed that the pipeline, in fact, was coated.

One of the Operator's foundational principles is 'data as an asset'. Thus, the Operator understands the critical impact of such data inconsistencies across many potential receptors, from financial impacts to public safety. Additionally, mature processes enhance confidence in prioritizing the "right work."

Data quality is essential for the use of historical data, interoperability across various data systems, and generation of useful analytics. The data quality process maturity (Process maturity) evaluation aims to assess all processes, capabilities, and governance required for ensuring high data quality. As a result, the Operator decided to rigorously evaluate their data quality and the maturity of data quality processes.

The data quality assessment involved creating a comprehensive list of data elements required to assess a particular threat, prioritizing data elements, and documenting data storage by the source system. The data quality was then evaluated using Key Performance Indicators (KPIs), establishing a baseline.

An organization's Process maturity varies from level one (Initial) to level five (Optimized). The Process maturity of the Operator was assessed on five evaluation areas: Governance, Organization & People, Data Standards, Requirements & Metrics, Process Efficiency, Technology & Tools. Results of the evaluation led to the identification of actionable gaps.

The process, as developed, leverages guidance provided in ISO (8000-8) [3] for data quality assessment and DNVGL-RP-0497 [4] for Process maturity evaluation. This paper presents a step-by-step approach developed for and successfully employed by the Operator as applied to pipeline integrity threats.

Keywords: Data quality evaluation, Data maturity, Digital transformation, Bayesian network, KPIs, Process maturity, a step-by-step approach

NOMENCLATURE
BN      Bayesian Network
CP      Cathodic Protection
DQA     Data Quality Assessment
DQI     Data Quality Index
EC      External Corrosion
ECDA    External Corrosion Direct Assessment
GIS     Geographic Information System
ILI     In-Line Inspection
ISO     International Organization for Standardization
KPI     Key Performance Indicator
QA      Quality Assurance
QC      Quality Control
SME     Subject Matter Expert
SoA     System of Access
SOP     Standard Operating Procedures
SoR     System of Record
SoT     Source of Truth

1. INTRODUCTION
The importance of having complete, consistent, and reliable data for operations and risk assessment, particularly as the industry moves towards a quantitative risk assessment approach, is significant. Two essential components of data quality management are continuous data quality assessment and maturity of data quality processes and capabilities. The data quality process maturity (Process maturity) and data quality are expected to have a cause-and-effect relationship, i.e., mature processes are likely to lead to higher quality data in an organization.

High-quality data leads to confident decision-making related to risk assessment and integrity investing across the organization. In addition, as organizations are progressively driven by information and analytics, mature data quality processes enhance confidence in prioritizing the "right work."

This paper focuses on the approach developed to assess the data quality and Process maturity for a gas pipeline operator (Operator). The approach was centered on the external corrosion (EC) threat. The process, as developed, is fundamentally based on the guidance provided in ISO (8000-8) [3] for data quality assessment and DNVGL-RP-0497 [4] for Process maturity evaluation.

2. BACKGROUND
During a review of pipeline risk assessment results, the pipeline operator (Operator) found that risk results for a particular pipeline were driven by the mainline coating type being listed as "un-coated." However, further review of the records showed that the pipeline, in fact, was coated.

One of the Operator's foundational principles is 'data as an asset'. Thus, the Operator understands the critical impacts, from financial to public safety impacts, of such data inconsistencies across many potential receptors.

Data governance defines the policies, processes, roles, and responsibilities required for continuous monitoring and improvement in data quality. FIGURE 1 shows a typical data governance hierarchy for an organization. As shown in FIGURE 1, data quality is part of data management, and data management is part of data governance.

FIGURE 1: TYPICAL DATA GOVERNANCE HIERARCHY FOR AN ORGANIZATION

Thus, data governance holds the key to ensuring efficient and effective use of data within the organization. FIGURE 1 shows that data quality management is achieved with data governance.

The approach presented in this paper was founded on two essential components of data quality management:
• Data quality assessment
• Process maturity evaluation

3. METHODOLOGY
The approach includes the following steps:
1. Step 1: Baseline data quality assessment: This step involves performing a baseline data quality assessment that includes developing Data Quality Indices (DQIs) or Data Quality Key Performance Indicators (KPIs).
2. Step 2: Process maturity evaluation: This step involves the development of a framework to assess all processes, capabilities, and governance required to ensure high data quality within an organization.
3. Step 3: Implement process maturity improvements (Future Steps): This step involves identifying action items for Process maturity improvements and executing the specified action items.
4. Step 4: Periodic reassessment of data quality (Future Steps): This step involves a periodic reassessment of data quality to measure the effectiveness of Process maturity improvements.

An organization's Process maturity evaluation provides a measure of the processes, capabilities, and governance required to ensure high data quality. Ideally, improvements in the maturity

level of an organization should translate to improvements in its data quality assessment results. An efficient approach to do so could comprise properly sequenced improvements in Process maturity and data quality. Advances in Process maturity are likely to result in improved results from data quality assessments. A periodic data quality assessment could assess the effectiveness of Process maturity improvements. Data quality metrics discussed further in this paper could be used to capture the performance of continuous improvements in Process maturity using trend lines, and activities could be adjusted according to negative or positive trends.

4. BASELINE DATA QUALITY ASSESSMENT
Data is defined in ISO (8000-8) [3] as:
"Reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. The ability to create, collect, store, maintain, transfer, process, and present information and to support business processes in a timely and cost-effective manner requires both an understanding of the characteristics of the information and data that determine its quality, and an ability to measure, manage and report on information and data quality."

Information and data quality are defined and measured according to the following categories:
• Syntactic quality is the degree to which data conforms to its specified syntax, i.e., requirements stated by the metadata.
• Semantic quality is the degree to which data corresponds to what it represents.
• Pragmatic quality is the degree to which data is found suitable and worthwhile for a particular purpose.

To understand the characteristics of all data elements required to effectively manage the EC threat within the Operator's gas transmission system, the first task in the baseline data quality assessment was preparing a comprehensive list of data elements required to manage the EC threat. An extensive list of data elements was created in consultation with the Operator's subject matter experts, reviewing all relevant standard operating procedures (SOPs) and reviewing appropriate regulations and standards. This task resulted in a list with a total of 329 EC data elements.

The second task in the baseline data quality assessment was identifying the origin and transfer of the 329 EC data elements between the data systems. The three possible sources of data are:

• System of Record (SoR): An authoritative system where data is created/captured and maintained through a defined set of rules and expectations.
• System of Access (SoA): An authoritative system where data consumers can obtain reliable data to support transactions and analysis, even if the information did not originate in the system of access.
• Source of Truth (SoT): The reference to which data users can turn when they want to ensure they have the correct version of a piece of information.

The SoR was determined for all the 329 EC data elements. In addition, a comprehensive system map was created that demonstrated the flow of information between the different SoRs within the Operator's data systems.

The third task in the baseline data quality assessment was to assign a priority from one (1) through three (3) to all the 329 EC data elements. The priorities were assigned based on the impact of the data element. For example, data elements that are required for regulatory compliance or essential for evaluating the threat were given a priority of one (1), whereas data elements that are nice-to-have information and would provide increased accuracy and granularity were assigned a priority of three (3). The priority of the data elements was evaluated with respect to six EC processes: In-Line Inspection (ILI), Excavation, Cathodic Protection (CP), External Corrosion Direct Assessment (ECDA), Geographic Information System (GIS), and Risk Process.

The results of the above three (3) tasks are presented in FIGURE 2. In addition, a list of observations was prepared to identify action items that can help in an improved understanding of the characteristics of the 329 EC data elements.

FIGURE 2: RESULTS OF SoR MAPPING OF EC DATA ELEMENTS

The findings and results of the above three tasks are of utmost importance in performing the baseline data quality assessment as they guide the next steps. The next step was to design the structure for performing the baseline data quality assessment. Two major components of the data quality evaluation framework are data quality metrics (metrics) and data quality dimensions.

The metrics use a DQI or KPIs to quantify the relationships between the total number of records in scope and the portion of records passing a criterion. At the same time, the data quality dimensions (dimensions) provide a classification of data requirements that data and datasets should meet. The metrics are commonly grouped into dimensions, and the metrics define the measurements to be performed for the selected dimensions.
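
The ratio described above, records passing a criterion over records in scope, can be sketched for the Completeness dimension as follows. This is a minimal illustration and not the Operator's implementation: the record layout, the attribute names (`od`, `wt`, `grade`, `coating_type`), the sample values, and the rule that `None` or a blank string counts as missing are all assumptions made for the example.

```python
from typing import Iterable, Mapping, Optional


def completeness_dqi(records: Iterable[Mapping[str, object]], attribute: str) -> Optional[float]:
    """Fraction of in-scope records whose `attribute` is populated.

    Returns None when there are no records in scope, so an empty
    dataset is not mistaken for a perfect or a failing score.
    """
    total = 0
    passing = 0
    for record in records:
        total += 1
        value = record.get(attribute)
        # Assumption: None and blank strings both count as "missing".
        if value is not None and str(value).strip() != "":
            passing += 1
    return passing / total if total else None


# Hypothetical pipe-segment records; values are made up for illustration.
segments = [
    {"od": 24.0, "wt": 0.375, "grade": "X52", "coating_type": "FBE"},
    {"od": 24.0, "wt": None, "grade": "X52", "coating_type": ""},
    {"od": 24.0, "wt": 0.375, "grade": None, "coating_type": "Coal tar"},
    {"od": 24.0, "wt": 0.375, "grade": "X60", "coating_type": None},
]

# One completeness score per attribute, each between 0 and 1.
scores = {
    attr: completeness_dqi(segments, attr)
    for attr in ("od", "wt", "grade", "coating_type")
}
```

Tracking such scores per attribute and per data system over successive assessments would give the trend lines mentioned earlier; how the scores are weighted or combined is left open here.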
Some examples of dimensions are Accuracy, Conformance to metadata/schema, Precision, Timeliness, Format/structural Consistency, and Completeness. FIGURE 3 shows a representative relationship between metrics and dimensions.

Dimension: Completeness
Metrics: OD, WT, Grade, Coating Type

FIGURE 3: SAMPLE STRUCTURE OF DATA QUALITY ASSESSMENT

ISO (8000-8) [3], DNVGL-RP-0497 [4], and the Operator's existing data quality dimensions guided the data quality evaluations. The metrics were grouped into five (5) dimensions:
1. Completeness: A measurement of the availability of required data attributes.
2. Coverage: A measurement of the availability of required data records.
3. Uniqueness: A measurement of the availability of unique data records for every data attribute.
4. Consistency: A measurement of the degree to which data records are consistent, i.e., in agreement across different data systems.
5. Timeliness: A measurement of the degree to which data is representative of current conditions, i.e., collected and made available in a timely manner.

FIGURE 4 shows an example of results on a demonstration database.

FIGURE 4: COMPLETENESS DATA QUALITY EVALUATION ON A DEMONSTRATION DATABASE

5. PROCESS MATURITY EVALUATION
Process maturity evaluation is a framework that aims to assess all processes, capabilities, and governance that are required for ensuring high data quality. The Process maturity evaluation was based on recommended practice DNVGL-RP-0497 [4].

An organization's internal requirements, nature of business, culture, and priorities affect how its data quality activities are designed, built, operated, and monitored. As a result, an organization's Process maturity varies from Level 1 (Initial) to Level 5 (Optimized). FIGURE 5 shows the characteristics of Process maturity levels for an organization.

FIGURE 5: DATA MATURITY LEVELS FOR AN ORGANIZATION AS PER DNVGL-RP-0497 [4]

Organizations with a low level of maturity (e.g., Level 1) show a lack of data quality management. In contrast, higher levels of data maturity are associated with increased data quality awareness, continuous improvements, and well-governed, enterprise-wide data quality processes. The Operator selected Level 3 (Defined) as the target level for this assessment because Level 3 represents an intermediate step between the two endpoints (Level 1 versus Level 5).

The Process maturity evaluation was performed on the same six EC processes used for data quality evaluation, i.e., In-Line Inspection (ILI), Excavation, Cathodic Protection (CP), External Corrosion Direct Assessment (ECDA), Geographic Information System (GIS), and Risk Process. FIGURE 6 shows an overview of the Process maturity evaluation methodology. Every EC process was subdivided into subprocesses and assessed in five evaluation areas:

A1. Governance
A2. Organization & People
A3. Data Standards, Requirements & Metrics
A4. Process Efficiency
A5. Technology & Tools
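
The maturity evaluation in this paper relies on a questionnaire in which data owners mark statements as "Correct" or "Incorrect" for each maturity level and evaluation area. A minimal sketch of one way such responses could be rolled up into a score between zero and one is given below. The paper does not spell out the scoring rule, so the rule used here (the share of statements marked "Correct", with equal weights) and the sample responses are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (maturity level, evaluation area, response) for one EC subprocess.
Response = Tuple[str, str, str]


def maturity_scores(responses: Iterable[Response]) -> Dict[Tuple[str, str], float]:
    """Score each (level, area) cell as the share of "Correct" responses.

    Assumption: every statement carries equal weight, so 1.0 means all
    statements in the cell were marked "Correct" and 0.0 means none were.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, area, answer in responses:
        if answer not in ("Correct", "Incorrect"):
            raise ValueError(f"unexpected response: {answer!r}")
        total[(level, area)] += 1
        if answer == "Correct":
            correct[(level, area)] += 1
    return {cell: correct[cell] / total[cell] for cell in total}


# Hypothetical answers: two statements per level, shown for area A1 only.
sample = [
    ("L1", "A1", "Correct"), ("L1", "A1", "Correct"),
    ("L2", "A1", "Correct"), ("L2", "A1", "Incorrect"),
    ("L3", "A1", "Incorrect"), ("L3", "A1", "Incorrect"),
]

cell_scores = maturity_scores(sample)
```

Averaging such cell scores over the subprocesses of an EC process is one way a level-by-area summary could be produced; that aggregation step is likewise an assumption.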

FIGURE 6: PROCESS MATURITY EVALUATION METHODOLOGY OVERVIEW

A Process maturity evaluation tool (maturity evaluation tool) was developed in the form of a questionnaire. Data owners were identified for each EC process, and their feedback was requested on the maturity evaluation tool. The maturity evaluation tool required the data owners to review two statements per maturity level [i.e., level one (1) through three (3)] per evaluation area [i.e., evaluation area one (1) through five (5)]. Therefore, data owners were required to review a total of 30 (2×3×5) statements per EC subprocess. The data owners could provide their response as either "Correct" or "Incorrect," and their responses were scored to evaluate the maturity level for every EC subprocess. The data owners for every EC process consulted with the appropriate SMEs to identify the correct response for every statement. A score of one (1) was considered a perfect score, and the lowest possible score was zero (0). The heat map shown in FIGURE 7 is an example of a high-level summary of the combined Process maturity evaluation results. FIGURE 7 shows Process maturity scores (from zero through one) in 15 blocks, where each block represents the Process maturity score at one maturity level (Level 1 through Level 3) for one of the five evaluation areas (A1 through A5). Data maturity results presented in FIGURE 7 show a consistent pattern across the five evaluation areas, i.e., higher scores at Level 1 (L1) and then decreasing scores at Level 2 (L2) and Level 3 (L3).

Maturity Level    A1      A2      A3      A4      A5
L1                0.89    0.92    0.93    0.98    0.86
L2                0.84    0.79    0.64    0.81    0.58
L3                0.26    0.34    0.13    0.23    0.26

Evaluation areas: A1 = Governance; A2 = Organization & People; A3 = Data Standards, Requirements & Metrics; A4 = Process Efficiency; A5 = Technology & Tools.

FIGURE 7: EXAMPLE OF PROCESS MATURITY EVALUATION RESULTS BY EVALUATION AREA AND LEVEL

6. IMPLEMENT PROCESS MATURITY IMPROVEMENTS (FUTURE STEPS)
A review of the Operator's Process maturity evaluation results with the respective data owners for every EC process resulted in the identification of opportunities for improvement in all five evaluation areas across the six EC processes.

Identifying the correct work necessary to effect the desired improvement is critical, but prioritizing the identified work for efficient execution can be a complex challenge. Maturity improvement decisions based on convenience or familiarity, or made arbitrarily, may not be the most efficient. Therefore, a path forward regarding maturity improvement should include a model that would support optimal decision-making by prioritizing potential improvements to achieve the desired maturity target level and alignment with other integrity data work. In other words, opportunities that could provide the most improvement with the least amount of relative cost/effort need to be identified.

A Bayesian Network (BN) model was developed to support the optimal decision-making problem outlined in the preceding paragraph. Bayesian Network (BN) models [1] are probabilistic graphical models that represent a causal relationship between successive nodes, where each node represents a variable. In the context of this application, each node corresponds to a knowledge domain, and arrows correspond to the probabilistic dependence. FIGURE 8 presents a snapshot of a BN model developed using the QGeNIe software application from BayesFusion, LLC [2]. Each node in the BN model presented in FIGURE 8 corresponds to a Process maturity score presented in FIGURE 7. The probabilities of the states True or False, as shown within the nodes, are approximate and rounded values.

The BN model developed in the context of the Process maturity improvements represents the relationship between the Process maturity elements and levels. The BN model was based on the Process maturity results by evaluation area and level shown in FIGURE 7. The 15 nodes of the model correspond to the 15 "Level-Area" matrix elements presented in FIGURE 7. The elements of the BN model enable the use of the model as a quantitative tool to determine the most effective prioritization of improvements to achieve specific goals for overall maturity levels or evaluation areas.
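
To make the prioritization idea concrete, the sketch below uses a deliberately simplified stand-in for the BN model. The five Level 2 nodes are treated as independent, each with a probability of being achieved equal to its FIGURE 7 score, and the goal is true only if all five are true. The sketch then ranks the areas by how much the goal probability would rise if one area alone were brought to full achievement. The independence assumption, the AND-type goal, and the function names are assumptions for illustration; the model built in QGeNIe may encode dependencies that this sketch ignores, so its numbers should not be read as the paper's results.

```python
from math import prod
from typing import Dict, List, Tuple

# Level 2 scores by evaluation area, taken from FIGURE 7.
L2_SCORES: Dict[str, float] = {
    "A1": 0.84, "A2": 0.79, "A3": 0.64, "A4": 0.81, "A5": 0.58,
}


def goal_probability(scores: Dict[str, float]) -> float:
    """P(all areas achieved) under the assumed independence of the nodes."""
    return prod(scores.values())


def rank_improvements(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank areas by the gain in goal probability if that area alone reached 1.0."""
    baseline = goal_probability(scores)
    gains = []
    for area in scores:
        improved = dict(scores, **{area: 1.0})
        gains.append((area, goal_probability(improved) - baseline))
    return sorted(gains, key=lambda item: item[1], reverse=True)


ranking = rank_improvements(L2_SCORES)
for area, gain in ranking:
    print(f"{area}: +{gain:.3f}")
```

Under these assumptions the lowest-scoring areas rank first, since closing the largest gap raises the product the most. A network with dependencies between nodes, or with evidence entered on some nodes, can produce a different order.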

FIGURE 8: EXAMPLE OF BAYESIAN NETWORK MODEL FOR PROCESS MATURITY IMPROVEMENTS

Two scenarios are presented below where the BN model discussed above could be utilized for deciding the most efficient steps for Process maturity improvement up to the respective target level. The two scenarios presented below are hypothetical and included here for demonstration purposes only. The scenarios may not represent the experience of the Operator.

1. BN Model – Scenario 1:
Problem: Using the data maturity results "Evaluation Area and Level" presented in FIGURE 7, prioritize improvements with the objective of meeting Level 2 requirements in all evaluation areas.

Solution: FIGURE 9 shows the BN model output for Scenario 1. According to the BN model results, improving the degree of achievement of Level 2 (L2) in evaluation area 5 (A5) will be most effective, followed by an improvement of the degree of achievement of Level 2 in evaluation area 3 (A3). The BN results for Scenario 1 also put in perspective the fact that focusing on the improvement of L2 in A3 will be almost twice as effective compared to focusing on efforts to improve either L2 in A2 or L2 in A4. The quantified effectiveness of each improvement is shown below.

FIGURE 9: EXAMPLE OF BAYESIAN NETWORK MODEL FOR PROCESS MATURITY IMPROVEMENTS - SCENARIO 1

2. BN Model – Scenario 2:
Problem: FIGURE 10 presents a similar problem objective as in Scenario 1, i.e., using the data maturity results "Evaluation Area and Level" presented in FIGURE 7, prioritize improvements with the objective of meeting Level 2 requirements in all evaluation areas, but with two additional pieces of evidence being available:
• Level 1 requirements have been met in all areas.
• Target Level 3 has NOT been achieved in evaluation area A3.

The second piece of evidence listed above is reflected in the node "L3 achieved in A3" having the probability of state False set to 100%.

FIGURE 10: EXAMPLE BAYESIAN NETWORK MODEL FOR PROCESS MATURITY IMPROVEMENTS (APPLYING EVIDENCE FOR SCENARIO 2)

Solution: FIGURE 11 shows the BN model output for Scenario 2. The BN results for Scenario 2 show that the presence of the two additional pieces of evidence has affected the order of most effective actions. Improving Level 2 (L2) achievement in area 3 (A3) is now higher in priority than improving L2 in A5. This reversal of priorities between Scenarios 1 and 2 illustrates how non-trivial the optimal prioritization problem is, where the solution may depend on details of any evidence that may become available. Generally, the BN model can update the overall recommendations considering any evidence or relevant information that may become available.

FIGURE 11: EXAMPLE OF BAYESIAN NETWORK MODEL FOR PROCESS MATURITY IMPROVEMENTS - SCENARIO 2

7. PERIODIC REASSESSMENT OF DATA QUALITY (FUTURE STEPS)
The effectiveness of process maturity and the effects of improvements in the process maturity could be measured by comparing the results of the baseline data quality assessment and the subsequent periodic data quality assessments. Periodic assessments are an essential part of the digital transformation approach as they can provide avenues for recalibration of Process maturity actions, and provide a reliable measure of data confidence. FIGURE 12 shows an overview of the digital transformation approach. The digital transformation approach is meant to be a continuous process where results are evaluated after every iteration to identify the most efficient next steps for an increase in Process maturity and data quality. An increase in Process maturity and data quality assessment scores could also lead to the evaluation of data using additional DQI/KPIs as per the priorities of an organization. The organization may also change its target maturity level as per its progress, objectives, and available resources.

FIGURE 12: OVERVIEW OF THE DIGITAL TRANSFORMATION APPROACH

Methods for improvements in data quality and processes are undoubtedly an important future step. However, their discussion lies beyond the scope of work presented in this paper.

8. CONCLUSION
Understanding an organization's data quality and Process maturity is necessary to manage impacts from financial to public safety. The digital transformation approach presented in this paper is an iterative and continuous improvement process. An essential component of this approach is understanding the characteristics of your data, which will allow you to evaluate the data quality and Process maturity. The BN model presented in this paper could be used to identify the most efficient efforts by learning from the outcomes of the periodic data quality assessment using DQI/KPIs. Pipeline operators are at varying levels of digital transformation to support data quality demands due to regulations and increased use of analytics. The approach presented in this paper can play a vital role in any operator's digital transformation process.

ACKNOWLEDGEMENTS
The work described herein is funded by Enbridge. The authors wish to thank Enbridge for their financial support and permission to publish. Betsy Kruse is recognized for her ongoing support between Dynamic Risk and Enbridge.

REFERENCES
[1] Kjærulff, U. B., and Madsen, A. L. Bayesian Networks and Influence Diagrams: A Guide to Construction and Analysis (2nd Edition), Springer, 2012

[2] BayesFusion, LLC provides artificial intelligence modeling and machine learning software based on Bayesian Networks: https://www.bayesfusion.com/
[3] ISO 8000-8 Data Quality – Part 8: Information and data quality: Concepts and measuring, 2015
[4] DNVGL-RP-0497 Data Quality Assessment Framework, January 2017
