
Proceedings of the 2022 14th International Pipeline Conference

IPC 2022
September 26-30, 2022, Calgary, Canada

IPC2022-87060

ESTIMATING MEASUREMENT PERFORMANCE WITH TRUNCATED DATA SETS

Jason Skow¹,*, Joseph W Krynicki², Alex Fraser¹, Gustavo Gonzalez³

¹ Integral Engineering, Edmonton, Alberta, CA
² ExxonMobil Research and Engineering, Spring, Texas, US
³ Esso Petroleum Company Limited, Hythe Terminal, Hampshire, UK

ABSTRACT

In September 2021, the API released the third edition of the 1163 Standard "In-line Inspection Systems Qualification". This edition brought many improvements over previous versions, including more detail in Section 8 "System Results Validation", which defines the methodologies used to validate ILI run tolerances. The standard describes three levels of validation, with 'Level 3' requiring the operator to calculate ILI tool measurement performance with real-world data measured in validation spools and at excavation sites. Real-world inspection data sets have some characteristics that make them difficult to use for accurately estimating measurement performance. One of these is 'truncation': data bounded by a lower or upper threshold beyond which no data are reported. For example, most UTCD ILI tools have a lower truncation level, such as 1 mm for crack height. This level represents a signal threshold below which measurements are either not reliable or not reported. Although small features below the reporting threshold exist on the pipeline, they are not normally reported by the ILI tool.

This paper describes a model to estimate ILI tool performance using API 1163 Level 3 methods when the data set has a lower-truncation threshold. The model is tested with simulation data to show how it responds over a wide range of feature population characteristics. It is then applied to two real field data sets. Comparisons are made between the truncation algorithm and the standard non-truncated version of the algorithm. These show where the new algorithm performs best and where it is most useful for implementing pipeline integrity mitigations. The model used in this study is consistent with the example documented in API 1163 Appendix C, the Bayesian inference method. The model produces measurement performance specifications that can be used as inputs to a pipeline risk or reliability analysis. Truncated data sets are common in the field of inspection and NDE (including thickness measurements), because they reflect the reality that there are features below the reporting threshold. This paper describes the steps required to format the results for use and to achieve more accurate measurement performance results (e.g., unity charts).

Keywords: truncation, measurement error, measurement validation, ILI, field validation, API 1163

ABBREVIATIONS
API     American Petroleum Institute
Bc      back-wall corrosion
CI      credible interval
FS      front-surface
FSc     front-side corrosion
HBM     hierarchical Bayesian model
ILI     in-line inspection
MFL     magnetic flux leakage
NDE     non-destructive evaluation
NDT     non-destructive testing
OLS     ordinary least squares
SNR     signal-to-noise ratio
TR      truncated regression
UT      ultrasonic testing
UTCD    ultrasonic crack detection

NOMENCLATURE
h            validation height
L            truncation threshold
N(0, σε²)    normal probability distribution with mean 0 and variance σε² (standard deviation σε)
X            exogenous variables, including education
Y            household income
y            measured height
α            y-axis intercept
β            slope of the line

* Corresponding author: [email protected]
Document version: Final, 1.0, 2022/04/23.

1 Copyright © 2022 by ASME


ε            y-axis measurement error
λ            likelihood function
Φ            normal cumulative distribution function
φ            normal density function
σε           standard deviation of measurement error
1. INTRODUCTION

In September 2021, the American Petroleum Institute (API) released the third edition of the 1163 Standard In-line Inspection Systems Qualification [1]. This edition brought improvements over previous versions, including more detail in Section 8 System Results Validation. Level 3 validation allows operators to calculate in-line inspection (ILI) tool run tolerance (measurement performance) with real-world data measured from validation spools and excavation sites, including data sets that exhibit truncation.

Truncation occurs in a data set when values below or above a threshold are excluded from the reported data set due to some physical or analytical limitation. Data can be upper-truncated, lower-truncated, or both.

In oil and gas integrity management, upper-truncation occurs when a measuring technology cannot produce a reliable measurement for large or deep flaws (e.g., pits) relative to the surrounding material thickness. Similarly, lower-truncation occurs when the signal threshold of a measurement technology is reached and the signal response is indistinguishable from noise. Lower-truncation can even result from operator choice. For example, a combination of measurement tools may be run, with one or more of them reporting only larger features, such as those with depths greater than 30% of wall thickness.

Figure 1 is an illustrative example of a data set with both upper- and lower-truncation. It is a constructed data set that we artificially truncate to show the effects of truncation, by comparing the full data set with the data set reported after truncation. The dotted green lines indicate the upper and lower truncation limits. The dark blue points are reported data, and the grey points above and below the thresholds are truncated data. Truncated data exist in reality but are not reported by the measurement technology, a shortcoming that must be accounted for when evaluating measurement performance.

FIGURE 1: TRUNCATED DATA EXAMPLE [Measurement (mm) vs. Validation Measurement (mm), 0 to 10 mm; legend: Truncated Data, Reported Data, OLS (Reported Data), OLS (All Data)]

The data set for Figure 1 is created using Equation 1 [2], where the slope, β, is equal to 1.0 and N(0, σε²) is a normal probability distribution with a mean of zero and variance, σε², equal to 1.25. The lower truncation limit is 2.5 mm and the upper truncation limit is 7.5 mm. The lines are fitted using the ordinary least squares (OLS) regression algorithm, which minimizes the sum of the squared vertical distances between the line and each observation.

$$
y = \alpha + h\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2) \qquad (1)
$$

The dark blue solid line is the best-fit regression line evaluated using all the data. Notice that the slope of the line is 1.0 and the line passes through the middle of the data set. The red dashed line is the best-fit regression line evaluated using only the reported data. Notice that the slope of this line is shallower, so the line overcalls small features while under-calling large features. Compared with the true fit, the red dashed line is rotated clockwise around the centroid of the data. This pattern, which is partly attributable to truncation, is often observed in ILI data sets when they are plotted on a unity plot against a validation measurement. The truncation algorithm provides a correction to the OLS method that recovers the true fit, even when data are truncated.

One of the most common examples where upper and lower truncation can occur is ultrasonic testing (UT) thickness measurement. Lower truncation may occur for a variety of reasons. Among the most common are the reporting threshold (e.g., 10% of thickness) and pit size. Small-diameter and shallow pits have a small impact on the interface echo used in thickness measurements. Similarly, in ultrasonic crack detection (UTCD), low-amplitude signals are often associated with noise in the data. They are filtered out because the signals are weak and their impact on integrity is negligible.

In some cases, lower truncation is a conscious decision. An example is increasing the arbitrary minimum reporting threshold for magnetic flux leakage (MFL) ILI data sets in order to focus on data that are more relevant to repairs. This is done without evaluating the potential impact on the overall performance of the technology.

FIGURE 2: TRUNCATION DUE TO INSPECTION TECHNOLOGY [(a) UT lower-truncation due to FSc; (b) UT upper-truncation due to Bc]

An example of truncated ultrasonic thickness data is shown in Figures 2a and 2b. Both represent an immersion ultrasonic thickness measurement, similar to the technique used for ultrasonic ILI tools. Front-side corrosion (FSc) is shown in Figure 2a and back-wall corrosion (Bc) is shown in Figure 2b. In both cases, the corrosion indications (FSc and Bc) are of small amplitude and are likely to be truncated. Figure 2a shows limited near-side wall loss and Figure 2b shows greater far-side wall loss. In both cases the truncation is due to the low signal-to-noise ratio (SNR) associated with the wall loss features

and because the wall loss signal can become contained within the larger front-surface (FS) echo. This type of truncation can occur for many types of ultrasonic measurements (both thickness and angle-beam flaw inspection), other non-destructive testing (NDT) methods, and applications beyond pipelines. For ultrasonic thickness measurements, truncation is less likely to occur on flat, parallel surfaces. However, loss of reliable signal measurements (i.e., truncation) is more likely when the wall loss is more severe, in terms of both remaining thickness and complex surface morphology.

In summary, truncation of UT thickness readings is less likely when the equipment has no degradation (e.g., smooth, flat and parallel surfaces). However, truncation becomes more likely as the level of corrosion and degradation increases (e.g., localized pitting, pin holes, and significant wall loss with a complex profile).

2. TRUNCATION ALGORITHM

In 1976, Hausman and Wise [3] published a study through the National Bureau of Economic Research on a social experiment called the New Jersey Income Maintenance Experiment. In this study, the researchers attempted to answer the question "will income maintenance programs which aid the working poor induce this group to work less? And if so, how much less?" Subjects were selected for the experiment based on their household income for the year prior to the study. To be accepted into the study, they had to have a household income less than 1.5× the 1967 poverty threshold during that year.

Once the experiment was running, there was no restriction on earnings. The experiment continued over several years, and incomes could increase while the subjects remained in the study. Because of the income cut-off used for subject selection, the data set was upper-truncated: subjects whose initial incomes were higher than the threshold were excluded from the study. In wider society these incomes existed, but they were truncated from the data set used for the study.

Consider the plot of earnings vs. education in Figure 3. The solid 'true line' represents the average relationship between education and earnings. The dots around the line represent the distribution around the mean for each level of education. The dashed 'estimated' line is the best-fit line when all data above threshold L are ignored. It underestimates the true effect of education because it ignores the data above the threshold.

To properly account for the effect of education on income, a correction to go from the dashed line to the solid line was needed. A key insight into the solution is that truncation introduces a correlation between the right-hand-side (explanatory) variables and the error, which leads to a predictable bias (Figure 4). For any given value of education, the distribution of earnings is truncated at L, where L depends on the level of education.

The authors developed an algorithm to counter the effects of upper truncation on a data set. First, assume a linear relationship between the household income, Yᵢ, and participant education, Xᵢ, as follows:

$$
Y_i = X_i\beta + \varepsilon_i \qquad (2)
$$

The density function of Yᵢ for a given value of Xᵢβ is zero



for values greater than the truncation threshold, L. For all other values, it is the ratio of the density function to the cumulative distribution function truncated at L:

$$
f(y_i) = \begin{cases} 0, & y_i > L_i \\ \dfrac{\varphi(y_i)}{\int_{-\infty}^{L_i} \varphi(y)\,dy}, & y_i \le L_i \end{cases} \qquad (3)
$$

FIGURE 3: NEW JERSEY INCOME EXPERIMENT (HAUSMAN, 1976) [Income vs. Education; legend: Experiment Data, Truncated Data, True Line, Estimated Line]

FIGURE 4: RIGHT TRUNCATED INCOME DISTRIBUTION [axis labels: Income, Error]

The distribution parameters in Equation 3 can be estimated by maximizing the likelihood function. This function is the product of the probability density functions evaluated at the observed values (Equation 4). The denominator in Equation 4 normalizes the truncated distribution: it divides the probability density in the numerator by the cumulative distribution function evaluated at the truncation threshold. The equation is solved by taking the negative logarithm of both sides and finding the minimum with an optimization algorithm. Suitable choices include Newton-Raphson [4], Nelder-Mead [5], or another appropriate method.

$$
\lambda = \prod_{i=1}^{N} \frac{\varphi(y_i)}{\Phi\left[(L_i - X_i\beta)/\sigma\right]} \qquad (4)
$$

The model is applied to the entire data set at once, because the truncation correction varies continuously along the x-axis. There is no need to bucket the data. However, 20 or more data points are recommended to prevent over-correction by the algorithm.

3. APPLICATIONS

3.1 Manufactured Cracks in a Spool

When pipeline operators measure features, they do not deliberately remove data. However, practical limitations of the inspection technologies, such as SNR, result in truncated data sets.

The data in Figure 5 are an example of pipeline data with lower-truncation on the y-axis. They are derived from an ILI tool run, with validation measurements from a test spool containing manufactured crack-like flaws of precisely known size. The blue shaded band represents the ILI measurement specification.

Notice there are no data points below 1 mm on the y-axis. Some of the features are small (their feature heights are below 1 mm), yet none are reported by the ILI tool as smaller than 1 mm. This is because the ILI data are lower-truncated along the y-axis at 1 mm. For true features smaller than 1 mm, only the oversized ones are reported. Features exist below this value but, in this case, they are not reported by the ILI tool.

The most common technical reason for measurement truncation is SNR, and all systems are subject to this limitation. The features on the spool were produced by a reliable fabrication process and measured with advanced, high-SNR ultrasonic techniques. For these reasons, all feature sizes on the spool are considered "true" and there is no truncation on the x-axis.

FIGURE 5: TRUNCATED REGRESSION WITH REAL DATA [ILI Height (mm) vs. Feature Height (mm), 0 to 6 mm; legend: Within ILI Tolerance, Outside ILI Tolerance, Best Fit Line (OLS), Best Fit Line (TR)]

The red dashed line shows an OLS line fitted to the data. The slope of the line is shallow, indicating that features larger than 1 mm are undersized by the ILI tool and small features are



oversized. This is not a fair representation of ILI tool performance, because truncated data that should be on the plot have been omitted. In many practical situations it is not possible to retrieve these points, and an alternative to OLS regression is needed.

Suppose the OLS regression line is used to predict non-destructive evaluation (NDE) size from the ILI reported size. The predictions would be overly conservative, possibly leading to more excavations, higher repair and maintenance costs, and overly conservative risk estimates. Clearly, a more accurate evaluation of measurement performance is valuable.

TABLE 1: SIMULATION CASES

Parameter    Values
α            −2 mm to 2 mm
β            0.9 to 1.1
σε           0.9 mm to 2.5 mm
L            0.3 mm to 2.0 mm
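A small simulation can illustrate the contrast between an OLS fit on reported data and a truncation-corrected fit. The Python below is a minimal sketch, not the authors' implementation; the paper's field analyses use a hierarchical Bayesian model [6]. It applies the maximum-likelihood idea from Section 2 to lower truncation at a single threshold L and assumes normally distributed error. Equation 4 divides each density by the probability of lying below an upper threshold. Here that denominator becomes the probability of lying above L, 1 − Φ((L − α − βhᵢ)/σ). The following are all illustrative assumptions:

- the parameter values (α = −0.5 mm, β = 1.0, σε = 1.0 mm, L = 1.0 mm, chosen inside the Table 1 ranges);
- the uniform 0 to 6 mm true-height population;
- the sample size and random seed;
- the hypothetical helper functions `ols`, `neg_log_lik`, `nelder_mead` and `truncated_regression`.

The compact Nelder-Mead routine keeps the example dependency-free.

```python
import math
import random


def ols(h, y):
    """Ordinary least squares fit of y = alpha + beta * h; returns (alpha, beta)."""
    n = len(h)
    mean_h, mean_y = sum(h) / n, sum(y) / n
    sxy = sum((a - mean_h) * (b - mean_y) for a, b in zip(h, y))
    sxx = sum((a - mean_h) ** 2 for a in h)
    beta = sxy / sxx
    return mean_y - beta * mean_h, beta


def neg_log_lik(params, h, y, L):
    """Negative log-likelihood of y = alpha + beta*h + eps, eps ~ N(0, sigma^2),
    when only points with y >= L are reported (constant 0.5*log(2*pi) dropped)."""
    alpha, beta, log_sigma = params
    sigma = math.exp(log_sigma)  # optimise log(sigma) so sigma stays positive
    total = 0.0
    for hi, yi in zip(h, y):
        mu = alpha + beta * hi
        z = (yi - mu) / sigma
        # Probability of being reported: 1 - Phi((L - mu) / sigma), via erfc
        p_rep = 0.5 * math.erfc((L - mu) / (sigma * math.sqrt(2.0)))
        if p_rep <= 0.0:
            return math.inf
        # -log[phi(z) / (sigma * p_rep)] without the constant term
        total += 0.5 * z * z + log_sigma + math.log(p_rep)
    return total


def nelder_mead(f, x0, step=0.5, max_iter=1000, tol=1e-9):
    """Compact Nelder-Mead minimiser: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    pts = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                        for i in range(n)]
    vals = [f(p) for p in pts]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: vals[k])
        pts, vals = [pts[k] for k in order], [vals[k] for k in order]
        if vals[-1] - vals[0] < tol:
            break
        cen = [sum(p[j] for p in pts[:-1]) / n for j in range(n)]  # centroid of best n
        worst = pts[-1]
        refl = [2 * c - w for c, w in zip(cen, worst)]
        f_r = f(refl)
        if f_r < vals[0]:
            expd = [3 * c - 2 * w for c, w in zip(cen, worst)]
            f_e = f(expd)
            pts[-1], vals[-1] = (expd, f_e) if f_e < f_r else (refl, f_r)
        elif f_r < vals[-2]:
            pts[-1], vals[-1] = refl, f_r
        else:
            con = [(c + w) / 2 for c, w in zip(cen, worst)]
            f_c = f(con)
            if f_c < vals[-1]:
                pts[-1], vals[-1] = con, f_c
            else:  # shrink every vertex halfway towards the best one
                best = pts[0]
                pts = [best] + [[(b + q) / 2 for b, q in zip(best, p)] for p in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    return pts[min(range(n + 1), key=lambda k: vals[k])]


def truncated_regression(h, y, L):
    """Maximum-likelihood (alpha, beta, sigma) for lower-truncated data, started from OLS."""
    a0, b0 = ols(h, y)
    s0 = math.sqrt(sum((yi - a0 - b0 * hi) ** 2 for hi, yi in zip(h, y)) / len(h))
    a, b, log_s = nelder_mead(lambda p: neg_log_lik(p, h, y, L), [a0, b0, math.log(s0)])
    return a, b, math.exp(log_s)


# Hypothetical population from Equation 1, with values inside the Table 1 ranges
random.seed(1)
alpha, beta, sigma, L = -0.5, 1.0, 1.0, 1.0
h_all = [random.uniform(0.0, 6.0) for _ in range(1500)]
y_all = [alpha + beta * hi + random.gauss(0.0, sigma) for hi in h_all]
reported = [(hi, yi) for hi, yi in zip(h_all, y_all) if yi >= L]  # lower truncation
h_rep = [hi for hi, _ in reported]
y_rep = [yi for _, yi in reported]

ols_alpha, ols_beta = ols(h_rep, y_rep)
tr_alpha, tr_beta, tr_sigma = truncated_regression(h_rep, y_rep, L)
print(f"reported {len(h_rep)} of {len(h_all)} points")
print(f"OLS (reported data): alpha={ols_alpha:.2f}, beta={ols_beta:.2f}")
print(f"Truncated regression: alpha={tr_alpha:.2f}, beta={tr_beta:.2f}, sigma={tr_sigma:.2f}")
```

In a case like this, OLS on the reported points should return a slope noticeably below the true value of 1.0. The truncated-likelihood estimate should land close to 1.0; the exact values depend on the seed. A tested optimizer, such as SciPy's `scipy.optimize.minimize` with `method="Nelder-Mead"`, would normally replace the hand-written routine.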
The dark blue solid line shows the result of the truncated regression algorithm, which removes the unintended error and bias in the red dashed line. It is calculated using the truncation algorithm described in the previous section. The slope is steeper, indicating a stronger relationship between flaw height and signal response, as expected. The fit is also more accurate.

3.2 In-Line Inspection and Field Digs

The data in Figures 6a and 6b are examples of ILI measurements validated with field excavation measurements. A hierarchical Bayesian model (HBM) that aligns with the requirements described in API 1163-2021 is used in place of a simple regression model; see [6] for details of the model. An advantage of this form of the model is that it can estimate a credible interval for performance, which is useful for scenario analysis and risk assessment. The model is also flexible enough to account for:

- field measurement error;
- different weighting of data points;
- prior information such as the vendor specification;
- differential signal response error with size (trumpeting).

Both figures display the same truncated data set. The regression algorithm in Figure 6a does not correct for truncation, whereas the regression algorithm in Figure 6b does. The light blue solid lines represent simulated regressions that could account for the observed data. The dark blue solid lines show the boundary of the 80% credible interval (CI), which can be used for failure pressure estimates and risk assessments. Notice that the slope of the regression lines in Figure 6a is shallower than in Figure 6b, where the lines are almost parallel to the 45° unity line. Although there is a mild conservative bias, the data in Figure 6b do not support increasing undersizing as flaw size increases.

The 80% CI is interpreted as the range within which a measurement is expected to fall with 80% probability. Consider an ILI measurement of 2.75 mm:

- with the TR algorithm, it corresponds to an NDE height between 2.6 mm and 5.1 mm 80% of the time;
- with the OLS algorithm, it corresponds to an NDE height between 2.5 mm and 5.8 mm 80% of the time.

The OLS interval is therefore almost 1 mm wider. Accounting for truncation increases the slope from approximately 0.5 to approximately 1, so the predicted sizing performance becomes accurate and no excessive sizing tolerance is added to the reported ILI features. This more accurate determination of ILI sizing performance results in more accurate and efficient repair plans and more accurate risk assessments.

FIGURE 6: BEST-FIT REGRESSION LINES WITH ILI DATA [ILI Height (mm) vs. NDE Height (mm), points within and outside the 80.0% CI; (a) ordinary least squares (OLS), 80% CI at an ILI height of 2.75 mm spans 2.5 mm to 5.8 mm; (b) truncated regression (TR), 80% CI spans 2.6 mm to 5.1 mm]

4. PARAMETER RECOVERY & DISCUSSION

The previous sections describe the effect of the truncated regression algorithm on specific data sets. To ensure the algorithm functions as expected across a range of possible population characteristics, a set of simulations is run. Each simulation:

- defines a population of flaws;
- truncates the population;
- runs the truncation algorithm;
- evaluates whether the intercept and slope can be recovered from the partial data set.

Various parameter ranges are used to test the algorithm (Table 1). For each combination of parameters, a population of flaws is created using Equation 1. Flaws whose measured height falls below the truncation limit (L) are removed from the data set to reproduce the real-world truncation effect. The slope and intercept of this data set are then estimated with both the OLS and truncated regression (TR) methods, to determine how effectively each recovers the original parameters. This was repeated 100 times for each parameter combination. The estimates were then averaged across the simulations to evaluate the models' performance. Figure 8a shows one example of such a simulation.

Figure 7a shows a boxplot summary comparing how well the TR and OLS methods recover the true slope. The parameter combinations are grouped in two ways: by intercept (α) less than or equal to 0 versus greater than 0, and by random error (σε) below 1.75 (Low Error, red bars) versus above 1.75 (High Error, blue bars). When a model performs perfectly, the ratio of the estimated slope to the true slope is 1.0.

The figure shows several trends:

- Where a significant portion of the data is truncated (α ≤ 0), the TR method performs significantly better than OLS at recovering the true slope.
- With significant random error, the performance of the TR method is reduced, but it still provides a better estimate than OLS.
- Where truncation does not significantly affect the data set (α > 0), the TR method collapses to OLS and the performance of the two models converges.

Interestingly, both methods begin to perform poorly for high values of α because of an effect called censoring. The maximum wall thickness used for simulating these examples is 10 mm. When the intercept is high, several of the measured values are censored at this 10 mm maximum; Figure 8b shows an example case in which this occurs. Censoring can result from a physical limit such as wall thickness, as in this example. More realistically, it can also occur when the measurement signal becomes saturated and the tool cannot distinguish sizes above a threshold. Censoring is a different phenomenon from truncation. The authors of this paper have addressed it, and that work will be published in future.

FIGURE 7: PARAMETER RECOVERY [(a) slope: ratio of estimated slope to true slope; (b) intercept: difference between estimated intercept and true intercept; TR and OLS results grouped by intercept > 0 and intercept ≤ 0, for Low Error and High Error cases]

FIGURE 8: PARAMETER RECOVERY SIMULATIONS [Measured Height (mm) vs. True Height (mm), 0 to 10 mm; legend: Fit Data, Truncated Data, True Line, Best Fit Line (OLS), Best Fit Line (TR); (a) example simulation with lower truncation; (b) example simulation with upper censoring]

Figure 7b is the corresponding plot to Figure 7a, but shows how well the models recover the intercept for the same range of parameters. In this plot, the x-axis is the difference between the estimated intercept and the true intercept, so a perfect model

yields zero. The plot shows a trend similar to the slope recovery, with the TR method performing very well for simulations in which truncation removes significant portions of the data set. The difference in performance between TR and OLS is more drastic for intercept recovery. OLS performs very poorly, particularly for the high-error, low-intercept cases.

5. CONCLUSION

Truncation is a phenomenon that exists in data sets measured with inspection tools. The purpose of data gathering is often to predict reliability, risk, measurement performance, and maintenance requirements. Without considering truncation, these estimates can be overly conservative, leading to inefficient use of the time and resources needed to maintain safety and reliability. The algorithms described in this paper can help account for truncation and increase the accuracy of model predictions.

Testing the TR method across a wide range of parameters shows that it performs very well when the data have been truncated, and reverts to performance similar to OLS for data sets with no truncation. In future work, additional testing will be performed to determine the performance of the TR method in these cases:

1. Cases in which the measurement error (ε) is not normally distributed. Performance is expected to be worse, because normality is an assumption of the model. However, the authors plan to address this assumption, and the results may be published in future work.
2. Cases that more realistically represent ILI validation dig data. Verification data sets are typically clustered in the bottom left of the unity plot, because larger defects are much less common. Operators can address this with a well-designed verification spool program (see Figure 5). Real data sets also typically contain fewer flaws. Compared with verification digs, validation spools can provide results more quickly, over a broader range of flaw sizes, and at lower cost [7].

REFERENCES
[1] API. "API 1163: In-line Inspection Systems Qualification." Standard API 1163-2021. American Petroleum Institute. 2021.
[2] Fuller, Wayne A. Measurement Error Models. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc.
[3] Hausman, Jerry A and Wise, David A. "The Evaluation of Results from Truncated Samples: The New Jersey Income Maintenance Experiment." Vol. 5 No. 4: p. 26.
[4] Al-Khafaji, Amir Wadi and Tooley, John R. Numerical Methods in Engineering Practice, 1st ed. Harcourt Brace Jovanovich, Inc. (1986).
[5] Mathews, John H. and Fink, Curtis D. Numerical Methods Using Matlab, 4th ed. Prentice-Hall Inc.
[6] Skow, Jason, Krynicki, Joseph W and Peng, Lujian. "Manufactured Cracks in Pipe Used to Evaluate ILI Measurement Performance." IPC2020-9400: p. 14. ASME.
[7] Krynicki, Joseph W, Peng, Lujian, Gonzalez, Gustavo and Thirumalai, Neeraj. "Use of Synthetic Flaws to Assess Pipeline Seam Weld Inspection Performance." PVP2021-61294. 2021. ASME.
