There is rapidly expanding interest in the application of statistical analysis in the field of integrity management, particularly with respect to the use of data from, and for the planning of, inspections. Integrity practitioners are increasingly seeing the benefits of the added insight that statistical analysis brings to both efficiency and effectiveness. The language used in integrity management is starting to reflect this, with increasing references to terms such as Data Science and Predictive Analytics. As with many other fields, integrity management is beginning a transformation that is essential to realising the true value of data- and analysis-driven decision support. Part of this transformation is in the skills and disciplines needed in integrity teams. There is clearly a growing role for Statisticians and Data Scientists. At the same time, existing integrity practitioners, including those in inspection and corrosion disciplines, can benefit from building an understanding of key elements of the new approaches. A challenge to widening uptake lies in making key concepts, which have direct practical relevance, readily understandable. We are putting together a series of short articles, under the title Analysis Matters, to help with this; the second of these is below.

# Analysis Matters: What's your bias?

*By: Mark Stone BSc Eng (Mech) PhD, December 2020*

One of the important applications of inspection is to identify areas with greatest degradation so that actions can be planned to avoid failures. Non-destructive testing (NDT) methods are used to provide a quantitative measure of wall thickness or wall loss. The focus for immediate action and decisions on future requirements will be on areas identified as having the smallest margins on allowable thickness, the highest corrosion growth rates and/or the shortest estimated remaining life. These parameters are typically calculated directly from the measurements with no consideration of measurement error or uncertainty.

In practice, thickness or wall loss measurements include errors [1]. Integrity management decision making can be negatively impacted unless due consideration is given to the nature and impact of these errors.

As mentioned above, the drivers for integrity decisions are often the “worst cases” as determined from the measurements. It is therefore useful to consider how measurement error affects these cases and what the impact on the resulting decisions might be. A useful starting point is a hypothetical situation in which we have a series of degradation features of identical depth, as represented by the true wall loss maps for 10 identical features shown below.

Figure 1: Degradation features all with identical true wall loss

In a perfect world, inspection over these 10 features would reveal that each has exactly the same depth. However, in reality our inspection system will have some random error component and each feature will be reported as having a different depth. We will then get a range of reported depths. Since the true depth is the same for every feature, if we sequence the reported depths from minimum to maximum that will also define the sequence of measurement error from numerical minimum to maximum. The reported maximum depth will inevitably be that associated with the maximum of this sequence.

As such, in cases where there is no systematic error and there are a reasonable number of features, the feature reported as deepest will be that with the largest oversizing error. Note that this means the behaviour of the errors for the deepest reported feature does not match the behaviour of errors for the inspection system. For the inspection system the mean error will be zero (since there is no systematic error). However, we can expect the error for the feature reported as deepest to be positive, i.e. there is some inherent positive bias. This behaviour is a statistical property that is readily demonstrated at a conceptual level as described above, and it can also be proved formally. It is very easy to explore for oneself and reach the same conclusions by developing a spreadsheet in which one looks at the maximum of a number of cells whose values are random variates from any distribution symmetric about zero. Please feel free to get in touch if you have any questions on how to do this.
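The spreadsheet experiment suggested above can equally be sketched in a few lines of Python (an illustration, not part of the original article): draw a set of zero-mean errors, keep the maximum, and repeat many times to estimate the bias.

```python
import random
import statistics

def max_error_bias(n_features, trials=20_000, sigma=1.0, seed=1):
    """Mean error of the feature reported as deepest, when all true
    depths are identical and errors are independent N(0, sigma)."""
    rng = random.Random(seed)
    maxima = [max(rng.gauss(0.0, sigma) for _ in range(n_features))
              for _ in range(trials)]
    return statistics.mean(maxima)

# A single measurement is unbiased: mean error is near zero
print(round(max_error_bias(1), 1))
# The maximum over 10 features is not: mean error is ~1.5 sigma
print(round(max_error_bias(10), 1))
```

The same behaviour appears for any error distribution symmetric about zero, not just the normal used here.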

The difference in behaviour of the errors is illustrated in Figure 2, which considers the case of a normal distribution of measurement error (with standard deviation designated σ_{m}) and the error in the maximum measured value over N features with identical true depths. Further details on how to calculate the distributions are provided in Note 1 at the end of the article. The measurement system error is represented by the N=1 case. Note that the errors are plotted as probabilities of exceedance. As there is no systematic error, the N=1 distribution is symmetric about its mean of zero, which sits at the 0.5 (i.e. 50%) probability level.

Figure 2: Errors for reported maximum depths

It is evident that, as per the preceding explanation, the errors are biased towards positive values when considering a number of features, and that the extent of this biasing increases with the number of features. In the case of 10 features, we would expect (in the sense of the 50% probability level) the error in the maximum reported depth to be 1.5σ_{m}. The probability that the error exceeds 1σ_{m} is 80% and the probability that it exceeds 2σ_{m} is 20%. It is also evident that the probability that the error will be negative, i.e. that the reported maximum depth is undersized, is very small. The extent of biasing is even greater for 100 and 1000 features.
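The probabilities quoted above can be reproduced exactly from the order-statistics result described in Note 1, namely that the cumulative probability of the maximum of N independent errors is G(x)^N. A short Python check (illustrative only):

```python
import math

def normal_cdf(x):
    """Standard normal cumulative probability."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_max_error_exceeds(k_sigma, n_features):
    """P(error of the feature reported as deepest > k*sigma_m), for
    identical true depths and independent N(0, sigma_m) errors."""
    return 1.0 - normal_cdf(k_sigma) ** n_features

# The N = 10 values quoted in the text:
print(round(p_max_error_exceeds(1.0, 10), 2))  # 0.82, i.e. the "80%"
print(round(p_max_error_exceeds(2.0, 10), 2))  # 0.21, i.e. the "20%"
```

Setting the exceedance probability to 0.5 and solving for k recovers the quoted median error of approximately 1.5σ_{m} for N=10.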

Biasing as described above has a substantive impact on results used for integrity decision making yet still appears not to be widely considered by integrity practitioners. This article highlights the nature of biasing that can occur in practical situations and how this can drive up the cost of integrity management when ignored. It emphasises the benefits of including analysis of biasing when evaluating inspection results in support of integrity decision making.

Clearly it would be unusual in practice to have situations, as in Figure 2, where all degradation features have the same depth. More typically, degradation features in an item or system will have a range of depths. Biasing of the errors for features reported as deepest remains inherent, however, with there being a tendency for the features reported as deepest to be those with the largest oversizing errors. The extent of the biasing depends on the number of features as well as the nature of their depth distribution. For the purposes of illustration here we consider the results for exponential and gamma distributions as these are representative of wall loss distributions encountered in practice. Some further background on the distributions used in the illustrative examples that follow is provided in Note 2 at the end of the article.

Figure 3 shows the distributions of errors in the case of 100 features which follow an exponential distribution. It considers four different maximum depth cases, with increasing depth-to-error ratios, i.e. true maximum depths of 1σ_{m}, 5σ_{m}, 10σ_{m} and 15σ_{m}. Note that the distributions here are, as in Figure 2, shown as probabilities of exceedance. It is evident that there is biasing towards oversizing in all cases but that the biasing tendency is less than shown in Figure 2. The biasing reduces as the maximum depth increases relative to the system measurement standard deviation. For example, in the case where the maximum depth is 15σ_{m}, the biasing is quite small, e.g. the probability that the reported maximum is oversized, i.e. >0, is only 56%. However, in the case where the maximum depth is 5σ_{m}, there is a 90% probability that the maximum is oversized and approximately a 52% probability that the oversizing is more than 1σ_{m}. Figure 4 shows the same situation as Figure 3 but now considering 1000 features. It is evident that, as one would expect, the extent of biasing increases with the number of features measured.

Figure 3: Errors in reported maximum depth for 100 features with exponential distribution of depth

Figure 4: Errors in reported maximum depth for 1000 features with exponential distribution of depth
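The trend in Figures 3 and 4 can be cross-checked with a short Monte Carlo sketch. The depth profile below is a fixed exponential quantile profile in the spirit of Note 2, but the plotting positions and scaling used here are assumptions, so the exact probabilities will differ from the article's figures; the reduction in biasing as the true maximum depth grows relative to σ_{m} is nonetheless reproduced.

```python
import math
import random

def p_max_oversized(true_max_sigma, n_features=100, trials=5000, seed=7):
    """P(reported maximum depth > true maximum depth) when depths follow
    a fixed exponential quantile profile scaled so the deepest feature
    equals true_max_sigma (depths and errors in units of sigma_m)."""
    # Assumed plotting positions p_i = i/(n+1); depths at those quantiles
    raw = [-math.log(1.0 - (i + 1) / (n_features + 1))
           for i in range(n_features)]
    scale = true_max_sigma / raw[-1]
    depths = [scale * r for r in raw]
    true_max = depths[-1]

    rng = random.Random(seed)
    hits = sum(max(d + rng.gauss(0.0, 1.0) for d in depths) > true_max
               for _ in range(trials))
    return hits / trials

# Biasing weakens as the true maximum grows relative to sigma_m
for k in (1.0, 5.0, 15.0):
    print(k, p_max_oversized(k))
```

Swapping the exponential quantiles for gamma quantiles (lighter tail, so more features crowd near the maximum) strengthens the biasing, as Figures 5 and 6 show.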

Figures 5 and 6 show the distributions of error in reported maximum depth for 100 and 1000 features respectively, but now considering a gamma distribution of wall loss rather than exponential. Gamma distributions accommodate a faster drop-off in the tail relative to exponential distributions and are often used to model so-called “light-tailed” behaviour. In practice, feature wall loss distributions are quite frequently observed to be light-tailed and are well represented by a gamma distribution (or sometimes other light-tailed distributions such as Weibull). The results in Figures 5 and 6 indicate much stronger biasing towards oversizing than for the corresponding cases shown in Figures 3 and 4. For example, in the case of 100 features with a true maximum depth of 5σ_{m}, the probability that the maximum is oversized approaches 100%, and there is approximately a 96% probability that the oversizing is more than 1σ_{m}. The biasing is stronger still for the case of 1000 features.

Figure 5: Errors in reported maximum depth for 100 features with gamma distribution of depth

Figure 6: Errors in reported maximum depth for 1000 features with gamma distribution of depth

These results indicate that there is potential for significant biasing towards oversizing of the features reported as deepest. A key question for integrity practitioners will of course be how this relates to practical applications. It is useful at this point to consider a few simple but representative examples based on the results shown above.

**Example 1**: An ILI run reports 1000 corrosion features and analysis indicates their depths follow approximately an exponential distribution. The ILI tool has a stated 80% tolerance on depth sizing of 12.5% of wall thickness (WT). This corresponds to a tool measurement standard deviation (σ_{m}) of approximately 10%WT. The feature listed in the results as deepest has a reported depth of 60%WT. As such the reported maximum depth is 6σ_{m} and as a starting estimate, bearing in mind the potential biasing, we can say the true maximum depth might be something around 5σ_{m}. Hence, for the purpose of illustration, we can reasonably refer to the orange line in Figure 4. This is replicated in Figure 7 below for convenience.
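The conversion from the quoted 80% tolerance to a measurement standard deviation assumes normally distributed sizing errors. As an illustrative check using only the Python standard library:

```python
from statistics import NormalDist

# An 80% tolerance of +/-12.5%WT means the sizing error lies within
# +/-12.5%WT with 80% probability, so 12.5 = z * sigma_m where z is the
# 90th percentile of the standard normal (10% in each tail).
tolerance_pct_wt = 12.5
z = NormalDist().inv_cdf(0.90)        # ~1.28
sigma_m = tolerance_pct_wt / z
print(round(sigma_m, 1))              # ~9.8 %WT, i.e. approximately 10%WT
```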

Figure 7: Estimated distribution of errors for reported maximum depth for Example 1

Point B in Figure 7 indicates that the probability that the reported maximum depth is oversized is approximately 98%, i.e. there is only a 2% probability that the reported value is less than the true value. Point A indicates that the expected, in the sense of 50% probability, situation would be that the deepest reported feature is oversized by approximately 1.6σ_{m}, i.e. given a reported maximum of 60%WT we would expect that feature to actually have a depth of 60%-16%=44%WT. This is substantially less than the reported value.

It was, and largely remains, common practice when coming to a view on fitness for purpose margins to add the tool tolerance to the maximum reported depth. Doing this here, the value used for fitness for purpose assessment would be 60% + 12.5% = 72.5%WT. Given the expected true depth of 44%WT, it is most probable that the value used in the assessment represents an oversizing of 28.5%WT. Here the conclusion, based on the reported result plus tolerance, might be that the feature requires repair or even that the condition of the line means replacement will be necessary sometime soon. Clearly, using the measurements without an understanding of the biasing effect means significant costs could be incurred in premature and unnecessary follow-up action.

This example is for illustration only, but there is substantial evidence that a lack of understanding of the inherent biasing effect has materially impacted the costs of operation of pipelines for many operators. Fortunately, in the case of pipelines, and largely in response to the publications of Huyse and co-authors, e.g. [2] and [3], there has been increasing recognition of the problem. Changes in practice to consider biasing are gaining traction, while improvements in inspection tool performance have been made in parallel.

**Example 2**: Manual ultrasonic inspection is carried out on 100 corrosion monitoring locations (CMLs) on a line on a production facility. The inspection system has been determined to have a thickness measurement error standard deviation of approximately 0.8 mm for field deployment. The maximum depth of corrosion across the 100 CMLs is reported as 4 mm, i.e. 5σ_{m}. This exceeds the corrosion allowance for the line which is stated as 3.2 mm. Analysis of the dataset indicates that the depth distribution is considerably lighter tailed than exponential and likely follows a gamma distribution (which for the purposes of illustration has the same parameters as those considered in Figure 5).

We can use the orange (5σ_{m}) curve in Figure 5 as a basis for estimates. This is replicated in Figure 8 below for convenience.

Figure 8: Estimated distribution of errors for reported maximum depth for Example 2

The point marked A in Figure 8 indicates an expected (50% probability) oversizing of approximately 2.1σ_{m} for the location reported as deepest. This corresponds to an oversizing of 2.1 x 0.8 = 1.7 mm. Hence, we would expect the depth associated with the location reported as deepest to be 4 - 1.7 = 2.3 mm. Integrity decisions based on the measured value of 4 mm, which exceeds the corrosion allowance, would likely incur unnecessary additional cost compared to the case where consideration of biasing gives one an improved basis for evaluation of the likely true maximum depth for the feature reported as deepest.

Note that here we can estimate the probability that this depth exceeds the corrosion allowance as approximately 4% (since, as indicated by point B in Figure 8, there is a 96% probability that the error will be larger than 4 – 3.2 = 0.8 mm = 1σ_{m}). This emphasises that integrity decisions made directly on the basis of the reported result, without consideration of the biasing effect, would very likely be inappropriate for the true situation.
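The arithmetic in Example 2 can be laid out as follows; the two probability values are simply read off Figure 8, and everything else follows from the stated numbers:

```python
sigma_m = 0.8          # mm, measurement error standard deviation
reported_max = 4.0     # mm, deepest reported corrosion across 100 CMLs
allowance = 3.2        # mm, corrosion allowance for the line

# Values read from Figure 8 (gamma depth profile, true max ~5 sigma_m)
median_oversizing = 2.1       # point A: 50% probability level, in sigma_m
p_error_above_1sigma = 0.96   # point B: P(error > 1 sigma_m)

expected_true_max = reported_max - median_oversizing * sigma_m
print(round(expected_true_max, 2))    # ~2.3 mm, below the 3.2 mm allowance

# True depth exceeds the allowance only if the error is smaller than
# reported_max - allowance = 0.8 mm = 1 sigma_m
p_exceeds_allowance = 1.0 - p_error_above_1sigma
print(round(p_exceeds_allowance, 2))  # ~0.04
```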

The above examples consider inspection data sets relating to current condition. In many cases there is also a need to estimate degradation growth rates by comparison with data from previous inspections. The biasing effect is present in the comparison too, and there is a tendency for the highest reported growth values to be those with the largest errors. These effects are often substantial when comparing repeat data because the standard deviation of error is typically a larger fraction of the change in depth than of the depth itself. Note that the “amplification of error” discussed in a previous article [1] also plays a key role here. In practice, inspection intervals or other follow-up activities that affect the cost of integrity management, e.g. additional inspection, are often directly defined by the highest reported values of degradation growth rates. This tends to drive up costs unnecessarily, yet remains prevalent in current integrity practice.
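The amplification of error when comparing repeat inspections is easy to demonstrate: the apparent change in depth carries the errors of both measurements, so for independent errors its error standard deviation is √2 ≈ 1.41 times σ_{m}. An illustrative simulation:

```python
import random
import statistics

def growth_error_sd(sigma_m=1.0, trials=20_000, seed=3):
    """Standard deviation of the error in an apparent depth change
    computed from two independent measurements, each with error
    N(0, sigma_m).  Expect sigma_m * sqrt(2)."""
    rng = random.Random(seed)
    diffs = [rng.gauss(0.0, sigma_m) - rng.gauss(0.0, sigma_m)
             for _ in range(trials)]
    return statistics.stdev(diffs)

print(round(growth_error_sd(), 2))   # ~1.41, i.e. sqrt(2) * sigma_m
```

Because the change in depth between inspections is usually much smaller than the depth itself, this enlarged error is often comparable to, or larger than, the true growth being estimated.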

Continuation of practice in which biasing is not acknowledged can be defended on the basis that it tends to be more conservative. This is true, but is in effect making an argument for incurring additional cost of integrity management without understanding the incremental margins introduced or justifying these. It is unlikely the same decisions would be arrived at or the associated costs seen as justifiable if due consideration is given to the biasing effect.

Understanding the potential for biasing and gaining insights into its effects through analysis can substantially improve the efficiency of integrity management. There are a number of simple steps that integrity practitioners can take:

- Learn to recognise situations in which biasing is likely to be at a level where it can materially impact on decisions, e.g.
  - Where there are a large number of locations/features measured
  - Where the inspection system errors are likely to be a reasonable proportion of the true depth of degradation
  - When doing comparisons of repeat inspections

The series of papers, of which [2] and [3] are examples, by Huyse and co-authors provide an excellent basis for building an understanding of biasing, particularly in the way it affects the results of in-line inspection of pipelines. Biasing is also considered in Section 5.2 of the HOIS Recommended Practice for Statistical Analysis of Inspection Data [4].

- Ensure that appropriate analysis is carried out where it appears biasing has the potential to affect decisions. In general, simple analytical solutions are not available but results are relatively straightforward to obtain by statistical simulation. If you would like further information on the types of analysis methods applicable, please feel free to get in touch.

- Introduce corrosion growth analysis methods which do not rely on feature to feature, or location to location, comparison. The HOIS Guidelines on More Effective Pipework Inspection [5] provide a good overview of such methods.

- Ensure the measurement performance of the inspection systems used is sufficiently quantified. This is fundamental to fully realising the benefits offered to integrity management by advanced analytics, particularly as inspection systems are giving access to increasingly large datasets. The metrics used should reflect field performance as far as possible, hence the approach to testing performance is relevant.

- Consider using measurement systems with improved performance when analysis indicates that this will make a material difference. Keep in mind that as inspection technologies are advancing, so the cost of access to systems with enhanced performance is reducing.

- More rigorously assess the need for follow up activity such as more detailed inspection or verification of individual locations of concern. In traditional practice, where biasing is ignored, there is a strong drive to follow up with additional inspection at locations of concern. Where the conditions are such that the biasing effect is substantial, much of this activity turns out to be unnecessary. Analysis of the biasing effect provides an understanding of the need for, and likely value of, follow up inspection. It provides a sound justification for verification in some cases and a clear basis for saving the cost of verification in others. The results of analysis also help in understanding questions such as the probability that the depth of the feature reported as deepest is within a certain margin of the true maximum depth across all features. This can drive better informed decisions on the number of features selected for additional inspection.

In summary, this article highlights how biasing towards oversizing of the worst reported damage arises and shows how this can contribute to integrity decisions that incur considerable unnecessary cost. Taking steps to account for biasing and its potential effects can substantially enhance efficiencies in integrity management. This is another example of where analysis acts as a very powerful tool for improved decision support.

**References**

1. Stone, M. “Analysis Matters 1: Errors Amplified”, ESR Technology.
2. Huyse, L. and van Roodselaar, A. “Effects of inline inspection sizing uncertainties on the accuracy of the largest features and corrosion rate statistics”, Proceedings of the 8th International Pipeline Conference, pp 403-413, 2010.
3. Dann, M. and Huyse, L. “The effect of inspection sizing uncertainty on the maximum corrosion growth in pipelines”, Structural Safety, Vol 70, pp 71-81, 2018.
4. HOIS Recommended Practice for Statistical Analysis of Inspection Data, HOIS(12)R8, 2013.
5. HOIS Guidance for More Effective Pipework Inspection, HOIS-G-010, 2018.

**Note 1: Distributions of maxima**

Figure 2 considers the distribution of the maximum value of samples with N elements from a normal distribution (with a mean of zero). This can be considered as the n^{th} ranked value, with n=N, in the sample after sorting from minimum to maximum, whereas the minimum value would be the 1^{st} ranked value. The distribution of values at different ranks is addressed in the field of Order Statistics. A key result in Order Statistics is that the cumulative probability for the maximum of a sample of size n follows a Beta distribution which, for the maximum (i.e. the n^{th} rank), reduces to

F_{max}(x) = [G(x)]^{n}

where *G*(*x*) is the cumulative probability for *x*, e.g. here for the normal distribution defining the behaviour of the measurement error.
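The Beta connection can be checked numerically: applying G to any continuous variable gives a value uniform on (0, 1), so G evaluated at the sample maximum is the maximum of n uniforms, which follows a Beta(n, 1) distribution with mean n/(n+1). A quick simulation (illustrative only):

```python
import random
import statistics

# G(X) is uniform on (0, 1) for any continuous X (the probability
# integral transform), so G at the sample maximum is the maximum of n
# uniforms: Beta(n, 1) distributed, with mean n/(n + 1).
n, trials = 10, 20_000
rng = random.Random(0)
g_at_max = [max(rng.random() for _ in range(n)) for _ in range(trials)]
print(round(statistics.mean(g_at_max), 3))   # close to n/(n+1) = 10/11
```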

**Note 2: Depth distributions considered**

The results presented consider that the depths follow exactly the exponential and gamma distributions shown in Figures 9 and 10 below. Note that the depth values are not random variates from these distributions but are determined directly from the sequence of probability levels in each case. The value of the scale parameter for the gamma distribution is 0.005 times the value of the shape parameter.
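For the exponential case such a fixed, quantile-based depth profile can be generated directly; the i/(n+1) probability levels used here are an assumed convention, and the gamma case would need a numerical quantile function (e.g. scipy.stats.gamma.ppf).

```python
import math

def exponential_quantile_depths(n, scale=1.0):
    """Fixed depth profile at the probability levels p_i = i/(n+1) of an
    exponential distribution; the depths are deterministic quantiles,
    not random variates."""
    return [-scale * math.log(1.0 - (i + 1) / (n + 1)) for i in range(n)]

depths = exponential_quantile_depths(100)
# The deepest feature sits at the p = 100/101 level, i.e. ln(101) * scale
print(round(max(depths), 2))
```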

Figure 9: Depth distributions considered in cases with 100 features

Figure 10: Depth distributions considered in cases with 1000 features