To the Editor:
Patriarca et al. (1) recently presented in this journal “a working example” for the calculation of measurement uncertainty (lead in blood) according to the rules of the ISO Guide to the Expression of Uncertainty in Measurement (GUM) (2). Although the authors presented many estimates of imprecision and trueness, they failed to combine the estimates correctly. They estimated the overall long-term SD from quality-control samples (SD_{control}) and from human samples (SD_{human}) and combined them by taking the “square root of the sum of the squares”. Instead, they should have used the pooled SD of both estimates, to be calculated as:
\[ SD_{pooled} = \sqrt{\frac{(n_{control} - 1)\,SD_{control}^{2} + (n_{human} - 1)\,SD_{human}^{2}}{n_{control} + n_{human} - 2}} \]
with n the number of samples. They presented four estimates of trueness, i.e., measurement of certified reference materials (CRMs), recovery, and results from two external quality assessment surveys (EQAs), and combined them into one mean index (R_{m}), disregarding the fact that the estimates conflicted with each other and depended on the concentration of lead.
Generally, one should not use second-choice (recovery) or third-choice estimates (EQA results with poorly defined target values) for the uncertainty calculation when one has available first-choice estimates (comparison with CRMs). Consequently, from the data presented, we would conclude that the method is bias-free in the high concentration range (∼130 μg/L), has a considerable bias in the mid concentration range (∼40 μg/L), and has an unknown bias in the low range (less than ∼30 μg/L). Moreover, we question their approach of including a bias in an uncertainty calculation [square root of the sum of the squares of imprecision and trueness components; note that for estimation of the trueness component they used a rather uncommon procedure, as described by Barwick and Ellison (3)]. Although many believe this is the approach recommended by GUM for treating a bias, it is not. GUM encourages the analyst to search for the cause of a bias and to correct it. This is what the authors should have done.
As long as the cause of the bias remains unknown, it may be prudent to report results in the low concentration range as, for example, <25 μg/L. In addition, if a bias is considered small compared with the overall uncertainty, it may simply be neglected. Furthermore, in exceptional cases (“the small letters of GUM”), a bias may be included in the expanded uncertainty (U) as U + bias [point F.2.4.5 on page 57 of GUM (2), and point 2.5.8 (treatment of uncorrected bias) in the NIST/SEMATECH e-Handbook of Statistical Methods (4)].
We note that, in contrast to the GUM, the statistical literature has a tradition of squaring the bias (5), e.g., for calculating the root mean squared error (RMSE) used to rank competing statistical estimation procedures (RMSE = \sqrt{SD^{2} + bias^{2}}).
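As an illustration of how the RMSE combines an estimator's SD and bias in quadrature (RMSE² = SD² + bias²) and how it might be used to rank two procedures, a minimal Python sketch follows. The function name and all numerical values are hypothetical and are not taken from the article under discussion.

```python
import math

def rmse(sd, bias):
    """Root mean squared error: SD and bias combined in quadrature."""
    return math.sqrt(sd ** 2 + bias ** 2)

# Hypothetical competing procedures, given as (SD, bias) in arbitrary units.
procedures = {
    "A (unbiased, less precise)": (5.0, 0.0),
    "B (biased, more precise)": (3.0, 3.0),
}

# Rank the procedures from smallest to largest RMSE.
ranked = sorted(procedures, key=lambda name: rmse(*procedures[name]))
for name in ranked:
    print(name, round(rmse(*procedures[name]), 3))
```

With these made-up numbers, the biased but more precise procedure ranks first, which shows why squaring the bias is useful for comparing estimators even though it is a different task from stating the uncertainty of a single result.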
Footnotes

Editor’s Note: The erratum mentioned in this reply appears on page 281.
 © 2005 The American Association for Clinical Chemistry
The authors of the article cited above respond:
To the Editor:
We would like to thank Stöckl et al. for their interest in our work (1) and provide some comments to clarify the points they have raised.
The description of how the SDs of long-term precision experiments were averaged was misleading. We actually applied the rule for averaging variances, as indicated by Stöckl et al. If possible, we would like to see the sentence “square root of the ratio between the sum of the squared SDs (each multiplied by its degrees of freedom) and the total number of degrees of freedom” published as an erratum.
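The averaging rule quoted above can be sketched in a few lines of Python, alongside the “square root of the sum of the squares” combination for contrast. The function names and the numerical values are hypothetical, chosen only for illustration; they are not data from the original article, and the degrees of freedom are assumed to be n − 1 for each SD.

```python
import math

def pooled_sd(sds, ns):
    """Pool SDs by weighting each variance by its degrees of freedom.

    Assumption of this sketch: each SD has n - 1 degrees of freedom.
    """
    dof = [n - 1 for n in ns]
    weighted_sum = sum(d * sd ** 2 for d, sd in zip(dof, sds))
    return math.sqrt(weighted_sum / sum(dof))

def root_sum_of_squares(sds):
    """Combine SDs in quadrature (the rule for adding independent components)."""
    return math.sqrt(sum(sd ** 2 for sd in sds))

# Hypothetical values: SDs from control and human samples and their sample sizes.
sds = [3.0, 4.0]
ns = [11, 21]

print("pooled SD:", round(pooled_sd(sds, ns), 3))
print("root sum of squares:", round(root_sum_of_squares(sds), 3))
```

The pooled SD always lies between the smallest and the largest input SD, because it averages two estimates of the same quantity. The root sum of squares exceeds both, because it adds two distinct components.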
As stated in the article, our work was not based directly on the GUM (2), but on the second edition of the EURACHEM/CITAC Guide Quantifying Uncertainty in Analytical Measurement (3). This document describes simpler approaches (top-down vs bottom-up) consistent with the GUM and more suitable for application by routine analytical laboratories.
Because few certified reference materials (CRMs) are available, other approaches to estimating trueness are often necessary. Recovery studies and participation in well-designed interlaboratory comparisons enable assessment of the performance of a method over the whole range of application. We chose the example of lead in blood to compare the outcomes of these approaches, but only results traceable to SI were combined in the mean ratio (R_{m}).
Stöckl et al. object to the inclusion of a “bias” in the calculation of the combined uncertainty. It should be noted, however, that the term (1 − R_{m})/k is not a bias (4) but rather describes the size of uncertainty for which the observed bias would not be statistically significant (5).
In principle, all sources of bias should be identified and corrected, but this is not always possible or practical [see “Treatment of uncorrected bias” (point 2.5.8) in the NIST/SEMATECH e-Handbook of Statistical Methods (6)]. In our work, the observed bias of 5.4 μg/L lead has little significance in most clinical settings but is not negligible in terms of overall uncertainty and may affect the outcome of the comparison of an individual result with stated action limits (7). The treatment of uncorrected bias according to GUM (2) (point F.2.4.5) and the NIST/SEMATECH e-Handbook (6) (point 2.5.8.1), as mentioned by Stöckl et al., is slightly more complex than just adding the bias to the estimated expanded uncertainty (U + bias) and applies only to well-identified and significant systematic effects. The GUM approach (2) requires that the bias affecting each individual result be determined, or if an average bias is used as a simpler approach, U must be recalculated to include the uncertainty components associated with the estimate of the average bias. A different approach is described in the NIST/SEMATECH e-Handbook (6) (point 2.5.8.1) on the basis of an original report by Phillips et al. (8), where asymmetric upper (U_{+}) and lower (U_{−}) limits for the expanded uncertainty are calculated as: U_{+} = U − bias and U_{−} = U + bias. Such methods are certainly appropriate when a bias can clearly be defined (e.g., when sets of traceable measurement standards, such as calibrated weights, are available), but are less so when the assessment of a bias relies on the analysis of two CRMs with target values that have relatively large U values (3.1% and 3.7%, respectively).
In such cases of biases that may be real but cannot be estimated reliably because of the paucity of data, the NIST/SEMATECH e-Handbook (6) (point 2.5.3.3.3; “Bias with sparse data”) suggests applying a “zero” correction to the result but including an additional uncertainty component, associated with the uncorrected bias, in the uncertainty budget.
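For readers who wish to see how the asymmetric limits mentioned above (U_{+} = U − bias and U_{−} = U + bias) translate into a reported interval, a minimal Python sketch follows. The function name, the value of U, and the measured result are hypothetical; only the bias of 5.4 μg/L is taken from the text. The sketch assumes that the bias is defined as measured value minus true value and that a negative limit is truncated to zero.

```python
def asymmetric_limits(u_expanded, bias):
    """Return (U_plus, U_minus) for a result reported without bias correction.

    Formulas as stated in the text: U+ = U - bias and U- = U + bias.
    Assumption of this sketch: a limit that would be negative is set to zero.
    """
    u_plus = max(u_expanded - bias, 0.0)
    u_minus = max(u_expanded + bias, 0.0)
    return u_plus, u_minus

# Bias of 5.4 ug/L is quoted in the text; U and the result are hypothetical.
bias = 5.4
u_expanded = 8.0
result = 40.0

u_plus, u_minus = asymmetric_limits(u_expanded, bias)
lower = result - u_minus
upper = result + u_plus
print(f"{result:.1f} ug/L, interval {lower:.1f} to {upper:.1f} ug/L")
```

With a positive bias the interval extends further below the uncorrected result than above it, which is the same interval that would be obtained by first subtracting the bias and then applying a symmetric ±U.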