## Abstract

Application of Deming regression analysis to interpret method comparison data presupposes specification of the squared analytical error ratio (λ), but in cases involving only single measurements by each method, this ratio may be unknown and is often assigned a default value of one. On the basis of simulations, this practice was evaluated in situations with real error ratios deviating from one. Comparisons of two electrolyte methods and two glucose methods were simulated. In the first case, misspecification of λ produced a bias that amounted to two-thirds of the maximum bias of the ordinary least-squares regression method. Standard errors and the results of hypothesis-testing also became misleading. In the second situation, a misspecified error ratio resulted only in a negligible bias. Thus, given a short range of values in relation to the measurement errors, it is important that λ is correctly estimated either from duplicate sets of measurements or, in the case of single measurement sets, specified from quality-control data. However, even with a misspecified error ratio, Deming regression analysis is likely to perform better than least-squares regression analysis.

The comparison of measurements by two analytical methods, for example, *x* and *y*, is usually based on a form of regression analysis. The most frequently used approach, ordinary least-squares regression (OLR) analysis, is subject to several shortcomings when both measurement sets are subject to random errors, which represents the typical situation. Because the procedure assumes that only the *y* measurements are associated with random measurement errors, the slope estimate becomes biased, and the testing of hypotheses becomes erroneous (1)(2)(3). Therefore, alternative regression procedures are more appropriate to apply in situations with random errors in both *x* and *y* measurements. The Deming method, also called the errors-in-variables model or the functional or structural relationship model in the statistical literature, takes measurement errors for both sets of measurements into account and is therefore more generally applicable than OLR (1)(2)(3)(4). Application of the Deming method requires specification of the ratio between squared analytical SDs for the two analytical methods. The analytical SDs may be known from quality-control data, and specification of the ratio is then possible. Furthermore, if duplicate sets of measurements are available, as is generally recommended, the ratio may be estimated from the actual data set. However, duplicate data sets are frequently not available, and for a new method, quality-control data may be sparse. Therefore, application of the Deming regression analysis may be halted by lack of information concerning the ratio between analytical SDs. By default, one may then choose to set the ratio equal to one and carry out the regression analysis on this assumption. In the present study, I consider the performance of Deming regression analysis when the ratio between squared analytical SDs is set to one in simulated situations that should be realistic in the field of clinical chemistry. The cases correspond to a comparison of electrolyte methods and a comparison of glucose methods. Using this approach, it is possible to evaluate the size of a possible slope bias and the performance of hypothesis-testing in situations with an unknown analytical SD ratio.

## Materials and Methods

### model of method comparison situation

The study is based on simulated method comparison situations. A given analytical method measures analyte concentrations with some uncertainty. One may distinguish between the measured value (*x*_{i}) and the target value (*X*_{i}) of a sample subjected to analysis by a given method. The latter is the average result we would obtain if the given sample were measured an infinite number of times. The measured value is likely to deviate from the target value by some small “random” amount (ε or δ). For a given sample measured by two clinical chemistry methods, the following relations exist:

$$x_i = X_i + \varepsilon_i, \qquad y_i = Y_i + \delta_i$$

The dispersion of measured values around the target value is determined by the analytical standard deviation (SD_{a}) of the method. A linear relationship between the target values of the two methods is assumed:

$$Y_i = \alpha_0 + \beta X_i$$

By correctly assuming that measurement errors are present for both *x* and *y* measurements, the Deming regression procedure provides an unbiased estimate of the slope β (see *Appendix*). This is in contrast to the case for OLR analysis. By ignoring measurement errors for *x* measurements, the latter procedure provides a downward biased slope estimate (see *Appendix*).

To estimate the regression line by the Deming method it is, however, necessary to estimate or assign a value to the ratio between squared analytical standard deviations for the *x* and *y* methods, λ = SD_{ax}^{2}/SD_{ay}^{2} (see formula in *Appendix*). λ determines the angle at which points are projected onto the line to minimize the sum of squared deviations (5) (Fig. 1).

For analytes with values extending over a considerable range, the analytical SD usually increases with the measurement level (6). Often a proportional relationship approximately exists over the major part of the range, which implies that the analytical coefficient of variation (CV_{a}) is approximately constant. In this situation, we have the relation λ = (SD_{ax}/SD_{ay})^{2} = (CV_{ax}/CV_{ay})^{2}. Deming regression may in this case be carried out either in the simple, unweighted form or in a weighted form in which weights are introduced that are inversely proportional to the squared analytical standard deviation at a given measurement level (see *Appendix*). The latter procedure is the optimal one (3).

### simulation procedure

The performance of regression methods is dependent on the relation between the dispersions of measurement errors and the dispersions of target values. The larger the dispersion of measurement errors is in relation to the dispersion of target values, the larger is the imprecision of the slope estimate. For the ordinary least-squares procedure, the bias of the slope estimate depends on the ratio between the dispersions of *x* measurements and *X* target values (7). In clinical chemistry, one observes a wide range of ratios between measurement error and target value dispersions. High ratios occur for compounds that are tightly regulated in the body, e.g., electrolytes, whereas small ratios may occur for substances with a very wide physiological variation, e.g., various hormones whose serum concentrations may span several decades. Intermediate ratios occur for various metabolites such as glucose, urea, and others. In the present study, the focus is on two prototype situations: an “electrolyte-like situation” with a small dispersion of target values and a “metabolite-like case” with a span of target values close to one decade. The performance of the two regression methods was evaluated for various measurement error combinations that should be realistic in clinical chemistry. Random numbers were generated by a computer according to specified distributions. The measurement error distributions were supposed to be gaussian. The regression estimates were computed and subjected to statistical tests as described (3)(8). The computational methods were effected using a modification of the software program CBstat, which is a Windows application for Deming regression analysis developed by the author. Each simulated situation was repeated 5000 times to obtain reliable performance measures. The performance measures were as follows:

*(i)* *Bias of the slope estimate*. The bias is the difference between the average of the estimated slope values (*b*_{m}) for the 5000 simulation runs (*nrun*) and the true value (β), so that a negative bias denotes underestimation of the slope. The true value was set to 1 in the simulated situations, corresponding to the null hypothesis situation of identity.

*(ii)* *Root mean squared error of the slope estimate*. The root mean squared error (RMSE) is an estimate of the total error of the slope estimate *(b)* and includes a systematic part (bias) and a random error part (standard error):

$$\mathrm{RMSE} = \sqrt{\frac{1}{nrun}\sum_{i=1}^{nrun}\left(b_i - \beta\right)^2}$$

*(iii)* *Real standard error of the slope [SE(b)]*. This is the standard error observed in the simulation study, i.e., the standard deviation of the distribution of slope estimates for the 5000 simulation runs.

*(iv)* *Average estimated standard error of the slope [SE(b)]*. In each simulation run, a standard error of the slope is estimated as a result of the statistical analysis, so that a *t*-test of the null hypothesis can be performed. A prerequisite for a correct test is that the estimated standard error, on average, agrees with the real one.

*(v)* *Hypothesis testing.* The performance of hypothesis testing can be evaluated by comparing the observed and expected frequencies of rejection of the null hypothesis on the basis of the *t*-test for the slope carried out in each simulation run. Under the null hypothesis, one expects 50 rejections out of 1000 trials, when the nominal type I error is set to the usual level of 5%. Thus, if the observed number of rejections is 200, the actual type I error is four times the nominal or expected value, and testing of the null hypothesis is not performed in the correct way.
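The first three performance measures can be illustrated with a small simulation. The following is a minimal sketch, not the CBstat implementation: it assumes the standard closed-form Deming slope with λ = SD_{ax}^{2}/SD_{ay}^{2}, uses the serum sodium settings from the electrolyte case (target mean 140.5 mmol/L, target SD 3.8 mmol/L, 50 samples), and defaults to 1000 runs rather than 5000 to keep the run time short. The function names (`simulate`, `deming_slope`, `olr_slope`) are hypothetical.

```python
import math
import random

def sums(x, y):
    """Means, sums of squared deviations (u, q) and cross-products (p)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    u = sum((xi - xm) ** 2 for xi in x)
    q = sum((yi - ym) ** 2 for yi in y)
    p = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    return xm, ym, u, q, p

def olr_slope(x, y):
    _, _, u, _, p = sums(x, y)
    return p / u

def deming_slope(x, y, lam=1.0):
    # lam is the assumed ratio SD_ax^2 / SD_ay^2 (assumption of this sketch)
    _, _, u, q, p = sums(x, y)
    root = math.sqrt((u - lam * q) ** 2 + 4.0 * lam * p * p)
    return ((lam * q - u) + root) / (2.0 * lam * p)

def simulate(sd_ax, sd_ay, nrun=1000, n=50, beta=1.0, seed=1):
    """Return bias, RMSE and empirical SE of OLR and Deming (lam = 1) slopes."""
    rng = random.Random(seed)
    slopes = {"OLR": [], "Deming": []}
    for _ in range(nrun):
        X = [rng.gauss(140.5, 3.8) for _ in range(n)]    # target values
        x = [Xi + rng.gauss(0.0, sd_ax) for Xi in X]     # method x measurements
        y = [beta * Xi + rng.gauss(0.0, sd_ay) for Xi in X]  # method y, alpha_0 = 0
        slopes["OLR"].append(olr_slope(x, y))
        slopes["Deming"].append(deming_slope(x, y, lam=1.0))
    result = {}
    for name, b in slopes.items():
        bm = sum(b) / nrun
        bias = bm - beta
        rmse = math.sqrt(sum((bi - beta) ** 2 for bi in b) / nrun)
        se = math.sqrt(sum((bi - bm) ** 2 for bi in b) / (nrun - 1))
        result[name] = {"bias": bias, "rmse": rmse, "se": se}
    return result

for sd_ax, sd_ay in [(1.405, 1.405), (1.405, 2.81), (2.81, 1.405)]:
    res = simulate(sd_ax, sd_ay)
    for name in ("OLR", "Deming"):
        r = res[name]
        print(f"SDax={sd_ax} SDay={sd_ay} {name}: "
              f"bias={r['bias']:+.3f} RMSE={r['rmse']:.3f} SE={r['se']:.3f}")
```

The estimated standard error and the *t*-test of each run are left out here; they would require a standard-error estimate within every run.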

## Results

### electrolyte case

The comparison of two methods for the measurement of serum sodium was simulated. The target values were supposed to be gaussian distributed with a mean value of 140.5 mmol/L, corresponding to the mean of the reference interval, stated to be 136–145 mmol/L (9). The SD of the target values was set to 3.8 mmol/L. This corresponds to a range exceeding the reference interval by ∼5 mmol/L in each direction, corresponding to the inclusion of some pathological samples. The analytical SDs of the methods were set to 1.405 or 2.81 mmol/L, corresponding to CVs of 1% or 2% at the mean level. For this model, the Pearson product–moment correlation coefficient was 0.88 when the analytical SDs both were 1.405, and 0.75 when one SD was 2.81. It was assumed that the analytical SDs were unknown, and the Deming regression analysis was performed with an assigned SD ratio of one for all cases. OLR analysis was carried out for the sake of comparison. Fig. 2, A-C, and Table 1 refer to SD ratios of 1:1, 1:2, and 2:1 in the simulated examples. Fig. 2A shows that the average Deming regression line coincides with the diagonal as a sign of an unbiased slope estimate for this situation with a correctly specified analytical SD ratio. The OLR procedure has a downward bias of 12% because of the neglect of measurement errors for *x*. Accordingly, the RMSE of the least-squares method is almost twice the size of the Deming case.

For the situations with analytical SD ratios different from 1, both types of slope estimates become biased (Fig. 2, B and C). The worst case is that for the least-squares method and an analytical CV of 2% for the *x* measurements (bias, −35%). However, the Deming method also gives slope results with a considerable bias in the situations with a misspecified SD ratio, up to 24% in the example shown here. The bias of the Deming method is positive when SD_{ay} exceeds SD_{ax} and negative when SD_{ay} is smaller than SD_{ax}. In contrast, the bias of the OLR method is always negative and only depends on the relation between *x* measurement errors and the dispersion of *X* target values (see *Appendix*).

A more detailed account of the relation between analytical error ratio and bias is displayed in Fig. 3. In Fig. 3A, SD_{ax} is fixed at 1.405 (CV_{ax}, 1%), and SD_{ay} increases from 1.405 to 2.81 (CV_{ay}, 2%). In this situation, the bias of the Deming method gradually increases to a maximum value of 24% as the ratio deviates more and more from the assumed value of one, whereas the bias of the OLR method is constant (−12%). In Fig. 3B, SD_{ay} is kept fixed at 1.405 (CV_{ay}, 1%), and SD_{ax} increases from 1.405 to 2.81 (CV_{ax}, 2%). Now the biases of both procedures increase in magnitude with increasing SD_{ax} value, so that the maximum biases are observed at an SD_{ay}/SD_{ax} ratio of 0.5 (Fig. 3B). The bias of the Deming procedure changes from zero to a maximum of −18%, whereas the bias of OLR starts at −12% and reaches −35%.

When the null hypothesis has been tested in cases where no difference exists between the target values, the bias has led to the null hypothesis being rejected too frequently. For the OLR method, the frequency ranges from 19% to 98%. In the latter situation, the test will almost always produce a rejected null hypothesis, i.e., the conclusion of the statistical analysis is that there is a systematic difference between the two methods although no real difference exists. With regard to the Deming method, this frequency ranges from 5.9% to 44%.

In the example described above, the target values of the 50 samples were randomly drawn from a gaussian distribution with a mean corresponding to the mean of the reference interval. Therefore, the majority of sample values are located in the middle of the reference interval with relatively few extreme sample values. In method comparison studies, the investigator may choose to deliberately select samples with values in the periphery of the range of interest to obtain a more uniform distribution over the studied interval and thereby increase the precision of estimated regression parameters. To study such a procedure, one may simulate drawing of sample sets from a uniform distribution of target values covering the range of interest (131–150 mmol/L) instead of a gaussian distribution, such as the cases dealt with above. After this change, the Pearson coefficient of correlation increased to 0.94 when both analytical SDs were 1.405 and to 0.87 when one of the analytical SDs was equal to 2.81. For this model, the same general trends are observed, but in a reduced scale. The Deming procedure with λ specified to one yields slope biases of 10% and −9% for SD_{ax}/SD_{ay} ratios of 1:2 and 2:1, respectively. Similarly, the slope biases of the OLR procedure are also reduced to about one-half of the values, −6% and −20%, respectively. Thus, by assuring a more even distribution of sample values over a given interval, the bias problems can be reduced but not avoided.

### metabolite case

Many clinical chemical analyses have range ratios in the interval 5–10, e.g., serum glucose determinations. For this analyte, patient samples may typically range from ∼2.5 to 20 mmol/L. In a method comparison study, most of the values are likely to be located in the lower half of this interval. Thus, in the simulation model it is supposed that three-quarters of the observations are located in the lower half and one-quarter in the upper half of the interval. The measurement errors were assumed to be proportional to the concentrations, which is a more common situation than that of constant analytical errors for analytes with a considerable range (6). Proportional measurement errors imply that the analytical coefficients of variation, CV_{ax} and CV_{ay}, are constant over the measurement range. Their values were set to 4% or 8%, so that the ratios of analytical coefficients of variation were 1:1, 1:2, or 2:1 (Table 2). The case with a ratio of 2:1 is shown in Fig. 4. The Pearson coefficient of correlation was 0.99 when both CVs equalled 4% and 0.98 when one CV was 8%. In accordance with the specified model, the weighted form of Deming regression analysis was carried out assuming proportional errors (see *Appendix*). Again, the sample was composed of 50 single observations for each method. All slope estimates obtained by the least-squares regression method are biased, but because of the wider range of target values, the bias is either negligible (0.7%) or small (2.7%). Hypothesis-testing, however, is clearly misleading with type I errors four to six times larger than the nominal one of 5%. This is attributable partly to the bias and partly to an underestimated standard error.

The weighted Deming method yields an unbiased slope estimate in the case with an analytical CV ratio of 1:1 and biases up to only 0.7% in the other cases. Thus, in the present example, the maximum bias of the weighted Deming procedure amounts to only about one-fourth of the maximum bias of the OLR method and so may be regarded as negligible. Furthermore, estimation of the standard error of the slope, and thus hypothesis-testing, is rather insensitive to a misspecified error ratio. The observed type I error values (5.3–5.5%) are very close to the nominal value of 5%. Thus, although the analytical SD ratio might have been misspecified, hypothesis-testing is reliable in this situation, with a relatively wide range of target values compared with the dispersion of measurement errors.

## Discussion

OLR analysis is still the most frequently applied procedure to evaluate method comparison data despite several well-known shortcomings, such as a bias associated with slope estimation, incorrect testing of statistical hypotheses, and low efficiency in cases with proportional errors over one or more decades of values (3). The bias problem may be overcome with the Deming regression procedure, and hypothesis-testing can be done correctly on the basis of this regression procedure if standard errors of slope and intercept are estimated on the basis of modern statistical resampling methods (8)(10). Finally, by applying a weighted modification of the Deming procedure as done here for the glucose example, full efficiency may be attained in cases with proportional analytical errors (8). Therefore, application of the Deming procedure is generally preferable to OLR analysis. Specification of the squared analytical SD ratio λ may, however, pose problems. The easiest way to attain a correct λ value is to use duplicate measurements in the method comparison study, allowing for simultaneous estimation of analytical SDs or CVs and so λ. The use of duplicate sets of measurements in method comparison studies is generally recommended (11). However, taking into account that most method comparison evaluations appear to be based on single measurements, one should strive to specify λ correctly. In most cases, quality-control data will be available, and therefore, λ may be specified as the ratio between recorded squared analytical SDs. As shown, misspecification of λ may induce a considerable slope bias for the Deming method in electrolyte-like situations with a short range of data. Hypothesis-testing is also seriously affected. Thus, a correct specification of λ is important when applying the Deming method in situations with a limited range of values. 
With regard to the metabolite-like case, the bias problem is negligible; also with regard to the testing of hypotheses, the Deming procedure appears to be rather robust towards a misspecified error ratio in these examples.

More generally, when considering the sensitivity of the Deming procedure to a misspecified error ratio in a given situation, one may to some extent be guided by the value of the correlation coefficient. Electrolyte-like cases are characterized by low values for the correlation coefficient, whereas the correlation coefficient is higher in situations with a wider range of values. In cases with a misspecified error ratio, the bias of the Deming method is smaller than the maximum bias observed for OLR analysis. A value of 0.975 for the correlation coefficient has been suggested as the lower limit for acceptable performance of OLR in method comparison studies (11). Thus, according to this point of view, the Deming method may be regarded as relatively insensitive to misspecified error ratios in situations with correlation coefficients exceeding 0.975.

The impact of a misspecified error ratio on the performance of Deming regression analysis has also been considered in the theoretical literature (12). A theoretical evaluation shows that the RMSE of the Deming slope is at a minimum when the error ratio is correctly specified and increases the more the specified ratio deviates from the true ratio. However, for reasonable parameter examples, the RMSE stays lower than that of OLR (12).

Some other regression methods that take measurement errors for both *x* and *y* into account apparently do not exhibit the problem with specification of the SD_{ax}/SD_{ay} ratio. In the so-called standardized principal component analysis, the slope is computed in a slightly different way. The procedure actually presupposes that the error ratio is related to the slope, that is (SD_{ax}/SD_{ay}) = 1/β (13). Only when this assumption is given does the method provide a slope estimate free of any bias. This procedure is thus not very flexible and, therefore, not as useful as the Deming procedure. Similarly, the rank regression method of Passing and Bablok operates with the same rigid assumption (14). If the relation does not hold true, the slope estimate becomes biased (3).
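The link between standardized principal component analysis and the error ratio can be checked numerically. The sketch below assumes the usual form of the standardized principal component slope, sign(*p*)·√(*q*/*u*), and the standard closed-form Deming slope with λ = SD_{ax}^{2}/SD_{ay}^{2}; the data values are made up for illustration only. It shows that the two slopes coincide when λ is set to 1/*b*^{2}, which is the implicit assumption described above.

```python
import math

def sums(x, y):
    """Sums of squared deviations (u, q) and cross-products (p)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    u = sum((xi - xm) ** 2 for xi in x)
    q = sum((yi - ym) ** 2 for yi in y)
    p = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    return u, q, p

def deming_slope(x, y, lam):
    # lam is the assumed ratio SD_ax^2 / SD_ay^2
    u, q, p = sums(x, y)
    root = math.sqrt((u - lam * q) ** 2 + 4.0 * lam * p * p)
    return ((lam * q - u) + root) / (2.0 * lam * p)

def spca_slope(x, y):
    # Standardized principal component slope: ratio of the SDs of y and x,
    # carrying the sign of the cross-product sum.
    u, q, p = sums(x, y)
    return math.copysign(math.sqrt(q / u), p)

# Illustrative (invented) sodium-like measurements, mmol/L
x = [136.2, 138.9, 140.1, 141.7, 143.0, 145.4]
y = [137.0, 138.1, 141.2, 141.0, 144.1, 146.0]

b_spca = spca_slope(x, y)
b_deming = deming_slope(x, y, lam=1.0 / b_spca ** 2)
print(b_spca, b_deming)
```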

## Appendix

### estimation of the deming regression line

In situations with constant analytical errors, i.e., analytical SDs that are independent of the measurement level, the unweighted form of Deming regression analysis is appropriate. The procedure relies on computations of sums of squared deviations and cross-products:

$$u = \sum (x_i - x_m)^2, \qquad q = \sum (y_i - y_m)^2, \qquad p = \sum (x_i - x_m)(y_i - y_m)$$

The subscript *m* refers to the mean of the variable.

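In case of duplicate sets of measurements, each *x*_{i} and *y*_{i} represents the mean of the individual measurements (*x*_{i} = (*x*_{1i}+*x*_{2i})/2 and *y*_{i} = (*y*_{1i}+*y*_{2i})/2). With *N* samples, analytical standard deviations for methods *x* and *y* are estimated as:

$$\mathrm{SD}_{ax}^2 = \frac{1}{2N}\sum (x_{2i} - x_{1i})^2 \qquad \text{and} \qquad \mathrm{SD}_{ay}^2 = \frac{1}{2N}\sum (y_{2i} - y_{1i})^2$$

The Deming regression line is estimated as:

$$\lambda = \frac{\mathrm{SD}_{ax}^2}{\mathrm{SD}_{ay}^2}$$

$$b = \frac{(\lambda q - u) + \sqrt{(u - \lambda q)^2 + 4\lambda p^2}}{2\lambda p}, \qquad a_0 = y_m - b\,x_m$$

$$X_{est,i} = x_i + \frac{\lambda b\, d_i}{1 + \lambda b^2}, \qquad Y_{est,i} = y_i - \frac{d_i}{1 + \lambda b^2}, \qquad d_i = y_i - (a_0 + b\,x_i)$$

*X*_{est}_{i} and *Y*_{est}_{i} refer to estimated target values.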
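The unweighted estimation can be sketched in a few lines. This is a minimal illustration rather than the CBstat code: it assumes the standard closed-form Deming solution with λ = SD_{ax}^{2}/SD_{ay}^{2}, and the function names `lambda_from_duplicates` and `deming_fit` are hypothetical.

```python
import math

def lambda_from_duplicates(x1, x2, y1, y2):
    """Estimate lambda = SD_ax^2 / SD_ay^2 from duplicate measurements."""
    n = len(x1)
    var_ax = sum((b - a) ** 2 for a, b in zip(x1, x2)) / (2.0 * n)
    var_ay = sum((b - a) ** 2 for a, b in zip(y1, y2)) / (2.0 * n)
    return var_ax / var_ay

def deming_fit(x, y, lam=1.0):
    """Unweighted Deming regression.

    x, y : single measurements or means of duplicates
    lam  : assumed or estimated ratio SD_ax^2 / SD_ay^2
    Returns intercept a0, slope b, and the estimated target values.
    """
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    u = sum((xi - xm) ** 2 for xi in x)
    q = sum((yi - ym) ** 2 for yi in y)
    p = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    root = math.sqrt((u - lam * q) ** 2 + 4.0 * lam * p * p)
    b = ((lam * q - u) + root) / (2.0 * lam * p)
    a0 = ym - b * xm
    # Project each point onto the line at the angle implied by lam
    scale = 1.0 + lam * b * b
    d = [yi - (a0 + b * xi) for xi, yi in zip(x, y)]
    x_est = [xi + lam * b * di / scale for xi, di in zip(x, d)]
    y_est = [yi - di / scale for yi, di in zip(y, d)]
    return a0, b, x_est, y_est
```

By construction the estimated target values lie on the fitted line, and as λ approaches zero (no error assumed in *x*) the slope approaches the OLR slope *p*/*u*.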

Standard errors of *a*_{0} and *b* are estimated on the basis of a resampling principle, the so-called jackknife procedure (8).
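The jackknife principle can be sketched as follows. This is a generic leave-one-out jackknife applied to the Deming slope, written under the assumption of the standard closed-form slope with λ = SD_{ax}^{2}/SD_{ay}^{2}; the exact variant used in reference (8) may differ in detail, and the function names are hypothetical.

```python
import math

def deming_slope(x, y, lam=1.0):
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    u = sum((xi - xm) ** 2 for xi in x)
    q = sum((yi - ym) ** 2 for yi in y)
    p = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    root = math.sqrt((u - lam * q) ** 2 + 4.0 * lam * p * p)
    return ((lam * q - u) + root) / (2.0 * lam * p)

def jackknife_slope(x, y, lam=1.0):
    """Return the jackknife slope estimate and its standard error.

    x and y are lists of equal length; each (x_i, y_i) pair is left out
    in turn and the slope is re-estimated on the remaining pairs.
    """
    n = len(x)
    b_all = deming_slope(x, y, lam)
    pseudo = []
    for i in range(n):
        b_minus_i = deming_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:], lam)
        # Pseudo-value for sample i
        pseudo.append(n * b_all - (n - 1) * b_minus_i)
    b_jack = sum(pseudo) / n
    var = sum((v - b_jack) ** 2 for v in pseudo) / (n - 1)
    return b_jack, math.sqrt(var / n)

def t_statistic(b, se, beta0=1.0):
    """t-statistic for the null hypothesis that the slope equals beta0."""
    return (b - beta0) / se
```

The resulting statistic would then be compared with a critical value of the *t*-distribution at the chosen type I error level.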

In weighted Deming regression analysis, the slope and intercept are estimated analogously, with the modification that weights are introduced (8). The weights (*w*_{i}) are inversely proportional to the squared analytical standard deviations at a given concentration. Given a proportional relationship between analytical standard deviations and analyte concentrations and supposing that α_{0} = 0, we have:

$$w_i = \frac{1}{\left[(X_i + Y_i)/2\right]^2}$$

*X*_{i} and *Y*_{i} are unknown target values, which must be replaced by estimates obtained by an iterative procedure as described (8):

$$\hat{w}_i = \frac{1}{\left[(X_{est,i} + Y_{est,i})/2\right]^2}$$

If the proportional relationship does not hold true in the lower part of the analytical range, which often is the case and is indicated by an increasing analytical CV in this region, one may truncate the relationship at a suitable lower limit *L*, so that *w*_{i}=*L*^{−2} for *x* and *y* less than *L* (3).

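Now, weighted averages are computed:

$$x_{mw} = \frac{\sum w_i x_i}{\sum w_i}, \qquad y_{mw} = \frac{\sum w_i y_i}{\sum w_i}$$

and weighted sums of squares and cross-products:

$$u_w = \sum w_i (x_i - x_{mw})^2, \qquad q_w = \sum w_i (y_i - y_{mw})^2, \qquad p_w = \sum w_i (x_i - x_{mw})(y_i - y_{mw})$$

The weighted Deming regression line is estimated as:

$$b = \frac{(\lambda q_w - u_w) + \sqrt{(u_w - \lambda q_w)^2 + 4\lambda p_w^2}}{2\lambda p_w}, \qquad a_0 = y_{mw} - b\,x_{mw}$$

λ here refers to CV_{ax}^{2}/CV_{ay}^{2}, which in case of single sets of measurements is assigned a value and in case of duplicate sets of measurements is estimated from the data.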
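A weighted fit with iteratively updated weights might look like the sketch below. Several details are assumptions of this illustration and not taken from reference (8): the concentration level of each sample is taken as the mean of its two estimated target values, the first pass uses the measured values, a fixed number of iterations replaces a convergence criterion, and truncation at the lower limit is applied to that mean level. The function name `weighted_deming` is hypothetical.

```python
import math

def weighted_deming(x, y, lam=1.0, n_iter=10, lower_limit=None):
    """Weighted Deming regression for proportional analytical errors.

    lam         : assumed ratio CV_ax^2 / CV_ay^2
    n_iter      : number of weight updates (assumption: fixed count)
    lower_limit : optional level L below which weights are held at L**-2
    Returns intercept a0 and slope b. Concentrations must be positive.
    """
    # First pass: use measured values as stand-ins for the target values
    level = [(xi + yi) / 2.0 for xi, yi in zip(x, y)]
    a0, b = 0.0, 1.0
    for _ in range(n_iter):
        if lower_limit is not None:
            level = [max(v, lower_limit) for v in level]
        w = [1.0 / v ** 2 for v in level]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        u = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        q = sum(wi * (yi - ym) ** 2 for wi, yi in zip(w, y))
        p = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        root = math.sqrt((u - lam * q) ** 2 + 4.0 * lam * p * p)
        b = ((lam * q - u) + root) / (2.0 * lam * p)
        a0 = ym - b * xm
        # Update estimated target values, then the levels behind the weights
        scale = 1.0 + lam * b * b
        d = [yi - (a0 + b * xi) for xi, yi in zip(x, y)]
        x_est = [xi + lam * b * di / scale for xi, di in zip(x, d)]
        y_est = [yi - di / scale for yi, di in zip(y, d)]
        level = [(xe + ye) / 2.0 for xe, ye in zip(x_est, y_est)]
    return a0, b
```

For a glucose-like data set, `weighted_deming(x, y, lam=1.0)` corresponds to the default assumption of equal analytical CVs for the two methods.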

### estimation of the olr line

The parameters of the OLR line are estimated as:

$$b = \frac{p}{u} \qquad \text{and} \qquad a_0 = y_m - b\,x_m$$

*p* and *u* refer to the sums of cross-products and squared deviations, respectively, as described above.

Standard errors of *b* and *a*_{0} are calculated as described (7).

The bias of the slope estimate is determined by the relation:

$$\beta_{OLR} = \frac{\beta}{1 + \mathrm{SD}_{ax}^2/\mathrm{SD}_{X}^2}$$

β_{OLR} is the average slope obtained by OLR analysis; β is the true slope between *X* and *Y* target values; and SD_{X} refers to the dispersion of *X* target values (7).
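As a worked check, the expected OLR bias for the electrolyte case can be computed directly from the simulation settings (target SD 3.8 mmol/L, analytical SDs 1.405 or 2.81 mmol/L). The sketch assumes the usual attenuation relation β_{OLR} = β/(1 + SD_{ax}^{2}/SD_{X}^{2}); the expression for the expected correlation coefficient is a standard result added here for illustration and is not part of the original appendix.

```python
import math

def olr_expected_slope(beta, sd_ax, sd_X):
    """Expected OLR slope when x carries measurement error sd_ax."""
    return beta / (1.0 + (sd_ax / sd_X) ** 2)

def expected_correlation(beta, sd_X, sd_ax, sd_ay):
    """Expected Pearson correlation between measured x and y values."""
    cov = beta * sd_X ** 2
    var_x = sd_X ** 2 + sd_ax ** 2
    var_y = (beta * sd_X) ** 2 + sd_ay ** 2
    return cov / math.sqrt(var_x * var_y)

SD_X = 3.8  # mmol/L, dispersion of sodium target values
for sd_ax, sd_ay in [(1.405, 1.405), (2.81, 1.405)]:
    bias_pct = 100.0 * (olr_expected_slope(1.0, sd_ax, SD_X) - 1.0)
    r = expected_correlation(1.0, SD_X, sd_ax, sd_ay)
    print(f"SDax={sd_ax}: OLR bias {bias_pct:.0f}%, r = {r:.2f}")
```

These values agree with the simulated OLR biases of −12% and −35% and with the correlation coefficients of 0.88 and 0.75 reported for the electrolyte case.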

- © 1998 The American Association for Clinical Chemistry