## Abstract

*Background:* Various forms of least-squares regression analyses are used to estimate average systematic error (bias) and its confidence interval in method-comparison studies. When assumptions that underlie a particular regression method are inappropriate for the data, errors in estimated statistics result. In this report, I present an improved method for regression analysis that is free from the usual simplifying assumptions and is generally applicable to linearly related method-comparison data.

*Methods:* Theoretical equations based on the Deming approach, further developed by physicists and extended herein, were applied to method-comparison data analysis. Monte Carlo simulations were used to demonstrate the validity of the new procedure and to compare its performance to ordinary linear regression (OLR) and simple Deming regression (SDR) procedures.

*Results:* Simulation studies included three types of data commonly encountered in method-comparison studies: (*a*) constant within-method SDs for both methods, (*b*) constant within-method CVs for both methods, and (*c*) neither SDs nor CVs constant for both methods. For all cases examined, OLR produced unreliable confidence intervals of the estimated bias. However, OLR point estimates of systematic bias were reliable when the correlation coefficient was >0.975. SDR produced reliable estimates of systematic bias for all cases studied, but the confidence intervals of systematic bias were unreliable when SDs of methods varied as a function of analyte concentration.

*Conclusion:* Only iteratively reweighted general Deming regression produced statistically unbiased estimates of systematic bias and reliable confidence intervals of bias for all cases.

The objective of method-comparison studies for quantitative assays in laboratory medicine is to estimate average systematic bias and the confidence interval (CI)^{1} for estimated bias at medical decision levels. These estimates are then compared with a manufacturer’s claims or internal criteria to judge acceptability of the method under evaluation. When the test and comparative methods are linearly related in accordance with the linear statistical model (*y* = *a* + *bx*), linear regression analysis is commonly used to estimate the average bias and its CI. The *n* test method results (*y*_{i}) and corresponding comparative method results (*x*_{i}) throughout the data range are used to estimate parameters of the model (*a*, the intercept; and *b*, the slope) and their standard errors, SE(*a*) and SE(*b*). The estimated bias (*B̂*_{C}) at medical decision level, *X*_{C}, is given by:

The CI of the estimated bias is given by:

where *t*_{(1 − α/2;n − 2)} is the Student *t*-statistic at the desired confidence level (1 − α) with (n − 2) degrees of freedom. The variance of the bias estimate is given by:

where *Var*(*a*) and *Var*(*b*) are the variances of the estimated intercept and slope, respectively. Note that *Var*(*a*) = [SE(*a*)]^{2} and *Var*(*b*) = [SE(*b*)]^{2}. The calculation of *x̄*_{w}, the weighted mean of the comparative method values, is described below.
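As a minimal sketch, the three quantities just defined can be computed as follows. The sketch assumes that the bias is the predicted test result minus *X*_{C}, that is *a* + (*b* − 1)*X*_{C}, and that the intercept is obtained as *ȳ*_{w} − *b*·*x̄*_{w}, so that the covariance of *a* and *b* is −*x̄*_{w}·*Var*(*b*). The function name, argument names, and numerical inputs are hypothetical, and the *t* quantile is supplied by the caller.

```python
import math


def bias_with_ci(a, b, var_a, var_b, x_w_mean, x_c, t_crit):
    """Return (bias, lower, upper) for the systematic bias at decision level x_c."""
    # Bias = predicted test result minus the comparative value at x_c.
    bias = a + (b - 1.0) * x_c
    # Assumption: Cov(a, b) = -x_w_mean * Var(b), which gives
    # Var(bias) = Var(a) + x_c * (x_c - 2 * x_w_mean) * Var(b).
    var_bias = var_a + x_c * (x_c - 2.0 * x_w_mean) * var_b
    half_width = t_crit * math.sqrt(var_bias)
    return bias, bias - half_width, bias + half_width


# Hypothetical inputs; 2.0106 approximates t(0.975; 48), i.e. n = 50.
bias, lower, upper = bias_with_ci(a=0.10, b=0.98, var_a=0.04, var_b=0.0001,
                                  x_w_mean=10.0, x_c=7.0, t_crit=2.0106)
print(f"bias = {bias:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For OLR the same variance expression reduces to the familiar *s*^{2}[1/*n* + (*X*_{C} − *x̄*)^{2}/*S*_{xx}], which is a useful consistency check on the covariance assumption.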

The reliability of the estimated bias and its CI depend on the appropriateness of the regression procedure for analysis of the particular set of experimental data. In current practice, regression procedures are selected based on what assumptions can justifiably be made about the data. For example, in ordinary linear regression (OLR), the most commonly used regression procedure for method-comparison calculations, it is assumed that comparative method values are without random error and that test method random error is constant throughout the range of the data. Although these assumptions are never strictly justified, results of OLR are of acceptable accuracy and precision when the random error of the comparative method is small compared with the range of the data and when the test method data are not “significantly” heteroscedastic. When OLR cannot be used because of substantial violations of its assumptions, an appropriate form of Deming regression may be selected.

Deming regression is the term used in laboratory medicine to refer to linear regression analysis in which the random error of both the comparative and test methods is taken into account. Although Deming’s approach to a generalized regression procedure was basically sound, he oversimplified the problem by expanding the straight line function in a Taylor series about assumed values of slope, intercept, and adjusted points. Because squared and higher terms in the expansion were neglected, Deming’s original general exposition can lead to significant errors in some instances, as he recognized.

Nevertheless, Deming presented an exact solution for the particular case in which both *x* and *y* are subject to random error but in such a way that the ratio λ = *Var*(*x*)/*Var*(*y*) is constant and not infinite or zero throughout the data range (1). With this constraint, he derived equations for the slope and intercept for a weighted least-squares regression model. When the variance of *x* is constant throughout the data range, the variance of *y* must be constant, and the equations for the Deming slope and intercept reduce to the well-known formulae for simple Deming regression (SDR) called to our collective attention by Cornbleet and Gochman (2). More recently, Linnet (3) independently rederived the cited formulae for the Deming slope and intercept and focused on a special case in which random errors in both *x* and *y* are proportional to the overall average value of the test and comparative results for each sample. For convenience, we refer to the Linnet procedure as “constant CV Deming regression”, although it is not precisely so. When assumptions about constant SDs or proportional SDs do not apply to the data, CIs of the bias are unreliable for the cited Deming methods.
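The closed-form SDR calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming positively correlated data and λ defined as in the text, *Var*(*x*)/*Var*(*y*); the function and argument names are hypothetical.

```python
import math


def simple_deming(x, y, lam=1.0):
    """Return (intercept, slope) for simple Deming regression.

    lam is the constant ratio Var(x)/Var(y); the sums of squares are unweighted.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_uu = sum((xi - x_bar) ** 2 for xi in x)
    s_vv = sum((yi - y_bar) ** 2 for yi in y)
    s_uv = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Positive root of the quadratic in the slope (assumes s_uv > 0).
    u_term = (s_vv - s_uu / lam) / (2.0 * s_uv)
    slope = u_term + math.sqrt(u_term ** 2 + 1.0 / lam)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical paired results for illustration only.
a_sdr, b_sdr = simple_deming([1.0, 2.0, 3.0, 4.0, 5.0],
                             [1.1, 1.9, 3.2, 3.9, 5.1], lam=1.0)
print(f"intercept = {a_sdr:.4f}, slope = {b_sdr:.4f}")
```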

We present here a generally applicable statistical procedure for Deming regression without constraints on random error of the test or comparative method. We then use Monte Carlo simulations to demonstrate the validity of the new procedure and to compare its performance to other regression procedures frequently used in current practice.

## Materials and Methods

### Theory

Guided by Deming’s basic concepts, York (4) developed the foundations of an exact general treatment of the problem. York’s procedure, which contained errors in equations for SEs of slope and intercept, was used for certain method-comparison calculations by Gerbet et al. (5). Williamson (6) and, later, Reed (7)(8) corrected and refined York’s work to derive equations for the linear regression parameters and their SEs. To our knowledge, these corrected results have not been used for method-comparison calculations; therefore, the complete equations are reproduced here. As presented by Williamson (6), the slope and intercept are given by:

where:

Because *z*_{i}, *w*_{i}, *x̄*_{w}, and *ȳ*_{w} are functions of *b*_{D}, an iterative calculation procedure is required.
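For reference, the slope and intercept equations can be written out as follows. This is a sketch in the form commonly quoted for Williamson's solution when the errors in *x* and *y* are uncorrelated; the deviations *u*_{i} and *v*_{i} and the per-point variances Var(*x*_{i}) and Var(*y*_{i}) are notation assumed here and should be checked against reference (6).

```latex
b_D = \frac{\sum_i w_i z_i v_i}{\sum_i w_i z_i u_i},
\qquad
a_D = \bar{y}_w - b_D\,\bar{x}_w

w_i = \frac{1}{\mathrm{Var}(y_i) + b_D^{2}\,\mathrm{Var}(x_i)},
\qquad
\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i},
\qquad
\bar{y}_w = \frac{\sum_i w_i y_i}{\sum_i w_i}

u_i = x_i - \bar{x}_w,
\qquad
v_i = y_i - \bar{y}_w,
\qquad
z_i = w_i\left[\mathrm{Var}(y_i)\,u_i + b_D\,\mathrm{Var}(x_i)\,v_i\right]
```

When both variances are constant, the weights cancel and the slope equation becomes the quadratic in *b*_{D} that SDR solves in closed form.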

Unbiased estimates of *a*_{D} and *b*_{D} are obtained with these equations when the true weights of the observed points (*x*_{i}, *y*_{i}) are known. In method-comparison work where weights are typically some function of concentration, weights corresponding to observed points are not optimal because the observed points are subject to random error of the method. We therefore extend the procedure described above by estimating improved weights based on the adjusted values (*x̂*_{i}, *ŷ*_{i}), which are those points through which the least-squares line is drawn and which represent our best estimates of the true values (*X*_{i}, *Y*_{i}). Linnet (3) used a similar approach for weighting observed points in his development of constant CV Deming regression. The relationships between observed and adjusted points were given by York (4):
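Writing the adjusted values as x̂_i and ŷ_i (a notation assumed here), and assuming uncorrelated errors in *x* and *y*, these relationships are commonly written in terms of the weighted means and the quantity *z*_{i}, taken here to be *w*_{i}[Var(*y*_{i})(*x*_{i} − *x̄*_{w}) + *b*_{D}·Var(*x*_{i})(*y*_{i} − *ȳ*_{w})]:

```latex
\hat{x}_i = \bar{x}_w + z_i,
\qquad
\hat{y}_i = \bar{y}_w + b_D\,z_i
```

Because *a*_{D} = *ȳ*_{w} − *b*_{D}*x̄*_{w}, each adjusted point satisfies ŷ_i = *a*_{D} + *b*_{D}x̂_i and therefore lies on the fitted line.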

Weights for each observed point are calculated iteratively. After an initial estimate of *a*_{D} and *b*_{D} based on weights derived from observed points, revised weights are computed using adjusted points, which in turn are used to calculate new values for *a*_{D} and *b*_{D}. The process is repeated using updated estimates of adjusted values and weights until the correction to *b*_{D} is <0.0001. In our experience, four or fewer iterations are required for convergence, even for extremely imprecise methods. We refer to this overall procedure for obtaining the unbiased slope and intercept as iteratively reweighted general Deming regression (IRGDR).
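The iteration described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the program used in this study: errors in *x* and *y* are taken to be uncorrelated, the slope equation is solved to convergence for each set of weights, the starting slope comes from OLR, and the imprecision profiles `sd_x` and `sd_y` are hypothetical callables that return an SD for a given concentration.

```python
def _fit_fixed_weights(x, y, var_x, var_y, b, tol=1e-12, max_iter=200):
    """Slope and intercept for fixed per-point variances, plus adjusted points."""
    for _ in range(max_iter):
        w = [1.0 / (vy + b * b * vx) for vx, vy in zip(var_x, var_y)]
        sw = sum(w)
        xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
        yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
        z = [wi * (vy * (xi - xw) + b * vx * (yi - yw))
             for wi, vx, vy, xi, yi in zip(w, var_x, var_y, x, y)]
        num = sum(wi * zi * (yi - yw) for wi, zi, yi in zip(w, z, y))
        den = sum(wi * zi * (xi - xw) for wi, zi, xi in zip(w, z, x))
        b_new = num / den
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    a = yw - b * xw
    x_adj = [xw + zi for zi in z]        # adjusted comparative values
    y_adj = [yw + b * zi for zi in z]    # adjusted test values
    return a, b, x_adj, y_adj


def irgdr(x, y, sd_x, sd_y, tol=1e-4, max_outer=20):
    """Iteratively reweighted general Deming regression (sketch).

    Returns (intercept, slope, number of reweighting passes).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Starting slope from ordinary linear regression (an assumption).
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    x_adj, y_adj = list(x), list(y)      # first pass: weights from observed points
    for n_outer in range(1, max_outer + 1):
        var_x = [sd_x(c) ** 2 for c in x_adj]
        var_y = [sd_y(c) ** 2 for c in y_adj]
        a, b_new, x_adj, y_adj = _fit_fixed_weights(x, y, var_x, var_y, b)
        correction = abs(b_new - b)
        b = b_new
        # Stop once reweighting changes the slope by less than tol.
        if n_outer > 1 and correction < tol:
            break
    return a, b, n_outer


def sd_comparative(c):
    return 0.05 + 0.005 * c   # hypothetical imprecision profile


def sd_test(c):
    return 0.10 + 0.017 * c   # hypothetical imprecision profile


# Hypothetical paired results for illustration only.
x_obs = [2.4, 5.1, 8.9, 13.7, 19.8, 26.9]
y_obs = [2.3, 5.4, 9.2, 13.1, 20.6, 27.5]
a_d, b_d, passes = irgdr(x_obs, y_obs, sd_comparative, sd_test)
print(f"intercept = {a_d:.4f}, slope = {b_d:.4f}, passes = {passes}")
```

Only the weights depend on the adjusted points; the observed points themselves are refitted on every pass.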

Williamson (6) derived the variances of the estimated slope and intercept from first order derivatives of *b*_{D} and *a*_{D} with respect to the observed points (*x*_{i} and *y*_{i}):

where:

The derivatives used to calculate *Var*(*b*_{D}) and *Var*(*a*_{D}) may also be evaluated at adjusted points. For well-correlated data typically encountered in method-comparison studies, the difference in variances estimated by the two procedures is small, with values based on observed points being slightly larger.

### Simulation studies

We compared the performance characteristics of the IRGDR procedure to those of OLR and SDR using Monte Carlo simulations. For each simulation run, we set the true slope at 1.0, the true intercept at 0.0, and *n* = 50 samples with duplicate values for test and comparative methods at each point. The random error for each simulated result had a Gaussian distribution. Regression calculations were performed by each procedure using only the first replicate of each analytical method to estimate the average bias and 95% CI of the bias at medical decision levels. For SDR calculations, duplicates of the test and comparative method results for each sample were used to estimate SDs, and SEs of the SDR slope and intercept were computed using the general Deming regression relationships with the constant SDs. Computations were performed with an adaptation of a Windows^{®} application developed by the author (EP_Suite 9A, a module in EP_Suite for Windows^{TM}) that contains components for each of the regression procedures.
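The text does not state how SDs were estimated from the duplicates; a minimal sketch, assuming the standard formula for the within-method SD from paired replicates, SD = √(Σ*d*^{2}/2*n*), is shown below. The function name and the example values are hypothetical.

```python
import math


def sd_from_duplicates(pairs):
    """Within-method SD from n duplicate pairs: sqrt(sum(d^2) / (2n))."""
    n = len(pairs)
    return math.sqrt(sum((r1 - r2) ** 2 for r1, r2 in pairs) / (2.0 * n))


# Hypothetical duplicate results for the comparative (x) and test (y) methods.
sd_x = sd_from_duplicates([(140.1, 139.6), (145.3, 145.9), (150.2, 150.0)])
sd_y = sd_from_duplicates([(140.8, 139.9), (146.0, 144.9), (151.1, 150.3)])
lam = sd_x ** 2 / sd_y ** 2   # ratio Var(x)/Var(y) used by SDR
print(f"SD(x) = {sd_x:.3f}, SD(y) = {sd_y:.3f}, lambda = {lam:.3f}")
```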

## Results

The results of three representative simulations are summarized in Table 1: Table 1A presents a sodium evaluation with data in the range 132–155 mmol/L and constant SDs; Table 1B presents an albumin evaluation with data in the range 15–50 g/L and constant CVs; and Table 1C presents a glucose evaluation with data in the range 2.2–27.8 mmol/L (40–500 mg/dL) with neither SDs nor CVs held constant. In the last case, SDs varied linearly from 0.055 at 2.2 mmol/L to 0.166 at 27.8 mmol/L for the comparative method, whereas test method SDs ranged linearly from 0.111 to 0.555 mmol/L over the same concentration interval.

For each of 5000 simulation runs for each case, the slope, intercept, their respective SEs (based on observed values), and the 95% CI of the systematic bias were computed. The means and SDs of the 5000 slopes (and intercepts) are listed as the “average slope” (intercept) and SD of slopes (intercepts). The root-mean-square values of the 5000 computed SEs [SE(*a*) and SE(*b*)] are tabulated in their respective rows. The proportion of calculated CIs of systematic bias that did not include the true value of bias (0.0) at each medical decision level is represented by α̂. Thus α̂ is the estimate of α (1 minus the confidence coefficient) obtained from the simulation runs.
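The estimation of α̂ can be illustrated with a small Monte Carlo sketch for OLR on the glucose case. Several details are assumptions made for the illustration: true concentrations are drawn uniformly over the range, the OLR CI uses the textbook prediction-variance formula, the *t* quantile 2.0106 approximates *t*(0.975; 48), and the decision level of 7.0 mmol/L and all function names are hypothetical.

```python
import math
import random


def linear_sd(c, c_lo, c_hi, sd_lo, sd_hi):
    """SD that varies linearly with concentration between two end points."""
    return sd_lo + (sd_hi - sd_lo) * (c - c_lo) / (c_hi - c_lo)


def olr_bias_ci(x, y, x_c, t_crit):
    """CI of the bias a + (b - 1) * x_c from ordinary linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se = math.sqrt(s2 * (1.0 / n + (x_c - x_bar) ** 2 / sxx))
    bias = a + (b - 1.0) * x_c
    return bias - t_crit * se, bias + t_crit * se


def estimate_alpha(runs, x_c, n=50, seed=1, t_crit=2.0106):
    """Fraction of runs whose CI excludes the true bias (0.0)."""
    rng = random.Random(seed)
    c_lo, c_hi = 2.2, 27.8   # glucose range, mmol/L
    misses = 0
    for _ in range(runs):
        truth = [rng.uniform(c_lo, c_hi) for _ in range(n)]  # assumed design
        # True slope 1.0 and intercept 0.0; Gaussian error on both methods.
        x = [t + rng.gauss(0.0, linear_sd(t, c_lo, c_hi, 0.055, 0.166))
             for t in truth]
        y = [t + rng.gauss(0.0, linear_sd(t, c_lo, c_hi, 0.111, 0.555))
             for t in truth]
        lower, upper = olr_bias_ci(x, y, x_c, t_crit)
        if not lower <= 0.0 <= upper:
            misses += 1
    return misses / runs


print(f"alpha-hat for OLR at 7.0 mmol/L: {estimate_alpha(2000, x_c=7.0):.3f}")
```

A value of α̂ close to 0.05 indicates reliable 95% CIs; the same loop can wrap any of the regression procedures by swapping the CI function.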

An adequate regression procedure must provide (*a*) statistically unbiased estimates of slope and intercept to compute unbiased point estimates of systematic bias at each medical decision level, and (*b*) an estimate α̂ equal to the preestablished value of α (0.05 in our study), thus indicating reliable CIs of the bias at each level. Review of the data presented in Table 1 leads to the following conclusions regarding the adequacy of the different regression procedures for the various cases:

1. Only IRGDR produced unbiased estimates of systematic bias and reliable CIs of bias for all cases.

2. SDR produced unbiased estimates of systematic bias in all cases, but its CIs of the bias were unreliable when data were heteroscedastic (cases B and C). (Note: Results shown in Table 1 for SDR are based on method SDs calculated from method duplicates. When method variances are constant, as in Table 1A, equivalent results are obtained whether true or estimated SDs are used for the calculations.)

3. OLR led to statistically biased estimates of systematic bias and marginally reliable CIs of the bias for the constant SD simulation (case A).

4. When random error of the comparative method (*x*) is small compared with the analytical range of the data (cases B and C), OLR yielded unbiased (or nearly so) estimates of systematic bias, but CIs of bias were unreliable.

Relative to point 4, we evaluated the use of the correlation coefficient (*r*) as a criterion for assessing whether the range of data is adequate for use of the OLR procedure. Among others, Westgard (9) and NCCLS (10) have indicated that if the correlation coefficient is ≥0.975, OLR may be used to estimate systematic bias. In our studies, *r* was <0.975 in 99.8% of simulation runs for case A, correctly indicating that OLR should not be used. For cases B and C, *r* was ≥0.975 in 99.8% and 100.0% of simulation runs, respectively, indicating that OLR should be adequate under these conditions. Thus, our data support the usual correlation coefficient criterion for using OLR to estimate average systematic error. However, when the range of data is very broad, heteroscedasticity is likely and the CI of the bias based on OLR should be considered suspect, as revealed in cases B and C.
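The correlation coefficient criterion can be applied with a few lines of Python. The sketch below computes the ordinary Pearson *r* and compares it with the 0.975 threshold cited above; the function names and example data are hypothetical, and passing the check says nothing about the reliability of the OLR CI.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of paired results."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def olr_range_adequate(x, y, threshold=0.975):
    """True when r meets the threshold for using OLR point estimates."""
    return pearson_r(x, y) >= threshold


# Hypothetical paired results for illustration only.
x_obs = [2.4, 5.1, 8.9, 13.7, 19.8, 26.9]
y_obs = [2.3, 5.4, 9.2, 13.1, 20.6, 27.5]
print(f"r = {pearson_r(x_obs, y_obs):.4f}, OLR adequate: {olr_range_adequate(x_obs, y_obs)}")
```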

## Discussion

We conclude from these simulations that IRGDR yields unbiased estimates of the regression parameters and their SEs without constraints on random errors of test or comparative methods. The procedure will therefore be generally useful in estimating systematic difference (bias) and its CI in method-comparison studies whenever a linear relationship exists between the test and comparative methods. The main advantage of the new IRGDR technique over SDR is the reliability of the CI for test method bias. With reliable CIs for bias, we can now depend on conclusions regarding the probability of acceptability of test method bias.

From a practical point of view, the improved reliability of general Deming regression comes at the cost of knowing (or determining) imprecision profiles for both test and comparative methods. In SDR, the constant imprecisions calculated from sample duplicates or from external imprecision studies serve the purpose well. When imprecision of one or both methods varies across the concentration interval, the task is less straightforward. Although several procedures exist for determining weighting functions directly from the method-comparison data (11)(12), such procedures may be risky in view of the not uncommon paucity of imprecision information at the extremes of the data. Furthermore, when weights are estimated directly from the data, calculated results are somewhat less reliable because estimating weights introduces another source of variability.

We have, therefore, preferred to use imprecision profiles from data external to the method-comparison experiment (e.g., imprecision studies, reportable range studies, quality control, or manufacturer’s documentation). There are two primary requirements on such external imprecision data: (*a*) data must accurately reflect imprecision on authentic patient samples, and (*b*) the imprecision profile must span the entire range of data collected in the method-comparison experiment. With imprecisions on three to seven levels, we then use cubic spline calculations to create a continuous imprecision profile for each method.
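A continuous imprecision profile of this kind can be sketched with a cubic spline through the measured levels. The text does not specify the boundary conditions, so the sketch below assumes a natural spline (zero second derivative at both ends) and refuses to extrapolate beyond the profiled range; the function name and the SD values are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(knots_x, knots_y):
    """Return a function that evaluates the natural cubic spline through the knots."""
    n = len(knots_x) - 1
    h = [knots_x[i + 1] - knots_x[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3.0 / h[i]) * (knots_y[i + 1] - knots_y[i]) \
            - (3.0 / h[i - 1]) * (knots_y[i] - knots_y[i - 1])
    # Forward sweep of the tridiagonal system for the second-order coefficients.
    ell = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        ell[i] = 2.0 * (knots_x[i + 1] - knots_x[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / ell[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / ell[i]
    # Back substitution; c[0] and c[n] stay zero (natural boundary).
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (knots_y[j + 1] - knots_y[j]) / h[j] \
            - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(conc):
        if not knots_x[0] <= conc <= knots_x[-1]:
            raise ValueError("concentration outside the profiled range")
        j = min(bisect_right(knots_x, conc) - 1, n - 1)
        t = conc - knots_x[j]
        return knots_y[j] + t * (b[j] + t * (c[j] + t * d[j]))

    return evaluate


# Hypothetical imprecision study: SD (mmol/L) measured at five glucose levels.
levels = [2.2, 5.5, 11.1, 19.4, 27.8]
sds = [0.11, 0.16, 0.27, 0.41, 0.56]
sd_profile = natural_cubic_spline(levels, sds)
print(f"interpolated SD at 7.0 mmol/L: {sd_profile(7.0):.3f}")
```

The returned function can be passed directly as the SD profile of a weighted regression, provided every concentration at which it is evaluated lies inside the profiled range.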

As with any weighted regression procedure, performance depends on the adequacy of the imprecision profiles. To validate the IRGDR procedure, the simulation studies presented here used known (true) imprecision profiles, which in practice are rarely available. Thus, the simulation results presented may give a somewhat optimistic impression of performance of the method.

One of our Windows programs that performs IRGDR calculations is available as a data supplement from the *Clinical Chemistry* Web site. The file can be accessed by a link from the on-line Table of Contents (http://www.clinchem.org/content/vol46/issue1/). Executing the downloaded file named “Deming.exe” will create a Windows program group that includes the statistical program (GDR), instruction manuals (in Adobe^{®} Acrobat^{®} pdf format), and a Readme text file that contains important information about the system.

## Footnotes

MarChem Associates, Inc., 325 College Rd., Concord, MA 01742. Fax 978-371-9055; e-mail bobmartin{at}marchem.com

^{1} Nonstandard abbreviations: CI, confidence interval; OLR, ordinary linear regression; SDR, simple Deming regression; and IRGDR, iteratively reweighted general Deming regression.

^{1} 5000 simulation runs for each case.

^{2} Significantly different from 1.0 (*P* <0.001).

^{3} Significantly different from 0.0 (*P* <0.001).

- © 2000 The American Association for Clinical Chemistry