## Abstract

We describe a data reduction procedure to assign statistically accurate estimates of unknown hormone concentrations, with associated uncertainties, based on experimental uncertainties in sample replicates and the fitted calibration curve. Three mathematical calibration curve functions are considered. The one providing optimal statistical characterization of reference calibrators is chosen for unknown evaluation. Experimental error is addressed by assigning and propagating uncertainty estimates for each measured response (including zero-dose responses) by an empirically determined discrete uncertainty profile and by propagating calibration curve uncertainty. Discrete uncertainty profiles account for both response precision (replicability) and accuracy (deviation from predicted calibration curves) without relying on assumed theoretical response variance–assay response relations. The validity of assigning variable response weighting by this procedure was assessed by Monte Carlo simulations based on chemiluminescence growth hormone calibration curves. Much-improved accuracy and estimated precision are achieved for unknown hormone concentrations, particularly extremely low concentrations, by using this variable response weighting procedure.

Since the development of RIAs in the early 1970s and their widespread application to clinical chemistry, enhanced sensitivity and precision have been achieved by nonradioactive indicator techniques such as fluorometry and chemiluminescence. Empiric fitting of the calibration curves so generated has not always kept pace with the enhanced physical precision of these methods. For example, common assay data reduction procedures used in clinical and research laboratories often disregard the experimental uncertainties in the calibration curve replicates, the fitted parameters of the calibration curve, the replicates of unknown samples, and (or) the variability in zero-dose calibrators. Beyond intrinsic physical imprecision in the instrumentation (e.g., time-resolved fluorescence or photon counting), the foregoing sources of experimental variation impart finite uncertainties to each unknown sample determination. Such uncertainty should be known to a reasonable degree of accuracy to distinguish true assay sensitivity from low-end assay noise, i.e., the apparent “blank” of the assay, and also for numerous subsequent applications of the assay results. For example, calculation of endogenous or exogenous hormone kinetics, endogenous secretion rates, and statistics based on the regularity of hormone pattern reproducibility typically rely on variably weighted nonlinear fits or Monte Carlo-based estimation of asymmetric confidence intervals for parameters with within-sample uncertainty predictions. The latter are commonly estimated from duplicate, or occasionally singlet or triplicate, measurements of the unknown sample analyte concentration.

Here, we present a comprehensive effort to define sample uncertainty based on combined experimental variations inherent in: *(a)* replicates of the calibration curve; *(b)* replicates of the zero-dose calibrator; *(c)* uncertainty in the calculated calibration curve parameters; and *(d)* replicate measurements carried out on unknown samples (e.g., duplicate, triplicate, etc.). In assigning response uncertainty estimates, however, we depart from the conventional approach in which some theoretical variance function is used to relate response variance analytically to assay response (1)(2)(3)(4)(5)(6). Instead, because the relation of response variance to assay response is so highly variable and therefore generally poorly determined analytically (1)(2)(3)(4)(5)(6), we adopt an approach in which response errors are estimated on a case-by-case basis via an empirical procedure. Because our means of estimating response errors is empirical, we also present extensive empirical validation of our concept of error propagation via Monte Carlo simulations, which examine the properties and behavior of this data reduction procedure. The basis for the Monte Carlo simulation experiments is a set of highly consistent assay data from 14 growth hormone (GH) chemiluminescence assays (7), wherein high sensitivity and well-defined experimental uncertainty estimates were desirable.^{1}

**Methods**

To accommodate the largely monotonic increases or decreases in immunologically based calibration curves, we initially evaluated three empiric algebraic functions, with the forms given below. These functions include modifications of the widely used four-parameter logistic function (1)(8)(9) and a modified four-state Adair expression.

### three calibration curve functions

Three functional forms are considered for analyzing hormone assay calibration curves.

The most flexible monotonically sigmoid function of the three is a modified four-state Adair expression (referred to as MONOTONE). In this expression, <*R*_{i}> is the estimated assay response of reference calibrator i, *Y*_{0} is the response at zero hormone concentration, *Y*_{∞} is the response at infinite hormone concentration, the *O*_{j} are fitting parameters, *[H]*_{i} is the concentration of reference calibrator i, and N may assume values of 1, 2, or 3 (defining the “order” of fit). An order N = 3 fit is tried first; if it fails, an order N = 2 fit is tried automatically, and if that also fails, an order N = 1 fit is tried last.
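The order fallback described above can be sketched as follows. This is only an illustration of the control flow: `try_fit` is a hypothetical callable standing in for the MONOTONE fitting routine, and it is assumed to return `None` when a fit of the requested order cannot be obtained.

```python
def fit_with_fallback(try_fit, orders=(3, 2, 1)):
    """Try each fit order in turn and return (order, result) for the first success.

    try_fit is a hypothetical callable: try_fit(order) returns a fit result,
    or None when the fit at that order fails (an assumption of this sketch).
    """
    for order in orders:
        result = try_fit(order)
        if result is not None:
            return order, result
    return None  # no order could be fitted


# Usage with a stand-in fitter that fails at order 3 only.
def demo_fit(order):
    return None if order == 3 else {"order": order}

chosen = fit_with_fallback(demo_fit)
```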

A second function is a modified four-parameter logistic function (1)(8)(9) (referred to as 4PARMS). In this function, *A* is the response at zero hormone concentration, *D* is the response at infinite hormone concentration, and log*B* and log*C* are fitting parameters. Fitting the logarithms of *B* and *C*, rather than *B* and *C* themselves, prevents the estimation of unrealizable negative values for these parameters.
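To illustrate why fitting log*B* and log*C* keeps *B* and *C* positive, here is a minimal sketch. It assumes the widely used four-parameter logistic form, R = D + (A − D)/(1 + ([H]/C)^B), and natural logarithms; both are assumptions of this sketch and may differ from the exact modification used for 4PARMS.

```python
import math

def four_param_logistic(h, a, d, log_b, log_c):
    """Predicted response at hormone concentration h.

    Assumed form (not taken from the paper): R = D + (A - D) / (1 + (h / C)**B),
    with B = exp(log_b) and C = exp(log_c), so B and C are positive for any
    real value of the fitted parameters log_b and log_c.
    """
    b = math.exp(log_b)  # slope factor, always > 0
    c = math.exp(log_c)  # midpoint concentration, always > 0
    if h <= 0.0:
        return a         # zero-dose response
    return d + (a - d) / (1.0 + (h / c) ** b)
```

With `a = 100`, `d = 1000`, `log_b = 0`, and `log_c = math.log(2.0)`, the response is 100 at zero dose, midway between *A* and *D* at `h = 2`, and approaches 1000 at very high concentrations.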

A third function is a modification of 4PARMS (referred to as MOD4P).

All calibration curve parameter estimations are performed by a modified Gauss–Newton nonlinear least-squares parameter estimation algorithm (10)(11) to a convergence criterion of <10^{−6} relative change in variance of fit.

### estimating assay response error (empirical discrete response uncertainty profile)

Replicate-based assay response uncertainties (response SDs) are estimated for each response at every reference concentration. The process is performed iteratively, in parallel with successive evaluations of calibration curve parameter values. The first estimation of calibration curve parameter values is performed with unit-weighted responses at each calibrator dose. A model-independent discrete response uncertainty profile is then calculated at each reference concentration i by first calculating the root-mean-square response deviation, SD_{i}, relative to the expected response predicted by the current calibration curve, <*R*_{i}>, as

$$\mathrm{SD}_i = \sqrt{\frac{1}{n_i}\sum_{j=1}^{n_i}\left(R_{ij} - \langle R_i \rangle\right)^2}$$

where n_{i} is the number of replicate responses at concentration i, and *R*_{ij} is the jth response at concentration i. Distance-proportional nearest-neighbor smoothing is then used to generate smoothed estimates for a discrete uncertainty profile, <SD_{i}>, with separate expressions for interior points and for the two endpoints, where the indices 1 and n refer to the lowest and highest reference concentrations, respectively.
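The unsmoothed part of this calculation can be sketched as below. The function and variable names are illustrative, and the nearest-neighbor smoothing step is deliberately left out because its weights are not specified here. Because the deviation is taken about the curve prediction rather than the replicate mean, a single replicate still yields a nonzero estimate.

```python
import math

def rms_response_deviation(replicates, predicted):
    """Root-mean-square deviation of replicate responses about the
    response predicted by the current calibration curve."""
    n = len(replicates)
    return math.sqrt(sum((r - predicted) ** 2 for r in replicates) / n)

def raw_uncertainty_profile(responses_by_conc, predicted_by_conc):
    """Unsmoothed SD_i at each reference concentration (smoothing omitted)."""
    return [
        rms_response_deviation(reps, pred)
        for reps, pred in zip(responses_by_conc, predicted_by_conc)
    ]

# Illustrative values only: duplicates at one concentration, a singlet at another.
profile = raw_uncertainty_profile([[9.0, 11.0], [12.0]], [10.0, 10.0])
```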

Repeated rounds of calibration curve parameter estimation are then performed with variably weighted assay response values based on the current estimated response uncertainty profile. Iterative, parallel estimation of calibration curve parameter values and response uncertainty profiles is continued until approximately no change in calibration curve variance of fit is observed between rounds. Typically, the relative change in variance of fit has been observed to be less than ∼10^{−6} after 10–11 rounds of estimation. The protocol is currently implemented by using a fixed number of 30 coupled iterative rounds of estimation.

At the conclusion of these 30 rounds of estimation, the uncertainty profile is multiplicatively adjusted so as to produce a final calibration curve variance of fit of unity, so that uncertainty estimates for calibration curve parameter values can be then evaluated.
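The final multiplicative adjustment can be sketched as follows, assuming that the variance of fit is the weighted sum of squared residuals divided by the degrees of freedom, n − p (an assumption of this sketch; the definition is not spelled out above). Multiplying every response SD by the square root of the current variance of fit then gives a variance of fit of exactly one.

```python
import math

def weighted_variance_of_fit(observed, predicted, sds, n_params):
    """Assumed definition: WSSR / (n - p), with residuals weighted by 1/SD."""
    wssr = sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sds))
    return wssr / (len(observed) - n_params)

def rescale_profile(observed, predicted, sds, n_params):
    """Scale all response SDs by one common factor so the variance of fit is 1."""
    scale = math.sqrt(weighted_variance_of_fit(observed, predicted, sds, n_params))
    return [s * scale for s in sds]

# Illustrative numbers only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
scaled = rescale_profile(observed, predicted, [0.05] * 6, n_params=4)
```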

### uncertainty in the calibration curve

At the conclusion of the last calibration curve parameter estimation (to variably weighted assay response values that produce unit variance of fit), approximate nonlinear asymmetric joint confidence limits are evaluated for each calibration curve model parameter at a confidence probability level of 68.26% (the probability corresponding to 1 SD). In the criterion used, α is the maximum likelihood vector of calibration curve parameter values, *p* is the number of parameters being estimated, n is the number of calibration curve data points, *prob* is 68.26%, *F* is Fisher’s *F*-distribution, α′ is a vector of parameter values statistically different from α at probability level *prob*, and *WSSR* refers to the weighted sum of squared residuals.
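The symbols defined above match the standard variance-ratio construction for nonlinear least-squares confidence limits. Assuming that construction applies here (an assumption inferred from the symbol definitions, not a quotation of the authors' equation), the criterion would read:

```latex
\frac{WSSR(\alpha')}{WSSR(\alpha)} = 1 + \frac{p}{n - p}\, F\left(p,\; n - p,\; 1 - prob\right)
```

Under this reading, a parameter vector α′ lies on the confidence boundary when its weighted sum of squared residuals exceeds that at α by exactly this ratio.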

Vectors α′ (4*p* of them) are sought by searching each parameter dimension bidirectionally as well as by searching both directions along each axis of the *p*-dimensional hyperellipsoid defined by *H*^{T} *H*, the Hessian or information matrix, whose elements are given by

$$\left(H^{T}H\right)_{jk} = \sum_{i=1}^{n} \frac{1}{\varsigma_i^{2}}\, \frac{\partial SC\left([H]_i\right)}{\partial \alpha_j}\, \frac{\partial SC\left([H]_i\right)}{\partial \alpha_k}$$

Here, the summation is over all n data points in the calibration curve, ς_{i} is the estimated response SD for reference concentration *[H]*_{i}, and the partial derivatives are of the calibration curve function, SC, with respect to the jth and kth fitting parameters, α_{j} and α_{k}, respectively.
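The information matrix elements can be accumulated numerically, as in the sketch below. It uses central finite differences for the partial derivatives, which is a convenience of this sketch rather than the method stated above; `curve` is a hypothetical callable of the form `curve(h, *params)`.

```python
def information_matrix(curve, params, concentrations, sds, rel_step=1e-6):
    """Sum over data points of (1/sd^2) * dSC/da_j * dSC/da_k.

    Partial derivatives are approximated by central differences
    (an implementation choice of this sketch).
    """
    p = len(params)

    def partials(h):
        grads = []
        for j in range(p):
            step = rel_step * max(abs(params[j]), 1.0)
            up = list(params)
            down = list(params)
            up[j] += step
            down[j] -= step
            grads.append((curve(h, *up) - curve(h, *down)) / (2.0 * step))
        return grads

    matrix = [[0.0] * p for _ in range(p)]
    for h, sd in zip(concentrations, sds):
        g = partials(h)
        for j in range(p):
            for k in range(p):
                matrix[j][k] += g[j] * g[k] / sd ** 2
    return matrix

# Check on a straight line, where the derivatives are known exactly (1 and h).
def line(h, intercept, slope):
    return intercept + slope * h

m = information_matrix(line, [1.0, 2.0], [0.0, 1.0, 2.0], [1.0, 1.0, 1.0])
```

For the straight-line check, the expected matrix is [[3, 3], [3, 5]], since the sums of 1, h, and h² over h = 0, 1, 2 are 3, 3, and 5.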

The 4*p* sets of parameter values, α′, identified in this way constitute an approximate mapping of a 68.26% constant probability contour in the (*p* + 1)-dimensional calibration curve parameter-variance space. Estimated SDs of derived hormone concentrations in unknown samples are then generated by calculating concentrations corresponding to each of the 4*p* + 1 sets of identified parameter values (the vector α and the 4*p* vectors α′) for each observed response as well as at the observed response ± the estimated response SD (as estimated from the discrete response uncertainty profile). One-half the difference between maximum and minimum calculated concentrations is recorded as the estimated hormone concentration SD.
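The half-range rule can be sketched as follows. `inverse` is a hypothetical callable that back-calculates a concentration from a response for a given parameter set; a straight-line inverse is used purely to keep the example checkable. Skipping responses that cannot be inverted is an assumption of this sketch, not a rule stated above.

```python
def concentration_sd(inverse, response, response_sd, parameter_sets):
    """Half the spread of concentrations back-calculated over every parameter
    set and over response - SD, response, and response + SD."""
    estimates = []
    for params in parameter_sets:
        for r in (response - response_sd, response, response + response_sd):
            h = inverse(r, *params)
            if h is not None:  # assumption: out-of-range responses are skipped
                estimates.append(h)
    if not estimates:
        return None
    return 0.5 * (max(estimates) - min(estimates))

# Hypothetical straight-line calibration: response = intercept + slope * h.
def linear_inverse(response, intercept, slope):
    return (response - intercept) / slope

# The best-fit vector plus two perturbed vectors stand in for the 4p + 1 sets.
sets = [(0.0, 2.0), (0.2, 2.0), (-0.2, 2.0)]
sd = concentration_sd(linear_inverse, 10.0, 1.0, sets)
```

Here the back-calculated concentrations range from 4.4 to 5.6, so the recorded SD is 0.6.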

### combining concentration estimates and uncertainties in multiple-replicate samples

The above description applies to any single assay replicate. Most unknown samples are assayed in duplicate, or sometimes as higher-order replicates. Mean hormone concentrations of multiple-replicate samples, <*[H]*>, are calculated as variance-weighted means (12):

$$\langle [H] \rangle = \frac{\sum_{i=1}^{n} [H]_i / \varsigma_i^{2}}{\sum_{i=1}^{n} 1 / \varsigma_i^{2}}$$

where the summations are over all n replicates, *[H]*_{i} is the hormone concentration estimate for replicate i, and ς_{i} is the corresponding single replicate hormone concentration uncertainty. The joint experimental hormone concentration uncertainty associated with multiple-replicate means, ς_{mean}, is calculated from the individual replicate hormone concentration uncertainties, ς_{i} (12).
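A minimal sketch of combining replicates is given below. The weighted mean follows the description above. The expression used for the uncertainty of the mean, the inverse square root of the summed weights, is the textbook result for a variance-weighted mean and is assumed here rather than taken from the text.

```python
import math

def variance_weighted_mean(concentrations, sds):
    """Return (weighted mean, SD of the mean) for replicate concentration estimates.

    Weights are 1/sd^2. The SD of the mean, 1/sqrt(sum of weights), is the
    standard textbook expression and is an assumption of this sketch.
    """
    weights = [1.0 / s ** 2 for s in sds]
    total = sum(weights)
    mean = sum(w * h for w, h in zip(weights, concentrations)) / total
    return mean, math.sqrt(1.0 / total)

# Duplicate with unequal uncertainties: the more precise replicate dominates.
mean, sd_mean = variance_weighted_mean([1.0, 3.0], [1.0, 2.0])
```

With these illustrative values the weights are 1 and 0.25, so the mean is 1.4 rather than the unweighted 2.0.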

### calibration curve assay response conditions for monte carlo simulations

Fourteen highly consistent chemiluminescence GH calibration curve data sets (7) were the basis for the conditions outlined below and used in Monte Carlo simulation experiments. A broad range of simulation conditions was examined to validate the data reduction protocol and to elucidate the performance characteristics to be expected when analyzing calibration curves constructed with one, two, three, four, or five replicates per reference concentration. Additionally, for each number of replicates, three data reduction methods were examined: *(a)* variably weighted assay responses (via the above discrete response uncertainty profile), *(b)* uniformly weighted responses, and *(c)* uniformly weighted responses excluding zero-hormone-concentration calibrators.

### monte carlo simulation experiments

One thousand synthetic calibration curve data sets were produced at each of one, two, three, four, and five replicates per reference concentration (5000 data sets in total) to simulate the desired assay performance reported in the preceding table. Gaussian distributed assay response values with the above specified means and SDs were randomly generated by summing 12 uniformly distributed random variables in the range 0 to 1 and subtracting 6 (producing an approximately standard normal deviate with zero mean and unit variance). This standard normal deviate was multiplied by the specified target response SD and added to the corresponding chemiluminescence response value to produce a value for inclusion in the synthetic calibration curve data set being constructed.
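The deviate generator described above can be sketched as below. Each uniform variable on 0 to 1 has mean 1/2 and variance 1/12, so the sum of 12 has mean 6 and variance 1; subtracting 6 gives zero mean. The function names and the seeded generator are choices made for this illustration.

```python
import random

def approx_standard_normal(rng):
    """Sum of 12 uniform(0, 1) variates minus 6: mean 0, variance 1,
    approximately Gaussian, and bounded within -6 to 6."""
    return sum(rng.random() for _ in range(12)) - 6.0

def synthetic_response(mean_response, target_sd, rng):
    """One simulated assay response about a specified mean and SD."""
    return mean_response + target_sd * approx_standard_normal(rng)

rng = random.Random(0)  # seeded only to make the illustration repeatable
draws = [approx_standard_normal(rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_sd = (sum((z - sample_mean) ** 2 for z in draws) / (len(draws) - 1)) ** 0.5
```

One consequence of this construction is that no simulated deviate can fall more than 6 SDs from the mean, so the extreme tails of a true Gaussian are not represented.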

Each of the 5000 calibration curve data sets was subjected to nine data reduction analyses. The functions MONOTONE, 4PARMS, and MOD4P were applied to each data set in which assay response values were weighted either variably, uniformly, or uniformly excluding zero(es). For each of the three response weighting schemes, the function producing the smallest absolute sum of squared residuals (SSRs) was selected as the preferred model. (Absolute SSRs refers to the SSRs of the fitted curve to the calibration curve response values when applying unit weight to each response value.)
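The selection rule can be sketched as follows. The predicted responses for each function are assumed to be available already from the fits; the numbers used are invented for illustration.

```python
def absolute_ssr(observed, predicted):
    """Unit-weighted sum of squared residuals."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def select_model(observed, predictions_by_model):
    """Name of the function whose fitted curve gives the smallest absolute SSR."""
    return min(
        predictions_by_model,
        key=lambda name: absolute_ssr(observed, predictions_by_model[name]),
    )

# Invented responses and fitted predictions, for illustration only.
observed = [1.0, 2.0, 3.0]
predictions = {
    "MONOTONE": [1.0, 2.1, 2.9],
    "4PARMS": [1.2, 2.2, 3.2],
    "MOD4P": [0.9, 2.0, 3.3],
}
best = select_model(observed, predictions)
```

Using unit weights for the comparison puts all three weighting schemes on the same scale, because weighted residuals from different uncertainty profiles are not directly comparable.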

For each of the 5000 calibration curve data sets, an additional single-replicate data set was randomly generated as described above. This data set was treated as an “unknown” set of assay responses to which data reduction by the selected calibration curve analysis was applied. Estimated hormone concentrations (and associated uncertainty estimates, in variable weighting scheme analyses) were recorded and summarized to characterize the performance of these data reduction protocols.

**Results**

### calibration curve analysis of a synthetic gh calibration curve

Simulations were based on 14 well-characterized GH chemiluminescence assays (7). Fig. 1 illustrates our comprehensive calibration curve analysis with the discrete response uncertainty profile applied to five uniformly distributed replicates.

### model selection results

Based on 5000 simulated calibration curves, Table 1 summarizes the best-fitting functional forms. Of the calibration curves, 62–90% were best fit by the MONOTONE function (lowest absolute SSRs) and 9–28% by MOD4P. A minority (1.3–12%) were best fit by the 4PARMS model. All of the 5000 calibration curves, independent of the number of replicates per dose, were fitted by at least one of the three models. Variable (vs uniform) weighting tended to favor fitting with MOD4P, although MONOTONE was still the most frequently selected function, in about two-thirds of the fits.

### prediction of assay (gh) concentrations

Figure 2 illustrates the prediction of GH concentrations by the three different weighting schemes. Predictions are shown for one to five simulated replicates. Median and 68.26% confidence intervals are shown compared with different target ranges. Variable weighting reduced experimental uncertainty at the lowest hormone concentrations and in the zero-dose calibrators.

### estimated hormone concentration standard deviations

Figure 3 shows estimated hormone concentration SDs for variable weighting calibration curve analysis based on 1000 Monte Carlo simulations at each replicate level. For GH reference concentrations <1.33 μg/L, increasing replication number beyond singlets increased the reliability of experimental uncertainty estimation as evidenced by the narrower range of [GH] SDs at higher replicate numbers. This effect of sample replication was lost at high GH concentrations. The horizontal lines in Fig. 3 correspond to SD estimates obtained directly from evaluating the distributions of hormone concentration estimates produced by the simulations (as in Fig. 2). The generally higher SDs produced by the variable weighting protocol reflect the additional uncertainty introduced to concentration estimates as a result of considering the error in the calibration curve itself.

### calibration curve analysis of high-replicate-number calibration curves

Figure 4 shows results of high-replicate-number calibration curve analysis of a GH chemiluminescence assay (GH Chemi) and a lutropin (LH) IRMA. For each set of calibration curve data, analysis was performed by using each of the three calibration curve functions in which assay responses were weighted variably, uniformly, and uniformly excluding zeroes. For both the GH Chemi and the LH IRMA, the MONOTONE calibration curve function provided the lowest absolute SSRs for each of the three assay response weighting schemes. Plotted in Fig. 4 are the three resulting MONOTONE calibration curves for each weighting scheme. The three curves are nearly superimposable in each case, but do deviate somewhat at the lowest (zero) hormone concentration, particularly for the GH Chemi. The error bars on the points and the plots of estimated assay response error (insets) are those for the variably weighted assay response analysis. A detailed comparison of the back-calculated results obtained by each of the three assay response weighting schemes is presented in Table 2.

**Discussion**

As part of a systematic characterization of experimental uncertainty inherent in assay measurements, we have examined the performance of three monotonically varying algebraic forms for the assay dose–response function, and delineated the joint experimental uncertainties inherent in the fitted curve, assay replicates, calibrator replicates, and zero-dose tubes. Among the three common fitting functions explored (modifications of the logistic and Adair expressions), the modified four-state Adair equation (MONOTONE) was favored by the majority of calibration curve realizations evaluated here, whether composed of one, two, three, four, or five replicates per dose. However, a modified four-parameter logistic function (4PARMS), which is very commonly used for calibration curve analysis (1)(8)(9), was also adaptable. We further demonstrated that the variable response weighting protocol exhibited performance superior to that of either the uniform response weighting protocol or the protocol involving uniform response weighting without consideration of zero-hormone-concentration calibrators, since variable weighting provided greater accuracy and precision, especially in determining low-end hormone concentrations. Indeed, at concentrations below ∼0.04 μg/L in the Monte Carlo-simulated GH chemiluminescence assay, single-replicate conditions with variable weighting provided better performance than even the quintuple-replicate cases involving uniform weighting with or without utilization of zero calibrators. Approximately equivalent performance among the three protocols was seen at higher hormone concentrations.

We further observed that all of the protocols were inaccurate and imprecise at 45 μg/L in the Monte Carlo-simulated GH chemiluminescence assay, illustrating the importance of characterizing critically the relevant operating range of any given assay configuration. With few replicates, as might be anticipated intuitively, there was a greater tendency to underestimate hormone concentrations on average because of a larger number of out-of-range points. In addition, we observed that whereas the variable weighting protocol can provide estimates of response and hormone concentration uncertainty even with single replicates, significant improvements in both the accuracy and precision of these estimates are achieved by the use of duplicates, with less remarkable further improvements evident on going to higher numbers of replicates. Thus, cost constraints vs precision requirements by the clinical chemist, clinician, and investigators will determine the desired replication density.

The present work also shows that an empirically based discrete response uncertainty profile is effective for estimating response errors at all except the highest reference concentration, with greater reliability achieved at higher replicate conditions. Perhaps unexpectedly, despite a general preference for the use of duplicates or higher numbers of replicates, even single determinations are moderately reliable below approximately the inflection point of the sigmoid calibration curve. This remains a persistent consideration in the (repeated) sampling of infants or children with limited blood volumes when assay miniaturization is imperfect.

Our analysis further indicates that estimating and propagating the effects of calibration curve uncertainty contribute noticeably to concentration uncertainty estimates beyond that due solely to response variability. Evidence of this is provided by our observation that GH concentration SDs were conservatively estimated by the variable weighting procedure when compared with directly estimated Monte Carlo [GH] SDs. Monte Carlo estimates were generally lower than those provided by the variable weighting protocol because calibration curve parameter uncertainty is not propagated as a contributor in the direct Monte Carlo estimates. That the Monte Carlo procedure is valid is supported, however, by the observation that Monte Carlo estimates of assay response SDs were extremely consistent (a situation in which agreement should indeed be expected). To our knowledge, uncertainty in the fitted parameters of the calibration curve is not reflected in sample uncertainty estimates in most available data reduction methods. Hence, earlier procedures for calculating within-sample SDs underestimate sample variance. This bias is especially significant in defining assay sensitivity, leading potentially to inferred greater sensitivity than actually achievable. In addition, underestimation of (low-end) assay uncertainty may have nontrivial impact on computer-assisted curve fitting of (weighted) neurohormone time series, presumptively promoting false-positive (type I) statistical errors. Lastly, when underestimated sample uncertainties are used to estimate the precision of statistics inferred from a time series, e.g., the SD of an approximate entropy estimate for any given time series, that precision will be overstated.

In summary, the variable weighting data reduction protocol described here provides greater accuracy and precision than most commonly used hormone concentration data reduction procedures, particularly at extremely low hormone concentrations. Three monotonically sigmoidal functional forms for evaluating calibration curves are examined, after which selection of a preferred model is based on empirical grounds (lowest absolute SSRs). Assay responses are variably weighted by an empirically derived discrete assay response uncertainty profile that *(a)* is specifically tailored to the particular calibration curve response profile being considered yet free of any constraints applied by assuming a particular functional form for a variance profile (1)(2)(3)(4)(5)(6), *(b)* accounts for both response precision (replicability) and accuracy (relative deviation from predicted calibration curve), and *(c)* is generated in a manner maximally consistent with the most probable derived calibration curve. We show that, in principle, uncertainty estimates for both assay responses and hormone concentrations can be obtained from even single-replicate assay protocols. However, the reliability of measures rises significantly upon increasing to duplicates. Uncertainty in determination of the calibration curve is also evaluated and subsequently propagated as a contribution to derived concentration uncertainty estimates. The explicit use of zero-hormone reference information during evaluation of calibration curves also contributes to better determination of low hormone concentrations. Efforts are currently under way to fully implement this data reduction protocol into a 32-bit Windows operating environment in a manner that will facilitate maximal ease of user interaction as well as maximal data throughput capabilities.

## Acknowledgments

We acknowledge support from: NSF DIR8920162 (National Science Foundation Center for Biological Timing; M.S., M.L.J., J.D.V.); NIH RR00847 (General Clinical Research Center at the University of Virginia; M.L.J., J.D.V.); NIH DK38942 (Diabetes and Endocrine Research Center at the University of Virginia; M.L.J., J.D.V.); NIH RR08119 (Center for Fluorescence Spectroscopy at the University of Maryland at Baltimore; M.L.J.); NIH GM35154 (M.L.J.); NIH RCDA1K04 HD00634 (J.D.V.); NIH P30 HD28934 (Reproduction Research Center at the University of Virginia; J.D.V.); Baxter Healthcare Corp., Round Lake, IL (J.D.V.); The NIH-supported Clinfo Data Reduction Systems; The Pratt Foundation; and The University of Virginia Academic Enhancement Fund.

## Footnotes

^{1} Nonstandard abbreviations: GH, growth hormone; SSR, sum of squared residuals; and LH, lutropin.

- © 1998 The American Association for Clinical Chemistry