## Abstract

Manufacturers and users of medical diagnostic devices are provided a statistical decision tool for investigating a claimed minimal detectable concentration (MDC). The MDC is defined by setting two respective probabilities: that the blank sample being analyzed is determined to have analyte and that the device fails to determine a low concentration of analyte at the MDC. The statistical procedure for simultaneously testing the two aforementioned analytical decision errors assumes that signal responses follow a gaussian distribution but does not require a fitted calibration curve, knowledge of distribution parameters, or the assumption of constant variance in the low assay range. Evaluation of the operating characteristics of the procedure requires knowledge only of the variance ratio between the MDC and zero-dose signal distributions, which usually is well known.

When using a medical diagnostic device to assay samples with very low analyte concentrations, it is important to know the minimal detectable concentration (MDC) for that device, because samples with analyte concentrations below the MDC cannot be reliably measured. Examples in which sample concentrations can fall below the MDC include serum thyroid-stimulating hormone concentrations in hypothyroid patients and HIV antibody concentrations in pre-AIDS patients.

In this report, the signal detection limit (SDL) of a chemical measurement process is the signal at which the analyte is judged detectable (analyte determined not zero). The detection limit should be set such that the probability of judging a blank sample signal detectable has a prescribed value, α. In the methodology of this report, the MDC should be set such that the probability for a real signal output to be judged not detectable (i.e., to fall below the SDL) is β. Both α and β typically are set to be small probabilities. The MDC model used here is schematically illustrated in Fig. 1, which is based on absorbance values from an interleukin-2 receptor (IL-2R) data example to be discussed later.

Clayton et al. (1) note that traditional methodologies for determining SDLs have generally been concerned only with providing protection against type I errors (i.e., reporting that an analyte is present when it is not) without concern for similar protection against type II errors (i.e., reporting that an analyte is not present when it is) (2)(3)(4)(5)(6). This has led to the fallacious practice of defining the MDC at the SDL, which gives a 50% type II error rate.

As also noted by Clayton et al. (1), previous attempts to define MDCs with realistic type II errors have suffered from one or more of the following deficiencies: perfect knowledge of the calibration curve is assumed; the measurement error variance is assumed known; all signal distribution parameters are assumed known; and mathematical or logical fallacies have occurred in the methodological development. Altschuler and Pasternak (7) were first to place the MDC problem in a statistical hypothesis testing framework, but they assumed perfect knowledge of the calibration curve and of the measurement error variances. Currie (8) followed the approach of Altschuler and Pasternak (7) but avoided having to assume knowledge of the calibration curve by placing the SDL on the signal axis. Currie also assumed knowledge of all signal distribution parameters.

Many of the current, popular methods use a fitted calibration curve to estimate the MDC (1)(9)(10)(11)(12). These methods have in common the following problems. The fitted calibration curve is assumed to be unbiased (i.e., correctly specified), which is sometimes an unrealistic assumption, especially in the lower assay region around the MDC where there may be only a single calibrator nearby. The component of variance attributable to fitting a calibration curve must thus be considered in the planning of the MDC experiment and in the method of drawing statistical inferences from the resulting data.

This report expands on Currie’s MDC model (8), which is schematically illustrated in Fig. 1, and provides statistical methodology to manufacturers and users of medical diagnostic devices for testing a claimed MDC under gaussian signal distributions. The methodology tests that the probabilities of the two aforementioned analytical decision errors, α and β, are no larger than claimed. A general MDC model is formulated. This MDC model tests relationships between means of the assumed gaussian signal distributions and, unlike many of the previous methods, does not require knowledge of distribution parameters, fitted calibration curves, or SDL estimates. Separate statistical tests are developed for the cases of homogeneous and heterogeneous low assay range variances.

Although this procedure assumes that the signal responses follow a gaussian distribution, this is not a severe restriction. It has been our experience that the signal distributions for many medical diagnostic instruments tend to be well approximated by the gaussian distribution. An example of the T CELL Science IL-2R assay is presented later, where 75 units/mL was the claimed MDC to be tested and where the normality assumption of the absorbance signal data from this assay was tested statistically and not rejected.

To test a claimed MDC, the methodology requires that a blank and a low-dose sample (with analyte added to the claimed MDC) be assayed in sufficiently large numbers of replicates, which are determined statistically. A blank sample should be simulated by one of the following two methods: *(a)* using a blank control that mimics human serum; or *(b)* adding or deleting reaction steps in the assays of a real human serum sample, so that only background noise can be measured.

Throughout the remainder of this report, standard statistical notations are used. The reader is referred to the *Glossary of Abbreviations, Notations, and Definitions*, located just before the reference list. Furthermore, all signal distributions are assumed to be gaussian distributions.

## The MDC Model

The MDC model used in this report was previously developed by Currie (8) and requires the assumption of gaussian signal distributions. Under this model, the user defines the MDC through the parameters α and β, which are explained below and control the overlap between the blank and MDC signal distributions. This MDC model is shown schematically in Fig. 1.

Currie’s MDC model defines the SDL as the signal response needed to control the proportion α of zero-dose samples judged detectable. For a monotonically increasing (or nondecreasing) signal response curve, any signal response (using a blank analyte sample) that is equal to or greater than the SDL is judged detectable. Thus:

$$\Pr\{Y_{0} \geq \mathrm{SDL}\} = \alpha \tag{1}$$

where *Y*_{0} is the signal for a zero-dose sample, the proportion α is set by the user, and the SDL is the upper αth percentile of the blank analyte signal distribution. That is, for monotonically increasing signal response curves:

$$\mathrm{SDL} = \mu_{0} + z_{\alpha}\sigma_{0} \tag{2}$$

where μ_{0} is the mean of the signal distribution when a blank sample is used; σ_{0} is the corresponding standard deviation; and *z*_{α} is the (1 − α)th percentile of the standard gaussian distribution (mean = 0 and standard deviation = 1). As shown in Fig. 2, in a gaussian distribution, the absolute distance between the mean μ_{0} of the blank signal distribution and the SDL is *z*_{α}σ_{0}, from which Eq. 2 follows for the monotonically increasing signal response curve case.

The true MDC distribution is defined such that, for a fixed α, the probability is β that a signal from this distribution is judged not detectable when an MDC sample is used. For a monotonically increasing signal response curve, any response from the MDC signal distribution that is equal to or less than the SDL is judged not detectable. Thus,

$$\Pr\{Y_{\mathrm{MDC}} \leq \mathrm{SDL}\} = \beta \tag{3}$$

where *Y*_{MDC} is the signal for a sample at the true MDC. Thus, for a monotonically increasing calibration curve and under a gaussian signal distribution, the SDL is the lower βth percentile of the MDC signal distribution. That is, for monotonically increasing signal response curves:

$$\mathrm{SDL} = \mu_{\mathrm{MDC}} - z_{\beta}\sigma_{\mathrm{MDC}} \tag{4}$$

where μ_{MDC} is the mean of the signal distribution when a sample at the MDC is used, σ_{MDC} is the corresponding standard deviation, and *z*_{β} is the (1 − β)th percentile of the standard gaussian distribution. Fig. 3 shows that the absolute distance between the mean μ_{MDC} of the MDC signal distribution and the SDL is *z*_{β}σ_{MDC}, from which Eq. 4 follows for the monotonically increasing response curve case.

If the above-defined MDC model holds and under the assumed gaussian signal distributions, then for monotonically increasing signal response curves, Eqs. 2 and 4 can be combined to give:

$$\mu_{\mathrm{MDC}} - \mu_{0} = z_{\alpha}\sigma_{0} + z_{\beta}\sigma_{\mathrm{MDC}} \tag{5}$$

For monotonically decreasing (or nonincreasing) signal response curves, the inequalities in Eqs. 1 and 3 are reversed, and the following equation results from an argument similar to the monotonically increasing case:

$$\mu_{0} - \mu_{\mathrm{MDC}} = z_{\alpha}\sigma_{0} + z_{\beta}\sigma_{\mathrm{MDC}} \tag{6}$$

From Eqs. 5 and 6, it follows for both the monotonically increasing and decreasing signal response curves that:

$$|\mu_{0} - \mu_{\mathrm{MDC}}| = z_{\alpha}\sigma_{0} + z_{\beta}\sigma_{\mathrm{MDC}} \tag{7}$$

where μ_{0} and μ_{MDC} are the signal means at the zero dose and MDC, respectively; σ_{0} and σ_{MDC} are the corresponding standard deviations; and *z*_{α} and *z*_{β} are set to control α and β, respectively.
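The relationship among the two signal means, the two standard deviations, and the error rates α and β can be explored numerically. The following is a minimal sketch using only the Python standard library; the function names `required_separation` and `implied_beta` and the unit standard deviations in the usage lines are illustrative assumptions, not part of the original methodology.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard gaussian: mean 0, standard deviation 1


def required_separation(sigma0, sigma_mdc, alpha, beta):
    """Gap |mu_0 - mu_MDC| at which the error rates are exactly alpha and beta."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1.0 - beta)
    return z_alpha * sigma0 + z_beta * sigma_mdc


def implied_beta(mean_gap, sigma0, sigma_mdc, alpha):
    """Type II error rate implied by a given |mu_0 - mu_MDC| when alpha is fixed."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    # Solve the separation relation for z_beta, then convert to a tail probability.
    z_beta = (abs(mean_gap) - z_alpha * sigma0) / sigma_mdc
    return 1.0 - STD_NORMAL.cdf(z_beta)


# Hypothetical inputs: unit standard deviations, alpha = beta = 0.05.
gap = required_separation(sigma0=1.0, sigma_mdc=1.0, alpha=0.05, beta=0.05)

# Placing the MDC mean exactly at the SDL (gap = z_alpha * sigma0).
beta_at_sdl = implied_beta(STD_NORMAL.inv_cdf(0.95), 1.0, 1.0, alpha=0.05)

print(f"required gap: {gap:.3f}")
print(f"beta when the MDC mean sits at the SDL: {beta_at_sdl:.2f}")
```

The second usage line illustrates the point made in the introduction: defining the MDC at the SDL yields a 50% type II error rate.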

Note that under this MDC model, it is not necessary to know the SDL or the signal distribution means μ_{MDC} and μ_{0}; all of these may change over time or across assay setups, provided that Eq. 7, the underlying assumption of Currie’s MDC model (8), remains true. In a practical sense, it is only important that the inequality | μ_{0} − μ_{1} | ≥ *z*_{α}σ_{0} + *z*_{β}σ_{1} hold over a set of prespecified conditions (e.g., over the claimed shelf life of the assay reagents) for some claimed MDC, where μ_{1} and σ_{1} are the mean and standard deviation, respectively, of the signal distribution at the claimed MDC. A common example in which Eq. 7 holds is when both the shape of the low end of the calibration curve and the intraassay signal precisions, σ_{0} and σ_{MDC}, remain constant over a set of prespecified conditions, although there might be substantial interassay variation, which would affect the values of μ_{0} and μ_{MDC} but not the magnitude of their difference. For cases where | μ_{0} − μ_{MDC} | or σ_{0} and/or σ_{MDC} are changing with respect to certain systematic factors, the user might want to use the methodology to characterize how the MDC changes. Another possibility would be to test a claimed MDC at the worst-case set of conditions, which might occur, for example, at the claimed assay system expiration date, when the calibration curve might be shallower and the signal response more imprecise compared with the day of reagent manufacture.

Three cases will be discussed: *(a)* a situation in which the standard deviations are unknown but equal; *(b)* a situation in which the ratio of the unknown standard deviations is known; and *(c)* a situation in which the standard deviations are unknown and unequal. In general, the ratio of the standard deviations will not be known exactly, but in some cases accurate estimates of σ_{MDC} and σ_{0} can be derived from historical data, from which the user can determine the value to use as the fixed, known ratio. In all three of these situations, it will be shown that the user can determine the required sample sizes to achieve a user-specified power for the proposed test. From our experience, it is expected that most of the time, the user will be dealing with the third case, in which the variances are unknown and unequal.

## Testing a Claimed MDC under Gaussian Signal Distributions

For any of the three aforementioned variance cases, the following sequential procedures are required for testing a claimed MDC under gaussian signal distributions. These procedures will be illustrated later with an actual data example for the most common case of unknown and unequal variances.

1. Specify a maximum acceptable concentration for a claimed MDC and define the mean of its corresponding signal distribution to be μ_{1}.

2. Fix α and define a maximum acceptable value of β, e.g., β_{0}.

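3. Calculate the required number of replicates (as explained later) for testing the null hypothesis:

   $$H_{0}: |\mu_{0} - \mu_{1}| \geq z_{\alpha}\sigma_{0} + z_{\beta_{0}}\sigma_{1}$$

   against the alternative hypothesis:

   $$H_{a}: |\mu_{0} - \mu_{1}| < z_{\alpha}\sigma_{0} + z_{\beta_{0}}\sigma_{1}$$

   where σ_{1} is the standard deviation of the signal distribution for a sample at the analyte concentration that is claimed to be the MDC. Furthermore, if we consider α fixed, this is equivalent to testing:

   $$H^{*}_{0}: \beta \leq \beta_{0}$$

   (shown schematically in Fig. 4), vs:

   $$H^{*}_{a}: \beta > \beta_{0}$$

   (shown schematically in Fig. 5).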

4. If H_{0} or equivalently H^{*}_{0} is rejected, the claimed MDC is rejected.

5. If H_{0} or equivalently H^{*}_{0} is not rejected, then the claimed MDC is judged not invalid.

6. If H_{0} is not rejected, an upper one-sided confidence limit should be constructed for β. If this upper limit is smaller than β_{0}, this would suggest that a lower claimed MDC should be tested.

In all three cases discussed below, if H_{0} is not rejected, the user may wish to repeat the above procedures on an additional independently generated data set with an even lower claimed MDC. This will be illustrated later in the discussion of an example.

### Case 1: Unknown and Equal Variances

The blank and claimed MDC signal distributions are assumed to follow a gaussian distribution with unknown means, μ_{0} and μ_{1}, and unknown standard deviations, σ_{0} and σ_{1}, respectively. Testing for a claimed MDC that β ≤ β_{0} under an a priori fixed α is equivalent to testing the null hypothesis:

$$H_{0}: |\mu_{0} - \mu_{1}| \geq z_{\alpha}\sigma_{0} + z_{\beta_{0}}\sigma_{1}$$

against:

$$H_{a}: |\mu_{0} - \mu_{1}| < z_{\alpha}\sigma_{0} + z_{\beta_{0}}\sigma_{1}$$

as can be seen schematically by studying Figs. 4 and 5.

If σ_{0} = σ_{1}, then H_{0} can be tested using the noncentral *t*-statistic:

$$t = \frac{\bar{Y}_{1} - \bar{Y}_{0}}{S_{p}\sqrt{1/N_{0} + 1/N_{1}}}, \qquad S_{p}^{2} = \frac{(N_{0} - 1)S_{0}^{2} + (N_{1} - 1)S_{1}^{2}}{N_{0} + N_{1} - 2} \tag{8}$$

where *Ȳ*_{0} and *Ȳ*_{1} are sample signal means based on *N*_{0} and *N*_{1} replicates at the zero dose and the claimed MDC, respectively; *S*_{0}^{2} and *S*_{1}^{2} are the corresponding sample variances; and *S*_{p}^{2} is the pooled variance. Note that the calculation of the noncentral *t*-statistic is identical to that of the commonly used two-sample *t*-statistic. The only difference between these two cases is that the noncentral *t*-statistic is testing for a non-zero mean difference between μ_{0} and μ_{1}, as shown by the above-stated hypotheses. For the monotonically increasing calibration curve, the variable *t* has a noncentral *t*-distribution, as described by Graybill (13), with noncentrality parameter:

$$\delta = \frac{z_{\alpha} + z_{\beta}}{\sqrt{1/N_{0} + 1/N_{1}}} \tag{9}$$

and degrees of freedom given by *f* = *N*_{0} + *N*_{1} − 2. For a test with type I error of size γ (usually γ is approximately the same size as α and relatively small) and for monotonically increasing signal response curves, the distribution of *t*, whenever the null hypothesis is true (i.e., β = β_{0}), has a positive noncentrality parameter and is shifted to the right of zero. If the alternative hypothesis is true, then values of the observed *t* will tend to be in the lower tail of that null distribution. Thus, H_{0} would be rejected if and only if *t* < *t*(γ; δ_{0}, *f*), where *t*(γ; δ_{0}, *f*) is the γth lower percentage point of the noncentral *t*-distribution with noncentrality parameter δ_{0} (Eq. 9 evaluated with β = β_{0}) and degrees of freedom *f*. For monotonically decreasing signal response curves, the same *t*-statistic in Eq. 8 is used, but H_{0} is rejected if and only if −*t* < *t*(γ; −δ_{0}, *f*) (13).

Several commercially available software packages, such as SAS^{TM} (14), NCSS2000^{TM} (15), IMSL^{TM} (16), and StaTable^{TM} (17), provide routines that compute the percentage points *t*(γ; δ_{0}, *f*). If a good historical estimate of σ (=σ_{1} = σ_{0}) is available, then sample size calculations can be performed by using the noncentral *t*-distribution and iterating on *N*_{0} and *N*_{1} in Eq. 9.
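For readers without access to one of these packages, a lower percentage point of the noncentral *t*-distribution can be approximated by direct numerical integration. The sketch below is not the algorithm used by any of the packages named above. It relies on the representation *T* = (*Z* + δ)/*S*, where *Z* is standard gaussian and *S* is the square root of a chi-square variable divided by its degrees of freedom, so that Pr{*T* ≤ *t*} is the average of Φ(*tS* − δ) over the distribution of *S*. The function names, the midpoint rule, and the integration range are assumptions chosen for simplicity.

```python
import math


def nct_cdf(t, df, nc, n=2000):
    """Pr{T <= t} for a noncentral t variable (df degrees of freedom,
    noncentrality nc), by midpoint-rule integration over S = sqrt(chi2/df)."""
    upper = 1.0 + 12.0 / math.sqrt(2.0 * df)  # density of S is negligible beyond this
    h = upper / n
    log_norm = math.log(2.0) + 0.5 * df * math.log(0.5 * df) - math.lgamma(0.5 * df)
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        density = math.exp(log_norm + (df - 1.0) * math.log(s) - 0.5 * df * s * s)
        phi = 0.5 * math.erfc(-(t * s - nc) / math.sqrt(2.0))  # gaussian CDF
        total += density * phi
    return total * h


def nct_ppf(p, df, nc, tol=1e-6):
    """Lower percentage point t(p; nc, df), found by bisection on nct_cdf."""
    lo, hi = nc - 1.0, nc + 1.0
    step = 1.0
    while nct_cdf(lo, df, nc) > p:  # widen the bracket downward
        step *= 2.0
        lo -= step
    step = 1.0
    while nct_cdf(hi, df, nc) < p:  # widen the bracket upward
        step *= 2.0
        hi += step
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, df, nc) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The IL-2R example later in this report quotes t(0.05; 21.64, 80) = 18.74.
crit = nct_ppf(0.05, 80, 21.64)
print(f"t(0.05; 21.64, 80) is approximately {crit:.2f}")
```

Because the integration does not depend on the degrees of freedom being an integer, the same sketch accepts fractional values of `df`.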

### Case 2: Known Variance Ratio

In many clinical settings, there is a wealth of information about the parameter ρ (=σ_{1}/σ_{0}). If one is willing to assume that this large amount of empirical evidence is sufficient to treat ρ as known, then:

$$t = \frac{\bar{Y}_{1} - \bar{Y}_{0}}{S\sqrt{1/N_{0} + \rho^{2}/N_{1}}}, \qquad S^{2} = \frac{f_{0}S_{0}^{2} + f_{1}S_{1}^{2}/\rho^{2}}{f_{0} + f_{1}} \tag{10}$$

has a noncentral *t*-distribution with noncentrality parameter:

$$\delta = \frac{z_{\alpha} + z_{\beta}\rho}{\sqrt{1/N_{0} + \rho^{2}/N_{1}}} \tag{11}$$

and degrees of freedom *f* = *f*_{0} + *f*_{1}, where *f*_{0} and *f*_{1} are the respective degrees of freedom of *S*_{0}^{2} and *S*_{1}^{2}, and *S*^{2} is the pooled estimate of σ_{0}^{2}. To test the null hypothesis H_{0}: β ≤ β_{0} vs H_{a}: β > β_{0}, compute δ_{0} using Eq. 11 with β = β_{0} and find the critical point *t*(γ; δ_{0}, *f*). For monotonically increasing signal response curves, H_{0} is rejected in favor of H_{a} if and only if *t* < *t*(γ; δ_{0}, *f*). For monotonically decreasing signal response curves, H_{0} is rejected if and only if −*t* < *t*(γ; −δ_{0}, *f*). If a good historical estimate of ρ is available, sample size calculations can be performed by iterating on *N*_{0} and *N*_{1} in Eq. 11.

### Case 3: Unknown and Unequal Variances

If σ_{0} ≠ σ_{1}, the null hypothesis H_{0}, as defined above, can be tested using the statistic:

$$t = \frac{\bar{Y}_{1} - \bar{Y}_{0}}{\sqrt{S_{0}^{2}/N_{0} + S_{1}^{2}/N_{1}}} \tag{12}$$

where *Ȳ*_{0} and *Ȳ*_{1} are sample signal means based on *N*_{0} and *N*_{1} replicates, respectively, at zero dose and the claimed MDC, and *S*_{0}^{2} and *S*_{1}^{2} are corresponding sample variances.

The distribution of *t*, which is a Welch-type statistic (18), is approximated by the noncentral *t*-distribution (18)(19)(20)(21) with estimated noncentrality parameter:

$$\hat{\delta} = \frac{z_{\alpha} + z_{\beta}\hat{\rho}}{\sqrt{1/N_{0} + \hat{\rho}^{2}/N_{1}}} \tag{13}$$

and estimated degrees of freedom:

$$\hat{f} = \frac{\left(1/N_{0} + \hat{\rho}^{2}/N_{1}\right)^{2}}{1/(N_{0}^{2}f_{0}) + \hat{\rho}^{4}/(N_{1}^{2}f_{1})} \tag{14}$$

where *f*_{0} and *f*_{1} are the corresponding degrees of freedom for *S*_{0}^{2} and *S*_{1}^{2}, and ρ̂^{2} = *S*_{1}^{2}/*S*_{0}^{2}. The Welch *t*-test for the Behrens-Fisher problem (a similar type of problem) has been investigated by many authors, including Cochran (19) and Davenport and Webster (20)(21).

The quantities ρ̂ and *f̂* are computed from the observed data (through the values of *S*_{0}^{2} and *S*_{1}^{2}), and the lower percentage point *t*(γ; δ̂, *f̂*) from this noncentral *t*-distribution is then calculated. For a monotonically increasing signal response curve, the null hypothesis H_{0} is rejected if and only if *t* < *t*(γ; δ̂, *f̂*), where δ̂ is computed from Eq. 13 with β = β_{0}. Likewise, as described above, for a monotonically decreasing signal response curve, the null hypothesis is rejected if and only if −*t* < *t*(γ; −δ̂, *f̂*).

It should be noted that the estimated degrees of freedom given by *f̂* in Eq. 14 will usually not be an integer. Many algorithms for finding the percentiles of the noncentral *t*-distribution will not allow non-integer degrees of freedom. However, the user can usually decide whether to reject or fail to reject the hypothesis by using the integer value immediately below the value of *f̂*. If one needs to compute a value for *t*(γ; δ̂, *f̂*), then harmonic interpolation (briefly described below) is recommended. The value *f̂* is first truncated to its integer portion (i.e., *f̃* = [*f̂*], where [arg] is the integer portion of “arg”). The percentiles *t*(γ; δ̂, *f̃*) and *t*(γ; δ̂, *f̃* + 1) are then found using an appropriate algorithm. The percentile *t*(γ; δ̂, *f̂*) is then found by harmonic interpolation [i.e., linear interpolation on the reciprocals of the percentiles; see Laubscher (22)].
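The interpolation step can be sketched in a few lines. This sketch takes the bracketed description above literally, interpolating linearly in the fractional part of *f̂* between the reciprocals of the two integer-degree percentiles; the function name is hypothetical, and Laubscher (22) should be consulted for the definitive form. The inputs are the two percentage points and the value of *f̂* quoted in the IL-2R example of this report.

```python
import math


def harmonic_interpolate(f_hat, pct_floor, pct_next):
    """Interpolate a percentile at non-integer degrees of freedom f_hat.

    pct_floor is the percentile at floor(f_hat) degrees of freedom and
    pct_next is the percentile at floor(f_hat) + 1 degrees of freedom.
    The interpolation is linear on the reciprocals of the percentiles.
    """
    weight = f_hat - math.floor(f_hat)  # fractional part of f_hat
    reciprocal = (1.0 - weight) / pct_floor + weight / pct_next
    return 1.0 / reciprocal


# Values quoted in the IL-2R example: t(0.05; 21.64, 80) = 18.74 and
# t(0.05; 21.64, 81) = 18.75, with estimated degrees of freedom 80.64.
value = harmonic_interpolate(80.64, 18.74, 18.75)
print(f"interpolated percentile: {value:.4f}")
```

Because the two bracketing percentiles differ only in the second decimal place here, the interpolated value cannot change the test decision in that example.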

It is important for the user to understand that the test just described is not a usual test of hypothesis. That is, the *t* test is a conditional test based on the observed value of ρ̂, which enters into the estimates of the noncentrality parameter (Eq. 13) and of the degrees of freedom (Eq. 14) for the noncentral *t*-distribution that is used to judge the statistical significance of the test. In contrast, the *t*-tests discussed in the previous sections, where ρ was assumed equal to 1 in case 1 and was assumed known but not necessarily equal to 1 in case 2, are unconditional tests because the parameters of their *t*-distributions (degrees of freedom and noncentrality parameter) are fixed quantities and do not need to be estimated from the data as with the *t* test. Whatever decision is made (reject or fail to reject H_{0}) is a function of the specific value of ρ̂ realized from the data, and there exist values of ρ̂ that will lead to the opposite decision.

A major impact of the conditionality of this test, unlike the standard Student *t*-test, is that the probability of a type I error for this conditional test varies with the values of ρ̂. The value of γ used to find the percentiles described above is not the probability of the type I error for this test; γ is the nominal size of the test (i.e., in name only). The true probability of type I error (size) for this conditional test (i.e., for the specific value of ρ̂ derived from the data) is given by: *Pr* {*t* < *t*(γ; δ̂_{0}, *f̂*) | ρ̂} = *W*, which, of course, involves the conditional distribution of *t* given ρ̂. If this conditional probability is evaluated with the null hypothesis being true (i.e., β = β_{0}), then this is defined to be the conditional size of the *t* test. On the other hand, if this probability is found for a value of β > β_{0} under the alternative hypothesis, then this is defined to be the conditional power of the *t* test.

This conditional probability (the random variable, *W*) has a frequency distribution of its own. The exact nature of this distribution is so complex that its usefulness would be quite limited. However, the moments of this distribution (specifically the mean and standard deviation) can be examined to assess the appropriateness of using such an approximate test as the *t* test. If the mean of *W* (E[*W*]; the expected value of the conditional probability) departs from the nominal size of the test, namely γ, by large deviations, then such an approximate test would be of little usefulness. On the other hand, if E[*W*] remains relatively “close” to the value of γ, then the user can confidently use such an approximate test. Throughout the remainder of this report, E[*W*] will be referred to as either “the expected size of the test” or “the unconditional size of the test” whenever H_{0}: β = β_{0} is true. Under the alternative hypothesis, E[*W*] will be referred to as either “the expected power of the test” or “the unconditional power of the test”.

Of equal importance is the standard deviation of the distribution of *W*. This provides the user with information about how variable the conditional size (or power) is about its mean. One can form “margin of error” type intervals [i.e., mean ± 3 × (standard deviation)] to obtain an interval of “reasonable” values to expect for the conditional probability, *W*.

Thus it is important to compute for the *t* test both the expected unconditional size and power and the standard deviations of the conditional size and power. This could be accomplished using modern simulation methods on a computer. Random samples of sizes *N*_{0} and *N*_{1} are first generated from the appropriate gaussian distributions, and the values of ρ̂, δ̂_{0}, *f̂*, and finally the value of *W*_{1} = *Pr* {*t* < *t*(γ; δ̂_{0}, *f̂*) | ρ̂} are then calculated. If this is repeated many times to produce a random sample of values of *W*, namely *W*_{1}, *W*_{2}, *W*_{3}, … , *W*_{k}, where *k* is large, then reasonably accurate simulated estimates of E[*W*] and the standard deviation of *W* can be obtained. The reader who is concerned with the practical interpretations of the expected unconditional size and power and the standard deviations of the conditional size and power of the *t* test should proceed to the last paragraph of this section.
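A simplified version of such a simulation is sketched below. It is not the authors' program. It estimates only the expected size E[*W*], as the fraction of simulated data sets (generated on the boundary β = β_{0} of the null hypothesis) in which H_{0} is rejected. It does not compute the individual conditional probabilities *W*_{i} or their standard deviation, which would require the conditional distribution of *t* given ρ̂. The sketch assumes a Welch-type statistic with Satterthwaite-type degrees of freedom and a noncentrality estimate of the form (*z*_{α} + *z*_{β}ρ̂)/√(1/*N*_{0} + ρ̂²/*N*_{1}); the function names, the sample sizes, ρ = 1.3, and the integration scheme are illustrative assumptions.

```python
import math
import random
import statistics


def nct_cdf(t, df, nc, n=800):
    """Pr{T <= t} for a noncentral t variable, by midpoint-rule integration
    over S = sqrt(chi2/df), using Pr{T <= t} = E[Phi(t * S - nc)]."""
    upper = 1.0 + 12.0 / math.sqrt(2.0 * df)
    h = upper / n
    log_norm = math.log(2.0) + 0.5 * df * math.log(0.5 * df) - math.lgamma(0.5 * df)
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        density = math.exp(log_norm + (df - 1.0) * math.log(s) - 0.5 * df * s * s)
        total += density * 0.5 * math.erfc(-(t * s - nc) / math.sqrt(2.0))
    return total * h


def rejects_h0(y0, y1, alpha, beta0, gamma):
    """Apply the approximate test to one data set (increasing response curve)."""
    z = statistics.NormalDist().inv_cdf
    n0, n1 = len(y0), len(y1)
    v0, v1 = statistics.variance(y0), statistics.variance(y1)
    rho_sq = v1 / v0
    scale = 1.0 / n0 + rho_sq / n1
    t = (statistics.mean(y1) - statistics.mean(y0)) / math.sqrt(v0 * scale)
    delta = (z(1.0 - alpha) + z(1.0 - beta0) * math.sqrt(rho_sq)) / math.sqrt(scale)
    df = scale ** 2 / (1.0 / (n0 ** 2 * (n0 - 1)) + rho_sq ** 2 / (n1 ** 2 * (n1 - 1)))
    # t < t(gamma; delta, df) is equivalent to CDF(t) < gamma.
    return nct_cdf(t, df, delta) < gamma


def expected_size(n0, n1, rho, alpha=0.01, beta0=0.01, gamma=0.05, reps=1000, seed=1):
    """Monte Carlo estimate of E[W] when beta = beta0 (mu_0 = 0, sigma_0 = 1)."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf
    mu1 = z(1.0 - alpha) + z(1.0 - beta0) * rho  # boundary of the null hypothesis
    hits = 0
    for _ in range(reps):
        y0 = [rng.gauss(0.0, 1.0) for _ in range(n0)]
        y1 = [rng.gauss(mu1, rho) for _ in range(n1)]
        hits += rejects_h0(y0, y1, alpha, beta0, gamma)
    return hits / reps


size = expected_size(44, 44, rho=1.3)
print(f"estimated expected size: {size:.3f} (nominal size 0.05)")
```

Replacing `mu1` with the mean implied by some β > β_{0} would turn the same loop into an estimate of the expected power.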

An analytical method using numerical analysis algorithms has been developed. This method is based on evaluating for various values of β the unconditional size (or power) of rejecting H_{0}, and is given by the following equations:

where the definitions of *Q* and *K* are given in the *Glossary of Abbreviations, Notations, and Definitions*. Note that the first two integrals are over an unbounded interval of values of the variable of integration. This can be stabilized by the probability integral transformation to the interval (0, 1) by letting *p* equal the cumulative probability obtained for the corresponding value of ρ̂ and then defining *dp* = *f*(ρ̂)*d*ρ̂. The third integral then follows.

The derivation of Eq. 15 is very similar to that given in Cochran (19) and Davenport and Webster (20)(21). The actual numerical computation of this integral can be accomplished using the numerical integration routines in any numerical analysis software package such as IMSL (16).

Whether one chooses to use the simulation method or the numerical algorithms method, the sample sizes *N*_{0} and *N*_{1} can be iterated to find the appropriate combination of sample sizes and parameter values that will produce the desired unconditional power of the *t* test at a specific value of β > β_{0} under the alternative hypothesis.

## An Upper Confidence Limit for β

The upper limit of a 100(1 − γ)% one-sided confidence interval for β can be determined by finding the largest value of β in Eq. 9, 11, or 13 such that the *P* value of the test statistic in Eq. 8, 10, or 12, respectively, is equal to γ. If this upper limit is smaller than β_{0}, this would suggest that a lower claimed MDC should be tested.
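For the unequal-variance case, this back-solving can be sketched as follows. The sketch first finds the noncentrality parameter at which the observed *t* is the γth lower percentage point, then inverts a noncentrality expression of the assumed form δ = (*z*_{α} + *z*_{β}ρ̂)/√(1/*N*_{0} + ρ̂²/*N*_{1}) for *z*_{β}. The function names and the numerical integration are assumptions; the inputs are the rounded summary values from the IL-2R example in this report (*t* = 39.73, 80 degrees of freedom, ρ̂² = 1.70, *N*_{0} = *N*_{1} = 44).

```python
import math
from statistics import NormalDist


def nct_cdf(t, df, nc, n=2000):
    """Pr{T <= t} for a noncentral t variable, by midpoint-rule integration
    over S = sqrt(chi2/df), using Pr{T <= t} = E[Phi(t * S - nc)]."""
    upper = 1.0 + 12.0 / math.sqrt(2.0 * df)
    h = upper / n
    log_norm = math.log(2.0) + 0.5 * df * math.log(0.5 * df) - math.lgamma(0.5 * df)
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        density = math.exp(log_norm + (df - 1.0) * math.log(s) - 0.5 * df * s * s)
        total += density * 0.5 * math.erfc(-(t * s - nc) / math.sqrt(2.0))
    return total * h


def noncentrality_for_p_value(t_obs, df, gamma, tol=1e-6):
    """Find delta such that Pr{T <= t_obs} = gamma (the CDF decreases in delta)."""
    lo, hi, step = t_obs, t_obs + 1.0, 1.0
    while nct_cdf(t_obs, df, lo) < gamma:  # make sure lo is on the high-CDF side
        lo -= step
    while nct_cdf(t_obs, df, hi) > gamma:  # widen until the CDF drops below gamma
        step *= 2.0
        hi += step
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nct_cdf(t_obs, df, mid) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def upper_limit_beta(delta, rho_sq, n0, n1, alpha):
    """Invert the assumed noncentrality expression for z_beta, then for beta."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = (delta * math.sqrt(1.0 / n0 + rho_sq / n1) - z_alpha) / math.sqrt(rho_sq)
    return z_beta, NormalDist().cdf(-z_beta)


delta = noncentrality_for_p_value(t_obs=39.73, df=80, gamma=0.01)
z_beta, beta_upper = upper_limit_beta(delta, rho_sq=1.70, n0=44, n1=44, alpha=0.01)
print(f"delta = {delta:.2f}, z_beta = {z_beta:.2f}, upper limit for beta = {beta_upper:.1e}")
```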

## Sample Size Allocation

For the case of homogeneous but unknown variances, the sample sizes should be allocated equally and their value determined by calculating the power of the noncentral *t*-test for various values of β > β_{0} for fixed α substituted into the noncentrality parameter in Eq. 9. Because the case of homogeneous variances is a special case of the known ρ, it follows directly from Eq. 16 below that the equal sample size allocation gives maximum power.

The power of any test involving the noncentral *t*-statistic varies directly with the absolute value of the noncentrality parameter. In the case of known ρ, the power of this test will be maximized when the sample sizes are allocated proportional to the standard deviations because this allocation maximizes the value of the noncentrality parameter. This result is demonstrated in *Appendix 1*. Thus, for known ρ, sample sizes should be allocated according to the following formula:

$$\frac{N_{1}}{N_{0}} = \rho = \frac{\sigma_{1}}{\sigma_{0}} \tag{16}$$

For the case of unknown ρ, it is also recommended that the two sample sizes be allocated proportional to their standard deviations, as given by:

$$\frac{N_{1}}{N_{0}} = \frac{\tilde{S}_{1}}{\tilde{S}_{0}} \tag{17}$$

where *S̃*_{0} and *S̃*_{1} are the sample standard deviations computed from historical data and are the estimates of the population standard deviations σ_{0} and σ_{1}, respectively.

To empirically study the rule in Eq. 17 in the case of unknown ρ, evaluations were made under the following three sample size allocation rules: *(a)* sample sizes proportional to the standard deviations; *(b)* equal sample sizes; and *(c)* sample sizes proportional to the variances. The latter two allocation rules were chosen for comparison with Eq. 17 because they are commonly used. Eq. 15 was evaluated to determine the sample size requirements to achieve a specified power for the following conditions. The results are presented below in Table 1. These sample sizes are those necessary to achieve a power of 0.90 at the alternative value of β = 0.05 > β_{0} = 0.01. Values of α = 0.01 and γ = 0.05 were chosen, and the values of ρ = σ_{1}/σ_{0} were representatively chosen to be 0.33, 0.67, 1.00, 1.50, and 3.00. The information provided in Table 1 indicates that the total sample size was minimized for the first allocation plan (allocating sample sizes proportional to the standard deviations). We have investigated other arrangements of parameters, and all such findings are consistent with these results.
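The allocation rule for known ρ can be checked numerically with a brute-force search. The sketch below assumes that the noncentrality parameter takes the form (*z*_{α} + *z*_{β}ρ)/√(1/*N*_{0} + ρ²/*N*_{1}), which reproduces the noncentrality value reported in the IL-2R example; the function names, the total of 100 replicates, and ρ = 3 are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def noncentrality(n0, n1, rho, alpha, beta):
    """Assumed noncentrality parameter for known rho = sigma_1 / sigma_0."""
    z = NormalDist().inv_cdf
    return (z(1.0 - alpha) + z(1.0 - beta) * rho) / math.sqrt(1.0 / n0 + rho ** 2 / n1)


def best_split(total, rho, alpha=0.01, beta=0.01):
    """Search every split (N0, N1) of a fixed total for the largest noncentrality."""
    candidates = ((n0, total - n0) for n0 in range(2, total - 1))
    return max(candidates, key=lambda pair: noncentrality(pair[0], pair[1], rho, alpha, beta))


n0, n1 = best_split(total=100, rho=3.0)
print(f"best split for rho = 3: N0 = {n0}, N1 = {n1}, ratio N1/N0 = {n1 / n0:.2f}")
```

For a fixed total, the search lands on the split whose ratio *N*_{1}/*N*_{0} equals ρ, and on equal sample sizes when ρ = 1.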

As shown by the computer simulations in a following section, the user is rewarded for information on ρ. To be conservative in determining sample sizes from historical data, the user should use the lower confidence limit of a one-sided 90% or 95% confidence interval on ρ, denoted by LL_{ρ} (illustrated in *IL-2R Assay Data Example* below). As the simulation results show, underestimating ρ with LL_{ρ} will tend to overestimate the required sample sizes. The excess sample sizes beyond what would be required for the actual but unknown ρ could be considered as insurance against overestimating ρ and thus lowering the power. It should be noted that the more historical data that are gathered for estimating ρ, the smaller the difference between ρ and LL_{ρ}.

## An IL-2R Assay Data Example

The T CELL Science IL-2R Bead Assay was used as an example where 75 units/mL was the claimed MDC to be tested. IL-2R is an enzyme immunometric assay for the quantitative determination of IL-2R concentrations in human serum. The assay is used as an aid in monitoring response to treatment in hairy cell leukemia patients in whom increased concentrations of serum IL-2R have been confirmed. For the MDC model, both α and β_{0} were set at 0.01, and the size of the nominal type I error was set to γ = 0.05. The experimental protocol dictated that 0 and 75 units/mL calibrators were to be run in 45 replicates, which could be accommodated in one assay setup. The absorbance signal data (multiplied by 1000) from this experiment are provided in Table A2-1 in *Appendix 2*, in which the sample order reflects the linear and quadratic trend robust sample sequence that was used *(abbabaab … )*. This sample sequence is highly efficient for the detection of linear and quadratic assay drift (23), which would violate the assumption of normality. With only linear drift, the data could be adjusted using this design. However, the presence of a nonnegligible amount of nonlinear drift might invalidate the methodology because the actual functional form of the nonlinear drift often is uncertain.

In regression analysis using commercially available software, the data showed no statistically detectable drift. The normality assumption of the absorbance signal data at the blank analyte concentrations was supported by both the Shapiro-Wilk test (24) (*W* = 0.964; *P* = 0.19) and the Martinez-Iglewicz test (25) (*MI* = 0.994; 5% critical value = 1.162). For the 75 units/mL calibrator, the normality assumption of the absorbance signal was also supported by both the Shapiro-Wilk test (*W* = 0.982; *P* = 0.73) and the Martinez-Iglewicz test (*MI* = 0.971; 5% critical value = 1.162).

The following summary statistics were extracted from the data: *Ȳ*_{0} = 8.41; *Ȳ*_{1} = 43.57; *S*_{0}^{2} = 12.76; *S*_{1}^{2} = 21.69; *N*_{0} = 44; and *N*_{1} = 44, where *Ȳ*_{0}, *S*_{0}^{2}, and *N*_{0} correspond to the sample mean, sample variance, and sample size for the data from the blank calibrator; and *Ȳ*_{1}, *S*_{1}^{2}, and *N*_{1} similarly correspond to the data from the 75 units/mL calibrator.

Because ρ̂^{2} = *S*_{1}^{2}/*S*_{0}^{2} = 1.70 is statistically detectably >1 at *P* = 0.05 (by way of the standard *F*-test for comparing variances), the variances of the two calibrators are judged to be heterogeneous, which justified using the *t* statistic in Eq. 12 to test the claimed MDC of 75 units/mL for the IL-2R assay. Based on Eqs. 12–14, *t* = 39.73; δ̂ = 21.64; and *f̂* = 80.64.
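These three quantities can be recomputed from the summary statistics above. The sketch below assumes a Welch-type statistic, a noncentrality estimate of the form (*z*_{α} + *z*_{β}ρ̂)/√(1/*N*_{0} + ρ̂²/*N*_{1}), and Satterthwaite-type degrees of freedom; the function name is hypothetical. Because it starts from the rounded summary statistics, the results agree with the reported values only to within rounding (for example, about 80.6 for the degrees of freedom).

```python
import math
from statistics import NormalDist


def welch_type_quantities(mean0, mean1, var0, var1, n0, n1, alpha, beta0):
    """Return (t, delta_hat, f_hat) from two-sample summary statistics."""
    z = NormalDist().inv_cdf
    rho_sq = var1 / var0                      # estimated variance ratio
    scale = 1.0 / n0 + rho_sq / n1
    t = (mean1 - mean0) / math.sqrt(var0 * scale)
    delta_hat = (z(1.0 - alpha) + z(1.0 - beta0) * math.sqrt(rho_sq)) / math.sqrt(scale)
    f_hat = scale ** 2 / (
        1.0 / (n0 ** 2 * (n0 - 1)) + rho_sq ** 2 / (n1 ** 2 * (n1 - 1))
    )
    return t, delta_hat, f_hat


# Summary statistics of the IL-2R example (absorbance x 1000).
t, delta_hat, f_hat = welch_type_quantities(
    mean0=8.41, mean1=43.57, var0=12.76, var1=21.69, n0=44, n1=44, alpha=0.01, beta0=0.01
)
print(f"t = {t:.2f}, delta_hat = {delta_hat:.2f}, f_hat = {f_hat:.2f}")
```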

Using the integer degrees of freedom 80 and 81, we can calculate the percentage points for the noncentral *t*-distribution with the NCSS2000 Probability Calculator^{TM} (15); they are, respectively, *t*(0.05; 21.64, 80) = 18.74 and *t*(0.05; 21.64, 81) = 18.75. Because *t* = 39.73 is certainly not smaller than these values, one can conclude that the null hypothesis, H_{0}: β ≤ β_{0} = 0.01, is not rejected. Equivalently, the *P* value for this observed value is >0.9999999981.

The following procedure shows how to calculate the upper 99% confidence limit for β. Requiring the value of *t* = 39.73 to be the 0.01 lower percentage point from a noncentral *t*-distribution with 80 degrees of freedom, we determined, using the software NCSS2000 Probability Calculator, that the noncentrality parameter must have the value 47.425. With 81 degrees of freedom, this value is 47.381. Setting the value of δ̂ = 47.4 in Eq. 13, one can back-solve for *z*_{β} and find *z*_{β} = 7.22, which corresponds to a β value, or rather an upper 99% confidence limit for β, that is essentially zero. Because the upper 99% confidence limit for β is less than β_{0} = 0.01, this suggests that the actual MDC is lower than 75 units/mL.

To illustrate how to perform sample size calculations, suppose a second experiment for the IL-2R assay was planned to test 50 units/mL as the claimed MDC. To determine the sample sizes needed, all that is required are values of α, β_{0}, γ, and a value of β > β_{0}. The resulting unconditional expected size and unconditional expected power are, of course, functions of ρ, *N*_{0}, and *N*_{1}. For this illustration, it will be assumed that the experimentally estimated value of ρ̂ = 1.30 from testing a calibrator of 75 units/mL would be applicable for testing 50 units/mL as the claimed MDC. To be conservative in the sample size calculations, a 95% lower confidence limit for ρ^{2} is computed as ρ̂^{2}/*F*_{(1−γ; *N*_{1}−1, *N*_{0}−1)} = ρ̂^{2}/*F*_{(0.95; 43, 43)} = 1.02, where *F*_{(0.95; 43, 43)} = 1.66 is the 95th percentile of the *F*-distribution with 43 numerator and 43 denominator degrees of freedom. Taking the square root of this produces the lower limit for ρ, LL_{ρ} = 1.01, with which the unconditional expected sizes and power of the proposed *t* test are calculated using Eq. 15 and the methodology described in *Appendix 1*. The sample sizes needed to achieve the desired power are, respectively, *N*_{0} = 122 and *N*_{1} = 124. These results are presented in Table A2-2 in *Appendix 2*. At the assumed value of ρ = 1.01 and β = β_{0} = 0.01, the expected size is 0.0476, with a standard deviation of the conditional size equal to 0.0032. To be conservative, the user can use three standard deviations [or 3 × (0.0032) = 0.0096] as a “margin of error”; thus, whenever the true value of ρ equals 1.01, the conditional size of the test can roughly be expected to be contained in the interval 0.0476 ± 0.0096 (or from 0.0380 to 0.0572). One must be careful with this interpretation because the distributions of conditional size (or power) are not symmetric, but this does give a rough idea of what to expect.
At the same value of ρ = 1.01 and at β = 0.05, the expected power is equal to 0.9010 with a standard deviation of the conditional power equal to 0.0059. Thus, the conditional power of the *t* test at β = 0.05 and ρ = 1.01 can roughly be expected to be contained in the interval 0.9010 ± 0.0177 (or from 0.8833 to 0.9187).
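The arithmetic in this paragraph can be reproduced with a short sketch. The *F* percentile (1.66) and the expected values and standard deviations are taken from the text as given, not recomputed; the function names are hypothetical.

```python
from math import sqrt


def lower_limit_rho(s0_sq, s1_sq, f_quantile):
    """Lower confidence limit for rho: sqrt of (S1^2 / S0^2) / F percentile."""
    return sqrt((s1_sq / s0_sq) / f_quantile)


def three_sd_interval(expected, sd):
    """Rough 'margin of error' interval: expected value +/- 3 standard deviations."""
    margin = 3.0 * sd
    return expected - margin, expected + margin


# F(0.95; 43, 43) = 1.66 is the percentile quoted in the text.
ll_rho = lower_limit_rho(s0_sq=12.76, s1_sq=21.69, f_quantile=1.66)

# Expected size at beta = beta0 = 0.01 and expected power at beta = 0.05.
size_lo, size_hi = three_sd_interval(0.0476, 0.0032)
power_lo, power_hi = three_sd_interval(0.9010, 0.0059)

print(round(ll_rho, 2))
print(round(size_lo, 4), round(size_hi, 4))
print(round(power_lo, 4), round(power_hi, 4))
```

As the text cautions, these intervals are only rough guides because the distributions of conditional size and power are not symmetric.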

## Sensitivity Analysis of the *t* Test for the MDC Model with Respect to the Nuisance Parameter, ρ

The reader who is interested only in the practical conclusions from the computer simulation studies should skip to the next section.

Computer simulations were performed to study the sensitivity of the sample size calculations to ρ. Several parameters are involved in dealing with this research problem. The parameters α and β_{0} are probabilities of errors in the MDC model that were discussed above. The values of these are specified by the user as a part of the MDC model and are typically small. In this report, we have used 0.01 as the value of the probabilities of both of these two errors, but other values were investigated, and similar results were obtained. The values of β > β_{0} under the alternative hypothesis should also remain small. The “target” value of β reflects the deviation from the hypothesized MDC model at which the user requires a sufficiently high probability (≥0.90) of being able to detect that deviation. Because, for most practical problems, these constraints on the power function require sample sizes that are at least moderately large (which is borne out by our experience), this research was limited to moderate to large sample sizes. Thus, the small sample size case is beyond the scope of this report.

For these simulation runs, we chose an expected power of 0.90 at the value of β = 0.05 under the alternative hypothesis. Because the expected power is a function of the unknown parameter, ρ, one must specify the value of ρ at which this expected power is achieved. We chose 0.33, 0.67, 1.00, 1.50, and 3.00, which in our experience cover the range of what is typically observed for the ratios of the signal distribution standard deviations to the blank distribution standard deviations in the low assay range of the MDC for most in vitro medical diagnostic devices. In fact, we have never encountered a value of ρ <1. Finally, the nominal size of the type I error of the null hypothesis was chosen to be γ = 0.05. We used other values for γ in other simulations and observed similar results.

Five tables that illustrate the behavior of the *t* test are presented in Appendix 3, which is available at *Clinical Chemistry Online*, where it can be accessed through the September Table of Contents (http://www.clinchem.org/content/vol46/issue9); Table A2-2 in *Appendix 2* is similar, but for the IL-2R example. These five tables correspond to the five “targeted” values of ρ and proceed from very large sample size requirements for the value of ρ = 0.33 to somewhat more moderate sample size requirements for the value of ρ = 3.00.

In all simulation cases considered, the expected size of the *t* test varied as a function of ρ but was also bounded from above and below (i.e., the values of the expected size stayed very close to the value of γ, the nominal size of the type I error). In fact, in every case the expected size was below the value of the nominal size of the type I error and never fell below 0.017. The value of the expected size was at its maximum when the value of the nuisance parameter ρ was at the target value and hence was closest to the nominal value of the type I error γ. The expected size was generally at its minimum value when the value of ρ deviated from the target value. Although it is not immediately obvious from the limited results given in *Appendices 2* and *3*, it was also observed from more extensive simulations that the expected size of the test asymptotically approaches the value of γ as ρ approaches zero and as ρ increases without bound [a property of such tests noted previously by Davenport and Webster (20)(21)].

The standard deviations of the conditional size of the *t* test also varied as a function of ρ, and were at a minimum when the value of ρ was at the target value (never larger than 0.005). The value of the standard deviations of the conditional size increased as ρ departed from the targeted value in either direction. Hence, the empirical evidence seen from these simulation studies indicates that the user can reasonably expect the conditional size of the *t* test, under restrictions on the parameters similar to those discussed above, to be close to the nominal size of the test.

The expected power of the test was certainly an increasing function of the parameter β > β_{0}, but it also seemed to be an increasing function of ρ. As is illustrated in *Appendices 2* and *3*, the expected power of the test was monotonically increasing above 0.90 (the specified power requirement) as ρ increased from the targeted value of ρ, and was monotonically decreasing below 0.90 as ρ decreased from the targeted value. As explained in the next section, the user can use this knowledge to her or his benefit. The standard deviation of the conditional power was at a minimum whenever the value of the nuisance parameter ρ was at the targeted value. As expected, the standard deviation was larger for values of the expected power in the range of 0.20–0.80 than when the power was 0.90, but as the expected power moved closer to 1, the standard deviation of the conditional power decreased dramatically. These simulations provide further evidence that the user should base sample size calculations on a good historical estimate of ρ.

## Recommendations

To calculate the required sample sizes needed to achieve the desired power for the *t* test, the user must have a good prior estimate of ρ. To take advantage of the behavior of this test as discussed above, the user should always conservatively underestimate the true value of ρ. We recommend that the user always work with a 90% or 95% lower confidence limit as this conservative estimate. Thus, the more historical data that enter into the estimate of ρ, the less conservative the user will have to be in the ultimate choice of required sample sizes.

If the user always calculates the sample size requirements using an expected power of at least 0.90 (to detect small deviations from β_{0}, e.g., ≤0.05), then he or she should not have to be concerned about the small sample behavior of the *t* test. Moderate to large sample sizes should be necessary for most applications to achieve this expected power.

The two sample sizes, *N*_{0} and *N*_{1}, should be allocated proportional to the estimated value of ρ. The simulation studies that we performed showed that the power function generally has less variability with this rule. Two other sample size allocation rules were considered (namely, equal allocation and allocation proportional to the variances), but they produced greater variability in the standard deviations of the conditional power and required larger overall sample sizes; hence they are not recommended and are not discussed further.
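A minimal sketch of this allocation rule, assuming the total sample size *N* has already been chosen from the power calculations; the function name `allocate` and the rounding choice are illustrative and not taken from the article.

```python
def allocate(total_n, rho):
    """Split total_n so that N1/N0 is approximately rho (N0 blank, N1 low-dose)."""
    n0 = round(total_n / (1.0 + rho))  # N0 = N / (1 + rho), rounded to an integer
    n1 = total_n - n0                  # remainder goes to the low-dose calibrator
    return n0, n1


# IL-2R planning example: N = 122 + 124 = 246 with the conservative rho = 1.01.
n0, n1 = allocate(246, 1.01)
print(n0, n1)  # 122 124
```

With *N* = 246 and ρ = 1.01, the rule returns the sample sizes *N*_{0} = 122 and *N*_{1} = 124 reported for the IL-2R example.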

## Glossary of Abbreviations, Notations, and Definitions

μ_{0} = mean of the signal distribution of the device when the blank analyte sample (zero dose) is used.

σ_{0}^{2} = variance of the signal distribution of the device when the blank analyte sample (zero dose) is used.

μ_{1} = mean of the signal distribution of the device when a low-dose calibrator is prepared at the claimed MDC.

σ_{1}^{2} = variance of the signal distribution of the device when a low-dose calibrator is prepared at the claimed MDC.

ρ^{2} = σ_{1}^{2}/σ_{0}^{2} = ratio of the variance of the low-dose signal distribution to the variance of the blank signal distribution.

μ_{MDC} = mean of the signal distribution at the true minimal detection concentration of the device.

*z*_{α} = (1 − α)(100)th percentile of the standard gaussian distribution (0 < α < 1).

*z*_{β} = (1 − β)(100)th percentile of the standard gaussian distribution (0 < β < 1).

*Y*_{x} = signal response of the device at analyte concentration *x*.

SDL = signal response above (below) which for a monotonically increasing (decreasing) calibration curve, an observed signal is judged detectable.

*Ȳ*_{0} = sample mean of the simple random sample of size *N*_{0}, from the zero-dose distribution, *N*(μ_{0}, σ_{0}^{2}).

*S̃*_{0} = value used as an estimate of the standard deviation based on historical data from the zero-dose distribution, *N*(μ_{0}, σ_{0}^{2}); does not include the data from the current experiment.

*S*_{0} = sample standard deviation of the simple random sample of size *N*_{0} from the zero-dose distribution, *N*(μ_{0}, σ_{0}^{2}) for the current experiment.

*Ȳ*_{1} = sample mean of the simple random sample of size *N*_{1} from the low dose distribution, *N*(μ_{1}, σ_{1}^{2}).

*S̃*_{1} = value used as an estimate of the standard deviation based on historical data from the low-dose distribution, *N*(μ_{1}, σ_{1}^{2}); does not include the data from the current experiment.

*S*_{1} = sample standard deviation of the simple random sample of size *N*_{1} from the low-dose standard distribution, *N*(μ_{1}, σ_{1}^{2}) for the current experiment.

ρ̂ = *S*_{1}/*S*_{0} = ratio of the estimates of the sample standard deviations based on the current experiment.

ρ̃ = S̃_{1}/S̃_{0} = ratio of the values of the standard deviations based on historical data; does not include the data from the current experiment.

*t*(γ; δ, *f*) = lower γth percentage point of a noncentral *t*-distribution; i.e., if the random variable *T* has a noncentral *t*-distribution with noncentrality parameter δ and degrees of freedom *f*, then *t*(γ; δ, *f*) is that percentage point such that P[*T* < *t*(γ; δ, *f*)] = γ.

has a noncentral *t*-distribution with degrees of freedom given by *f* = *f*_{0} + *f*_{1} and noncentrality parameter: where the *t* Welch-type approximate *t*-statistic given in Eq. 12 can be shown to be:

### Derivation of Optimal Sample Size Allocation for Known ρ

In the case of known ρ and a monotonically increasing calibration curve, the noncentrality parameter is given by:

where *N* = *N*_{0} + *N*_{1} (the total sample size in both the blank sample and the calibrator prepared at the claimed MDC). Without loss of generality, only the monotonically increasing case needs to be considered to derive the optimal sample size allocation. An optimal sample size for this situation is given by finding the value of *N*_{0} that maximizes δ (and thus maximizes the power) for a fixed value of *N*.

By taking the first derivative of δ with respect to *N*_{0} and setting the result equal to 0, the following equation is obtained: which yields the result:

The resulting allocation formula maximizes δ for fixed *N* because the second derivative of δ with respect to *N*_{0} can be shown to be negative whenever *z*_{α} + *z*_{β}ρ > 0. By definition of the MDC model, this latter inequality will always be true for monotonically increasing calibration curves. For the monotonically decreasing case, the allocation formula is the same, and the derivation is similar.

If ρ̃ from historical estimates converges almost surely to ρ as both *N*_{0} and *N*_{1} become large, then the following allocation rule can be considered as asymptotically optimal; namely:

Thus, allocating the sample sizes proportional to the standard deviations is an asymptotically optimal allocation rule, where *S̃*_{1} and *S̃*_{0} are the historical sample standard deviations and are the estimates of the population standard deviations σ_{1} and σ_{0}, respectively.
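The steps of this derivation can be sketched as follows. The form of the noncentrality parameter in the first line is an assumption made for this sketch, not a transcription of the article's displayed equations; it is consistent with the values δ̂ = 21.64 and *z*_{β} = 7.22 quoted for the IL-2R example.

```latex
% Assumed noncentrality parameter for known \rho, with N_1 = N - N_0:
\delta(N_0) = \frac{z_\alpha + z_\beta \rho}{\sqrt{\dfrac{1}{N_0} + \dfrac{\rho^2}{N - N_0}}}

% Let g(N_0) = 1/N_0 + \rho^2/(N - N_0). Then
\frac{d\delta}{dN_0}
  = -\tfrac{1}{2}\,(z_\alpha + z_\beta \rho)\, g(N_0)^{-3/2}
    \left[ -\frac{1}{N_0^2} + \frac{\rho^2}{(N - N_0)^2} \right] = 0

% Because z_\alpha + z_\beta \rho > 0, the bracket must vanish:
(N - N_0)^2 = \rho^2 N_0^2
  \quad\Longrightarrow\quad
  N_0 = \frac{N}{1 + \rho}, \qquad
  N_1 = \frac{\rho N}{1 + \rho}, \qquad
  \frac{N_1}{N_0} = \rho

% Asymptotic rule with historical estimates in place of \rho:
\frac{N_1}{N_0} = \tilde{\rho} = \frac{\tilde{S}_1}{\tilde{S}_0}
```

Under this assumed form, maximizing δ for fixed *N* amounts to minimizing 1/*N*_{0} + ρ^{2}/(*N* − *N*_{0}), which is convex in *N*_{0}; this agrees with the second-derivative condition stated above.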

### IL-2R Example

## Footnotes

1 Nonstandard abbreviations: MDC, minimal detectable concentration; SDL, signal detection limit; and IL-2R, interleukin-2 receptor.

^{a} Value missing.

^{a} Sample sizes are *N*_{0} = 122 and *N*_{1} = 124, chosen such that the ratio (*N*_{1}/*N*_{0}) of the sample sizes is equal to the estimated value of ρ = LL_{ρ} = 1.01 and the power of 0.90 is achieved at β = 0.05. α = 0.01; γ = 0.05.

- © 2000 The American Association for Clinical Chemistry