## Abstract

**BACKGROUND:** Reliable estimates of within-person biological variation and reference change value are of great importance when interpreting test results, monitoring patients, and setting quality specifications. Little information has been published regarding what experimental design is optimal to achieve the best estimates of within-person biological variation.

**METHODS:** Expected CIs were calculated for different balanced designs for a 2-level nested variance analysis model with varying analytical imprecision. We also simulated data sets based on the model to calculate the power of different study designs for detection of within-person biological variation.

**RESULTS:** The reliability of an estimate of biological variation and the power of a study are strongly influenced by the study design and by the ratio between analytical imprecision and within-person biological variation. For a fixed number of measurements, it is preferable to have a high number of samples from each individual. Shortcomings in analytical imprecision can be compensated for by increasing the number of replicates.

**CONCLUSIONS:** The design of an experiment to estimate biological variation should take into account the analytical imprecision of the method and focus on obtaining the highest possible reliability. Estimates of biological variation should always be reported with CIs.

Estimates of biological variation are of great importance in biology and medicine. They are used for interpreting test results, monitoring patients, and setting diagnostic limits as well as for setting analytical quality specifications (1). However, it is not uncommon for different sources to report conflicting estimates of within-person biological variation (SD_{WP})^{4} for the same constituents, for example, glycohemoglobin (2) and prostate-specific antigen (3).

It is noteworthy that an estimate of the uncertainty of the SD_{WP} is seldom given (4, 5), although this information is usually required for other types of estimates (6–8). Use of the SD_{WP} in a sensible way requires reliable estimates. The CI is a good indicator of the reliability of an estimator and should always be included.

Studies on SD_{WP} are typically based on a group of individuals who are considered to be in a steady-state condition with regard to the analyte of interest. It is important to consider the number of individuals as well as the number of samples, because the estimated SD_{WP} and its uncertainty are highly dependent on these choices. Typically, a series of samples is collected from each individual at regular intervals, stored appropriately, and analyzed in series in duplicate (1). Calculations of the variances of interest are usually performed with a 2-level nested random-effects ANOVA (9). This method gives the between-person (interindividual) variation (SD_{BP}), the within-person (intraindividual) variation (SD_{WP}), and the analytical imprecision (SD_{A}) (9). A presupposition for using this ANOVA model for hypothesis testing and calculating CIs is that the model terms are normally and independently distributed with constant variances (10, 11). Furthermore, most biological data are approximately log-normally distributed, so the calculations are often performed on the natural logarithms of the observations to bring the data distribution closer to gaussian (9).

The variances are most often presented as the CV. However, very few papers and databases on SD_{WP} present CIs for the estimated values of the components (4, 5, 12, 13), but there are some exceptions (14, 15). It is therefore difficult to compare the estimates from different publications. The formulas for calculating CI for the estimated components are not commonly described in statistical books or available in statistical software, but these formulas can be found in Burdick and Graybill (16).

Investigations on biological variation have typically been performed with 10–20 individuals (9) and up to 10 samples from each individual, or with up to 100 individuals and only 2 samples from each, although experiments have been described that have used up to 274 individuals and 6 samples from each (17). Fraser and Harris assert that “the components of variation can be obtained from a relatively small number of specimens collected from a small group of subjects over a reasonably short period of time,” but do not provide any evidence or explanation for this.

The aim of this investigation was to assess how changes in study design concerning the number of replicates, the number of samples, and/or the number of individuals, as well as changes in the SD_{A} of the instruments used, influence the width of the CI of the estimated SD_{WP} and thereby the reliability of the estimate. In addition, we introduce the concept of power in estimating the SD_{WP}. Information about CI and power can be implemented in the study design and also used to evaluate the reliability of estimates of SD_{WP} from published reports.

## Methods and Statistics

### MODEL

The following model for a nested 2-level balanced random design describes the typical model used in analysis of biological variation and the one used for calculations and simulations in this paper (16):

*Y*_{isr} = μ + *B*_{i} + *W*_{is} + *A*_{isr},  *i* = 1, …, *I*; *s* = 1, …, *S*; *r* = 1, …, *R*  (Eq. 1)

where:

*B*_{i}, *W*_{is}, and *A*_{isr} are mutually independent normal random variables with means of 0 and SDs SD_{BP}, SD_{WP}, and SD_{A}, respectively;

*I* is the number of individuals;

*S* is the number of samples from each individual; and

*R* is the number of replicate measurements of each sample.

The simulated data values used for this model are by definition independent and normally distributed and have homogeneous variance, which is an assumption for calculating CIs (11). The design is also balanced, meaning that there is the same number of samples from each individual and the same number of replicates from each sample. Fig. 1 illustrates how the SD_{A}, SD_{WP}, and SD_{BP} are connected.
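As a concrete illustration of this decomposition, the following sketch estimates SD_{BP}, SD_{WP}, and SD_{A} from a balanced data array by equating observed and expected mean squares. This is written in Python rather than the SPSS used by the authors, and the function and variable names are our own:

```python
import numpy as np

def nested_anova_components(y):
    """Estimate (SD_BP, SD_WP, SD_A) from a balanced 2-level nested design.

    y has shape (I, S, R): I individuals, S samples each, R replicates per
    sample. The variance components follow from equating the observed mean
    squares to their expectations under the nested random-effects model.
    """
    I, S, R = y.shape
    grand = y.mean()
    ind_means = y.mean(axis=(1, 2))   # mean per individual
    samp_means = y.mean(axis=2)       # mean per sample

    # Mean squares for the three nested levels
    ms_bp = S * R * np.sum((ind_means - grand) ** 2) / (I - 1)
    ms_wp = R * np.sum((samp_means - ind_means[:, None]) ** 2) / (I * (S - 1))
    ms_a = np.sum((y - samp_means[:, :, None]) ** 2) / (I * S * (R - 1))

    # Solve the expected-mean-square equations for the variances
    var_a = ms_a
    var_wp = max((ms_wp - ms_a) / R, 0.0)
    var_bp = max((ms_bp - ms_wp) / (S * R), 0.0)
    return np.sqrt(var_bp), np.sqrt(var_wp), np.sqrt(var_a)
```

Simulating data from the model with known components and checking that they are recovered is a quick sanity check of such an implementation.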

### CALCULATIONS AND SIMULATIONS

To assess the effects of changing the SD_{A}, number of replicates, number of samples, and/or number of individuals on the uncertainty of the estimated SD_{WP}, we calculated the expected CI_{95} (95% CIs) for the SD_{WP} for the different designs and with varying SD_{A}. To evaluate the power, we simulated 10 000 data sets for each combination of design and SD_{A}.

In all the calculations and simulations, the input grand mean, μ, was set to 100, so the SDs were equal to the CV%. Formulas for calculating the CIs (16) were implemented in SPSS (IBM Software).

For the analysis of the widths of the CI_{95} we did not use simulations; instead we calculated the expected intervals from the formulas described in (16), replacing the mean squares (MS) with the expected mean squares (EMS), based on these formulas:

EMS_{A} = SD_{A}^{2}

EMS_{WP} = SD_{A}^{2} + *R* · SD_{WP}^{2}

EMS_{BP} = SD_{A}^{2} + *R* · SD_{WP}^{2} + *S* · *R* · SD_{BP}^{2}

The width of the CI_{95} in percentage of the SD_{WP} was calculated as the width of the central 95% interval divided by the SD_{WP}. For example, a width of 50% when the SD_{WP} is 5 corresponds to an absolute width of 2.5. The width of the CI in percentage of the SD_{WP} and the power of the design are independent of the size of the SD_{WP}.
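The exact interval formulas of Burdick and Graybill (16) are lengthy; as a rough stand-in, a Satterthwaite-type approximation for the CI of the SD_{WP} can be sketched as follows. This approximation is introduced here purely for illustration (it is not the method used in the paper), and the function name is our own:

```python
import numpy as np
from scipy import stats

def approx_ci_sd_wp(ms_wp, ms_a, I, S, R, alpha=0.05):
    """Satterthwaite-style approximate CI for SD_WP from observed mean squares.

    sigma^2_WP is estimated as (MS_WP - MS_A) / R; its approximate degrees of
    freedom come from Satterthwaite's formula for a linear combination of mean
    squares. A simpler stand-in for the exact intervals in Burdick & Graybill.
    """
    df_wp = I * (S - 1)        # df of the samples-within-individuals MS
    df_a = I * S * (R - 1)     # df of the replicate (analytical) MS
    var_wp = (ms_wp - ms_a) / R
    if var_wp <= 0:
        return 0.0, 0.0        # negative component estimate: no usable interval
    # Satterthwaite approximate df for the component estimate
    m = var_wp ** 2 / ((ms_wp / R) ** 2 / df_wp + (ms_a / R) ** 2 / df_a)
    lo = m * var_wp / stats.chi2.ppf(1 - alpha / 2, m)
    hi = m * var_wp / stats.chi2.ppf(alpha / 2, m)
    return np.sqrt(lo), np.sqrt(hi)
```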

The skewness of the CI_{95} for the SD_{WP} was evaluated by calculating the ratio between the lower part of the interval (the distance between the input SD_{WP} and the lower limit of the CI_{95}) and the upper part of the interval (the distance between the input SD_{WP} and the upper limit of the CI_{95}).

We chose to define the power of an ANOVA model designed for estimating the SD_{WP} as the probability of detecting an SD_{WP} different from 0. Statistically, the test is based on the ratio between the mean squares for the SD_{WP} and the mean squares for the SD_{A}. This ratio is F distributed, and we reject the hypothesis H_{0} that SD_{WP} equals 0 at a significance level of α (1-sided test) if (16):

MS_{WP}/MS_{A} > F_{1−α}[*I*(*S* − 1), *IS*(*R* − 1)]

The power equals 1 − β, where β is the probability of a type II error, i.e., that there is an SD_{WP} but the model fails to distinguish it from 0. We used the simulated data sets and calculated the corresponding MS for different designs. We then counted the times we rejected H_{0} and divided by the total number of simulations. This ratio estimates 1 − β.

The data used in the analysis of power for the SD_{WP} were simulated by a Monte Carlo method with normally distributed values, having a set mean and variances for the required number of individuals, samples, and replicates according to the model (Eq. 1). Increasing the number of simulations beyond 10 000 for each case did not significantly change the results, and therefore 10 000 simulations were used.
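A minimal Monte Carlo power estimate along these lines might look like the following Python sketch. The parameter names, the default SD_{BP}, and the (smaller) default number of simulations are our own choices:

```python
import numpy as np
from scipy import stats

def power_sd_wp(I, S, R, sd_wp, sd_a, sd_bp=10.0, alpha=0.05,
                n_sim=2000, seed=1):
    """Monte Carlo power: fraction of simulated studies rejecting H0: SD_WP = 0.

    Each simulated study draws data from the nested 2-level model and applies
    the one-sided F test MS_WP / MS_A > F(1 - alpha; I(S-1), IS(R-1)).
    """
    rng = np.random.default_rng(seed)
    f_crit = stats.f.ppf(1 - alpha, I * (S - 1), I * S * (R - 1))
    rejections = 0
    for _ in range(n_sim):
        # Grand mean 100 so that SDs equal CV%, as in the paper
        y = (100.0
             + rng.normal(0, sd_bp, (I, 1, 1))
             + rng.normal(0, sd_wp, (I, S, 1))
             + rng.normal(0, sd_a, (I, S, R)))
        samp_means = y.mean(axis=2)
        ind_means = y.mean(axis=(1, 2))
        ms_wp = R * np.sum((samp_means - ind_means[:, None]) ** 2) / (I * (S - 1))
        ms_a = np.sum((y - samp_means[:, :, None]) ** 2) / (I * S * (R - 1))
        rejections += bool(ms_wp / ms_a > f_crit)
    return rejections / n_sim
```

As the paper reports, with a ratio of SD_{A} to SD_{WP} below 1 such a simulation gives power close to 1 for most designs, while under H_{0} the rejection rate stays near α.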

The reference change value (RCV) in % is defined as (18):

RCV = 2^{1/2} · *Z* · (SD_{A}^{2} + SD_{WP}^{2})^{1/2}  (Eq. 4)

where *Z* is the number of SDs appropriate for the chosen probability. The RCV defines circumstances under which a difference between 2 consecutive results can be explained by a given SD_{A} and SD_{WP} with a certain probability. The width of the CI in percentage of the RCV is calculated as the central 95% interval divided by the RCV (Eq. 4). The width of the CI in percentage of the RCV is independent of the chosen *Z* and of the sizes of the SD_{A} and SD_{WP}; it depends only on the ratio between the SD_{A} and SD_{WP}.
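Assuming the usual form RCV = √2 · *Z* · √(CV_{A}² + CV_{WP}²) (18), the computation is a one-liner (Python sketch, function name our own):

```python
import math

def rcv_percent(cv_a, cv_wp, z=1.96):
    """Reference change value in %: RCV = sqrt(2) * Z * sqrt(CV_A^2 + CV_WP^2).

    z = 1.96 gives the bidirectional 95% RCV; z = 1.64 the unidirectional one.
    """
    return math.sqrt(2.0) * z * math.hypot(cv_a, cv_wp)
```

For example, with CV_{A} = 3% and CV_{WP} = 4%, `rcv_percent(3, 4)` gives about 13.9 and `rcv_percent(3, 4, z=1.64)` about 11.6, matching the worked example in the Discussion.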

### SOFTWARE

The calculations and simulations were performed in IBM® SPSS® Statistics Version 19, release 19.0.0 for Mac, and the tables and figures were made in Microsoft® Excel® 2011 for Mac.

## Results

### WIDTH OF CI

The width of the CI for the estimated SD_{WP} changes with the experimental design and with the ratio between SD_{A} and SD_{WP} (Table 1). When the SD_{A} is low compared with the SD_{WP}, increasing the number of replicate measurements has little effect on the CI of the estimated SD_{WP}. When the ratio is 0.25, there is no change in the CI_{95} with the use of more than 2 replicates, but with a higher SD_{A} the width of the CI_{95} can be reduced considerably by using 4 rather than 2 replicate measurements. For a design with 20 individuals, 6 samples, and a ratio of 2 between SD_{A} and SD_{WP}, the width of the CI_{95} decreases from 118% to 59% when 4 replicate measurements are used instead of 2. Also, with a given number of individuals, the width of the CI_{95} can be decreased to approximately one-third by increasing the number of samples from 2 to 10. Most of the improvement is gained by increasing the number of samples from 2 to 4; when this is done, the width of the CI_{95} is nearly halved. There is little further reduction in the width of the CI_{95} when the number of samples is increased from 8 to 10 (Table 1).

In most situations there is greater benefit from increasing the number of samples than from increasing the number of replicates, except when both the number of individuals and the ratio between SD_{A} and SD_{WP} are high (Table 1).

In Table 2 the skewness values of the CI_{95} for the SD_{WP} are shown for the same designs as in Table 1. The intervals are typically not symmetrical around the point estimator. The results are independent of the SD_{WP} and depend only on the variables given in Table 2.

The width of the CI_{95} for the RCV for the same designs as in Table 1 is shown in Supplemental Table 1 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol58/issue9. Changes in the ratio of SD_{A} to SD_{WP}, as well as the numbers of replicates, samples, and individuals, have a similar effect on the CI of the RCV as on the CI of the SD_{WP} (Table 1). The CI_{95} for the RCV (see online Supplemental Table 1) can be made relatively small with a large number of individuals and samples. In online Supplemental Table 2 the skewness of the CI_{95} for the RCV is shown.

The width and skewness of the CI_{95} for the estimate of the SD_{WP} are not affected by the ratio between the SD_{BP} and the SD_{WP} (results not shown).

### POWER

The power of several experimental designs with different ratios between the SD_{A} and the SD_{WP} estimated from 10 000 simulated data sets for each design is shown in Table 3. The results show that a low SD_{A} in comparison to the SD_{WP} is vital to gain sufficient power in the experiment. With a high ratio of SD_{A} to SD_{WP} one will never achieve adequate power, usually defined as power above 80% (10), even with many individuals and samples; however, more replicates will increase the power. When the ratio between SD_{A} and SD_{WP} is below 1, the power will be 1.00 for most designs (results not shown).

The ratio between the SD_{BP} and the SD_{WP} does not affect the power of the estimate on the SD_{WP} (results not shown).

## Discussion

The main conclusions we formulated on the basis of the present study were that the width of the CI of the SD_{WP} and RCV, as well as the power for the SD_{WP}, varies with the number of individuals, the number of samples from each individual, and the number of replicates. In addition the higher the SD_{A} is in relation to SD_{WP}, the wider the CI will be. The CI is skewed, and the lower part of the CI is usually smaller than the upper.

To get an idea of the size of the CI, the tables in the present paper can be used. These calculations assume balanced designs, independence of data points, homogeneity of variance, and normally distributed values (11). For example, if the CV_{WP} is 4%, the number of individuals in the study is 15, the number of samples is 10, the number of replicates is 2, and the CV_{A} for this constituent and the method used is 3%, the power of such a study (that is, the probability of detecting a CV_{WP} different from 0) is 1 (Table 3). The width of the CI_{95} will be 32% (of 4%), i.e., the width is 1.3 (Table 1). The ratio between the lower and the upper part of the CI is 0.8 (Table 2), and the CI for 4% will therefore be from 3.4% to 4.7%. The bidirectional 95% RCV (*Z* = 1.96 in Eq. 4) calculates to 13.9, and the width of the central 95% CI is 20% (see online Supplemental Table 1), which calculates to a width of 2.8. The CI is right skewed, with a ratio of 0.8 between the lower and upper part (see online Supplemental Table 2), so the estimated interval will be from 12.7 to 15.4. The unidirectional 95% RCV (*Z* = 1.64 in Eq. 4) calculates to 11.6, and with the same width and skewness as the 2-sided RCV, the estimated central 95% CI will be from 10.6 to 12.9. With the same total number of measurements, the width of the CI for the CV_{WP} is smaller with a high number of samples from each individual than with few samples from a higher number of individuals (Table 1).
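The interval arithmetic in this worked example (recovering a CI from its relative width and its lower/upper skewness ratio) can be checked with a small helper. This is a hypothetical Python function of our own, not something from the paper:

```python
def ci_from_width_and_skew(point, width_frac, skew_ratio):
    """Recover (lower, upper) CI limits from summary quantities.

    point: the point estimate (e.g., CV_WP or RCV);
    width_frac: CI width as a fraction of the point estimate;
    skew_ratio: (point - lower) / (upper - point), the lower/upper ratio.
    """
    total = width_frac * point             # absolute width of the interval
    upper_part = total / (1.0 + skew_ratio)
    lower_part = total - upper_part        # = total * skew_ratio / (1 + ratio)
    return point - lower_part, point + upper_part
```

With the example's numbers, `ci_from_width_and_skew(4.0, 0.32, 0.8)` reproduces the 3.4%–4.7% interval for the CV_{WP}, and applying it to the RCV of 13.9 with width 20% and ratio 0.8 reproduces 12.7–15.4.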

For troponin, 2 studies have been reported that investigated the short- and long-term (4 and 8 weeks, respectively) CV_{WP} (4, 5). If we estimate the expected CIs for the short- and long-term CV_{WP} for troponin T reported by Vasile et al. (5), we get 36%–62% and 71%–120%, respectively. These 2 CIs do not overlap, indicating a significant difference in the SD_{WP} between the 2 periods. For the troponin I results reported by Wu et al. (4), we get CIs of 7%–13% and 11%–18% for the short- and long-term SD_{WP}, respectively; these 2 CIs overlap and thus do not indicate a significant difference in SD_{WP} for troponin I between these time periods. Both the troponin I and T assays have a relatively large CV_{A} compared with the CV_{WP} (a high ratio), which gives wide CIs for the designs used in these 2 studies (4, 5). (The estimated CIs assume that the data fit the model and that the design is balanced.)

The tables and figures of this report may be helpful to both choose the design that gives a desired CI_{95} and power and determine the CI_{95} and power of a given design. Because of the effect of the ratio between the SD_{A} and the SD_{WP} on the CI for the SD_{WP}, it is important to have a crude idea of the size of these components when planning a study. Such information may be found in previous studies or by performing a pilot study.

When comparing Table 1 and online Supplemental Table 1, one has to keep in mind that as the CV_{A} increases, the width of the CI for CV_{WP} also increases, but the estimated CV_{WP} stays the same. For the RCV, both the CI and the estimated RCV itself increase when CV_{A} increases.

The power of a study design intended to estimate the SD_{WP} is highly dependent on the SD_{A}. However, in situations in which one must use a method with relatively high imprecision, owing to factors such as availability and cost, one can increase the power by increasing the number of samples or replicates. To increase the power, it is always better to increase the number of samples than the number of replicates. In a hierarchical design the degrees of freedom increase from the top level to the bottom, and therefore in general the power is higher for the SD_{WP} than for the SD_{BP}.

If data originate from studies in which the samples were not analyzed in replicate, as is often the case (2, 9), the SD_{A} cannot be calculated from the study data. The stated SD_{A} then often stems from the laboratory's internal QC data generated with commercial control materials, which can give an SD_{A} that is not necessarily transferable to patient samples (2). Likewise, analytical quality specifications are often based on the SD_{WP}, but, as shown, a low SD_{A} is a prerequisite for estimating the SD_{WP} accurately (Table 1).

To optimize the study, samples should preferably be analyzed within the same analytical series. The time span between consecutive sample collections is usually a determining factor for the SD_{WP}, and when the SD_{WP} is reported, the time span should be indicated (e.g., within-day, between-day, or between-week variation). The model assumes variance homogeneity for the SD_{A} and SD_{WP} (10, 11), and these conditions should be checked. If inhomogeneity of variance is present, outliers might have to be removed and/or the data transformed, e.g., by logarithmic transformation, correction of systematic changes by linear regression, or other appropriate transformations (9, 10). The aspects of time span and homogeneity do not affect the calculations of the CI or the power of the design, but they do influence the interpretation of the results.

## Conclusion

Study design, number of replicates, number of samples, and number of individuals have a great impact on the reliability with which we can estimate the SD_{WP} and RCV of an analyte. The effect of these variables varies with the ratio of SD_{A} to SD_{WP}. The lower the ratio, the narrower the CI for the SD_{WP} and the better the power of detection. The number of samples collected per person is more important than the number of individuals examined when the SD_{WP} is estimated. With any given experimental design one should estimate the power and calculate the CI_{95} for the SD_{WP}.

## Acknowledgments

Thanks to Reidun L.S. Kjome for making the manuscript more accessible to readers.

## Footnotes

↵4 Nonstandard abbreviations:

- SD_{WP}, within-person biological variation;
- SD_{BP}, between-person biological variation;
- SD_{A}, analytical imprecision;
- RCV, reference change value.

**Author Contributions:** *All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.*

**Authors' Disclosures or Potential Conflicts of Interest:** *No authors declared any potential conflicts of interest.*

**Role of Sponsor:** No sponsor was declared.

- Received for publication April 8, 2012.
- Accepted for publication June 11, 2012.

- © 2012 The American Association for Clinical Chemistry