## Abstract

**BACKGROUND:** Arrhenius modeling of analyte recovery at increased temperatures to predict long-term colder storage stability of biological raw materials, reagents, calibrators, and controls is standard practice in the diagnostics industry. Predicting subzero temperature stability using the same practice is frequently criticized but nevertheless heavily relied upon. We compared the ability to predict analyte recovery during frozen storage using 3 separate strategies: traditional accelerated studies with Arrhenius modeling, and extrapolation of recovery at 20% of shelf life using either ordinary least squares or a radical equation, $y = B_1 x^{0.5} + B_0$.

**METHODS:** Computer simulations were performed to establish equivalence of statistical power to discern the expected changes during frozen storage or accelerated stress. This was followed by actual predictive and follow-up confirmatory testing of 12 chemistry and immunoassay analytes.

**RESULTS:** Linear extrapolations tended to be the most conservative in the predicted percent recovery, reducing customer and patient risk. However, the majority of analytes followed a rate of change that slowed over time and was best fit by a radical equation of the form $y = B_1 x^{0.5} + B_0$. Other evidence strongly suggested that the slowing of the rate was due not to higher-order kinetics but to changes in the matrix during storage.

**CONCLUSIONS:** Predicting shelf life of frozen products through extrapolation of early initial real-time storage analyte recovery should be considered the most accurate method. Although in this study the time required for a prediction was longer than with a typical accelerated testing protocol, there are fewer potential sources of error, lower costs, and a lower expenditure of resources.

Accelerated stability testing with Arrhenius modeling is the most common method used to predict the stability of a clinical sample, biologic raw material, or in vitro diagnostic (IVD)² reagent. In many cases these materials are aqueous liquids containing a complex mixture of active ingredients, salts, proteins, and stabilizers. Because of the fragility of many of these materials' components, it is frequently necessary to store the materials in the frozen state.

The purported ability to predict shelf life using accelerated testing rests primarily on 2 assumptions: that degradation is accelerated at temperatures above the intended storage temperature, and that temperature is the only difference, or the only important difference, between the conditions at the accelerated and intended storage temperatures, so that the degradation mechanism(s) exhibit Arrhenius behavior. In many cases these assumptions do not hold, especially if the normal product storage temperature is below 0 °C. Indeed, many have argued that, because of the phase change that occurs upon freezing, or incomplete freezing and cryo-concentration, Arrhenius stability predictions have little value. Despite these shortcomings, accelerated studies are heavily relied upon to indicate sample or reagent stability within a reasonable time frame, from which predictions of long-term storage stability can be drawn.

Even if the assumption of Arrhenius behavior is essentially true, or deviations from Arrhenius behavior are negligible, error due to the extrapolation means that predictions can be unreliable even when testing errors are minimal. Given these limitations of accelerated stability testing, we explored the feasibility of extrapolating real-time, single temperature, percent recovery data. Although CLSI document EP25-A states that extrapolation is “inappropriate” (1), it is nevertheless the basis of Arrhenius modeling when it is assumed that a trend in degradation rates with temperature will continue at colder temperatures. Extrapolation of a trend in analyte recovery with time at a single temperature actually requires fewer assumptions.

Our studies first used computer simulations of accelerated and extrapolated real-time stability recovery data. By varying simulated accelerated stability parameters such as method imprecision, failure criterion, number of temperatures, and number of time points per temperature, we compared the resulting uncertainty with the uncertainty of real-time linear extrapolation and modified the designs until the errors were comparable. Actual accelerated and real-time studies were then initiated on serum matrices with both high and low analyte concentrations.

## Materials and Methods

We compared the ability to predict analyte recovery during frozen storage (−10 °C and −20 °C) using 2 approaches: traditional accelerated studies with Arrhenius modeling, and extrapolation of real-time recovery at 20% of shelf life using data fitting. Optimal conditions for conducting these tests were established using computer simulation studies. Extrapolation of real time was then performed using 2 alternative equations.

### COMPUTER SIMULATIONS

Using Matlab® software version R2014a, we first generated hypothetical accelerated stability data assuming an activation energy of 20 kcal/mol (2) and a real-time stability limit of 1 year at −10 °C or 2 years at −20 °C. The simulations assumed that the reaction rates at the different temperatures followed the Eyring-adapted Arrhenius model (3) (Eq. 1), that the analyte degradation rate at the intended storage temperature was approximately 0.014% per day for −20 °C storage and 0.028% per day for −10 °C storage, and that the only contribution to error was test method imprecision. Second, we simulated real-time recovery at one-fifth of the shelf life, i.e., the time point at which one-fifth of the 10% failure criterion (a 2% change) is reached. Each simulated condition was replicated 50 000 times to obtain 95% confidence limits of the simulation results, and the 95% confidence limits of the Arrhenius and real-time simulations were compared. When they were approximately equal, the simulated conditions formed the basis for the actual studies that followed. See the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol62/issue8.
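The authors' Matlab simulations are in the Data Supplement and are not reproduced here. As a rough illustration of the real-time arm only, the following Python sketch repeats a pseudo T0 vs T1/5 comparison many times and extrapolates each replicate linearly to a 10% loss. The defaults (0.014% per day loss, T1/5 = 146 days, 4% CV, 64 vials per group) come from the text or are chosen for illustration; normally distributed results with a constant CV, recovery as a ratio of group means, and a straight line through 100% at T0 are assumptions of this sketch, not a description of the authors' code.

```python
import math
import random
import statistics


def simulate_realtime_extrapolation(
    daily_loss_pct=0.014,  # assumed true loss rate, % per day (-20 °C case)
    t_fifth_days=146,      # assumed one-fifth of a 730-day shelf life
    cv_pct=4.0,            # assumed method CV, %
    n_vials=64,            # vials tested at pseudo T0 and again at T1/5
    fail_pct=10.0,         # failure criterion: 10% change
    n_sims=50_000,
    seed=1,
):
    """Monte Carlo sketch: spread of linearly extrapolated days to failure."""
    rng = random.Random(seed)
    true_r_fifth = 100.0 - daily_loss_pct * t_fifth_days
    predictions = []
    for _ in range(n_sims):
        t0 = [100.0 * (1 + cv_pct / 100 * rng.gauss(0, 1)) for _ in range(n_vials)]
        t1 = [true_r_fifth * (1 + cv_pct / 100 * rng.gauss(0, 1)) for _ in range(n_vials)]
        recovery = 100.0 * statistics.fmean(t1) / statistics.fmean(t0)
        loss = 100.0 - recovery
        # Straight line through (0, 100%) and (T1/5, recovery), extended to 90%.
        predictions.append(t_fifth_days * fail_pct / loss if loss > 0 else math.inf)
    predictions.sort()
    median = predictions[len(predictions) // 2]
    worst_case = predictions[int(0.05 * len(predictions))]  # 1-sided 5% lower limit
    return median, worst_case
```

With these defaults the median should sit near the assumed true failure time (10% / 0.014% per day, about 714 days), and the worst-case value shows how much shorter a pessimistic replicate can be.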

Both the simulated and actual experiments were designed so that a statistically significant change in the analyte concentrations of the bi-level processed serum matrices (2% for real time or 10% for accelerated testing) during the testing period would have an 80% chance of being detected (β = 0.2, α = 0.05). This determination was made using the repeatability imprecision from the test method product insert and the power table in CLSI document EP25-A (1). Regression analyses of simulated and actual data were initially based on ordinary least squares (OLS). In addition, the accelerated stress period at each temperature was twice the estimated time required to reach failure, which helped ensure that the failure point was near the center of the regression, where the confidence limits are narrowest (4).
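The paper sized its studies with the EP25-A power table, which is not reproduced here. As a rough cross-check only, the standard normal-approximation formula for a two-sided, two-sample comparison, $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 (\mathrm{CV}/\Delta)^2$ vials per group, can be sketched in Python. Equal CVs in both groups and a change expressed as a percentage of the T0 value are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def vials_per_group(cv_pct, delta_pct, alpha=0.05, power=0.80):
    """Normal-approximation vials per group for a two-sided, two-sample test.

    Assumes the pseudo T0 and test-point groups share the same CV and that the
    change to detect (delta_pct) is a percentage of the T0 value.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * cv_pct / delta_pct) ** 2)


print(vials_per_group(4.0, 2.0))  # 4% CV, 2% real-time change -> 63
print(vials_per_group(2.0, 2.0))  # 2% CV, 2% real-time change -> 16
```

The 63 vials per group for a 4% CV is close to the 64 per group used in the real-time simulations described in the Results; the EP25-A table uses its own assumptions and may differ slightly.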

Serum matrices with high and low analyte concentrations used in the actual experiments were proprietary Bio-Rad formulations manufactured at our facility. Ten-milliliter aliquots of the pools were stored in 15-mL Type 1 amber glass vials (SGD, North America) and stoppered with 20-mm integrated screw-cap butyl rubber stoppers (West Pharmaceutical Services). Bulk formulations were purged of oxygen with argon during compounding, and vials were filled under nitrogen gas (Praxair Distribution Inc.).

Although the computer simulations evaluated many combinations of temperatures and time points, the accelerated studies as actually performed used 5 temperatures, 6 equally spaced time points per temperature (including T0), and 3 repeat vials per time point, each tested in singlicate. Stress temperatures and times were 16°, 25°, 35°, 41°, and 47 °C for 21, 7.5, 2.5, 1.25, and 0.78 days, respectively, set up to mimic 2 years of storage at −10 °C. The exception was direct bilirubin at the low concentration, which, based on the method imprecision, required 4 vial repeats per time point.
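The stress schedule converts storage time at −10 °C into equivalent times at higher temperatures. A minimal Python sketch of that conversion follows, assuming the Eyring-adapted form $k \propto T\,e^{-E_a/RT}$ with $E_a$ = 20 kcal/mol; the exact form of Eq. 1 is not reproduced in this text, so this is an assumption. With it, the 16 °C value (about 21 days) and the −20 °C equivalence (about 3390 days, vs 3359 days stated in the text) come out close to the published figures, while at the hotter temperatures the sketch gives somewhat shorter times than the published schedule (about 0.65 vs 0.78 days at 47 °C), so the authors' exact constants, rounding, or margins evidently differ.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def eyring_rate_ratio(temp_c, ref_c, ea_kcal=20.0):
    """k(temp)/k(ref), assuming k is proportional to T * exp(-Ea / (R*T))."""
    t, t_ref = temp_c + 273.15, ref_c + 273.15
    return (t / t_ref) * math.exp(ea_kcal / R_KCAL * (1.0 / t_ref - 1.0 / t))


def equivalent_days(ref_days, temp_c, ref_c, ea_kcal=20.0):
    """Days at temp_c giving the same degradation as ref_days at ref_c."""
    return ref_days / eyring_rate_ratio(temp_c, ref_c, ea_kcal)


for temp in (16, 25, 35, 41, 47):
    print(f"{temp} °C: {equivalent_days(720, temp, -10):.2f} days")
print(f"-20 °C: {equivalent_days(720, -20, -10):.0f} days")
```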

All accelerated stability studies were performed using an isochronal staggered-start method so that testing could be completed on the same day (5, 6). All samples were stored at −70 °C until needed, and recoveries from unstressed samples served as baseline values, referred to as pseudo T0s. Once QC test results were deemed acceptable, degradation profiles at each of the 5 temperatures were evaluated for linearity and statistical significance (α ≤ 0.05). Degradation could be demonstrated by either a positive or a negative change in analyte recovery with time. If the degradation slope at a temperature was not statistically significant by a Student *t*-test of the slope (*P* < 0.05), that temperature was not used to predict stability with the Arrhenius model, and at least 3 consecutive temperatures were required for an Arrhenius prediction. In addition, the squared correlation coefficient (*r*²) was required to be ≥0.90 to make the prediction. If these conditions were not met, the shelf-life prediction was made using results from the coldest temperature and assuming an activation energy of 20 kcal/mol (2). Although the failure points were interpolated using the linear regression, the reaction rates used for Arrhenius predictions were calculated assuming first-order (logarithmic) decay:
$$\%R(t) = 100 - k_0 t \quad \text{(linear)} \tag{2}$$

$$\ln\left(\frac{\%R(t)}{100}\right) = -k_1 t \quad \text{(first order)} \tag{3}$$

where $\%R(t)$ is the percent recovery at time $t$ relative to the pseudo T0, and $k_0$ and $k_1$ are the linear and first-order rate constants. Note that differences in the interpolated failure points (linear vs logarithmic) are negligible if the total degradation during the study is <20%. Because the stress time for each temperature was equivalent to 720 days at −10 °C and 3359 days at −20 °C, predictions using the 20-kcal model were either >720 days at −10 °C or >3359 days at −20 °C if failure was not reached or the apparent change was insignificant (α > 0.05) within the stress interval. When Arrhenius modeling was used, shelf-life predictions had no upper limit.
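The screening and prediction rules above can be sketched in Python as follows. This is an illustrative reading, not the authors' software. Assumptions: the slope *t*-statistic is compared with a supplied two-sided critical value (default 2.120, the 5% value for 16 degrees of freedom, i.e., 6 time points × 3 vials); only decreasing recoveries are handled; the Arrhenius step fits ln(*k*/*T*) vs 1/*T* (the Eyring-adapted form used in the simulations, not confirmed for the predictions); and the consecutive-temperature rule and the 20-kcal/mol fallback are not implemented.

```python
import math


def ols(x, y):
    """Ordinary least squares: returns slope, intercept, r^2 and slope t-statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - sse / syy if syy > 0 else 0.0
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.copysign(math.inf, slope)
    return slope, intercept, r2, t_stat


def first_order_rate(times, recoveries, t_crit=2.120, min_r2=0.90, fail_pct=90.0):
    """Screen one stress temperature; return a first-order rate (1/day) or None."""
    slope, intercept, r2, t_stat = ols(times, recoveries)
    if abs(t_stat) < t_crit or r2 < min_r2 or slope >= 0:
        return None  # not significant, not linear enough, or not a loss
    t_fail = (fail_pct - intercept) / slope      # linear interpolation to 90%
    return math.log(100.0 / fail_pct) / t_fail   # rate assuming first-order decay


def arrhenius_shelf_life(rates_by_temp_c, storage_c, fail_pct=90.0):
    """Fit ln(k/T) vs 1/T and extrapolate to the storage temperature (days)."""
    usable = {tc: k for tc, k in rates_by_temp_c.items() if k is not None}
    if len(usable) < 3:  # counts temperatures only; consecutiveness is not checked
        raise ValueError("need at least 3 temperatures with usable rates")
    inv_t = [1.0 / (tc + 273.15) for tc in usable]
    ln_k_t = [math.log(k / (tc + 273.15)) for tc, k in usable.items()]
    slope, intercept, _, _ = ols(inv_t, ln_k_t)
    t_k = storage_c + 273.15
    k_storage = t_k * math.exp(intercept + slope / t_k)
    return math.log(100.0 / fail_pct) / k_storage
```

Typical use: call `first_order_rate` on each temperature's (time, % recovery) results, collect the non-`None` rates in a dict keyed by °C, and pass that dict to `arrhenius_shelf_life` with `storage_c=-10` or `-20`.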

For real-time recovery testing, the number of vials tested at the one-fifth shelf-life time point and the associated pseudo T0 vials varied with the imprecision of the method, the intention being to ensure 80% power to detect a 2% change in concentration. The total number of tests for the first real-time point and pseudo T0s ranged from 20 to 110, depending on the imprecision of the test method, whereas the total number of tests for the accelerated studies was 90 for all but 1 of the analytes. Analytes whose real-time recovery data showed no significant degradation (i.e., a slope *t*-test gave *P* > 0.05) were not extrapolated but were assumed to have a stability of >364 days at −10 °C and >728 days at −20 °C. Although the schedule was later modified, the first follow-up testing was performed at the analytes' projected failure points, which varied with the analyte and storage temperature. The number of repeat vials tested to verify expected recovery was typically less than the number for the first time point, but never fewer than 8 per condition. All analyte recoveries in the real-time vials were expressed as percent recovery relative to the pseudo T0 vials. Table 1 summarizes the analyte concentrations and the amount of testing performed.

For instrumentation, including incubators, differential scanning calorimetry (DSC), chemistry, and immunoassay analyzers, please refer to page 9 in the online Supplemental Data.

## Results

### COMPUTER SIMULATIONS

The result of each computer simulation of accelerated data was a distribution of Arrhenius predictions of days to failure centered on the theoretical real-time stability of either 365 days at −10 °C or 730 days at −20 °C storage. Online Supplemental Fig. 1 shows that, for −20 °C storage with an assumed activation energy of 20 kcal/mol, a 4% test method CV, and 5 temperatures, 6 points per temperature, and 2 repeat vials per point (total tests = 60), an average of 731 days to failure has a 95% worst-case limit of 498 days. For a 4% method CV, the calculated error is approximately the same when the real-time extrapolated regression uses 64 repeat vials for the pseudo T0 and 64 repeat vials for T1/5 (128 tests); see online Supplemental Fig. 2.

The simulation data were used to set up actual studies comparing both methods of predicting shelf life at −10 °C and −20 °C. Although method imprecision was not expected to be the only contributor to error, other sources of error, such as insufficient degradation, imperfect temperature control, or errors in the Arrhenius model assumptions themselves, were not considered when establishing the amount of testing. Other simulations, not covered here, did show that such factors could be significant contributors to error. To help ensure that sufficient degradation would occur during the study, the analytes tested in the low- and high-concentration matrices were those with a history of instability or borderline instability at −10 °C.

For most analytes at the low concentrations, degradation at some or all of the stress temperatures was insufficient to estimate rates, so the accelerated stability predictions of shelf life required the assumption of a 20-kcal/mol activation energy. The exceptions were alanine aminotransferase (ALT) and creatine kinase (CK), which were predicted to have much longer shelf lives using the Arrhenius model. At the high analyte concentrations, there was sufficient degradation in 9 of the 12 data sets to estimate the degradation rates.

At approximately one-fifth of the intended −10 °C shelf life (74 days for the high concentration and 85 days for the low concentration, owing to scheduling), real-time analyte recoveries in product vials were compared with analyte recovery in pseudo T0 vials. If the differences were significant (α < 0.05), percent recoveries over time were fitted by OLS linear regression and future recoveries were extrapolated. When significant differences were measured, a 1-tailed 5% worst-case error line was also calculated as described in CLSI document EP25-A (1). An example for ALT in the high-concentration material is shown in Fig. 1A, and its accelerated data in Fig. 1B. In this example, the extrapolated failure point predicted a much shorter life (132 days; 128 days worst case) than the accelerated testing (2905 days, or 7.9 years). Note that even with apparently good data and temperature trending, the error range of the multitemperature accelerated prediction was much larger (2.1–29 years, greater than a 10-fold range, in the high-concentration ALT example) than that obtained in the simulations (498–1100 days, approximately a 2-fold range), indicating additional sources of error not accounted for in either the model assumptions or the experimental conditions. Table 2 summarizes the initial −10 °C and −20 °C real-time recoveries for all the analytes tested, as well as the accelerated data.
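The real-time extrapolation step can be sketched in Python: express every vial result as percent recovery of the pseudo T0 mean, fit an OLS line through the pseudo T0 and T1/5 groups, skip extrapolation when the slope is not significant, and find where both the fitted line and a one-sided 95% confidence bound on the mean line reach the 10% failure limit. This is a minimal sketch, not the EP25-A worked procedure; the normal-approximation quantiles (1.96 and 1.645) in place of *t* quantiles are an assumption that is reasonable only for the larger vial counts used here.

```python
import math
import statistics


def extrapolate_linear(t0_values, t1_values, t1_days, fail_pct=10.0,
                       z_two_sided=1.96, t_one_sided=1.645):
    """Extrapolate % recovery from pseudo-T0 and T1/5 vials to the failure limit.

    Returns (predicted_days, worst_case_days), or None if the slope is not
    significant. The quantiles are large-sample normal approximations.
    """
    ref = statistics.fmean(t0_values)
    x = [0.0] * len(t0_values) + [float(t1_days)] * len(t1_values)
    y = [100.0 * v / ref for v in list(t0_values) + list(t1_values)]
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    s = math.sqrt(sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    se_slope = s / math.sqrt(sxx)
    if slope == 0 or (se_slope > 0 and abs(slope) / se_slope < z_two_sided):
        return None  # no significant change: do not extrapolate
    d = 1.0 if slope > 0 else -1.0          # direction of change
    limit = 100.0 + d * fail_pct            # 110% for gains, 90% for losses
    predicted = (limit - intercept) / slope

    def failed(xv):
        # One-sided 95% bound on the mean line, on the side of the change.
        half = t_one_sided * s * math.sqrt(1.0 / n + (xv - mx) ** 2 / sxx)
        return d * (intercept + slope * xv + d * half - limit) >= 0

    lo, hi = mx, predicted
    if failed(lo):
        return predicted, lo
    for _ in range(200):  # bisection for the worst-case crossing
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if failed(mid) else (mid, hi)
    return predicted, hi
```

For example, 8 vials averaging 100 at pseudo T0 and 8 averaging 98 at 74 days give a point prediction of 370 days and a worst-case prediction of roughly 310 days.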

Table 2 illustrates a large discrepancy between the accelerated predictions and the extrapolated real-time predictions, with real-time OLS regression extrapolation tending to be much more conservative. This was especially true of glycated serum protein (GSP), which failed before the first real-time data point at −10 °C in both the high and low concentrations and at −20 °C in the high concentration, and almost failed at −20 °C in the low concentration. Expecting the extrapolated failure point to be close to the actual failure point, we initially performed the first follow-up testing only on those analytes expected to be close to their failure time point, and because only a change of about 10% was thought necessary to detect, only 8 vial replicates were tested for the time point and for the pseudo T0s. However, the degradation rates of the majority of the analytes appeared to slow, so that the change in recovery was <10%.

As the follow-up real-time recovery data accumulated, it became clear that for many of the analytes the rate of change slowed with each successive time point. As discussed later, this slowing of the reaction rate is not likely due to higher-order kinetics but to a gradual change in the matrix during storage at these temperatures. Fig. 2A shows the expected linear trend at −10 °C for CK in the high concentration, whereas the curved degradation patterns for ALT in the high concentration, alkaline phosphatase (ALP) in the low concentration, and HDL cholesterol at −20 °C in the low concentration were more typical (Fig. 2, B–D).

Because the goal was to establish a workable model to predict shelf life, curvature was retrospectively applied to early real-time recovery trends. Various methods were explored (see Discussion section), but the best was to use the square root of time to determine the regression coefficient:

$$B_1 = \frac{\%RT_{1/5} - 100}{\sqrt{T_{1/5}}} \tag{4}$$

where $B_1$ is the coefficient of $X^{0.5}$, $T_{1/5}$ is 1/5 the total time of the real-time testing period, and $\%RT_{1/5}$ is the recovery at 1/5 the total time of the real-time testing period. The recovery at a future time point $X$ is estimated using the following radical equation, with $B_0 = 100\%$ (the pseudo T0 recovery):

$$\%R(X) = B_1 X^{0.5} + B_0 \tag{5}$$

and the predicted time at which the analyte has changed by 10% is:

$$X_{\text{fail}} = \left(\frac{10}{B_1}\right)^2 \tag{6}$$

Table 3 shows the actual and predicted percent recoveries using accelerated data, OLS real-time extrapolation, and radical-equation real-time extrapolation. The table's bottom summary row ranks each method by how close its predictions were to the final measured recovery. The results clearly showed the radical equation to be the best predictor of future analyte recovery.
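The radical model can be sketched in Python alongside the linear extrapolation it replaced. This sketch reads the model with the intercept fixed at 100% (the pseudo T0 recovery), so that $B_1 = (\%RT_{1/5} - 100)/\sqrt{T_{1/5}}$, the recovery at time $X$ is $100 + B_1\sqrt{X}$, and failure occurs at $(10/B_1)^2$; the function names are illustrative.

```python
import math


def radical_b1(recovery_fifth_pct, t_fifth_days):
    """Coefficient of sqrt(time) from the T1/5 recovery, with B0 = 100%."""
    return (recovery_fifth_pct - 100.0) / math.sqrt(t_fifth_days)


def radical_recovery(b1, days):
    """Predicted % recovery at a future time under the radical model."""
    return 100.0 + b1 * math.sqrt(days)


def radical_days_to_fail(b1, fail_pct=10.0):
    """Time at which the predicted change reaches the failure criterion."""
    return math.inf if b1 == 0 else (fail_pct / b1) ** 2


def linear_days_to_fail(recovery_fifth_pct, t_fifth_days, fail_pct=10.0):
    """Straight line through 100% at T0 and the T1/5 recovery, for comparison."""
    change = abs(recovery_fifth_pct - 100.0)
    return math.inf if change == 0 else t_fifth_days * fail_pct / change


b1 = radical_b1(98.0, 73)
print(linear_days_to_fail(98.0, 73))      # 365.0
print(round(radical_days_to_fail(b1)))    # 1825
```

The arithmetic makes the difference between the models plain: a 2% change at T1/5 extrapolates linearly to failure at 5 × T1/5, but under the radical model to (10/2)² = 25 × T1/5.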

## Discussion

The goal of more accurately predicting real-time stability at subzero temperatures was achieved in this study, regardless of whether the change in concentration was positive or negative. This can be seen in the free thyroxine (T4) data in online Supplemental Fig. 3 and in Table 3, which shows that only 4 of 12 positive real-time changes >1% were best predicted by accelerated testing. However, depending on their area of expertise, readers may disagree about the mechanism behind the real-time degradation profiles. For the majority of analytes tested, the real-time degradation trend lines were not linear, whereas all of the accelerated study regressions were linear over the same amount of degradation (see ALT data in Supplemental Fig. 4 in the online Data Supplement).

Because a comparable amount of degradation was measured under both conditions, the curved path of real-time recovery vs the linear path during accelerated testing cannot easily be explained as merely a demonstration of higher-order kinetics. Researchers in the food industry may not find the differences surprising. For example, using DSC, Calligaris et al. modeled the altering effects of subzero phase transitions and viscosity on the Arrhenius behavior of sunflower oil and introduced an additional factor to account for the change in percent liquid (7). Champion et al. modeled subzero reaction rates of the ALP-catalyzed hydrolysis of disodium p-nitrophenyl phosphate (DNPP) in cryo-concentrated solutions by including factors for increases in viscosity (8). An analogous strategy was attempted at our facility, using the abrupt change in heat capacity to account for the departure from Arrhenius-plot linearity and for the slowing of reaction rates relative to those predicted by the standard Arrhenius model. However, aqueous biological solutions pose an additional challenge: changes in relative chemical activities may drive reaction rates in the opposite direction, increasing them. Fig. 3 illustrates these 2 opposing forces, whose relative magnitudes will differ depending on the matrix and analyte studied.

Complicating matters still further, the water in the serum matrix is also in flux during storage at subzero temperatures, especially at −10 °C. A 4-day modulated DSC isothermal analysis of the matrix used in this study at −10 °C showed that the measured reversing heat capacity started high and gradually decreased. A 4-day isothermal analysis at −20 °C showed a similar pattern, albeit starting lower and decreasing more slowly. At the end of the 4-day monitoring, the trend was still downward at both temperatures, with apparently much further to go before equilibrium was reached (see online Supplemental Fig. 5).

Based on the pattern seen in isothermal monitoring, the water may be slowly self-associating and forming ice during long-term storage. As this process progresses, relative chemical activities may increase while, at the same time, proteins reach their solubility limits and translational motion slows. Note, however, that because freezer compressors cycle, the rate at which the unfrozen water fraction changes will likely differ under actual product storage conditions.

Because the effects of storage between 0 and −20 °C on the matrix change with time, a model that assumes higher-order kinetics as the mechanism is probably not appropriate for predicting stability at these temperatures. Zero- to third-order rate equations were also compared with the radical equation ($Y = B_1 X^{0.5} + B_0$) for predicting future recoveries, and based on the sum of the squared residuals the radical equation was more accurate in 14 of 19 −10 °C and 7 of 14 −20 °C recovery predictions (see online Supplemental Tables 1 and 2 and online Supplemental Fig. 6 for the rate equations).
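The kind of model comparison described above can be sketched in Python. The supplement's rate equations are not reproduced here, so this sketch assumes textbook integrated rate laws (zero to third order) with the T0 recovery fixed at 100%, fits each model to early recoveries by least squares through the origin on its linearized scale (an assumption; fitting on the recovery scale could rank the models differently), and scores each by the sum of squared residuals of its predictions at later time points.

```python
import math

# Each model is written as g(C) = b * h(t) with C0 fixed at 100% recovery,
# plus the inverse used for prediction. Textbook forms, assumed for illustration.
MODELS = {
    "zero order": (lambda c: c - 100.0, lambda t: t,
                   lambda b, t: 100.0 + b * t),
    "first order": (lambda c: math.log(c / 100.0), lambda t: t,
                    lambda b, t: 100.0 * math.exp(b * t)),
    "second order": (lambda c: 1.0 / c - 1.0 / 100.0, lambda t: t,
                     lambda b, t: 1.0 / (1.0 / 100.0 + b * t)),
    "third order": (lambda c: 1.0 / c**2 - 1.0 / 100.0**2, lambda t: t,
                    lambda b, t: (1.0 / 100.0**2 + b * t) ** -0.5),
    "radical": (lambda c: c - 100.0, lambda t: math.sqrt(t),
                lambda b, t: 100.0 + b * math.sqrt(t)),
}


def compare_models(early_t, early_r, later_t, later_r):
    """Fit each model to early recoveries; return the SSR of its later predictions."""
    ssr = {}
    for name, (g, h, predict) in MODELS.items():
        xs = [h(t) for t in early_t]
        ys = [g(r) for r in early_r]
        # Least squares through the origin on the transformed scale.
        b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        ssr[name] = sum((predict(b, t) - r) ** 2 for t, r in zip(later_t, later_r))
    return ssr
```

The model with the smallest value, `min(ssr, key=ssr.get)`, is the best predictor for that data set; repeating this across analytes and temperatures gives tallies like those reported above.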

Under the initial assumptions that the analytes exhibit Arrhenius trends at subzero temperatures and that the only appreciable source of error is method imprecision, computer simulations indicated that equivalent prediction precision would be achieved with about 90 accelerated tests vs 20–110 tests per prediction for extrapolation. Therefore, in general, extrapolation of real-time stability data was not only more accurate but also used less reagent. The drawback was that it took more calendar time to obtain the results; however, it required far fewer labor hours than accelerated testing. In addition, when accelerated stress times are short, as with higher-temperature stress conditions, sample heating and cooling ramp times can affect results, and these effects are difficult to account for unless water baths are used to shorten ramp times.

Finally, planning of accelerated stability testing requires an assumption of an analyte's degradation activation energy. Errors in the estimates or assumptions of activation energy can lead to insufficient stress time for establishing a rate at some, or all, of the stress temperatures. This will reduce the already questionable utility of the accelerated testing.

The results of this study help to demonstrate that, provided a workable mathematical model is applied, extrapolation of real-time analyte recovery is a more accurate predictor than accelerated testing for subzero storage conditions of clinical samples, biological raw materials, and IVD reagents such as calibrators and controls. Owing to other uncontrolled sources of error in accelerated testing, real-time extrapolation may prove superior at predicting stability even at 2–25 °C storage conditions. For example, in addition to errors in the intended stress conditions (e.g., time at temperature), exposure of a product to increased temperatures may result in the formation of degradation products that are not typically formed at recommended storage temperatures. These products may positively or negatively influence the stability prediction of an analyte.

Although we will likely continue to use accelerated stability testing early in feasibility studies to evaluate frozen formulations or raw material candidates, its most cost-effective use would be to estimate relative, not absolute, stability at a single temperature. With the exception of ALP, ALT, GSP, and CK, extrapolation of a linear model of real-time data was shown to be more conservative, yet less accurate, than the radical equation. More conservative stability predictions lower the risk to patients and to users of reagents and biological material, making linear-model extrapolations useful as worst-case predictions, much like the 95% linear-model confidence limits used in OLS regressions of accelerated and in-use stability data as described in EP25-A (1).

The US Food and Drug Administration stability testing guidelines state that extrapolation of real time may be used if it is justified by “… what is known about the mechanism of degradation, the results of testing under accelerated conditions, the goodness of fit of any mathematical model.” (9). This is a reasonable position, and we believe that the results presented here provide strong support for extrapolation of real-time data over the use of accelerated testing data for frozen products. The mechanism by which analyte recovery in the frozen matrix increases or decreases appears to be strongly influenced by the changing state of the frozen matrix at −10 °C to −20 °C, which is not captured at higher accelerated temperatures but can be demonstrated during DSC analysis. Although the real-time degradation patterns fit the radical equation model most often, the model cannot be used to infer an underlying mechanism of degradation.

## Footnotes

² Nonstandard abbreviations:

- IVD, in vitro diagnostic;
- OLS, ordinary least squares;
- DSC, differential scanning calorimetry;
- ALT, alanine aminotransferase;
- CK, creatine kinase;
- ALP, alkaline phosphatase;
- T4, thyroxine;
- GSP, glycated serum protein.

(see editorial on page 1049)

**Author Contributions:** *All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.*

**Authors' Disclosures or Potential Conflicts of Interest:** *Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:*

- **Employment or Leadership:** K.W. De Vore, Bio-Rad Laboratories; J.E. Sass, Bio-Rad Laboratories.
- **Consultant or Advisory Role:** K.W. De Vore, CLSI.
- **Stock Ownership:** K.W. De Vore, Bio-Rad Laboratories; J.E. Sass, Bio-Rad Laboratories.
- **Honoraria:** None declared.
- **Research Funding:** None declared.
- **Expert Testimony:** None declared.
- **Patents:** None declared.
- **Role of Sponsor:** No sponsor was declared.

- Received for publication February 16, 2016.
- Accepted for publication May 10, 2016.

- © 2016 American Association for Clinical Chemistry