## Abstract

Background: We evaluated the commutability of a proposed reference material (PRM), with a formulation based on dilution of Certified Reference Material 470 (CRM470), for 24 high-sensitivity C-reactive protein (hsCRP) methods. We also investigated whether calibration by use of PRM was effective in harmonizing results.

Methods: A set of 40 native clinical samples was measured along with PRM and 3 dilutions of PRM. We used weighted least-squares polynomial regression (WLS/PR) to perform comparisons between all method combinations and to calculate normalized residuals for the PRM. The PRM was considered noncommutable if any of the normalized residuals for a method pair was >2. Correspondence analysis (CA) was used to explore the multidimensional relationships between methods and samples to evaluate if the PRM had properties similar to native clinical samples. Clinical sample results from the methods for which PRM was commutable were recalibrated based on the PRM results, and ANOVA was used to estimate the CVs before and after recalibration.

Results: After omitting data for 9 methods because of poor precision or procedural flaws, we used data from the 15 remaining methods to evaluate commutability. Using both WLS/PR and CA we found that PRM was noncommutable with 1 method. We found modest improvement in total and among-method CVs when PRM was used to harmonize the results from the 14 methods for which it was commutable.

Conclusions: A PRM with a formulation based on dilution of CRM470 was commutable with native clinical samples for 14 of 15 hsCRP methods that had acceptable precision. For those methods the use of PRM may contribute to improved harmonization of results for native clinical samples.

In a previous study we demonstrated that a dilution of Certified Reference Material 470 (CRM470)¹ had the precision, linearity, and parallelism characteristics required of a secondary reference material and could be used with high-sensitivity C-reactive protein (hsCRP) methods (1). Before this proposed reference material (PRM) can be used with confidence, its commutability with native human serum must be validated for the methods with which it will be used. In this study, we evaluated commutability by analysis of native clinical samples along with dilutions of the PRM to determine if it was commutable and consequently if calibration using PRM was effective in producing harmonized results.

Commutability is a property of a reference material demonstrated by the closeness of agreement between the relation among the measurement results for a stated quantity in this material, obtained according to 2 given measurement procedures, and the relation obtained among the measurement results for other specified materials (2). In this report, the relationship between 2 measurement procedures that is observed for a set of native clinical samples as the “other specified materials” is compared to the relationship observed for diluted CRM470 as the reference material. If the relationship is the same within stated statistical limits, the diluted CRM470 would be validated to be commutable with native clinical samples for the 2 measurement procedures evaluated. For diluted CRM470 to be useful as a reference material for calibration of routine measurement procedures, the property of commutability would need to be validated among all measurement procedures for which it is to be used (3)(4).

Ideally, commutability is evaluated in the context of a reference measurement procedure that can be used to determine the “true” value for the native clinical samples and for the reference materials. Although some progress has been made, there is currently no reference measurement procedure for CRP (5). Consequently, it is necessary to compare all combinations of routine methods to determine commutability of the PRM with native clinical samples.

We evaluated the commutability of the PRM developed in phase 1 (1). Approaches to evaluating commutability that are based on the ratio between 2 methods (6)(7)(8)(9) and multivariate statistical approaches (10)(11) have been reported. All of these approaches compare the results for reference materials with the dispersion of results for native clinical samples. We compared each method with all of the others by using 2 approaches that can accommodate the large number of method combinations in this study. The first was weighted least-squares polynomial regression (WLS/PR), which does not require a linear relationship between methods and accounts for nonconstant variance across the measurement range. The second was correspondence analysis (CA), a descriptive, exploratory multivariate technique designed to analyze 2-way and multiway tables containing some measure of correspondence between rows and columns (12). We also evaluated the ability of PRM to harmonize results from the methods for which it was commutable.

## Materials and Methods

### participants

A total of 24 different methods were used by the 21 manufacturers participating in this study; 3 of the participating manufacturers submitted results for 2 different methods. Twenty of the methods were latex-enhanced immunoturbidimetric assays, 1 was a particle-enhanced immunonephelometric assay, 2 were lateral-flow immunoassays, and 1 was a time-resolved fluorescence immunoassay. A list of participating manufacturers can be found in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol55/issue2.

### reference materials and native clinical samples

The participants purchased CRM470 from the Institute for Reference Materials and Measurements. Participants also purchased a set of 40 individual human serum samples in the frozen state from Solomon Park Research Laboratories. These samples were prepared from previously frozen serum units that were collected according to the Clinical and Laboratory Standards Institute C37-A protocol (13). The samples were prescreened for hsCRP concentration to select a panel of 40 that spanned the clinically important measuring interval of 0.7–3.9 mg/L.

### laboratory measurements

Participants used the same lots of reagents and calibrators for the entire evaluation, which was conducted in 3 analytical runs performed on 3 days. Except for reconstituted CRM470 sample preparation, all samples and dilutions were thawed or prepared anew each day.

CRM470 was reconstituted and stored according to the Institute for Reference Materials and Measurements procedure. A 1:10 dilution of CRM470 prepared with the method diluent was designated PRM. Thirteen aliquots of PRM were prepared for analysis. The method diluent was also used to prepare 25%, 50%, and 75% dilutions of PRM (i.e., containing 25%, 50%, and 75% diluent). Three aliquots of each dilution of PRM were prepared for analysis. One aliquot of each serum sample was prepared for analysis.

Participants calibrated each method according to the manufacturer's instructions, using the manufacturer-recommended calibrators. A new calibration curve was constructed for each run. Runs were completed on the same day that the dilutions of CRM470 were prepared. Each preparation (PRM, diluted PRM, native clinical samples, and controls) was analyzed in duplicate. Per run, this protocol produced 26 data points for PRM, 6 data points for each dilution of PRM, and 2 data points for each sample and control.
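The replicate arithmetic behind these counts can be checked in a few lines of Python. This is only a bookkeeping sketch; the dictionary labels are illustrative and not names used by the study.

```python
# Per-run data-point accounting for the protocol described above:
# every prepared aliquot is measured in duplicate.
REPLICATES = 2
RUNS = 3

aliquots_per_run = {
    "PRM": 13,
    "PRM 25% dilution": 3,
    "PRM 50% dilution": 3,
    "PRM 75% dilution": 3,
    "each native clinical sample": 1,
}

points_per_run = {name: n * REPLICATES for name, n in aliquots_per_run.items()}
for name, points in points_per_run.items():
    print(f"{name}: {points} data points per run")

# Across the 3 runs, each native clinical sample contributes 2 x 3 results.
results_per_sample = REPLICATES * RUNS
```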

### statistical design

Statistical power calculations were performed to determine the sample size, number of replicates, and number of runs sufficient to detect a 5% difference between any 2 types of materials or between any 2 routine methods with a power of 0.80, using a 2-sided test at the 0.05 significance level. Although pairwise comparisons of materials and methods were not of primary interest in this study, the power calculations were performed in this way to ensure adequate power for comparing predicted PRM values based on native clinical sample regression equations with average measured PRM values. For these power calculations, a maximum within-assay CV of 4% was assumed for measurement of samples with concentrations between 1.6 and 4.0 mg/L and a maximum within-assay CV of 6% for measurement of samples with concentrations between 0.7 and 1.6 mg/L. These CVs are consistent with the observations we made in phase 1 (1). We found that we would need to analyze 13 samples in 3 runs in the concentration range between 1.6 and 4.0 mg/L and 27 samples in 3 runs in the concentration range between 0.7 and 1.6 mg/L, for a total of 40 individual native clinical samples. The concentrations of samples used were selected to bracket the CDC–American Heart Association medical decision points of 1 and 3 mg/L (14).
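To illustrate how the assumed CVs drive the required sample size, the sketch below uses the textbook normal-approximation power formula for a 2-sided comparison of 2 means. It is not the authors' calculation: their model also accounted for runs and replicates and is not specified here, so the sketch should not be expected to reproduce the 13/27 allocation. The functions `power_two_sided` and `n_for_power` are hypothetical helpers.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_sided(diff_pct: float, cv_pct: float, n_per_group: int,
                    alpha: float = 0.05) -> float:
    """Approximate power of a 2-sided z-test comparing 2 means.

    Assumes each mean is based on n_per_group independent results whose SD
    equals cv_pct (same % units as diff_pct). The tiny opposite-tail term is
    ignored, as is usual for this approximation.
    """
    se_diff = cv_pct * (2.0 / n_per_group) ** 0.5
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(diff_pct / se_diff - z_crit)


def n_for_power(diff_pct: float, cv_pct: float, power: float = 0.80,
                alpha: float = 0.05) -> int:
    """Smallest n per group whose approximate power reaches the target."""
    n = 2
    while power_two_sided(diff_pct, cv_pct, n, alpha) < power:
        n += 1
    return n


# Scenarios mirroring the stated assumptions: 5% difference, CV of 4% or 6%.
for cv in (4.0, 6.0):
    n = n_for_power(5.0, cv)
    print(f"CV {cv}%: n = {n}, power = {power_two_sided(5.0, cv, n):.3f}")
```

As expected, the less precise low-concentration range (CV 6%) needs more results than the 4% range to detect the same 5% difference.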

### data analysis

We used SAS® to perform the WLS/PR and CA analyses.

### precision

Regression approaches for evaluation of commutability require that methods have reasonable precision. Because we had results from 3 separate runs, we evaluated both within-run and among-run imprecision at each concentration of the PRM for each assay. We averaged the within-run and among-run variances across methods at each concentration and, from these averages, estimated the 95th and 99th percentile limits for the range of replicate results at that concentration. The range of values at each concentration for each method was compared with the appropriate limit. A method was flagged for a given dilution if 2 of its within- or among-run ranges exceeded the 95% limit or if 1 range exceeded the 99% limit. Methods with 3 or more range flags were considered to have insufficient precision and were excluded from further analysis. The process was then repeated without the excluded methods to check for additional methods with unacceptable precision.
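A minimal sketch of the range-based screening, assuming that results are normally distributed, that the limit for a range of k replicates is a percentile of the range of k standard normal values multiplied by the averaged SD, and that within-run ranges come from duplicates (k = 2). The among-run case would use the same functions with k = 3 run means. All function names and example values are hypothetical; the authors' exact limits are not given in the text.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def range_cdf(w: float, k: int, steps: int = 2000) -> float:
    """P(range of k iid standard normal values <= w).

    Trapezoidal integration of k * phi(x) * [Phi(x + w) - Phi(x)]**(k - 1)
    over x in [-8, 8], outside of which the integrand is negligible.
    """
    lo, hi = -8.0, 8.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        f = _STD_NORMAL.pdf(x) * (_STD_NORMAL.cdf(x + w) - _STD_NORMAL.cdf(x)) ** (k - 1)
        total += f / 2 if i in (0, steps) else f
    return k * total * h


def range_quantile(p: float, k: int) -> float:
    """p-quantile of the range of k standard normal values, by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if range_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def flag_dilution(ranges, sd, q95, q99) -> bool:
    """Flag one concentration: 2+ ranges above the 95% limit or 1+ above the 99% limit."""
    above_95 = sum(r > q95 * sd for r in ranges)
    above_99 = sum(r > q99 * sd for r in ranges)
    return above_95 >= 2 or above_99 >= 1


def method_excluded(flags) -> bool:
    """A method with 3 or more range flags is treated as too imprecise."""
    return sum(flags) >= 3


# Duplicates within a run: the range is |rep1 - rep2|, so k = 2.
Q95, Q99 = range_quantile(0.95, 2), range_quantile(0.99, 2)
pooled_sd = 0.05  # hypothetical averaged within-run SD (mg/L) at one concentration
duplicate_ranges = [0.03, 0.16, 0.15]  # hypothetical duplicate ranges for runs 1-3
print(flag_dilution(duplicate_ranges, pooled_sd, Q95, Q99))
```

For k = 2 the numerical quantile agrees with the closed form √2 × z, which is a convenient check on the integration.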

### commutability

One problem with the use of statistically based techniques to evaluate methods with differing precision is that the most precise methods are at a disadvantage. The reason for this is that an average difference of a specified amount is more likely to be detected (i.e., be declared statistically significantly different) for a very precise method than for an imprecise method. Thus, for our commutability study we calculated the normalized residuals by use of a common SE rather than a method’s actual SE (see the online Data Supplement for details).

We considered using linear regression methods such as Deming and Passing-Bablok. When we explored the data, however, we observed that 74 of the 210 pairwise regressions (105 pairs of the 15 methods, each regressed in both directions) showed nonlinear relationships. Thus we could not use linear regression to evaluate commutability in this study.

In the WLS/PR approach, we regressed the mean results for each of the 40 native clinical samples from method Y onto those from method X. Because the procedure assumes that all variance is reflected in the method represented on the ordinate, it was also necessary to regress method X onto method Y (see the online Data Supplement for details). We calculated normalized residuals for PRM and its dilutions as follows. From each regression equation for the native clinical samples, we calculated the predicted value of PRM for the method represented on the ordinate from its measured result with the method represented on the abscissa. We then calculated the absolute value of the difference between this predicted value and the mean measured value for the method on the ordinate, and divided this difference by the appropriate SE (see the online Data Supplement for details). If any of the 4 normalized residuals computed for a pair of methods was >2 (approximately a 5% statistical significance criterion), we considered the candidate reference material to be noncommutable.
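The normalized-residual step can be sketched with `numpy.polyfit`, which accepts per-point weights. Several details are assumptions because the text defers them to the Data Supplement: the weights (here 1/y, i.e., constant CV), the choice between a linear and a quadratic fit (passed in as `degree`), and the common SE (a single hypothetical constant, reflecting the use of a common SE rather than each method's own SE). Here all 8 residuals per pair (4 materials × 2 regression directions) are checked against the limit of 2.

```python
import numpy as np


def fit_wls_poly(x, y, degree):
    """Weighted least-squares polynomial fit of y on x (degree 1 or 2).

    Hypothetical weighting: the SD of y is taken to be proportional to y
    (constant CV), so each point gets weight 1/y. numpy.polyfit applies its
    weights to the unsquared residuals, i.e. it expects 1/sigma.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.polyfit(x, y, degree, w=1.0 / y)


def normalized_residuals(samples_x, samples_y, ref_x, ref_y, degree, common_se):
    """|measured - predicted| / common SE for each reference material.

    The curve is fitted to native clinical samples only; ref_x and ref_y hold
    mean results for PRM and its dilutions on the abscissa and ordinate methods.
    """
    coefs = fit_wls_poly(samples_x, samples_y, degree)
    predicted = np.polyval(coefs, np.asarray(ref_x, dtype=float))
    return np.abs(np.asarray(ref_y, dtype=float) - predicted) / common_se


def pair_is_noncommutable(a_samples, b_samples, a_ref, b_ref, degree, common_se, limit=2.0):
    """Regress B on A and A on B; flag the pair if any residual exceeds the limit."""
    res = np.concatenate([
        normalized_residuals(a_samples, b_samples, a_ref, b_ref, degree, common_se),
        normalized_residuals(b_samples, a_samples, b_ref, a_ref, degree, common_se),
    ])
    return bool(res.max() > limit), res


# Toy data (hypothetical): method B reads 10% higher than method A, plus 0.02 mg/L.
a = np.linspace(0.7, 3.9, 40)
b = 1.1 * a + 0.02
a_prm = np.array([3.92, 2.94, 1.96, 0.98])  # PRM and its dilutions on method A
b_prm = 1.1 * a_prm + 0.02                   # same relationship as the samples
print(pair_is_noncommutable(a, b, a_prm, b_prm, degree=1, common_se=0.05)[0])
```

Scaling `b_prm` by 1.15 (a material that behaves differently from the samples on method B) makes the same call return `True`.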

CA was carried out with the CORRESP procedure in SAS and independently verified with the implementation described by Bretaudiere et al. (10) on an OpenVMS platform (Hewlett-Packard). CA is a multivariate clustering technique that generates unique χ² equations to describe the associations among the studied variables (10)(12). These equations are ranked according to the amount of association they describe, and the projections of the first 2 equations, plotted in 2 dimensions, usually describe >50% of the associations. The dimensions result from the analysis of the active elements (native clinical samples and routine methods) and represent the total variance accounted for within the plot. Values for the inactive elements (PRM and its dilutions) are superimposed on the plots of the active elements so that their positions can be interpreted relative to those of the active elements. If the inactive elements plot near the native clinical samples, the material is considered to have properties similar to the native clinical samples and thus to be commutable.
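The core of classical CA, including the placement of inactive (supplementary) elements, can be sketched with a singular value decomposition. This assumes the table has native clinical samples as rows, methods as columns, and measured concentrations as cells, with PRM and its dilutions entered as supplementary rows; scaling conventions in SAS PROC CORRESP or the Bretaudiere implementation may differ. The function names and the toy table are hypothetical.

```python
import numpy as np


def correspondence_analysis(table, tol=1e-12):
    """Classical CA of a nonnegative 2-way table (active rows x active columns).

    Returns row principal coordinates, column principal coordinates,
    column standard coordinates, and the share of total inertia per dimension.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    expected = np.outer(r, c)
    # Standardized residuals from the independence model (chi-square metric).
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    keep = sv > tol * sv.max()  # drop the trivial zero-inertia dimension(s)
    U, sv, V = U[:, keep], sv[keep], Vt[keep].T
    row_principal = U * sv / np.sqrt(r)[:, None]
    col_standard = V / np.sqrt(c)[:, None]
    col_principal = col_standard * sv
    inertia = sv**2
    return row_principal, col_principal, col_standard, inertia / inertia.sum()


def project_supplementary_rows(rows, col_standard):
    """Position inactive rows (e.g., a reference material) from their row profiles."""
    R = np.atleast_2d(np.asarray(rows, dtype=float))
    profiles = R / R.sum(axis=1, keepdims=True)
    return profiles @ col_standard


# Hypothetical results (mg/L): 6 native samples (rows) x 4 methods (columns).
results = np.array([
    [0.80, 0.85, 0.78, 0.90],
    [1.20, 1.18, 1.25, 1.30],
    [1.90, 2.05, 1.85, 1.95],
    [2.60, 2.55, 2.70, 2.80],
    [3.10, 3.30, 3.05, 3.20],
    [3.80, 3.70, 3.95, 3.85],
])
F, G, gamma, share = correspondence_analysis(results)
prm_xy = project_supplementary_rows([[3.9, 4.1, 3.8, 4.0]], gamma)[:, :2]
print(share[:2], prm_xy)
```

A useful sanity check: projecting an active row as if it were supplementary returns exactly its own principal coordinates.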

We assumed that the values of the first 2 dimensions were bivariate normal and, to delineate areas of commutability objectively, computed 95% tolerance ellipses around the native clinical samples and around the assay methods. The ellipse around the native clinical samples should encompass approximately 95% of their dimension 1 and 2 values, which are generated from the results of the routine methods under consideration. Thus, if the PRM and its dilutions behave like native clinical samples, they should fall inside this ellipse with probability 0.95. Similarly, the ellipse around the routine methods should encompass approximately 95% of the dimension 1 and 2 values associated with those methods. Thus, if a particular routine method behaves like the other methods, it should fall inside this ellipse with probability 0.95. Several iterations of the analysis were performed, removing the analytical methods for which the PRM was suspected to be noncommutable. Analysis of the CA data provides information about the strength of the relationships among the routine methods as well as about how the inactive elements correlate with the primary dimensions.
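Under the bivariate-normal assumption, an ellipse test reduces to comparing a squared Mahalanobis distance with the 95th percentile of χ² with 2 df, which has the closed form −2 ln(0.05) ≈ 5.99. The sketch below treats the sample mean and covariance as if they were the true parameters; a strict tolerance ellipse would use a somewhat larger, sample-size-dependent factor, and the authors' exact construction is not stated. In this study `reference` would be the dimension 1 and 2 coordinates of the native samples (or of the methods), and `points` the PRM projections (or a single method). The function name is hypothetical.

```python
import math

import numpy as np


def inside_normal_ellipse(points, reference, coverage=0.95):
    """Test whether 2-D points fall inside a normal-theory ellipse fitted to reference.

    Squared Mahalanobis distances are compared with the chi-square(2) quantile,
    -2 * ln(1 - coverage). Sample mean and covariance are plugged in as if known.
    """
    ref = np.asarray(reference, dtype=float)
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    center = ref.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(ref, rowvar=False))
    diff = pts - center
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return d2 <= -2.0 * math.log(1.0 - coverage)


# Hypothetical reference cloud: a 3 x 3 grid of coordinates around the origin.
ref = np.array([[x, y] for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)])
print(inside_normal_ellipse([[0.0, 0.0], [10.0, 10.0]], ref))
```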

### harmonization

We used the routine methods for which PRM and its dilutions were commutable to evaluate the ability of PRM to harmonize results. We obtained a regression equation for each run for each method by using the measured results for the PRM and its dilutions as the dependent variable (*y*) and the target values based on dilution factors as the independent variable (*x*). The assigned value of CRM470 is 39.2 mg/L. Thus, the target values for PRM and its 25%, 50%, and 75% dilutions were 3.92, 2.94, 1.96, and 0.98 mg/L, respectively. We then used that regression relationship and the native clinical sample results from the original routine method as *y*-axis values to calculate the harmonized results (*x*-axis values). Finally, we used a nested ANOVA to estimate the total CV (which included within- and among-run analytic and among-assay variances) and among-assay CV for the recalibrated (harmonized) results.
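The target-value arithmetic and the inverse-prediction step can be sketched as follows. The text does not state the form of the regression, so a straight-line ordinary least-squares fit per run is assumed here; `fit_line` and `harmonize` are hypothetical names, and the measured PRM values in the example are invented.

```python
ASSIGNED_CRM470 = 39.2  # mg/L, assigned value of CRM470
PRM_FRACTIONS = (1.0, 0.75, 0.50, 0.25)  # PRM, then its 25%, 50%, 75% dilutions

# PRM is a 1:10 dilution of CRM470; each further dilution scales it down.
targets = [round(ASSIGNED_CRM470 / 10 * f, 2) for f in PRM_FRACTIONS]


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def harmonize(sample_results, prm_measured, prm_targets):
    """Map a method's sample results onto the PRM scale by inverse prediction.

    Measured PRM values are y and target values are x, as in the text, so a
    sample result y is converted to the harmonized value x = (y - a) / b.
    """
    slope, intercept = fit_line(prm_targets, prm_measured)
    return [(y - intercept) / slope for y in sample_results]


# Hypothetical run: the method reads PRM 10% high with a +0.05 mg/L offset.
measured = [1.1 * t + 0.05 for t in targets]
print(targets, harmonize([2.25], measured, targets))
```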

## Results

The participant using method 17 did not perform dilutions of the PRM correctly; consequently the data from this participant could not be evaluated. Method 24 had among-run variances for the 75% and 50% dilutions of the PRM that were 62 and 14 times higher, respectively, than the average variance of the other methods for these same materials. Thus, including method 24 in the among-run imprecision evaluation resulted in such a large average among-run variance that no methods could be excluded, including method 24. Therefore we chose to exclude method 24. No other methods failed the among-run imprecision evaluation after methods 17 and 24 were excluded. We found that 7 methods (2, 3, 6, 20, 21, 22, and 23) had unacceptable within-run imprecision and eliminated them from the analysis. Thus data from 15 methods were analyzed further.

The normalized residuals for PRM obtained with the WLS/PR approach are shown in Table 1. Of the 210 regressions performed, 136 were linear and 74 were quadratic. Normalized residuals for the 25%, 50%, and 75% dilutions of PRM for the WLS/PR approach can be found in the online Data Supplement. For PRM, there were 3 cases for which the normalized residual was ≥2; for the 25%, 50%, and 75% dilutions, there were 7, 16, and 14 cases for which the normalized residual was ≥2, respectively. All of these cases involved method 16. Using the WLS/PR approach, we would conclude that PRM was not commutable with native clinical samples for method 16.

The results of CA are shown in Figs. 1 and 2. In Fig. 1, which shows the projections of the 15 methods, dimensions 1 and 2 account for 56.3% and 27.8%, respectively, of associations between the active elements; the ellipses define the 95% tolerance limits of either specimens or analytical methods. The projections for the PRM and its dilutions are on the edge of the tolerance ellipse formed by the native clinical samples and showed a strong correlation (between 0.840 and 0.849 for PRM and the 3 dilutions) with the plane formed by these 2 dimensions. Method 16 was found to be outside the tolerance ellipse formed by the methods (Fig. 1); Fig. 2 shows the impact of removing method 16 from the analysis. In Fig. 2, dimensions 1 and 2 account for 60.8% and 16.1%, respectively, of the associations among the active elements, and the projections of the PRM and dilutions moved closer to the center of the tolerance ellipse formed by the native clinical sample projections. This result suggests that PRM is not commutable with native clinical samples for method 16. An in-depth analysis of the CA results, with and without method 16, is provided in the online Data Supplement.

The results of our evaluation of the ability of the PRM to harmonize results of the methods for which PRM was commutable with native clinical samples are shown in Figs. 3 and 4, in which we compare the results of the ANOVA of the data before and after harmonization. Fig. 3 shows the total CV (which includes within- and among-run analytic and among-assay variances) for each sample, and Fig. 4 shows only the among-assay CV. The among-assay variation accounts for most of the total variation (92% of total before harmonization and 74% of total after harmonization). Harmonization using PRM generally reduced the total and among-assay CV, although the improvement was modest. We also calculated the difference between the original results and the predicted results that would be obtained after harmonization with PRM (a plot of these is shown in the online Data Supplement). These differences estimate the amount of change necessary for each method to achieve optimum harmonization. The plotted differences indicate that for some assays the results tend to be higher after harmonization and for other assays the results tend to be lower. We also found that 2 of the methods (1 and 5) had differences that appeared extreme compared with those of the other 12 methods. When we omitted these assays from the ANOVA, we found little change in variability between the original and harmonized results (not shown).
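The total and among-assay CVs come from a nested ANOVA for each sample. The sketch below estimates the variance components for a balanced design (assays, runs within assays, replicates within runs; here that would be 14 assays × 3 runs × 2 replicates per sample) by the usual method of moments, truncating negative estimates at zero. It is an illustration of the technique, not the authors' SAS code, and `nested_cvs` is a hypothetical name.

```python
from statistics import mean


def nested_cvs(data):
    """Variance components for a balanced nested design (one sample).

    data[assay][run][replicate] -> result. Returns (total CV %, among-assay CV %).
    Negative variance-component estimates are truncated to zero.
    """
    a, r, n = len(data), len(data[0]), len(data[0][0])
    grand = mean(v for assay in data for run in assay for v in run)
    assay_means = [mean(v for run in assay for v in run) for assay in data]
    run_means = [[mean(run) for run in assay] for assay in data]

    ss_assay = r * n * sum((m - grand) ** 2 for m in assay_means)
    ss_run = n * sum((rm - am) ** 2
                     for am, rms in zip(assay_means, run_means) for rm in rms)
    ss_err = sum((v - rm) ** 2
                 for assay, rms in zip(data, run_means)
                 for run, rm in zip(assay, rms) for v in run)

    ms_assay = ss_assay / (a - 1)
    ms_run = ss_run / (a * (r - 1))
    ms_err = ss_err / (a * r * (n - 1))

    var_err = ms_err                                   # within-run
    var_run = max((ms_run - ms_err) / n, 0.0)          # among-run within assay
    var_assay = max((ms_assay - ms_run) / (r * n), 0.0)  # among-assay
    total = var_err + var_run + var_assay
    return 100 * total ** 0.5 / grand, 100 * var_assay ** 0.5 / grand


# Hypothetical sample: 3 assays reading 1.0, 1.1, 0.9 mg/L with no run or
# replicate scatter, so all variation is among assays.
data = [[[m, m] for _ in range(3)] for m in (1.0, 1.1, 0.9)]
total_cv, among_cv = nested_cvs(data)
print(total_cv, among_cv)
```

The squared ratio of the 2 CVs gives the among-assay share of the total variance, the kind of quantity summarized above as 92% and 74%.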

## Discussion

Evaluation of commutability of a reference material with native clinical samples for a large number of routine methods is a formidable task. A panel of native clinical samples needs to be available with concentrations of the measurand that span the concentrations of the candidate reference materials being investigated. For hsCRP methods, the task was further complicated by the absence of a reference measurement procedure, which meant that results for the native clinical panel and the PRM from each routine method had to be compared with results from all other routine methods. Some investigators have selected one routine method as a comparison method and compared results for the other methods only with that comparison method. That approach is incorrect, however, because it does not evaluate commutability relationships among all methods and can lead to incorrect conclusions about the suitability of a reference material intended to be used as a common calibrator for all the routine methods.

The 2 approaches that we used in this investigation gave the same conclusions regarding commutability, i.e., that PRM was not commutable for method 16.

In the WLS/PR approach, normalized residuals were evaluated against an acceptance criterion of ≤2 (consistent with a 5% statistical significance criterion), indicating equivalent relationships between methods for the native clinical samples and the PRM. WLS/PR accounts for nonconstant variance across the measurement range and does not require linear responses. Using normalized residuals calculated from the WLS/PR, we found that all method combinations had commutable results between PRM and native clinical samples, except for many of the combinations that included method 16. Method 16 was the only method common to all of the method pairs that had normalized residuals ≥2.

CA is not limited by linearity of the examined methods, requires no reference measurement procedure, and is a multivariate technique that evaluates all combinations in a single statistical procedure (11). It is a useful descriptive technique that provides a “snapshot” of all the data in graphical plots. CA can also provide information about the relationships within groups of elements (e.g., the native clinical sample group and the method group). A disadvantage of CA is that it is often difficult to define unambiguous acceptance criteria. CA is best suited for qualitative assessment of conformity of candidate PRMs to the relationships among methods observed for native clinical samples. Fig. 1, which shows the projections for PRM and its dilutions on the edge of the cluster of native clinical results, suggests some lack of commutability despite projections being within the rather large 95% tolerance ellipse. In agreement with the normalized residuals procedure, CA showed method 16 to be different from the projections for the other methods and outside the tolerance ellipse. When results from method 16 were removed from CA, the projections for the PRM and its dilutions moved toward the cluster of native clinical samples but remained near the edge of the cluster, suggesting less than ideal commutability, particularly if the objective is to use the PRM as a common calibrator for routine methods.

Although one native clinical sample was projected just outside the tolerance ellipse formed by the group of native clinical samples (see Fig. 2), we did not eliminate any native clinical samples from the analysis. It is important that commutability evaluations include as many native clinical samples as possible to ensure a robust evaluation of the reference material over the many types of samples encountered in clinical practice. Other than lipid measurements, we do not have additional information on the characteristics of the samples used in this study. To the best of our knowledge, the samples used were collected from ambulatory disease-free individuals. A potential limitation of this study is a lack of samples from diseased individuals (4).

The harmonization study recalculated the results from each method based on the responses of PRM and its dilutions used as common calibrators and showed a modest improvement in among-assay variation for the 14 assays for which PRM was commutable. A modest improvement is an expected outcome because, according to the participating manufacturers, all the methods in this evaluation were traceable to the CRM470 that was diluted to prepare the PRM. The improvement in harmonization obtained by normalizing results to the common PRM, modest as it was, suggests that there may have been differences in traceability protocols and calibrator lot-to-lot specifications among manufacturers.

Concordant results for commutability of PRM based on diluted CRM470 with a panel of native clinical samples were obtained by use of the quantitative normalized residuals procedure and the qualitative CA procedure for 14 of 15 hsCRP methods examined. PRM and its dilutions, when used as common calibrators for these 14 methods, produced an improvement in harmonization among results for the native clinical samples. These results support the use of diluted CRM470 as the basis for calibration traceability for these hsCRP methods. The PRM and its dilutions were not commutable with native clinical samples for one method. Manufacturers of hsCRP methods need to carefully validate commutability of diluted CRM470 with native clinical samples before using the PRM as the basis for calibration traceability.

Unfortunately, poor precision among replicate results for the native clinical samples necessitated elimination of 8 methods from the commutability evaluation. Consequently, no information is available for assessment of the suitability of PRM for calibration of these methods or of whether their results would have changed the commutability conclusions for the other methods. However, the suitability of these 8 methods for clinical applications may be compromised by the poor precision.

## Acknowledgments

**Author Contributions:** *All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article*.

**Authors’ Disclosures of Potential Conflicts of Interest:** *Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest:*

**Employment or Leadership:** G. Miller, AACC and CLSI.

**Consultant or Advisory Role:** F.A. Dati, Myconostica, Manchester, UK, and General Biologicals, Taiwan.

**Stock Ownership:** None declared.

**Honoraria:** None declared.

**Research Funding:** None declared.

**Expert Testimony:** None declared.

**Other Remuneration**: F.A. Dati, DiaSys, Shanghai, China.

**Role of Sponsor:** The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, preparation or approval of manuscript.

**Acknowledgments:** The authors thank the following people and companies who participated in this study: Chester Swirski (Bayer Diagnostics, Tarrytown, NY), Daniel B. Seymour (Beckman Coulter, Brea, CA), Josep Serra (Biokit SA, Barcelona, Spain), Neal Bellet (Cholestech, Hayward, CA), Lone Juhl (DakoCytomation, Glostrup, Denmark), Manfred Lammers (Dade Behring Marburg, Marburg, Germany), Kazunori Saito (Daiichi Pure Chemicals, Ibaraki, Japan), Yousuke Meguro (Denka Seiken, Niigata, Japan), Erwin Metzmann (DiaSys Diagnostic Systems, Holzheim, Germany), Hideyuki Hayashi (Eiken Chemical, Tochigi, Japan), Katherine Ekholm (Genzyme Diagnostics, Cambridge, MA), Victor Chiou (Good Biotech, Taichung City, Taiwan, R.O.C.), Etsuko Sato (Iatron Laboratories, Chiba, Japan), Virpi Leppänen (Innotrac Diagnostics Oy, Turku, Finland), Ikuo Terunuma (Nissui Pharmaceutical, Ibaraki, Japan), Ryo Kojima (Nitto Boseki, Koriyama, Japan), Matthew D. McCusker (Olympus Diagnostica Ireland, County Clare, Ireland), Anja Taulaniemi (Orion Diagnostica, Espoo, Finland), Ron Jamison (Pointe Scientific, Canton, MI), Günter Trefz (Roche Diagnostics, Penzberg, Germany), and David Li (Wako Diagnostics, Richmond, VA).

## Footnotes

1 Normalized residuals ≥2 are in bold.

1 Nonstandard abbreviations: CRM, certified reference material; hsCRP, high-sensitivity C-reactive protein; PRM, proposed reference material; WLS/PR, weighted least squares polynomial regression; CA, correspondence analysis.

- © 2009 The American Association for Clinical Chemistry