## Abstract

**BACKGROUND:** Measurement standardization of the catalytic concentration of α-amylase in serum is based on 3 pillars: the primary reference measurement procedure (PRMP), reference laboratories, and suitable certified reference materials (CRMs). Commutability is a prerequisite when using a CRM for calibration and trueness control of routine methods or for value transfer from the PRMP to end-user calibrators of routine methods through a calibration hierarchy.

**METHODS:** We performed a commutability study with 30 serum pools and 5 candidate reference materials (RMs) for pancreatic α-amylase using an automated version of the PRMP and 5 different routine methods. Four candidate RMs had an artificial matrix, each with a different composition, and 1 candidate RM was based on human serum. Data were analyzed according to a linear regression analysis with prediction interval as described in the Clinical and Laboratory Standards Institute guideline EP30-A and a difference in bias analysis as described in the recommendations of the IFCC Working Group on Commutability.

**RESULTS:** The commutability profile of the 4 candidate RMs with an artificial matrix was variable. Only 1 candidate RM, with human serum albumin in the matrix, showed a good profile like that of the candidate RM based on serum. The comparison of both commutability assessment approaches indicated some differences because of inconclusive results for the difference in bias approach, suggesting a large uncertainty on the commutability assessment.

**CONCLUSIONS:** A CRM for pancreatic amylase in an artificial matrix can be commutable for routine methods using the same substrate as the PRMP, but the matrix composition is crucial.

The enzyme α-amylase catalyzes the hydrolysis of large α-linked polysaccharides, such as starch and glycogen, yielding glucose and maltose. The enzyme is mainly secreted by the exocrine pancreas and the salivary glands, with each producing a specific isoform. The catalytic concentration of (pancreatic) α-amylase in serum is frequently measured in medical laboratories for detection and diagnosis of acute pancreatitis (1) and pancreatic trauma (2). To contribute to a clinical decision, the results obtained should be accurate and equivalent over time across laboratories and measurement procedures. However, external quality assessment (EQA)^{10} schemes and international studies have shown a wide interlaboratory variation for the catalytic concentration of α-amylase in serum samples (3, 4).

Accurate and equivalent results independent of reagent kits, instruments, or location and time of testing can be achieved by measurement standardization based on the concept of metrological traceability, which requires an established calibration hierarchy beginning with the definition of the measurand and ending with the measurement results on patient samples. A calibration hierarchy is a sequential set of measurement procedures and fit-for-purpose reference materials (RMs) that enables value transfer from the highest order reference to the patient sample. In vitro diagnostic (IVD) manufacturers should use the calibration hierarchy for the value assignment of end-user calibrators used by medical laboratories to calibrate routine methods (5). Commutability, defined as “the equivalence of the mathematical relationships among the results of different measurement procedures for an RM and for representative samples of the type intended to be measured” (6), is an essential requirement for a fit-for-purpose RM.

Results of catalytic activity measurements depend entirely on the experimental conditions (7, 8). The primary reference measurement procedure (PRMP), which consists of a well-described and extensively evaluated standard operating procedure, therefore, occupies the highest position in the traceability chain (9). The PRMP for the catalytic concentration of α-amylase in serum listed by the Joint Committee for Traceability in Laboratory Medicine is based on 2 enzymatic reactions and spectrophotometry (10). The PRMP is mainly applied by specialized reference laboratories to assign quantitative traceable values to panels of native human sera used by the IVD manufacturers to calibrate their routine measurement systems. To perform the PRMP in a highly controlled manner, reference laboratories use the certified reference material (CRM) IRMM/IFCC-456 as a trueness control. IRMM/IFCC-456, consisting of human purified pancreatic amylase in an artificial matrix, was certified by the European Commission's Joint Research Centre (EC-JRC) in 2001. The CRM cannot be used in the calibration hierarchy of routine measurement systems because its commutability was not extensively studied (11).

In the coming years, the current IRMM/IFCC-456 batch will become exhausted, and a replacement is needed. The new CRM for pancreatic α-amylase should be a well-characterized material without significant contamination from salivary α-amylase so that it is useful both for methods measuring total α-amylase and for methods measuring only pancreatic α-amylase. The natural presence of salivary α-amylase in human serum underscores the need for an artificial matrix. A proof of commutability of this new CRM may allow an extension of the intended use to calibration and trueness control of the relevant routine measurement systems.

The first aim of this study was to investigate whether a CRM with an artificial matrix could be commutable for routine methods. Four candidate RMs in an artificial matrix, each with a different composition, were selected, and 1 candidate RM based on human serum was added as a control. The second aim was to compare the outcome of 2 commutability assessment approaches: 1 according to a linear regression analysis with prediction interval as described in the Clinical and Laboratory Standards Institute (CLSI) guideline EP30-A (6) and 1 according to a difference in bias analysis as described in the recommendations of the IFCC Working Group (WG) on Commutability (12).

## Materials and Methods

### SERUM POOLS

The 30 serum pools consisted of serum originating from multiple donors (approximately 10 per pool) with similar catalytic concentrations of α-amylase. Turbid, icteric, and hemolyzed donor samples were excluded. Exposure to freeze–thaw cycles was limited: 1 cycle after serum collection and 1 cycle after pooling of the sera. The catalytic concentration of α-amylase in the serum pools ranged from 54 to 727 U/L as measured with an automated version of the PRMP.

Frozen serum pools have been used in EQA schemes for the catalytic concentration of α-amylase (3) and shown to behave like single-donation sera for several routine methods (13). Therefore, a separate commutability assessment comparing the serum pools of this study with single-donation sera was not performed.

### CANDIDATE RM

Five candidate RMs for pancreatic α-amylase were tested, and their specifications are provided in Table 1. Various factors potentially affect the commutability of the selected RMs, such as the composition and pH of the matrix, the source of the enzyme (purified vs recombinant), and processing procedures resulting in physical changes (liquid frozen vs lyophilization) (9).

The candidate RMs A, B, D, and E were provided at catalytic concentrations of 250 to 350 U/L. The lyophilized materials (A, B, and E) were reconstituted by the study organizer (EC-JRC), and the weight of the added ultrapure water was recorded. The vials were stored at −70 °C directly after reconstitution. Material C was originally provided as a highly concentrated solution (1200 U/mL) and diluted by the study organizer to obtain a concentration within the same range as the other candidate RMs. The dilution buffer was prepared according to the manufacturer's recommendations, and the diluted solution was distributed into glass vials and stored at −70 °C.

### METHODS

The serum pools and the candidate RMs were measured with 5 methods frequently used in routine clinical laboratories (Table 2). These routine methods measure the catalytic concentration of total α-amylase (both pancreatic and salivary), and their results are traceable to the PRMP according to the information provided by their manufacturer. Four of the tested methods use the same substrate and enzymatic reactions as the PRMP, whereas the method from Abbott uses a different substrate. Each routine method was applied by 1 participating laboratory.

A reference laboratory with accreditation according to ISO/IEC 17025 and ISO 15195 performed the reference measurement procedure (RMP) on the serum pools and the candidate RMs. The RMP was an automated version of the manual PRMP (10, 14) with identical measurement principle and conditions. A more detailed description is provided in the Methods section that accompanies the online version of this article at http://www.clinchem.org/content/vol64/issue8. This automated RMP has a higher throughput allowing all measurements in a single run. Calibration of the RMP was performed with 5 calibrators consisting of human serum pools. Their catalytic activity concentrations have been certified with the manual PRMP. The 3 calibrators with the nearest concentration to each measurement result were used for the calculation of the final value.

### EXPERIMENTAL DESIGN

All study materials were shipped with dry ice to the laboratories and stored at −80 °C on arrival. Study materials were thawed and carefully mixed at room temperature just before the measurements. Each laboratory measured the 30 serum pools in 3 replicates and the 5 candidate RMs in 6 replicates (*p* = 6 in Eq. 1). All measurements were performed in a single run according to a predefined order. The measurement series consisted of 3 parts. The first part started with the first replicate measurement of each candidate RM, followed by the first replicate of each serum pool, and ended with the second replicate of each candidate RM. In the second part, the measurements were repeated in the reverse order, and in the third part, the order was the same as in the first part.
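The three-part measurement order can be sketched as follows. This is an illustration only: the sample labels are hypothetical, and "reverse order" is assumed here to mean the exact reversal of the sequence used in the first part.

```python
def run_order(n_rm=5, n_pools=30):
    """Return the sample sequence for one run (labels are hypothetical)."""
    rms = [f"RM-{chr(ord('A') + k)}" for k in range(n_rm)]
    pools = [f"SP-{i:02d}" for i in range(1, n_pools + 1)]
    # Part 1: each candidate RM, then each serum pool, then each candidate RM again.
    part = rms + pools + rms
    # Part 2 is assumed to be part 1 reversed; part 3 repeats part 1.
    return part + part[::-1] + part


order = run_order()
```

Under these assumptions each candidate RM appears 6 times and each serum pool 3 times, which matches the replicate numbers stated above.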

### DATA ANALYSIS

All laboratories reported their results on standardized reporting sheets, and the data analysis was performed by the study organizer. The measurement results obtained for the candidate RMs A, B, and E were corrected for the weight of the water added during reconstitution. Statistical analyses were performed with Microsoft Excel and the statistical add-in software Analyse-it^{®}, version 4.65 (Analyse-it Software).

### METHOD COMPARISON

Each routine method was compared with the RMP based on the results of the serum pools using the Passing–Bablok regression analysis, and the Spearman rank correlation coefficient was also calculated.
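For readers unfamiliar with Passing–Bablok regression, a minimal pure-Python sketch is given here. It is not the Analyse-it implementation used in the study: it estimates only the slope and intercept (no confidence intervals), and the function name is hypothetical.

```python
from statistics import median


def passing_bablok(x, y):
    """Slope and intercept from the shifted median of pairwise slopes."""
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points give no slope
            if dx == 0:
                slopes.append(float("inf") if dy > 0 else float("-inf"))
                continue
            s = dy / dx
            if s != -1:  # slopes of exactly -1 are excluded
                slopes.append(s)
    slopes.sort()
    n_slopes = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset that shifts the median
    if n_slopes % 2:
        slope = slopes[(n_slopes - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[n_slopes // 2 - 1 + k] + slopes[n_slopes // 2 + k])
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept
```

Here `x` would hold the RMP results and `y` the routine-method results for the serum pools. The sketch assumes positively correlated data, as is the case for a method comparison.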

### EVALUATION OF MEASUREMENT PRECISION

For each method, both the SD among the concentrations measured for the replicates of each serum pool and candidate RM, and the SD among the ln-transformed concentrations, were calculated. Both SD types were plotted vs the mean concentration measured for the relevant sample (see Fig. 1 in the online Data Supplement). These precision plots were used to evaluate trends across the tested concentration range and to identify outliers.
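The two SD types can be illustrated with a short sketch. The replicate values below are hypothetical and were chosen so that both samples have the same relative spread.

```python
import math
from statistics import mean, stdev


def precision_summary(replicates):
    """Mean, SD of the concentrations, and SD of the ln-transformed concentrations."""
    logs = [math.log(v) for v in replicates]
    return {
        "mean": mean(replicates),
        "sd": stdev(replicates),
        "sd_ln": stdev(logs),
    }


low = precision_summary([98.0, 100.0, 102.0])     # hypothetical replicates, U/L
high = precision_summary([490.0, 500.0, 510.0])   # same relative spread, 5x higher
```

When the SD is proportional to the concentration, `sd` grows with the mean whereas `sd_ln` stays constant, which is why the ln scale gives flatter precision plots.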

### COMMUTABILITY

The commutability assessment was done according to 2 different approaches:

Linear regression analysis with prediction interval as described in the CLSI document EP30-A (6), referred to as the CLSI approach.

Difference in bias analysis as described in the recommendations of the IFCC WG on Commutability (12), referred to as the IFCC approach.

The analyses described below were performed for each comparison of a routine method with the RMP separately.

For the assessment according to the CLSI approach, the ln-transformed mean concentration of each serum pool ($SP_i$) measured with the routine method, $\ln(\overline{conc}_{SP_i,rout})$, was plotted vs the ln-transformed mean concentration obtained with the RMP, $\ln(\overline{conc}_{SP_i,RMP})$. The nonweighted Deming regression procedure was used to fit the regression line. The 95% prediction interval around this regression line was calculated using formulas described in CLSI EP30-A Appendix C (6). The ln-transformed mean concentrations of the individual candidate RM, $\ln(\overline{conc}_{RM_j,rout})$ and $\ln(\overline{conc}_{RM_j,RMP})$, were also plotted, and candidate RM$_j$ was considered commutable if its data point was inside the 95% prediction interval.
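The line-fitting step of the CLSI approach can be sketched as follows. This is a minimal illustration that assumes an error-variance ratio of 1 between the two methods (the parameter `lam` is hypothetical). It does not reproduce the prediction-interval formulas of CLSI EP30-A Appendix C, which should be taken from the guideline itself.

```python
import math
from statistics import mean


def deming_unweighted(x, y, lam=1.0):
    """Unweighted Deming fit; lam is the assumed ratio of y- to x-error variances."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    diff = syy - lam * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * lam * sxy ** 2)) / (2.0 * sxy)
    intercept = my - slope * mx
    return slope, intercept
```

In this setting `x` would be the ln-transformed mean RMP results of the serum pools and `y` the ln-transformed mean results of the routine method.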

For the commutability assessment according to the IFCC approach, 2 bias plots were designed:

- The bias of each serum pool ($B_{\overline{conc}_{SP_i}}$), calculated as the difference between the mean concentration obtained with the routine method and the mean concentration obtained with the RMP (i.e., $\overline{conc}_{SP_i,rout} - \overline{conc}_{SP_i,RMP}$), was plotted against $\overline{conc}_{SP_i,RMP}$.
- The bias of each serum pool ($B_{\ln(\overline{conc}_{SP_i})}$), calculated as the difference between the ln-transformed mean concentration obtained with the routine method and the ln-transformed mean concentration obtained with the RMP [i.e., $\ln(\overline{conc}_{SP_i,rout}) - \ln(\overline{conc}_{SP_i,RMP})$], was plotted against $\overline{conc}_{SP_i,RMP}$.

Comparison of both bias plots showed that the scatter width had the smallest dependence on the concentration for the bias calculated from the ln-transformed concentrations (see Fig. 2 in the online Data Supplement). Therefore, the commutability assessment was based on the ln-transformed concentrations. The calculations used in this study were slightly different from the calculations described previously (12) because estimation of position effects (meaning an effect of the measurement order on the measurement result) was not possible with this study design.

From inspection of the bias plots, we concluded that the mean bias of all serum pools, $\overline{B}_{\ln(\overline{conc}_{SP})}$, could be used as an estimate of the bias for the clinical samples in general. The associated uncertainty, $u(\overline{B}_{\ln(\overline{conc}_{SP})})$, was calculated as the SD of the $B_{\ln(\overline{conc}_{SP_i})}$ values divided by the square root of the number of serum pools (n = 30).

The bias of candidate RM$_j$, $B_{\ln(\overline{conc}_{RM_j})}$, was estimated as $\ln(\overline{conc}_{RM_j,rout}) - \ln(\overline{conc}_{RM_j,RMP})$. For estimating the associated uncertainty, $u(B_{\ln(\overline{conc}_{RM})})$, the SDs between the replicate measurements of the 5 candidate RMs were pooled by calculating the mean variance for the routine method, $\overline{SD}^2(\ln(conc_{RM,rout}))$, and for the RMP, $\overline{SD}^2(\ln(conc_{RM,RMP}))$. Pooling the SDs of the candidate RMs assumed that these SDs were equal, which was supported by visual inspection of the precision plots. Eq. 1 was used to calculate $u(B_{\ln(\overline{conc}_{RM})})$ with $p$ as the number of replicate measurements of each candidate RM.

$$u(B_{\ln(\overline{conc}_{RM})}) = \sqrt{\frac{\overline{SD}^2(\ln(conc_{RM,rout})) + \overline{SD}^2(\ln(conc_{RM,RMP}))}{p}} \tag{1}$$

The difference in bias, $D_{RM_j}$, between candidate RM$_j$ and the serum pools was estimated with Eq. 2.

$$D_{RM_j} = B_{\ln(\overline{conc}_{RM_j})} - \overline{B}_{\ln(\overline{conc}_{SP})} \tag{2}$$

The associated expanded uncertainty $U(D_{RM})$ was calculated according to Eq. 3.

$$U(D_{RM}) = 1.9 \sqrt{u^2(B_{\ln(\overline{conc}_{RM})}) + u^2(\overline{B}_{\ln(\overline{conc}_{SP})})} \tag{3}$$

The coverage factor 1.9 was used to obtain at least 90% coverage. To assess the commutability of an individual candidate RM, $D_{RM_j}$ and $U(D_{RM})$ were compared with the commutability criterion $C$. The criterion was set at 3.7% (corresponding to 0.037 on the ln scale) based on the intended use of the new CRM. The rationale for setting this criterion is given in the Discussion.

In the commutability assessment according to the IFCC approach, 3 outcomes are possible (12):

- The uncertainty interval $D_{RM_j} \pm U(D_{RM})$ falls completely within $0 \pm C$ → RM$_j$ is commutable.
- The uncertainty interval $D_{RM_j} \pm U(D_{RM})$ falls completely outside $0 \pm C$ → RM$_j$ is noncommutable.
- The uncertainty interval $D_{RM_j} \pm U(D_{RM})$ partially overlaps with $0 \pm C$ → inconclusive result.
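The decision rule of the IFCC approach can be sketched as follows. The sketch rests on assumptions about the form of the three equations: Eq. 1 is taken as the sum of the two pooled variances divided by *p*, Eq. 2 as the candidate RM bias minus the mean serum pool bias, and Eq. 3 as 1.9 times the two standard uncertainties combined in quadrature. Function and argument names are hypothetical, and position effects are not modeled.

```python
import math
from statistics import mean, stdev


def difference_in_bias(pool_bias_ln, rm_bias_ln, var_rm_rout, var_rm_rmp,
                       p=6, k=1.9, criterion=0.037):
    """Return (D, U, verdict) for one candidate RM and one method comparison."""
    n = len(pool_bias_ln)
    mean_pool_bias = mean(pool_bias_ln)
    u_pool = stdev(pool_bias_ln) / math.sqrt(n)         # uncertainty of the mean pool bias
    u_rm = math.sqrt((var_rm_rout + var_rm_rmp) / p)    # assumed form of Eq. 1
    d = rm_bias_ln - mean_pool_bias                     # assumed form of Eq. 2
    big_u = k * math.sqrt(u_rm ** 2 + u_pool ** 2)      # assumed form of Eq. 3
    lower, upper = d - big_u, d + big_u
    if -criterion <= lower and upper <= criterion:
        verdict = "commutable"
    elif upper < -criterion or lower > criterion:
        verdict = "noncommutable"
    else:
        verdict = "inconclusive"
    return d, big_u, verdict
```

All inputs are on the ln scale: `pool_bias_ln` holds the per-pool biases, `rm_bias_ln` the bias of one candidate RM, and the two variance arguments the pooled replicate variances of the candidate RMs for the routine method and the RMP.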

The recommendations from the IFCC WG on Commutability also include a statistical analysis to detect specific differences in the intermethod behavior of the serum pools. A description of the analysis and the results are provided in the Material file included with the online Data Supplement. However, it should be considered that these results cannot be used to estimate the sample-specific effects present in real clinical samples because serum pooling reduces the influence of individual sera with sample-specific effects (15).

## Results

### METHOD COMPARISON

The measurement results from the routine methods showed a good linear correlation with the results from the RMP for the serum pools with Spearman rank correlation coefficients ≥0.995. The slope of the Passing–Bablok regression lines varied from 0.993 to 1.067, and the intercept ranged from −3.211 to −1.018 (see Fig. 3 in the online Data Supplement). The precision and bias plots (see Figs. 1 and 2 in the online Data Supplement) showed that all routine methods fulfilled the desirable goal for the imprecision (i.e., 4.4%) and bias (i.e., 7.4%) of catalytic concentration measurements of α-amylase in serum as set in the Biologic Variation database (16).

### MEASUREMENT PRECISION

The precision plots based on the measured concentration values (see Fig. 1A in the online Data Supplement) indicated that the SD was approximately proportional to the mean concentration value for the RMP and the routine methods from Roche and Siemens. The plots based on the ln-transformed concentrations (see Fig. 1B in the online Data Supplement) showed more stable SDs across the tested concentration range.

Visual inspection of the precision plots identified some serum pools with a higher SD compared with others. Technical errors could not explain these outliers, and in all cases, the relative SD was <4%. Their impact on the further analyses was expected to be small, and these measurement results were retained.

For the Beckman method, the SD of the 5 candidate RMs was clearly higher than that of the serum pools. Detailed evaluation showed that for all candidate RMs, the results of the first 2 replicate measurements were about 5% lower than the results of the last 4 replicate measurements. This trend was not observed in the serum pools and could not be explained by an analytical order effect or any other technical reason. Therefore, all measurement results were retained.

### COMMUTABILITY ASSESSMENT ACCORDING TO THE CLSI APPROACH

The results of the Deming regression analysis on the ln-transformed concentrations of serum pools and the comparison of the ln-transformed concentrations of the candidate RMs with the 95% prediction interval are shown in Fig. 1. Table 3 summarizes the results per candidate RM and per routine method. None of the candidate RMs was commutable for the comparison of the Abbott method vs the RMP. Candidate RM A was commutable for the 4 remaining comparisons of routine methods vs the RMP, and RM E was commutable for 3 comparisons. The candidate RMs B, C, and D showed a considerable lack of commutability.

### COMMUTABILITY ASSESSMENT ACCORDING TO THE IFCC APPROACH

The commutability assessment according to the IFCC approach was based on bias plots of the ln-transformed concentrations of the serum pools and the candidate RMs (Fig. 2). The commutability criterion was set at 3.7%. The individual results per candidate RM and each routine method are summarized in Table 3. The outcome of the commutability assessment was conclusive (commutable or noncommutable) in 17 of the 25 combinations of a routine method with a candidate RM, and these conclusive results agreed with the outcome of the commutability assessment according to the CLSI approach.

For 8 combinations of a routine method with a candidate RM, the commutability assessment according to the IFCC approach gave an inconclusive result. Specifically, for the Beckman method, no conclusions could be drawn about the commutability of 4 candidate RMs. The expanded uncertainty associated with the estimated difference in bias was too large.

## Discussion

This study assessed the commutability of 5 candidate RMs for pancreatic amylase with 2 different statistical approaches. The linear regression analysis with prediction interval as described in CLSI EP30-A (6) has been the gold standard in commutability assessment for several years despite having several drawbacks. A recent publication from the IFCC WG on Commutability (12) recommends a difference in bias analysis to overcome these drawbacks. First, the CLSI approach assumes equal numbers of replicate measurements and comparable variances among the replicates for the clinical samples and the RM. If this requirement is not fulfilled, the outcome of the CLSI approach might be incorrect. The IFCC approach includes the number of replicate measurements and the variance among the replicates of the RM in the uncertainty of the commutability assessment, and the outcome will be inconclusive if the uncertainty becomes too large (as illustrated here in the commutability assessments for the Beckman method). Second, the CLSI approach does not quantify how closely the RM agrees with the estimated relationship of the clinical samples. An RM with a data point close to the prediction interval limits will be considered as commutable as an RM with a data point in the middle of the prediction interval. The IFCC approach quantifies the closeness of agreement and the associated uncertainty. An RM with a difference in bias close to the commutability criterion will have an inconclusive commutability outcome (as observed here for commutability assessments of candidate RMs B and D for the methods from Roche and Biosystems). Third, the width of the prediction interval of the CLSI approach is influenced by the presence of large sample-specific effects, and the commutability criterion might vary between different method comparisons. The IFCC approach uses the same commutability criterion based on clinical application requirements for each comparison. In this study, the prediction intervals of the different method comparisons are comparable, but this might be a consequence of the use of serum pools, which may mask large sample-specific effects.

The commutability criterion used in the IFCC approach should depend on clinical application requirements and the intended use of the RM. For an RM intended to be used as a trueness control or an external quality control for routine methods, the criterion should be a fraction of the acceptable bias of a routine method (15). The Biologic Variation database sets the desirable goal for the bias of the catalytic concentration of α-amylase in serum at 7.4%, based on the intrasubject and intersubject biologic variation (16). In this study, the commutability criterion was set at 3.7% (i.e., 50% of the desirable goal for the bias), considering the good precision of the routine methods and the fact that the certified value of the new CRM would be assigned by the PRMP with a low associated uncertainty. A CRM intended to be used in the calibration hierarchy of routine measurement systems would require a stricter commutability criterion (15) and smaller uncertainties on the commutability assessments than obtained in this study. A new commutability study with more replicate measurements for the candidate RMs might solve this problem because the uncertainty associated with the bias of the candidate RMs, $u(B_{\ln(\overline{conc}_{RM})})$, is the main contributor in this study.
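As a worked check of the criterion, the arithmetic can be written out as follows. The first line restates the 50% rule from the text. The second and third lines are our own calculation and show why 0.037 is a close approximation of a ±3.7% relative difference on the ln scale.

```latex
\begin{align*}
C &= 0.5 \times 7.4\% = 3.7\% \\
\ln(1 + 0.037) &\approx 0.0363 \\
\ln(1 - 0.037) &\approx -0.0377
\end{align*}
```

Both values round to a magnitude of about 0.037, so for differences of this size the ln-scale bias can be read directly as a relative bias.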

None of the tested candidate RMs were commutable for the comparison of the RMP with the Abbott method. The Abbott method uses a different substrate and, therefore, has a different measurand than the PRMP (8). A change in measurand throughout a calibration hierarchy frequently occurs within the field of IVDs (17) and is acceptable if the relationship between the different measurands is understood and stable at all levels of the hierarchy. The traceability of the Abbott method to the PRMP was established by determining a calibration factor from the analysis of serum samples by many different laboratories (18). This study showed a good correlation between the results of both methods for the serum pools. However, a more detailed analysis indicated the presence of small, but significant, serum pool-specific effects (see Material file in the online Data Supplement). The sample-specific effects present in real clinical samples are likely to be larger because pooling reduces the influence of individual sera with sample-specific effects (15). A previous commutability study comparing routine methods with different substrates showed sample-specific effects assumed to be caused by variations in pancreatic/salivary α-amylase ratio or by unknown interferents (19). This difference in measurand between the Abbott method and the PRMP might make it impossible to find a commutable CRM containing only pancreatic α-amylase in an artificial matrix.

A review of the results for only the routine methods with the same substrate as the PRMP shows that 1 of the candidate RMs with an artificial matrix has a better commutability profile than the other RMs, and this profile is like that of the candidate RM based on serum. The matrix composition is seemingly the major factor determining the commutability of the candidate RMs. The most apparent difference between the commutable and the noncommutable candidate RMs with an artificial matrix is the presence of human serum albumin. The pH of the buffer (7.4 vs 7.5) might also play a role. The effect of lyophilization is probably negligible, as both the commutable candidate RM in an artificial matrix and the candidate RM in serum were lyophilized during their processing. No conclusions can be drawn about the importance of the origin of the α-amylase (recombinant vs purified) because none of the tested candidate RMs contained recombinant α-amylase in a commutable matrix.

The design of this commutability study was influenced by several practical constraints leading to the following limitations. First, frozen serum pools were used instead of fresh single donation sera to obtain sufficiently large volumes that could be aliquoted and distributed to the 6 participating laboratories. During the preparation of the pools, the sera were exposed to 2 freeze–thaw cycles. The frozen serum pools used in this study were not tested for their commutability, but frozen serum pools have been used in several EQA schemes, and it has been shown by others that these pools behave like single-donation sera for several routine methods (3, 13). Second, the lyophilized candidate RMs were reconstituted centrally to minimize reconstitution errors. The candidate RMs were frozen at −70 °C directly after reconstitution, and the effect of this 1 freeze–thaw cycle on their commutability was assumed to be negligible. Third, all measurements of 1 method were performed in a single run under specific measuring conditions, meaning 1 lot of reagents, 1 instrument, and 1 analyst. The routine methods were performed according to the manufacturer's specifications, and the RMP measurements were performed by an expert reference laboratory in a highly controlled manner. These measuring conditions were assumed to be representative of those encountered in routine medical and reference laboratories; therefore, the commutability conclusions of the candidate RMs could be generalized to future results obtained with the same methods. Substantial changes in the measuring conditions (e.g., changes of the reagent formulation) might, however, make these conclusions invalid (15).

This study shows that a CRM for pancreatic α-amylase in an artificial matrix can be commutable for routine methods with the same substrate as the PRMP, but the matrix composition plays a crucial role.

## Footnotes

^{10} Nonstandard abbreviations:

- EQA, external quality assessment;
- RM, reference material;
- IVD, in vitro diagnostic;
- PRMP, primary reference measurement procedure;
- CRM, certified reference material;
- EC-JRC, European Commission's Joint Research Centre;
- CLSI, Clinical and Laboratory Standards Institute;
- WG, Working Group;
- RMP, reference measurement procedure.

**Author Contributions:** *All authors confirmed they have contributed to the intellectual content of this paper and have met the following 4 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; (c) final approval of the published article; and (d) agreement to be accountable for all aspects of the article thus ensuring that questions related to the accuracy or integrity of any part of the article are appropriately investigated and resolved.* B. Toussaint, administrative support, provision of study material or patients; M. Orth, provision of study material or patients; G. Nilsson, statistical analysis; F. Ceriotti, provision of study material or patients.

**Authors' Disclosures or Potential Conflicts of Interest:** *No authors declared any potential conflicts of interest.* **Role of Sponsor:** No sponsor was declared.

- Received for publication March 13, 2018.
- Accepted for publication May 23, 2018.

- © 2018 American Association for Clinical Chemistry
