## Abstract

A quality-control chart based on exponentially weighted moving averages (EWMA) has, in the past few years, become a popular tool for controlling inaccuracy in industrial quality control. In this paper, I explain the principles of this technique, present some numerical examples, and by computer simulation compare EWMA with other control charts currently used in clinical chemistry. The EWMA chart offers a flexible instrument for visualizing imprecision and inaccuracy and is a good alternative to other charts for detecting inaccuracy, especially where small shifts are of interest. Detection of imprecision with EWMA charts, however, requires special modification.

- indexing terms: statistics
- exponentially weighted moving average
- Shewhart chart, Westgard algorithm compared

A control chart based on the exponentially weighted moving average (EWMA) was first described by Roberts in 1959 (1).^{1} Whereas the Shewhart chart takes only the most recent control result into consideration for statistical testing, the EWMA chart also uses the previous values. In brief, the current measurement is multiplied by a weighting factor *w* and added to the previous value of the statistic, which summarizes all former measurements and is weighted with (1 − *w*). Thus, at each time *t* (*t* = 1, 2, …), the test statistic

$$z_t = w\,\bar{x}_t + (1 - w)\,z_{t-1}, \qquad w \in\, ]0;1],$$

can be obtained.^{2} The computed *z*_{t} values are displayed on a control chart over the course of time. Because the mean of the *n* control observations per run is used, this control chart is called the EWMA-*¯x* chart. Another way of expressing this is:

$$z_t = w \sum_{i=0}^{t-1} (1 - w)^{i}\,\bar{x}_{t-i} + (1 - w)^{t} z_0,$$

with the first value *z*_{0} in this sum generally being set to the mean of former observations. This smoothing process means that the contribution of a value to the test statistic decays exponentially with time or with the number of new observations, with the speed of decay being adjustable by the weighting factor.
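The recursion can be sketched in a few lines of Python. This is only an illustration of the definition above, not the simulation program used for this article; the function names `ewma` and `ewma_closed_form` are hypothetical.

```python
def ewma(values, w, z0):
    """Return the EWMA statistics z_1..z_t for a sequence of run means.

    values: run means (x-bar) in time order
    w:      weighting factor, 0 < w <= 1
    z0:     starting value, e.g. the mean of former observations
    """
    if not 0.0 < w <= 1.0:
        raise ValueError("w must lie in ]0;1]")
    z = z0
    out = []
    for x in values:
        # current value weighted with w, previous statistic with (1 - w)
        z = w * x + (1.0 - w) * z
        out.append(z)
    return out


def ewma_closed_form(values, w, z0):
    """Compute z_t for the last time point from the weighted-sum form."""
    t = len(values)
    weighted = sum(w * (1.0 - w) ** i * values[t - 1 - i] for i in range(t))
    return weighted + (1.0 - w) ** t * z0


if __name__ == "__main__":
    data = [101.0, 99.0, 104.0, 98.0]  # made-up control values
    print(ewma(data, w=0.2, z0=100.0))
    print(ewma_closed_form(data, w=0.2, z0=100.0))
```

With *w* = 1 the function simply returns the input values, which corresponds to plotting the raw results as in a Shewhart chart.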

The limits for warning and action of the EWMA chart differ from those of a Shewhart chart and have to be computed separately, as shown later. The EWMA control chart differs from the similar Cusum chart by using the additional weighting factor, which allows the adjustment of shift sensitivity. (Setting the EWMA weighting factor *w* = 1 yields a Shewhart control chart.) Because of this flexibility, the EWMA chart has drawn increasing attention in industrial quality-control practice during the past few years, as shown by the number of publications in the *Journal of Quality Technology* since 1989.

Exponential smoothing was first proposed for use in clinical chemistry as early as 1975 (2). Cembrowski et al. introduced a trend detection method using Trigg’s technique, which is based on the exponentially smoothed forecast error of EWMA predictors. However, this method has never played an important role in laboratories—like many other theoretically convincing concepts such as the combined Shewhart–Cusum chart (3). At the time of introduction, implementation problems, e.g., lack of the necessary computing power, may have contributed to the low attention given these concepts. Although the situation has changed, given the present almost-ubiquitous use of computer systems in laboratories, the mathematical prerequisites for the implementation of such charts still remain more complex than for Shewhart charts. Recent developments, however, have facilitated the use of the EWMA chart: In 1989, Crowder (4), using computer computations, established the following four-step procedure for implementing the EWMA chart:

Step 1. Select the run length (RL), which should be obtained under in-control conditions. This selection can easily be derived from existing charts, as shown below for the Shewhart charts. Also, many corresponding RL values for common limits in clinical chemistry are listed later in Table 2.

Step 2. Select the desired shift-sensitivity of the chart: Based on this shift, the weighting factor *w* is determined from a graph such as that provided in Fig. 1.

Step 3. Determine the factor for the control limits from the derived *w* (Fig. 2).

Step 4. Perform a sensitivity analysis to obtain the best overall performance. This last step can be done with a simulation program such as the one written for this article.^{2}

In step 2, the selection of the most sensitive shift can be derived from the relation of laboratory performance to the allowable total error, as given by CLIA for proficiency testing (5). The approach of Koch et al. (6) is useful for this determination. The critical shift can be derived from the total error (minus a bias for inaccuracy as compared with the reference method if necessary) divided by the observed SD, minus 1.65. For example, the CLIA criteria recommend a maximum total error of 0.5 mmol/L for potassium. If the laboratory has no method bias and the SD is 0.1 mmol/L, the critical size of shifts is:

[(total error − method bias)/SD] − 1.65 = [(0.5 mmol/L − 0 mmol/L)/0.1 mmol/L] − 1.65 = 3.35.

For calcium, the same calculation might look like:

(1 mg/L/0.32 mg/L) − 1.65 = 1.475.

The size of critical errors so obtained can be used directly for determining the weighting factor *w* of the EWMA chart by means of Fig. 1.
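The critical-shift calculation above is simple enough to script. The following is a minimal sketch; the function name `critical_shift` and its parameter names are hypothetical, and taking the absolute value of the method bias is my assumption for handling negative biases.

```python
def critical_shift(total_error, sd, method_bias=0.0, z=1.65):
    """Critical systematic shift in multiples of the SD.

    total_error: allowable total error (same units as sd)
    sd:          observed SD of the method
    method_bias: bias against the reference method (magnitude is used;
                 this is an assumption of the sketch)
    z:           the constant 1.65 used in the text
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (total_error - abs(method_bias)) / sd - z


if __name__ == "__main__":
    # Potassium example from the text: 0.5 mmol/L total error, SD 0.1 mmol/L
    print(round(critical_shift(0.5, 0.1), 3))
    # Calcium example from the text: total error 1, SD 0.32
    print(round(critical_shift(1.0, 0.32), 3))
```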

**Procedures**

### Implementing an EWMA chart: a practical example

The following example shows how one might set up an EWMA-*¯x* chart by the four-step procedure described above:

1) Select an in-control RL of 100. This is also the average run length (ARL) of the Westgard algorithm.

2) Define the optimum shift to be detected as 2SD. Using this, and the above ARL, gives a weighting factor *w* of 0.4 (Fig. 1).

3) With these selected values, derive from Fig. 2 an EWMA chart limit factor *q* of ∼2.5. For one control sample per batch (*n* = 1), the upper control limit (UCL) and lower control limit (LCL) are calculated as follows:

$$\mathrm{UCL} = \mu_0 + q\,\sigma_0 \sqrt{\frac{w}{2 - w}}, \qquad \mathrm{LCL} = \mu_0 - q\,\sigma_0 \sqrt{\frac{w}{2 - w}},$$

where μ_{0} is the expected mean of the process and σ_{0} is the in-control SD.

4) Perform a sensitivity analysis. This analysis and fine-tuning require the use of a computer program such as the one^{2} used to calculate the results presented in Tables 1–4.
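As a quick numerical check of step 3, the limits can be evaluated for the values chosen in this example (*w* = 0.4, *q* ≈ 2.5, *n* = 1). The calculation assumes the asymptotic form of the EWMA limits, in which the SD of *z*_{t} approaches σ_{0}√[*w*/(2 − *w*)]:

$$\sqrt{\frac{w}{2 - w}} = \sqrt{\frac{0.4}{1.6}} = 0.5, \qquad \mathrm{UCL},\ \mathrm{LCL} = \mu_0 \pm 2.5 \times 0.5\,\sigma_0 = \mu_0 \pm 1.25\,\sigma_0 .$$

On the scale of the smoothed statistic, the limits are therefore narrower than the 2SD or 3SD limits of a Shewhart chart, because smoothing reduces the variance of the plotted values.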

One main difference between EWMA charts and other common clinical chemistry control methods is the use of RL values instead of α- and β-errors. Because the control values are random, the RL is itself a random variable, and its expected value is used to characterize a chart. Therefore, with θ designating a quality attribute of the process, the ARL is defined as: ARL(θ) = E(RL_{θ}).

When working with ARLs, it is useful to bear in mind that, for Shewhart charts, ARL_{Shewhart} = 1/α_{Shewhart}. For example, use of control-chart limits *(l)* of 2SD results in an ARL of 22, and 3SD limits give an ARL of 370 (assuming gaussian distribution). Using ARLs instead of α- and β-errors also simplifies some of the considerations in setting control limits. For example, a laboratory that analyzes one control serum after each batch of 30 patients’ samples and analyzes 300 patients’ samples a day will obtain an average of 10 control values per day. Selecting a limit of 2SD means that a false alarm would occur on average every 2–3 days (22/10 = 2.2 days). A limit of 3SD will average one false alarm about every 37 days of operation (370/10 = 37 days), i.e., roughly every 2 months if only working days are counted. As these examples show, the ARL translates directly into an expected time between alarms, without the need for gaussian distribution tables. The ARL is a comprehensive variable that can easily be derived by simulation for every control chart. When interpreting ARL values, one must remember that a high ARL is desirable only for the in-control situation. Under conditions of low accuracy or low precision, the ARL should be as low as possible; i.e., an ARL of 1 is optimal because the error then causes an alarm at the first control sample.
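The relation ARL = 1/α for Shewhart charts can be verified with the standard library alone. This is a minimal sketch under the gaussian assumption stated above; the function names are hypothetical.

```python
import math


def shewhart_in_control_arl(l):
    """In-control ARL of a Shewhart chart with +/- l SD limits (n = 1)."""
    # two-sided tail probability of the standard gaussian distribution
    alpha = math.erfc(l / math.sqrt(2.0))
    return 1.0 / alpha


def days_between_false_alarms(l, controls_per_day):
    """Average number of days of operation between two false alarms."""
    return shewhart_in_control_arl(l) / controls_per_day


if __name__ == "__main__":
    for limit in (2, 3):
        arl = shewhart_in_control_arl(limit)
        days = days_between_false_alarms(limit, controls_per_day=10)
        print(f"l = {limit}: ARL = {arl:.1f}, about {days:.1f} days per false alarm")
```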

### Trend detection by the EWMA procedure

An example of detecting small trends with an EWMA-*¯x* chart is illustrated in Figs. 3 and 4. Laboratory data from the quality-control samples for serum pseudocholinesterase are evaluated with a Shewhart chart in Fig. 3 and by EWMA-*¯x* charts (with different trend sensitivities provided by different weighting factors *w*) in Fig. 4. The control limits of the Shewhart chart were set arbitrarily, and those for the EWMA charts were derived by the described four-step procedure from the arbitrary limits: For 1_{3s} limits with SD = 33.33 and *n* = 1 for the Shewhart chart, an ARL of ∼370 is needed for the EWMA charts to be comparable with the Shewhart chart.

Figure 4 shows a trend detected by the EWMA chart that is not obvious in the Shewhart chart in Fig. 3. As the bottom panel of Fig. 4 shows, results for the controls are too low during the first 19 days, mainly because of the values from days 5–9. During the observation period, the controls appear to return slowly to the desired interval. Note that, because of the smoothing process, the range of values and control limits decreases with increasing smoothing (i.e., decreasing *w*).

### Design of the simulation study

To show the properties and limitations of the EWMA-*¯x* chart, a comparison by computer simulation was made with the Shewhart chart, the official German guidelines “Richtlinien der Bundesärztekammer zur Qualitätssicherung in medizinischen Laboratorien” (Rili-BÄK) (7), and the full Westgard algorithm (8). Among the various solutions for computing the ARLs of EWMA charts (see, e.g., (9, 10)), I chose Monte Carlo simulation because of its ease of implementation for every type of control chart. The simulation was written in Pascal on a DOS-compatible PC. For each ARL value, 10 000 repetitions were simulated. In comparison with other possible methods, e.g., Markov chains (10) or integral equations (9), between-method differences were <1% of the ARL values, even though each method has certain limitations because of approximation errors. Comparing the results given by the simulation for the Rili-BÄK multirules with those based on Markov chains (11) similarly revealed no major differences.

Changes in accuracy and precision were simulated by adding a constant shift to the random numbers and by increasing their variability (SD), respectively. Gaussian distribution was assumed for all simulations. The shifts *d* were standardized to multiples of the SD. Increased imprecision was expressed in the same way, with ε representing multiples of the original SD. Thus, ε = 2 indicates that the imprecision has doubled (Tables 3 and 4).
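The following Python sketch illustrates how such a Monte Carlo estimate of the ARL might look for an EWMA-*¯x* chart. It is not the Pascal program used for this article. The function names, the use of standardized values (μ_{0} = 0, σ_{0} = 1), the starting value *z*_{0} = 0, and the asymptotic limits ±*q*√[*w*/(2 − *w*)]/√*n* are assumptions of the sketch.

```python
import math
import random


def simulate_run_length(w, q, n, d, eps, rng, max_steps=1_000_000):
    """Number of runs until the EWMA statistic first leaves the limits."""
    # asymptotic limit on the standardized scale (assumption of this sketch)
    limit = q * math.sqrt(w / (2.0 - w)) / math.sqrt(n)
    z = 0.0  # start in the middle of the in-control interval
    for t in range(1, max_steps + 1):
        # mean of n gaussian control values with shift d and imprecision eps
        xbar = sum(rng.gauss(d, eps) for _ in range(n)) / n
        z = w * xbar + (1.0 - w) * z
        if abs(z) > limit:
            return t
    return max_steps


def estimate_arl(w, q, n=1, d=0.0, eps=1.0, repetitions=10_000, seed=1):
    """Average run length over many simulated runs."""
    rng = random.Random(seed)
    total = sum(
        simulate_run_length(w, q, n, d, eps, rng) for _ in range(repetitions)
    )
    return total / repetitions


if __name__ == "__main__":
    # w = 1 reduces the EWMA chart to a Shewhart chart with q SD limits
    print("Shewhart, 2SD, d = 0:", estimate_arl(w=1.0, q=2.0))
    print("Shewhart, 2SD, d = 1:", estimate_arl(w=1.0, q=2.0, d=1.0))
    # q = 2.5 is an arbitrary illustrative value, not taken from Table 1
    print("EWMA, w = 0.2, q = 2.5, d = 1:", estimate_arl(w=0.2, q=2.5, d=1.0))
```

Setting *w* = 1 offers a simple plausibility check, because the estimates should then come close to the Shewhart values quoted in the text (an in-control ARL of about 22 for 2SD limits).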

The Westgard multirule chart was implemented with two observations (*n* = 2) as previously published (8). The R_{4s} control rule is implemented in two versions: the computer version (the range of two control measurements exceeds 4SD) and the manual version (one control measurement exceeds +2SD and the other exceeds −2SD). The Rili-BÄK (7) is implemented as a Shewhart chart with the 1_{2s} or 1_{3s} rule and two additional rules: 7_{T}, an assay is out of control if seven consecutive measurements show the same trend upwards or downwards; and 7_{¯x}, an assay is out of control if seven consecutive measurements fall on one side of the mean.
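The rules described in this paragraph can be expressed as small predicates. The sketch below works on standardized control values, i.e., (value − target mean)/SD. The function names are hypothetical, and reading "same trend" as seven strictly increasing or strictly decreasing values is my interpretation of the 7_{T} rule.

```python
def r4s_range(x1, x2):
    """Computer version of R4s: range of two controls exceeds 4 SD."""
    return abs(x1 - x2) > 4.0


def r4s_manual(x1, x2):
    """Manual version of R4s: one control above +2 SD, the other below -2 SD."""
    return (x1 > 2.0 and x2 < -2.0) or (x2 > 2.0 and x1 < -2.0)


def seven_same_side(values):
    """7_x-bar rule: the last seven values lie on one side of the mean."""
    last = values[-7:]
    if len(last) < 7:
        return False
    return all(v > 0 for v in last) or all(v < 0 for v in last)


def seven_trend(values):
    """7_T rule: the last seven values rise or fall monotonically
    (strict monotonicity is an assumption of this sketch)."""
    last = values[-7:]
    if len(last) < 7:
        return False
    steps = [b - a for a, b in zip(last, last[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


if __name__ == "__main__":
    print(r4s_range(2.1, -2.0), r4s_manual(2.1, -2.0))
    print(seven_same_side([0.3, 0.1, 0.8, 0.2, 0.5, 0.4, 0.9]))
    print(seven_trend([-1.0, -0.6, -0.2, 0.1, 0.5, 0.9, 1.4]))
```

The first example shows how the two versions of the R_{4s} rule can disagree: a pair of controls at +2.1SD and −2.0SD has a range of more than 4SD, but only one of the two values lies beyond its 2SD limit.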

The EWMA-*¯x* chart is an EWMA chart as described above, based on the average *¯x* = (*x*_{1t} + *x*_{2t}+ … + *x*_{nt})/*n* of the values at each time *t*. The EWMA-S chart, as proposed by Mittag (12), uses the standard deviations *s*_{t} of the control samples per batch instead of the averages, *¯x*_{t}. The EWMA-S chart requires a correcting factor, because the *s*_{t} of the samples gives a biased estimate for the SD of the process. The explanation of the formulae thus derived is too complex for presentation here. However, except for the correcting factor, all formulae can be derived as shown for the EWMA-*¯x* chart in the *Appendix*. A four-step procedure for implementing the control chart can be derived in the same way (13).

When interpreting the results of the performance study, one must take into account the fact that, unlike the multirule procedures, the Shewhart-*¯x* and EWMA-*¯x* control charts are designed to detect only inaccuracy. For detecting imprecision with EWMA or Shewhart charts, the SD can be used in place of the mean as a test statistic. Examples of such charts are included in the comparison in Tables 3 and 4. Obviously, for charts based on the SD, more than one observation (*n* >1) is needed at each time point. Because this is not always possible, *¯x* charts are often used for detecting imprecision.

**Results**

### Inaccuracy

Table 1 shows the ARL values of EWMA-*¯x* charts for different analytical shifts. Only data for *n* = 1 and *n* = 2 are shown, given how seldom more than two quality-control samples are useful per batch (14). Table 1 also provides the limits, *q*, for achieving typical in-control ARLs by EWMA charts. A weighting factor *w* = 0.2 was selected for the EWMA chart used in this report; that is, the chart is designed for an optimal detection of analytical shifts of ∼1SD (4). An example with *w* = 0.5, which means optimal detection of shifts of ∼2SD, is included to show the impact of the weighting factor.

Comparing the EWMA chart in Table 1 with the other control charts given in Table 2 requires selecting the same in-control ARL (*d* = 0) from the two charts with the same number of samples per batch *n*. The ARL values of the two selected columns can then be compared directly for different shifts *d*. The limits used for the control charts *(l)* are found in the first row; e.g., *l* = 2 means the use of 2SD control limits, and 1_{3s} refers to 3SD limits.

For example, the first Shewhart chart in Table 2, which uses 2SD limits for one measurement per batch (*n* = 1), has an ARL(0) of 22.03. The corresponding EWMA chart in Table 1 is also the first chart. The two charts thus can easily be compared by noting the corresponding ARL values for each shift *d*. In this example (column one in both tables), the EWMA chart (Table 1) has an ARL of 4.33 for *d* = 1, whereas the Shewhart chart (Table 2) ARL is 6.27. This means that the EWMA chart will detect a shift of *d* = 1 about two control samples earlier than the Shewhart chart will.

Comparing the ARLs of the EWMA charts in Table 1 with the other chart types in Table 2 as described here makes it evident that, for small shifts, the EWMA charts show advantages over all other chart types. For example, for *d* = 0.5, the Shewhart chart in Table 2 (column 2) has an ARL of 153.01 for *n* = 1 and 3SD limits, whereas the corresponding EWMA-*¯x* chart (column 4 of Table 1) has an ARL of 36.16. Introducing additional rules to the Shewhart chart, as is the case with the Rili-BÄK or the Westgard model, increases efficiency and approaches the results of the EWMA chart. For large shifts (*d* >2), the run lengths of the EWMA chart are slightly longer than in the other charts because the starting value of the smoothing process lies in the middle of the in-control interval. However, the absolute differences between the charts become negligible for shifts >2SD.

### Imprecision

Table 3 provides some typical ARLs for EWMA-*¯x* and EWMA-S charts. Table 4 summarizes the other common chart types. The two tables can be compared for imprecision as was described above for inaccuracy.

Whereas performance of the EWMA-*¯x* chart is comparable with that of the Rili-BÄK multirule, the best performance for imprecision control is provided by the Westgard algorithm. The EWMA-S chart (rightmost two columns of Table 3), a more appropriate approach for detecting random errors, achieves performance similar to that of the Westgard algorithm. Note that the weight *w* = 0.2 was arbitrarily selected for the EWMA chart; i.e., no effort was made to show an optimal performance in comparison with the Westgard chart.

Note also that, if *¯x* charts of the Shewhart- or EWMA-type are used, increasing the number of controls *(n)* from 1 to 2 barely decreases the ARL. Interestingly, for *n* = 2, the appropriate Shewhart-S chart shows no advantage over the Shewhart-*¯x* chart.

**Discussion**

As shown here, the EWMA chart is at least as good as the Westgard multirule chart with respect to inaccuracy control, and its greater ability to detect small shifts recommends its use instead of the others for quality control. A further advantage of the EWMA chart is that all results are shown graphically and no additional rules for improved performance are needed (unlike the Shewhart chart). The quality-control manager is therefore presented with all of the data at once, rather than having to check several variables, some of which might inadvertently go unnoticed. Although the multirule system provides valuable data for quality-control assessment, the EWMA graphical control chart provides the same or superior assessment data and is thus not just a theoretically more satisfying concept (3).

The control of imprecision by the EWMA-*¯x* chart does not compare well with the other chart types. For this, the Westgard algorithm offers a much better alternative. Use of the EWMA-S chart, however, can overcome this problem.

In general, the strength of EWMA charts is the detection of small increases in inaccuracy or imprecision. The practical relevance of this situation must therefore be considered. One often-used argument is that highly developed analytical tests have low analytical variances compared with medically important variances (6). According to this argument, the use of EWMA charts would probably be appropriate for unstable tests with a relatively high variation (in comparison with the medically relevant variation). Given that small shifts or increases of random error might result in changed medical decisions, strict control limits must be set, a situation in which the detection of small changes by the EWMA chart would be useful. On the other hand, sensitive charts (like the EWMA charts) might also be useful for stable and highly automated tests. Two points give credence to this assumption:

1) The medically relevant decision limits for many laboratory tests are under discussion and have not been clearly evaluated by studies. Also, certain specific situations often require higher accuracy and precision (e.g., creatine kinase measurements for evaluating the severity of myocardial infarction).

2) A small but constant shift may reveal incipient analytical problems before they become medically relevant and require correction.

Thus, quality-control charts that are sensitive to small shifts, e.g., the EWMA chart, are highly useful.

In conclusion, the Westgard multirule system provides good overall performance for quality control. However, the EWMA-*¯x* chart allows the detection of small shifts earlier than the Shewhart chart does. The EWMA-*¯x* chart is therefore appropriate for use as a supplement to a multirule protocol as an additional quality-control tool. For those laboratories that still use only Shewhart-*¯x* charts, a combination of this with EWMA charts—or a pure EWMA system—can provide greater sensitivity without increasing the number of false alarms. The combination of an EWMA-*¯x* chart for inaccuracy and an EWMA-S chart for imprecision would, in my opinion, be the best choice. This two-charts system offers great flexibility, with optimal control of shifts and random error, but the implementation requires some effort. If accuracy is the main objective, the use of *¯x* charts may be sufficient. Many automated methods offer a nearly constant imprecision, which could be controlled by monitoring the CV, as is often done in German laboratories. As a compromise, inaccuracy could be detected by a combination of an EWMA-*¯x* chart with a Shewhart-*¯x* chart or by a single EWMA-*¯x* chart. Alternatively, inaccuracy could be controlled by using a Shewhart chart, and the EWMA-*¯x* chart could be used to determine trends without triggering out-of-control alarms.

### Appendix: formulae for the EWMA-*¯x* chart

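The test statistic of the EWMA chart, given by

$$z_t = w\,\bar{x}_t + (1 - w)\,z_{t-1}, \qquad t = 1, 2, \ldots, \quad w \in\, ]0;1],$$

can also be written as:

$$z_t = w \sum_{i=0}^{t-1} (1 - w)^{i}\,\bar{x}_{t-i} + (1 - w)^{t} z_0 .$$

If the weights are summed for all *t* (→∞), the sum equals 1, as:

$$w \sum_{i=0}^{\infty} (1 - w)^{i} = \frac{w}{1 - (1 - w)} = 1 .$$

If the observations *¯x*_{t} are realizations of independent random variables with the expected mean μ_{0} and the variance σ_{0}^{2}, and *z*_{0} = μ_{0}, then the following statements are true (10):

$$E(z_t) = \mu_0$$

and

$$\mathrm{Var}(z_t) = \sigma_0^2\,\frac{w}{2 - w}\left[1 - (1 - w)^{2t}\right].$$

For *t* →∞, it is true that:

$$\mathrm{Var}(z_t) \to \sigma_0^2\,\frac{w}{2 - w} .$$

Thus, the asymptotic UCL and LCL for the EWMA-*¯x* chart are:

$$\mathrm{UCL} = \mu_0 + q\,\sigma_0 \sqrt{\frac{w}{2 - w}}$$

and

$$\mathrm{LCL} = \mu_0 - q\,\sigma_0 \sqrt{\frac{w}{2 - w}},$$

where *q* is the appropriate factor for the desired RL value. This factor can be determined by computer simulation or taken from Fig. 2 (4).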

Although for short runs the use of asymptotic RL values is not optimal, in practice the use of the correct time-dependent limits shows no advantage (13).
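To see how quickly the time-dependent limits approach the asymptotic ones, both can be computed side by side. The sketch below assumes the usual variance of the EWMA statistic, σ_{0}^{2}[*w*/(2 − *w*)][1 − (1 − *w*)^{2*t*}], and treats `sigma0` as the SD of a single control measurement, so that the SD of the run mean is `sigma0`/√*n*. The function name `ewma_limits` is hypothetical.

```python
import math


def ewma_limits(mu0, sigma0, w, q, n=1, t=None):
    """Return (LCL, UCL) of an EWMA x-bar chart.

    t=None gives the asymptotic limits; an integer t >= 1 gives the
    time-dependent limits for the t-th plotted value.
    """
    if not 0.0 < w <= 1.0:
        raise ValueError("w must lie in ]0;1]")
    var_factor = w / (2.0 - w)
    if t is not None:
        # the variance of z_t is smaller for the first few runs
        var_factor *= 1.0 - (1.0 - w) ** (2 * t)
    half_width = q * sigma0 / math.sqrt(n) * math.sqrt(var_factor)
    return mu0 - half_width, mu0 + half_width


if __name__ == "__main__":
    # values of the practical example: w = 0.4, q = 2.5, standardized scale
    for t in (1, 2, 3, 5, 10, None):
        lcl, ucl = ewma_limits(mu0=0.0, sigma0=1.0, w=0.4, q=2.5, t=t)
        label = "asymptotic" if t is None else f"t = {t}"
        print(f"{label:>10}: LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```

For moderate weighting factors the time-dependent limits converge within a few runs, which is consistent with the observation that they offer little practical advantage.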

## Acknowledgments

This work was significantly improved by many discussions with C. Wolter. My special thanks to M. Page for getting this text into fluent English. I also thank C. Falkner, M. Borchers, and D. Wagner for their support of my work.

## Footnotes

Institut für Klinische Chemie und Pathobiochemie (Director: Prof. D. Neumeier), Klinikum rechts der Isar der Technischen Universität München, Ismaninger Str. 22, 81675 Munich, Germany. Fax 89-4140 4875; e-mail A.S.Neubauer{at}lrz.tu-muenchen.de

^{1} Nonstandard abbreviations: EWMA, exponentially weighted moving average; w, weighting factor; Cusum, cumulative sum; RL, run length; ARL, average run length; Rili-BÄK, German guidelines of the Federal Physicians Association (7); UCL/LCL, upper/lower control limit; d, shift in inaccuracy (in multiples of the SD); ε, change in imprecision (in multiples of the original SD); q, limit for the EWMA chart; l, limit for a control chart; CLIA, US Clinical Laboratory Improvement Amendments (1988).

^{2} Random variables and their realizations are always printed in lower-case letters.

^{2} A copy of the simulation program used can be obtained via WWW for IBM-compatible PCs without charge from: http://edv1.klinchem.med.tu-muenchen.de/~neubauer/ or can be requested from the author via e-mail (A.S.Neubauer{at}lrz.tu-muenchen.de).

Table 1 footnote: Implemented with *n* control samples per batch, the weighting factor *w*, and the factor *q*, which determines the control limits of the chart.

Table 2 footnote: Simulated with *n* control samples per batch. The Shewhart chart with the 1_{2s} rule is identical to a Levey–Jennings chart with 2SD limits. For *n* = 2, the same chart is listed with *n* = 2 and *l* = 2. The Rili-BÄK charts listed are multirule charts with the Shewhart limits 2SD or 3SD and the two additional rules 7_{T} and 7_{¯x} as explained in the text. The Westgard algorithm is implemented with two versions of the R_{4s} rule as described in the text.

Table 3 footnote: Implemented with *n* control samples per batch, the weighting factor *w* = 0.2, and the factor *q*, which determines the control limits of the chart. ε, increased imprecision in multiples of the original SD.

Table 4 footnote: Simulated as described in the footnote to Table 2.

- © 1997 The American Association for Clinical Chemistry