## Abstract

*Background:* ROC analysis is widely accepted to assess and compare diagnostic validity of laboratory tests. Within the last few years, many new ROC programs have become available but have not been systematically evaluated. The aim of this study was to assess different ROC programs regarding their ease of use, mathematical correctness, final output, and their compatibility with other graphics programs.

*Methods:* Eight available programs running under Windows (AccuROC, Analyse-It, CMDT, GraphROC, MedCalc, mROC, ROCKIT, and SPSS) were evaluated. ROC analyses of prostate-specific antigen and related values were performed from a dataset of 928 men with prostate cancer and benign prostatic hyperplasia and corresponding subsets. Criteria such as data input, data output, and correctness and completeness of results were used to evaluate the practicability of the programs.

*Results:* Although the programs produced equivalent results (areas under the curves and their characteristics), we observed deficiencies concerning input of data, processing of the output data, and completeness of the results. Analyse-It, AccuROC, and MedCalc exhibited good performance, but each program had different shortcomings. Only GraphROC could compare curves at a certain sensitivity or specificity cutoff.

*Conclusions:* Adequate ROC analysis and ROC plotting cannot be performed with a single program. Analyse-It, AccuROC, and MedCalc can be recommended with certain limitations. Further improvements of the programs are necessary.

ROC analysis is now a standard tool to assess, define, and compare the diagnostic validity of laboratory tests or diagnostic measures (1). Medline searches have shown that the number of publications using ROC curves has increased from ∼300 studies in the 1980s to >5000 studies since 1990. Several computer programs have been developed to generate ROC curves, and some of the early programs were briefly described in 1993 (2). However, all of these early programs had limitations for easy and accessible practical use. Within the last several years, commercial and public domain programs have become available for complex ROC analysis and ROC plotting. To our knowledge, an overview and comparison of these newly available ROC programs has not been performed.

The aims of this study were *(a)* to survey currently available ROC programs, *(b)* to compare these ROC programs for their ease of use, and *(c)* to evaluate their relative utility in ROC analysis.

## Material and Methods

### ROC software studied

Eight currently available ROC programs were evaluated (Table 1). All programs run on IBM-compatible computers. We performed all our evaluation studies on computers running under Microsoft Windows 2000 with at least 128 MB of RAM, a Pentium processor, and 250 MB of space on the hard drive. The general features of these programs are summarized in Table 1. The software Stata 7.0 was not included in this comparative study. We were unable to make a complete evaluation of this software because not all necessary calculations could be performed, although we repeatedly discussed the issues with company representatives in Germany via the company hotline.

### Datasets for ROC analysis

To compare the programs, we used a previously described dataset of 928 men with prostate cancer (n = 606) and benign prostatic hyperplasia (n = 322) and subgroups of this population (3). ROC analyses of total prostate-specific antigen (tPSA),¹ free PSA (fPSA), the ratio of fPSA to tPSA (fPSA/tPSA), and of other values calculated by an artificial neural network approach with the mentioned dataset (3) were carried out to estimate the advantages and disadvantages of each program.

### Evaluation criteria

To evaluate the programs, five simple criteria were chosen to encompass the ease of learning program operations, use of the software, and data handling and to characterize the usefulness of each program (Table 2). A maximum percentage value was assigned to each criterion. The sum of all percentage values gives the final score. The criteria are described briefly below:

#### Data input.

It is important to import or copy data into the program easily without any intermediate storage or special format, to be able to edit the data in the program (e.g., in a spreadsheet), and to save more than one dataset. The tendency of each program to crash was also taken into consideration.

#### Data output.

Presentation of the results and processing of the exported data were assessed. The program should be able to structure the results comprehensively. Processing of data characterizes the capability of the program to export and save the results, including the calculated graphs, as well as to draw more than one curve in one graph. This facility is very important for comparing several tests with each other.

#### Analysis results.

This criterion was the most important one and included correctness and completeness of the results. Correctness of results is mandatory: incorrect results were treated as an exclusion criterion for recommending the respective software for ROC analysis.

There are several approaches to calculating the area under the ROC curve (AUC) for the comparison of ROC curves. Table 3 lists the main characteristics and limitations of three commonly used methods. It is crucial to know whether the curves result from independent or dependent (correlated) data. In laboratory diagnostics, the values of interest are in most cases measured on the same patients. We therefore considered only methods for correlated data. A second distinction can be made between nonparametric and parametric methods. Parametric methods are efficient under certain assumptions, but these assumptions are often not fulfilled in practice, in which case the results are biased. Nonparametric methods should be used if the variables follow an ordinal or skewed distribution or if there are small sample sizes. A parametric approach should be preferred in case of a large sample size and continuous measurements.
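As a minimal sketch of the nonparametric approach mentioned above, the AUC can be estimated as the proportion of diseased/non-diseased pairs in which the diseased patient has the higher value, with ties counted as one half (the Mann-Whitney form of the AUC). The function name and the marker values are hypothetical and serve only as illustration; none of the evaluated programs is claimed to be implemented this way.

```python
def auc_mann_whitney(diseased, healthy):
    """Nonparametric AUC: fraction of (diseased, healthy) pairs in which the
    diseased value is higher; tied pairs count as one half."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


# Hypothetical marker values, for illustration only (not from the study).
cancer = [4.1, 6.3, 7.8, 9.5, 12.0]
bph = [2.2, 3.9, 4.1, 5.0, 6.1]

auc = auc_mann_whitney(cancer, bph)
print(f"AUC = {auc:.2f}")  # prints "AUC = 0.90"
```

The double loop is quadratic in the sample size, which is acceptable for a few hundred patients; a rank-based formulation would give the same value more efficiently.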

The subcriterion completeness assessed the capability of a program to calculate all necessary ROC data for a reasonable decision regarding a diagnostic test. This included the AUC with its confidence intervals (CIs), the sensitivities and specificities at certain cutoffs with their CIs, the presentation of the graph, and the ability to compare the AUCs showing the respective statistical significance values.
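To make the completeness requirement concrete, the following sketch computes sensitivity and specificity at one cutoff together with approximate 95% CIs. The Wilson score interval is used here as an assumption; the article does not state which interval the individual programs report. All names and data values are hypothetical.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def sens_spec_at_cutoff(diseased, healthy, cutoff):
    """Values >= cutoff are treated as test-positive (assumed convention)."""
    tp = sum(1 for v in diseased if v >= cutoff)   # true positives
    tn = sum(1 for v in healthy if v < cutoff)     # true negatives
    sens = tp / len(diseased)
    spec = tn / len(healthy)
    return sens, wilson_ci(tp, len(diseased)), spec, wilson_ci(tn, len(healthy))


# Hypothetical marker values, for illustration only.
cancer = [4.1, 6.3, 7.8, 9.5, 12.0]
bph = [2.2, 3.9, 4.1, 5.0, 6.1]

sens, sens_ci, spec, spec_ci = sens_spec_at_cutoff(cancer, bph, cutoff=5.0)
print(f"sensitivity = {sens:.2f}, CI = ({sens_ci[0]:.2f}, {sens_ci[1]:.2f})")
print(f"specificity = {spec:.2f}, CI = ({spec_ci[0]:.2f}, {spec_ci[1]:.2f})")
```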

#### Program comfort.

This point of the comparison dealt with the compatibility of the program with standard calculation, text, and presentation programs, e.g., Microsoft Excel, Word, or PowerPoint. Programs were also evaluated based on the availability of help functions, tutorials, and demonstration versions and ease of obtaining information regarding program updates.

#### User manual.

This criterion assessed the structure and comprehensibility of the user manual and whether the manufacturer provides an online manual, a homepage, or an e-mail address to solve current problems.

## Results

The ROC programs were tested with a previously described dataset and various subsets (3). The assessment ratings for the five evaluation criteria are given in Table 2 for each program. As shown in one representative example (Table 4), the AUCs calculated by the various programs, which in some cases used different calculation methods as described below, differed only marginally. In addition, equivalent statistical differences between the AUCs of the various markers were obtained. Thus, the essential demand concerning the correctness of results seemed to be fulfilled by all programs compared. Moreover, the other criteria were helpful to assist in ranking the software for usefulness. The individual programs are described below.

### AccuROC

AccuROC uses the method of DeLong et al. (4). To our knowledge, at this stage it is the only program that uses this method. The layout of the program is very well structured, and because of the comprehensive manual and the up-to-date homepage, the program is easy to learn. Up to three curves can be drawn into one graph, and the coordinates of each curve can be saved, which makes it possible to put more than three curves in one graph with use of a spreadsheet program such as Excel. Furthermore, AccuROC can calculate the CIs and SD with a bootstrap method.
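The idea behind a bootstrap CI for the AUC can be sketched as follows. This is a simple percentile bootstrap with resampling inside each patient group, chosen here as an assumption; the article does not describe which bootstrap variant AccuROC actually uses. Function names, the seed, and the data are hypothetical.

```python
import random


def auc(diseased, healthy):
    """Nonparametric AUC; tied pairs count as one half."""
    score = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return score / (len(diseased) * len(healthy))


def bootstrap_auc_ci(diseased, healthy, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample each group with replacement,
    recompute the AUC, and read off the empirical quantiles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = []
    for _ in range(n_boot):
        d = [rng.choice(diseased) for _ in diseased]
        h = [rng.choice(healthy) for _ in healthy]
        stats.append(auc(d, h))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical marker values, for illustration only.
cancer = [4.1, 6.3, 7.8, 9.5, 12.0]
bph = [2.2, 3.9, 4.1, 5.0, 6.1]

lower, upper = bootstrap_auc_ci(cancer, bph)
print(f"AUC = {auc(cancer, bph):.2f}, bootstrap 95% CI = ({lower:.2f}, {upper:.2f})")
```

With groups this small the interval is wide and often touches 1.0; the sketch is meant to show the mechanics, not to suggest that five patients per group suffice.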

A serious drawback of this program is that except for the graph and its coordinates, none of the other results can be saved or exported; they can only be printed. If a diagnostic marker shows that lower values are associated with a higher risk of disease, all the test values have to be transformed by rendering them negative, manually or using a spreadsheet. This procedure makes the data input quite complicated.
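The sign transformation described above can be illustrated in a few lines. For a marker whose low values indicate disease, negating every value mirrors the ROC curve, so that (in the absence of ties) the AUC changes from A to 1 − A. The data below are hypothetical.

```python
def auc(diseased, healthy):
    """Nonparametric AUC assuming HIGHER values indicate disease."""
    score = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return score / (len(diseased) * len(healthy))


# Hypothetical marker for which LOWER values indicate disease.
cancer = [8.0, 10.5, 12.0, 15.0]
bph = [14.0, 18.0, 22.0, 25.0]

raw = auc(cancer, bph)                                      # below 0.5
flipped = auc([-v for v in cancer], [-v for v in bph])      # after negation

print(f"raw AUC = {raw:.4f}, AUC after negation = {flipped:.4f}")
# prints "raw AUC = 0.0625, AUC after negation = 0.9375"
```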

### Analyse-It

This software was published in 2001. The ROC analysis is performed according to the method of Hanley and McNeil (5)(6). According to the information of the software developers, an update was planned for the end of 2002. This update should use the method of DeLong et al. (4). It is an add-in program for Microsoft Excel. Like the software MedCalc, it is a program that implements several statistical procedures, including ROC analysis. It is simple to use and provides a very good online manual, help function, and tutorial. An advantage of its integration into Excel is that the interplay with other programs is excellent. Data input is easy, and the layout is clearly arranged. All necessary results are calculated in one step, and up to three curves can be displayed in one graph.

Unfortunately, AUCs cannot be compared if any AUC is <0.7, but this will also be changed in the updated version. Another drawback of this program is that it does not calculate CIs for the sensitivities and specificities.
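For orientation, the standard error of a single AUC according to the widely quoted Hanley and McNeil approximation can be computed from the AUC and the two group sizes alone. The sketch below is an illustration of that formula, not a description of how Analyse-It implements it; the AUC value of 0.80 is hypothetical, and the group sizes of 53 and 64 are taken from the subset described in the table footnotes.

```python
import math


def hanley_mcneil_se(auc, n_diseased, n_healthy):
    """Standard error of the AUC (Hanley-McNeil approximation)."""
    q1 = auc / (2 - auc)             # P(two diseased both outrank one healthy)
    q2 = 2 * auc * auc / (1 + auc)   # P(one diseased outranks two healthy)
    variance = (
        auc * (1 - auc)
        + (n_diseased - 1) * (q1 - auc * auc)
        + (n_healthy - 1) * (q2 - auc * auc)
    ) / (n_diseased * n_healthy)
    return math.sqrt(variance)


# Hypothetical AUC; group sizes as in the 53/64 patient subset.
a = 0.80
se = hanley_mcneil_se(a, n_diseased=53, n_healthy=64)
ci = (a - 1.96 * se, a + 1.96 * se)   # normal-approximation 95% CI
print(f"SE = {se:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Comparing two correlated AUCs additionally requires an estimate of the correlation between them, which this sketch does not cover.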

### CMDT

CMDT is a freeware program and can be downloaded from the internet (Table 1). An estimate of the AUC is given by the Wilcoxon rank-sum statistic. For comparison of ROC curves, it uses a permutation test suggested by Venkatraman and Begg (7).

The drawbacks of this program are that it is prone to crashing, the graph can barely be edited in the program, and only one curve can be displayed. This makes it impossible to compare curves visually. Furthermore, the graph is not of publication quality and has to be saved as an extended metafile to be processed in another graphics program.

The advantages of the program are that it uses a bootstrap method to calculate the CIs and that the data can be edited in the program.

### GraphROC

The program GraphROC uses the method of Hanley and McNeil (5)(6) to calculate the ROC curve. It is one of the first commercially available programs on the Windows platform and is still in use (8). GraphROC is cumbersome to work with. Creating an input file is complicated, and it is not possible to edit the data after loading them into the program. Every result has to be copied via clipboard to save it. To edit the graph, it has to be copied via clipboard into another graphics program. In addition, the program is susceptible to crashing.

The advantages of GraphROC are the ability to draw several curves in one graph and the opportunity to compare paired and unpaired datasets. It is also possible to compare curves at a certain sensitivity or specificity cutoff, which is, as far as we know, a feature that only GraphROC provides. A demonstration version of GraphROC can be downloaded.
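What a comparison at a fixed specificity involves can be sketched as follows: for each marker, find the best sensitivity among all cutoffs whose specificity reaches the target. The sketch yields point estimates only; a significance test for the difference, as GraphROC is reported to offer, is not implemented here. Function names and data are hypothetical.

```python
def sensitivity_at_specificity(diseased, healthy, target_spec):
    """Highest sensitivity among cutoffs with specificity >= target_spec.
    Values >= cutoff are treated as test-positive (assumed convention)."""
    candidates = sorted(set(diseased) | set(healthy))
    candidates.append(candidates[-1] + 1.0)  # cutoff above every observation
    best = 0.0
    for cutoff in candidates:
        spec = sum(v < cutoff for v in healthy) / len(healthy)
        if spec >= target_spec:
            sens = sum(v >= cutoff for v in diseased) / len(diseased)
            best = max(best, sens)
    return best


# Two hypothetical markers (for illustration only).
cancer_a, bph_a = [4.1, 6.3, 7.8, 9.5, 12.0], [2.2, 3.9, 4.1, 5.0, 6.1]
cancer_b, bph_b = [3.0, 5.5, 6.0, 8.0, 11.0], [2.0, 3.5, 4.0, 5.8, 7.0]

sens_a = sensitivity_at_specificity(cancer_a, bph_a, target_spec=0.8)
sens_b = sensitivity_at_specificity(cancer_b, bph_b, target_spec=0.8)
print(f"sensitivity at 80% specificity: A = {sens_a:.2f}, B = {sens_b:.2f}")
# prints "sensitivity at 80% specificity: A = 0.80, B = 0.60"
```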

### MedCalc

MedCalc also works with the method of Hanley and McNeil (5)(6). This program is very interesting for those users who wish to do more than just ROC analysis because it provides a wide range of other special biomedical statistics, e.g., Bland-Altman plots, Passing-Bablok regression, and logistic regression. The data import is very easy and is possible from Excel, SPSS, dbase, Lotus, and as a text file. The layout is clearly arranged, it is possible to export data, and the graph can be edited in the program. MedCalc provides an online manual, and a 30-day demonstration version can be downloaded from the company homepage.

A clear disadvantage of this program is that only two curves can be presented in one graph.

### mROC

mROC is a computer program that implements an approach of combining the ROC curves of several tumor markers or test values by the best linear combination, which maximizes the AUC under the hypothesis of a multivariate gaussian distribution (9). Methods for estimating CIs for the AUC are also provided (10). Furthermore, conventional ROC analysis is possible. Learning to work with the program is easy, the layout is well structured, and the provided manual is intelligible. However, the data input is quite complicated, and the data cannot be edited in the program. Numerical and graphic results can be exported. Unfortunately, only one curve can be displayed in a graph, and a comparison of different ROC curves is not possible.

By combining several markers or tests into one ROC curve, thus creating a “virtual marker”, this program brings interesting additional new aspects to ROC analysis. Nevertheless, it cannot be recommended for a convenient ROC analysis.
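The "virtual marker" idea can be illustrated for two markers. Under a multivariate gaussian model, a textbook closed-form result gives the AUC-maximizing weights as proportional to (S_d + S_h)⁻¹(m_d − m_h), where m and S are the group means and covariance matrices. Whether mROC computes its combination exactly this way is an assumption; the function names and data below are hypothetical.

```python
import math
from statistics import mean


def cov(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def best_linear_combination(diseased, healthy):
    """Weights (a1, a2) proportional to (S_d + S_h)^-1 (m_d - m_h)
    for lists of (marker1, marker2) tuples."""
    d1, d2 = zip(*diseased)
    h1, h2 = zip(*healthy)
    s11 = cov(d1, d1) + cov(h1, h1)        # summed 2x2 covariance matrix
    s12 = cov(d1, d2) + cov(h1, h2)
    s22 = cov(d2, d2) + cov(h2, h2)
    m1 = mean(d1) - mean(h1)               # mean differences
    m2 = mean(d2) - mean(h2)
    det = s11 * s22 - s12 * s12
    a1 = (s22 * m1 - s12 * m2) / det       # explicit 2x2 inverse times m
    a2 = (s11 * m2 - s12 * m1) / det
    model_auc = 0.5 * (1 + math.erf(math.sqrt(a1 * m1 + a2 * m2) / math.sqrt(2)))
    return a1, a2, model_auc


# Hypothetical paired marker values (marker1, marker2), illustration only.
cancer = [(6.0, 1.2), (7.5, 2.0), (9.0, 1.1), (8.0, 2.5), (10.0, 1.9)]
bph = [(4.0, 1.0), (5.5, 0.6), (6.5, 1.4), (5.0, 0.9), (7.0, 1.6)]

a1, a2, model_auc = best_linear_combination(cancer, bph)
virtual_cancer = [a1 * x + a2 * y for x, y in cancer]  # combined "virtual marker"
virtual_bph = [a1 * x + a2 * y for x, y in bph]
print(f"weights = ({a1:.3f}, {a2:.3f}), model-based AUC = {model_auc:.3f}")
```

The resulting virtual marker values can then be passed to any conventional ROC routine, which is how the combined curve relates to ordinary single-marker analysis.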

### ROCKIT

ROCKIT is a free program developed by C.E. Metz et al. (11)(12)(13). Although it is mathematically a very well thought-out program, we would not recommend this program unless the user has a statistical background. Creating an input file is cumbersome, the layout is somewhat confusing, the interplay with other programs is not optimized, it does not have a help function, and it frequently crashed when we used it.

Apart from these disadvantages, it calculates all necessary results, and with the included software PLOTROC (a program in Excel), several curves can be displayed in one graph.

### SPSS

Although SPSS is a widely used statistical program, the ROC analysis within this package is not yet fully developed. In SPSS it is not possible to compare ROC curves. More than one curve in a graph can be displayed only if either higher or lower values of a marker are associated with a higher risk of disease. Despite the advantage of this program to show a wide range of other statistics, a valid ROC analysis cannot be performed with this software.

As can be seen in Table 2, we did not find any software that fulfilled all our expectations perfectly. Every program had advantages and disadvantages. More detailed characteristics of each program are summarized in Table 5.

## Discussion

Since the original paper by Metz (14) describing ROC analysis and its use in optimizing diagnostic strategies, many enhancements have been made to further improve its use (2)(5). ROC analysis has recently been included in the checklist for reporting studies concerning diagnostic accuracy of medical tests (1). Other studies have focused on preconditions and their influence on diagnostic performance (15). Most studies comparing tumor markers (e.g., PSA and its molecular forms) are already using ROC comparisons (3). To perform these ROC comparisons, many commercially available programs have been introduced (4)(7)(8)(9)(10)(11)(12)(13)(14)(16); however, to our knowledge, a comparison study of the available programs regarding their technical and mathematical aspects has not been published. With this study, we analyzed the advantages and drawbacks of eight ROC programs to find the best-optimized program for ROC analysis for clinicians. The programs Analyse-it, AccuROC, MedCalc, and to a certain extent GraphROC show good performance, but each program has different limitations.

The results of the comparison show that three of the eight programs can make ROC analysis easier and more economical. The leading program is Analyse-it with a final score of 91%. Although this program received maximum scores for the criteria data input, software comfort, and user manual, it is not acceptable that only three curves can be displayed and that the CIs for the sensitivities and specificities are not calculated. However, add-in software for a program, such as Excel, that is already widely used is potentially valuable, and if the drawbacks can be removed in a future version, this software could make ROC analysis much easier. Except for SPSS, none of the other programs provides as good a help function and tutorial. Questions concerning the program are answered quickly via e-mail. Therefore, the price is acceptable considering such good service. Additionally, a full demonstration version can be downloaded at www.analyse-it.com.

In second place is AccuROC with a total score of 85%. Its use of the totally nonparametric method of DeLong et al. (4) and bootstrap methods (17) and its well-structured layout are the strong points of this program. On the other hand, complicated data input and the fact that data output (except the graph) can only be printed and not be saved or copied are disadvantages. Another drawback is the limited license for 2 years and the limited use of this program for only one computer. If one attaches great importance to highly accurate results and accepts the mentioned drawbacks, we can recommend AccuROC.

The third program that we recommend is MedCalc, with a total score of 84%. Although the ROC analysis is only one tool of this program, all necessary parameters are calculated. Data and results are clearly arranged, and the general handling is easy. Unfortunately, only two curves can be presented in one graph, which limits the relevant use of this program. If it were not for this drawback, MedCalc would fulfill most of our expectations of efficient ROC analysis. Even the price is reasonable, considering the additional statistical methods included. For those who do not need a multicurve presentation and are interested in a wide range of other statistics, MedCalc is a reasonable choice.

GraphROC achieved a score of 78%. The completeness of the results cannot be criticized. All the main parameters can be calculated with this software. It even has a feature that shows every possible cutoff point with its sensitivity and specificity in a separate diagram with automatic updating of clinical sensitivity and specificity values, by use of simple mouse clicks. The main drawbacks are the user-unfriendly data input and the long-winded processing of results and graphs. The user-friendliness of the program would be improved if there were a way to export the points of the ROC curve to either a text file or a spreadsheet. This would give the user more flexibility in terms of graphic capability. GraphROC can still compete with the other programs for ROC analysis, although the software has not been further developed since 1996.

The shortcomings of the other four programs outlined above make it difficult to recommend these programs for regular ROC analysis.

In summary, it is surprising that valid ROC analysis with all necessary data and a good plotting function is not offered in a single program. It should not be necessary to use more than one program to perform a valid ROC analysis. Therefore, the programs Analyse-it, AccuROC, or MedCalc should be enhanced as described above to provide all necessary functions.

## Acknowledgments

We gratefully acknowledge Prof. Wernecke for helpful suggestions and Silke Klotzek for helpful technical assistance. The study contains parts of the thesis of S.W.

## Footnotes

1 According to the description of the software, not always tested.

2 Limited version.

3 Full version for 30 days.

4 Program can be downloaded free of charge.

5 For a 2-year license, including all updates during this time.

6 Depending on delivery mode (e-mail or disc), country from where it is ordered, and individual or institutional use.

1 Values are related to the respective evaluation criterion with the maximum values shown in parentheses. The final score was calculated from the results of all criteria.

1 A subset of 53 patients with prostate cancer and 64 patients with benign prostatic hyperplasia of a total group of 924 patients was analyzed to characterize the diagnostic power of the ratio of fPSA/tPSA and the artificial neural network output value regarding the differentiation between the two groups of patients (3). Data [mean of the area (SE) and/or 95% CI] are given as results calculated by the respective programs.

1 Only by use of other graphics packages.

2 In PLOTROC, an add-in program for Excel.

3 Only possible if either higher or lower values are associated with a higher risk of disease.

4 It is also possible to compare curves at a certain sensitivity or specificity cutoff.

5 −, no processing possible; +, processing possible; ++, good processing.

1 Nonstandard abbreviations: tPSA and fPSA, total PSA and free prostate-specific antigen, respectively; AUC, area under the ROC curve; and CI, confidence interval.

2 Both authors contributed equally to this article.

- © 2003 The American Association for Clinical Chemistry