BACKGROUND: Laboratory medicine practice guidelines (LMPGs) are an important part of clinical laboratory medicine. The Appraisal of Guidelines for Research and Evaluation II (AGREE II) instrument has been developed to evaluate the process of practice-guideline development and the quality of reporting. We assessed the applicability of AGREE II in assessing the National Academy of Clinical Biochemistry (NACB) LMPGs.
METHODS: The NACB website was searched for all available LMPGs up to December 2011. Two independent appraisers used the AGREE II instrument to assess each LMPG identified by the search. Quality was assessed across 6 domains (scope and purpose, stakeholder involvement, rigor of development, clarity of presentation, applicability, and editorial independence), comprising a total of 23 items and 2 overall assessments, each scored on a 7-point scale (1, strongly disagree, to 7, strongly agree). All scores were expressed as AGREE II calculated percentages (100% indicates that all items scored 7 by all appraisers).
RESULTS: Eleven LMPGs were identified. All of the LMPGs provided some information seen as applicable to clinical practice by the appraisers. Only 5 of the LMPGs had overall scores ≥50%, with a median score of 42% (range: 8%–92%). Individual domain scores varied considerably from 0% to 100%. One guideline achieved a very high score on the instrument.
CONCLUSIONS: The AGREE II instrument is applicable and useful for evaluating LMPGs. All domains were judged useful for assessing LMPGs; some were addressed well (e.g., clarity of presentation), whereas others could be improved (e.g., applicability).
The National Academy of Clinical Biochemistry (NACB)5 has produced several laboratory medicine practice guidelines (LMPGs) intended to inform clinical laboratory practice. These guidelines are the most relevant source of information regarding evidence-based practice in laboratory medicine for clinical chemistry. There are other guidelines produced at local, regional, and national levels that are specific for laboratory testing or that include aspects of laboratory testing.
Standardized protocols, methodology, and ways of reporting evidence for guidelines have been developed (1–3). Following a standardized methodology should result in higher quality clinical practice guidelines (CPGs). The standardization of guideline methodology is intended to make it easier for all healthcare providers and the public to appraise, agree with, and implement the guideline (4, 5). A good example of guideline standardization is provided by the National Guideline Clearinghouse, which produces standardized summaries of guidelines, reformatting their content and verifying critical information to further support implementation and use of the CPG (6). Standardizing the production of guidelines at the source by emphasizing good methodology is the ultimate goal (7). Initially, it is important to use appropriate tools to ensure a reliable and reproducible selection and appraisal of the evidence used for CPGs during the systematic review step of the guideline development process (2, 4, 5). Subsequent steps will ensure the transparent and accountable translation of that evidence into the guideline recommendations (8, 9).
Because most guidelines are issued in the fields of treatment and prevention, the applicability of the general guideline methodology to LMPGs must be considered: laboratory-related guidelines may include several specific aspects that are not considered in guidelines for producing CPGs or in evaluation instruments designed for CPGs (10). For example, CPGs for diabetes mellitus have been reviewed on 2 occasions, and on both occasions the methodology used and the reporting of laboratory testing were found to be variable and not standardized (11, 12). This finding raised the question of the validity of the clinical laboratory testing component of these CPGs (12). In addition, the suitability of all-purpose guideline methods for producing LMPGs may be questioned.
The Appraisal of Guidelines for Research and Evaluation (AGREE) collaboration has been formed with the purpose of improving the quality and effectiveness of CPGs (13). The AGREE collaboration has developed an instrument for use in evaluating guidelines, the AGREE II instrument (8). This instrument has been used to evaluate CPGs covering a range of clinical topics around the world. The AGREE (9) or AGREE II instrument also has been used to evaluate guidelines that relate to aspects of laboratory medicine (11, 12, 14–17). In addition to the reviews of diabetes CPGs (11, 12), Watine et al. published an example of evaluating CPGs with laboratory information using the AGREE instrument (17). However, no publication evaluating a series of LMPGs has been produced to confirm the applicability of AGREE II to LMPGs. The current study was designed to demonstrate the applicability of the AGREE II instrument to LMPGs by using it to evaluate a series of LMPGs produced by the NACB. This work has enabled us to report the quality of NACB LMPGs and make recommendations for improving future LMPGs.
The NACB website was searched for all LMPGs published up to December 2011. All LMPGs identified as current and available for downloading as full files were eligible for inclusion (18). Other published versions were not used. When available, associated supplemental files were also collected from the NACB website.
The AGREE II instrument (8) was used to evaluate each of the identified LMPGs produced by the NACB (18). The AGREE II is organized into 6 quality domains (scope and purpose, stakeholder involvement, rigor of development, clarity of presentation, applicability, and editorial independence) comprising a total of 23 items (9). Each item targets 1 key aspect of practice guideline quality. The instrument also includes 2 overall rating items requiring the appraiser to make overall judgments of the practice guideline (9). Each of the items and the 2 overall rating items were assessed for their applicability to the LMPGs and then rated on a 7-point scale (1, strongly disagree, to 7, strongly agree). A score of 1 was given when relevant information was very poorly reported or not provided. Scores from 2 to 6 were assigned when the reporting did not meet the full criteria or considerations for an item, with scores increasing as more criteria and considerations were met. A score of 7 was given when the quality of reporting was exceptional and all criteria and considerations were met in full for an item. Four appraisers (A.C. Don-Wauchope, J.L. Sievenpiper, S.A. Hill, A. Iorio) were trained to use the AGREE II instrument with the online training instruments provided on the AGREE trust website (13). Two appraisers independently reviewed each LMPG using an online or printed version of the AGREE II instrument. The eligible LMPGs were randomly allocated to appraisers with a random number generator. The independent appraisals were discussed by the team at a face-to-face meeting of all of the appraisers, and if the individual scores varied by 4 or more points on the 7-point scale, the individual scores were reconsidered after the documentation was reviewed. There was no assumption that the 2 scores should agree, nor was there any further adjustment of the revised scores (9).
Data are expressed as calculated percentage scores. The AGREE II instrument (8) calculates domain and overall rating scores by summing up all of the individual item scores for each appraiser (obtained score) minus the minimum possible score [minimum possible score per item (1, strongly disagree) × n items × n appraisers]. The total is expressed as a percentage of the maximum possible score [maximum possible score per item (7, strongly agree) × n items × n appraisers] minus the minimum possible score, according to the formula:

Scaled score (%) = [(obtained score − minimum possible score)/(maximum possible score − minimum possible score)] × 100
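As an illustration, the calculation above can be sketched in a few lines of Python (the function name and the sample scores are ours, for illustration only, not part of the AGREE II materials):

```python
def agree_domain_score(scores):
    """AGREE II scaled domain score (%) from per-appraiser item scores.

    `scores` is a list of lists: one inner list of 1-7 item ratings
    per appraiser, e.g. [[5, 6, 7], [4, 6, 6]] for 2 appraisers
    rating a 3-item domain.
    """
    n_appraisers = len(scores)
    n_items = len(scores[0])
    obtained = sum(sum(appraiser) for appraiser in scores)
    minimum = 1 * n_items * n_appraisers  # every item scored 1
    maximum = 7 * n_items * n_appraisers  # every item scored 7
    return 100 * (obtained - minimum) / (maximum - minimum)
```

For example, 2 appraisers scoring a 3-item domain as [[5, 6, 7], [4, 6, 6]] give (34 − 6)/(42 − 6) × 100 ≈ 77.8%, and a domain scored 7 on every item by every appraiser gives 100%.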
Agreement on individual items among the various appraisers was determined by weighted (quadratic) κ analysis of the final scores (Analyze-It, version 2.22, Analyze-It Software). We performed a sensitivity analysis by using the Mann–Whitney test to compare guidelines published before or after 2009 (publication of AGREE II).
Eleven LMPGs were identified (19–29) as current on December 5, 2011. For 4 (23, 25–27) of the 11 LMPGs, the reports included consideration of the methodological quality criteria of the AGREE/AGREE II instrument in the development of the guideline. Eight (19–21, 23, 25–28) of the LMPGs had an additional supplemental file that contained disclosures of conflicts of interest of the LMPG team. In our opinion these supplemental files were not easy to locate and were often not linked or referenced in the main version. The item on editorial independence was scored initially without, and then after considering, the supplemental files (Fig. 1). Table 1 has an additional row for editorial independence showing the supplemental scores.
The AGREE II instrument was applied successfully to all 11 of the LMPGs. Item 16 (the different options for management of the condition or health issue are clearly presented) was excluded for the Tumor Marker Quality Requirements LMPG (26). Both appraisers felt, and the team agreed, that this item was not applicable in the context of this guideline. All other items were deemed applicable to all LMPGs.
The overall scores for each guideline and the domain scores are presented in Table 2, and the individual appraiser scores are shown in Table 1. The quality of the guidelines, as scored by use of the AGREE II instrument, was generally poor. Only 5 (20–23, 25) of the 11 guidelines had overall scores ≥50%, with a median score of 42% (range: 8%–92%). The domains that contributed to lower overall scores were "Domain 3. Rigor of development" [28% (18%–85%)] (Fig. 2) and "Domain 5. Applicability" [25% (6%–67%)]. On the other hand, "Domain 1. Scope and purpose" [64% (31%–92%)], "Domain 2. Stakeholder involvement" [50% (11%–100%)], and "Domain 4. Clarity of presentation" [71% (19%–94%)] contributed to higher overall scores.
The Diabetes LMPG (23) scored the highest, reflecting the improvement in the way in which the guideline addressed the domains in the AGREE II instrument. It was also 1 of 2 guidelines to receive a "Yes" from both appraisers on the final item ("I would recommend this guideline for use") of the overall evaluation (Table 1 and Fig. 1). The overall score for the Emergency Toxicology LMPG (29) was very low, but 1 of the 2 appraisers still felt that it had some potential for use in a clinical context with modifications. Several of the other LMPGs (24, 26) received a "No" and a "Yes, with modifications" on the final item of the overall evaluation. However, no LMPG was judged not useful for clinical laboratory practice by both appraisers. All had aspects that contributed to good laboratory practice and were considered to have a role in clinical practice.
There were 40 of 253 scores (16%) that differed by 4 or more points on the 7-point scale; these required consideration for revision. Two domains [stakeholder involvement (27%) and editorial independence (22%)] had a higher percentage of differing scores (range across domains, 11%–27%). Most often the appraiser had not identified an item in the LMPG or had misinterpreted the information presented. After these 40 items were rescored, weighted (quadratic) κ demonstrated agreement of >0.5 for 9 of 11 LMPGs (Table 2).
When we compared domain scores for guidelines issued before or after 2009 with the Mann–Whitney test, no significant differences were noted (Table 3). However, the direction of change for many of the domains was toward improvement, suggesting that a more standardized guideline development methodology was being followed. For example, the median applicability score improved from 13% to 29% (P = 0.07).
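The Mann–Whitney comparison ranks the pooled pre-2009 and post-2009 domain scores. A minimal sketch of the U statistic (the test statistic only; the P value is computed from it by the statistical software) might look like the following, with illustrative data of our own:

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for two independent samples.

    Counts, over every pair (xi, yj), a win (1) when xi > yj and
    half a point for a tie, then reports the smaller of the two
    directional U values, as is conventional.
    """
    u_x = sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x for yj in y
    )
    return min(u_x, len(x) * len(y) - u_x)
```

When every score in one group exceeds every score in the other (e.g., [1, 2, 3] vs [4, 5, 6]), U = 0, the strongest possible separation; heavily overlapping groups push U toward its maximum of n1 × n2 / 2.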
INTERPRETATION OF DOMAIN SCORES
Interpreting the domain scores is subjective, but the percentage score can allow the user of the LMPG to gauge the overall utility of the LMPG.
Domain 1 (items 1–3) covers the overall aim of the document. The best example of a well-described aim was found in the Point-of-Care Testing LMPG (22) (92%), whereas the Expanded Newborn Screening LMPG (19) received the lowest score (31%). The item that was least clear in this domain was the population to whom the guideline is meant to apply (Table 1). The Diabetes guideline (23) describes this item very well (maximum score), whereas the Pharmacogenetics guideline (28) does not (minimum score). Domain 1 is important for LMPGs because it defines the area in which the LMPG should be found applicable in both laboratory and clinical practice. Without clear definition of this domain, the potential users of the guideline will find it difficult to assess its applicability to their situation.
Domain 2 (items 4–6) focuses on the inclusion of appropriate stakeholders in the development of the guideline to ensure that the guideline represents the view of its intended users. With the exception of the Diabetes guideline, this domain was not well addressed in the LMPGs. Domain 2 had a higher percentage (27%) of individual scores that required reconsideration, suggesting that the appraisers found it difficult to locate or interpret the information presented. The affiliations and roles of the committee members and the views and preferences of the public were poorly recorded in many of the LMPGs (Table 1). Domain 2 is important for LMPGs because the reader needs to be made aware that appropriate experts were involved in producing the document and that appropriate public representation and expertise were sought in its development. The list of committee members should include each member's affiliation, area of expertise, and the entity or organization the individual represents on the guideline committee.
Domain 3 (items 7–14) assesses the methodology used to develop the guideline. Three items were particularly poorly addressed in most of the LMPGs (Fig. 2). These were: (a) the use of systematic methods to find evidence; (b) descriptions of the methods used to evaluate and select the evidence; and (c) a description of the explicit plan to update the guideline. Domain 3 is crucial for the reader to assess the validity of the LMPG by enabling judgment of the underlying quality of evidence.
Domain 4 (items 15–17) covers the clarity of the presentation and structure of the guideline. The importance of domain 4 in any guideline is straightforward. Most of the guidelines were well written and clearly presented (Table 1). Item 16, describing the different options for management of the condition or health issue, was the item for which appraisers struggled to reconcile the explicit criteria for the AGREE II instrument with the laboratory-specific quality requirements guideline (26). This difficulty might be partly related to the specific field of clinical chemistry, but some attention to the offering and discussion of alternatives, such as screening, diagnostic, and prognostic test use, would certainly add value to the guidelines. However, item 16 is mostly based on the depth and breadth of the question for the LMPG. If the question is very laboratory specific it would probably be more difficult to meet the criteria for item 16. As reported in the results, this item was deemed not applicable in 1 guideline, because the question being addressed was focused on the area of quality of testing and not the utility of the laboratory test in clinical practice (26). However, if an LMPG addresses a clinical utility question, then explicitly stating the question should result in the ability to adequately address this domain.
Domain 5 (items 18–21) reviews the way in which the guideline describes the applicability of the recommendations to clinicians and clinical laboratory practice, including the identification of barriers and facilitators to implementation. This domain scored quite low for many of the LMPGs. Item 20 (“The potential resource implications of applying the recommendations have been considered.”) was very poorly addressed in all the LMPGs (Table 1). Item 21 (“The guideline presents monitoring and/or auditing criteria.”) was poorly addressed with the exception of the Diabetes LMPG. Domain 5 is an important section to consider in the development of LMPGs and we would encourage developers of LMPGs to address this domain in future guidelines.
Domain 6 (items 22 and 23) is concerned with the editorial independence and risk of bias of the document. Items 22 and 23, which deal with funding and conflicts of interest, were initially scored low in most of the LMPGs (Fig. 1 and Table 1). This finding reflects the fact that online supplemental files reporting conflicts of interest were not included or linked in the main report; in our opinion, this did not allow the average user to assess editorial independence easily. In addition, the layout of the website changed after December 2011, making the supplemental files somewhat easier to locate in May 2012. Table 1 demonstrates that scores for item 23 improved for 8 of 11 LMPGs when the supplemental files were considered. Reporting conflicts of interest is important for readers to understand the relationship between the funding agency and the authors of the guideline as well as to determine independence. Explicit listing of potential conflicts of interest is a requirement for most publications and presentations. Production of guidelines can potentially influence a wide range of practice, and it would be desirable to have any potential conflicts reported explicitly in the guideline. It would be preferable if the committee were selected from individuals who did not have substantive conflicts of interest. Representation from industry is important but should not influence guideline development.
The final evaluation of the guideline summarizes the overall opinion of the appraiser for the quality of the guideline and if the guideline is recommended for clinical use. This result represents the overall opinion of the appraiser (Fig. 1). It is important to note that the AGREE II instrument does not assess content validity of the LMPG (7, 17), let alone the quality of the evidence. What is important is that the methods by which the evidence was sought and weighted are clearly and reproducibly reported in the LMPG. The AGREE II instrument is designed to improve the reporting of LMPG methodology and to encourage clarity of presentation and transparent reporting of conflicts of interest. These improvements enable the reader to judge the validity of the CPG. A high score on the AGREE II instrument indicates that the information meets these criteria. A low score indicates that the information is not well presented or the methodology is not well reported. Of course a high-quality method does not eliminate the risk of relying on poor evidence, but a poor method does not allow the reader to appraise the value of evidence at all. We judged the quality of the process, including the quality of evidence selection and appraisal, and the quality and completeness of the process leading from evidence to recommendation. Direct assessment of the quality of evidence content is beyond the AGREE II instrument and was not assessed in this investigation.
UTILITY OF THE AGREE II INSTRUMENT
We found that the AGREE II instrument was useful in appraising 11 NACB LMPGs. The AGREE II domains and items were all applicable and appropriate in the evaluation of most LMPGs that consider the clinical application of laboratory tests. The AGREE II instrument was possibly less useful for the appraisal of LMPGs that considered purely technical, analytical, organizational, or quality aspects of clinical laboratory testing. We recommend that all groups writing LMPGs consider these criteria as they plan and prepare an LMPG. These findings concur with the previous use of the AGREE instrument in a laboratory medicine context (17).
Most of the LMPGs had overall scores <50%, which suggests that they were of lower quality. Although some of the domains were well covered and had higher mean scores, others were not. Table 3 demonstrates that the median overall score did not change, although many domain scores did. The most recent LMPG (Diabetes) (23), which reported the use of the AGREE II instrument in the development of the guideline, scored the highest, reflecting the improvement in the way in which the guideline addressed the domains in the AGREE II instrument. One can argue that this guideline scored better as the obvious result of the instrument having been used in the planning phase of guideline production, which is of course true. On the other hand, this assessment is not, in our opinion, biased, because AGREE II is the only instrument widely available to improve the quality of CPGs. That we found an LMPG that successfully implements the AGREE II recommendations is in fact a robust empirical demonstration that AGREE II can be used in the field of laboratory medicine.
For future LMPGs the following items should be considered more carefully, because these were items that could have been addressed but that scored poorly in this evaluation.
A specific description of the population to which the guideline (recommendation) applies is critical to the interpretation of the LMPG.
It is important for LMPGs to specifically describe who has produced the guideline and who has been consulted. Both expert and public opinion must be actively sought and reported. Detailed lists of funding sources and potential conflicts of the individuals and organizations involved in the LMPG should be made available as an embedded or linked appendix to the LMPG.
It is essential that every effort be made to ensure that all NACB LMPGs present their methodology explicitly, with transparency and assessable external validity, so that they become more accessible to both clinicians and clinical laboratories. Domain 3 requires the most improvement in future LMPGs. We recommend that clear and reproducible strategies for obtaining and appraising evidence be used and described.
A strategy for updating the LMPG must be clearly described in the document.
Writers of LMPGs should address the resource implications of applying the guideline's recommendations, as described in item 20. This will facilitate translation of guidelines into local practice. In addition, the LMPG should suggest ways of monitoring the recommendations and their implementation.
Setting more focused questions for LMPGs may result in a document that is easier to digest and put into practice, but this approach might limit appraisal of the role and the value of a specific diagnostic procedure in the full scope of healthcare.
The NACB has selected topics relevant and important to laboratory medicine, and people working in clinical laboratories or clinical practice have considered these LMPGs a source of good information. The AGREE and AGREE II instruments have been extensively adopted in other clinical fields to evaluate guidelines. These evaluations can be made available on the on-line database of the AGREE trust to people who wish to adapt a given guideline to their clinical practice (13). Our findings show that the AGREE II instrument is useful for the purposes of evaluating LMPGs, as demonstrated by the variability of the scores we obtained and by 4 guidelines that were reported to have been formulated considering the AGREE instrument in the development process (23, 25–27). This evaluation of the NACB LMPGs has also identified areas for improvement in most aspects of LMPG development and reporting, which could be taken into account in the NACB guideline process.
5 Nonstandard abbreviations:
- NACB, National Academy of Clinical Biochemistry;
- LMPG, laboratory medicine practice guideline;
- CPG, clinical practice guideline;
- AGREE, Appraisal of Guidelines for Research and Evaluation.
(see editorial on page 1392)
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Authors' Disclosures or Potential Conflicts of Interest: No authors declared any potential conflicts of interest.
Role of Sponsor: No sponsor was declared.
- Received for publication March 7, 2012.
- Accepted for publication July 11, 2012.
- © 2012 The American Association for Clinical Chemistry