As a part of ongoing efforts of the NCI-FDA Interagency Oncology Task Force subcommittee on molecular diagnostics, members of the Clinical Proteomic Technology Assessment for Cancer program of the National Cancer Institute have submitted 2 protein-based multiplex assay descriptions to the Office of In Vitro Diagnostic Device Evaluation and Safety, US Food and Drug Administration. The objective was to evaluate the analytical measurement criteria and studies needed to validate protein-based multiplex assays. Each submission described a different protein-based platform: a multiplex immunoaffinity mass spectrometry platform for protein quantification, and an immunological array platform quantifying glycoprotein isoforms. Submissions provided a mutually beneficial way for members of the proteomics and regulatory communities to identify the analytical issues that the field should address when developing protein-based multiplex clinical assays.
The in vitro diagnostics industry is experienced in the regulatory processes involved in the clearance or approval¹ of diagnostic tests and is knowledgeable about the many resources available to help in this regard. For example, regulations, recommendations, standards, and other useful material [e.g., Code of Federal Regulations, US Food and Drug Administration (FDA)² guidances, Decision Summary or Summary of Safety and Effectiveness for every FDA-cleared or -approved assay, package inserts of every approved premarket application assay, consensus standard documents developed and published by the CLSI, International Organization for Standardization documents] are often used during the development and validation of new diagnostic tests. Almost all of these resources are available online. The translational proteomics community, however, may not be fully aware of the intricate regulatory issues involved in moving a newly developed test to market. This complex process can appear particularly daunting when new technology [e.g., mass spectrometry (MS)-based proteomics measurement of clinical analytes] is applied. Understanding of the process is further complicated by the requirement that the FDA maintain confidentiality regarding test-specific discussions and submissions made for regulatory purposes, and device sponsors seldom make their interactions with the FDA available for public review. Thus, there are few opportunities for others to learn from this information.
To begin to demystify the process and to start addressing uncertainties in translational research about the types of studies and data needed to establish the performance of tests on complex protein-based platforms, the participants of the National Cancer Institute–FDA Interagency Oncology Task Force on Molecular Diagnostics workshop held in October 2008 recommended that the Clinical Proteomic Technology Assessment for Cancer (CPTAC) consortium take the lead in developing 2 “mock 510(k)” documents (nonregulatory documents fashioned in a format similar to a regulatory—in this case, 510(k)—submission) (1). The National Cancer Institute and the CPTAC consortium provided these documents to the FDA to obtain initial feedback in the hope of understanding the requirements for further development, as well as to help orient the FDA to multiplex MS and affinity array platforms. The documents sent to the FDA included both the original submissions and 1 supplement for each submission that was written as a response to the first draft of FDA feedback. The FDA’s feedback included questions and comments from the FDA reviewers working with the National Cancer Institute and CPTAC collaborators.
The FDA assigns every device to 1 of 3 classes, depending on the risk of the device, and review requirements are scaled to the classification of the particular test’s intended use. A brief overview of the FDA requirements for different classes of in vitro diagnostics assays is provided elsewhere in this issue (1). In general, an intended use with a higher risk will dictate that the assay be classified as class III, requiring a premarket application to the FDA, whereas a moderate-risk intended use would receive a lower-risk classification (class II), in most cases requiring a 510(k) submission. For classification purposes, the FDA primarily considers the risks associated with in vitro diagnostics assays in terms of the harm derived from an incorrect result (false-negative and/or false-positive results), not merely whether the procedure is invasive. For example, what would be the risk of an assay reporting a false-negative result when a woman actually has breast cancer? Would a false-positive result be followed by unnecessary surgery or a treatment with toxic side effects?
Materials and Methods
The following mock 510(k) submissions were provided to the FDA for review:
PepCa10—a multiplex diagnostic test that uses immunoaffinity MS protein quantification (multiplex MS);
SDIA—an immunological array platform for simultaneous assay of multiple glycoprotein isoforms (multiplex affinity array platform).
It is important that readers of this article not assume that any classification decision was made regarding these tests and their intended uses. Because the classification of a new assay depends on the risk of its intended use (2) and because the types of assays and their intended use(s) were not specified in discussions for implementing this exercise, the term “mock 510(k)” was chosen for convenience and is not reflective of any FDA determination of risk level of the tests described. Indeed, it is probable that the intended uses for the assays of these 2 submissions, if made for regulatory marketing authorization, would be classified in the high-risk (class III) category and therefore would require a submission of a premarket application (see supplemental materials in the Data Supplement that accompanies the online version of this Special Report at http://www.clinchem.org/content/vol56/issue2 for specific comments regarding the regulatory pathway and intended use). Those interested in the classification of a particular test should seek additional information from the FDA with respect to the particular intended use of their test and consult the accompanying report in this issue (1).
The efforts we describe represent a small but useful step in the direction of wider understanding of these critical processes. A submission called a “pre-IDE” (for investigational device exemption) initiates an informal process for obtaining regulatory feedback from the FDA about a manufacturer’s proposed analytical and clinical studies in support of an upcoming premarket assay submission (3). As indicated in Fig. 1, the pre-IDE process can occur approximately halfway through the process by which a successful biomarker discovery becomes a successful FDA-approved commercial diagnostic test. The 2 mock submissions (available in the online Data Supplement) were intended to emulate submissions to the FDA for premarket clearance or approval of 2 hypothetical protein-based multiplex diagnostic devices. These submissions were prepared by proteomics translational-biomarker researchers, and because they were mock submissions, they were sent as pre-IDE submissions. In these submissions, the proteomic researchers provided very detailed descriptions of the platforms and assay designs (i.e., in the Device Description section of a submission). The mock submissions were nevertheless less detailed than a premarket submission would be: they described hypothetical analytes, provided little or no information about the chemicals used, contained no performance data, included incomplete evaluation studies and protocols, and were reviewed on an extremely limited timeline, all of which limited how focused the FDA comments could be. Even so, the exercise has been very instructive for the submitters. The FDA review comments to the submitters took 2 formats: (a) a review memo that asked relatively general questions that would normally require submitters to respond with additional supporting data and (b) comments interpolated within the originally submitted documents pinpointing additional questions of regulatory and scientific concern.
The complete mock submissions as well as the FDA reviewers’ comments and questions are available in the online Data Supplement. Readers should keep in mind several important points regarding the supplementary information:
These 2 mock submissions are simply presubmissions. Only partial and hypothetical analytical data were submitted. There were neither clinical data nor information regarding performance of the specific instrumentation, software, and other equipment.
Not all regulatory issues regarding the submitted material were resolved, given that no regulatory decision was intended to be made because of this exercise.
Considering that the submitted data were hypothetical, the FDA responses may not be fully representative of comments on an actual data submission. Thus, in a true regulatory submission, additional and/or different comments may be made.
Readers should consider the supplementary information as helpful in understanding the regulatory process and lines of questioning, while at the same time realizing that it does not represent a complete regulatory review.
Results and Discussion
A series of interesting issues emerged in the mock submission process that could facilitate future attempts to introduce real multianalyte protein-based clinical tests with the 2 approaches covered here.
PepCa10: a multiplex immunoaffinity MS protein-quantification test
This mock submission (file “Submission affinity multiplex with FDA comments v1” in the online Data Supplement) describes a hypothetical 10-plex MS-based test (PepCa10) that can measure the concentrations of 10 tryptic peptides (2 peptides from each of 5 proteins) in a patient plasma digest. The result of this test is intended to aid physicians in deciding whether to recommend a breast biopsy for patients with suspicious or abnormal mammogram results. The peptide analytes are captured from the digest by specific affinity reagents (antipeptide antibodies) and measured by quantitative triple-quadrupole MS in relation to 10 respective internal standards, which are stable isotope–labeled peptides of identical sequence. The individual peptide measurements are combined by the proprietary PepCa10 algorithm to yield a single score. With the use of an appropriate cutoff, the diagnostic test would provide a binary final output: “low risk” (negative) or “high risk” (positive).
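The measurement chain described above (peak-area ratio against a labeled internal standard, a combined score, and a cutoff) can be illustrated with a minimal sketch. The PepCa10 algorithm is proprietary and is not described in the submission, so the weighted sum of log concentrations used here is purely a hypothetical stand-in, as are the function names, weights, and cutoff. The concentration estimate assumes single-point calibration with equal MS response for the labeled and unlabeled peptide.

```python
import math


def peptide_concentration(analyte_area, standard_area, standard_conc):
    """Isotope-dilution estimate of one peptide's concentration.

    Assumes the unlabeled analyte and its stable isotope-labeled internal
    standard give equal MS response, so the peak-area ratio times the known
    spiked standard concentration estimates the analyte concentration.
    """
    if standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return analyte_area / standard_area * standard_conc


def combined_score(concentrations, weights, intercept=0.0):
    """Hypothetical scoring rule: weighted sum of log concentrations.

    This is NOT the PepCa10 algorithm (which is proprietary); it only
    shows how several peptide values can collapse into a single score.
    """
    if len(concentrations) != len(weights):
        raise ValueError("one weight is required per peptide concentration")
    return intercept + sum(w * math.log(c) for w, c in zip(weights, concentrations))


def classify(score, cutoff):
    """Binary final output: scores at or above the cutoff are 'high risk'."""
    return "high risk" if score >= cutoff else "low risk"
```

In a real assay the weights and cutoff would have to come from clinical studies designed around the intended use; the values passed to these functions carry no clinical meaning.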
The hypothetical PepCa10 submission highlighted several areas (described below) in which protein measurement could be substantially improved compared with current technology, although the MS-based approach lacks the depth of experience and detailed understanding of failure modes acquired for immunoassays over the past 40 years.
There are FDA-cleared tests that use mass spectrometers, such as for drug and metabolite monitoring, but none of the more recently developed mass spectrometer platforms used widely in proteomics research has been approved by the FDA for clinical laboratory use. The currently available triple-quadrupole mass spectrometers and their software were not manufactured under good manufacturing practices and so were not in compliance with the FDA’s quality system regulation (4) required by the FDA for diagnostic assays and instruments to assure consistent performance in clinical use. This fact might not be an issue, given that major manufacturers appear to be developing plans for FDA-approvable instrumental platforms manufactured under good manufacturing practices, which are likely to become available in the next few years.
In at least one respect, the inclusion of a sample-digestion step (converting proteins to peptides) represents a definite increase in assay complexity compared with standard immunoassays. Consequently, there will likely be a need to develop new approaches to understand and control variation in this process. It would be helpful, if not necessary, to include an additional level of internal standardization and/or process QC to ensure such control.
Particular features of MS-based protein/peptide assays have the potential to overcome some of the significant limitations of current immunoassays while possibly introducing other challenges. These differences lead to unique scientific questions during regulatory review. For example, it appears possible to achieve extremely high structural specificity for peptide analytes with tandem MS, and with it the capability to recognize and reject specific interferences within the assay itself. If proved, this capability could have the potential not only to eliminate the need to individually test potential interfering compounds, which would be extremely difficult in the case of the hundreds of thousands of peptides in a plasma digest, but also to facilitate flexible multiplexing in a single fluid volume. Because the advantages of eliminating interference are so substantial, the rigorous demonstration of this capability would appear to be a key item on the agenda of research groups such as the CPTAC consortium. It would be extremely useful to both proteomics and regulatory communities if MS proteomics experts could design and perform adequate studies that can demonstrate absence of interference effects for a particular platform and assay.
The potential for absolute structural specificity in MS peptide detection offers the possibility of producing true internal standards against which to measure analyte concentration. Differential isotope-labeling schemes for peptides could enable verification of the multiple stages of sample processing separately and thereby improve assay reliability. Early demonstration of such capacities could simplify test development.
Although the concept is beyond the scope of direct FDA regulation, submitters propose that substantial benefits might be derived from a common assay methodology, and possibly a common instrument platform, that could be used in both biomarker-verification studies and diagnostic testing in the clinical laboratory. Coupled with the use of chemically indistinguishable (i.e., stable isotope–labeled) internal standards, MS-based laboratory analyses could begin to approach a level of performance at which the differences between alternative commercial implementations of an assay might be substantially reduced.
SDIA: immunological array platform for simultaneous assay of multiple glycoprotein isoforms
This mock submission (file “Submission MS immunoaffinity multiplex with FDA comments v2” in the online Data Supplement) describes an assay for glycoprotein markers associated with tumor metastasis that is aimed at addressing the issue of glycan variants on multiple cancer marker proteins and that uses an immunological array platform. Capture of the members of specific protein families would be achieved at individual array elements with antibodies that target a peptide epitope common to all members of the family. Members bearing the disease-associated structural feature(s) (in this case, breast cancer–associated Lewis antigens) would be targeted with a fluorescently labeled second antibody. Subsequent to the capture of all family members on specific array elements, the portion bearing the disease-associated glycan feature would be quantified by fluorescence at individual array elements. Up to 10 glycoproteins would be assayed in a single test. The use of assay wells 1 mm in diameter and containing 120 antibody array elements would make it possible to assay each glycoprotein at multiple array elements within the same well. Immunological array platforms such as the one described in the mock submission can generally run 100 samples/h in plasma directly, with purported antigen limits of detection in the range of 1–10 ng/L. It will be important to demonstrate that one can measure intact antigens in a highly quantifiable way with these assays and thereby capture the tertiary and quaternary structures, as well as the posttranslational modifications, relevant to the disease.
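The replicate arithmetic implied by the array layout can be sketched as follows. The sketch assumes that the 120 array elements in a well are divided evenly among the 10 glycoproteins, which the submission does not state, and it summarizes replicate fluorescence readings by their mean and percent CV. All names are hypothetical.

```python
from statistics import mean, stdev

ELEMENTS_PER_WELL = 120      # antibody array elements per well (from the text)
GLYCOPROTEINS_PER_TEST = 10  # maximum number of glycoproteins per test (from the text)


def replicates_per_glycoprotein(elements=ELEMENTS_PER_WELL,
                                analytes=GLYCOPROTEINS_PER_TEST):
    """Replicate elements per glycoprotein, assuming an even split (an assumption)."""
    return elements // analytes


def summarize_replicates(signals):
    """Return (mean, percent CV) of replicate fluorescence signals.

    Uses the sample standard deviation, so at least 2 replicates are needed.
    """
    m = mean(signals)
    cv_percent = stdev(signals) / m * 100.0
    return m, cv_percent
```

Under the even-split assumption each glycoprotein would be read at 12 elements per well, which is what makes a within-well estimate of imprecision possible.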
The hypothetical SDIA submission, like the PepCa10 submission, demonstrated several areas in which the measurement of proteins could be substantially improved, compared with current technology, while noting that these new methods will require additional depth of experience and detailed understanding of the failure modes for this type of technology.
Assay analytical specificity arose as an area of concern in the preparation and review of the mock submission documents. Assay analytical specificity could be improved from what is available in many of the current immunological assays. There are at least 20 glycoforms of prostate-specific antigen, for example, which are currently being measured together. Failing to distinguish among such glycoforms could be important given that glycosylation patterns can undergo major cancer-associated changes. Modern proteomics is revealing many similar cases in which there are large numbers of proteins of similar structure. Relatively few genes give rise to many protein species: a large number of protein variants can be generated by gene polymorphism, alternative splicing of pre-mRNA, posttranslational modifications, and protein complexation, and many of these protein species vary in biological function and activity. The importance of these facts for clinical diagnostics is the high probability that a single protein variant, or a small number of specific variants, plays a role in the mechanism of a disease. When that is the case, the analytical specificity of the test will be increased greatly by measuring only the disease-associated variants or isoforms. Such a test will be challenging to develop given that small differences at a single position, such as the stereochemistry in a glycan or the location of a splice junction, can be the differentiating feature in a macromolecular biomarker. Accomplishing this level of analytical specificity in a multiplexed assay platform requires a high level of selectivity.
In considering common features of future assay systems, we assume that capture and enrichment of analytes will remain a major feature of protein assay systems. Moreover, no technology on the horizon appears to be capable of displacing antibodies and other high-specificity binding proteins as the capture agents of choice. What will change is the need for better qualification of antibodies. Although antibodies typically purify a protein analyte 1000-fold or more in the capture process, it is well known that they also bind other proteins through a variety of mechanisms. Nonspecific binding of proteins and peptides to antibodies and support matrices is well documented. Cross-reactivity with other proteins and peptides bearing the same or similar epitopes is another capture mechanism. Protein complexes to which the analyte is bound are captured as well. Regulatory agencies may require information regarding which proteins and peptides are captured during an assay and how captured nonanalytes affect the measurement of a specific analyte. Modern proteomic methods should be able to address the issue of identifying nonanalytes.
As the mock submission documents were being prepared, the understanding emerged among protein researchers that the degree of complexity in preparing and validating tests of simultaneous parallel measurements of different analytes or biomarkers increases drastically with the number of biomarkers being measured. That is because when particular analyte values are reported for each queried species, each analyte must be independently validated, and assurance should be available that the measurement of one analyte does not affect the quality of measurement of another. Although simultaneous measurement of 100 biomarkers may be possible in the future, the effort of validating and correlating the relationships between these many markers and the target condition becomes formidable. It is important to differentiate a multiplex assay such as SDIA in which the result for each marker in the assay is reported separately from a multianalyte signature assay in which the result is reported as a single score. The validation approaches would be different.
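One way to see why the effort grows so quickly is to count the checks involved. The sketch below assumes, purely for illustration, that every analyte needs its own validation and that every ordered pair of analytes needs one check that the first does not disturb measurement of the second. This counting scheme is an assumption for illustration, not an FDA requirement.

```python
def validation_burden(n_analytes):
    """Count illustrative validation tasks for an n-plex assay.

    Returns (individual_validations, cross_interference_checks), where the
    second value counts ordered analyte pairs (i affects j, with i != j).
    """
    if n_analytes < 1:
        raise ValueError("an assay needs at least one analyte")
    individual = n_analytes
    cross_checks = n_analytes * (n_analytes - 1)
    return individual, cross_checks
```

Under these assumptions the individual validations grow linearly with the number of analytes, whereas the pairwise checks grow quadratically, so a 100-plex assay would involve roughly 100 times as many pairwise checks as a 10-plex assay.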
Common lessons and next steps
One critical lesson gained from this process is that any real submission to the FDA would need a strong scientific and analytical foundation and a clinical study design that reflects the proposed intended use of the diagnostic test. The FDA classifies and reviews submissions in the context of the intended use (claim) of the assay. The intended use should be clear and well defined so that the FDA can understand the assay risk and determine the regulatory route for the premarket submission. Moreover, the FDA reviews all of the results of the study of analytical and clinical performance on the basis of the diagnostic claims made in the assay’s intended use and indications for use. Performance studies need to be designed to demonstrate the safety and effectiveness of a new test as it is intended to be used. For this reason, before the analytical and clinical evaluation process for novel and complex assays is begun, it would be helpful for a sponsor to consult with the FDA’s review divisions to discuss appropriate performance-evaluation studies. The biomarker community would be well served if these considerations were understood and had a direct influence on planning, even for early discovery studies.
Besides the intended use of the test, the types of preclinical studies needed for performance evaluation will also depend on the technology used. The mock submission exercise we have described is the first step toward formulating the types of preclinical studies specific to multiplex protein-based assays that may be appropriate for adequately evaluating an assay with this technology. The nature of preclinical studies will also depend on the type of results reported: quantitative, qualitative, or semiquantitative (ordinal).³ For example, a diagnostic test with quantitative results will require a linearity study, whereas a qualitative test with 2 outcomes (i.e., positive or negative) generally does not need this type of analytical study. The design of many analytical studies, such as for precision, will depend on identifying the major sources of imprecision for a specific test type and addressing these in a multisite reproducibility study (over a number of days, reagent lots, and so on), in which the sites chosen should represent the intended sites of the assay’s use. In this way, test performance provided in the FDA submission and assay labeling will reflect the performance of the assay when it is carried out for medical decision-making purposes in the clinical laboratory. It is important to note that the major sources of assay imprecision might vary for different proteomic technologies, and although there are some general documents that sponsors should consult or reference in performing evaluation studies (see references to CLSI documents in the review comments), not all reproducibility studies will necessarily need to have the same design.
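As a rough illustration of how sources of imprecision can be separated in a multisite study, the sketch below applies a balanced one-way random-effects analysis of variance with site as the only factor. Real reproducibility studies usually include more factors (days, runs, reagent lots) and should follow the relevant CLSI documents. The function names and the single-factor design are assumptions made for this example.

```python
from math import sqrt
from statistics import mean


def variance_components(results_by_site):
    """Estimate within-site and between-site variance (balanced design).

    results_by_site: list of lists, one inner list of replicate results per
    site, all of the same length. Returns (within_variance, between_variance).
    """
    k = len(results_by_site)
    n = len(results_by_site[0]) if k else 0
    if k < 2 or n < 2 or any(len(site) != n for site in results_by_site):
        raise ValueError("need at least 2 sites, each with the same number (>= 2) of replicates")

    site_means = [mean(site) for site in results_by_site]
    grand_mean = mean(site_means)

    # Mean square within sites: pooled replicate scatter around each site mean.
    ss_within = sum((x - m) ** 2
                    for site, m in zip(results_by_site, site_means)
                    for x in site)
    ms_within = ss_within / (k * (n - 1))

    # Mean square between sites: scatter of site means around the grand mean.
    ms_between = n * sum((m - grand_mean) ** 2 for m in site_means) / (k - 1)

    # Negative estimates are truncated to zero, a common convention.
    between_variance = max(0.0, (ms_between - ms_within) / n)
    return ms_within, between_variance


def reproducibility_sd(results_by_site):
    """Total SD combining within-site and between-site components."""
    within, between = variance_components(results_by_site)
    return sqrt(within + between)
```

If the between-site component dominates, site-specific factors such as operators or instruments are the likely major source of imprecision, which would shape how many sites a sponsor includes.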
Something worth considering within the CPTAC community would be to take one MS assay through a set of performance studies equivalent to those required for a 510(k), e.g., an interlaboratory reproducibility study. Such an investigation could move this field one step further toward revealing the issues that will need to be evaluated for a real cancer test.
A number of reviewers at the FDA had the opportunity to participate in the review of the 2 mock regulatory submissions, and this experience was valuable for them as well as for the proteomics community. Although there will be continuing effort to sufficiently develop various parts of the submissions for multiplex MS and affinity array platforms (such as appropriate analytical studies needed to demonstrate assay performance beyond description of the platforms and assay principles), these documents help establish a baseline for identifying issues and review expertise for proteomic technologies when FDA clearance or approval is sought in the future.
The goal of this report is to demonstrate the process and interactions between the sponsor and the FDA in a fashion similar to how they would proceed generally. Additionally, the feedback provided by the FDA offers some insight into the review issues that are relevant to these types of tests. Because the sponsors of the 2 mock submissions did not submit full responses and appropriate data and because they submitted hypothetical data in large part, many issues and problems were not raised or discussed. For these reasons, this document is not meant to be inclusive of all the requirements for any future submission that would be made to the FDA; however, this experience is a valuable step in helping to empower the scientific community with the right tools, and it serves as a preview of the regulatory mindset and direction for multiplex protein assays.
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest:
Employment or Leadership: F.E. Regnier, Quadraspec Inc.; P. Tempst, Memorial Sloan-Kettering Cancer Center.
Consultant or Advisory Role: F.E. Regnier, Quadraspec Inc.; L.G. Kessler, Underwriters Laboratories.
Stock Ownership: F.E. Regnier, Quadraspec Inc.; N.L. Anderson, Anderson Forschung Group.
Honoraria: F.E. Regnier, NIH.
Research Funding: F.E. Regnier, National Cancer Institute, Clinical Proteomic Technology Assessment for Cancer; P. Tempst, National Cancer Institute–NIH; L.G. Kessler, National Cancer Institute; N.L. Anderson, Agilent Technologies.
Expert Testimony: F.E. Regnier, Goldberg trial (civil lawsuit in Phoenix, AZ).
Other Remuneration: F.E. Regnier, International Symposium on Microcolumn Separations, Dalian, China; S.J. Skates, Teva Inc.
Role of Sponsor: The funding organizations played a direct role in the preparation of the manuscript.
¹ The FDA “approves” premarket application submissions and “clears” 510(k) submissions. For the purposes of this report, the words “approved” and “cleared” have the same meaning and are not related to any proposed or real classification decision for any device.
² Nonstandard abbreviations: FDA, US Food and Drug Administration; MS, mass spectrometry; CPTAC, Clinical Proteomic Technology Assessment for Cancer; IDE, investigational device exemption.
³ An ordinal quantity is a quantity for which a total ordering relationship can be established according to magnitude. For example, the mass concentration of a protein can be expressed as +, ++, and +++. Differences and ratios of ordinal quantities have no meaning.
© 2010 The American Association for Clinical Chemistry