Many investigators have used proteomic methodologies to investigate specific questions in basic biology and medical science. Indeed, many millions of dollars have been spent around the world on the generation of extensive lists of proteins that reside in a particular organelle, cell type, tissue, or body fluid. These lists represent the proteomes of those specific systems. Many studies attempt to go one step further and compare proteomes via a variety of approaches, in most cases with methods that are much too imprecise to be considered for clinical use. Some argue that it is difficult to point to the clinical application of these lists and crude comparisons, but at least 2 organizations are aiming to lead or synthesize expanded efforts to characterize the human proteome (see the suggested reading in the Data Supplement that accompanies the online version of this Q&A at http://www.clinchem.org/content/vol57/issue10). What does this mean for clinical laboratory medicine? Can the results of these efforts possibly lead to important discoveries that will change how we care for our patients? We compiled the opinions of 4 experts in proteomics and genomics to provide a perspective from individual leading investigators and the National Human Genome Research Institute. Their views provide cautious optimism and important insights on how to move forward.
In general, have experiments aimed at cataloguing human proteomes made important advances?
N. Leigh Anderson: Yes. I would point to 3 significant results as examples: (1) the reliable detection of almost 2000 proteins in human plasma by numerous groups; (2) passing the halfway point in the Human Proteome Resource effort to use antibodies to map locations of the human proteins by Matthias Uhlen et al. (see suggested reading in the online Data Supplement); and (3) the recent synthesis and characterization of nearly all human tryptic proteotypic peptides by Robert Moritz, Ruedi Aebersold, and others. These have resulted from focused long-term efforts and lay a strong foundation for applied clinical research.
Ruedi Aebersold: I think they have made significant advances. First, there have been impressive technical advances that are reflected in the increasing number of proteins credibly identified. Second, the protein catalogs have generated a lot of new insights into the proteome and its organization. However, I believe that the time of cataloguing the protein contents of samples has passed. So has the time of simply recording changes in the proteome between 2 states. The focus must shift toward reliably and reproducibly quantifying a set of proteins in many samples.
Adam Felsenfeld: Limiting the answer specifically to results from large-scale cataloguing efforts, there have been no “blockbuster” results with clear clinical significance (specifically, it has been very difficult to identify reliable disease biomarkers), but notable efforts in structural genomics and efforts to comprehensively characterize the cellular localization of proteins are likely to enable basic and clinical research.
Is the “human proteome” a well-defined concept?
Daniel C. Liebler: No. This is the main problem with the whole notion of a “human proteome project.” Proteomes differ between tissues and between biological states. An attempt to select any particular tissue to represent “the proteome” would be arbitrary. The real utility of proteomics comes from defining characteristics of specific proteomes and subproteomes in biological contexts. Without context, a proteome is just a list.
N. Leigh Anderson: No. A simplified version, defined as a “prototypical” protein for each of the 20 000+ protein-coding genes, has some utility as a goal. However, the true size of the complete human proteome, including all splice variants and all posttranslational modifications, is a combinatorial problem of immense and unknown size and has little value as a project goal. Any large-scale coordinated effort needs to carefully define a clinically relevant subproteome that is completable in a realistic sense. For example, “the proteins normally present at >1 μg/L in plasma” is a reasonable subproteome and probably includes 1000–2000 proteins.
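The combinatorial explosion Anderson describes can be made concrete with a back-of-the-envelope calculation. The sketch below uses entirely hypothetical averages (splice isoforms per gene, modifiable sites per isoform) purely to illustrate how quickly the count of distinct proteoforms grows; the real numbers are unknown and vary widely by protein.

```python
# Toy illustration of combinatorial proteome size.
# All per-protein averages below are hypothetical assumptions, not measured values.

GENES = 20_000        # order of magnitude for human protein-coding genes
SPLICE_VARIANTS = 4   # assumed average splice isoforms per gene
PTM_SITES = 10        # assumed average modifiable sites per isoform

# Simplest possible model: each site is either modified or not,
# so each isoform has 2**PTM_SITES possible modification states.
forms_per_gene = SPLICE_VARIANTS * 2 ** PTM_SITES
total_forms = GENES * forms_per_gene

print(f"{total_forms:,} distinct proteoforms under these toy assumptions")
```

Even with these deliberately modest assumptions the count runs into the tens of millions, and real posttranslational modifications admit more than two states per site, which only inflates the number further. This is why a bounded, clinically defined subproteome is a more tractable project goal than "the" complete proteome.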
Adam Felsenfeld: At its extreme, the definition is the comprehensive set of all human proteins, including the products of all coding sequences and alternatively spliced messenger RNAs, all modification states, and all variant proteins arising from human genetic variation. However, capturing all such information is well beyond the capabilities of current technology, and there have not yet been any efforts that have attempted to be truly comprehensive. Most current and previous efforts have very reasonably focused on a subset of proteins. In some cases, such a subset has been termed a “proteome,” but in such cases comprehensiveness is in the eye of the beholder.
Ruedi Aebersold: I think that to make the term operationally useful we should define layers of the proteome that correspond to technically addressable challenges (see the suggested reading in the online Data Supplement). More specifically, there are proteins and their isoforms encoded by the genome, each of those protein isoforms can have splice variants, and each of the spliced protein isoforms can be posttranslationally modified. Thinking of each level as a zoomable proteome map, one can see that these resources will allow investigators to determine more efficiently the biologically or clinically relevant information about proteins for a specific project.
The Human Proteome Organization (HUPO) and BGI (formerly the Beijing Genomics Institute) have proposed large-scale projects to characterize the human proteome. Is this an important priority for the scientific community?
Adam Felsenfeld: Large-scale characterization of human proteins is a high priority. Without tackling the problem at a large scale with some coordination, we will never do more than scratch the surface. There are several high-priority large-scale proteomics efforts under way, and others that are being contemplated. However, as we have discussed, all of these address a subset of the human proteome, and in the opinion of the National Human Genome Research Institute, that is all that is technically practical at present. It is often difficult to decide whether more development is needed before starting large-scale projects; often it is not possible to determine whether there are technical barriers to achieving a certain goal or what those barriers are without actually beginning an effort at some scale. Regardless, we need to improve proteomics technologies to the point that truly comprehensive projects are feasible.
Daniel C. Liebler: An in-depth analysis of any human proteome by a network of laboratories would cost many millions of dollars. Even if done with highly standardized analysis platforms, the best possible outcome of the work would be a list of proteins and their modified and variant forms. The project proposals under discussion by HUPO lack focus or any clear biological context in which to do such studies. This is not a defensible use of resources, particularly at a time when research budgets are under tremendous strain.
Ruedi Aebersold: I do not know what the BGI plans to do beyond spending a lot of money on instruments. The HUPO plan is actually not a large project but an attempt to generate and make generally available resources, reagents, and data that should allow the research community in general to perform better, more reliable, and higher-quality proteomic studies. The HUPO Proteome Project is therefore enabling researchers.
Many people liken the importance of characterizing the human proteome to that of the effort to sequence the human genome. Do you think that characterization of the human proteome will lead to as many advances for medicine as sequencing of the human genome?
Ruedi Aebersold: I think that the human proteome is at least as important for basic and clinical research as the human genome, because it captures the actual state of a cell or tissue. However, a proteome map is useful but not sufficient. The value of proteomics comes from the dynamics of the system, and these differ in essentially all biological or clinical studies. Therefore I think—and this is supported now by the HUPO plans—that the focus should be not on a map but on enabling any researcher to reliably and quickly measure the proteome in a state that is relevant for his/her studies.
N. Leigh Anderson: While this analogy between genome and proteome is widely used, it is misleading; DNA sequencing (basically digital) is far easier than proteome characterization (basically analog) and, owing to PCR amplification, is rarely limited by dynamic-range issues. Nevertheless, I think a real understanding at the protein level will be more productive for medicine than genomics has been, principally because our ability to predict disease manifestation from genetic data, apart from a few spectacular examples, is so very limited.
Daniel C. Liebler: I think the analogy is faulty. Whereas the human genome is largely invariant between individuals and over the course of a life span, proteomes are different in all tissues and are constantly changing. To achieve an impact similar to that of the human genome project, a proteome project would have to characterize many proteomes in many contexts, which would be beyond the scope of resources conceivably available. What would be valuable instead would be focused studies that apply standardized technology platforms to characterize proteomic differences between distinct tissue phenotypes in biologically and clinically important contexts.
Adam Felsenfeld: The analogy with the Human Genome Project does have some validity. There are common lessons about mounting a large-scale scientific effort, and I would say that while the quest for comprehensiveness is important, genomics has shown that there is potentially a huge amount of value in even incomplete “-omic” data sets. But there are also many differences; at all levels, from the biological complexity to the sociology, proteomics will need to forge its own course and learn its own lessons. As with genomics, successfully characterizing the human proteome will undoubtedly lead to very significant medical advances across a very broad range of diseases. But as with the genome, the pathway is not going to be direct, and is probably unpredictable. Also similar to genomics, discovery of a robust difference between a “normal” sample and a disease sample is only one step: What does the difference mean? What do you do next?
The clinical translation of proteomics efforts relies on 2 main components: (a) developing methods to extend proteome coverage and (b) performing protein measurements in clinically relevant populations. On which component should the field be focusing at this time?
Adam Felsenfeld: It has to do both, and more. In some cases, of course, it will already be possible to take the limited proteomic information that we have, and the limited methods that we now possess, and use them to assay clinically relevant samples in looking for reliable differences. One can then make the translational step of understanding how those differences should be interpreted and used for patient treatment—though this is easier said than done. Without a doubt, there will be some clinical payoffs of this type. But the results will be limited unless we continue to develop our knowledge base of the meaning of proteome data.
N. Leigh Anderson: To date, most effort in proteomics has been devoted to extending proteome coverage with methods that cannot be generally applied to large sample sets, and I think this is the principal roadblock impeding clinical application. It is time to focus much more effort on accurately measuring proteins in clinical populations, in essence moving the right tools out of “proteomics” and into clinical research.
Daniel C. Liebler: We should use “a” to do “b.” There is still room for improvement of our technologies and our ability to standardize analyses. At the same time, we should apply our best, most stable methods to uncover and quantify the proteomic characteristics that confer clinically interesting phenotypes.
The approaches to characterizing the human proteome include using (a) shotgun proteomics experiments with mass spectrometry to identify and quantify the proteins in tissues and plasma, (b) immunohistochemical approaches to localize proteins in tissues, and (c) immunoprecipitation to identify the proteins with which each protein interacts. Are these the right approaches to characterize the human proteome?
N. Leigh Anderson: Each of the technologies mentioned plays an important role in proteome characterization, and ideally these and other methods will be employed in an integrated manner. It is interesting to note that most of the existing cancer markers were discovered by monoclonal antibodies before the advent of modern proteomics. The key missing component for human proteome work in my opinion is the lack of sensitive, specific, multiplexable assays for all human proteins. Without such assays and the means to run them on statistically significant sample sets (i.e., thousands of samples per study), progress in the biomarker space will remain very slow.
Daniel C. Liebler: These are the right approaches when applied in the right context, but, again, context is everything here. For example, building shotgun proteome inventories of tissues is a sensible approach to comparing well-characterized tissues that represent clinically relevant phenotypes (e.g., an aggressive tumor vs an indolent tumor). If the objective is a diagnostic to distinguish these tumor subtypes, then follow-up development of good antibodies to detect key differential proteins provides a path to a useful laboratory diagnostic test.
The question raises 2 other points: (1) Regarding immunoprecipitation-based studies of protein interactions, this type of work in model organisms (e.g., yeast) has been tremendously important. Expanded protein-interaction studies in human and mouse cell models will help us better understand the networks through which protein perturbations produce their effects. (2) Regarding shotgun proteomics in plasma (or serum), this is of little value, at least for biomarker discovery. The list of medium- and high-abundance blood proteins is well known, as is the fact that detection of low-abundance tissue-derived proteins is sporadic, even with depletion methods. Plasma and serum should be reserved for targeted analyses of specific proteins.
Adam Felsenfeld: From the point of view that current technologies may not be adequate, the specific approaches you mention are not as important as asking the right questions that will lead to significant insights and the development of better technologies that can tackle those questions. Some high-level questions relevant to the approaches you mention are: (1) What are the proteins and their amounts in any given tissue/cell type/individual at any given time? Then, what are the available approaches? You mention a few: mass spectrometry (either shotgun or targeted), use of affinity reagents, etc. What do they tell us, and how can they be made better? (2) What information can we gather about the function of all the proteins? Two important data sets are surely cellular localization and protein–protein interaction, and there are several approaches to determining these. As you noted, immunohistochemistry can be used for localization, as can methods involving protein tagging. For protein interactions, useful approaches include immunoprecipitation (direct or of tagged proteins) followed by mass spectrometric studies, as well as protein arrays and indirect methods, such as yeast two-hybrid analyses. The main point is that even if you have settled for a time on a method that is adequate to capture some information, it is valuable to continuously evaluate the adequacy of the method to address the questions. This is an important driver for development of new technologies, whether they are elaborations of current methods or completely new approaches. Importantly, a list of approaches is not complete without considering the rigorous analysis and integration of large proteomic data sets and the informatics infrastructure to accomplish this efficiently. If we do not focus on this issue to some extent, we will fail to take effective advantage of proteomics.
Drs. Mark Guyer and Eric Green of the National Human Genome Research Institute assisted in preparing the Institute's response.
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Authors' Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest:
Employment or Leadership: A.N. Hoofnagle, Clinical Chemistry, AACC; N.L. Anderson, Anderson Forschung Group and Clinical Chemistry, AACC.
Consultant or Advisory Role: A.N. Hoofnagle, Onconome; N.L. Anderson, Epitomics.
Stock Ownership: N.L. Anderson, Anderson Forschung Group.
Honoraria: None declared.
Research Funding: A.N. Hoofnagle, Waters; N.L. Anderson, Agilent Technologies.
Expert Testimony: None declared.
- Received for publication March 8, 2011.
- Accepted for publication March 28, 2011.
- © 2011 The American Association for Clinical Chemistry