Background: We investigated whether chromosome 9p21.3 single-nucleotide polymorphisms (SNPs), identified in coronary heart disease (CHD) genome-wide association scans, added significantly to the predictive utility for CHD of conventional risk factors (CRF) in the Framingham risk score (FRS) algorithm.
Methods: In the Northwick Park Heart Study II of 2742 men (270 CHD events occurring during a 15-year prospective study), rs10757274 A>G [mean frequency G = 0.48 (95% CI 0.47–0.50)] was genotyped. Using the area under the ROC curve (AROC) and the likelihood ratio (LR) statistic, we assessed the discriminatory performance of the FRS based on CRFs with and without genotype.
Results: rs10757274 A>G was associated with incident CHD, with an effect size as reported previously [hazard ratio in GG vs AA men of 1.60 (95% CI 1.12–2.28)], independent of CRFs and family history of CHD. Although the AROC for CRFs alone [0.62 (95% CI 0.58–0.66)] did not increase significantly (P = 0.14) when rs10757274 A>G genotype was added [0.64 (95% CI 0.60–0.68)], including genotype gave better fit (LR P = 0.01) and including rs10757274 moved 369 men (13.5% of the total) into more accurate risk categories. To model polygenic effects, 10 hypothetical, randomly assigned gene variants, with similar effect size and frequencies were added. Two variants made significant AROC improvements to the FRS prediction (P = 0.01), whereas further variants had smaller incremental effects (final AROC = 0.71, P <0.001 vs CRFs; LR vs CRFs P <0.0001).
Conclusions: Although overall, rs10757274 did not add substantially to the usefulness of the FRS for predicting future events, it did improve reclassification of CHD risk, and thus may have clinical utility.
The genome-wide association scan, a direct outcome of the Human Genome Project and HapMap, has revolutionized the field of genetics of complex, common diseases by identifying novel loci and gene regions associated with disease risk. To overcome the problem of type I errors, all findings require replication. Recently, 3 genome-wide association scans all identified a single region on chromosome 9p21.3 associated with coronary heart disease (CHD) or myocardial infarction (MI) risk (1)(2)(3). Helgadottir et al. (3) identified a single-nucleotide polymorphism (SNP), rs10757278, that showed significant association with risk (P <10⁻²⁰) and was replicated in a total of 4587 cases and 12 769 controls, with a population-attributable fraction (PAF) of 20% for MI. Resequencing of the region did not identify any obvious causative mutations. McPherson et al. (1) identified 2 SNPs in the same genomic region, rs10757274 and rs2383206, both associated with risk, in 6 independent studies comprising more than 23 000 individuals and with a PAF of 10% to 15%. The Wellcome Trust Case Control Consortium identified rs1333049 in this region, also showing strong association with CHD risk (P <10⁻¹⁴) (2). Finally, a study that combined the Wellcome Trust Case Control Consortium data with data from Germany also confirmed the chromosome 9 region and identified 6 additional novel candidate genes for CHD (4).
9p21.3 is a chromosomal region relatively depleted of open reading frames, and the closest genes are a cluster consisting of CDKN2A-ARF-CDKN2B, which lie in a linkage disequilibrium (LD) block adjacent to these SNPs. This locus has been associated with tumor suppression, but also plays a role in cell proliferation, senescence, and apoptosis (5), all features implicated in atherogenesis. To date, the potential mechanism by which variants in this chromosome 9 region increase risk of CHD is unclear, and none of the genotypes used in the 3 studies were associated with any of the classic CHD risk factors such as blood pressure and lipid concentrations. The identification of this locus as being associated with CHD risk thus points to a novel and potentially highly important causal pathway for investigation and for the development of therapeutic approaches that will complement current risk-reducing modalities.
While statistically robust, these studies have not addressed the issue of clinical utility. Case-control studies are efficient for gene discovery, yet they provide limited information on population allele frequency, attributable risk, effect of genes on other important risk factors for cardiovascular disease, or derivation of metrics essential to evaluate the predictive utility of genetic information. All of this requires prospective studies, which additionally allow assessment of effects on incident as opposed to prevalent CHD. Here, we investigated whether addition of chromosome 9p21.3 genotypes improves the prediction of CHD events of conventional risk factors (CRFs), such as cholesterol, triglycerides, blood pressure, age, and smoking, used in the Framingham risk algorithm (6). We followed 2742 healthy middle-aged men from the prospective Northwick Park Heart Study II (NPHS-II) for an average of 14 years, with 270 CHD events. We evaluated discrimination by use of the area under the ROC curve (AROC) based on combinations of CRFs and chromosome 9p21.3 region genotype, and we assessed the ability of the genotype to stratify individuals into risk categories by determining the number of men correctly reclassified. Model fit was assessed using the likelihood ratio (LR) and Bayes information criterion (BIC), which are global measures combining both discrimination and calibration, as suggested recently (7). The potential clinical benefit of such changes in risk stratification was examined by evaluating the proportion of subjects reclassified based on a 10-year CHD risk threshold for intervention of 20% as recommended by current guidelines.
Materials and Methods
NPHS-II is a prospective study of healthy middle-aged men (50–64 years old) recruited from 9 UK general practices (8). Full details have been reported (8)(9). Family history of CHD was assessed by questionnaire at baseline as described (10). In the 2742 white men with genotype data, by December 2005, there had been 270 CHD events comprising 175 acute CHD events (42 fatal), 72 coronary artery revascularization procedures, and 23 silent MIs.
We genotyped rs10757274 and rs2383206 by use of TaqMan technology (Applied Biosystems). (See the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol54/issue3 for more details on methods.)
We assessed associations with CHD risk by use of Cox proportional hazards models and derived hazard ratios (HRs). Analyses were stratified by general practice. For the conventional model, a score was derived based on age, triglycerides, cholesterol, smoking, and systolic blood pressure (9). (See the online Data Supplement for complete details on methods.) We then fitted a model including both conventional factors and rs10757274 genotype and obtained a second score by weighting according to the β-coefficients from the model (Table 1). We evaluated the models by use of BIC and the LR χ² (see the online Data Supplement for details on methods).
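To make the two global fit measures concrete, the following is a minimal sketch and not the analysis code used in this study. It assumes that the maximized log-likelihoods of two nested models are already available; the function names, the use of the number of subjects in the BIC penalty, and the restriction to 1 or 2 degrees of freedom (where the χ² tail probability has a simple closed form) are illustrative assumptions.

```python
import math

def bic(log_lik, n_params, n_obs):
    """Bayes information criterion: lower values indicate a better trade-off
    between fit and number of predictors. Taking n_obs as the number of
    subjects is an assumption; some authors use the number of events instead."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def lr_test(log_lik_base, log_lik_full, df):
    """Likelihood ratio chi-square statistic and P value for nested models.
    df is the number of extra parameters in the fuller model."""
    stat = max(2.0 * (log_lik_full - log_lik_base), 0.0)
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    elif df == 2:
        p_value = math.exp(-stat / 2.0)             # chi-square tail, 2 df
    else:
        raise ValueError("closed form shown here only for df = 1 or 2")
    return stat, p_value

# Hypothetical log-likelihoods for a CRF-only model and a CRF + genotype model
stat, p_value = lr_test(-1000.0, -996.0, df=2)
```

Both measures penalize added predictors, so a lower BIC together with a small LR P value favours the model that includes genotype.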
We used a simple model testing the changes of the ROC area with increasing genetic information to display the importance of combining a number of genes, with moderate effects, with the classic risk factors used by Framingham in the prediction of CHD events. (See the online Data Supplement for details on methods.)
Conventional Risk Factors and CHD Events
The baseline characteristics of the Northwick Park Heart Study II, stratified by subsequent CHD event, are presented in Table 1. The men who went on to develop CHD during follow-up (n = 270) were older and had higher plasma cholesterol, triglycerides, and blood pressure, lower HDL cholesterol, and higher prevalence of smoking than those who remained CHD free (n = 2472). The HRs associated with these traits are presented in Table 1. Based on the measured variables included in the Framingham algorithm, the AROC given by this set of CRFs was 0.62 (95% CI 0.58–0.66), with a DR5 [detection rate (or sensitivity) for a 5% false-positive rate] of 13.5%.
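The two discrimination measures quoted above can be illustrated with a short sketch. It is not the study's code: the function names and the toy scores are hypothetical, the AROC is computed directly from its definition as the probability that a randomly chosen case scores higher than a randomly chosen non-case (ties counting one half), and the handling of the cutoff in the DR5 is one reasonable convention among several.

```python
def aroc(scores, events):
    """Area under the ROC curve from all case/non-case pairs."""
    cases = [s for s, e in zip(scores, events) if e]
    controls = [s for s, e in zip(scores, events) if not e]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def detection_rate(scores, events, false_positive_rate=0.05):
    """Sensitivity at the cutoff that flags at most the given share of non-cases."""
    controls = sorted((s for s, e in zip(scores, events) if not e), reverse=True)
    cases = [s for s, e in zip(scores, events) if e]
    n_false_pos = int(false_positive_rate * len(controls))
    cutoff = controls[n_false_pos]  # at most n_false_pos non-cases lie above this
    return sum(s > cutoff for s in cases) / len(cases)

# Toy data (hypothetical): risk scores and whether a CHD event occurred
scores = [1, 2, 3, 4, 5, 6]
events = [0, 0, 1, 0, 1, 1]
area = aroc(scores, events)
dr5 = detection_rate(scores, events)
```

The pairwise loop is quadratic in cohort size; it is written this way for clarity rather than speed.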
Genotype Effects on CHD Traits and CHD
The genotype distributions of rs10757274 A>G and rs2383206 A>G were in Hardy-Weinberg equilibrium, and the 2 SNPs were in strong LD (r² = 0.89). The association of these SNPs with intermediate traits is shown in Supplementary Table 1 in the online Data Supplement. There was some evidence of a modest association with levels of apolipoprotein A-I and HDL cholesterol, with mean (SD) HDL levels being higher in those homozygous for the G allele [AA 1.68 (0.58) mmol/L, GG 1.76 (0.61) mmol/L, P = 0.03], whereas fibrinogen levels were lower in G carriers [AA 2.76 (0.52), GG + AG 2.69 (0.51), P = 0.002].
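For readers unfamiliar with the Hardy-Weinberg check, a minimal sketch follows. It assumes the standard 1-degree-of-freedom χ² goodness-of-fit test on genotype counts, which may differ from the test actually applied here; the function name and the genotype counts are hypothetical and are not the NPHS-II data.

```python
def hwe_chi_square(n_aa, n_ag, n_gg):
    """Return the G allele frequency and the chi-square statistic comparing
    observed genotype counts with Hardy-Weinberg expectations."""
    n = n_aa + n_ag + n_gg
    q = (2 * n_gg + n_ag) / (2 * n)  # frequency of the G allele
    p = 1.0 - q                      # frequency of the A allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ag, n_gg)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return q, chi2

# Hypothetical counts for AA, AG, and GG men
q, chi2 = hwe_chi_square(300, 400, 300)
```

With 1 degree of freedom, a statistic above 3.84 corresponds to P <0.05, that is, to evidence of departure from equilibrium.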
For rs10757274, the frequency of the G allele was significantly higher (P <0.007) in the CHD-positive men vs the CHD-free group [0.54 (95% CI 0.49–0.58) and 0.48 (95% CI 0.46–0.49), respectively]. As shown in Table 2, compared with men homozygous for the common A allele, men homozygous for the G allele had an HR of 1.60 (1.12–2.28), P = 0.03, an effect that was not materially changed after adjustment for CRF. The PAF for this SNP was 26.2% (95% CI 7.1–41.1). The Kaplan-Meier survival plot associated with rs10757274 is presented in Fig. 1, showing the lower survival in those carrying one or more G alleles from the start and continuing thereafter.
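As a rough orientation only, the order of magnitude of the PAF can be checked with Levin's textbook formula. This is an assumption: the estimator actually used is described in the online Data Supplement and may differ. The sketch treats G-allele carriers as the exposed group, takes their prevalence from the overall G allele frequency of 0.48 under Hardy-Weinberg equilibrium, and uses the CRF-adjusted HR of 1.42 for G carriers vs AA men (reported in this section) as an approximation to the relative risk.

```latex
\mathrm{PAF} = \frac{p_e(\mathrm{RR}-1)}{1+p_e(\mathrm{RR}-1)}, \qquad
p_e = 1-(1-0.48)^2 \approx 0.73, \qquad
\mathrm{PAF} \approx \frac{0.73 \times 0.42}{1+0.73 \times 0.42} \approx 0.23
```

This back-of-envelope value lies within the reported 95% CI (7.1–41.1) and is of the same order as the reported 26.2%; exact agreement is not expected given the approximations.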
When baseline HDL levels were included, the HR for GG men was 1.96 (1.31–2.94), P <0.003, suggesting that the mechanism of risk was independent of HDL levels. Fibrinogen added to the model had no additional effect. The effect size was consistent in subjects with definite MI [HR 1.45 (95% CI 0.99–2.11)], coronary artery bypass graft [HR 1.52 (95% CI 0.85–2.72)], and silent MI [HR 1.40 (95% CI 0.52–3.77)]. Subjects with possible MIs showed the same effect size, HR 1.79 (95% CI 0.68–4.70), but were not included in the CHD analysis. There was no statistically significant evidence for the risk effect being different in those who smoked or had different levels of obesity [as assessed by body mass index (BMI)] or inflammation [as assessed by C-reactive protein (data not shown)]. We have previously reported that in the NPHS-II men, a family history of early CHD was strongly associated with higher future risk, independent of CRFs (10), but the HR for rs10757274 (G carriers vs AA men) was 1.42 (95% CI 1.05–1.93), P = 0.02, in a model of CRFs without family history and 1.41 (95% CI 1.04–1.92), P = 0.03, after adjustment for family history, suggesting that the SNP did not explain a significant part of the family history risk. The prevalence of a family history of early CHD was not different by rs10757274 genotype (see Supplementary Table 1a in the online Data Supplement).
For rs2383206, despite the strong LD with rs10757274, the effects on risk and trait association were generally smaller and did not reach statistical significance (see Supplementary Tables 1b and 2 in the online Data Supplement). Haplotype analysis of the combined SNPs did not add significantly to the risk estimates (data not shown). Neither SNP was associated with significantly increased risk of type 2 diabetes or cancers (data not shown).
Combined Effect of CRFs and rs10757274 in the Risk Algorithm
When rs10757274 genotype data were added to the CRFs in the risk algorithm, for model 1 (adjusted for age and general practice) the AROC increased from 0.62 to 0.64 (P = 0.14) (Fig. 2) with a DR5 of 13%. Thus, on its own, genetic variation near CDKN2A did not add to the overall risk prediction of CRFs. However, since no single genotype is likely to add significantly to CRF risk prediction (11), we modeled the effect on CHD risk of up to 10 hypothetical, randomly assigned gene variants, with allele frequencies and risk similar to those of rs10757274. The risk associated with each additional SNP is shown in Fig. 3. The addition of 1 further SNP with similar characteristics increases the AROC significantly (P <0.03), whereas the inclusion of 2 or more SNPs had a greater effect (P <0.001), with the addition of further SNPs having smaller incremental effect. As shown in Fig. 4, the AROC for 10 SNPs was 0.76, with a DR5 of 24.2%. However, whether this improvement in the AROC is clinically useful remains to be seen, as the proportion of individuals with multiple independently segregating risk alleles in any population is likely to be small.
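The general idea of the polygenic modeling can be sketched with a small simulation. This is not the procedure used in the study, which assigned hypothetical variants to the actual NPHS-II cohort and used Cox models (see the online Data Supplement). Here an entire cohort is simulated, events are generated from a logistic model, and every number (cohort size, spread of the CRF score, per-allele effect of 1.26, baseline event rate) is an assumption chosen only to resemble the setting described above.

```python
import math
import random

def aroc(scores, events):
    """AROC from the rank-sum (Mann-Whitney) statistic, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_case = sum(events)
    n_ctrl = len(events) - n_case
    rank_sum = sum(r for r, e in zip(ranks, events) if e)
    return (rank_sum - n_case * (n_case + 1) / 2) / (n_case * n_ctrl)

def simulate(n=20000, n_snps=10, freq=0.48, per_allele_hr=1.26, seed=1):
    """Return the AROC of a risk score using the CRF score plus 0..n_snps SNPs."""
    rng = random.Random(seed)
    beta = math.log(per_allele_hr)
    # Hypothetical CRF score on the log-odds scale
    crf = [rng.gauss(0.0, 0.45) for _ in range(n)]
    # Independent SNPs: each genotype is the count (0, 1, or 2) of risk alleles
    geno = [[(rng.random() < freq) + (rng.random() < freq) for _ in range(n_snps)]
            for _ in range(n)]
    # Intercept offsets the mean genetic load so events stay fairly uncommon
    intercept = -2.2 - n_snps * 2 * freq * beta
    events = []
    for c, g in zip(crf, geno):
        linear_predictor = intercept + c + beta * sum(g)
        risk = 1.0 / (1.0 + math.exp(-linear_predictor))
        events.append(1 if rng.random() < risk else 0)
    return [aroc([c + beta * sum(g[:k]) for c, g in zip(crf, geno)], events)
            for k in range(n_snps + 1)]

aroc_by_snp_count = simulate()
```

In such a simulation the AROC rises as variants are added, but the values obtained depend entirely on the assumed effect sizes and should not be read as estimates for any real population.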
Clinical Utility of Genotype Risk Stratification
The potential ability of the SNP to improve risk stratification was examined, as shown in Supplementary Tables 3–4 in the online Data Supplement. Based on their CRF score, men were divided into those with a 10-year CHD risk of <5%, 5%–10%, 10%–20%, and >20%. After inclusion of the genotype, 585 (21.9%) men were reclassified, of whom 63% (369) moved into more accurate categories (defined by the observed risk corresponding better to the predicted risk in the new category). As shown in Supplementary Table 4 in the online Data Supplement, for the single SNP model the BIC value decreased, and the LR statistic increased significantly (P = 0.01). Both BIC and LR take into account the increase in the number of predictors. The nonsignificant P values for the Hosmer-Lemeshow statistic indicated no problems with calibration in any of the models, and the increase in the P values as terms were added indicated that predicted risks corresponded better to observed risks. The corresponding values for the model including the 10 hypothetical SNPs indicated the expected considerable improvements in calibration.
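The reclassification count described above can be sketched as a cross-tabulation of risk categories before and after adding genotype. The sketch is illustrative only: the function names and the example risks are hypothetical, the placement of subjects who fall exactly on a boundary is an assumption, and the further step of judging whether a move is "more accurate" (which requires observed event rates per category) is not shown.

```python
from bisect import bisect_right

# Boundaries of the 10-year CHD risk categories: <5%, 5%-10%, 10%-20%, >20%
CUTS = (0.05, 0.10, 0.20)

def risk_band(risk):
    """Index of the risk category (0 = lowest, 3 = highest).
    A risk exactly on a boundary is placed in the higher category (assumption)."""
    return bisect_right(CUTS, risk)

def reclassification(old_risks, new_risks):
    """Cross-tabulate categories under the old and new models and count movers."""
    table = [[0] * 4 for _ in range(4)]
    for old, new in zip(old_risks, new_risks):
        table[risk_band(old)][risk_band(new)] += 1
    moved = sum(table[i][j] for i in range(4) for j in range(4) if i != j)
    return table, moved

# Hypothetical predicted risks for 4 men, from CRFs alone and from CRFs + genotype
table, moved = reclassification([0.03, 0.08, 0.15, 0.25], [0.06, 0.08, 0.22, 0.18])
```

Rows of the table index the category under CRFs alone and columns the category after adding genotype, so off-diagonal cells hold the reclassified men.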
We examined the potential clinical utility of this genotype for risk stratification in the NPHS-II men. The National Institute for Clinical Excellence, which sets standards of clinical treatment in the UK, recommends that subjects with a 10-year risk of cardiovascular disease >20% should be treated with statins. Using the variables in the Framingham algorithm, this equates to a risk score of 23. Of the 2670 men with a 9p21.3 genotype and complete trait data, 164 had scores >23, of whom 33 had an event (observed 10-year risk 23.3%). By adding rs10757274, 55 men who were originally below this cutoff using CRF alone were now above the cutoff; of these, 12 men went on to experience a definite CHD event (observed 10-year risk 24.0%). The mean cholesterol levels at baseline in these 2 groups were 6.73 and 6.78 mmol/L, respectively, and baseline LDL-C levels were 4.2 and 4.8 mmol/L. Based on the expected benefit of reducing their individual cholesterol levels to the Joint British Society Guidelines (JBS2) target of 4.0 mmol/L, the number of CHD events prevented per 100 treated would be 9.1 in the CRF-only group and 8.5 in the group identified using the genotype (P >0.5).
In a prospective study of healthy middle-aged men, we confirm here the results reported from 4 large GWA studies that variation in the chromosome 9p21.3 region is strongly associated with CHD risk. The risk estimates associated with rs10757274 genotype in these UK men are very similar in magnitude to those reported (3), and because of the high frequency, the genotype has a high PAF of 26.2%. The second SNP examined showed similar but marginally smaller effects, but because of the strong LD between the two SNPs examined, their combined effect was not different from the single SNP effect. Because the risk allele showed association with higher plasma levels of apolipoprotein A1 and HDL cholesterol and lower levels of fibrinogen, adjustment for these factors increased the size of the associated risk, demonstrating that the risk effect is independent of these intermediate traits and confirming the earlier reports (1). The risk effect was similar in those with and without a family history of early CHD, and therefore was independent of this risk (10).
Although a genotype may be strongly and robustly associated with CHD risk, if it has this effect through influencing the level of a CRF that is already included in the CHD risk-score algorithm (such as cholesterol, triglycerides, or blood pressure), it is unlikely to add significantly to the overall ability of the algorithm in risk prediction (11). The corollary of this is that genotypes whose risk mechanism is not working through the CRFs already included are more likely to have clinical utility, and this is the case with rs10757274 and rs2383206. Perhaps surprisingly, therefore, given its strong effect, addition of the rs10757274 genotype to the Framingham CRF risk variables did not add significantly to risk prediction in this group of middle-aged healthy men, with only a 3% improvement in the AROC value. However, it is now increasingly recognized that prediction can be improved significantly only by the inclusion of factors that both are common and have very large effects (12), and that a single genotype (or biomarker) associated with odds ratios in the region of 1.2–1.6 will not on its own significantly improve risk prediction for polygenic multifactorial CHD. The availability of several SNPs in combination has clear potential (13)(14), however. The addition of only one other model SNP (of similar allele frequency and effect as rs10757274) did significantly improve prediction, and the addition of a second or third SNP improved the AROC value by 8.4% and 13.3%, respectively; addition of each subsequent SNP had smaller incremental effects on improving the value. Were 10 such SNPs available, the predicted improvement would be 23%. This clearly demonstrates that genetic information has the potential to improve the overall ability to predict risk in the general population, with as few as 3 gene variants providing information that will have clinical utility. 
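The percentage improvements quoted in this paragraph appear to be relative to the CRF-only AROC of 0.62 rather than absolute differences in AROC. This reading is an interpretation, though it agrees with the AROC values of 0.64 and 0.76 reported in the Results:

```latex
\frac{0.64 - 0.62}{0.62} \approx 0.03, \qquad 0.62 \times 1.23 \approx 0.76
```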
In the recent combined genome-wide association scan data (4), 6 novel genes determining CHD risk were reported with allele HRs in the range of 1.20 to 1.33 and risk allele frequencies between 0.22 and 0.77, strongly suggesting that this model is likely to be achievable in the near future. The addition of even strong risk markers such as CRP to the Framingham algorithm does not improve prediction, because they are highly correlated with other biomarkers already in the algorithm (such as BMI and smoking), so the potential utility of these independent genetic factors is clear.
Although a risk marker may not itself significantly improve overall prediction in the population, it may still have clinical utility in risk stratification for the individual, as recently discussed (15). Genotyping for rs10757274 added significantly to the ability of the Framingham score to discriminate individuals who will suffer cardiac events, with 13.5% of the men moving into more accurate risk categories for their future events after inclusion of genotype. Specifically, it allowed reclassification of CHD risk in 3.3% of intermediate-risk men to the high-risk category, who could then be offered statin therapy to reduce their risk. The clinical utility of such genetic tests would be realized as additional similar risk variants were identified, as has been demonstrated by Samani et al. (4). In the UK, the current National Institute for Clinical Excellence guidelines propose that subjects with a Framingham 10-year CHD risk of >20% (equivalent to a CRF score >23) would qualify for statin therapy, aiming to reduce cholesterol to 4.0 mmol/L under JBS2 guidelines. One hundred sixty-four men (6.1% of the total) were identified as having a score >23 based on their baseline CRFs. Although treating subjects at low Framingham risk score will not be cost-effective because of the low event rate, the majority of CHD events occur in subjects with intermediate-risk scores, since this is the most common group. Using additional and independent factors to identify those in the <23 risk score group who are at high risk will have clinical benefit, and the data from the NPHS-II men support this. There were 55 men (2.1% of the total group) who would not have qualified for statin treatment based on their CRF score alone (<23) but who would have a score >23 if rs10757274 genotype were added. In this group, individual statin treatment of their baseline LDL cholesterol levels to the JBS2 target would have prevented a similar number of future CHD events as in the 164 high-risk men.
This strongly suggests that the use of this single genotype to identify individuals who have intermediate risk based on CRF is likely to have clinical utility.
The reasons that this single SNP did not add significantly to overall AROC risk prediction are partly because the effect is relatively modest (although the CHD risk associated with any single SNP is always likely to be in the range 1.2–1.8 (16)) and partly because the number of individuals carrying the risk genotype is low. When other independent, confirmed CHD risk SNPs are identified, however, a greater proportion of men will be carrying such risk genotypes, and as suggested by the modeling, the clinical utility of adding SNP data to the algorithm will increase significantly. In addition, some individuals will, by chance, carry more than one risk allele (since they are inherited independently), and effects on risk are likely to be additive (17). Because many SNPs can be cheaply determined simultaneously in a single sample (for example, using DNA obtained from a buccal swab), this will not have major cost implications.
The mechanism of the risk association of the chromosome 9p21.3 locus is still unclear. Chromosome 9p21.3 is often deleted in malignant tumors, and attention has therefore focused on the role of the few genes within the region as potential cancer genes; cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4) (CDKN2A) and cyclin-dependent kinase inhibitor 2B (p15, inhibits CDK4) (CDKN2B) encode the p16INK4a and p15INK4b inhibitors of CDK4 kinase. Within the CDKN2A locus, an alternative reading frame specifies a protein (ARF) that is structurally unrelated to these inhibitors and functions as a stabilizer of the tumor suppressor protein p53 (reviewed in (5)). In spite of structural and functional differences, these 3 genes share a common functionality in cell cycle G1 control. Because CDKN2B is expressed in macrophages and is dramatically induced by transforming growth factor β, this gene may have a role in growth inhibition induced by transforming growth factor β (18). Thus the involvement of these genes in senescence (19) and apoptosis, both processes associated with atherosclerosis, suggests a potential CHD mechanism, either in plaque progression or rupture, due to changes in the senescence and apoptosis of endothelial or smooth muscle cells or monocyte-macrophage-foam cells in the lesion. Resequencing the coding regions, intron–exon boundaries, and regulatory regions in these genes, however, did not identify any functional variants (1)(2). This suggests that the functional changes may not be in the CDKN2A-ARF-CDKN2B gene loci themselves but in a regulatory region within this LD block. Recent evidence suggests that the entire locus might be coordinately suppressed by a cis-acting regulatory domain or by members of the Polycomb group of repressor complexes, which recognize histone modifications (20). Comparative sequence homology of 9p21.3 (http://ecrbrowser.dcode.org/) shows that the 58-kb LD block is highly conserved across several species (Fig. 4).
There is also the possibility that this noncoding DNA is transcribed into RNA with distinct regulatory roles (21), and an antisense RNA to the gene cluster has been identified, lending support to this concept (22).
The only other gene in the region, methylthioadenosine phosphorylase (MTAP), is ubiquitously expressed. It encodes an enzyme responsible for recycling 5′-methylthioadenosine (MTA) to S-adenosylmethionine (23). However, although variants may influence plasma levels of homocysteine and folate, which were measured in NPHS-II (24), there were no significant differences in levels of these factors by either SNP genotype (see Supplementary Table 1 in the online Data Supplement). Understanding the precise molecular mechanisms of this risk effect will not only identify new therapeutic targets but will also improve the accuracy with which these SNPs can be used for risk stratification. Finally, this analysis illustrates the value of studying genetic effects uncovered in case control studies in population-based cohort studies. This type of research is likely to help better understand the value of the exciting new genetic information for the betterment of public health.
Grant/Funding Support: NPHS-II was supported by the British Medical Research Council, the US National Institutes of Health (grant NHLBI 33014), and Du Pont Pharma, Wilmington, Delaware. J.A.C., J.P., R.L., F.D., A.D.H., and S.E.H. are supported by the British Heart Foundation (PG/2005/014 and SP/07/007/23671).
Financial Disclosures: None declared.
Acknowledgments: The authors dedicate this paper to the PI of the NPHS-II study, Professor George Miller, who died on August 14, 2006, after a long illness.
1 Data are mean (SD) unless noted otherwise.
2 Geometric mean (approximate SD).
1 Model 1: adjusted for age and general practice.
2 Model 2: adjusted for age, smoking, systolic blood pressure, cholesterol, triglycerides, and BMI.
3 Model 3: adjusted for age, smoking, systolic blood pressure, cholesterol, and calculated baseline HDL.
Nonstandard abbreviations: MI, myocardial infarction; PAF, population-attributable fraction; LD, linkage disequilibrium; CRF, conventional risk factor; NPHS-II, Northwick Park Heart Study II; LR, likelihood ratio; BIC, Bayes information criterion; HR, hazard ratio.
Human genes: CDKN2A, cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4); CDKN2B, cyclin-dependent kinase inhibitor 2B (p15, inhibits CDK4); MTAP, methylthioadenosine phosphorylase.
© 2008 The American Association for Clinical Chemistry