Data sources
Genetic association data on plasma LDL-C, HDL-C and TG concentrations were extracted from GLGC, which released aggregated data (i.e. point estimates and standard errors) for 146,492 East Asian participants and 1,320,016 European participants. Here we used the aggregate results excluding UKB participants. We wish to confirm that participant data have been obtained according to the terms and conditions of the databases where these data have been sourced.
The current study considered cardiometabolic traits with a sufficiently large GWAS in both ancestries (at least 2000 participants for quantitative traits, and at least 1000 cases for binary traits, Supplementary Data 6–7). When multiple eligible GWAS had been conducted on the same trait and ancestry group, the study with the largest sample size was included.
For individuals of European ancestry, we leveraged data on ALT (n = 474,736), AST (n = 474,755), and Apo-A1, Apo-B and Lp(a) from 361,194 UKB participants, SBP, DBP and PP on 757,601 participants32, glucose and HbA1c on 196,991 participants33, CRP (n = 204,402)34, body mass index (BMI, n = 694,649)35, CHD (60,801 cases)36, any stroke (110,182 cases) and ischaemic stroke (86,668 cases)37, HF (47,309 cases)38, T2D (80,154 cases)39, CKD (64,164 cases)40, glaucoma (15,655 cases)41, subarachnoid haemorrhage (5140 cases)42, breast cancer (133,384 cases)43, lung cancer (29,266 cases)44 and prostate cancer (46,939 cases)45. Additional outcome data were sourced from a FinnGen and UKB meta-analysis by Sakaue et al.12 on angina (30,025 cases), ventricular arrhythmia (1018 cases), PAD (7114 cases), asthma (38,369 cases), intracerebral haemorrhage (1935 cases) and pneumonia (16,887 cases), with COPD (58,559 cases) included from the Global Biobank Meta-analysis Initiative (GBMI)29.
The corresponding outcomes in the East Asian participants were accessed from the Pan-ancestry GWAS of the UK Biobank (Pan-UKB)46 on Apo-A1 (n = 2325), Apo-B (n = 2553) and Lp(a) (n = 2275). Additional cardiometabolic biomarker data were sourced from Biobank Japan (BBJ)12 on SBP, DBP, PP, glucose, HbA1c, CRP, BMI, ALT and AST, for between 71,221 and 150,545 participants (Supplementary Data 6–7). BBJ provided data on CHD (32,512 cases), angina (14,007 cases), PAD (4112 cases), ischaemic stroke (22,664 cases), subarachnoid (1203 cases) and intracerebral (1456 cases) haemorrhage, ventricular arrhythmia (1673 cases), T2D (45,383 cases), CKD (2117 cases), glaucoma (8448 cases), pneumonia (7423 cases), breast cancer (6325 cases), lung cancer (4444 cases), prostate cancer (5672 cases). Finally, the following outcomes were sourced from the East Asian GBMI release: HF (12,665 cases), COPD (19,044 cases), and any stroke (23,345 cases).
Additional outcomes, available only for European populations, were sourced for the protein quantitative trait loci (pQTL) drug target MR: non-alcoholic fatty liver disease (NAFLD, 1483 cases)47, small vessel stroke (13,620 cases), large artery stroke (9219 cases), cardioembolic stroke (12,790 cases)37, atrial fibrillation (AF, 60,620 cases)48, and eGFR (n = 1,508,659)49.
Cross-ancestry colocalization of the LDL-C and HDL-C CETP signals
Due to sampling variability as well as linkage disequilibrium (LD), the most significant variant at a given locus may not reflect the causal variant. Colocalization identifies potential shared causal variants between two traits, while accounting for sampling variability and LD50. Due to the larger sample size available in the European GLGC GWAS, rs183130 (16:g.56991363C>T, GRCh37) has been robustly identified as a causal CETP variant for both LDL-C and HDL-C. We leveraged coloc51 to determine whether this European fine-mapped variant was also causal for LDL-C and HDL-C in East Asian participants. We considered genetic variants within a ±50 kb flank of the CETP genomic region with a minor allele frequency (MAF) ≥ 0.01, applying the following prior probabilities: PP.H1, PP.H2 = \(10^{-4}\) that a single genetic variant was associated with the plasma lipids in Europeans (PP.H1) or in East Asians (PP.H2), and PP.H4 = \(10^{-6}\) that a single genetic variant was associated with plasma lipids in both populations at the CETP locus. A posterior probability for a shared genetic signal larger than 0.80 was considered as evidence of colocalization50.
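The colocalization logic can be sketched with approximate Bayes factors. The Python below is a minimal illustration under stated assumptions, not the coloc package itself: the function names, the prior standard deviation of 0.15 on the effect size, and the toy effect estimates are all hypothetical, and the sketch assumes at most one causal variant per population.

```python
import numpy as np

def log_sum(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def log_diff(x, y):
    """Numerically stable log(exp(x) - exp(y)), assuming x > y."""
    return x + np.log(-np.expm1(y - x))

def wakefield_labf(beta, se, prior_sd=0.15):
    """Per-variant log approximate Bayes factor (prior_sd is an assumed value)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (np.log(1.0 - r) + r * z ** 2)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-6):
    """Posterior probabilities for hypotheses H0 to H4 given the priors."""
    s1, s2, s12 = log_sum(labf1), log_sum(labf2), log_sum(labf1 + labf2)
    log_h = np.array([
        0.0,                                               # H0: no association
        np.log(p1) + s1,                                   # H1: population 1 only
        np.log(p2) + s2,                                   # H2: population 2 only
        np.log(p1) + np.log(p2) + log_diff(s1 + s2, s12),  # H3: distinct variants
        np.log(p12) + s12,                                 # H4: shared variant
    ])
    post = np.exp(log_h - log_sum(log_h))
    return dict(zip(["H0", "H1", "H2", "H3", "H4"], post))

# Toy example: five variants, the third carries the signal in both populations.
se = np.full(5, 0.01)
beta_eur = np.array([0.0, 0.0, 0.08, 0.0, 0.0])
beta_eas = np.array([0.0, 0.0, 0.07, 0.0, 0.0])
pp = coloc_posteriors(wakefield_labf(beta_eur, se), wakefield_labf(beta_eas, se))
shared_signal = pp["H4"] > 0.80  # threshold used in the text
```

In this toy setting the shared-signal hypothesis dominates because the same variant carries a strong association in both inputs.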
Mendelian randomisation analysis
To proxy the effect of CETP inhibition we capitalised on CETP variants strongly associated with HDL-C in both populations and performed a biomarker weighted drug target MR, by exploring the causal effects of CETP inhibition scaled towards a SD increase in HDL-C. Despite weighting by an intermediate biomarker, the inference of such a “biomarker” drug target MR analysis is on the protein, not on the potential causality of the intermediate biomarker (see mathematical derivations below). The assumptions of MR include: the relevance assumption (the genetic variant is associated with the exposure), the exclusion restriction assumption (the genetic variant is associated with the outcome only through the effect of the exposure) and the exchangeability assumption (there are no common causes of the genetic variant and the exposure or outcome)52.
To identify instruments for CETP inhibition, weighted by HDL-C or CETP plasma concentration, genetic variants within ±50 kb of the CETP gene (Chr 16:56,995,762-57,017,757, GRCh37) were identified, based on an F-statistic of at least 15, MAF ≥ 0.01, and an LD-clumping threshold of r-squared < 0.3 against EUR or EAS reference samples (Supplementary Data 8–9). Depending on the genotyping array employed by each GWAS and the imputation quality, the exact set of available genetic variants in the CETP gene region will differ per outcome GWAS. We therefore selected variants after harmonising and linking the exposure and outcome GWAS, automatically identifying the optimal set of exposure variants without needing to manually identify proxy variants for each individual outcome. Ancestry specific LD reference matrices were generated by selecting a random subset of 5000 unrelated Europeans, and the entire subset of East Asians (n = 2000) from UKB (Supplementary Data 10, 11). The self-defined European and East Asian individuals were assigned to their respective ancestry groups based on principal component analysis, implemented with PC-AiR for the detection of population structure, followed by PC-Relate to account for cryptic relatedness53, as described by Giannakopoulou et al.54.
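The selection thresholds above can be illustrated with a small sketch. It assumes the F-statistic of a single variant is approximated by the squared ratio of its effect estimate to its standard error, and uses a simple greedy clumping rule; the `Variant` class, function names, variant labels and numbers are hypothetical and not taken from the study.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    name: str    # hypothetical variant label
    beta: float  # exposure association (e.g. SD HDL-C per allele)
    se: float    # standard error of beta
    maf: float   # minor allele frequency

def f_statistic(v):
    """Approximate per-variant F-statistic as (beta / se) squared."""
    return (v.beta / v.se) ** 2

def select_instruments(variants, r2, min_f=15.0, min_maf=0.01, max_r2=0.3):
    """Filter on F and MAF, then greedily clump on pairwise LD r-squared.

    r2[i][j] is the LD r-squared between variants i and j in the input order.
    """
    eligible = [i for i, v in enumerate(variants)
                if v.maf >= min_maf and f_statistic(v) >= min_f]
    # Strongest variants are considered first.
    eligible.sort(key=lambda i: f_statistic(variants[i]), reverse=True)
    kept = []
    for i in eligible:
        if all(r2[i][j] < max_r2 for j in kept):
            kept.append(i)
    return [variants[i] for i in sorted(kept)]

variants = [
    Variant("var_a", 0.20, 0.02, 0.20),   # F = 100, kept
    Variant("var_b", 0.12, 0.02, 0.25),   # F = 36, but in high LD with var_a
    Variant("var_c", 0.03, 0.01, 0.30),   # F = 9, too weak
    Variant("var_d", 0.10, 0.01, 0.005),  # MAF below threshold
    Variant("var_e", 0.10, 0.02, 0.10),   # F = 25, independent, kept
]
r2 = [
    [1.0, 0.8, 0.0, 0.0, 0.1],
    [0.8, 1.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.1, 0.1, 0.0, 0.0, 1.0],
]
instruments = [v.name for v in select_instruments(variants, r2)]
```

Applying the filter after harmonising exposure and outcome data, as described above, means the same function can be rerun on whichever variants are available for each outcome.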
Residual LD was modelled through generalised least squares52 implementations of the IVW and MR-Egger estimators, where the MR-Egger estimator is more robust to the presence of potential horizontal pleiotropy55. To further minimise the potential influence of horizontal pleiotropy, we excluded variants with a leverage statistic larger than three times the mean, or outlier (chi-square) statistics larger than 10.83, and used the Q-statistic (P value < 0.001) to identify possible remaining violations56. A model selection framework was applied to select the most appropriate estimator between IVW or MR-Egger for each specific exposure-outcome relationship56,57. This model selection framework, originally developed by Gerta Rücker58, utilises the difference in heterogeneity between the IVW Q-statistic and the Egger Q-statistic, preferring the latter model when the difference is larger than 3.84 (i.e., the 95% quantile of a Chi-square distribution with 1 degree of freedom). The results were reported as odds ratios (OR) or mean differences (MD) with 95% confidence intervals.
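The estimator choice described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes variants are already oriented to the exposure-increasing allele, builds the covariance of the outcome associations from their standard errors and the LD correlation matrix, and omits standard errors, leverage and outlier filtering. All names and numbers are hypothetical.

```python
import numpy as np

def gls_fit(design, by, omega_inv):
    """Generalised least squares fit; returns coefficients and the Q-statistic."""
    xtwx = design.T @ omega_inv @ design
    coef = np.linalg.solve(xtwx, design.T @ omega_inv @ by)
    resid = by - design @ coef
    return coef, float(resid @ omega_inv @ resid)

def select_estimator(bx, by, se_y, ld, threshold=3.84):
    """Choose between IVW and MR-Egger from the difference in Q-statistics."""
    bx, by, se_y = (np.asarray(a, dtype=float) for a in (bx, by, se_y))
    # Covariance of outcome associations: se_i * se_j * LD correlation_ij.
    omega_inv = np.linalg.inv(np.outer(se_y, se_y) * np.asarray(ld, dtype=float))
    ivw_coef, q_ivw = gls_fit(bx[:, None], by, omega_inv)
    egger_design = np.column_stack([np.ones_like(bx), bx])
    egger_coef, q_egger = gls_fit(egger_design, by, omega_inv)
    if q_ivw - q_egger > threshold:
        return {"model": "MR-Egger", "estimate": float(egger_coef[1]),
                "intercept": float(egger_coef[0]), "q": q_egger}
    return {"model": "IVW", "estimate": float(ivw_coef[0]),
            "intercept": 0.0, "q": q_ivw}

# Toy data: four variants with modest LD between neighbours.
ld = np.eye(4) + 0.2 * (np.eye(4, k=1) + np.eye(4, k=-1))
bx = np.array([0.1, 0.2, 0.3, 0.4])
se_y = np.full(4, 0.01)
no_pleiotropy = select_estimator(bx, 0.5 * bx, se_y, ld)
with_pleiotropy = select_estimator(bx, 0.1 + 0.5 * bx, se_y, ld)
```

In the toy data a constant offset of 0.1 in the outcome associations mimics directional pleiotropy, so the intercept model removes enough heterogeneity to be preferred.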
Blauw et al.15 (n = 5672) previously conducted a GWAS on plasma CETP concentration in the European participants of the NEO cohort. As a further sensitivity analysis, we replicated our HDL-C weighted analysis in European participants by selecting variants based on their association with CETP plasma concentration (pQTL), applying the same instrument selection strategy as described above (Supplementary Data 12). Given the absence of East Asian data on CETP concentration, we expanded our analysis to consider eGFR, stroke subtypes (large artery stroke, small vessel stroke, cardioembolic stroke), AF and NAFLD, which were unavailable in sufficiently large numbers in GWAS of East Asian participants.
Interaction test
Potential differences between European and East Asian participants in the drug target MR effects of on-target CETP inhibition were formally tested using interaction tests59. Briefly, an interaction effect represents the difference between the ancestry-specific MR effects, where the standard error of this difference is equal to the square root of the sum of the variance of the ancestry-specific effect estimates. For binary outcomes, where the ancestry-specific effect represents an OR, instead of a difference, the interaction effect was calculated as the ratio between the European and East Asian ancestry-specific OR (i.e., representing a difference on the logarithmic scale).
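As a minimal sketch of this calculation, assuming independent ancestry-specific estimates and a normal approximation for the test statistic (function names and input values are hypothetical):

```python
import math

def interaction_test(est_a, se_a, est_b, se_b):
    """Difference between two independent estimates with a two-sided P value."""
    diff = est_a - est_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return diff, se, p_value

# Quantitative outcome: mean differences per SD higher HDL-C (toy values).
md_diff, md_se, md_p = interaction_test(1.0, 0.3, 0.0, 0.4)

# Binary outcome: work on the log odds ratio scale, then exponentiate.
log_or_eur, se_eur = math.log(0.90), 0.03
log_or_eas, se_eas = math.log(0.95), 0.04
log_diff, log_se, or_p = interaction_test(log_or_eur, se_eur, log_or_eas, se_eas)
or_ratio = math.exp(log_diff)  # ratio of the European to the East Asian OR
```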
Multiple testing
The focus of the presented analysis was the evaluation of potential differential effects of CETP inhibition between participants of East Asian and European populations. To guard against multiplicity, interaction tests were evaluated against a corrected alpha of \(0.05/32 = 1.6 \times 10^{-3}\), accounting for the 32 evaluated traits. We did not apply a similar multiple testing corrected alpha for the ancestry specific findings, and instead focussed on associations significant in both ancestries. Focussing on replicated associations resulted in an alpha of \(0.05^{2} = 0.0025\), and an expected number of false positive results close to zero: \(32 \times 0.05^{2} = 0.08\).
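The arithmetic behind these thresholds can be checked directly. The sketch assumes, as the replication argument implicitly does, that the tests in the two ancestries are independent; variable names are illustrative.

```python
n_traits = 32
alpha = 0.05

# Bonferroni-corrected threshold for the interaction tests.
interaction_alpha = alpha / n_traits                      # 0.0015625, about 1.6e-3

# Chance that a null association is significant in both ancestries.
replication_alpha = alpha ** 2                            # 0.0025

# Expected number of false positives across all traits under replication.
expected_false_positives = n_traits * replication_alpha   # 0.08
```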
Inference in a biomarker drug target Mendelian randomisation analysis
As detailed in Schmidt et al.14, Schmidt et al.28, and described next, the inference in biomarker weighted drug target MR is on the drug target itself, not on the downstream biomarker (e.g. HDL-C). Furthermore, the biomarker does not need to cause disease if the drug target affects the disease through alternative pathways (i.e. post-translational horizontal pleiotropy). We now further expand these derivations to show that the biomarker weighted drug target MR will approximate an interaction test of the difference in protein effects only when the protein effect on the biomarker is equal in both populations. Alternatively, assuming directional concordance of the protein effect on the biomarker, more robust inference will be obtained by applying interaction testing to identify directionally discordant outcome effects.
To show these derivations we encode the data generating model of a drug target MR in Fig. 5. Here, the absence of an arc between the genetic variants \(\boldsymbol{G}\) and the outcome \(\boldsymbol{D}\) ensures there is no pre-translational horizontal pleiotropy, which would otherwise bias the drug target MR effect of the protein \(\boldsymbol{P}\) on the outcome. This protein drug target effect can be referred to as:
$$\omega=\mu\theta+\phi_{\boldsymbol{P}} \tag{1}$$
which consists of the effect \(\mu\theta\) mediated by biomarker \(\boldsymbol{X}\), and the effect \(\phi_{\boldsymbol{P}}\) a protein might have through a pathway (or pathways) side-stepping \(\boldsymbol{X}\). Depending on the application, there might be multiple intermediate biomarkers, resulting in a straightforward expansion of Eq. 1.
Fig. 5: Drug target Mendelian randomisation pathways.
Nodes are presented in bold face, with \(\boldsymbol{G}\) representing a genetic variant, \(\boldsymbol{P}\) a protein drug target, \(\boldsymbol{X}\) a biomarker, \(\boldsymbol{D}\) the outcome, and \(\boldsymbol{U}\) (potentially unmeasured) common causes of \(\boldsymbol{P}\), \(\boldsymbol{X}\) and \(\boldsymbol{D}\). Labelled paths represent the effect magnitudes between nodes.
Because there are confounding factors \(\boldsymbol{U}\), which are a common cause of both \(\boldsymbol{P}\) and \(\boldsymbol{D}\), simply regressing \(\boldsymbol{D}\) on \(\boldsymbol{P}\) is not expected to provide an unbiased estimate of \(\omega\). Instead, given that the genetic effects on the outcome and the protein are unaffected by confounders, MR can be employed, where the ratio of the genetic effect on the outcome to the genetic effect on the protein results in the intended estimate:
$$\omega=\frac{\widetilde{\delta}\left(\mu\theta+\phi_{\boldsymbol{P}}\right)}{\widetilde{\delta}}=\mu\theta+\phi_{\boldsymbol{P}}$$
While there is a growing resource on genetic protein associations, sufficient information on \(\widetilde{\delta }\) might not always be available. Instead, in some cases there might be more information and data on the genetic effect on a non-protein biomarker (e.g. lipids), which is known to be affected by the protein (\(\mu\)). In these cases, a biomarker weighted (bw) drug target MR analysis can be calculated, by replacing \(\widetilde{\delta }\) with the genetic association on the biomarker:
$$\omega_{bw}=\frac{\widetilde{\delta}\left(\mu\theta+\phi_{\boldsymbol{P}}\right)}{\widetilde{\delta}\mu}=\frac{1}{\mu}\omega$$
Clearly, \(\omega_{bw}\) is a biased estimate of \(\omega\); however, assuming sufficient information is available on the sign of \(\mu\), that is, whether the protein increases or decreases the biomarker concentration, \(\omega_{bw}\) can provide key information on the anticipated effect direction of \(\omega\). Furthermore, given that \(\omega_{bw}=0 \iff \omega=0\), a biomarker weighted drug target MR provides a valid null-hypothesis test of \(\omega\), irrespective of the amount of bias due to \(\frac{1}{\mu}\).
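A small numerical sketch may help fix ideas. It computes the protein weighted and biomarker weighted ratios directly from the structural parameters of Fig. 5, ignoring sampling error; the function name and the parameter values are hypothetical and chosen only for illustration.

```python
def drug_target_effects(delta, mu, theta, phi_p):
    """Return the protein weighted and biomarker weighted MR estimands.

    delta: genetic effect on the protein; mu: protein effect on the biomarker;
    theta: biomarker effect on the outcome; phi_p: protein effect on the
    outcome that bypasses the biomarker.
    """
    omega = mu * theta + phi_p        # total protein effect on the outcome
    gene_on_protein = delta
    gene_on_biomarker = delta * mu
    gene_on_outcome = delta * omega
    return {
        "omega": gene_on_outcome / gene_on_protein,
        "omega_bw": gene_on_outcome / gene_on_biomarker,
    }

# The biomarker partly mediates the protein effect.
mediated = drug_target_effects(delta=0.4, mu=-2.0, theta=0.1, phi_p=0.3)

# The biomarker has no effect on the outcome (theta = 0), yet omega_bw is
# non-zero because the protein acts through another pathway.
bypass = drug_target_effects(delta=0.4, mu=-2.0, theta=0.0, phi_p=0.3)

# No protein effect at all: both estimands are zero.
null = drug_target_effects(delta=0.4, mu=-2.0, theta=0.0, phi_p=0.0)
```

In each case the biomarker weighted value equals the protein effect divided by \(\mu\), so its sign is informative only when the sign of \(\mu\) is known.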
Given two distinct populations, European and East Asian, one might be interested in determining to what extent there is a difference in the drug-target effect on the same outcome. In the presence of genetic information on the protein expression in both populations, this can be estimated through a drug target MR:
$$\omega_{j}-\omega_{k}=\left(\mu_{j}\theta_{j}+\phi_{\boldsymbol{P},j}\right)-\left(\mu_{k}\theta_{k}+\phi_{\boldsymbol{P},k}\right) \tag{2}$$
Here \(j\) and \(k\) index the effects from Fig. 5 for two non-overlapping subgroups, such as European and East Asian participants, respectively. An interaction test for \(\omega_{j}-\omega_{k} \ne 0\) would provide evidence for a difference.
In the absence of information on protein expression in both populations, one could consider conducting a biomarker weighted drug target MR in both populations to determine the difference in effects between two populations. Given that a biomarker weighted MR provides a biased estimate of \(\omega\), one needs to additionally assume that the amount of bias in both populations is equal, specifically to assume that \(\mu_j=\mu_k\). To see this, let us assume there is no difference between the protein effect on the outcome in both populations, which is:
$${\omega }_{j}-{\omega }_{k}=0$$
Furthermore, if we assume (as implicitly above) \(\omega_j,\omega_k \,\ne\, 0\), then the biomarker weighted drug target analysis becomes:
$$\omega_{bw,j}-\omega_{bw,k}=\frac{\omega_{j}}{\mu_{j}}-\frac{\omega_{k}}{\mu_{k}}=\frac{\mu_{k}\omega_{j}-\mu_{j}\omega_{k}}{\mu_{j}\mu_{k}}$$
Clearly, this can only equal zero when \({\mu }_{j}={\mu }_{k}\).
Biomarker weighted drug target MR can therefore be used to obtain a valid test of the null hypothesis \(\omega_{j}-\omega_{k}=0\), if we assume that the protein effect on the downstream biomarker is equal in both populations. In the absence of an exact agreement between \(\mu_{j}\) and \(\mu_{k}\), the false positive (i.e. type 1 error) rate of the interaction tests will be inflated in proportion to the difference \(\mu_{j}-\mu_{k}\). Depending on the application, \(\mu_{j}=\mu_{k}\) might be too strong an assumption to make. Instead, if we are more comfortable assuming that the sign of \(\mu_{j}\) and \(\mu_{k}\) is the same (i.e. that a unit increase in the protein does not increase the biomarker in one population while decreasing it in the other), more robust inference can be obtained by focussing on directional discordance of the outcome effects between populations.
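The consequence of unequal \(\mu\) can be illustrated numerically. The sketch below ignores sampling error and uses hypothetical parameter values; it shows that an identical protein effect in two populations yields a non-zero biomarker weighted difference when \(\mu_j \ne \mu_k\), while the direction of the biomarker weighted effects still agrees as long as \(\mu_j\) and \(\mu_k\) share a sign.

```python
def biomarker_weighted(omega, mu):
    """Biomarker weighted estimand: protein effect divided by mu."""
    return omega / mu

def same_direction(a, b):
    """True when both effects are non-zero and share a sign."""
    return a * b > 0

omega = 0.1  # identical protein effect on the outcome in both populations

# Equal mu: the biomarker weighted difference is zero, as for omega itself.
equal_mu_diff = biomarker_weighted(omega, 2.0) - biomarker_weighted(omega, 2.0)

# Unequal mu with the same sign: a spurious difference appears ...
bw_j = biomarker_weighted(omega, 2.0)
bw_k = biomarker_weighted(omega, 4.0)
unequal_mu_diff = bw_j - bw_k
# ... but the effect directions remain concordant.
concordant = same_direction(bw_j, bw_k)

# A truly opposite protein effect produces directional discordance.
discordant = not same_direction(biomarker_weighted(0.1, 2.0),
                                biomarker_weighted(-0.1, 4.0))
```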
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.