Genetic Architecture of Vitamin B12 and Folate Levels Uncovered Applying Deeply Sequenced Large Datasets
Genome-wide association studies have mainly relied on common HapMap sequence variations. Recently, sequencing approaches have allowed analysis of low frequency and rare variants in conjunction with common variants, thereby improving the search for functional variants and thus the understanding of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined samples of 45,576 and 37,341 individuals with serum B12 and folate measurements, respectively. We found six novel loci associating with serum B12 (CD320, TCN2, ABCD4, MMAA, MMACHC) or folate levels (FOLR3) and confirmed seven loci for these traits (TCN1, FUT6, FUT2, CUBN, CLYBL, MUT, MTHFR). Conditional analyses established that four loci contain additional independent signals. Interestingly, 13 of the 18 identified variants were coding and 11 of the 13 target genes have known functions related to B12 and folate pathways. Contrary to epidemiological studies we did not find consistent association of the variants with cardiovascular diseases, cancers or Alzheimer's disease although some variants demonstrated pleiotropic effects. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations.
Published in the journal:
PLoS Genet 9(6): e1003530. doi:10.1371/journal.pgen.1003530
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pgen.1003530
Introduction
One-carbon metabolism (OCM) is a process whereby folate transfers one-carbon groups in a range of biological processes including DNA synthesis, methylation and homocysteine metabolism [1], [2]. The water-soluble B vitamins, vitamin B12 (B12) and folate, play key roles as enzyme cofactors or substrates in OCM. Individuals with deficiencies in these vitamins can develop anemia and, in the case of B12 deficiency, serious neurological problems. In adults, epidemiological studies have also suggested that subclinical B12 or folate deficiencies are associated with increased risk of cardiovascular disease [3], [4], different cancers [5], [6] and neurodegenerative disease such as Alzheimer's disease [7]. In addition to nutrition, serum levels of B12 and folate are influenced by several biological processes, including absorption, transportation and cellular uptake, as well as processing of precursors into active molecules. Heritability, estimated from di- and monozygotic twins, is 59% and 56% for B12 and folate levels, respectively, indicating that there is a substantial genetic component to the population diversity in these physiological variables [8]. Identification of sequence variants that affect circulating levels of B12 and folate can thus give insights into the interplay of diet, genetics and human health. Genome-wide association studies (GWAS) have yielded some sequence variants influencing B12 levels [9]–[12], but have been less successful in identifying variants affecting folate levels [10], [11]. Thus, genome-wide significant associations with serum B12 levels have been convincingly reported for four loci, FUT2, MUT, CUBN and TCN1, in European populations [9]–[11] and an additional four loci, MS4A3, CLYBL, FUT6 and 5q32, in a Chinese population [12].
No genome-wide significant GWAS associations have been reported for serum folate levels. However, a significant association with the MTHFR A222V variant was demonstrated prior to the GWAS era [13], [14], and suggestive associations have been reported in European populations for two loci (FIGN and PRICKLE2) [10], [11].
The classic GWAS applied commercial chip-based genotyping and imputation of HapMap variants of which a majority were common single nucleotide variants (SNVs) with very few rare variants with minor allele frequency (MAF) <1% [15], [16]. However, the search for the truly associated functional variants and the targeted gene at each locus has been hindered by the lack of coverage of the full spectrum of the sequence variation of the human genome. Recently, focus has turned to the use of next generation sequencing of whole genomes (WGS) [17], exomes (WES) [18] or specific targets [19], all contributing to a better understanding of the spectrum of allelic variations in the human genome. We expect that attempts to directly cover low frequency and rare sequence variants through next generation sequencing, in addition to the common variants, will improve the search for functional variants and thus the understanding of the underlying biology of human traits and diseases.
Here we aimed to identify and characterize associations of SNVs across the allele frequency spectrum with serum levels of B12 and folate by compiling data in up to 45,576 individuals based on sequencing initiatives in Iceland and Denmark. For the first time we apply next generation sequence data to identify sequence variants affecting serum levels of B12 and folate and the present datasets are the largest utilized to date for the analysis of these traits.
Results
Heritability of serum B12 and folate levels
We estimated the heritability of B12 and folate serum levels based on 38,229 and 21,708 Icelandic sibling pairs, respectively. Our analysis revealed estimates of 27% for B12 and 17% for folate which are lower than previously reported [8].
Experimental design
To search for sequence variants affecting serum B12 and folate levels we compiled data from two sequencing initiatives in Iceland and Denmark. In Iceland, a large population-based resource has been generated applying WGS and highly accurate imputation of the sequence information into a large fraction of the population [20], [21]. Utilizing this resource many low frequency and rare causative sequence variants have recently been discovered that affect the risk of common diseases [22]–[26]. In the Danish samples, WES was used to search for low frequency variation associated with complex traits [27], [28]. The outline of the present study is depicted in Figure 1. In the Icelandic study sample, 1,176 individuals were whole genome sequenced to an average depth of >10× and 22.9 million SNVs were identified. These variants were then imputed into 25,960 and 20,717 chip-genotyped Icelanders with serum B12 and folate measurement, respectively, using highly accurate long-range phasing based imputation [20]. The Icelandic genealogical database allowed for further propagation of the sequence information, applying genealogy based imputation, into 11,323 and 8,196 relatives of the chip-genotyped individuals, for a total sample size of 37,283 and 28,913, respectively, for the two phenotypes [25] (Text S1 and Table S1). In the Danish part of the study whole exomes of 2,000 Danes were sequenced to an average sequencing depth of 8× [28]. From that effort, 16,192 coding SNVs with allelic frequency above 1% were selected for Illumina iSelect genotyping in two Danish population-based cohorts of 8,293 individuals with measurements of serum B12 and 8,428 individuals with measurement of serum folate (Table S2). Of the 16,192 SNVs, 15,994 overlapped with the Icelandic variants.
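The sample sizes quoted above can be checked with a few lines of arithmetic. The sketch below only re-adds the counts given in the text; the variable and function names are illustrative and not taken from the study.

```python
# Sample sizes quoted in the text, keyed by trait.
icelandic_chip_genotyped = {"B12": 25_960, "folate": 20_717}
icelandic_relatives = {"B12": 11_323, "folate": 8_196}
danish_genotyped = {"B12": 8_293, "folate": 8_428}


def combine(*groups):
    """Add per-trait counts across any number of sample groups."""
    traits = groups[0].keys()
    return {trait: sum(group[trait] for group in groups) for trait in traits}


# Chip-genotyped Icelanders plus relatives reached by genealogy-based imputation.
icelandic_total = combine(icelandic_chip_genotyped, icelandic_relatives)
# Icelandic total plus the two Danish cohorts.
combined_total = combine(icelandic_total, danish_genotyped)

print(icelandic_total)  # {'B12': 37283, 'folate': 28913}
print(combined_total)   # {'B12': 45576, 'folate': 37341}
```

The combined totals match the 45,576 and 37,341 individuals reported for B12 and folate, respectively.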
A generalized form of linear regression was used to test for association of serum levels of B12 or folate with SNVs, taking into account relatedness and population stratification within each sample set, applying the method of genomic control (GC). Analyses were performed in three steps: sequence variants were first analyzed in the Icelandic and Danish samples separately, and the overlapping sequence variants identified in both study samples were then combined in a meta-analysis. Loci that associated significantly with B12 or folate levels from these studies were fine mapped using the Icelandic WGS data imputed into chip-genotyped individuals, and the same data set was used to identify additional signals at each of these loci through conditional analysis. Finally, the full Icelandic data of 22.9 million SNVs were used in GWAS to identify additional loci represented by non-coding variants or rare coding signals not genotyped in the Danish design. Genome-wide significance (GWS) level in the study was set at P<2.2×10−9, based on Bonferroni correction for the 22.9 million SNVs (Figure 1).
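The significance threshold and the genomic control step can be illustrated with a short sketch. It assumes a family-wise error rate of 0.05 (not stated explicitly in the text, but consistent with the reported threshold) and the common convention of not letting the inflation factor fall below 1. The function names are hypothetical.

```python
from statistics import NormalDist, median


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median chi-square over the expected median (1 df)."""
    # Median of a 1-df chi-square is the squared 75th percentile of a standard normal.
    expected_median = NormalDist().inv_cdf(0.75) ** 2
    # Assumption: lambda is floored at 1 so that statistics are never inflated.
    return max(1.0, median(chi2_stats) / expected_median)


def apply_genomic_control(chi2_stats):
    """Divide every test statistic by the inflation factor."""
    lam = genomic_control_lambda(chi2_stats)
    return [stat / lam for stat in chi2_stats]


threshold = bonferroni_threshold(0.05, 22_900_000)
print(f"{threshold:.2e}")  # 2.18e-09
```

Rounding 2.18×10−9 up gives the P<2.2×10−9 level used in the study.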
Discovery analyses for serum B12 and folate
In the separate and combined analyses of SNVs with serum B12 and serum folate levels in the Icelandic and Danish data, a total of 13 genetic loci were found to associate at GWS, P<2.2×10−9 (Table 1 and 2, Figure S1 and S2). Of the 11 loci associated with serum B12, five (CD320, TCN2, ABCD4, MMAA and MMACHC) were novel and six were previously reported either in populations of European or East-Asian ancestry [9]–[12] (Table 1). Association analyses with serum folate yielded one novel locus (FOLR3) and confirmed the reported MTHFR locus (Table 2).
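The text does not specify how the Icelandic and Danish results were combined. As an illustration only, the sketch below uses a fixed-effects inverse-variance meta-analysis, a common choice for combining per-cohort effect estimates. The method, the function name, and the input numbers are all assumptions rather than details from the study.

```python
import math


def inverse_variance_meta(estimates):
    """Combine (effect, standard_error) pairs with inverse-variance weights.

    Returns the pooled effect, its standard error and a two-sided P-value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # erfc stays accurate for the very small P-values typical of GWAS.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


# Hypothetical per-cohort effects in SD units: (effect, standard error).
beta, se, p = inverse_variance_meta([(0.10, 0.02), (0.08, 0.04)])
print(f"effect = {beta:.3f} SD, SE = {se:.4f}")  # effect = 0.096 SD, SE = 0.0179
```

The more precise cohort dominates the pooled estimate, which is why a large imputed sample can carry most of the weight in a combined analysis.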
Since only coding variants were included in the combined analysis we used the Icelandic WGS-based data to screen for stronger non-coding signals at the loci identified in meta-analysis of coding variants. Interestingly, the strongest signal at 10 of the 11 B12-associated loci in the Icelandic data corresponded to missense (n = 9) or nonsense (n = 1) mutations with only the FUT6 locus having a stronger non-coding signal (rs708686) than the missense P124S mutation (Table S3). As only SNVs had been called from the WGS data and imputed into the Icelandic samples we reassessed each of the 13 B12 and folate loci with INDEL data called using the GATK algorithm (http://www.broadinstitute.org/gatk/). None of the INDELs detected at the 11 B12 loci associated more strongly than the lead SNVs. However, when reassessing each of the two folate-associated loci we detected a two nucleotide insertion (rs139130389, NM_000804:exon3:c.318_319insTA) encoding a common (MAF 10.0%) frameshift mutation in exon 3 of FOLR3, that associated more strongly with folate levels than the intronic SNV rs652197 identified in the initial scan (rs139130389: P = 2.45×10−12; effect = 0.087 SD, Table 2). The insertion and rs652197 are in linkage disequilibrium (LD) in the Icelandic sequencing data (r2 = 0.51). Upon further inspection, we found that the ancestral sequence contained the insertion, indicating the occurrence of a two base deletion in humans. The deletion, with an allelic frequency of 90% in Iceland, creates a premature stop codon at amino acid position 107 compared to the full-length protein consisting of 245 amino acids. Coding variants are thus the lead signals at both folate loci (FOLR3 and MTHFR).
The lead SNVs included both rare, low frequency and common variants with MAFs ranging from 0.2% to 48% (Table 1 and 2). Of the six novel loci, four contained a lead variant with MAF below 6% with the rare missense rs12272669 variant (MAF 0.22%) in MMACHC that associates with B12 found in the Icelandic data being at the extreme (Table 1). This variant has been observed in other populations than the Icelandic, albeit at much lower frequency (MAF 0.02%) (Exome Variant Server, http://evs.gs.washington.edu/EVS/). For TCN1 and FUT6 previously reported to associate with serum B12 levels we confirmed the association, yet with different SNVs than reported. At the TCN1 locus the strongest associated SNV in the Icelandic data was rs34324219 (Table 1) encoding a D301Y missense mutation, whereas the reported [10], [11] and correlated (r2 = 0.28) non-coding rs526934 was more weakly associated (Table S4). At the FUT6 locus, the P124S missense mutation (rs778805) identified in the combined analysis of Icelandic and Danish data associated more strongly (Table 1) than the previously reported promoter rs3760776 variant (Table S4). For the remaining four reported B12-associated loci, MUT, FUT2, CUBN and CLYBL, we confirmed the association signal [9]–[11] (Table 1). At the MTHFR locus the strongest folate association was for the major allele of the common A222V (rs1801133) for which previous association with serum folate has been reported [10], [13], [14] (Table 2).
For the two loci reported to associate with B12 levels in individuals of East-Asian ancestry (MS4A3 and 5q32) the variant was either not present in the Icelandic data or present at very low frequency (Table S4), whereas the reported non-coding folate signals at the FIGN and PRICKLE2 loci did not replicate in the Icelandic folate data (Table S5).
At a less stringent significance level of P<1×10−6 we found three additional loci, CPS1, SPACA1 and ZBTB10 with suggestive associations with serum B12 levels (Table S6) while suggestive association with folate levels at P<1×10−6 was found for eight additional loci (Table S7).
Analyses conditional on the identified associated sequence variants
For the 13 loci associated with serum B12 or folate levels we performed stepwise conditional analyses to search for secondary signals applying Icelandic WGS data imputed into the 25,960 and 20,717 chip-genotyped Icelanders with serum B12 and folate information. We detected additional signals at five loci, CUBN, TCN1, TCN2, FUT6 and MTHFR (Figure 2). For the serum B12-associated loci, secondary independent association signals at P<5×10−8 were detected at three, CUBN, TCN1 and TCN2 (Figure 2, Table 3, Table S8), while the secondary independent signal at FUT6 (observed for the reported B12-associated rs3760776 upstream of FUT6 [12]) did not reach the threshold of significance (P = 4.4×10−6). The secondary signal at the CUBN locus was shown for a group of correlated markers represented by rs56077122 (located in an intron of the neighboring TRDMT1) (Figure 2). In TCN1 two additional independent signals at P<5×10−8 for serum B12 were found including a missense variant (R35H) and an intergenic variant whereas one secondary signal in the TCN2 locus, represented by rs5753231, was located immediately 5′ to TCN2 (Figure 2, Table 3). In the folate-associated loci, a secondary independent signal was found at the MTHFR locus represented by rs17421511 located in intron 4 of the MTHFR gene (Figure 2, Table 3). In contrast to the lead SNVs a large fraction of the secondary B12 or folate signals were non-coding.
Of the identified variants (lead and secondary) the fraction of variance in serum B12 or folate levels explained is estimated to be 6.3% for B12 and 1.0% for folate (Text S1).
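The details of the variance calculation are in Text S1 and are not reproduced here. As a rough illustration, a standard approximation for a single additive variant under Hardy-Weinberg equilibrium is 2f(1−f)β², where f is the allele frequency and β the effect in SD units. The sketch below applies this textbook formula, which may differ from the authors' method, to the FOLR3 frameshift variant using the frequency and effect quoted earlier.

```python
def variance_explained(maf: float, effect_sd: float) -> float:
    """Fraction of trait variance explained by one additive variant.

    Assumes Hardy-Weinberg equilibrium and a trait scaled to unit variance,
    so the effect is expressed in standard deviations per allele.
    """
    return 2.0 * maf * (1.0 - maf) * effect_sd ** 2


# FOLR3 frameshift variant: MAF 10.0%, effect 0.087 SD (values from the text).
folr3 = variance_explained(maf=0.10, effect_sd=0.087)
print(f"{100 * folr3:.2f}% of trait variance")  # 0.14% of trait variance
```

Summing such contributions over independent variants gives an approximate total, which helps explain why the combined figure for folate remains small.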
Mapping effects of associated sequence variants on gene expression
To determine whether any of the lead or secondary association signals at the B12 or folate loci affect the expression of the target gene we analyzed genome-wide expression QTL (eQTL) data from white blood cells (n = 1,001) and adipose tissue (n = 673) from Icelanders with information on 22.9 million SNVs [29]. Of the lead and secondary B12 or folate signals that are coding (Tables 1–3) two showed strong association with the expression of the target gene: the R532H missense variant in MUT (P = 9.1×10−59 in white blood cells and P = 2.5×10−16 in adipose tissue) and the frameshift INDEL in FOLR3 (P = 7.1×10−110 in white blood cells and P = 2.5×10−62 in adipose tissue; Table S9). Of all the cis variants at the MUT locus the R532H missense mutation had by far the strongest effect on MUT expression, indicating that this effect is not mediated by a non-coding regulatory variant in LD with the R532H mutation. The large effect of the frameshift mutation on FOLR3 expression is likely caused by nonsense-mediated decay of transcripts containing the premature termination mutation [30]. A similar effect was not seen for the nonsense mutation in the CLYBL gene, which can likely be explained by the closeness of the mutation to the C-terminus of the CLYBL protein (amino acid 259 of 340) (Table S9). Of the non-coding lead or secondary B12 or folate signals a statistically significant effect on expression was only seen for the TCN2 promoter variant; however, other markers in the region that had no effect on serum B12 levels associated more strongly with TCN2 expression. Although lack of appropriate tissue to evaluate the effect of the B12 and folate mutations on expression cannot be excluded, these data suggest that except for the MUT gene the effects of both the coding and non-coding mutations are unlikely to be through expression.
Association of identified sequence variants with other traits linked to B12 and folate levels
Rare mutations in some of the B12 genes described here, i.e. MMACHC, MMAA, MUT, CD320, TCN2 and CUBN, have been described in connection with rare conditions of methylmalonic aciduria and megaloblastic anemia that all relate to defects in B12 metabolism (OMIM database, http://www.ncbi.nlm.nih.gov/omim/). In addition, epidemiological studies have suggested a link between reduced B12 and folate levels and the risk of common conditions such as cardiovascular diseases [3], [4], cancers [5], [6] and neurodegenerative disorders [7]. To evaluate the effect of the B12 or folate variants on these conditions we analyzed the association with coronary artery disease (CAD), stroke, colon cancer, prostate cancer and Alzheimer's disease in data obtained from deCODE's phenotype database. As outlined in Table S10, variants associated with serum B12 or folate levels did not consistently affect the risk of the diseases tested; the B12 or folate increasing allele for some variants was weakly protective and for others weakly at risk, and only two loci (CUBN associated with CAD and MTHFR with stroke) were statistically significant (P<0.0018) but with opposite effects on these diseases. B12 or folate deficiencies can lead to increased serum homocysteine [2], yet of all the B12 or folate loci tested only two associated significantly with homocysteine levels, with the B12 or folate increasing allele decreasing the homocysteine levels as expected (Table S10). These loci were the folate-associated MTHFR variant previously reported to associate with homocysteine [10], [31], [32] and the B12-associated variant at the MUT locus. Neither of these loci associated with cardiovascular disease or Alzheimer's disease, although increased homocysteine has been suggested to increase the risk of these diseases.
Deficiency of B12 or folate is associated with megaloblastic anemia characterized by the presence of abnormally large red blood cells, increased mean corpuscular volume (MCV) and increased mean corpuscular hemoglobin (MCH). None of the identified variants associated significantly with MCV and MCH (Table S10). We also tested the recessive model for the B12 or folate variants in relation to these conditions, but did not detect any new associations. Inconsistency in the direction of the effect of each of the variants on these conditions (increased or decreased risk) (Table S10) indicates that for a given condition the combined effect of all the variants would be consistent with lack of association. The absence of directionally consistent effects of the B12 and folate variants on the phenotypes tested suggests that sequence variants that contribute to the population diversity in serum B12 or folate levels do not modify the risk of developing these conditions, likely reflecting that B12 and folate levels have weak effects on these conditions. However, we recognize that for some of the conditions analyzed sample sizes are too small to detect weak effects, calling for cautious interpretation.
Evaluation of pleiotropic effects of the identified variants
One of the B12-associated loci, FUT2, has previously been associated with reduction in liver enzymes including alkaline phosphatase (ALP) [33] and cholesterol levels [34], increased risk of Crohn's disease [35], [36], psoriasis [37], retinal vascular caliber [38] and type 1 diabetes [39] and protection against Norovirus infection [40]. These associations can be explained by the function of FUT2 in cell surface glycobiology as determinant of the Lewis antigen blood group. To evaluate pleiotropic effects of the identified B12 and folate variants, we screened the deCODE phenotype database, which contains information on the majority of common diseases and their associated risk factors (n = 400), applying both multiplicative and recessive genetic models (P = 3.5×10−6 after Bonferroni correction). We found that the FUT2 variant associated strongly with serum levels of ALP (P = 1.1×10−73) and also with psoriasis (P = 4.3×10−3) as previously reported. We also detected a strong association with serum levels of cancer antigen 19-9 (P = 1.1×10−146), lipase (P = 2.2×10−24) and suggestive association with bone mineral density (BMD) (P = 1.3×10−5) with the B12-increasing allele decreasing ALP levels, increasing the serum levels of the cancer antigen 19-9 and lipase and increasing the risk of developing low BMD (osteoporosis) (Table S11). An increase in serum lipase is associated with Crohn's disease [41], but the causal link is unclear. The increased risk for low BMD observed for the FUT2 variant may be secondary to reduced ALP activity that might be a reflection of reduced bone remodeling. When applying the recessive model to the B12 and folate variants we found suggestive associations of the FUT6 variant with abdominal aortic aneurysm (AAA) and of the folate-associated variant in MTHFR with thoracic aortic aneurysm (TA). In both cases the effect of the B12- or folate-increasing allele was protective (Table S11). 
These associations could be mediated through the effect of these variants on B12 and folate levels as reduced levels of B12 and folate have been linked to the development of aortic aneurysm [42].
Discussion
Here we performed association analyses of up to 22.9 million SNVs, identified through WGS and WES, in up to 45,576 individuals to identify and characterize genetic variation influencing population diversity in serum levels of B12 and folate. We discovered five novel loci that associate with serum B12 levels and one novel locus for folate levels and replicated the six reported B12 loci and one folate locus. In addition, we identified five novel secondary independent signals at both the new and previously reported loci. The fraction of variance in serum B12 or folate levels explained by the identified variants is estimated to be 6.3% for B12 and 1.0% for folate (Text S1). Of the identified SNVs, both common and rare, we find that a large fraction (13 of 18) is represented by coding variants, which is an unusually high fraction compared to previous GWAS for other traits. Furthermore, of the 13 loci that associate with serum B12 and folate levels the genes at 11 of them can be directly linked to the current understanding of B12 and folate metabolism such as absorption, transport or enzymatic processes and one (FUT6) has potential links with these processes (Figure 3). Only CLYBL has a function that cannot be directly related to these pathways. Specifically, eight loci are involved in transporting B12 and folate between different tissues, four of them (TCN1, FUT2, FUT6 and TCN2) as co-factors or regulators of co-factors necessary for the transport and the other four (CUBN, CD320, ABCD4 [43] and FOLR3) as membrane transporters actively facilitating membrane crossing. MUT and MTHFR catalyze enzymatic reactions in the OCM, whereas MMACHC and MMAA are involved in co-enzymatic processes (Figure 3). Moreover, we note that of the 13 genes, two (TCN2 and CD320) are known and two (MUT and MMAA) are suggested to interact in vivo [44] (Figure 3).
Together with the high fraction of coding mutations these data indicate that the target genes at all of the loci have been identified.
By screening the deCODE database for pleiotropic effects of the B12 and folate variants we replicated some of the previous associations of the FUT2 gene and detected novel suggestive association with increased risk of osteoporosis (low BMD) potentially mediated through diminished bone remodeling as a consequence of reduced ALP activity. We also detected suggestive associations of the FUT6 and the MTHFR variants with AAA and TA, respectively. However, we did not demonstrate association of any of the variants with the cardiovascular diseases, CAD and stroke, colorectal cancer, prostate cancer or Alzheimer's disease and only two of the variants associated with homocysteine levels. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B12 or folate levels do not modify the risk of developing these conditions.
Materials and Methods
Ethics statement
All participants gave written informed consent. The studies were conducted in accordance with the Declaration of Helsinki II and were approved by the local Ethical Committees (approval numbers Denmark: H-3-2012-155, KA 98155 and KA-20060011, DeCode 08-105-V3-S1 (issued 30.08.2011) ref. VSNb2008060006/03.1).
Study participants in Iceland
For the Icelandic samples, serum B12 and folate levels were assessed in blood samples from Icelanders at the Landspitali University Hospital Laboratory or at the Icelandic Medical Center (Laeknasetrid) Laboratory in Mjodd (RAM), between the years 1990 and 2011. B12 and folate levels were normalized to a standard normal distribution using quantile normalization and then adjusted for sex, year of birth and age at measurement. For individuals for whom more than one measurement was available we used the average of the normalized values.
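The normalization step can be sketched as a rank-based mapping onto standard-normal quantiles, followed by averaging repeated measurements per individual. This is a minimal illustration under assumptions: the (rank − 0.5)/n offset and the tie handling are choices made here, the covariate adjustment for sex, year of birth and age is omitted, and the function names and input values are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist, mean


def inverse_normal_transform(values):
    """Map raw values to standard-normal quantiles by rank (ties share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    # Assumption: (rank - 0.5) / n keeps quantiles strictly inside (0, 1).
    return [standard_normal.inv_cdf((rank - 0.5) / n) for rank in ranks]


def average_per_individual(person_ids, scores):
    """Average normalized scores for individuals with several measurements."""
    grouped = defaultdict(list)
    for person_id, score in zip(person_ids, scores):
        grouped[person_id].append(score)
    return {person_id: mean(vals) for person_id, vals in grouped.items()}


# Hypothetical serum B12 measurements; individual "p1" was measured twice.
raw_b12 = [300.0, 150.0, 450.0, 300.0]
person_ids = ["p1", "p2", "p3", "p1"]
z = inverse_normal_transform(raw_b12)
per_person = average_per_individual(person_ids, z)
```

Because the transform depends only on ranks, it removes the skew typical of serum measurements before association testing.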
Study participants in Denmark
The Danish data were generated in two population-based study samples recruited in Copenhagen. The Inter99 cohort is a randomized, non-pharmacological intervention study for the prevention of ischaemic heart disease, conducted on 6,784 randomly ascertained participants aged 30 to 60 years at the Research Centre for Prevention and Health in Glostrup, Denmark [45] (ClinicalTrials.gov: NCT00289237). Detailed characteristics of Inter99 have been published previously [45]–[47]. The Inter99 cohort included 5,481 and 5,624 individuals with genotypes and measurement of serum B12 and folate, respectively. Health2006 is a population-based epidemiological study of general health, diabetes and cardiovascular disease of 3,471 individuals aged 18–74 years [48]. Health2006 was also conducted at the Research Centre for Prevention and Health in Glostrup, Denmark. The Health2006 cohort included 2,812 and 2,804 individuals with valid genotypes and measurement of serum B12 and folate, respectively. In Inter99 serum B12 and folate were measured by a competitive chemiluminescent enzyme immunoassay (Immulite 2000 System; Siemens Medical Solutions Diagnostics, Los Angeles, CA, USA) as previously reported [14]. In Health2006, serum B12 and folate were measured by chemiluminescent immunoassay (Dimension Vista platform, Siemens Healthcare Diagnostics GmbH, Eschborn, Germany).
Genotype data generation
In the Icelandic part, SNVs were identified through the Icelandic WGS project. A total of 1,176 Icelanders were selected for sequencing based on having various neoplastic, cardiovascular and psychiatric conditions. All of the individuals were sequenced to a depth of at least 10×. The generation of genotypic data in Iceland is detailed in earlier reports [23] and in Text S1, and consisted of the following steps: SNV calling and genotyping in WGS, long range phasing, genotype imputation and in silico genotyping.
In the Danish part of the study 16,192 SNVs for genotyping were selected from a WES study of 2,000 individuals [28]. In brief, exon capture and Illumina sequencing to a depth of 8× were performed in 2,000 Danes by methods previously described [27]. The exome was captured by a NimbleGen 2.1M HD array with a target region of 34.1 Mb including 18,954 genes defined by CCDS (Consensus Coding Sequence database). The average number of reads sequenced for each individual was 22.3 million with most reads being 30 to 80 bases long. After alignment to the human reference genome (assembly hg18, NCBI build 36.3) and stringent quality assurance, including uniqueness of genomic mapping and Q-score >20, the median coverage per individual was 91% of the target region and had an average depth of 8× (96% coverage and 11× depth before filtering). After applying quality criteria 70,182 SNVs with an estimated MAF above 1% based on the reads using maximum likelihood were identified [49]. The details of the WES have been described previously [28]. 20,005 SNVs were, as part of a published study, selected from the exome sequencing for genotyping in 16,888 samples by a custom-designed Illumina iSelect array. First, 18,358 SNVs annotated to the most likely deleterious categories (179 nonsense, 15,789 nonsynonymous, 219 located in splice sites and 2,171 in untranslated regions) were prioritized. Second, 1,048 SNVs nominally associated with type 2 diabetes (P<0.05) in a sequencing-based association study were selected. Finally, we selected 599 synonymous variants in 192 loci previously associated with common metabolic traits at GWS. Genotype data was obtained for 18,744 SNVs. Quality control of samples included removing closely related individuals, individuals with an extreme inbreeding coefficient, individuals with a low call rate, individuals with a mislabeled sex and individuals with a high discordance rate to previously genotyped SNVs. 15,989 individuals passed all quality control criteria. 
The SNVs were filtered based on their MAF (>0.5%), genotype call rate (>95%), Hardy-Weinberg equilibrium (P>10−7) or cross-hybridization with the X-chromosome. 16,192 SNVs passed all filters [28]. Genotyping of FOLR3 rs652197 in Danish samples was done by KASPar SNP Genotyping System (KBioscience, Hoddesdon, UK).
Statistical analyses
Icelandic analyses and quantitative trait association testing
A generalized form of linear regression was used to test for association of serum B12 and folate with SNVs. Let $y$ be the vector of quantitative measurements, and let $g$ be the vector of expected allele counts for the SNV being tested. We assume the quantitative measurements follow a normal distribution with a mean that depends linearly on the expected allele count at the SNV and a variance-covariance matrix proportional to the kinship matrix:
$$y \sim N(\alpha + \beta g,\; 2\sigma^2 \Phi),$$
where the matrix $\Phi$ is based on the kinship between individuals as estimated from the Icelandic genealogical database and an estimate of the heritability of the trait, $h^2$. It is not computationally feasible to use this full model, and we therefore split the individuals with in silico genotypes and serum B12 and folate measurements into smaller clusters. Here we chose to restrict the cluster size to at most 300 individuals.
The maximum likelihood estimates for the parameters $\alpha$, $\beta$ and $\sigma^2$ involve inverting the kinship matrix. If there are $n$ individuals in the cluster, then this inversion requires $O(n^3)$ calculations, but since these calculations only need to be performed once, the computational cost of doing a GWAS will only be $O(n^2)$ calculations: the cost of calculating the maximum likelihood estimates if the kinship matrix has already been inverted.
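The computational argument above can be sketched as follows: the covariance structure of a cluster is inverted once (cubic cost), and each SNV is then tested with a few matrix-vector products (quadratic cost). This is an illustrative generalized least squares sketch, not the authors' implementation; the function names and the toy 4-individual matrix are hypothetical. A scalar factor on the covariance matrix does not change the estimates of the intercept and the allelic effect, so it is omitted.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; O(n^3), done once per cluster."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gls_effect(phi_inv, y, g):
    """Estimate (alpha, beta) for mean alpha + beta*g given the inverted matrix; O(n^2)."""
    ones = [1.0] * len(y)
    w1 = matvec(phi_inv, ones)            # Phi^{-1} 1
    wg = matvec(phi_inv, g)               # Phi^{-1} g
    a11, a12, a22 = dot(ones, w1), dot(ones, wg), dot(g, wg)
    b1, b2 = dot(w1, y), dot(wg, y)       # uses symmetry of Phi^{-1}
    det = a11 * a22 - a12 * a12
    alpha = (a22 * b1 - a12 * b2) / det
    beta = (a11 * b2 - a12 * b1) / det
    return alpha, beta

# Toy cluster of four individuals; values are hypothetical
phi = [[0.5, 0.125, 0.0, 0.0],
       [0.125, 0.5, 0.0, 0.0],
       [0.0, 0.0, 0.5, 0.0625],
       [0.0, 0.0, 0.0625, 0.5]]
phi_inv = invert(phi)                     # reused for every SNV in the scan
g = [0.0, 1.0, 2.0, 1.0]                  # expected allele counts
y = [1.0 + 2.0 * gi for gi in g]          # noiseless trait with alpha=1, beta=2
print(gls_effect(phi_inv, y, g))
```

With clusters capped at 300 individuals, the one-off inversion stays small, and the per-SNV cost is dominated by the two matrix-vector products.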
Multivariate regression and conditional analyses
For the multivariate regression analysis we used only Icelandic individuals who had been genotyped on the Illumina chip-genotyping platform. The multivariate linear regression analysis was performed conditioning on a given marker by adjusting for the estimated allele count based on imputation of this marker. The GC correction factor was the same as that used for the unadjusted association analysis. A forward selection multiple logistic regression model was used to further define the extent of the genetic association. Briefly, all imputed SNVs located within an interval around the lead SNVs were tested for possible incorporation into a multiple regression model. In a stepwise fashion, a SNV was added to the model if it had the smallest P-value among all SNVs not yet included in the model and if it had a P<5×10⁻⁸. In the last step none of the SNVs remained significant at this threshold.
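The stepwise procedure described above can be sketched as a simple loop. The regression itself is abstracted behind a hypothetical callback, `conditional_pvalue(snv, selected)`, which is assumed to return the P-value of a candidate SNV after adjusting for the SNVs already in the model; the SNV names and P-values below are invented for illustration.

```python
def forward_selection(candidates, conditional_pvalue, threshold=5e-8):
    """Add the most significant remaining SNV until none passes the threshold."""
    selected = []
    remaining = list(candidates)
    while remaining:
        pvals = {snv: conditional_pvalue(snv, selected) for snv in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= threshold:
            break                      # last step: nothing remains significant
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy stand-in for the conditional regression (hypothetical values):
# snvB is a proxy of snvA, so its signal vanishes once snvA is in the model.
marginal = {"snvA": 1e-30, "snvB": 1e-12, "snvC": 1e-9, "snvD": 0.01}
proxy_of = {"snvB": "snvA"}

def toy_pvalue(snv, selected):
    return 0.5 if proxy_of.get(snv) in selected else marginal[snv]

print(forward_selection(marginal, toy_pvalue))
```

In this toy locus the loop keeps snvA and snvC as independent signals and drops snvB, which is explained by snvA.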
Association analyses of serum B12 and folate in Danish samples
Association analysis of each SNV in the Danish data was performed using linear regression assuming an additive model. Principal component analysis was performed using the covariance matrix, and the first principal component and sex were included in the model as covariates. All quantitative traits were quantile normalized to a normal distribution prior to analysis. Association analyses were done using PLINK software (version 1.07, http://pngu.mgh.harvard.edu/purcell/plink/). All P-values were corrected by GC. Inflation factors (λ) were at acceptable levels: for B12, 1.027 (Inter99) and 1.014 (Health2006); for folate, 1.024 (Inter99) and 1.010 (Health2006).
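Two steps in this paragraph lend themselves to a short illustration: quantile normalization of a trait and genomic control (GC) correction of P-values. The sketch below uses a rank-based inverse normal transform with a 0.5 rank offset and computes λ as the median observed 1-df chi-square statistic divided by its expected median (about 0.455). The offset and the function names are assumptions; the text does not specify these details.

```python
from statistics import NormalDist, median

def quantile_normalize(values):
    """Rank-based inverse normal transform; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1            # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]

def genomic_control(pvalues):
    """Return (lambda, GC-corrected two-sided P-values)."""
    nd = NormalDist()
    # Two-sided P -> 1-df chi-square; inv_cdf(p / 2) is stable for small P
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    lam = median(chi2) / 0.4549364            # median of chi-square with 1 df
    corrected = [2 * nd.cdf(-(c / lam) ** 0.5) for c in chi2]
    return lam, corrected

# Hypothetical trait values and P-values
print(quantile_normalize([3.0, 1.0, 2.0]))
print(genomic_control([0.9, 0.5, 0.2, 0.04, 1e-6]))
```

This sketch does not floor λ at 1; analyses that avoid inflating test statistics when λ<1 would add that step.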
Meta-analyses
For all SNVs with data from more than one study sample (Icelandic, Inter99 and/or Health2006), we combined the summary association data in a fixed-effects meta-analysis using the METAL software (http://www.sph.umich.edu/csg/abecasis/Metal/) [50]. An overall z-statistic relative to each reference allele was estimated based on P-value and direction of effect, adjusted for the number of individuals in each sample.
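The z-statistic combination described above corresponds to a sample-size-weighted scheme, which can be sketched as follows. Each study contributes a signed z-score derived from its two-sided P-value and direction of effect, weighted by the square root of its sample size. This is an illustration of the general technique rather than METAL's code, and the P-values and sample sizes in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def meta_z(studies):
    """Combine (p_value, direction, n) tuples; direction is +1 or -1
    relative to the reference allele. Returns (z_meta, p_meta)."""
    nd = NormalDist()
    weighted_sum = 0.0
    weight_sq_sum = 0.0
    for p_value, direction, n in studies:
        z = -nd.inv_cdf(p_value / 2) * direction   # signed z from two-sided P
        w = sqrt(n)                                # weight by sqrt(sample size)
        weighted_sum += w * z
        weight_sq_sum += w * w
    z_meta = weighted_sum / sqrt(weight_sq_sum)
    p_meta = 2 * nd.cdf(-abs(z_meta))
    return z_meta, p_meta

# Hypothetical per-sample results for one SNV (P, direction, N)
studies = [
    (1e-6, +1, 30000),   # e.g. Icelandic sample
    (0.01, +1, 5000),    # e.g. Inter99
    (0.20, -1, 3000),    # e.g. Health2006
]
print(meta_z(studies))
```

Because only P-values, directions and sample sizes enter the calculation, this scheme tolerates traits measured on different scales across samples, at the cost of not producing a pooled effect estimate.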
Supporting Information
References
1. Scott JM (1999) Folate and vitamin B12. Proc Nutr Soc 58: 441–448.
2. Markle HV, Greenway DC (1996) Cobalamin. Crit Rev Clin Lab Sci 33: 247–356.
3. Clarke R, Daly L, Robinson K, Naughten E, Cahalane S, et al. (1991) Hyperhomocysteinemia: An Independent Risk Factor for Vascular Disease. N Engl J Med 324: 1149–1155.
4. Stampfer MJ, Malinow MR, Willett WC, Newcomer LM, Upson B, et al. (1992) A prospective study of plasma homocyst(e)ine and risk of myocardial infarction in US physicians. JAMA 268: 877–881.
5. Bird CL, Swendseid ME, Witte JS, Shikany JM, Hunt IF, et al. (1995) Red cell and plasma folate, folate consumption, and the risk of colorectal adenomatous polyps. Cancer Epidemiol Biomarkers Prev 4: 709–714.
6. Giovannucci E, Stampfer MJ, Colditz GA, Rimm EB, Trichopoulos D, et al. (1993) Folate, Methionine, and Alcohol Intake and Risk of Colorectal Adenoma. J Natl Cancer Inst 85: 875–883.
7. Clarke R (1998) Folate, vitamin B12, and serum total homocysteine levels in confirmed Alzheimer disease. Arch Neurol 55: 1449–1455.
8. Nilsson SE, Read S, Berg S, Johansson B (2009) Heritabilities for fifteen routine biochemical values: findings in 215 Swedish twin pairs 82 years of age or older. Scand J Clin Lab Invest 69: 562–569.
9. Hazra A, Kraft P, Selhub J, Giovannucci EL, Thomas G, et al. (2008) Common variants of FUT2 are associated with plasma vitamin B12 levels. Nat Genet 40: 1160–1162.
10. Hazra A, Kraft P, Lazarus R, Chen C, Chanock SJ, et al. (2009) Genome-wide significant predictors of metabolites in the one-carbon metabolism pathway. Hum Mol Genet 18: 4677–4687.
11. Tanaka T, Scheet P, Giusti B, Bandinelli S, Piras MG, et al. (2009) Genome-wide association study of vitamin B6, vitamin B12, folate, and homocysteine blood concentrations. Am J Hum Genet 84: 477–482.
12. Lin X, Lu D, Gao Y, Tao S, Yang X, et al. (2012) Genome-wide association study identifies novel loci associated with serum level of vitamin B12 in Chinese men. Hum Mol Genet 21: 2610–2617.
13. Hustad S, Midttun O, Schneede J, Vollset SE, Grotmol T, et al. (2007) The methylenetetrahydrofolate reductase 677C→T polymorphism as a modulator of a B vitamin network with major effects on homocysteine metabolism. Am J Hum Genet 80: 846–855.
14. Thuesen BH, Husemoen LL, Ovesen L, Jorgensen T, Fenger M, et al. (2010) Lifestyle and genetic determinants of folate and vitamin B12 levels in a general adult population. Br J Nutr 103: 1195–1204.
15. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, et al. (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 106: 9362–9367.
16. International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861.
17. 1000 Genomes Project Consortium, Durbin RM, Abecasis GR, Altshuler DL, Auton A, et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073.
18. Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, et al. (2012) Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337: 64–69.
19. Nelson MR, Wegmann D, Ehm MG, Kessner D, St Jean P, et al. (2012) An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337: 100–104.
20. Kong A, Masson G, Frigge ML, Gylfason A, Zusmanovich P, et al. (2008) Detection of sharing by descent, long-range phasing and haplotype imputation. Nat Genet 40: 1068–1075.
21. Kong A, Steinthorsdottir V, Masson G, Thorleifsson G, Sulem P, et al. (2009) Parental origin of sequence variants associated with complex diseases. Nature 462: 868–874.
22. Holm H, Gudbjartsson DF, Sulem P, Masson G, Helgadottir HT, et al. (2011) A rare variant in MYH6 is associated with high risk of sick sinus syndrome. Nat Genet 43: 316–320.
23. Sulem P, Gudbjartsson DF, Walters GB, Helgadottir HT, Helgason A, et al. (2011) Identification of low-frequency variants associated with gout and serum uric acid levels. Nat Genet 43: 1127–1130.
24. Gudmundsson J, Sulem P, Gudbjartsson DF, Masson G, Agnarsson BA, et al. (2012) A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nat Genet 44: 1326–1329.
25. Rafnar T, Gudbjartsson DF, Sulem P, Jonasdottir A, Sigurdsson A, et al. (2011) Mutations in BRIP1 confer high risk of ovarian cancer. Nat Genet 43: 1104–1107.
26. Jonsson T, Atwal JK, Steinberg S, Snaedal J, Jonsson PV, et al. (2012) A mutation in APP protects against Alzheimer's disease and age-related cognitive decline. Nature 488: 96–99.
27. Li Y, Vinckenbosch N, Tian G, Huerta-Sanchez E, Jiang T, et al. (2010) Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat Genet 42: 969–972.
28. Albrechtsen A, Grarup N, Li Y, Sparso T, Tian G, et al. (2013) Exome sequencing-driven discovery of coding polymorphisms associated with common metabolic phenotypes. Diabetologia 56: 298–310.
29. Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, et al. (2008) Genetics of gene expression and its effect on disease. Nature 452: 423–428.
30. Balasubramanian S, Habegger L, Frankish A, MacArthur DG, Harte R, et al. (2011) Gene inactivation and its implications for annotation in the era of personal genomics. Genes Dev 25: 1–10.
31. Pare G, Chasman DI, Parker AN, Zee RR, Malarstig A, et al. (2009) Novel associations of CPS1, MUT, NOX4, and DPEP1 with plasma homocysteine in a healthy population: a genome-wide evaluation of 13 974 participants in the Women's Genome Health Study. Circ Cardiovasc Genet 2: 142–150.
32. Lange LA, Croteau-Chonka DC, Marvelle AF, Qin L, Gaulton KJ, et al. (2010) Genome-wide association study of homocysteine levels in Filipinos provides evidence for CPS1 in women and a stronger MTHFR effect in young adults. Hum Mol Genet 19: 2050–2058.
33. Chambers JC, Zhang W, Sehmi J, Li X, Wass MN, et al. (2011) Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma. Nat Genet 43: 1131–1138.
34. Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466: 707–713.
35. Franke A, McGovern DP, Barrett JC, Wang K, Radford-Smith GL, et al. (2010) Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nat Genet 42: 1118–1125.
36. McGovern DP, Jones MR, Taylor KD, Marciante K, Yan X, et al. (2010) Fucosyltransferase 2 (FUT2) non-secretor status is associated with Crohn's disease. Hum Mol Genet 19: 3468–3476.
37. Ellinghaus D, Ellinghaus E, Nair RP, Stuart PE, Esko T, et al. (2012) Combined Analysis of Genome-wide Association Studies for Crohn Disease and Psoriasis Identifies Seven Shared Susceptibility Loci. Am J Hum Genet 90: 636–647.
38. Ikram MK, Sim X, Jensen RA, Cotch MF, Hewitt AW, et al. (2010) Four novel Loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo. PLoS Genet 6: e1001184.
39. Smyth DJ, Cooper JD, Howson JMM, Clarke P, Downes K, et al. (2011) FUT2 Nonsecretor Status Links Type 1 Diabetes Susceptibility and Resistance to Infection. Diabetes 60: 3081–3084.
40. Carlsson B, Kindberg E, Buesa J, Rydell GE, Lidón MF, et al. (2009) The G428A Nonsense Mutation in FUT2 Provides Strong but Not Absolute Protection against Symptomatic GII.4 Norovirus Infection. PLoS ONE 4: e5593.
41. Hegnhoj J, Hansen CP, Rannem T, Sobirk H, Andersen LB, et al. (1990) Pancreatic function in Crohn's disease. Gut 31: 1076–1079.
42. Warsi AA, Davies B, Morris-Stiff G, Hullin D, Lewis MH (2004) Abdominal Aortic Aneurysm and its Correlation to Plasma Homocysteine, and Vitamins. Eur J Vasc Endovasc Surg 27: 75–79.
43. Coelho D, Kim JC, Miousse IR, Fung S, du Moulin M, et al. (2012) Mutations in ABCD4 cause a new inborn error of vitamin B12 metabolism. Nat Genet 44: 1152–1155.
44. Korotkova N, Lidstrom ME (2004) MeaB is a component of the methylmalonyl-CoA mutase complex required for protection of the enzyme from inactivation. J Biol Chem 279: 13652–13658.
45. Jørgensen T, Borch-Johnsen K, Thomsen TF, Ibsen H, Glumer C, et al. (2003) A randomized non-pharmacological intervention study for prevention of ischaemic heart disease: Baseline results Inter99 (1). Eur J Cardiovasc Prev Rehab 10: 377–386.
46. Boesgaard TW, Grarup N, Jørgensen T, Borch-Johnsen K, Meta-Analysis of Glucose and Insulin-Related Trait Consortium (MAGIC), et al. (2010) Variants at DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B loci are associated with reduced glucose-stimulated beta cell function in middle-aged Danish people. Diabetologia 53: 1647–1655.
47. Glümer C, Jørgensen T, Borch-Johnsen K (2003) Prevalences of diabetes and impaired glucose regulation in a Danish population: the Inter99 study. Diabetes Care 26: 2335–2340.
48. Thyssen JP, Linneberg A, Menne T, Nielsen NH, Johansen JD (2009) The prevalence and morbidity of sensitization to fragrance mix I in the general population. Br J Dermatol 161: 95–101.
49. Kim SY, Lohmueller KE, Albrechtsen A, Li Y, Korneliussen T, et al. (2011) Estimation of allele frequency and association mapping using next-generation sequencing data. BMC Bioinformatics 12: 231.
50. Willer CJ, Li Y, Abecasis GR (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26: 2190–2191.