Abundant Quantitative Trait Loci Exist for DNA Methylation and Gene Expression in Human Brain
A fundamental challenge in the post-genome era is to understand and annotate the consequences of genetic variation, particularly within the context of human tissues. We present a set of integrated experiments that investigate the effects of common genetic variability on DNA methylation and mRNA expression in four human brain regions each from 150 individuals (600 samples total). We find an abundance of genetic cis regulation of mRNA expression and show for the first time abundant quantitative trait loci for DNA CpG methylation across the genome. We show peak enrichment for cis expression QTLs to be approximately 68,000 bp away from individual transcription start sites; however, the peak enrichment for cis CpG methylation QTLs is located much closer, only 45 bp from the CpG site in question. We observe that the largest magnitude quantitative trait loci occur across distinct brain tissues. Our analyses reveal that CpG methylation quantitative trait loci are more likely to occur for CpG sites outside of islands. Lastly, we show that while we can observe individual QTLs that appear to affect both the level of a transcript and a physically close CpG methylation site, these are quite rare. We believe these data, which we have made publicly available, will provide a critical step toward understanding the biological effects of genetic variation.
Published in the journal:
. PLoS Genet 6(5): e32767. doi:10.1371/journal.pgen.1000952
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pgen.1000952
Summary
A fundamental challenge in the post-genome era is to understand and annotate the consequences of genetic variation, particularly within the context of human tissues. We present a set of integrated experiments that investigate the effects of common genetic variability on DNA methylation and mRNA expression in four human brain regions each from 150 individuals (600 samples total). We find an abundance of genetic cis regulation of mRNA expression and show for the first time abundant quantitative trait loci for DNA CpG methylation across the genome. We show peak enrichment for cis expression QTLs to be approximately 68,000 bp away from individual transcription start sites; however, the peak enrichment for cis CpG methylation QTLs is located much closer, only 45 bp from the CpG site in question. We observe that the largest magnitude quantitative trait loci occur across distinct brain tissues. Our analyses reveal that CpG methylation quantitative trait loci are more likely to occur for CpG sites outside of islands. Lastly, we show that while we can observe individual QTLs that appear to affect both the level of a transcript and a physically close CpG methylation site, these are quite rare. We believe these data, which we have made publicly available, will provide a critical step toward understanding the biological effects of genetic variation.
Introduction
With the widespread application of highly parallel SNP genotyping arrays much of the recent effort in human genetics has focused on defining the role of genetic variation in disease and physical traits. A small subset of this work, however, has attempted to examine the more proximal effects of genetic variation on mRNA and protein levels [1]–[5]. This has the potential to inform on several levels; first, it is a critical step toward understanding the pathobiological consequences of genetic variants linked to disease; second, it affords the opportunity to form inferences regarding relationships between genes based on patterns of co-regulation; and third, it provides a more complete view of multiple levels of regulation of gene expression than that provided by the traditional reductionist method [6], [7].
Epigenetic alterations, including DNA methylation, histone modification and RNA mediated gene silencing, are defined as heritable changes in gene function that occur without an alteration of the underlying DNA sequence and which afford a level of transcriptional regulation above and beyond DNA sequence [8]. DNA methylation, which occurs at discrete CpG dinucleotide motifs, is believed to be an important mediator of gene expression; this observation has been most frequently linked to DNA methylation at CpG islands, regions of the genome that contain a high density of CpG sites, often proximal to gene promoter regions. A classical inverse relationship between the extent of DNA methylation at CpG islands and expression levels of the proximal gene product has been most often described [8]. To date the relationship between genetics, DNA methylation and gene expression is one that has been largely and necessarily confined to observations at single loci and transcripts in individual cell systems or tissues. The recent development of genome-scale technologies provides unprecedented opportunities to expand upon these experiments. The integration of genetic, epigenetic and expression data promises to provide general observations regarding the relationship between genetic variation and expression. Beyond these observations these data can be readily mined to unravel the network of effects associated with genomic variants. This may reveal some of the rather cryptic intermediate events that occur between DNA variant and phenotype.
Because of our interest in genomic regulation of expression and neurological disorders we embarked upon a series of experiments to provide a brain region-specific contextual framework for genetic and epigenetic regulation of gene expression. We were particularly interested in mapping the effects of common genetic variation on gene expression and DNA methylation; the widespread adoption of genome wide association studies for disease and traits has generated a large number of associated loci, and such a map would allow these loci to be associated with a biological consequence. We obtained frozen brain tissue from the cerebellum (CRBLM), frontal cortex (FCTX), pons (PONS) and temporal cortex (TCTX) from 150 subjects (total 600 tissue samples). We undertook three separate assays across this series; first, genome-wide SNP genotyping; second, assay of 27,578 CpG methylation sites in each of the four brain regions; third, mRNA expression profiling of 22,184 transcripts in all four brain regions. Here we discuss the results of these experiments, particularly in the context of integrated datasets to define expression quantitative trait loci (eQTL) and CpG methylation quantitative trait loci (methQTL), where quantitative trait loci (QTL) are correlations between genetic variation within genomic region(s) (in this case single nucleotide polymorphisms) and a quantitative trait (DNA methylation or expression), and detailing differences and similarities across brain regions.
Results
CpG methylation and mRNA levels differ between brain regions
To assess whether differences in CpG methylation and expression were consistently different between brain regions, our initial analyses focused on global comparison of these measures across tissues. Performing a hierarchical cluster analysis [9] of these data demonstrated that the four brain regions have different epigenetic and expression profiles (Figure 1A and 1B). Expression pattern differences were distinct between cerebellum, pons and cortical tissue, with frontal and temporal cortices clearly separating within the mRNA dataset, but overlapping within the CpG methylation data. These data show that CpG methylation and mRNA levels vary measurably between brain regions. These data are in agreement with previous work that demonstrated distinct patterns of gene expression and DNA methylation in human cerebellum compared to cortical tissues [10], [11].
The next analysis was limited to the CpG methylation and mRNA datasets of those probes where an Illumina detection p-value of less than or equal to 0.01 was observed in 95% of samples analyzed in each of the four tissue region. This provided data on a total of 27,277 unique CpG methylation sites and 8,076 probes against individual mRNA transcripts that were present in all four brain regions (Figure S2). The distribution of observed CpG methylation levels and transcript abundance was plotted as a histogram for each tissue (Figure 1C and 1D). A high proportion of CpG sites interrogated within this assay were infrequently methylated, whereas a smaller proportion were highly methylated, a trend that was apparent across tissues. This distribution of DNA methylation (Figure 1C) closely matches distributions of DNA methylation reported previously by sequencing in multiple human cell types [12], [13]. CpG sites that are within CpG Islands [14], [15] were predominately unmethylated. Grouping of mRNAs into high through low expression groups revealed abundance profiles similar to those previously reported (Figure 1D). We next compared CpG methylation and mRNA expression levels at individual loci directly between each possible pair of tissue regions (Figure 1C and 1D, scatterplots). In general levels of DNA methylation and expression were quite similar between tissues. Measures within frontal and temporal cortices were consistently the most alike and cerebellar tissue provided the most distinct profile of the four regions.
Genotype influences CpG methylation and mRNA expression
A primary aim underlying these experiments was to examine the extent of genetic control of DNA methylation and expression within brain tissues. To investigate this process, we undertook a series of QTL analyses. From the 537,411 genotyped SNPs that passed quality control filtering we then imputed genotypes for 2,545,178 SNPs using MACH [16] and HapMap CEU phase data. After additional quality and analysis specifications filtering of the imputed genotypes 1,629,853 SNPs (average) were used for analysis. The QTL analysis of each tissue region was performed separately so we expanded our trait selection from those that were detected in 95% of samples across all four brain regions to those that were detected in 95% of samples within a specific brain region. Changing this trait selection threshold based on what is present within a specific brain region allowed us to analyze additional mRNA transcripts that are well detected within one or more brain regions but not all four tissue regions. The number of SNPs, CpG sites and mRNA transcripts tested for each brain tissue can be found in Table 1. With these data for each tissue we then performed linear regression of allele dosage against each measure using CpG methylation or expression of mRNA as the dependent variable and genotype as the independent variable. This regression analysis was performed with Plink [17] to correlate allele dosage with the quantitative trait. We corrected for number of tests per trait by permutation, computing a genome-wide empirical p-value for each of the ∼1.6 million SNPs tested against each individual trait. To correct for the number of traits tested within each brain region by assay type an FDR threshold for significance was determined based on the empirical p-values from the proceeding step. This yielded a necessarily conservative threshold for significance [18]. Prior to analysis each trait was adjusted using available biological and methodological covariates in an attempt to reduce the influence of systematic confounding effects (Text S1). Post hoc we annotated significant QTLs as cis if the SNP lay within 1MB of either the CpG methylation site in question or the transcript being tested; all other SNP-dependent variable tests were designated as trans. Notably, because designation of cis and trans QTL tests was performed post hoc, there was no distinction in terms of level of statistical correction between these groups.
Our previous data from the human cerebral cortex [2], as well as data from HapMap lymphoblastoid cell lines (LCL) [5] have suggested that SNPs proximal to genes including SNPs upstream of the transcriptional start site (TSS) within the gene and downstream of the transcript end site (TES) have a greater influence on gene expression than those further away. This is presumably because genetic variation around promoter elements, splice sites and 3′ UTR affects transcription, splicing and mRNA stability [19] that results in a relative enrichment of cis over trans eQTLs. To see if this observation generalized to multiple brain regions and also to QTLs linked to CpG methylation, we plotted all significant QTLs by genomic position of the SNP and transcript or SNP and CpG site (Figure 2).
Previous work has shown that SNPs are weakly correlated with the methylation status of CpG islands as compared to other DNA characteristics such as sequence and structure [15]; however we detected a large number of significant correlations detected between genetic variation and the methylation status of CpG sites (Table 2). Genome-wide visualization of detected methQTLs illustrated a strong positional effect of this relationship with an excess of the number and magnitude of cis associations (Figure 2A). The detected methQTLs accounted for between 18% and 88% of corrected methylation levels at individual loci. The number of CpG sites with methQTLs that were significant after correction for genome-wide multiple testing ranged from 1085 (4%) in the cerebellum to 1417 (5.1%) in the temporal cortex (Table 2). All significant methQTL results can be found in Table S3. Detection of cis QTL for DNA methylation was more likely when the CpG site was outside of an island (as defined in [14]); while 42% of sites on the array are within a CpG island, only 18% of the CpG sites with a significant QTL were in islands.
A similar number of mRNA transcripts with significant eQTLs were detected in each of the four brain regions, ranging from 280 (3.2%) in the pons to 391 (4.2%) in the temporal cortex (Table 3). These eQTLs accounted for between 18% and 77% of corrected expression levels of associated transcripts between individuals. All significant eQTL results can be found in Table S4.
Proportionally a similar number of QTLs were observed for CpG methylation and mRNA expression. In order to determine whether SNP variability was more influential in CpG methylation or expression, we performed a random sampling, with 10,000 iterations, of QTL calculations for each independent measure in a core set of 100 individuals (Text S1). The results of this analysis show on average that 2.34% of CpG sites and 1.99% of mRNAs significantly correlate with QTLs. Each mRNA correlated with twice as many SNPs, on average, than CpG sites did. However the average R2 of the SNPs correlated with mRNAs was equal to CpGs (Figure S6).
The abundance of cis QTL for CpG methylation and mRNA expression prompted us to examine the distribution of cis methQTLs and eQTLs (Figure 2C–2L). This revealed that both the number of significant methQTLs and the strength of association between SNP and DNA methylation level were inversely correlated with physical distance between the genetic and epigenetic variants in question (Figure 2C–2F). Furthermore, this relationship was also evident for cis eQTLs (Figure 2H–2K). The average distance between correlated cis SNP and trait was 81Kb for CpG sites and 121Kb for mRNA transcripts. The largest effect QTLs for both cis methQTLs and eQTLs tended to be present in all four tissues tested (Figure 2G and 2L). Of the mRNA transcripts where a cis eQTL was significantly detected in at least one brain region 53% have been previously reported. This number increased to 70% when analysis is limited to those transcripts with a cis eQTL consistently detected in all four tissues [2]–[4] (Table S2).
To assess the enrichment of detected cis QTLs relative to those in trans we calculated the number of observed and possible cis and trans QTLs for CpG methylation and mRNA expression levels. The calculation of possible cis QTLs was performed by counting the number of all testable, genotyped SNPs within 1Mb of the CpG probe or transcript in question, regardless of significant association; calculation of possible trans QTLs was performed by counting the number of all testable, genotyped SNPs throughout the genome, excluding those within 1Mb of the CpG probe or transcript in question, again, regardless of significant association. This analysis was repeated for every tested CpG probe and transcript. These data showed an extreme enrichment of cis methylation and expression QTLs relative to trans (∼4400-fold and ∼7300-fold respectively). Although it should be noted that this calculation of enrichment does not take into account the differences in power to detect cis versus trans affects, thus the extreme nature of these fold changes may be inflated. While the average distance between correlated cis SNP and trait was 81Kb for CpG sites and 121Kb for mRNA transcripts when cis is defined as 1Mb, the peak enrichment of the number of significant cis QTLs was observed when the threshold distance for what is considered cis was set at ∼45bp for CpG sites and ∼68kb for mRNA transcripts (Figure 3).
Large effect QTLs are consistent across brain regions
Based on our results that reached statistical significance many QTLs appear to be tissue specific, where 49% of CpG sites and 54% of mRNA transcripts with a significant cis QTL were only detected within one tissue. Table 4 shows summary counts of CpG sites and mRNA transcripts with a QTL found in all four tissues. Because this analysis relies on a threshold for significance, it has the potential to be misleading, discounting QTLs that do not quite reach the threshold for significance. Thus, in order to compare detected methQTLs and eQTLs between tissues, we selected every SNP-CpG methylation pair and SNP-transcript pair that passed the defined threshold for significance in at least one tissue. We then compared R2 values for each of these SNP-CpG pairs or SNP-transcript pairs in all four tissues, including results from tissues where the SNP-CpG pair or SNP-transcript pair was non-significant, using ternary plots (Figure 4). The majority of large effect and many moderate effect QTLs were shared across the four brain regions (Figure 4) when significant effects from a tissue are compared with corresponding (possible non-significant) effects from another tissue. Of interest, a subset of trans methQTLs (defined here as any SNP CpG pair not within 1MB of each other) was also shared between regions and had high R2 values. This suggests that, while methQTLs show cis enrichment, there are also a number of robust distal effects where SNP and CpG methylation show significant association.
These plots illustrate that the majority of eQTLs or methQTLs with strong effect sizes were consistent across tissues. For example, a large effect eQTL was found for CHURC1, which encodes a protein proposed to be involved in transcriptional regulation, in all tissues (Figure S7 and as reported previously [3]). However, there were also rare, but observable, events where a large effect QTL was detected within a single tissue and was completely absent in the other three tissues. For example the cis eQTL for PPAPDC1A, encoding a phosphatidic acid phosphatase that displays hydrolase and phosphotase activity at the membrane, has a large effect that appears to be restricted to the cerebellum, despite reliable detection of the transcript in all four brain regions (Figure 4Q and 4R–4T and Figure S8).
methQTL and eQTL capture independent traits
We next sought to investigate whether the observed methQTLs and eQTLs represent individual loci where the underlying variant influences both CpG methylation and gene expression. These data revealed that in general, there was little co-association between methQTLs and eQTLs. Of the SNP, CpG and mRNA combinations considered (see Materials and Methods), 4.8% (average across tissues) were significant as both a methQTL and eQTL; this includes 2.6% (average of 13 sites per tissue) of CpG methylation sites where a significant methQTL is present and 8.2% (average of 11 transcripts per tissue) of mRNA transcripts that have a significant eQTL.
For this 4.8% with a shared QTL both the mean distance between the CpG site and mRNA TSS and the strength of the correlation differed from that of the considered combinations. For the shared QTLs the distance between CpG site and mRNA TSS is 27.5Kb whereas in all considered the mean distance was 394.4Kb. The mean R2 between CpG methylation and mRNA expression within the shared QTL set was 0.255 as compared to 0.033 in the considered set. Within these possibly shared QTLs, almost half of the significant methQTLs and eQTLs were correlated in the same direction, i.e. the CpG methylation level was positively correlated with the mRNA expression level (Figure S9). However; when considering only those shared QTLs where the CpG site was within a CpG island (as defined in [14]), 91% of these CpG methylation sites and mRNA expression pairs are inversely correlated, in line with the traditional view that increased CpG island methylation is associated with decreased expression at geographically local transcripts (Figure 5).
Discussion
Elucidating the genetic control of biological processes is a critical issue in the post genome era. This will inform on the basic biological level and lead to clearer understanding of the pathophysiology underlying human disease. The work we have described here and the public release of the data resulting from this effort, aims to facilitate an understanding of the initial consequences of common genetic variation on DNA methylation and gene expression in brain (GEO Accession Number: GSE15745; dbGAP Study Accession: phs000249.v1.p1) [20]–[22].
Our data show clearly that patterns of expression are measurably different across brain tissues. QTL analyses reveal an abundance of eQTLs and, as previously reported, significant eQTLs are predominantly cis in nature [2], [5]. It is notable, particularly given the systematic differences in expression patterns among the four tissues, that the majority of large effect eQTLs detected were consistent across brain tissues. Large effect tissue-specific QTLs are also observable in the current data set, suggesting that there are some genetic effects on expression that are dependent on the tissue type used irrespective of expression levels of the mRNA. Our results that reached statistical significance are consistent with recent work showing that many eQTLs are likely tissue specific, but they typically have a smaller effect and are more distant [23]–[25]. However within our data when we observe a QTL that is only significant in a single tissue, examination of the correlation between the same SNP and trait in other tissues generally reveals a suggestive, although non-significant, correlation in the other tissues; thus both large and many moderate size effects appear to be shared. While this aspect of our data may seem at odds with the previous studies which in part used purified cells or cell lines, the current samples within our study were exclusively from heterogeneous tissues. Thus one might expect to see quantitative trait loci that are generalizeable across tissues because in essence each tissue is a heterogeneous mix of cells and we are more likely to detect QTLs that are in a majority of cell types. The detection of cell type specific QTLs is apparently still feasible, however, such correlations will have to be large enough to detect within the background of a heterogeneous tissue.
We have shown for the first time that there are large numbers of methQTLs in human tissue. It is worth noting, however, that the CpG methylation sites assayed here have not been defined through experimental work; as such a proportion of these may be consistently non-methylated, and thus our estimate of the proportion of CpG sites with an associated methQTL may be an underestimate. As for mRNA expression, DNA methylation patterns are sufficiently different to predict the originating tissue; however, we also see that SNP variation strongly influences the level of DNA methylation at sites that show intermediate levels of methylation. The strength of effect for methQTLs is similar to those for eQTLs and as discussed above eQTLs and methQTLs tend to be found consistently across tissues, supporting the robustness of these loci. Additionally methQTLs also show cis-enrichment, suggesting that local genomic variants influence the propensity of a particular genomic site to be methylated.
Combined analysis of methQTLs and eQTLs did not reveal a simple relationship between genetically influenced DNA methylation and expression. QTLs that influenced both DNA methylation and expression levels of a physically close transcript were identified, these were in the minority; for the most part strong methQTLs were not also QTLs for expression. It is plausible that this observation reflects a genuine biological disconnect between CpG methylation and gene expression. Alternatively, this may be a result of poor power within the series to detect such effects or, as discussed below, a consequence of differences in cellularity confounding our results. Notably, when analyzing joint methQTL-eQTLs where the methylation site in question resided in a CpG island, the classical inverse relationship between CpG methylation and expression was clearly observed.
Increasing the sample size of the current study will likely yield a larger number of QTLs, particularly those with small to moderate effect size. It would also be of interest to extend this initial work to other tissues, which will not only be important in defining tissue specific QTLs, but also will afford us the opportunity to examine QTLs at transcripts not typically detected in brain. Additionally, we have limited our sampling to a population that represents only a small component of worldwide human genetic diversity [26]. Adding tissues from other populations would provide an understanding of genetic control of expression in different groups and a framework for interpretation of GWAS results in these groups. Furthermore, inter-population comparison would help refine the basis by which differences in genetic architecture affect expression. We also note that our coverage of DNA modifications is currently incomplete as there are other epigenetic modifications, such as acetylation, that are not currently accessible to high throughput techniques, and many more methylation sites exist than assayed here. Whether genotypic differences influence other components of the epigenome is unknown. Finally, in a complex organ such as brain cellularity will confound analyses correlating measures that vary between cell types. Thus, CpG methylation to mRNA correlations pose a considerable challenge in heterogeneous tissues. In the context of human tissues this problem may be solved by selection of cell type by laser dissection or by culture and terminal differentiation of cohorts of pluripotent stem cells. Both solutions have considerable technical limitations at this time.
In summary, we show data that we argue convincingly demonstrates QTLs for DNA methylation and expression exist across human tissues. The data presented here provides an initial basis for understanding how genetic variance in humans influences epigenetic marks and expression; we argue that these observations may be useful in the understanding of gene contributions to human phenotypes. As well as providing a more complete model of the complex control of gene expression in the brain, our data may be useful in moving rapidly from locus to mechanism of disease.
Materials and Methods
Samples and assays
Frozen tissue samples of the cerebellum (CRBLM), frontal cortex (FCTX), caudal pons (PONS) and temporal cortex (TCTX) were obtained from 150 neurologically normal Caucasian subjects (Text S1). 100–200mg aliquots of frozen tissue were sub-dissected from each tissue from all 150 subjects resulting in 600 tissue samples and used for methylation assay and expression assays. Genotyping was performed using Infinium HumanHap550 beadchips (Illumina) to assay genotypes for 561466 SNPs, from the cerebellum tissue samples. CpG methylation status was determined using HumanMethylation27 BeadChips (Illumina), which measure methylation at 27,578 CpG dinucleotides at 14,495 genes. Profiling of 22,184 mRNA transcripts was performed using HumanRef-8 Expression BeadChips (Illumina) as previously described [27].
Genotype data
The threshold call rate for inclusion of the subject in analysis was 95%. All 150 subjects had a call rate greater than 95%, and were included in the subsequent analyses (average call rate = 99.86%; range 97.72%–99.95%, based on the missing procedure within the PLINK v1.04 software toolset [17].
The gender of the subjects reported by the brain banks was compared against their genotypic gender using PLINK's check-sex algorithm, which determines a sample's genotypic gender based on heterozygosity across the X chromosome. A gender discrepancy was detected for one subject and excluded from further analysis.
To confirm the ethnicity of the samples, Identity-By-State (IBS) clustering and multidimensional scaling analyses were performed within PLINK using the genotypes from the brain samples that had been merged with data from the four HapMap [28] populations (n = 32 Caucasian (CEU), 12 Han Chinese, 16 Japanese and 24 Yoruban non-trio samples previously genotyped by Illumina and assayed on the Infinium HumanHap500 version genotyping chips). Two samples were outliers based on population and excluded from further analysis (Figure S1).
Genotype data of the samples were compared for cryptic relatedness using the Identity-By-Descent (IBD) procedure within PLINK. No samples were found to be from related individuals.
Mach software version 1.0.16 [16] and HapMap CEU phase data (release 22) were used to impute genotypes for ∼2.5 million SNPs. Imputed SNPs were excluded if the linkage disequilibrium r2 values between imputed and known genotypes was less than 0.3, and if their posterior probability averages were less than 0.8 for the most likely imputed genotype. For each of the four tissue regions, SNPs were also excluded if: (a) call rate was less than 95%, (b) Hardy-Weinberg equilibrium (HWE) p-value was less than 0.001, and (c) the SNP had less than 3 minor homozygotes present. Exact numbers of SNPs used per brain tissue and assay type are shown in Table 1.
mRNA expression data
Raw intensity values for each probe were transformed using the rank invariant normalization method [29]–[31] and then log2 transformed for mRNA analysis. Four samples were excluded from further analysis, as they were outliers based on overall probe detection rate and/or mean expression level; one CRBLM, one FCTX and two PONS.
CpG methylation data
The threshold call rate for inclusion of samples in the analysis was 95%. Based on this metric, 9 CRBLM, 2 FCTX, 6 PONS and 8 TCTX samples were excluded from further analysis. The remaining brain samples had an average detection rate of 99.84% (range 95.0% to 99.98%).
The gender of the samples reported by the brain banks were compared against their assayed gender based on values of methylation from CpG sites on the X chromosome. Four samples with gender discrepancies were detected and were removed from subsequent analysis; two CRBLM, one PONS and one TCTX. The resulting HCL sample tree based on all Chromosome X probes, after removal of these four individuals is shown in Figure S3.
Clustering of samples by brain region
Performing a Hierarchical Clustering (HCL) [9] of the sample profiles using the TM4 MeV version 4.1.01 tool [32], with Euclidian distances and ‘Average Linkage clustering’ resulted in the samples separating fairly well by brain tissue region. Where in mRNA separation of samples into 4 clusters matching brain tissue region was clear. The frontal and temporal cortices could not be separated in within CpG data. For clustering all detected data was used for mRNA; only detected data for autosomal probes was used in clustering of the CpG data, otherwise sub-clusters based on gender appeared. The HCL samples trees were saved a Newick tree files and plotted again using the HyperTree tool (http://hypertree.sourceforge.net/).
Selection of traits for analysis
Traits were excluded from analysis if they were detected in less than 95% of samples for each tissue region. For each tissue region and trait type (CpG and mRNA) the 95% threshold was determined using total number of analyzable samples for this pairing of region and trait. A probe is considered detected for a sample if the reported Illumina Detection p-value was less than or equal to 0.01. Numbers of analyzable samples per trait (or assay) type and region are available in Table 1 overlap of detected probes among brain tissues is shown as Venn diagram in Figure S2.
Correction for known biological and methodological covariates
Prior to quantitative trait loci analysis each trait was adjusted using the available biological and methodological covariates in an attempt to remove the influence of these potentially confounding affects. In R each trait was regressed using the following model:Where Y is the trait profile (log2 normalized mRNA expression intensities and raw values of CpG DNA methylation) and X1 … Xn represent the biological covariates Age and Gender and the methodological covariates post mortem interval (PMI), which Brain Bank the samples was from and which preparation/hybridization batch the sample was processed in. Within this model gender, tissue bank and batch where treated as categorical covariates. After fitting each trait to the model the residuals from the model are kept and represent the trait in following analyses. Thus variance attributable to gender, age, post-mortem interval, tissue source and hybridization batch are removed prior to QTL analysis. These covariate data are available in Table S1. Histograms showing the proportion of traits that are potentially impacted by these covariates are shown in Figure S4 (CpG) and Figure S5 (mRNA).
Quantitative trait loci analysis
For each of the four brain regions, a regression analysis was performed on the residuals described in the preceding section for (a) mRNA expression and (b) the methylation values generated for CpG sites. The trait residuals were then used as the quantitative phenotype for that probe in genome-wide association analysis looking for quantitative trait loci. These analyses were performed using the assoc function within Plink, which correlates allele dosage with change in the trait. Each of the four tissue regions was analyzed separately, and independent genome-wide association analyses were performed looking for (a) expression quantitative trait loci (eQTLs) for mRNA and (b) methylation quantitative trait loci (methQTLs) for CpG sites. The Plink toolset quantitative trait association analysis fits data to the following model:Where Y is the quantitative trait and ADD represent genotypes encoded as allele dosage. See Plink Quantitative trait association and Linear and Logistic models documentation for more information.
Correction for multiple tests
To correct for the large number of SNPs tested per trait, a genome-wide empirical p-value was computed for the asymptotic p-value for each SNP by using 1,000 permutations of swapping sample labels of the traits, using the maxT permutation functionality provided within Plink. A permutation based method using label swapping of the traits is an appropriate method of test correction [18] for these analysis as it is not dependent on these quantitative traits having a normal distribution and also allows the linkage disequilibrium of the genomic regions being tested against the traits to be maintained.
To correct for the number of traits being tested per tissue region, a false discovery rate (FDR) threshold was determined based on the empirical p-values using the fwer2fdr function of the multtest package in R version 2.6.1. Empirical p-values were allowed to exceed this threshold if their linkage disequilibrium r2 was greater than or equal to 0.7 with a SNP with empirical values within the FDR threshold.
Polymorphism(s) in assay probes
Sequence variants within the sequence of the probe used to assay individual traits may cause differential hybridization and inaccurate expression and possible methylation measurements. To exclude this confound, the sequences of probes with significant correlation to a trait were examined for the presence of polymorphisms using CEU HapMap data, and, if present, that QTL was removed from the result set.
Selecting SNPs, CpG sites, and mRNA transcripts for shared QTL analysis
Within each tissue, we selected every pairing of CpG methylation sites and mRNA transcripts where the CpG methylation site was within 1Mb of the mRNA TSS and both the CpG methylation site or mRNA transcript had a significant cis QTL. These CpG-mRNA pairs were then expanded into triplets of SNP and CpG-mRNA pair, where the SNP was significantly correlated in cis with either the CpG methylation site or mRNA transcript of the CpG-mRNA pair.
Accession numbers
Data resulting from these experiments is available online (GEO Accession Number: GSE15745; dbGAP Study Accession: phs000249.v1.p1).
Supporting Information
Zdroje
1. MelzerD
PerryJR
HernandezD
CorsiAM
StevensK
2008 A genome-wide association study identifies protein quantitative trait loci (pQTLs). PLoS Genet 4 e1000072 doi:10.1371/journal.pgen.1000072
2. MyersAJ
GibbsJR
WebsterJA
RohrerK
ZhaoA
2007 A survey of genetic human cortical gene expression. Nat Genet 39 1494 1499
3. SchadtEE
MolonyC
ChudinE
HaoK
YangX
2008 Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6 e107 doi:10.1371/journal.pbio.0060107
4. StrangerBE
NicaAC
ForrestMS
DimasA
BirdCP
2007 Population genomics of human gene expression. Nat Genet 39 1217 1224
5. VeyrierasJB
KudaravalliS
KimSY
DermitzakisET
GiladY
2008 High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet 4 e1000214 doi:10.1371/journal.pgen.1000214
6. CooksonW
LiangL
AbecasisG
MoffattM
LathropM
2009 Mapping complex disease traits with global gene expression. Nat Rev Genet 10 184 194
7. GiladY
RifkinSA
PritchardJK
2008 Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet 24 408 415
8. EggerG
LiangG
AparicioA
JonesPA
2004 Epigenetics in human disease and prospects for epigenetic therapy. Nature 429 457 463
9. EisenMB
SpellmanPT
BrownPO
BotsteinD
1998 Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95 14863 14868
10. KhaitovichP
MuetzelB
SheX
LachmannM
HellmannI
2004 Regional patterns of gene expression in human and chimpanzee brains. Genome Res 14 1462 1473
11. Ladd-AcostaC
PevsnerJ
SabunciyanS
YolkenRH
WebsterMJ
2007 DNA methylation signatures within the human brain. Am J Hum Genet 81 1304 1315
12. ZhangY
RohdeC
TierlingS
JurkowskiTP
BockC
2009 DNA methylation analysis of chromosome 21 gene promoters at single base pair and single allele resolution. PLoS Genet 5 e1000438 doi:10.1371/journal.pgen.1000438
13. MeissnerA
MikkelsenTS
GuH
WernigM
HannaJ
2008 Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454 766 770
14. Gardiner-GardenM
FrommerM
1987 CpG islands in vertebrate genomes. J Mol Biol 196 261 282
15. BockC
WalterJ
PaulsenM
LengauerT
2007 CpG island mapping by epigenome prediction. PLoS Comput Biol 3 e110 doi:10.1371/journal.pcbi.0030110
16. LiY
DingJ
AbecasisGR
2006 Mach 1.0: Rapid haplotype reconstruction and missing genotype inference. American Journal of Human Genetics 79 S2290
17. PurcellS
NealeB
Todd-BrownK
ThomasL
FerreiraMA
2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81 559 575
18. ChurchillGA
DoergeRW
1994 Empirical threshold values for quantitative trait mapping. Genetics 138 963 971
19. KwanT
BenovoyD
DiasC
GurdS
ProvencherC
2008 Genome-wide analysis of transcript isoform variation in humans. Nat Genet 40 225 231
20. BarrettT
EdgarR
2006 Mining microarray data at NCBI's Gene Expression Omnibus (GEO)*. Methods Mol Biol 338 175 190
21. EdgarR
DomrachevM
LashAE
2002 Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30 207 210
22. MailmanMD
FeoloM
JinY
KimuraM
TrykaK
2007 The NCBI dbGaP database of genotypes and phenotypes. Nat Genet 39 1181 1186
23. HeinzenEL
GeD
CroninKD
MaiaJM
ShiannaKV
2008 Tissue-specific genetic control of splicing: implications for the study of complex traits. PLoS Biol 6 e1 doi:10.1371/journal.pbio.1000001
24. KwanT
GrundbergE
KokaV
GeB
LamKC
2009 Tissue effect on genetic control of transcript isoform variation. PLoS Genet 5 e1000608 doi:10.1371/journal.pgen.1000608
25. DimasAS
DeutschS
StrangerBE
MontgomerySB
BorelC
2009 Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 325 1246 1250
26. JakobssonM
ScholzSW
ScheetP
GibbsJR
VanLiereJM
2008 Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451 998 1003
27. van der BrugMP
BlackintonJ
ChandranJ
HaoLY
LalA
2008 RNA binding activity of the recessive parkinsonism protein DJ-1 supports involvement in multiple cellular pathways. Proc Natl Acad Sci U S A 105 10244 10249
28. 2005 A haplotype map of the human genome. Nature 437 1299 1320
29. WorkmanC
JensenLJ
JarmerH
BerkaR
GautierL
2002 A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol 3 research0048
30. SchadtEE
LiC
EllisB
WongWH
2001 Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data. J Cell Biochem Suppl Suppl 37120 125
31. TsengGC
OhMK
RohlinL
LiaoJC
WongWH
2001 Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res 29 2549 2557
32. SaeedAI
SharovV
WhiteJ
LiJ
LiangW
2003 TM4: a free, open-source system for microarray data management and analysis. Biotechniques 34 374 378
Štítky
Genetika Reprodukční medicínaČlánek vyšel v časopise
PLOS Genetics
2010 Číslo 5
Nejčtenější v tomto čísle
- Common Genetic Variants near the Brittle Cornea Syndrome Locus Influence the Blinding Disease Risk Factor Central Corneal Thickness
- All About Mitochondrial Eve: An Interview with Rebecca Cann
- The Relationship among Gene Expression, the Evolution of Gene Dosage, and the Rate of Protein Evolution
- SMA-10/LRIG Is a Conserved Transmembrane Protein that Enhances Bone Morphogenetic Protein Signaling