Copy Number Variation Is a Fundamental Aspect of the Placental Genome


Generally, every mammalian cell has the same complement of each part of its genome. However, copy number variation (CNV) can occur, where, compared to the rest of its genome, a cell has either more or less of a specific genomic region. It is unknown whether CNVs cause disease, or whether they are a normal aspect of cell biology. We investigated CNVs in polyploid trophoblast giant cells (TGCs) of the mouse placenta, which have up to 1,000 copies of the genome in each cell. We found that there are 47 regions with decreased copy number in TGCs, which we call underrepresented (UR) domains. These domains are marked in the TGC progenitor cells and we suggest that they gradually form during gestation due to slow replication versus fast replication of the rest of the genome. While UR domains contain cell adhesion and neuronal genes, they also contain significantly fewer genes than other genomic regions. Our results demonstrate that CNVs are a normal feature of the mammalian placental genome, which are regulated systematically during pregnancy.

Published in the journal: PLoS Genet 10(5): e1004290. doi:10.1371/journal.pgen.1004290
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1004290

Summary

Introduction

While the accumulation of somatic copy number variations (CNVs) has been proposed to be a result of the aging process, predisposing cell types to cancer progression and neurological diseases, an alternate hypothesis is that they are a normal—or even essential—part of cell biology [1], [2]. In support of the latter, lymphocyte-specific CNVs in immunologically important genes generate the genetic diversity of receptor molecules critical to their function [3]. Although V(D)J recombination is found only in the immune system, recent reports hint that lineage-specific somatic CNVs may be essential for healthy cellular differentiation and function in a number of organs such as the liver, pancreas and skin [4], [5]. It is unknown how these lineage-specific mammalian CNVs are formed—whether by a process similar to V(D)J recombination or by an alternative mechanism.

Although the role of many cell-type specific CNVs in mammals is unclear, lineage-specific CNVs are a normal aspect of cellular development in the fruit fly Drosophila melanogaster [6]. Lineage-specific CNVs form during Drosophila egg and larval development in polyploid cells via cycles involving DNA replication in the absence of cell division (endoreplication) [6]. In egg formation, somatic CNVs form by selective amplification of genomic regions containing chorion (eggshell) genes, which facilitates secretion of chorion proteins by the ovarian follicle cells [7], [8]. Drosophila somatic CNVs can also arise due to underreplication of certain genomic regions in the salivary glands, fat body and midgut of the larva [9]–[13]. While CNVs in Drosophila polyploid cells have been observed for more than 70 years [14], it is not known whether a similar mechanism is present in mammalian cells. However, the recent observation of human tissue-specific CNVs [1]–[5] suggests that somatic CNVs may be as essential in mammalian cells as they are in Drosophila.

Mammals absolutely require polyploid placental cells, counterparts to Drosophila follicle cells, for pregnancy maintenance [15]. In the placenta, polyploidy is restricted to specialized trophoblast cells that invade and remodel the uterus to promote vascularization and other maternal adaptations to pregnancy [15]. In rodents, these cells, termed trophoblast giant cells (TGCs), have 50–1,000 copies of the genome per cell. While proper TGC function depends on their degree of polyploidy [16], [17], it is not known what aspect of polyploidy is necessary for fetal survival. As TGCs are a class of critical polyploid support cells analogous to Drosophila follicle cells, they may similarly use differential replication of the genome to achieve highly specialized function.

Previous studies have addressed possible CNVs in rodent TGCs. Ohgane et al. [18] used restriction landmark genomic scanning (RLGS) to analyze CpG islands in rat junctional zone TGCs during late gestation (days 18 and 20). They reported that ≥97% of the spots detected by RLGS were similar to diploid controls and therefore concluded that there are no TGC CNVs. Sher et al. [19] also argued against the existence of CNVs based on array comparative genomic hybridization (aCGH) and quantitative real-time PCR experiments on mouse e9.5 implantation site TGCs. However, as there are several subtypes of TGCs, which vary in ploidy and functional significance during gestation [15], [20], CNVs could be present in a subset of cell types or only at certain developmental time points. Of particular interest are parietal TGCs, which have the highest degree of polyploidy [15] and are therefore an excellent candidate for differential replication of the polyploid genome. Genetic mouse mutants affecting the parietal TGCs predominantly die before e12.5 [15], suggesting that this is when developmentally important CNVs would be required.

Here we report that somatic CNVs are a normal part of placental cell biology. We utilized whole genome sequencing (WGS) and aCGH to identify 47 reproducibly underrepresented (UR) domains in mouse e9.5 parietal TGCs, totaling 6% of the genome. Employing a variety of genomic techniques, we demonstrate that UR domains are marked in chromatin prior to endoreplication in TGC progenitor cells and gradually form during the first half of gestation. UR domains are highly enriched for genes involved in cell adhesion and neurogenesis, as well as for gene deserts. Furthermore, we specifically show that UR domains are due to underreplication rather than somatic deletions. Together, these data reveal that lineage-specific CNVs are inherent features of the TGC genome, which are established and regulated throughout placental development.

Results

Polyploid TGCs have recurrent and reproducible CNVs

To investigate whether the 50–1,000 genomic copies in polyploid TGCs are uniformly replicated or contain CNVs, we used aCGH to compare genomic regions of mouse parietal TGCs (hereafter TGCs) and 2N embryos at e9.5 (Figure 1A, Figure S1A). We dissected four embryos and associated TGCs from one litter, representing pairs of genetically identical tissues, performed aCGH using the Agilent SurePrint G3 Mouse CGH Microarray Kit (two embryos/TGCs pooled per biological replicate), and analyzed the data using the R/Bioconductor package cghFLasso [21]. We identified 45 regions, reproducible between biological replicates, that were underrepresented within the TGC genome compared to the embryonic genome at a false discovery rate (FDR) of 0.0001, which we termed underrepresented (UR) domains (Figure 1B, Table S1). UR domains range in size from 1,037 kb to 9,429 kb (Table S1). In addition to the 45 UR domains common to both replicates, we found 30 domains specific to only one replicate (Figure 1B). However, when we relaxed the FDR threshold (to 0.01), 19/30 of these domains were found in both replicates, suggesting that while the degree of underrepresentation varies, UR domains form in specific regions of the genome. Importantly, we did not observe any overrepresented regions in TGCs (FDR = 0.0001).
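The replicate comparison described above amounts to intersecting two sets of genomic intervals. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes each called domain is a hypothetical `(chrom, start, end)` tuple with half-open coordinates, and that any positive overlap counts as "common". The source does not state which overlap criterion was used, and the coordinates are toy values rather than entries from Table S1.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_reproducibility(rep1, rep2):
    """Partition rep1 domains into those also called in rep2 and those unique to rep1."""
    common, unique = [], []
    for dom in rep1:
        if any(overlaps(dom, other) for other in rep2):
            common.append(dom)
        else:
            unique.append(dom)
    return common, unique


# Toy coordinates (hypothetical, for illustration only).
rep1 = [("chr1", 100, 500), ("chr2", 1000, 2000), ("chr3", 50, 80)]
rep2 = [("chr1", 400, 900), ("chr2", 2500, 3000)]
common, unique = split_by_reproducibility(rep1, rep2)
print(len(common), "common;", len(unique), "replicate-specific")
```

Re-running the same partition on calls made at a less stringent FDR threshold would show how many replicate-specific domains become shared.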

**Fig. 1. UR domains are specific to TGCs.**

We next asked whether UR domains were specific to TGCs, or whether they existed in diploid trophoblast cells or other endocycling polyploid cells. We used aCGH to compare the DNA of megakaryocytes (up to 64N) to embryos, placental disk cells (mostly 2N) to embryos, and cultured trophoblast stem cells (TS cells; 2N) to embryonic stem cells (ES cells; Figure 1C, Figure S1B, Figure S2). Megakaryocytes have no detectable underrepresented regions and display one region of overrepresentation common to both replicates, indicating that TGC UR domains are not simply explained by endocycling (FDR = 0.0001; Table S2). Placental disk cells lack any over- or underrepresentation (FDR = 0.0001; Table S3), although greatly relaxing the FDR threshold (to ≥0.05) revealed a weak trend towards UR domains in the same locations as in TGCs, likely explained by the normal presence of a small number of TGCs within this population (Figure 1C, Figure S2). Finally, we identified several TS- and ES-specific CNVs, but these were different from the TGC UR domains and presumably represent adaptations to cell culture (Tables S2 & S3) [22]. These data suggest that UR domains are important genomic features unique to TGCs.

As Sher et al. [19] have argued against the existence of CNVs in e9.5 TGCs, we compared our aCGH data to theirs. Consistent with Sher et al., we did not find any CNVs in their data using the R/Bioconductor package cghFLasso and an FDR of 0.0001 [21]. However, greatly relaxing the FDR threshold (to >0.05) revealed a trend towards UR domains in the same locations as in our TGC data (Figure S3), similar to the report by Sher et al. of finding reduced copy number using a smaller threshold. Moreover, the Sher et al. data bear a striking resemblance to our placental disk data (Figure S3), suggesting that their study, on implantation site TGCs, examined a population of trophoblast cells more akin to the placental disk than to the parietal TGCs of the mural trophectoderm described in our study. In support of this, while parietal TGCs surround the entire conceptus, TGCs over the central region of the placental disk are smaller and less polyploid than those at the periphery [20]. Together, these data suggest that the parietal TGCs of the mural trophectoderm not only have a higher degree of ploidy, but also have specific CNVs compared to the rest of the placenta.

Whole genome sequencing reveals UR domains in individuals

To quantitatively examine the extent of underrepresentation in TGCs, we performed paired-end WGS [23]. We sequenced (at 10× coverage) six individual e9.5 TGCs and their genetically matched embryos from three separate litters (2 individuals per litter; Table S4). To identify CNVs, we used a custom R/Bioconductor program based on CNVnator [24], which identifies CNVs at a p-value of 0.01. We found 47 reproducible UR domains on the autosomes in e9.5 TGCs in all samples (Table S5). UR domains range from 75 kb to 8,965 kb and cover 6% of the genome (138 Mbs of 2,717 total Mbs; Table 1). We next calculated the fold depletion of each UR domain from the normalized log2 ratio of sequence coverage of TGC/embryo [25] and found an average reduction between 27% and 51%, with a median between 28% and 54% (Table 1). Further, the size and degree of depletion of UR domains correlate such that the larger the domain, the greater the degree of underrepresentation (Figure 2A).
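To make the relationship between a log2 coverage ratio and a percent reduction concrete, here is a small sketch. It assumes the conventional conversion, in which a normalized log2(TGC/embryo) ratio r corresponds to a copy-number fraction of 2^r and hence a depletion of 1 − 2^r. The source does not spell out its exact formula, so the function names and this conversion are illustrative assumptions.

```python
import math


def percent_depletion(log2_ratio):
    """Percent reduction in copy number implied by a log2(TGC/embryo) coverage ratio."""
    return (1.0 - 2.0 ** log2_ratio) * 100.0


def log2_ratio_for_depletion(percent):
    """Inverse conversion: the log2 ratio corresponding to a given percent depletion."""
    return math.log2(1.0 - percent / 100.0)


# A log2 ratio of -1 means half the copies remain, i.e. 50% depletion.
print(percent_depletion(-1.0))
# Log2 ratios corresponding to the reported range of average reductions (27% and 51%).
print(log2_ratio_for_depletion(27.0), log2_ratio_for_depletion(51.0))
```

Under this convention, the reported 27% to 51% average reductions correspond to log2 ratios between roughly −0.45 and −1.03.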

**Fig. 2. e9.5 UR domains characteristics.**

Next, we examined how much variation existed between individuals. First, we compared aCGH and WGS data, and found 43 UR domains common to both platforms (Figure 2B, Table 1, Figure S4, Table S1). Of the domains that differ, five additional domains in the WGS data are likely due to the greater sensitivity of WGS, as these domains can also be found in the aCGH data if the FDR threshold is relaxed (to 0.01). Three additional domains in the aCGH data are found in a majority of the WGS samples (present in four to five out of the six samples), suggesting a small amount of variability in UR domain formation (Tables S1 & S5). To examine this variability in more depth, we examined the six individual WGS samples. Besides the 47 UR domains common to all six samples, we also found underrepresented regions present in only a subset (Figure 2C, Figure S5, Table S5). In general, samples with the fewest UR domains have a subset of the domains found in the samples with the most (Figure 2C, Figure S5, Table S5). In addition, the size of a particular UR domain is generally smaller in samples with fewer UR domains (Figure 2D, Table S5). As the samples vary slightly in age, this suggests that UR domains amass over time, such that slightly younger placentas have fewer and smaller UR domains.

The number, size and degree of depletion of UR domains expand during early gestation

To test our hypothesis that UR domains develop over time, we performed WGS on e8.0 TGCs/embryos (one litter per replicate) and compared these results to e9.5. We found 24 domains common to both biological replicates at e8.0, versus 47 domains common to all samples at e9.5 (Figure 3A & 3B, Figure S6). All e9.5 individuals have 23 of these domains, with 5/6 individuals containing the remaining domain (Figure 3B). We also found 10 domains unique to one of the two biological replicates at e8.0; 10/10 of these domains are contained in all e9.5 individuals (Figure 3B). Finally, we found that both size and degree of depletion of UR domains significantly increase between e8.0 and e9.5 (Figure 3C). Overall, as all UR domains at e8.0 are also present at e9.5, and UR domains at e9.5 are more numerous, larger and more depleted, we propose that they are gradually established during early gestation.

**Fig. 3. Degree of underrepresentation varies over developmental time.**

New small and stochastic CNVs form in later gestation

We next asked whether the number and degree of depletion of UR domains continue to increase throughout development. We performed aCGH on TGCs/embryos collected from the second half of gestation (e11.5, e13.5 and e16.5) and compared them to e9.5. Out of 45 UR domains present in both biological replicates at e9.5 (FDR = 0.0001), 22 are present in all biological replicates at e11.5, e13.5 and e16.5, and an additional 10 (32/45 in total) are present in all samples except for one of the e16.5 replicates (Figure 3D & 3E, Figure S7). We next examined size, and found that the 32 common domains are significantly larger than UR domains that arise later in development (the 147 not present at e9.5; Figure 3D & 3E, Figure S7). However, unlike between e8.0 and e9.5, where the degree of depletion increased, we found no significant change from e9.5 to e16.5 (Figure 3F), although UR domains trend slightly towards becoming less depleted over time (Figure 3D & 3F, Figure S7). There is also more intrinsic variability later in gestation, as the median degree of depletion between biological replicates at both e13.5 and e16.5 is significantly different (Figure 3F). The differences between UR domains in early (e8.0–e9.5) and later (e11.5–e16.5) gestation correlate with previous data showing that TGC polyploidy drastically increases until e10.5, and endocycling ends by e13.5 [20]. These data suggest that the increase in UR domain size and degree of underrepresentation from e8.0 to e9.5 is linked to the robust endocycles of early gestation. Furthermore, the termination of endocycles in later development may free cellular machinery to increase representation levels in UR domains.

We also found 33 overrepresented regions at e11.5–e16.5 that are not present at e9.5 (Figure 3D & 3E, Figure S7). We examined gene content of overrepresented regions common to at least two staged biological replicates (10/33), but did not find any annotated genes. Thus, while new CNV regions form during late gestation, they are more stochastic, less reproducible, and significantly smaller than those conserved between all stages.

UR domains form during in vitro differentiation

We next examined whether UR domains are also generated in vitro when differentiating TS cells into TGCs. To this end, we performed aCGH on purified TGCs harvested at 3, 5 and 7 days after differentiation [26]–[28] (Figure S8). Similar to in vivo, in vitro cells generate the same UR domains and also develop these over time (FDR = 0.0001, Figure 4A & 4B, Figure S8). At day 3, only one biological replicate has any of the UR domains found in vivo at e9.5 (3/45). At day 5, both replicates contain 1/45 domains, and one replicate contains 21/45 domains. At day 7, both replicates contain 34/45 UR domains, and one replicate contains 43/45 domains. Remarkably, in vitro cells generate the same UR domains as their in vivo counterparts (Figure 4A & 4B, Figure S8), strongly suggesting that the formation of these UR domains is a fundamental feature of TGC development.

**Fig. 4. UR domains have low gene content and expression both *in vivo* and *in vitro*.**

UR domains are highly enriched for genes involved in cell adhesion and neurogenesis

Next, we asked whether genes contained within e9.5 TGC UR domains were enriched for certain biological functions. We found that UR domains are significantly depleted of both protein-coding and non-coding genes, both relative to what would be expected by chance (386 observed vs. 617 expected, 0.63× enrichment, p<0.001) and when compared to the rest of the genome (Figure 4C). Further, these domains are significantly enriched for 1 Mb gene deserts (regions without any Ensembl annotations; 47 observed vs. 9 expected, 4.96× enrichment, p<0.001). In total, 386 genes are present within UR domains, 106 of which are functionally annotated. When we examined these 106 genes for function using GOTERMFINDER [29], the top enrichment categories were biological adhesion (p = 2.31×10⁻⁹) and related categories, followed by neuron projection development (p = 4.23×10⁻⁸) and related neurogenesis categories. These categories were not enriched when we performed the same analyses on a list of genes found in a random set of regions that have the same length and chromosome distribution. Finally, using 3′ RNA-Seq (3SEQ) [30] from both in vivo and in vitro TGCs, we compared expression of the genes to the degree of representation and found that genes in UR domains are either not expressed or have much lower levels of transcription than genes in regularly represented regions (Figure 4D & 4E). Overall, our data show that there are specific classes of genes enriched within the UR domains and these genes are generally not expressed, raising the possibility that UR domains function to limit the expression of a particular subset of genes in TGCs.
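The observed-versus-expected comparisons above can be read as a fold enrichment paired with an empirical p-value. The sketch below is a hedged illustration of that logic rather than the authors' procedure. It assumes the expected count is the mean over a null distribution of counts (for example from randomly placed regions matched for length and chromosome, a hypothetical setup), and it uses a standard add-one empirical p-value. The null counts shown are invented toy values.

```python
def fold_enrichment(observed, expected):
    """Ratio of observed to expected counts (values below 1 indicate depletion)."""
    return observed / expected


def empirical_p_depletion(observed, null_counts):
    """One-sided empirical p-value that a null draw is at most the observed count."""
    hits = sum(1 for c in null_counts if c <= observed)
    return (hits + 1) / (len(null_counts) + 1)


# Gene counts reported in the text: 386 observed vs. 617 expected.
gene_fold = fold_enrichment(386, 617)
print(round(gene_fold, 2))

# Toy null distribution (hypothetical values, not from the study).
toy_null = [600, 610, 620, 630]
print(empirical_p_depletion(386, toy_null))
```

With only four null draws, the smallest attainable p-value is 1/5. Reaching p<0.001 requires at least 999 random region sets.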

UR domains are heterochromatic

To test whether UR domains are characterized by a specific chromatin state, we performed ChIP-Seq using anti-H3K27ac, anti-H3K4me1, anti-H3K4me3, anti-H3K9me3, and anti-H3K27me3 in both in vitro TS cells and derived TGCs [31]. We used MACS2 to determine the normalized fold change for histone occupancy [32] and then used the Pearson correlation (R) to determine how the degree of representation (normalized log2 ratio of e9.5 WGS) correlates with signals from histone marks. In both TGCs and TS cells, we find that UR domains tend to co-localize with the repressive marks H3K9me3 and H3K27me3 (Figure 5). Conversely, UR domains are depleted of the active chromatin marks H3K4me3, H3K4me1 and H3K27ac (Figure 5). These results demonstrate that UR domains do not occur in active regions of the genome and that they are marked in the 2N progenitor cells (TS cells). Interestingly, UR domains are only a fraction of genomic heterochromatin (Figure 5B & 5C). All UR domains have increased signals for repressive histone marks and only weak signals for active histone marks. However, not all regions of the genome that have repressive marks but lack active marks are associated with a UR domain. Overall, this demonstrates that UR domains have a heterochromatic signature, but represent only a subset of heterochromatin.
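As a reminder of what the Pearson correlation measures here, the following sketch computes R between a per-bin representation value and a per-bin histone-mark signal. The binning, the variable names and the toy numbers are all assumptions made for illustration; the source does not describe its bin size or preprocessing beyond the MACS2 fold change.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


# Toy per-bin values (hypothetical): more negative log2 ratio = more underrepresented.
log2_representation = [0.0, -0.1, -0.5, -0.9, -1.0]
repressive_signal = [1.0, 1.2, 2.0, 2.8, 3.0]   # higher where representation is lower
active_signal = [3.0, 2.8, 2.0, 1.2, 1.0]       # lower where representation is lower

print(pearson_r(log2_representation, repressive_signal))
print(pearson_r(log2_representation, active_signal))
```

In this toy setup a repressive mark yields a negative R against representation and an active mark a positive one, which is the direction of association the text describes.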

We further examined the relationship between UR domains and heterochromatin using an alternative statistical method. We asked whether the histone marks are significantly enriched or depleted in our defined list of UR domains compared to what would be expected by chance [31]. Similar to our correlation analysis, marks associated with transcriptional activation (H3K4me3, H3K4me1 and H3K27ac) are significantly depleted in UR domains (p<0.001; Table 2). Conversely, the repressive mark H3K9me3 is enriched within UR domains (p<0.001; Table 2). Interestingly, while the repressive mark H3K27me3 is also enriched within UR domains in TS cells, it is depleted within UR domains in TGCs (p<0.001; Table 2). This observation agrees with previous data showing that extraembryonic cells have lower levels of H3K27me3 methylation than embryonic cells [33], and suggests that H3K27me3 is not critical for UR domain maintenance. Together, our data show that UR domains have a heterochromatic signature, both in TGCs and in their 2N progenitors.

UR domains are not caused by deletions

To examine whether UR domains are caused by genomic deletions, we carried out somatic structural variant analysis using paired-end sequencing data from the six TGC and matched embryo samples with the program SMASH [34]. If UR domains are caused by acquired genomic deletions, we would expect to find multiple library inserts that fully span the deleted regions (“discordant” paired-end reads; Figure S9). While we did detect sample-specific CNVs, we did not detect somatic deletions that were common to all six TGCs but absent from the embryos. Moreover, the probability of not detecting a given deletion in each of the six samples is extremely low (p = 2×10⁻⁵). These data show that UR domains are not a result of somatic chromosomal deletions.
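The reasoning behind discordant read pairs can be sketched as follows. This is a simplified illustration of the general idea and is not SMASH's algorithm. It assumes a pair is "discordant" when its mapped span exceeds the library's mean insert size by more than `k` standard deviations, and that a pair supports a deletion when its two reads flank the candidate interval. All thresholds, names and coordinates are hypothetical.

```python
def is_discordant(left_start, right_end, mean_insert, sd_insert, k=3.0):
    """Flag a read pair whose mapped span is far larger than the expected insert size."""
    return (right_end - left_start) > mean_insert + k * sd_insert


def flanks_interval(left_end, right_start, del_start, del_end):
    """True if the two reads map on either side of the candidate deleted interval."""
    return left_end <= del_start and right_start >= del_end


def count_supporting_pairs(pairs, del_start, del_end, mean_insert, sd_insert):
    """Count discordant pairs whose reads flank the candidate deletion."""
    n = 0
    for left_start, left_end, right_start, right_end in pairs:
        if (is_discordant(left_start, right_end, mean_insert, sd_insert)
                and flanks_interval(left_end, right_start, del_start, del_end)):
            n += 1
    return n


# Toy read pairs: (left_start, left_end, right_start, right_end).
pairs = [
    (1000, 1100, 1300, 1400),      # concordant: span of 400 bp
    (9000, 9100, 61000, 61100),    # discordant, flanks 10,000-60,000
    (9500, 9600, 30000, 30100),    # discordant, but right read maps inside the interval
]
support = count_supporting_pairs(pairs, 10_000, 60_000, mean_insert=400, sd_insert=50)
print(support)
```

A true somatic deletion would be expected to produce many such flanking pairs in TGC libraries and none in the matched embryo libraries.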

UR domains are late-replicating chromosomal segments

Since our WGS data do not support genomic deletions as the source of UR domains, we investigated whether they may be due to underreplication (Figure S9B). In 2N cells, replication timing is precisely regulated such that specific regions of the genome are replicated early in S phase while others are replicated late in S phase [35]. To test whether UR domain formation is caused by incomplete replication of regions that are normally replicated late in 2N TS cells, we first generated a replication timing profile of TS cells. To this end, we captured early- and late-replicating regions in TS cells by pulsing an asynchronous cell culture with BrdU to label replicating DNA followed by FACS, and then used aCGH to compare early and late BrdU-containing DNA [36]. Next, we compared late-replicating regions in TS cells to UR domains. Using the Pearson correlation (R), we found that UR domains correlate with late replication (Figure 6A). Also, 47/47 TGC UR domains reside within late-replicating regions in TS cells (Figure 6B, Table S6). UR domains are significantly smaller than the late-replicating regions that they are nested in (Figure 6C; Table S6), suggesting that they are a subset of these larger regions.
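The nesting relationship described above can be checked with a simple interval-containment test. The sketch below assumes hypothetical `(chrom, start, end)` tuples and requires a domain to lie entirely within a region. The coordinates are toy values, not entries from Table S6, and the source does not state whether partial overlap would also have been counted.

```python
def is_nested(domain, region):
    """True if a (chrom, start, end) domain lies entirely within a region."""
    return (domain[0] == region[0]
            and region[1] <= domain[1]
            and domain[2] <= region[2])


def summarize_nesting(domains, regions):
    """Return (number of nested domains, number of regions holding at least one domain)."""
    nested = 0
    occupied = set()
    for dom in domains:
        for i, reg in enumerate(regions):
            if is_nested(dom, reg):
                nested += 1
                occupied.add(i)
                break
    return nested, len(occupied)


# Toy data (hypothetical): two domains share the first late-replicating region.
late_regions = [("chr1", 0, 10_000), ("chr2", 5_000, 9_000), ("chr3", 0, 4_000)]
ur_domains = [("chr1", 1_000, 3_000), ("chr1", 6_000, 8_000), ("chr2", 5_500, 7_000)]
n_nested, n_occupied = summarize_nesting(ur_domains, late_regions)
print(n_nested, "domains nested in", n_occupied, "of", len(late_regions), "regions")
```

Because one region can hold more than one domain, the number of occupied regions can be smaller than the number of nested domains.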

**Fig. 6. TGC UR domains are a subset of late-replicating regions in TS cells.**

Finally, as only 45 of the 211 late-replicating regions contain a UR domain (Figure 6D, Table S6), we asked what distinguishes the late-replicating regions that form UR domains from those that do not. While there is no significant difference in the degree of late replication between these classes, late-replicating regions that contain UR domains are significantly larger (Figure 6E). However, size is not the sole characteristic determining where UR domains form, as not all regions greater than a certain size contain a UR domain. We next investigated gene content and found that late-replicating regions that contain UR domains also contain significantly fewer genes than those that do not (Figure 6F). These regions are also preferentially enriched for 1 Mb gene deserts (58 observed vs. 18 expected, 3.16× enrichment, p<0.001). Together, our data show that UR domains form from a specific class of late-replicating, heterochromatic regions with low gene content, suggesting that UR domains are not simply a byproduct of late-replicating heterochromatin, but are a precisely regulated subset.

Discussion

We report here the first mammalian example, outside of the immune system, of lineage-specific CNVs being an integral part of normal cell biology and development. Notably, we show that CNVs in placental cells form via a novel mechanism unrelated to V(D)J recombination. Using both aCGH and high-throughput WGS, we identified 47 reproducible underrepresented domains in mouse parietal TGCs totaling 138 Mbs, or 6% of the genome. We found that UR domains are highly enriched for genes involved in cell adhesion and neurogenesis, as well as for gene deserts. Furthermore, we specifically show that UR domains are due to underreplication of a specialized type of heterochromatin, rather than acquired genomic deletions. Our data reveal that lineage-specific CNVs are a normal aspect of the TGC genome that are established and regulated during gestation.

Establishment of UR domains may involve a novel chromatin remodeler

Only a subset of heterochromatic, late-replicating regions form UR domains, suggesting that UR domains are not simply a byproduct of late-replicating heterochromatin, but are precisely regulated. We propose that either this is dictated by genomic structure or that there are specific DNA binding proteins that define UR domains. We favor the latter model based on parallels found in Drosophila, whereby mutants for Suppressor of Underreplication (SuUR) have underreplicated domains that become replicated to normal levels [12], [13], [37]. However, SUUR protein does not appear to be present in species outside the Drosophilids, and we have not found any SuUR homologs in mice via BLAST, raising the possibility that presently unknown proteins in mammals may be regulating this process.

Lineage-specific CNVs in mammalian development

Lineage-specific CNVs are an overlooked aspect of the mammalian genome. Although recent data suggest that they are widespread [1]–[5], their identification and functional study have not been carried out systematically. CNVs may be particularly difficult to identify in primary tissues, due to a high background of cells lacking CNVs. In support of this, Abyzov et al. [4] found a low frequency of somatic CNVs in human fibroblasts. Further, even in more homogeneous populations, relatively small degrees of CNV may mask their presence. Van Heesch et al. [38] found tissue-specific CNVs in rat blood, brain, liver and testis, where the degree of underrepresentation does not exceed 50%. While Van Heesch et al. conclude that their findings were the result of systematic bias in DNA isolation procedures, they were unable to eliminate these CNVs using any analytical or experimental approach. Moreover, Manukjan et al. [39] suggest that Van Heesch et al. are identifying the signature of replication timing in their CNV analyses due to the use of proliferating cells. Intriguingly, this suggests that, analogous to polyploid TGCs in the placenta, underreplication may be crucial in organs containing a highly proliferative population of 2N cells.

Convergent evolution of CNVs in flies and mice suggests function

While CNVs in Drosophila polyploid cells have been characterized for more than 70 years [14], our work demonstrates for the first time that CNVs are a normal aspect of mammalian development. The rarity of endoreplicating polyploid cells in animals suggests that CNVs in mouse and Drosophila arose independently [6], and therefore may have species-specific differences. While Drosophila CNVs are typically 90% underrepresented, mouse CNVs are never more than 50% underrepresented. We strongly suggest that there are UR domains in both mouse and Drosophila polyploid cells, and that the presence of these domains in both taxa is an example of convergent evolution due to similar selective pressures, indicative of functional importance. As both mice and flies have a fast rate of early development compared to related species, formation of UR domains could be an integral part of accelerating the cell cycle, and therefore be a key mechanism behind their rapid life cycles.

UR domains as a mechanism to drive TGC function

UR domains are a unique feature of the TGC genome, suggesting that they play a central role in placental function and pregnancy. Consistent with this, UR domains are enriched for specific classes of genes involved in cell adhesion and neurogenesis. Intriguingly, there is evidence that downregulation of both classes of proteins is crucial for placental function. Downregulation of cell adhesion genes is necessary for trophoblast invasion in both mice and humans [40], [41]. Further—and quite remarkably—Liao et al. [42] found that upregulation of genes in the SLIT/ROBO neuronal guidance system in the human placenta is associated with the pregnancy disease pre-eclampsia. UR domain formation could also enable TGCs to simply save materials and time, a hypothesis that has been proposed for polyploidy in general [43]. TGCs are essential during the first half of gestation, when it is absolutely critical for the rapidly growing embryo to establish a connection with the mother [15], [44]. Formation of UR domains could allow for more rapid maturation of TGCs by allowing replication initiation to proceed without waiting for replication of nonessential regions of the genome. In support of this, UR domains represent a significant part of the genome, 6% (138 Mbs of 2,717 total Mbs), and therefore the cell would require considerable resources to fully replicate these regions. Together, functional evidence and convergent evolution suggest that UR domains are a critical element during pregnancy. Regardless, placental UR domains are the first mammalian example, outside of the immune system, of lineage-specific CNVs being an integral part of normal cell biology and development.

Materials and Methods

Ethics statement

All animal work has been conducted according to relevant U.S. and international guidelines. Specifically, all experimental procedures were carried out in accordance with the Administrative Panel on Laboratory Animal Care (APLAC) protocol and the institutional guidelines set by the Veterinary Service Center at Stanford University (Animal Welfare Assurance A3213-01 and USDA License 93-R-0004). Stanford APLAC and institutional guidelines are in compliance with the U.S. Public Health Service Policy on Humane Care and Use of Laboratory Animals. The Stanford APLAC approved the animal protocol associated with the work described in this publication.

Mice

129-Elite, C57BL/6 and pregnant C57BL/6 mice were obtained from Charles River. Copulation was determined by the presence of a vaginal plug the morning after mating, and embryonic day 0.5 (e0.5) was defined as noon of that day. TGCs and embryos were dissected in 1× PBS (1∶10 10× PBS, pH = 7.4; Gibco) and stored on ice until further processing. After removal of the decidua, parietal TGCs of the mural trophectoderm [15] were dissected away from the placental disk, and, when possible, Reichert's membrane (Figure S1A). TGCs were identified by their extremely large cell size (Figure 1A). Using single-nucleotide polymorphism data from F1 crosses, TGCs were predicted to have, at the most, approximately 5% contamination by maternal cells (Hannibal & Baker, unpublished data). Placental disk tissue was gathered from e13.5 placental disks after the removal of the decidua and obvious parietal TGCs. For gathering 2N genomic DNA, at e8.0, the entire embryo was collected; at e9.5, the embryo body, after removal of obvious organs and head (removed at otic vesicle), was collected; and at later stages, limbs, or a mixture of limbs and the tail, were collected (Figure S1A).

Nuclear staining

For confocal imaging, TGCs/embryos were fixed in 4% paraformaldehyde at 4°C overnight. Samples were stained with 0.5 µg/mL DAPI (Life Technologies) in 1× PBS overnight, washed in 50% glycerol/1× PBS and stored in 70% glycerol/1× PBS. Confocal images were taken on a Leica DM IRE2 inverted microscope using the Leica SP2 software package, located in the Stanford Cell Sciences Imaging Facility.

Cell culture

Trophoblast stem cells were cultured as described in Chuong et al. [31] following [27]. TS cells were differentiated into parietal TGCs by replacing the FGF, Activin and Heparin in the media with retinoic acid [27], [28]. Mature TGCs are seen after 4–6 days of differentiation [26] and were collected on days 3, 5 and 7. TGCs/TS cells were further isolated for aCGH by placing cultured cells over a two-step density gradient (1.5% BSA over 3% BSA in a 15 mL tube; Figure S1B). TGCs sank to the bottom of the tube while the smaller TS cells stayed in the upper fraction.

The embryonic stem cell line CGR8 is a germ-line competent cell line established from the inner cell mass of a 129 e3.5 male pre-implantation embryo [45]. ES cells were cultured feeder-free on 0.1% gelatin coated plates. The ES cell medium was prepared by supplementing knockout DMEM (Invitrogen) with 15% FBS, 1 mM glutamax, 0.1 mM nonessential amino acids, 1 mM sodium pyruvate, 0.1 mM 2-mercaptoethanol, penicillin/streptomycin, and 1000 units of leukemia inhibitory factor (LIF; Millipore). Cell culture was maintained at 37°C with 5% CO2.

Megakaryocytes were derived and cultured as described in [46]. Briefly, fetal livers were dissected from e13.5 C57BL/6 embryos in Hanks' Balanced Salt Solution and placed in DMEM with 10% FBS supplemented with 100 µg/mL penicillin-streptomycin (Invitrogen). Livers were pooled by sex of the embryo (males and females pooled separately). To make a single-cell suspension, livers were aspirated through a progression of 18G, 21G and 23G needles. To promote differentiation into megakaryocytes, cells were cultured for five days in media containing thrombopoietin (TPO; R&D Systems) at 37°C with 5% CO2. Successful differentiation was identified by 1) the presence of large cells (megakaryocytes) and 2) FACS to confirm ploidy of up to 32N. For FACS, propidium iodide-stained samples were run on a Cytek DxP10 modified FACScan (Cytek Technologies, BD Biosciences) using the blue laser. Approximately 10,000 events per sample were collected. Data were analyzed using FlowJo (Treestar, Inc.). Megakaryocytes were isolated for aCGH by placing cultured cells over a two-step density gradient (1.5% BSA over 3% BSA in a 15 mL tube; Figure S1B). Megakaryocytes sank to the bottom of the tube while smaller, undifferentiated cells stayed in the upper fraction.

ArrayCGH and whole genome sequencing

Genomic DNA was extracted from fresh tissue and cultured cells using the DNeasy Blood & Tissue Kit (Qiagen). Before column purification, samples were digested with proteinase K (600 mAU/ml solution or 40 mAU/mg protein) at 56°C, overnight for in vivo samples and for 10 minutes for in vitro samples, followed by a 4-minute incubation with RNase A (100 mg/mL; Qiagen DNeasy Blood & Tissue Kit). If necessary, DNA was further concentrated via ethanol/sodium acetate precipitation following standard protocols.

For arrays performed on DNA from TGCs, placental disks and embryonic controls, genomic DNA from two individuals in the same litter was pooled for each condition. For megakaryocyte arrays, cells derived from 5–6 livers from a single litter were pooled for each condition. As controls for the megakaryocyte array, three embryos (a subset of the litter from which the livers were collected) were pooled for each condition. For arrays performed on DNA from cultured cells, two replicates from different passages were used (5 million cells each). For each condition, approximately 4 µg DNA was sent to the Biomedical Genomics Core at the Research Institute at Nationwide Children's Hospital (Columbus, OH) for processing with the SurePrint G3 Mouse CGH Microarray Kit, 4×180 k (Agilent). For all arrays performed on DNA from in vivo tissue, to ensure that the arrays detect copy number variation, duplicates consisted of 1) female test versus male control and 2) male test versus female control.

aCGH data were analyzed using the R/Bioconductor package cghFLasso, which utilizes reference arrays in conjunction with a false discovery rate (FDR) [21]. An FDR of 0.0001 was used in order to examine all of the autosomes simultaneously. To determine which array to use as the reference, several analyses were performed. The TS versus ES array exhibited specific CNVs, presumably due to genomic adaptations to culturing [22]. The megakaryocytes displayed only a small region of overrepresentation and the placental disk array did not display any CNVs (FDR = 0.0001). However, as the placental disk has a small amount of underrepresentation in reproducible areas of the genome (FDR≥0.05), the megakaryocyte array was used as the reference for the remainder of the analyses. aCGH data were plotted using cghFLasso [21]. For comparison with data from Sher et al. [19], data were retrieved from Gene Expression Omnibus series: GSE45787. To compare aCGH data from Sher et al. to data presented here, results were plotted using a custom R/Bioconductor program.

For WGS, genomic DNA from one individual was used for each replicate at stages e9.5 and older, and 5–7 individuals from one litter were used for each replicate at stage e8.0. Libraries for WGS were prepared from 40–50 ng genomic DNA using the Nextera TruSeq Dual Index Paired End Kit (Illumina) following manufacturer's instructions with the following modification: the Qiagen MinElute Reaction Cleanup Kit (Qiagen) was used to clean up tagmented DNA. Library quality was assessed using Qubit and Bioanalyzer, and libraries were sequenced on the Illumina HiSeq 2000 at approximately 10× coverage (Table S4) at the Stanford Center for Genomics and Personalized Medicine. 101 bp from each of the paired ends were sequenced, and sequencing reads were aligned using either the DNAnexus mapper [47] or the Novocraft Novoalign program against the mouse reference genome (mm9). Data were analyzed using custom R/Bioconductor programs and SMASH [34]. To compare aCGH versus WGS data, results were plotted using a custom R/Bioconductor program.
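The relationship between read count, read length and sequencing depth can be illustrated with a short calculation. This is a minimal sketch and not part of the original analysis; the function names are hypothetical, and the approximate mm9 genome size used in the usage note is an assumption that is not stated in the text.

```python
import math


def mean_coverage(n_read_pairs: int, read_length_bp: int, genome_size_bp: float) -> float:
    """Mean depth for paired-end data: two reads per pair, each read_length_bp long."""
    return 2 * n_read_pairs * read_length_bp / genome_size_bp


def read_pairs_needed(target_coverage: float, read_length_bp: int, genome_size_bp: float) -> int:
    """Smallest number of read pairs that reaches the target mean depth."""
    return math.ceil(target_coverage * genome_size_bp / (2 * read_length_bp))
```

For example, `read_pairs_needed(10, 101, 2.7e9)` estimates the number of 2×101 bp read pairs required for roughly 10× depth, assuming a genome of about 2.7 Gb and ignoring unmapped or duplicate reads.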

The final UR domain list was generated using e9.5 WGS data and a custom R/Bioconductor program with the following criterion: neighboring data points with a normalized log2 ratio of TGCs/embryo ≤−0.3. This criterion was chosen based on results from the program CNVnator [24], which identified UR domains with both large and small degrees of underrepresentation at a p-value of 0.01 but systematically missed UR domains that are closely spaced together, a limitation that our program rectifies.
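The thresholding criterion can be sketched as follows. This is an illustrative Python outline, not the custom R/Bioconductor program used in the study; the function name, the input layout (one log2 ratio per genomic bin, ordered along a chromosome) and the `min_points` parameter (our reading of "neighboring data points" as at least two consecutive bins) are assumptions.

```python
from typing import List, Sequence, Tuple


def call_ur_domains(
    positions: Sequence[int],
    log2_ratios: Sequence[float],
    threshold: float = -0.3,
    min_points: int = 2,  # assumption: a domain needs at least two adjacent bins
) -> List[Tuple[int, int]]:
    """Return (first_position, last_position) for each run of consecutive
    bins whose log2(TGC/embryo) ratio is at or below the threshold."""
    domains: List[Tuple[int, int]] = []
    run_start = None
    for i, ratio in enumerate(log2_ratios):
        if ratio <= threshold:
            if run_start is None:
                run_start = i  # open a new candidate domain
        else:
            if run_start is not None and i - run_start >= min_points:
                domains.append((positions[run_start], positions[i - 1]))
            run_start = None
    # Close a run that extends to the end of the chromosome.
    if run_start is not None and len(log2_ratios) - run_start >= min_points:
        domains.append((positions[run_start], positions[len(log2_ratios) - 1]))
    return domains
```

Because adjacent bins below the threshold are merged into a single run, a sketch of this kind does not impose a minimum spacing between separate domains.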

Enrichment statistics

To calculate the significance of overlap between datasets, a binomial test was used to determine whether the observed overlap for the datasets was significantly greater than an expected overlap based on the average of 1,000 randomized datasets [31]. To randomize each dataset, regions were shuffled within bins according to their chromosomal distribution and distance from gene transcriptional start sites (including 1 kb, 10 kb, 100 kb, 1,000 kb, and >1,000 kb bins).
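The test described above can be outlined in a few lines. This is a minimal sketch under stated assumptions: the expected overlap probability is taken as the mean overlap count of the randomized datasets divided by the number of regions, and the test is one-sided. The function names are hypothetical, and the bin-wise shuffling step itself is not reproduced here.

```python
from math import comb
from typing import Sequence


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overlap_enrichment_p(
    observed_overlap: int,
    n_regions: int,
    randomized_overlaps: Sequence[int],
) -> float:
    """One-sided p-value that the observed overlap exceeds the expectation
    derived from the mean of the randomized overlap counts."""
    expected = sum(randomized_overlaps) / len(randomized_overlaps)
    p_expected = expected / n_regions  # per-region probability of overlapping by chance
    return binomial_upper_tail(observed_overlap, n_regions, p_expected)
```

In practice `randomized_overlaps` would hold one overlap count per shuffled dataset (1,000 values in the analysis described here).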

3SEQ

Total RNA was extracted from fresh in vivo tissue by homogenizing in TRIzol Reagent (Life Technologies/Ambion) following manufacturer's instructions. Total RNA from three individuals from the same litter was combined to make each library. mRNA was isolated from 10–20 µg of total RNA using Dynabeads Oligo(dT)₂₅ (Life Technologies/Ambion). 3SEQ libraries were prepared from mRNA following [30]. Briefly, mRNA was heat sheared for 7.5 minutes to produce a fragment size range of 100–400 bp, then used to generate cDNA libraries using a custom oligo dT primer containing Illumina-compatible adapter sequence. cDNA fragments were end-repaired and ligated to standard Illumina adapters. Size selection was performed using E-gel SizeSelect agarose gels (Invitrogen), and products were PCR amplified for 15 cycles and purified using Ampure XP beads (Beckman Coulter). Library quality was assessed using Qubit and Bioanalyzer, and libraries were sequenced on the Genome Analyzer IIx at the Stanford Center for Genomics and Personalized Medicine.

Total RNA was extracted and 3SEQ libraries were constructed for cultured TGCs as described in Chuong et al. [31]. Two replicates from different passages (10 million cells each) were used. 3SEQ data for TS cells was retrieved from Gene Expression Omnibus series: GSE42207 [31].

Sequences were aligned to the mouse (mm9) genome using the DNAnexus mapper [47] and raw counts for sense reads were analyzed using Unipeak 1.0 [48]. Regions of transcription were associated with the nearest ENSEMBL gene 3′ UTR within 5 kb. Data were normalized and expression levels were analyzed using the R/Bioconductor package DESeq [49].
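The assignment of transcribed regions to genes can be illustrated with a small helper. This is a hypothetical sketch, not the Unipeak or DESeq code: it assumes regions and 3′ UTRs are on the same chromosome, represents a region by a single coordinate, and ignores strand for brevity (the actual analysis used sense reads only).

```python
from typing import List, Optional, Tuple


def nearest_utr(
    region_pos: int,
    utrs: List[Tuple[str, int, int]],  # (gene_id, utr_start, utr_end)
    max_dist: int = 5000,              # 5 kb window, as in the text
) -> Optional[str]:
    """Return the gene whose 3' UTR is closest to region_pos,
    or None if no UTR lies within max_dist."""
    best_gene, best_dist = None, None
    for gene_id, start, end in utrs:
        if start <= region_pos <= end:
            dist = 0  # region falls inside the UTR
        else:
            dist = min(abs(region_pos - start), abs(region_pos - end))
        if dist <= max_dist and (best_dist is None or dist < best_dist):
            best_gene, best_dist = gene_id, dist
    return best_gene
```

Regions for which the function returns `None` would be left unassigned under this rule.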

ChIP-seq

ChIP-seq and ChIP-seq analysis were performed as described in Chuong et al. [31] using the ChIP Assay kit (Millipore) following manufacturer's instructions. Briefly, 20 million cultured TGCs were cross-linked in 2% formaldehyde for 15 minutes, and sonicated for 12 cycles (30 seconds on/off) at 60% amplitude to produce a fragment range of 300–600 bp. Immunoprecipitation was performed with 2–5 µg of antibody (H3K4me3: ActiveMotif, 39159; H3K27me3: ActiveMotif, 39535; H3K27ac: Abcam, ab4729; H3K9me3: Abcam, ab8898; H3K4me1: Abcam, ab8895) conjugated to 50 µl of protein G Dynabeads (Invitrogen) overnight. Following washing and elution of DNA per manufacturer's instructions, libraries were prepared using the Illumina genomic DNA preparation kit using barcoded linker adapters, and sequenced on the Illumina HiSeq 2000 at the Stanford Center for Genomics and Personalized Medicine. ChIP-Seq data for TS cells was retrieved from Gene Expression Omnibus series: GSE42207 [31].

High-quality reads were aligned to the mm9 genome assembly using BWA 0.5.9 [50], retaining only unique alignments. Peaks were called using MACS2 2.0.10 [32]. The “bigwig_correlation” script from the Cistrome package [51] was used to generate genome-wide correlation plots between ChIP profiles and underrepresented profiles.
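The idea behind a genome-wide correlation between two profiles can be sketched with a plain Pearson correlation over matched genomic bins. This is an illustration only and does not reproduce the Cistrome script; the use of Pearson correlation and of pre-binned signal vectors are assumptions, since the binning used by the script is not described in the text.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally binned signal profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 bins)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)
```

Here `x` could hold the mean ChIP signal per bin and `y` the log2 copy number ratio for the same bins; a negative coefficient would indicate that the mark is depleted where underrepresentation is strongest, or vice versa, depending on the sign convention of the copy number profile.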

Replication timing

Cultured TS cells were incubated for two hours at 37°C in the dark with a final concentration of 100 µM BrdU (Sigma Aldrich B5002). Genome-wide replication timing was analyzed as previously described [36]. Briefly, cells were dissociated into a single-cell suspension and nuclei were isolated. DNA was subsequently stained with propidium iodide and cells were FACS sorted into early and late S-phase fractions based on their DNA content. DNA from early and late S-phase fractions was purified by immunoprecipitation of the BrdU-substituted nascent DNA (BrdU-IP). Three replicates from different passages (two million cells each) were used. Data were normalized following [36]. The R/Bioconductor package DNAcopy was used to define replication timing domains based on the similarity in values (a constant value across a segment defines a domain) [36]. Regions called by DNAcopy were confirmed on the genome browser. The “bigwig_correlation” script from the Cistrome package [51] was used to generate genome-wide correlation plots between replication timing profiles and underrepresented profiles.
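A replication timing profile built from early and late S-phase fractions is commonly summarized as a per-bin log ratio. The sketch below assumes a log2(early/late) convention and a small pseudocount to avoid division by zero; neither detail is stated in the text, and the function name is hypothetical.

```python
from math import log2
from typing import List, Sequence


def replication_timing_ratio(
    early: Sequence[float],
    late: Sequence[float],
    pseudocount: float = 1.0,  # assumption: guards against zero signal in a bin
) -> List[float]:
    """Per-bin log2(early/late) signal; positive values indicate earlier replication."""
    if len(early) != len(late):
        raise ValueError("early and late profiles must cover the same bins")
    return [log2((e + pseudocount) / (l + pseudocount)) for e, l in zip(early, late)]
```

Under this convention, domains that replicate late in S phase appear as stretches of negative values, which is the quantity that a segmentation step would then partition into domains.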

Accession codes and data availability

SuperSeries Gene Expression Omnibus (GEO) accession number for aCGH, 3SEQ, ChIP-Seq, and replication timing data: GSE50585.

Smoothed replication timing data can also be found at: http://www.replicationdomain.com/

BioProject accession number for WGS: PRJNA213010

Supporting Information

References

1. Lupski JR (2013) One Human, Multiple Genomes—Genome Mosaicism. Science 341: 358–359. doi:10.1126/science.1239503

2. Poduri A, Evrony GD, Cai X, Walsh CA (2013) Somatic mutation, genomic variation, and neurological disease. Science 341: 1237758. doi:10.1126/science.1237758

3. Jackson KJL, Kidd MJ, Wang Y, Collins AM (2013) The shape of the lymphocyte receptor repertoire: lessons from the B cell receptor. Front Immunol 4: 263. doi:10.3389/fimmu.2013.00263

4. Abyzov A, Mariani J, Palejev D, Zhang Y, Haney MS, et al. (2012) Somatic copy number mosaicism in human skin revealed by induced pluripotent stem cells. Nature 492: 438–442. doi:10.1038/nature11629

5. O'Huallachain M, Karczewski KJ, Weissman SM, Urban AE, Snyder MP (2012) Extensive genetic variation in somatic human tissues. Proc Natl Acad Sci U S A 109: 18018–18023. doi:10.1073/pnas.1213736109

6. Edgar BA, Orr-Weaver TL (2001) Endoreplication cell cycles: more for less. Cell 105: 297. doi:10.1016/S0092-8674(01)00334-8

7. Kim JC, Nordman J, Xie F, Kashevsky H, Eng T, et al. (2011) Integrative analysis of gene amplification in Drosophila follicle cells: parameters of origin activation and repression. Genes Dev 25: 1384–1398. doi:10.1101/gad.2043111

8. Orr-Weaver TL (1991) Drosophila chorion genes: cracking the eggshell's secrets. Bioessays 13: 97–105. doi:10.1002/bies.950130302

9. Belyakin SN, Christophides GK, Alekseyenko AA, Kriventseva EV, Belyaeva ES, et al. (2005) Genomic analysis of Drosophila chromosome underreplication reveals a link between replication control and transcriptional territories. Proc Natl Acad Sci U S A 102: 8269–8274. doi:10.1073/pnas.0502702102

10. Hammond MP, Laird CD (1985) Chromosome structure and DNA replication in nurse and follicle cells of Drosophila melanogaster. Chromosoma 91: 267–278. doi:10.1007/BF00328222

11. Hammond MP, Laird CD (1985) Control of DNA replication and spatial distribution of defined DNA sequences in salivary gland cells of Drosophila melanogaster. Chromosoma 91: 279–286. doi:10.1007/BF00328223

12. Nordman J, Li S, Eng T, MacAlpine D, Orr-Weaver TL (2011) Developmental control of the DNA replication and transcription programs. Genome Res 21: 175–181. doi:10.1101/gr.114611.110

13. Sher N, Bell GW, Li S, Nordman J, Eng T, et al. (2012) Developmental control of gene copy number by repression of replication initiation and fork progression. Genome Res 22: 64–75. doi:10.1101/gr.126003.111

14. Bridges CB (1935) Salivary chromosome maps with a key to the banding of the chromosomes of Drosophila melanogaster. J Hered 26: 60–64.

15. Hu D, Cross JC (2010) Development and function of trophoblast giant cells in the rodent placenta. Int J Dev Biol 54: 341–354. doi:10.1387/ijdb.082768dh

16. Geng Y, Yu Q, Sicinska E, Das M, Schneider JE, et al. (2003) Cyclin E ablation in the mouse. Cell 114: 431–443. doi:10.1016/S0092-8674(03)00645-7

17. Parisi T, Beck AR, Rougier N, McNeil T, Lucian L, et al. (2003) Cyclins E1 and E2 are required for endoreplication in placental trophoblast giant cells. EMBO J 22: 4794–4803. doi:10.1093/emboj/cdg482

18. Ohgane J, Aikawa J, Ogura A, Hattori N, Ogawa T, et al. (1998) Analysis of CpG islands of trophoblast giant cells by restriction landmark genomic scanning. Dev Genet 22: 132–140. doi:10.1002/(SICI)1520-6408(1998)22:2<132::AID-DVG3>3.0.CO;2-7

19. Sher N, Von Stetina JR, Bell GW, Matsuura S, Ravid K, et al. (2013) Fundamental differences in endoreplication in mammals and Drosophila revealed by analysis of endocycling and endomitotic cells. Proc Natl Acad Sci U S A 110: 9368–9373. doi:10.1073/pnas.1304889110

20. Sakaue-Sawano A, Hoshida T, Yo M, Takahashi R, Ohtawa K, et al. (2013) Visualizing developmentally programmed endoreplication in mammals using ubiquitin oscillators. Development 140: 4624–4632. doi:10.1242/dev.099226

21. Tibshirani R, Wang P (2008) Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9: 18–29. doi:10.1093/biostatistics/kxm013

22. Grandela C, Wolvetang E (2007) hESC adaptation, selection and stability. Stem Cell Rev 3: 183–191. doi:10.1007/s12015-007-0008-4

23. Skvortsov D, Abdueva D, Curtis C, Schaub B, Tavaré S (2007) Explaining differences in saturation levels for Affymetrix GeneChip® arrays. Nucleic Acids Res 35: 4154–4163. doi:10.1093/nar/gkm348

24. Abyzov A, Urban AE, Snyder M, Gerstein M (2011) CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res 21: 974–984. doi:10.1101/gr.114876.110

25. Zhang Y, Haraksingh R, Grubert F, Abyzov A, Gerstein M, et al. (2013) Child development and structural variation in the human genome. Child Dev 84: 34–48. doi:10.1111/cdev.12051

26. Carney EW, Prideaux V, Lye SJ, Rossant J (1993) Progressive expression of trophoblast-specific genes during formation of mouse trophoblast giant cells in vitro. Mol Reprod Dev 34: 357–368. doi:10.1002/mrd.1080340403

27. Erlebacher A, Price KA, Glimcher LH (2004) Maintenance of mouse trophoblast stem cell proliferation by TGF-β/activin. Dev Biol 275: 158–169. doi:10.1016/j.ydbio.2004.07.032

28. Yan J, Tanaka S, Oda M, Makino T, Ohgane J, et al. (2001) Retinoic acid promotes differentiation of trophoblast stem cells to a giant cell fate. Dev Biol 235: 422–432. doi:10.1006/dbio.2001.0300

29. Boyle EI, Weng S, Gollub J, Jin H, Botstein D, et al. (2004) GO::TermFinder—open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20: 3710–3715. doi:10.1093/bioinformatics/bth456

30. Beck AH, Weng Z, Witten DM, Zhu S, Foley JW, et al. (2010) 3′-end sequencing for expression quantification (3SEQ) from archival tumor samples. PLoS One 5: e8768. doi:10.1371/journal.pone.0008768

31. Chuong EB, Rumi MAK, Soares MJ, Baker JC (2013) Endogenous retroviruses function as species-specific enhancer elements in the placenta. Nat Genet 45: 325–329. doi:10.1038/ng.2553

32. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, et al. (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9: R137. doi:10.1186/gb-2008-9-9-r137

33. Rugg-Gunn PJ, Cox BJ, Ralston A, Rossant J (2010) Distinct histone modifications in stem cell lines and tissue lineages from the early mouse embryo. Proc Natl Acad Sci U S A 107: 10783–10790. doi:10.1073/pnas.0914507107

34. Valouev A, Weng Z, Sweeney RT, Varma S, Le Q-T, et al. (2013) Discovery of recurrent structural variants in nasopharyngeal carcinoma. Genome Res 24: 300–309. doi:10.1101/gr.156224.113

35. Gilbert DM, Takebayashi SI, Ryba T, Lu J, Pope BD, et al. (2010) Space and Time in the Nucleus: Developmental Control of Replication Timing and Chromosome Architecture. Cold Spring Harb Symp Quant Biol 75: 143–153. doi:10.1101/sqb.2010.75.011

36. Ryba T, Battaglia D, Pope BD, Hiratani I, Gilbert DM (2011) Genome-scale analysis of replication timing: from bench to bioinformatics. Nat Protoc 6: 870–895. doi:10.1038/nprot.2011.328

37. Belyaeva ES, Zhimulev IF, Volkova EI, Alekseyenko AA, Moshkin YM, et al. (1998) Su(UR)ES: a gene suppressing DNA underreplication in intercalary and pericentric heterochromatin of Drosophila melanogaster polytene chromosomes. Proc Natl Acad Sci U S A 95: 7532–7537. doi:10.1073/pnas.95.13.7532

38. van Heesch S, Mokry M, Boskova V, Junker W, Mehon R, et al. (2013) Systematic biases in DNA copy number originate from isolation procedures. Genome Biol 14: R33. doi:10.1186/gb-2013-14-4-r33

39. Manukjan G, Tauscher M, Steinemann D (2013) Replication timing influences DNA copy number determination by array-CGH. BioTechniques 55: 231–232. doi:10.2144/000114097

40. Kokkinos MI, Murthi P, Wafai R, Thompson EW, Newgreen DF (2010) Cadherins in the human placenta–epithelial–mesenchymal transition (EMT) and placental development. Placenta 31: 747–755. doi:10.1016/j.placenta.2010.06.017

41. El-Hashash AHK, Kimber SJ (2006) PTHrP induces changes in cell cytoskeleton and E-cadherin and regulates Eph/Ephrin kinases and RhoGTPases in murine secondary trophoblast cells. Dev Biol 290: 13–31. doi:10.1016/j.ydbio.2005.10.010

42. Liao WX, Laurent L, Agent S, Hodges J, Chen DB (2012) Human Placental Expression of SLIT/ROBO Signaling Cues: Effects of Preeclampsia and Hypoxia. Biol Reprod 86: 111. doi:10.1095/biolreprod.110.088138

43. Barlow PW (1978) Endopolyploidy: towards an understanding of its biological significance. Acta Biotheor 27: 1–18. doi:10.1007/BF00048400

44. Rossant J, Cross JC (2001) Placental development: lessons from mouse mutants. Nat Rev Genet 2: 538–548. doi:10.1038/35080570

45. Nichols J, Evans EP, Smith AG (1990) Establishment of germ-line-competent embryonic stem (ES) cells using differentiation inhibiting activity. Development 110: 1341–1348.

46. Shivdasani RA, Schulze H (2005) Culture, expansion, and differentiation of murine megakaryocytes. Curr Protoc Immunol 22: 6.1–22F. doi:10.1002/0471142735.im22f06s67

47. DNAnexus Inc (2010) RNA-Seq/3SEQ Transcriptome Based Quantification.

48. Foley JW, Sidow A (2013) Transcription-factor occupancy at HOT regions quantitatively predicts RNA polymerase recruitment in five human cell lines. BMC Genomics 14: 720. doi:10.1186/1471-2164-14-720

49. Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106. doi:10.1186/gb-2010-11-10-r106

50. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25: 1754–1760. doi:10.1093/bioinformatics/btp324

51. Liu T, Ortiz JA, Taing L, Meyer CA, Lee B, et al. (2011) Cistrome: an integrative platform for transcriptional regulation studies. Genome Biol 12: R83. doi:10.1186/gb-2011-12-8-r83