New MicroRNAs in —Birth, Death and Cycles of Adaptive Evolution

Download PDF České info

The origin and evolution of new microRNAs (miRNAs) is important because they can impact the transcriptome broadly. As miRNAs can potentially emerge constantly and rapidly, their rates of birth and evolution have been extensively debated. However, most new miRNAs identified appear not to be biologically significant. After an extensive search, we identified 12 new miRNAs that emerged de novo in Drosophila melanogaster in the last 4 million years (Myrs) and have been evolving adaptively. Unexpectedly, even though they are adaptively evolving at birth, more than 94% of such new miRNAs disappear over time. They provide selective advantages, but only for a transient evolutionary period. After 30 Myrs, all surviving miRNAs make the transition from the adaptive phase of rapid evolution to the conservative phase of slow evolution, apparently becoming integrated into the transcriptional network. During this transition, the expression shifts from being tissue-specific, predominantly in testes and larval brain/gonads/imaginal discs, to a broader distribution in many other tissues. Interestingly, a measurable fraction (20–30%) of these conservatively evolving miRNAs experience “evolutionary rejuvenation” and begin to evolve rapidly again. These rejuvenated miRNAs then start another cycle of adaptive –⁠ conservative evolution. In conclusion, the selective advantages driving evolution of miRNAs are themselves evolving, and sometimes changing direction, which highlights the regulatory roles of miRNAs.

Published in the journal: . PLoS Genet 10(1): e32767. doi:10.1371/journal.pgen.1004096
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1004096

Summary

Introduction

MicroRNAs (miRNAs) are a class of small, endogenous RNAs that regulate gene expression post-transcriptionally [1], [2]. Each miRNA gene is first transcribed as a stem-loop (hairpin) RNA structure, 70–90 nt in length in animals, and then processed in several steps into the ∼22-nt mature product, referred to as miR [3]. In animals, miR binds to the 3′ untranslated region (UTR) of target mRNAs through perfect base-pairing of the seed region (position 2–8 of a miR), inducing translation repression or mRNA degradation [4]. As the seed is only 7 nt long, each miRNA may potentially regulate hundreds of transcripts while each transcript may in turn be regulated by more than one miRNA [5].

The emergence of new miRNAs is of special interest in evolutionary biology for two reasons. First, they buffer gene expression noises and thus have been hypothesized to be a key player in canalization [6], [7]. As proposed by C. H. Waddington [8], [9], canalization contributes to developmental stability and, in a recent interpretation, it may also contribute to evolvability via hidden genetic variations [10], [11]. Second, due to their small size, miR-producing hairpins can form readily and de novo emergence of miRNAs from non-miRNA transcripts is a frequent phenomenon [12], [13]. There are hundreds of thousands of potential miRNA structures in each Drosophila genome [12] and millions in a mammalian genome [14]. Given such a propensity for new miRNAs to emerge, the birth, death and adaptation of new miRNAs are a significant part of understanding the evolution of transcriptional regulation [12]. In contrast, protein-coding genes require long open reading frames to yield functional peptides. Hence, local duplication or retrotransposition [15], rather than de novo origination, is the common mode for the formation of coding genes.

In Drosophila, the birth and death rates of miRNAs have been estimated to be about 12 and 11.7 genes per Myr, respectively, with a net gain of about 0.3 per Myr [12]. It is generally agreed that the net gain is low, ranging between 0.3 and 1 new gene per Myr [16], [17]. Despite this, the total repertoire of miRNAs should still be increasing dramatically over long periods. While the net gain (birth –⁠ death) is not in dispute, there is disagreement over the estimated birth and death rates of new miRNAs [12], [16], [17]. Because numerous putative miRNAs are found in the transcriptome, these lowly expressed, evolutionarily neutral, and short-lived miRNAs account for the bulk of the estimated births and deaths. The debate is about which ones should be counted as new miRNAs.

To resolve the issue, we propose to define new miRNAs in an evolutionary context by a set of stringent criteria, requiring a signature of initial adaptive evolution soon after their birth. Numerous small RNAs that emerge and vanish with the dynamics of neutral sequences are excluded from the evolutionary analysis. Given this definition, only a small fraction of miRNA-like sequences in any species would qualify as new miRNAs. We collected extensive small RNA-seq data available for four Drosophila species (D. melanogaster, D. simulans, D. pseudoobscura and D. virilis) [12], [16], [18]–[23] and three mosquitoes (Aedes albopictus, Aedes aegypti and Culex quinquefasiatus) [24], [25]. We further generated small RNA-seq data for sex organs and imaginal discs in D. simulans and D. pseudoobscura. The extensive dataset permits systematic identification of new miRNAs and in-depth analyses of their long-term fates.

Our first objective is to understand the origin and early evolution of new miRNAs in the species D. melanogaster. The second objective is to track the long-term evolutionary trajectory of new miRNAs, which may be in any of the following four modes after their initial adaptive evolution:

Evolving rapidly, driven by positive selection;
Transitioning between the initial adaptive phase and one of the two possible outcomes given below in 3) and 4). miRNAs in this phase may appear neutrally evolving;
Evolving conservatively and slowly after being assimilated into the transcriptional network;
Effectively dead after its structure degenerates and is no longer recognizable as an miRNA.

Results

From the D. melanogaster miRNA repository (miRBase Release 19.0, Ref. [26]), 238 miRNA genes, including 204 canonical miRNAs and 34 mirtrons, were evaluated for their expression levels by examining small RNA sequencing data from different tissues and developmental stages (Ref. [12], [16], [18]–[21], [23], see Table S1 and Materials and Methods). The phylogenetic distributions of the 238 miRNA genes in Drosophila (D. melanogaster, D. simulans, D. pseudoobscura and D. virilis), with mosquitoes (Aedes albopictus, A. aegypti and Culex quinquefasiatus) as the outgroup, were determined from the available small RNA libraries (Ref. [12], [16], [22], [24], [25], see Table S1 and Materials and Methods). In addition, we sequenced five additional libraries from D. simulans and D.pseudoobscura to ensure that all Drosophila species in this survey included samples from testes and ovaries. Genes represented by more than 200 reads per million (RPM) in at least one library were designated “highly expressed” (Table S2). The rest were denoted as “lowly expressed” miRNAs.

The 204 canonical miRNAs and 34 mirtrons have very different patterns in age and expression level. Table 1 shows the emergence time of each miRNA, which falls in the interval of 0–4, 4–30, 30–60, 60–250, and >250 Myrs before present as depicted in Fig. 1. More than half of the highly expressed, canonical miRNAs (71 out of 136) came from the oldest age group (>250 Myrs) but none of the mirtrons were from that group (Table 1), suggesting mirtrons contribute very little to miRNA repertoire over long periods of time. The result is consistent with previous findings that mirtrons have different evolutionary trajectories from canonical miRNAs [16]. The majority of the lowly expressed genes, both canonical miRNAs (60/68) and mirtrons (21/25), came from the young age group of 0–4 Myrs (Table 1), corroborating that lowly expressed miRNA genes are likely to be evolutionarily transient [12].

**Fig. 1. Origin of new miRNAs at different evolutionary periods.**

**Tab. 1. Number of miRNAs in the genome of <i>D. melanogaster</i> in different age groups.**

In this study, we will focus on the 136 highly expressed canonical miRNAs because, with respect to long-term evolution, they are the most significant class among the four categories of Table 1.

I. Birth of new miRNAs

Starting with the youngest genes, we first analyzed the 22 new miRNA genes that emerged in the last 30 Myrs, since D. melanogaster diverged from D. pseudoobscura (Fig. 1). Among them, 21 originated de novo; only miR-983-2 in D. melanogaster (dme-miR-983-2) was duplicated from another miRNA (dme-miR-983-1). More than half of the 22 new miRNAs are found in clusters –⁠ five in the miR-972 cluster (abridged as miR-972s), two in miR-310s and five in miR-982s. Members in a cluster have significantly higher expression levels than the orphan miRNAs (Mann-Whitney U test, p<0.05). The miR-982 cluster consists only of members emerging in the last 30 Myrs, whereas both miR-310s and miR-972s are mixtures of old and new miRNAs (Table S3). Thus, the former is most informative about the birth and early evolution of new miRNAs.

The miR-982s is X-linked, comprising five distinct miRNA families: miR-982, -2582, -303, -983 and -984. With the exception of the recently duplicated dme-miR-983-1/-2, miRNAs in this cluster do not share a seed sequence (Fig. 2A &B). Against the 12 Drosophila species [27], copies of this cluster can be found in D.simulans, D. sechellia, D. yakuba and D. erecta but are absent in all other more distantly related species. The expression of miR-982s members was confirmed by RT-PCR (Fig. S1). The evolution of this cluster in the D. melanogaster subgroup is depicted in detail in Fig. 2A.

**Fig. 2. The evolution of the structure of the miR-982s cluster.**

As shown in Fig. 2A, each member of miR-982s appears to emerge in situ from local non-miRNA sequences. Due to their small sizes, unstructured genomic sequences evolving into miRNA-like transcripts have often been suggested [28] but have not been convincingly proven. The cluster of miR-982/2582/303/983/984 appears to be a good example of de novo origin (see below) with point mutations improving miRNA processing step by step (Fig. 2B and Fig. S2 and S3). For example, the secondary structure of miR-982 in D. erecta can only form a poor hairpin (−18.20 kcal/mol). Many nucleotide substitutions, accumulated subsequently in the stem and loop regions, have greatly improved the thermodynamic stability of the hairpin in D.melanogaster (−24.00 kcal/mol) and in the three paralogs of D. simulans (−21.52 to −27.50 kcal/mol; Fig. 3A and Fig. S2 and S3).

Evolution of the secondary structure of members of miR-982s as predicted by RNAfold <em class="ref">[70]</em>. — **Fig. 3. Evolution of the secondary structure of members of miR-982s as predicted by RNAfold *[70]*.**

After each miRNA emerges from the unstructured sequence, gene duplication appears common [29], [30]. miR-2582 and miR-982 were expanded by whole-gene (Fig. 2A, Duplication 1, 2 and 3) or segment duplication (Duplication 4) in D. melanogaster and D. simulans, followed by gene conversion in D. sechellia (Fig. 2A). Moreover, miR-983 was duplicated in D. melanogaster (Duplication 5). In this species alone, miR-984 emerged de novo next to miR-983 (Fig. 2A).

These duplicates soon accumulated many nucleotide substitutions (Fig. 2B). Meanwhile, seed shifting and arm switching occurred in the miR-982/2582/303/983 families (Fig. 2B). These modifications presumably lead to new targets, resulting in neo-functionalization after gene duplication [28].

II. Early adaptive evolution of new miRNAs

After new miRNAs emerged de novo, the question is whether the subsequent evolution is driven by natural selection. A greater level of divergence in miRNA genes than in flanking regions might suggest positive selection (Ref. [31]; Fig. S4A). A proper analysis would require the comparison of between-species divergence (D) and within-species polymorphism (P) using a modified McDonald-Kreitman (MK) test [32].

In this study, we generated DNA sequences from 42 D. melanogaster (∼7.5 kb from each line) and 25 D. simulans lines (∼8.1 kb) (Table S4). The D/P ratios for each precursor miRNA from miR-982s, as well as the 1 kb upstream flanking regions, were compared [33]. As shown in Table 2, all the miRNA genes from the miR-982, miR-303 and miR-983 families have a significantly higher D/P ratio than the flanking regions in both D. melanogaster and D. simulans (p<0.05), suggesting positive selection. Members of the miR-2582 family show significantly higher D/P ratios in D. melanogaster, but not in D. simulans (Table 2, also see next section).

**Tab. 2. The McDonald-Kreitman test on individual miRNAs of the miR-982s cluster.**

Because each individual miRNA gene, being small, would yield a significant result in the MK test only when the selection is extremely strong, we also performed the test on new miRNAs collectively, relative to the genome-wide 4-fold degenerate sites (from Drosophila Population Genomics Project (DPGP); see Materials and Methods). Table 3 shows that the new miRNAs emerging in the last 30 Myrs have a higher D/P ratio than in the genome-wide 4-fold degenerate sites. In fact, more than 79% of the observed divergence in the precursors and more than 89% in the mature regions is estimated to have been fixed adaptively (see Materials and Methods and Table 3). A higher D/P ratio could also be attributed to an increase in selective constraint, rather than positive selection [34]. However, we excluded such possibility in Text S1. Due to the large number of adaptive sites, every new miRNA is likely to carry one or more of them. As expected, signatures of positive selection are much weaker for the lowly expressed miRNAs and mirtrons (Table S5).

**Tab. 3. The McDonald-Kreitman test on the entire group of miRNAs of the same age.**

Other lines of evidence for recent adaptive evolution include the pattern of polymorphism within species and the differentiation between populations. The miR-982 cluster was examined further by the sliding window analysis of Fay and Wu's H (θ_H), an estimator of nucleotide diversity sensitive to positive selection [35], [36]. The profile of θ_H peaks near miR-983/984 and miR-303 in both species, a common footprint of hitchhiking with positive selection [35]. The signature is stronger in D. simulans for miR-982 than in D. melanogaster (Fig. S4B and S4C). In addition, we analyzed the M and Z populations of D. melanogaster [37]–[39] using the Fst statistic [40]. For dme-miR-984 and dme-miR-303, the precursor sequences are strongly differentiated between M and Z lines (Fst = 0.318 for dme-miR-984 and Fst = 0.252 for dme-miR-303) compared to all SNPs within the miR-982s region (Mann-Whitney U test, p = 0.057 for dme-miR-984 and p = 0.068 for dme-miR-303, Table S6) or the 238 D. melanogaster miRNAs (Mann-Whitney U test, p = 0.046 for dme-miR-984 and p = 0.008 for dme-miR-303; data were obtained from DPGP2 [41], see Materials and Methods). The analyses collectively suggest that the rapid evolution of new miRNAs is driven by natural selection.

III. Death vs. integration after the initial adaptive evolution

After the initial adaptive evolution, one might reasonably expect these new adaptive miRNAs to be integrated into the transcriptional network and begin evolving at a slower rate. Surprisingly, the most likely fate of these new miRNAs was death, rather than integration. This can be seen in the number of observable new miRNAs from two different time periods –⁠ 22 surviving miRNAs from the last 30 Myrs but only 9 from the preceding 30 Myrs (30–60 Myrs before present).

By assuming a constant birth rate, we can estimate the number of newborn miRNAs in each time interval, which can then be compared with the surviving miRNAs from that time period. Using the estimated rate of 3 newborn miRNAs per Myr (12 in the last 4 Myrs), Figure 4A shows that 87% of new miRNAs disappeared in 4–30 Myrs (68 out of 78). The proportion of death in older miRNAs increased only marginally, to 90%, for the period of 30–60 Myrs (81 out of 90). Therefore, most miRNAs seem to die quickly at an early stage of evolution, soon after the initial adaptive evolution. Only 6.0% of new miRNAs (34 out of 570) survived after 60 Myrs. It is unexpected that new adaptive miRNAs favored by natural selection should suffer such quick and massive death, albeit at a somewhat lower rate than neutrally evolving new miRNAs [12]. The former has a survival rate of 6.0% while the latter has a lower rate, at 2.5% [12]. Apparently, the initial adaptation is evolutionarily transient and the continual adaptation toward integration is not a common fate. We should note that alternative explanations have been considered. A most obvious one concerns the possibility of a bust of adaptive new miRNAs in D. melanogaster since its split from D. simulans. These explanations are compared in Discussion.

Interestingly, miRNA death may sometimes be an adaptive process. The miR-2582-like gene in D. melanogaster is shown to be evolving adaptively in Table 2, but its evolution is toward degeneracy. Three lineage-specific mutations that disrupt the duplex structure are shown in Fig. 3B, probably associated with the degeneration of dme-miR-2582. Presumably, conditions changed causing the adaptive function initially performed by the new miRNA to become deleterious at a later time.

Upon survival, new miRNAs eventually became integrated into the transcriptional network and evolved conservatively. There is a transitional phase after the adaptive phase, but before either integration or death. During the transition, these miRNAs often appeared to have a neutral evolutionary rate. Figure 4A shows that all the surviving miRNAs began to evolve either neutrally or conservatively (three transitional and six conservative miRNAs, respectively) within 30–60 Myrs (See Materials and Methods). The miR-2582 gene in D. simulans appears to be in such a transition (Table 2). It is interesting that miR-2582 orthologs in sibling species may be at different stages of evolution.

IV. Cycles of adaptive-conservative evolution

Over long periods of time, new miRNAs will have died or have been integrated into the transcriptional network and are now conservatively evolving. miRNAs born 60–250 Myrs ago have largely vanished (94% have disappeared, see Fig. 4A). However, some of the cohort of the 34 surviving miRNAs are not behaving as expected. In fact, only 26 of them are evolving conservatively. Nearly a quarter of them (8 out of 34) are evolving either neutrally or adaptively (Fig. 4A) and most of these (7 out of 8) come from miR-972s or miR-310s (Table 4). At this rate of evolution, none of them should have been recognizable as homologs between D. melanogaster and D. virilis.

K<sub>miR</sub>/K<sub>S</sub> of the older miRNAs (60–250 Myrs) that have been evolving rapidly between <i>D. melanogaster</i> and <i>D. simulans</i>. — **Tab. 4. K_miR/K_S of the older miRNAs (60–250 Myrs) that have been evolving rapidly between *D. melanogaster* and *D. simulans*.**

We suggest that the 8 unusual miRNAs may have been conservatively evolving for most of their evolutionary history. Four of them have been adaptively evolving once again and the remaining four appear to be in transition, away from the previous selective constraints. If the hypothesis is correct, we expect to see stronger evolutionary conservation in more distant comparisons than in recent ones. We use K_miR/K_S, where K_miR denotes the divergence in the precursor region of the miRNA, to measure conservation. Table 4 shows their K_miR/K_S values for the last 4 Myrs and for the distant past (60 Myrs after the split between D. melanogaster and D. virilis). The evolutionary conservation has indeed been relaxed substantially in the last 4 Myrs with the average value increasing from 0.337 to 0.825, a 2.5-fold difference. Such fold-changes of K_miR/K_S were significantly high in the eight miRNAs, compared with the whole repertoire of 238 miRNAs (Mann-Whitney U test, p = 0.00014). The rate increase appears to be true in both D. melanogaster and D. simulans lineages when the homologous sequences from D. yakuba and D. erecta were used as outgroups to calculate the rate in each lineage separately. Among the eight genes, two and six are evolving slightly faster in D. melanogaster and D. simulans, respectively (see Table S7). It is interesting that some old miRNAs go through the reverse transition (or rejuvenation) from conservative to adaptive evolution, the latter being the hallmark of young miRNAs.

Rejuvenation can also lead to the death of old miRNAs. The miR-972s may be such an example. Some members of this cluster emerged 60–250 Myrs ago and should have been integrated into the ancestral genome by the time D. pseudoobscura split from D. melanogaster. However, the entire miR-972s region was lost in D. pseudoobscura since the split.

Taken together, new miRNAs (such as miR-310s and miR-972s) may go through cycles of adaptation, integration (if escaping death) and rejuvenation, which would start another cycle of adaptation and integration (Fig. 4B).

V. Evolution of miRNA expression

To study the evolution of new miRNAs sequences, we characterized their expression patterns. We did so by using the global small RNA profiling datasets (see Table S1 and Materials and Methods). Figure 5 shows young miRNAs (<30 Myrs) are lowly expressed in specific tissues, generally in the testes and larval brain/gonads/imaginal discs. Middle-aged miRNAs (30–60 Myrs) broadened their expressions to include ovaries and embryos. The older miRNAs (60–250 Myrs) showed moderate and even broader expressions, which then evolved to become highly abundant in all tissues and developmental stages as seen in the oldest miRNAs (>250 Myrs). The simplest explanation is that new miRNAs increase the expression level and expand the breadth as they get older. The change in expression parallels that in sequence evolution (Fig. 4A and Table S8). There are other explanations that may also account for the different expression patterns between new and old miRNAs (see Text S2). Detailed descriptions of the evolution in expression patterns are given in Text S3.

**Fig. 5. Clustering of miRNA expression among different age groups and tissues (or development stages).**

Discussion

During Metazoan evolution, the miRNA repertoire expanded dramatically from a few genes to several hundreds [28], [42]. By limiting the analysis to new miRNAs that evolve adaptively soon after their birth, we avoided the large number of lowly expressed miRNA-like sequences. These sequences may or may not be considered miRNAs and are generally thought to be evolutionarily ephemeral and adaptively insignificant [43], [44]. The inclusion of only new miRNAs that evolve adaptively at emergence reveals an unexpected pattern of an excess of such miRNAs in the last 4 million years of the D. melanogaster lineage. The possible explanations are therefore either a burst of birth since D. melanogaster split from D. simulans, or a decline in the survivorship of adaptive new miRNAs as they age.

We consider the latter explanation as more plausible for several reasons. First, the birth rate of miRNA-like sequences indeed appears constant because different Drosophila species have comparable numbers of such new transcripts [16]. Given the ease in forming precursor-like hairpins, the constant rate is hardly surprising. Second, as a result, the birth rate of adaptive new miRNAs may not deviate much from a constant value either. Indeed, the burst of adaptive new miRNAs is observable in D. simulans as well as the common ancestor of D. melanogaster and D. simulans, as is evident in the miR-982 cluster (Fig. 2A). Third, the proportion of adaptive miRNAs born in the period of 4–30 Myrs is also higher than that in the 30–60 Myrs period. Overall, an excess of new adaptive miRNAs appears to be a decreasing function of time, rather than of particular lineages; hence, their death over time is a simpler explanation.

Because only a small number of new adaptive miRNAs remain active after cycles of evolution through phases of adaptation and degeneration, the repertoire of miRNAs in the D. melanogaster genome has been nearly static in 40 Myrs of evolution, with only 0.18 miRNA integrations per Myrs. We should note that this low rate may still be an over-estimate because not all death has been accounted for. This (near) steady state echoes the view of a correlation between morphological complexity and the size of miRNA repertoire [45], as the Drosophila genus has been relatively invariant in form since its diversification.

Despite the low integration rate, many new miRNAs continue to emerge and some briefly evolve adaptively before their demise. This “transient utility” is puzzling as gene functions are lost usually through environmental changes (such as vision genes in caves [46]) or redundancies [47]. A possible explanation may be the suggested role of miRNAs in evolutionary canalization [7]. In such a role, the regulators and their targets need not be stringently wired as long as the system remains properly buffered. By this scheme, new miRNAs may emerge to fill in the transiently vacated role created by the shifting interactions between established miRNAs and their targets [7]. They disappear when the role is no longer needed.

A small number of new miRNAs that become integrated into the transcriptional network begin this process in the testis, in parallel with new protein coding genes [48]–[54]. Since sexual selection driving male reproduction is a very potent force of evolution, this expression pattern may not be all that surprising [50], [55]–[58]. In the example of miR-982s, the predicted targets are indeed enriched in genes of male courtship behavior and other male sexual traits (Table S9). Once a new miRNA is established, its expression is often broadened to other tissues. Testis may be the beachhead that permits the new miRNA to gradually modulate its expression and interactions with potential targets. In addition, new miRNAs with distinct seeds often emerge in clusters, which presumably facilitate their co-expression [29], [30], [59].

Unlike protein coding genes, miRNAs can easily emerge de novo, thanks to their small size, but can often be derived from existing genes as well [60]. The simple structure of miRNAs may permit general inferences on features and dynamics of genic evolution. A previous example is the rate of evolution as a correlate of expression level [61]. It would be interesting to see if the inferred cycles of evolution experienced by new miRNAs are a general process.

Materials and Methods

Sample RNA library preparation and sequencing

Total RNA was extracted from D. simulans (NC48S) and from D. pseudoobscura using TRIzol (Ambion). Ovaries and testes from 3 to 5-day adults were dissected and collected for both NC48S and D. pseudoobscura. Imaginal discs including central nerve system (CNS) were dissected from wandering third-instar larva of D. pseudoobscura. Small RNA libraries were generated from each RNA sample using Illumina Small RNA Sample Preparation kit, and sequenced with the Illumina HiSeq 2000 at the Beijing Genomics Institute (Shenzhen). The data were deposit at Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) under the accession numbers GSM1165052-GSM1165056.

Data compilation

The publicly available small RNA sequencing reads from four Drosophila species (D. melanogaster, D. simulans, D. pseudoobscura and D. virilis) were downloaded from GEO database (http://www.ncbi.nlm.nih.gov/geo/, Table S1). The miRNA sequences of three Culicinae species (Aedes albopictus, A. aegypti and Culex quinquefasiatus) were adopted from two previous small RNA sequencing studies [24], [25]. Drosophila genome sequences were retrieved from UCSC (http://genome.ucsc.edu); the Whole Genome Alignment (WGA) and CDS alignment were obtained from 12 Drosophila Assembly/Alignment/Annotation (http://rana.lbl.gov/drosophila). The genome versions used were: D. melanogaster, dm3; D. simulans, droSim1; D. sechellia, droSec1; D. yakuba, droYak2; D. erecta, droEre2; D. ananassae, droAna3; D. pseudoobscura, dp4; D. persimilis, droPer1; D. willistoni, droWil1; D. mojavensis, droMoj3; D. virilis, droVir3; D. grimshawi, droGri2. The genome coordinates and sequences of miRNA genes were retrieved from miRBase Release 19 (http://www.mirbase.org). The genome coordinates and sequences of intron, rRNA, tRNA, snRNA and transposon elements were obtained from FlyBase (r5.41, http://flybase.org,)

Defining canonical miRNAs, mirtrons and miRNA clusters

We defined canonical miRNAs and mirtrons according to Ruby et al. [62]. Mirtrons were defined as pre-miRNAs with both 5′ and 3′ ends matching the splicing sites of host introns. The rest of the miRNAs were then classified as canonical miRNAs. When more than three miRNAs were located within a 20 kb region, these miRNAs were considered as a cluster.

miRNA annotation and expression analysis

Small RNA reads (18–30 nt) were extracted from sequencing data. Firstly, we excluded reads mapped to transposon elements and structural RNAs (rRNA, tRNA and snRNA) using bowtie [63], allowing no mismatch. Next, we annotated novel miRNAs by miRDeep2 [64] with default parameters. Finally, miRNAs with no read matching miR* were removed following previous practice [23]. We combined novel miRNAs sequences and known miRNA sequences for expression analysis.

For each species, small RNA reads (18–30 nt) were mapped to miRNA precursor sequences using bowtie [63], allowing no mismatch. Each read count was divided by the number of matches to miRNA precursors. The miRNA expression was normalized by total miRNA counts and scaled to reads per million (RPM), as previous described [18].

Phylogenetic dating of miRNAs

We examined phylogenetic distributions of the D. melanogaster miRNAs in three other Drosophila species (D. simulans, D. pseudoobscura and D. virilis) and three Culicinae species (Aedes albopictus, A. aegypti and Culex quinquefasiatus), where small RNAs have been profiled via deep sequencing [12], [16], [22], [24], [25]. Based on the comprehensive dataset, miRNA homologs were determined by homology search using either the whole genome alignment (WGA) within the Drosophila group or BLAST (threshold E<10⁻⁵) between Drosophila species and mosquitoes, and cross-checked with small RNA reads in the species in query (at least one read matching mature and miR*).

The homologous sequences of the D. melanogaster miRNA precursors in D. simulans (droSim1), D. pseudoobscura (dp4) and D. virilis (droVir3) were extracted from UCSC pairwise WGAs using LiftOver (http://hgdownload.cse.ucsc.edu/, minMatch = 0.6). The precursors failing to obtain hits in the genomes were subjected to BLASTN search against NCBI trace archives (http://www.ncbi.nlm.nih.gov/Traces/home/). Matched sequences with E-values <10⁻⁵ were also considered as miRNA homologs and recovered for the analysis below. The WGA output was compared with miRNA annotation by miRDeep2 [64]; miRNA orthologs confirmed by miRDeep2 were retained.

The miRNA precursor sequences in Aedes albopictus, Culex quinquefasiatus and A. aegypti were adopted from the studies of Li et al. [24] and Skalsky et al. [25]. These sequences were combined and subjected to BLASTN search against miRNA precursors in D. melanogaster. The best reciprocal hits with E-values <10⁻⁵ were retained as the corresponding miRNA homologs in the Culicinae lineage.

According to the phylogenetic distribution, maximum parsimony method was used to infer the origination of each miRNA along the main trunk of the phylogenetic tree of D.melanogaster, D. simulans, D. pseudoobscura, D. virilis and Culicinae. An miRNA is assumed to emerge in the most recent common ancestor of all the species bearing the authentic homologs. The branch lengths of the phylogenetic tree (in Myrs) were adopted from previous estimations [27], [65], [66]. The 238 miRNAs were classified into five age groups, corresponding to the time intervals of 0–4 Myrs, 4–30 Myrs, 30–60 Myrs, 60–250 Myrs and >250 Myrs.

Genealogy of miR-982s in Drosophila species

The genomic coordinates and precursor sequences of dme-miR-982/303/983-1/983-2/984 and dsi-miR-982c/2582b/982b/2582a/982a/303/983 were retrieved from miRBase (Release v19). Based on the WGA of 12 Drosophila genomes [27], genomic sequence of the whole miR-982s cluster (∼9 Kb) in D. melanogaster (dm3) was extracted and used as a query to search against the other 11 Drosophila genomes using BLAT [67] with an E-value threshold of 0.001. We only detected hits in D. simulans, D. sechellia, D. yakuba and D. erecta, indicating that miR-982s is specific to the melanogaster subgroup. Homologous sequences of the miR-982s cluster from the five species were aligned using MUSCLE [68]. Homologs of miR-982s members in each species were identified using BLAST with the query of known precursor sequences (miRBase Release v19) and an E-value threshold of 0.001. The hits were further inspected in the alignment of the whole miR-982s cluster. The phylogenetic tree of each family of miR-982, miR-2582, miR-303, and miR-983 was reconstructed using the maximum likelihood method as implemented in MEGA 5.0 [69].

To validate the existence of miR-982s members in D. yakuba and D. erecta, we first predicted the secondary structure and thermo-stability of each miRNA homolog using RNAfold (http://rna.tbi.univie.ac.at/) with the default parameters [70]. A good hairpin with minimum free energy (MFE) >15 kcal/mol was considered as a potential miRNA candidate. There were four such candidates: dya-miR-2582-anc, dya-miR-303-anc, der-miR-982-anc, and der-miR-983-anc, where “anc” indicates ancestor. Then, we validated the expression of each candidate by amplifying the potential miRNA precursor from cDNA because the mature miRNA is hard to define. Total RNAs were extracted from testes of D. yakuba and D. erecta using TRIzol (Ambion) and treated with TURBO DNase Kit (Ambion). 0.5 ug RNA was reverse transcribed (RT) in a 20 ul reaction volume using PrimeScript II 1st Strand cDNA Synthesis Kit (TaKaRa). 1 ul RT products were used for PCR with Ex Taq DNA Polymerase (TaKaRa). PCR primers used are listed in Table S10.

Population genetic analysis of miR-982s in D. melanogaster and D. simulans

A total of 25 D.simulans lines and 42 D.melanogaster lines, including 29 M lines and 13 Z lines, were used for population sequencing of the miR-982s cluster. The fly strains used were listed in Table S4. The genomic sequences of D.simulans (droSim1) and D.melanogaster (dm3) were used to design primer pairs that amplify a ∼8 Kb region spanning the whole miR-982s cluster and ∼1.5 Kb each of the upstream and downstream flanking regions. The PCR product of each primer set was designed to be about 2 Kb in length and overlapped with each other by at least 300 bp. The primers used are listed in Table S10 and their genomic coordinates are displayed in Fig. S5. PCR was carried out using LA Taq DNA Polymerase (TaKaRa). PCR products were subject to direct sequencing or clone sequencing on an ABI 3730xl DNA Analyzer (Applied Biosystems). DNA sequences were assembled using SeqMan software (DNASTAR Inc., USA) and aligned using MUSCLE [68] with manual inspection. Haplotypes were inferred with the PHASE program when heterozygous sites were present [71]. The sequences obtained in this study have been deposited in GenBank under the accession numbers JX648211-JX648278.

Using the population sequencing data, several methods were used to detect positive selection of miR-982s in D. melanogaster and D.simulans, respectively. First, MK tests were applied on each member of miR-982s based on the divergence between D. melanogaster and D.simulans consensus sequences and polymorphism within either species. Each miRNA precursor was tested against a 1 kb region about 1.5 kb upstream of the 5′ end of miR-982s. Second, sliding window analysis of divergence and polymorphism was applied to the whole miR-982s cluster and its flanking region. The divergence was calculated using Kimura's 2-parameter model [72] based on the genomic sequences of D. simulans (droSim1) and D. melanogaster (dm3). The polymorphism within either species was estimated using the method described previously [35], [36], [73], [74]. D. simulans (droSim1) and D. melanogaster (dm3) were used as the outgroup for each other reciprocally, in order to polarize the derived alleles. The window size is 100 bp and the step width is 25 bp. Finally, based on our miR-982s population data or DPGP2 data (see below) [41], the pattern of population differentiation (Fst) between Z and M lines was estimated for each miRNA precursor using Weir's method [40].

Analysis of the evolutionary fate of miRNAs

We used the McDonald-Kreitman test (MK test) [32] framework to detect positive selection in miRNAs from each age group based on the polymorphisms within D. melanogaster and the divergence between D. melanogaster and D. simulans. Precursor or mature sequences of each miRNA group were combined and treated as the functional category, while the 4-fold degenerate sites in the whole genome were used as the neutral control. The divergence is calculated by counting the number of changed nucleotide sites between D. melanogaster (dm3) and D. simulans (droSim1) based on the UCSC whole genome alignment. Polymorphism data was retrieved from Drosophila Population Genomics Project (DPGP, http://www.dpgp.org/, release 1.0). SNPs that were detected on more than thirty individuals and exhibited a derived allele frequency (DAF) >5% were used for the MK test.

The proportion of adaptively fixed mutations (α) was estimated as previously described [75]. To estimate the evolutionary fate of each miRNA, we first screened for adaptive miRNAs among the 238 candidates by using each miRNA's precursor together with the 50 bp flanking sequences on both sides as the functional sites. The p-values of multiple MK tests were adjusted by the Benjamini-Hochberg method [76] and the adaptive significance of each candidate is re-validated by using the precursor alone in the MK test. We then identified the conservative miRNAs by comparing the number of substitutions in the miRNA precursors (K_miR) with the number of substitutions in the synonymous sites (K_S) between D.melanogaster and D.simulans. miRNAs with K_miR/K_S<0.5 were considered to be conservatively evolving. Kimura's 2-parameter model [72] and the Nei-Gojobori model [77] were used to calculate K_miR and K_S, respectively. Finally, excluding the adaptive and conservative miRNAs, the remaining were considered to be in transition between adaptive to conservative/death.

Evolutionary analysis of miRNA expression patterns

Data processing of small RNA deep sequencing libraries from different development stages and tissues of D. melanogaster [12], [16], [18]–[21], [23] was conducted as described above. The read counts of each miRNAs were normalized to Reads Per Million (RPM), which is the read number of each miRNA per million mapped reads in each library. The normalized counts were log2 transformed and subject to hierarchical clustering using R package heatmap2.

Target prediction of miR-982s and functional annotation

miR-982s targets were predicted by seed match using TargetScan (v5.0 http://www.targetscan.org/fly_12/) [5]. Taking all the miRNA members together, 1,002 targets were obtained in D. melanogaster and 3,563 in D. simulans, of which 454 were shared by both species. We used DAVID to perform a Gene Ontology (GO) enrichment test for the predicted targets in the two species (DAVID v6.7, http://david.abcc.ncifcrf.gov/) [78]. Only the GO terms for biological processes were used for the enrichment test.

Supporting Information

Zdroje

1. BartelDP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116 : 281–297.

2. BushatiN, CohenSM (2007) microRNA functions. Annu Rev Cell Dev Biol 23 : 175–205.

3. KimVN, HanJ, SiomiMC (2009) Biogenesis of small RNAs in animals. Nat Rev Mol Cell Biol 10 : 126–139.

4. BartelDP (2009) MicroRNAs: target recognition and regulatory functions. Cell 136 : 215–233.

5. LewisBP, BurgeCB, BartelDP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120 : 15–20.

6. HornsteinE, ShomronN (2006) Canalization of development by microRNAs. Nat Genet 38 Suppl: S20–24.

7. WuCI, ShenY, TangT (2009) Evolution under canalization and the dual roles of microRNAs: a hypothesis. Genome Res 19 : 734–743.

8. WaddingtonC (1942) Canalization of development and the inheritance of acquired characters. Nature 150 : 563–565.

9. WaddingtonC (1960) Experiments on canalizing selection. Genet Res 1 : 140–150.

10. RutherfordSL, LindquistS (1998) Hsp90 as a capacitor for morphological evolution. Nature 396 : 336–342.

11. QueitschC, SangsterTA, LindquistS (2002) Hsp90 as a capacitor of phenotypic variation. Nature 417 : 618–624.

12. LuJ, ShenY, WuQ, KumarS, HeB, et al. (2008) The birth and death of microRNA genes in Drosophila. Nat Genet 40 : 351–355.

13. MeunierJ, LemoineF, SoumillonM, LiechtiA, WeierM, et al. (2013) Birth and expression evolution of mammalian microRNA genes. Genome Res 23 : 34–45.

14. BentwichI, AvnielA, KarovY, AharonovR, GiladS, et al. (2005) Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 37 : 766–770.

15. ZhouQ, ZhangG, ZhangY, XuS, ZhaoR, et al. (2008) On the origin of new genes in Drosophila. Genome Res 18 : 1446–1455.

16. BerezikovE, LiuN, FlyntAS, HodgesE, RooksM, et al. (2010) Evolutionary flux of canonical microRNAs and mirtrons in Drosophila. Nat Genet 42 : 6–9 author reply 9-10.

17. LuJ, ShenY, CarthewRW, WangSM, WuCI (2010) Reply to “Evolutionary flux of canonical microRNAs and mirtrons in Drosophila”. Nat Genet 42 : 9–10.

18. RubyJG, StarkA, JohnstonWK, KellisM, BartelDP, et al. (2007) Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res 17 : 1850–1864.

19. ChungWJ, OkamuraK, MartinR, LaiEC (2008) Endogenous RNA interference provides a somatic defense against Drosophila transposons. Curr Biol 18 : 795–802.

20. CzechB, MaloneCD, ZhouR, StarkA, SchlingeheydeC, et al. (2008) An endogenous small interfering RNA pathway in Drosophila. Nature 453 : 798–802.

21. LauNC, RobineN, MartinR, ChungWJ, NikiY, et al. (2009) Abundant primary piRNAs, endo-siRNAs, and microRNAs in a Drosophila ovary cell line. Genome Res 19 : 1776–1785.

22. RozhkovNV, AravinAA, ZelentsovaES, SchostakNG, SachidanandamR, et al. (2010) Small RNA-based silencing strategies for transposons in the process of invading Drosophila species. RNA 16 : 1634–1645.

23. BerezikovE, RobineN, SamsonovaA, WestholmJO, NaqviA, et al. (2011) Deep annotation of Drosophila melanogaster microRNAs yields insights into their processing, modification, and emergence. Genome Res 21 : 203–215.

24. LiS, MeadEA, LiangS, TuZ (2009) Direct sequencing and expression analysis of a large number of miRNAs in Aedes aegypti and a multi-species survey of novel mosquito miRNAs. BMC Genomics 10 : 581.

25. SkalskyRL, VanlandinghamDL, ScholleF, HiggsS, CullenBR (2010) Identification of microRNAs expressed in two mosquito vectors, Aedes albopictus and Culex quinquefasciatus. BMC Genomics 11 : 119.

26. KozomaraA, Griffiths-JonesS (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 39: D152–157.

27. ClarkAG, EisenMB, SmithDR, BergmanCM, OliverB, et al. (2007) Evolution of genes and genomes on the Drosophila phylogeny. Nature 450 : 203–218.

28. BerezikovE (2011) Evolution of microRNA diversity and regulation in animals. Nat Rev Genet 12 : 846–860.

29. ZhangR, PengY, WangW, SuB (2007) Rapid evolution of an X-linked microRNA cluster in primates. Genome Res 17 : 612–617.

30. LiJ, LiuY, DongD, ZhangZ (2010) Evolution of an X-linked primate-specific micro RNA cluster. Mol Biol Evol 27 : 671–683.

31. FayJC, WuCI (2003) Sequence divergence, functional constraint, and selection in protein evolution. Annu Rev Genomics Hum Genet 4 : 213–235.

32. McDonaldJH, KreitmanM (1991) Adaptive protein evolution at the Adh locus in Drosophila. Nature 351 : 652–654.

33. LuJ, FuY, KumarS, ShenY, ZengK, et al. (2008) Adaptive evolution of newly emerged micro-RNA genes in Drosophila. Mol Biol Evol 25 : 929–938.

34. FayJC, WyckoffGJ, WuCI (2002) Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415 : 1024–1026.

35. FayJC, WuCI (2000) Hitchhiking under positive Darwinian selection. Genetics 155 : 1405–1413.

36. ZengK, FuYX, ShiS, WuCI (2006) Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174 : 1431–1439.

37. WuCI, HollocherH, BegunDJ, AquadroCF, XuY, et al. (1995) Sexual isolation in Drosophila melanogaster: a possible case of incipient speciation. Proc Natl Acad Sci U S A 92 : 2519–2523.

38. HollocherH, TingCT, WuML, WuCI (1997) Incipient speciation by sexual isolation in Drosophila melanogaster: extensive genetic divergence without reinforcement. Genetics 147 : 1191–1201.

39. TingCT, TakahashiA, WuCI (2001) Incipient speciation by sexual isolation in Drosophila: concurrent evolution at multiple loci. Proc Natl Acad Sci U S A 98 : 6709–6713.

40. WeirBS, CockerhamCC (1984) Estimating F-Statistics for the analysis of population structure. Evolution 38 : 1358–1370.

41. PoolJE, Corbett-DetigRB, SuginoRP, StevensKA, CardenoCM, et al. (2012) Population Genomics of sub-saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet 8: e1003080.

42. GrimsonA, SrivastavaM, FaheyB, WoodcroftBJ, ChiangHR, et al. (2008) Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455 : 1193–1197.

43. LiangH, LiWH (2009) Lowly expressed human microRNA genes evolve rapidly. Mol Biol Evol 26 : 1195–1198.

44. NozawaM, MiuraS, NeiM (2010) Origins and evolution of microRNA genes in Drosophila species. Genome Biol Evol 2 : 180–189.

45. HeimbergAM, SempereLF, MoyVN, DonoghuePC, PetersonKJ (2008) MicroRNAs and the advent of vertebrate morphological complexity. Proc Natl Acad Sci U S A 105 : 2946–2950.

46. YokoyamaS, MeanyA, WilkensH, YokoyamaR (1995) Initial mutational steps toward loss of opsin gene function in cavefish. Mol Biol Evol 12 : 527–532.

47. MaereS, De BodtS, RaesJ, CasneufT, Van MontaguM, et al. (2005) Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A 102 : 5454–5459.

48. LevineMT, JonesCD, KernAD, LindforsHA, BegunDJ (2006) Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc Natl Acad Sci U S A 103 : 9935–9939.

49. DingY, ZhaoL, YangS, JiangY, ChenY, et al. (2010) A young Drosophila duplicate gene plays essential roles in spermatogenesis by regulating several Y-linked male fertility genes. PLoS Genet 6: e1001255.

50. KaessmannH (2010) Origins, evolution, and phenotypic impact of new genes. Genome Res 20 : 1313–1326.

51. WuDD, IrwinDM, ZhangYP (2011) De novo origin of human protein-coding genes. PLoS Genet 7: e1002379.

52. ChenS, NiX, KrinskyBH, ZhangYE, VibranovskiMD, et al. (2012) Reshaping of global gene expression networks and sex-biased gene expression by integration of a young gene. EMBO J 31 : 2798–2809.

53. YehSD, DoT, ChanC, CordovaA, CarranzaF, et al. (2012) Functional evidence that a recently evolved Drosophila sperm-specific gene boosts sperm competition. Proc Natl Acad Sci U S A 109 : 2043–2048.

54. RossBD, RosinL, ThomaeAW, HiattMA, VermaakD, et al. (2013) Stepwise evolution of essential centromere function in a Drosophila neogene. Science 340 : 1211–1214.

55. WuCI, DavisAW (1993) Evolution of postmating reproductive isolation: the composite nature of Haldane's rule and its genetic bases. Am Nat 142 : 187–212.

56. WyckoffGJ, WangW, WuCI (2000) Rapid evolution of male reproductive genes in the descent of man. Nature 403 : 304–309.

57. WuCI, TingCT (2004) Genes and speciation. Nat Rev Genet 5 : 114–122.

58. SwansonWJ, VacquierVD (2002) The rapid evolution of reproductive proteins. Nat Rev Genet 3 : 137–144.

59. GlazovEA, McWilliamS, BarrisWC, DalrympleBP (2008) Origin, evolution, and biological role of miRNA cluster in DLK-DIO3 genomic region in placental mammals. Mol Biol Evol 25 : 939–948.

60. MarcoA, NinovaM, RonshaugenM, Griffiths-JonesS (2013) Clusters of microRNAs emerge by new hairpins in existing transcripts. Nucleic Acids Res 41 : 7745–7752.

61. ShenY, LvY, HuangL, LiuW, WenM, et al. (2011) Testing hypotheses on the rate of molecular evolution in relation to gene expression using microRNAs. Proc Natl Acad Sci U S A 108 : 15942–15947.

62. RubyJG, JanCH, BartelDP (2007) Intronic microRNA precursors that bypass Drosha processing. Nature 448 : 83–86.

63. LangmeadB, TrapnellC, PopM, SalzbergSL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25.

64. FriedlanderMR, ChenW, AdamidiC, MaaskolaJ, EinspanierR, et al. (2008) Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol 26 : 407–415.

65. GauntMW, MilesMA (2002) An insect molecular clock dates the origin of the insects and accords with palaeontological and biogeographic landmarks. Mol Biol Evol 19 : 748–761.

66. BolshakovVN, TopalisP, BlassC, KokozaE, della TorreA, et al. (2002) A comparative genomic analysis of two distant diptera, the fruit fly, Drosophila melanogaster, and the malaria mosquito, Anopheles gambiae. Genome Res 12 : 57–66.

67. KentWJ (2002) BLAT–the BLAST-like alignment tool. Genome Res 12 : 656–664.

68. EdgarRC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32 : 1792–1797.

69. TamuraK, PetersonD, PetersonN, StecherG, NeiM, et al. (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28 : 2731–2739.

70. HofackerIL (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31 : 3429–3431.

71. StephensM, DonnellyP (2003) A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73 : 1162–1169.

72. KimuraM (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16 : 111–120.

73. WattersonGA (1975) On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7 : 256–276.

74. TajimaF (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105 : 437–460.

75. MackayTF, RichardsS, StoneEA, BarbadillaA, AyrolesJF, et al. (2012) The Drosophila melanogaster Genetic Reference Panel. Nature 482 : 173–178.

76. BenjaminiY, HochbergY (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B 57 : 289–300.

77. NeiM, GojoboriT (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3 : 418–426.

78. HuangDW, ShermanBT, LempickiRA (2008) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protocols 4 : 44–57.

79. MesserPW, PetrovDA (2013) Frequent adaptation and the McDonald-Kreitman test. Proc Natl Acad Sci U S A 110 : 8615–8620.