Genome Analysis of a Transmissible Lineage of Reveals Pathoadaptive Mutations and Distinct Evolutionary Paths of Hypermutators

Download PDF České info

Genome sequencing of bacterial pathogens has advanced our understanding of their evolution, epidemiology, and response to antibiotic therapy. However, we still have only a limited knowledge of the molecular changes in in vivo evolving bacterial populations in relation to long-term, chronic infections. For example, it remains unclear what genes are mutated to facilitate the establishment of long-term existence in the human host environment, and in which way acquisition of a hypermutator phenotype with enhanced rates of spontaneous mutations influences the evolutionary trajectory of the pathogen. Here we perform a retrospective study of the DK2 clone type of P. aeruginosa isolated from Danish patients suffering from cystic fibrosis (CF), and analyze the genomes of 55 bacterial isolates collected from 21 infected individuals over 38 years. Our phylogenetic analysis of 8,530 mutations in the DK2 genomes shows that the ancestral DK2 clone type spread among CF patients through several independent transmission events. Subsequent to transmission, sub-lineages evolved independently for years in separate hosts, creating a unique possibility to study parallel evolution and identification of genes targeted by mutations to optimize pathogen fitness (pathoadaptive mutations). These genes were related to antibiotic resistance, the cell envelope, or regulatory functions, and we find that the prevalence of pathoadaptive mutations correlates with evolutionary success of co-evolving sub-lineages.

The long-term co-existence of both normal and hypermutator populations enabled comparative investigations of the mutation dynamics in homopolymeric sequences in which hypermutators are particularly prone to mutations. We find a positive exponential correlation between the length of the homopolymer and its likelihood to acquire mutations and identify two homopolymer-containing genes preferentially mutated in hypermutators. This homopolymer facilitated differential mutagenesis provides a novel genome-wide perspective on the different evolutionary trajectories of hypermutators, which may help explain their emergence in CF infections.

Published in the journal: . PLoS Genet 9(9): e32767. doi:10.1371/journal.pgen.1003741
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1003741

Summary

Introduction

A molecular and mechanistic understanding of how bacterial pathogens evolve during infection of their human hosts is important for our ability to fight infections. The advent of high-throughput sequencing techniques now offer unprecedented nucleotide resolution to determine the relatedness among infecting bacterial isolates and to unveil genetic adaptation within infected individuals and in response to antibiotic therapy [1]–[7]. Unraveling the genetic content of pathogens helps to identify the genes that make certain bacterial lineages more pathogenic than others. Nonetheless, the pathogenicity of a bacterial clone can also evolve via the mutational changes of pre-existing genes, a mechanism which is also known as pathogenicity -⁠ or pathoadaptive mutations [8]. While several studies have provided insight into the genomic evolution of primary bacterial pathogens such as Yersinia pestis [6] and Vibrio cholera [2] causing acute infections, only little is known how these observations relate to opportunistic pathogens causing long-term infections [1].

The opportunistic pathogen Pseudomonas aeruginosa is a common environmental inhabitant, which is also capable of causing both acute and chronic infections in a range of hosts from amoeba and plants to humans. For example, P. aeruginosa causes chronic airway infections in most patients with cystic fibrosis (CF), and is directly associated with the morbidity and mortality connected with this disease. Chronic CF infections provide an opportunity for long-term monitoring of the battle between the infecting bacteria and the host immune defense and clinical intervention therapy [9], [10], and thus offer a direct method for observing evolutionary mechanisms in vivo. In an effort to understand the evolutionary mechanisms facilitating the transition of P. aeruginosa from its environment to a human host, we have previously found no evidence for horizontal acquisition of genes to play a role [9]. Instead, we suggested the establishment of long-term chronic infections to be a matter of tuning the existing genome via pathoadaptive mutations.

The within-host mutation rate is a key factor in determining the potential for bacterial pathogens to genetically adapt to the host immune system and drug therapies, and knowledge about in vivo growth dynamics of bacterial pathogens and their capacity for accumulation of mutations is essential for the design of optimal interventions. Interestingly, the generation of mutations is frequently accelerated in clinical populations of P. aeruginosa that evolves as so-called ‘hypermutators’ due to deficient DNA mismatch repair systems [11]. Although the hypermutable phenotype is also observed for other species in a range of conditions [12]–[16], the impact of this phenotype in a natural environment and in relation to infections remains less clear.

Here we analyze the genome sequences of 55 isolates of the transmissible P. aeruginosa DK2 clone type causing chronic infections in a cohort of Danish CF patients. Our collection, comprising both normal (normomutator) and hypermutable isolates, enabled a comparative analysis of evolutionary trajectories of individual sub-lineages of the DK2 clone type making it possible to identify genes targeted by pathoadaptive mutations. Furthermore, the long-term population dynamics and structure of the clonal expanding DK2 lineage was elucidated by a high-resolution phylogeny, and an examination of the mutation dynamics of homopolymers (homopolymeric tracts of identical nucleotides, e.g. GGGGG) provided novel genome-wide evidence for the potential advantage of differential mutagenesis associated with the hypermutator phenotype.

Results and Discussion

Strain collection and maximum-parsimonious phylogeny

We sequenced the genomes of a collection of 55 P. aeruginosa DK2 clones sampled from Danish CF patients between 1972 and 2008 (Figure 1 and Table S4). The sequence data of 45 of the isolates have previously been reported [9], [10]. Most patients (n = 19) were represented by only a single or a few (≤4) isolates. However, two patients were represented by 11 and 15 isolates (CF173 and CF333, respectively).

Patient origin and sampling time of genome sequenced <i>P. aeruginosa</i> DK2 isolates. — **Fig. 1. Patient origin and sampling time of genome sequenced *P. aeruginosa* DK2 isolates.**

We identified a total of 7,326 unique SNPs in the 55 DK2 genomes, that could be explained by 7,368 mutational events (consistency index 0.99) using a maximum-parsimonious phylogenetic model to elucidate the evolutionary relationship of the P. aeruginosa DK2 population (Figure 2). The high consistency of the tree reflects the unidirectional, clonal evolution from the root of the tree to the tips, thus enabling inferences about the succession of mutations and the relationship among P. aeruginosa DK2 clones.

Maximum-parsimonious reconstruction of the phylogeny of the <i>P. aeruginosa</i> DK2 clones. — **Fig. 2. Maximum-parsimonious reconstruction of the phylogeny of the *P. aeruginosa* DK2 clones.**

Prevalence of mutS, mutL, and mutY hypermutators

From the phylogenetic tree we observed a linear correlation between the number of SNPs and the time of sampling (i.e. a constant rate of mutation accumulation during the clonal expansion of the DK2 lineage) (Figure 3). However, nine sub-lineages (indicated by filled circles in Figure 3) deviated from this trend and had accumulated mutations at higher rates. In one of these isolates, CF224-2002a, we found that 265 of the 273 SNPs accumulated in the branch leading to the isolate were densely clustered in two chromosomal regions with SNP densities (1.2 and 1.8 SNPs per kb, respectively), that are much higher than expected (0.043 SNPs per kb assuming a random distribution of SNPs) (Figure S1). The most likely explanation for these high SNP densities is that the two genomic regions are the result of recombination events with DNA from a P. aeruginosa strain(s) unrelated to the DK2 clone type. Another study by Chung et al. observed similar indications of within-patient recombination events in P. aeruginosa [17].We found no evidence for additional recombination events among the 55 genome sequences.

**Fig. 3. Total number of SNPs accumulated in each DK2 isolate.**

The excess numbers of mutations in the remaining eight deviant isolates were the result of increased mutation rates due to mutations in mismatch repair and error prevention genes. Seven isolates had non-synonymous mutations in one of the DNA mismatch repair (MMR) genes mutS (n = 2) and mutL (n = 4) or both (n = 1), and their excess numbers of SNPs showed a highly increased transition∶transversion ratio consistent with MutS or MutL defects (Table 1) [3], [11], [17]–[19]. Moreover, one isolate (CF173-1991) had a mutation in mutY and a molecular signature consistent with a MutY defect (i.e. a high proportion of transversions) (Table 1).

**Tab. 1. Statistics on mutations accumulated in hypermutable sub-lineages.**

We did not find other mutations in mutS or mutL among the remaining genome sequences, but three additional early isolates (CF84-1972, CF43-1973, and CF105-1973) had mutations in mutY as well as having the molecular signature associated with a MutY defect (Table 1).

In total, we found 11 hypermutator strains among the 55 isolates. These mutators were found in ten of the 21 patients in our study (48%), which is comparable to previous findings (36%) [11]. Our results include two patients (CF211 and CF224) from whom we isolated both hypermutators and normal (normomutator) clones documenting the co-existence of both types. Indeed, the identification of both a hypermutable and a normal sub-lineage in years 1997 and 2006 from patient CF211 suggests at least 9 years of co-existence within this patient (Figure 2). It is possible that the sub-lineages with different mutation rates occupy different niches within the hosts, each niche representing different selection pressures.

Mutation rate of homopolymers

We next designed our mutational analysis to detect small insertions and deletions (microindels). A total of 1,204 unique microindels were discovered. The inheritance was explained by 1,380 parsimonious events and was congruent with the SNP-based phylogeny although the consistency for the microindels was lower (0.87) than for the SNPs (0.99). The higher rate of homoplasy among microindels would a priori be expected as microindels accumulate with high rates at mutational hotspots consisting of simple sequence repeats (SSRs) [20]. Accordingly, 93% of the inconsistent microindels were located in SSRs. As expected from current knowledge [21], [22], we observed that the seven mutL/mutS hypermutators were particularly prone to mutation within SSRs consisting of homopolymers, and as a result 86% of the microindels that accumulated in the mutS/mutL hypermutable sub-lineages were localized in homopolymers whereas this was only true for 21% of the microindels within the remaining sub-lineages (Table 1).

Highly mutable loci have been shown to be important for pathogenesis and host adaptation of several pathogens [23]. For example, increased mutation rates of homopolymers in MMR-deficient P. aeruginosa strains have been shown in vitro to be important for mutational inactivation of the regulatory gene mucA [24], which is pivotal for adaptation in CF airways. Nonetheless, we only have a limited understanding of the homopolymer mutation dynamics at a genome-wide level and of the impact of increased mutation rates of homopolymers in relation to host adaptation. However, our collection of genome sequences from both normal and hypermutator isolates, sampled from the airways of CF patients, provides an opportunity to shed new light on homopolymer mutation rates and their impact on adaptation.

For each of the seven mutS/mutL hypermutator sub-lineages we calculated the mutation rates of homopolymers of different lengths (Figure 4). We observed that longer homopolymers were more likely mutated than homopolymers of shorter lengths, and for homopolymers of 3–6 nucleotides length the mutation rate increased exponentially (R = 0.995; Student's t-test, P = 0.0026). One might expect large homopolymers to exhibit higher probabilities of mutation, because they are distributed more frequently outside coding regions. However, we observed no evidence of this playing a role, as mutation rates of intergenic and intragenic homopolymers were similar (Figure S2). Instead, the size-dependent mutation rate of homopolymers is likely to be a consequence of the mechanistically determined probabilities of strand-slippage during replication [20].

Distinct evolutionary paths in hypermutators

The size-dependent mutation rates of homopolymers of different lengths suggest that different genes have different probabilities of mutation. In this way, certain genes, in which variation is appreciated, may harbor sequences that are more frequently mutated in contrast to essential genes in which genetic changes are strongly selected against [23]. In agreement with this, we find that genes annotated as essential genes in P. aeruginosa PAO1 [25] are less likely to contain large homopolymers ≥7 nt (Fisher's exact test, P = 0.037) (Table S1).

Survival of bacteria in human hosts has previously been suggested to be positively influenced by rapid modulation of the cell envelope. In agreement with this (and in opposition to essential genes), we find that genes functionally related with the composition of the cell envelope are more likely to contain large ≥7 nt homopolymers (Fisher's exact test, P = 0.002) (Table S1). This leads us to speculate that hypermutators have a selective advantage over their normal counterparts, not only because they can speed up evolution, but also because they are creating a bias towards a different evolutionary path by homopolymer facilitated differential mutagenesis.

In support of our hypothesis, we find that mutS/mutL hypermutators acquire 3.7 more mutations in cell envelope genes containing large homopolymers (≥7 nt) relative to cell envelope genes without large homopolymers, and that the accumulation of mutations in the homopolymer-containing cell envelope genes is due to mutations within the homopolymers. Accordingly, 50% of mutations in homopolymer-containing cell envelope genes are indels whereas this is only true for 5% of the mutations in the remaining genes (9/164 vs. 8/8; Fisher's exact test, P = 6.2×10⁻⁶).

In further support of our hypothesis on differential mutagenesis we find two genes (PADK2_15360 and PADK2_03970) in which all seven mutS/mutL hypermutators, but no other isolates, carries mutations. Given the number of mutations within each of the seven hypermutable sub-lineages and all other lineages this observation is highly unexpected by chance (P(X≥2)∼binom(X; 5976; 2.9×10⁻⁷) = 1.6×10⁻⁶; where 2.9×10⁻⁷ is the probability of an average length gene to be mutated in only the mutS/mutL sub-lineages). One of the genes, PADK2_15360, encodes an outer membrane receptor protein, and all seven hypermutators are independently mutated in the same 7×G homopolymer located at position 1127–1133 within the 2958 nt gene. Since none of the other 48 isolates contain mutations within PADK2_15360, we suggest that mutations in this gene represent a hypermutator-specific adaptive target for rapid modulation of the cell envelope. All seven mutations are frameshift mutations causing premature stop codons resulting in truncated proteins without a putative TonB dependent receptor domain (Pfam family PF00593) located in the C-terminal part of the protein. We hypothesize that this domain is localized in the outer membrane where it, due to its potential surface-exposure, could be a target of recognition by the immune defense.

Time-measured phylogenetic reconstruction of the DK2 lineage

To further investigate the within-host evolutionary history of the DK2 lineage and to estimate the dates of divergence between DK2 isolates, we applied Bayesian statistics to infer time-measured phylogenies using a relaxed molecular clock rate model (Figure 5).

Bayesian phylogenetic reconstruction and divergence date estimates of the <i>P. aeruginosa</i> DK2 clones. — **Fig. 5. Bayesian phylogenetic reconstruction and divergence date estimates of the *P. aeruginosa* DK2 clones.**

We excluded the hypermutator isolates and isolate CF224-2002a containing recombined regions from the analysis, as they would otherwise interfere with the phylogenetic analysis. Based on this analysis, the mean mutation rate was estimated to be 2.6 SNPs/year (95% highest posterior density (HPD; see Materials and Methods) 1.8–3.2 SNPs/year) which is equivalent to 4×10⁻⁷ SNPs/year per site or 9×10⁻¹¹–11×10⁻¹¹ SNPs/bp per generation assuming 3700–4500 generations per year [26]. Our estimated mutation rate is in the same range as those estimated for Shigella sonnei (6×10⁻⁷ SNPs/bp/year) [7] and Vibrio cholerae (8×10⁻⁷) [2] but in between the rates reported for Yersinia pestis (2×10⁻⁸) [6] and Staphylococcus aureus (3×10⁻⁶) [27].

The topologies of the Bayesian phylogenetic reconstruction and the maximum-parsimonious phylogeny were congruent, and the relationship among the clones correlated with patient origin and the time of sampling (Figure 2; Figure 5).

We have previously shown that a set of specific mutations first observed in CF30-1979 and in all isolates sampled after 1979 were important for the reproductive success of the DK2 lineage and its dissemination among multiple individuals [10]. Using the phylogenetic reconstruction, we estimate that isolates sampled after 1979 diverged from a common ancestor in year 1970 (95% HPD, 1961–1976) [10]. Furthermore, our phylogenetic data document that the transmission potential of the DK2 lineage has been maintained over several decades. The most recent transmission event is predicted to have occurred in year 1997 (95% HPD, 1991–2001), as this is the latest time estimate of a predicted ancestor shared by isolates from different patients (CF177-2002 and CF223-2002). Since we have not investigated DK2 isolates from all patients chronically infected with this lineage it remains a possibility that transmission has occurred subsequent to this time.

Seven patients are represented by multiple isolates, and in six of the patients at least two of the isolates clustered as monophyletic groups according to patient origin (Figure 2; Figure 5). This is in agreement with a model in which independent sub-lineages of the DK2 clone evolved separately within individual patients, and it excludes the possibility of continuous and near-perfect mixing of strains between patients. The patient-linkage was most prevalent for patient CF333 from which all 15 isolates constituted a single monophyletic group, and the isolates branched in general according to their sampling year giving a linear evolutionary trajectory with an average distance of 6.1 SNPs (∼2.3 years) from the line of descent (Figure 2; Figure 5).

In contrast, we observed an unexpected DK2 population dynamics in patient CF173 in which the isolates clustered as three different monophyletic groups with four, five and two isolates, respectively (Figure 2). This shows that patient CF173 was infected by three distinct sub-lineages rather than only a single sub-lineage. Interestingly, the three sub-lineages carried by patient CF173 can be distinguished based on the sampling year of the isolates. Accordingly, the isolates from the different clusters are sampled in the time-periods 1984–1991 (cluster A), 1992–1999 (cluster B) and 2002–2005 (cluster C), respectively. This points to a replacement of the earlier sub-lineages around years 1991–1992 and 1999–2002, respectively, caused by secondary transmission events. Alternatively, it could be the result of co-existing lineages whose time-dependent sampling was caused by shifts in relative abundance or changes in sampling probability from different niches.

Parallel evolution of genes involved in host adaptation

The presence of independently evolving DK2 sub-lineages made it possible to search for recurrent patterns of mutation and to identify bacterial genes that have acquired mutations in parallel in different individuals [1], [28]. Overall, we found no evidence for either intragenic bias of the mutations or for positive selection within coding regions (dN/dS = 0.66 including all mutations; Text S1), and we would therefore expect the 7,383 intragenic mutations to be distributed randomly among the 5,976 P. aeruginosa DK2 genes. This means that on average a gene would acquire 1.2 mutations, and we would expect only 1.3 genes to acquire mutations more than 6 times (P(X>6)∼binom(X; 7,383; 5976⁻¹) = 2.2×10⁻⁴). Nonetheless, we identified 65 genes that were mutated more than 6 times when comparing across all DK2 sub-lineages (see Table S2 for the full list of all 65 genes). The high mutation number within these genes could be the result of a positive selection for mutations, which is supported by our observation that increased pressures of selection acts on the top most mutated genes (Figure 6). Accordingly, the signature for selection for SNPs accumulated in the 65 top most mutated genes (dN/dS = 1.11) was positive and significantly higher than for SNPs accumulated in other genes (dN/dS = 0.69; Fisher's exact test, P = 5.2×10⁻⁵).

**Fig. 6. Increased pressures of selection for mutations in the top most mutated genes.**

These findings suggest that the 65 genes with multiple mutations undergo adaptive evolution (i.e. they are pathoadaptive genes involved in host adaptation), although the presence of neutral mutational hotspots or fast acquisition of secondary mutations within the same gene may contribute to the high mutation number in some genes. To exclude the possibility that the high mutation numbers were the result of recombination events or because of particularly large gene sizes, we left out mutations from recombined regions and large genes (>5 kb) from our analysis.

A large part of the identified pathoadaptive genes were associated with antibiotic resistance (n = 14), including the genes ampC, emrB, ftsI, fusA, gyrA/B, mexB/Y, pmrB, pprA, oprD, and rpoB/C (Figure 7 and Table S2), in which mutations have been shown to confer resistance against a range of antibiotics, e.g. beta-lactams, tetracyclines, quinolones, chloramphenicol, macrolides, fusidic acid, aminoglycosides, polymycins and penicillins [29]–[37]. As such, the detection of multiple mutations in known antibiotic resistance genes confirmed the ability of our approach to identify genes involved in host adaptation. The exact amino acid changes caused by nine out of 16 unique non-synonymous mutations found within the genes gyrA/B and rpoB have previously been shown to confer resistance against fluoroquinolones and rifampicin, respectively (Table S3).

Another major group of pathoadaptive genes (n = 18) were functionally related to the cell envelope (Figure 7 and Table S2). Possibly, these mutations have been selected to evade the host immune response [38] or, especially in the case of lpxO2, to prevent interaction from LPS-targeting antibiotics [39]. Also, mutations in 13 genes involved in gene regulation were identified in our analysis, suggesting that remodeling of regulatory networks is a key evolutionary pathway in host adaptation as it seems to be in evolving Escherichia coli populations [40]. Among the regulatory genes that acquired mutations were four yet uncharacterized genes encoding components of two-component regulatory systems, a gene-category which is significantly overrepresented (88/5823 vs. 7/58 Fisher's exact test, P = 6.7×10⁻⁵) among the pathoadaptive genes (Table S2). We suggest that these uncharacterized regulatory genes as well as other genes identified as involved host adaptation represent potential therapeutic targets.

Mutations in pathoadaptive genes correlate with strain displacement events within patient CF173

The adaptive benefits of a mutation are usually investigated by introduction of single or multiple mutations into isogenic strains and testing for fitness effects associated with the mutation(s) in controlled experimental conditions (such as competition experiments). Such testing is most effective when the phenotype (e.g. antibiotic resistance) can be easily interpreted in relation to the fitness impact. However, for mutations for which no or only subtle phenotypic changes are apparent it is difficult to directly test the fitness effects. In addition, the impacts on fitness of specific mutations must be assessed in the same environment as the one in which the mutation was selected. This is obviously not possible in case of human airway infections. To circumvent these limitations, we hypothesize that the count of mutations within the pathoadaptive genes can be used as a measure of the fitness of individual clones of P. aeruginosa. To investigate this hypothesis we took advantage of the two strain displacements (or changes in strain abundances) that occurred in patient CF173 in the years 1991–1992 and 1999–2002, which suggested that CF173 was infected by three succeeding DK2 sub-lineages A (1980–1991), B (1990–1999), and C (2000–2005).

We assume that the succeeding sub-lineage must be better adapted (i.e. having a higher fitness) than the previous sub-lineage, which was outcompeted. When determining the number of mutations found in the sub-lineages A, B, and C within the pathoadaptive genes, it was striking that the succeeding genotype consistently had a higher count of mutations than the previous genotype (Figure 7). In this way, the counts of mutations correlated with the strain displacement observed within patient CF173. We suggest that the mutation count can be used to predict the fitness of emerging DK2 clones, and that the pathogenicity scoring together with the information about the specific mutations can be used as a novel approach for clinicians to treat and segregate patients. It should be noted that our results cannot simply be ascribed to the succeeding genotypes having more mutations in general as no significant positive correlation existed between the total number of mutations and the number of mutations within the pathoadaptive genes (R = 0.30; Student's t-test, P = 0.28).

Conclusions and implications

By genome sequencing of 55 isolates of the transmissible DK2 clone type of P. aeruginosa, we have provided a detailed view of the evolution of a bacterial pathogen within its human host. The sampling from multiple patients offered the opportunity to detect loci that were independently mutated in parallel lineages, here referred to as pathoadaptive genes, whereas sampling multiple times from the same patient gave an opportunity to study the within-patient population dynamics.

Several of the pathoadaptive genes identified here were associated with antibiotic resistance, gene regulation, and composition of the cell envelope. Some of these genes have been found in other studies of genomic evolution in CF pathogens to be important for adaptation [1], [3], [17]. Genomic analysis of additional P. aeruginosa lineages from different patients and clinical settings will enable a systematic identification of genes that are repeated targets for selective mutations during adaptation to life in the CF lung. Importantly, we also identified genes of unknown function and without prior implication in pathogenesis. Further investigations of the function of these genes are required to determine their potential as future therapeutic targets against the infection.

An exceptional 21-year time series of 11 isolates sampled from patient CF173 revealed a complex population dynamics in which the patient was infected by three distinct sub-lineages of the DK2 clone type, each sub-lineage being dominant over several years until its final decline or disappearance. This observation illustrates the power of high-throughput sequencing in relation to uncovering pathogen dynamics within infected individuals. We further observed that the cumulative count of mutations within pathoadaptive genes increased for each of the succeeding sub-lineages. This means that emerging sub-lineages carried a cumulative palette of pathoadaptive mutations and not only adaptive mutations conferring an advantage for a newly introduced selection force that may have triggered the removal of the preceding lineage.

The identification of pathoadaptive genes involved in host adaptation and our finding that the specific count of mutations within these genes act as a classifier that predict the pathogenicity of emerging sub-lineages of the DK2 clone type, should enable better epidemiological predictions and provide valuable information for the clinicians on how to treat and segregate patients.

The presence of hypermutable lineages within 48% of the studied individuals might be the outcome of an accelerated acquisition of beneficial mutations within hypermutators [11], [41]–[43]. Nonetheless, our examination of mutation dynamics of homopolymers provided a novel genome-wide perspective on the impact and potential advantage of differential mutagenesis associated with the hypermutator phenotype. Showing a clear exponential correlation between the rate of change and the size of the homopolymer, we confirmed homopolymers to be hotspots for differential mutagenesis, and we identified two homopolymer-containing genes to be preferentially mutated in hypermutators.

In conclusion, we have shown how collections of isolates of bacteria sampled from chronically infected patients constitute a valuable basis for studying evolution of pathogens in vivo, and our results facilitates comparative studies as sequencing datasets become increasingly available.

Materials and Methods

Bacterial strains and genome sequencing

The study encompasses 55 isolates of the P. aeruginosa DK2 clone type that were sampled over 38 years from 21 CF patients attending the Copenhagen Cystic Fibrosis Center at the University Hospital, Rigshospitalet (Figure 1). Isolation and identification of P. aeruginosa from sputum was done as previously described [44]. Sequencing of 45 of the isolates was previously reported by Yang et. al. [10] and Rau et al. [9]. Two of the previously sequenced isolates (CF333-1991 and CF510-2006) were re-sequenced together with ten new isolates on an Illumina HiSeq2000 platform generating 100-bp paired-end reads using a multiplexed protocol to an average coverage depth of 63–212 fold. Sequence reads from all isolates are deposited in the Short Read Archive under accession number ERP002277 (accession numbers for individual samples are provided in Table S4).

Mutation detection and analysis

Reads were mapped against the P. aeruginosa DK2 reference genome (CF333-2007a; Genbank accession no. CP003149) using Novoalign (Novocraft Technologies) [45], and pileups of the read alignments were produced by SAMtools release 0.1.7 [46]. Single nucleotide polymorphisms were called by the varFilter algorithm in SAMtools in which minimum SNP coverage was set to 3 (samtools.pl varFilter -d 3 -D 10000). Only SNP calls with quality scores (Phred-scaled probability of sample reads being homozygous reference) of at least 50 (i.e. P≤10⁻⁵) were retained. Microindels were extracted from the read pileup by the following criteria; (1) quality scores of at least 500, (2) root-mean-square (RMS) mapping qualities of at least 25, and (3) support from at least one fifth of the covering reads. The false-negative rates were found to be 2% and 3% by in silico introduction of random base-substitutions and microindels (lengths 1–10 bp), respectively. To avoid false-positives, the reference genome was re-sequenced by Illumina sequencing to exclude polymorphisms caused by errors in reference assembly. Also, Illumina re-sequencing of CF333-1991 confirmed all the SNPs (and found no other SNPs) that were previously reported for this isolate by use of pyrosequencing [10]. Indeed, the confirmation by re-sequencing of CF333-1991 and the fact that many isolates are only discriminated by a few mutations verify that our genomic analysis has a very low false-positive rate.

A maximum-parsimonious phylogenetic analysis was used to predict the relationship and mutational events among the clones of the DK2 clone type. The tree consistency index (CI = m/s) was calculated as the minimum number of changes (m) divided by the number of changes required on the tree (s). The CI will equal 1 when there is no homoplasy.

For the calculation of average distances of the 15 CF333 isolates to their line of descent, the line of descent was defined as the direct lineage from the most recent common ancestor (MRCA) of all 15 isolates until the MRCA of the three most recently sampled isolates (CF333-2007a, CF333-2007b, CF333-2007c).

To provide the most accurate estimates of the relative homopolymer mutation rates in the mutS/mutL MMR-deficient sub-lineages, we calculated the rates per mutS/mutL MMR-deficiency caused SNP. This corrected count of SNPs were found by subtracting the fraction of SNPs expected to have accumulated due to the normal underlying mutation rate, i.e. SNPs not caused by the mutS/mutL MMR-deficiency. For this purpose a 2∶1 transition to transversion ratio was assumed for the normal background mutation rate. This means that the SNP count of hypermutator branch “KD” composed of 2,534 SNPs (Table 1), hereof 29 transversions, was corrected to 2,447 mutS/mutL MMR-deficiency caused SNPs. All results and conclusions were unaffected from this correction.

Bayesian evolutionary analysis

Bayesian analysis of evolutionary rates and divergence times was performed using BEAST v1.7.2 [47]. BEAST was run with isolate CF510-2006 as an outgroup [9] and the following user-determined settings; a lognormal relaxed molecular clock model which allows rates of evolution to vary amongst the branches of the tree, and a general time-reversible substitution model with gamma correction. Results were produced from three independent chains of 50 million steps each, sampled every 5,000 steps. The first 5 million steps of each chain were discarded as a burn-in. The results were combined, and the maximum clade credibility tree was generated (using LogCombiner and TreeAnnotator programs from the BEAST package, respectively). The effective sample-sizes (ESS) of all parameters were >500 as calculated by Tracer v1.5 (available from http://beast.bio.ed.ac.uk/Tracer), which was also used to calculate 95% HPD confidence intervals of the mutation rate (i.e. an interval in which the modeled parameter resides with 95% probability). The root of the tree was predicted to be in year 1943 (95% HPD, 1910–1962). Note, as this estimate is based on isolates primarily sampled after year 1980, the same accumulation rate of SNPs might not hold true for the evolution of the DK2 clone type before 1980.

Supporting Information

Zdroje

1. LiebermanTD, MichelJB, AingaranM, Potter-BynoeG, RouxD, et al. (2011) Parallel bacterial evolution within multiple patients identifies candidate pathogenicity genes. Nat Genet 43 : 1275–1280.

2. MutrejaA, KimDW, ThomsonNR, ConnorTR, LeeJH, et al. (2011) Evidence for several waves of global transmission in the seventh cholera pandemic. Nature 477 : 462–465.

3. SmithEE, BuckleyDG, WuZ, SaenphimmachakC, HoffmanLR, et al. (2006) Genetic adaptation by Pseudomonas aeruginosa to the airways of cystic fibrosis patients. Proc Natl Acad Sci U S A 103 : 8487–8492.

4. HarrisSR, ClarkeIN, Seth-SmithHM, SolomonAW, CutcliffeLT, et al. (2012) Whole-genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing. Nat Genet 44 : 413–419, S411.

5. CroucherNJ, HarrisSR, FraserC, QuailMA, BurtonJ, et al. (2011) Rapid pneumococcal evolution in response to clinical interventions. Science 331 : 430–434.

6. MorelliG, SongY, MazzoniCJ, EppingerM, RoumagnacP, et al. (2010) Yersinia pestis genome sequencing identifies patterns of global phylogenetic diversity. Nat Genet 42 : 1140–1143.

7. HoltKE, BakerS, WeillFX, HolmesEC, KitchenA, et al. (2012) Shigella sonnei genome sequencing and phylogenetic analysis indicate recent global dissemination from Europe. Nat Genet 44 : 1056–1059.

8. SokurenkoEV, HastyDL, DykhuizenDE (1999) Pathoadaptive mutations: gene loss and variation in bacterial pathogens. Trends Microbiol 7 : 191–195.

9. RauMH, MarvigRL, EhrlichGD, MolinS, JelsbakL (2012) Deletion and acquisition of genomic content during early stage adaptation of Pseudomonas aeruginosa to a human host environment. Environ Microbiol 14 : 2200–2211.

10. YangL, JelsbakL, MarvigRL, DamkiaerS, WorkmanCT, et al. (2011) Evolutionary dynamics of bacteria in a human host environment. Proc Natl Acad Sci U S A 108 : 7481–7486.

11. OliverA, CantonR, CampoP, BaqueroF, BlazquezJ (2000) High frequency of hypermutable Pseudomonas aeruginosa in cystic fibrosis lung infection. Science 288 : 1251–1254.

12. MaticI, RadmanM, TaddeiF, PicardB, DoitC, et al. (1997) Highly variable mutation rates in commensal and pathogenic Escherichia coli. Science 277 : 1833–1834.

13. MaoEF, LaneL, LeeJ, MillerJH (1997) Proliferation of mutators in A cell population. J Bacteriol 179 : 417–422.

14. LeClercJE, LiB, PayneWL, CebulaTA (1996) High mutation frequencies among Escherichia coli and Salmonella pathogens. Science 274 : 1208–1211.

15. RichardsonAR, YuZ, PopovicT, StojiljkovicI (2002) Mutator clones of Neisseria meningitidis in epidemic serogroup A disease. Proc Natl Acad Sci U S A 99 : 6103–6107.

16. SniegowskiPD, GerrishPJ, LenskiRE (1997) Evolution of high mutation rates in experimental populations of E. coli. Nature 387 : 703–705.

17. ChungJC, BecqJ, FraserL, Schulz-TrieglaffO, BondNJ, et al. (2012) Genomic Variation among Contemporary Pseudomonas aeruginosa Isolates from Chronically Infected Cystic Fibrosis Patients. J Bacteriol 194 : 4857–4866.

18. OliverA, BaqueroF, BlazquezJ (2002) The mismatch repair system (mutS, mutL and uvrD genes) in Pseudomonas aeruginosa: molecular characterization of naturally occurring mutants. Mol Microbiol 43 : 1641–1650.

19. SchaaperRM, DunnRL (1987) Spectra of spontaneous mutations in Escherichia coli strains defective in mismatch correction: the nature of in vivo DNA replication errors. Proc Natl Acad Sci U S A 84 : 6220–6224.

20. BicharaM, WagnerJ, LambertIB (2006) Mechanisms of tandem repeat instability in bacteria. Mutat Res 598 : 144–163.

21. StrandM, ProllaTA, LiskayRM, PetesTD (1993) Destabilization of tracts of simple repetitive DNA in yeast by mutations affecting DNA mismatch repair. Nature 365 : 274–276.

22. LevinsonG, GutmanGA (1987) High frequencies of short frameshifts in poly-CA/TG tandem repeats borne by bacteriophage M13 in Escherichia coli K-12. Nucleic Acids Res 15 : 5323–5338.

23. MoxonER, RaineyPB, NowakMA, LenskiRE (1994) Adaptive evolution of highly mutable loci in pathogenic bacteria. Curr Biol 4 : 24–33.

24. MoyanoAJ, SmaniaAM (2009) Simple sequence repeats and mucoid conversion: biased mucA mutagenesis in mismatch repair-deficient Pseudomonas aeruginosa. PLoS One 4: e8203.

25. LiberatiNT, UrbachJM, MiyataS, LeeDG, DrenkardE, et al. (2006) An ordered, nonredundant library of Pseudomonas aeruginosa strain PA14 transposon insertion mutants. Proc Natl Acad Sci U S A 103 : 2833–2838.

26. YangL, HaagensenJA, JelsbakL, JohansenHK, SternbergC, et al. (2008) In situ growth rates and biofilm development of Pseudomonas aeruginosa populations in chronic lung infections. J Bacteriol 190 : 2767–2776.

27. HarrisSR, FeilEJ, HoldenMT, QuailMA, NickersonEK, et al. (2010) Evolution of MRSA during hospital transmission and intercontinental spread. Science 327 : 469–474.

28. MarvigRL, SondergaardMS, DamkiaerS, HoibyN, JohansenHK, et al. (2012) Mutations in 23S rRNA confer resistance against azithromycin in Pseudomonas aeruginosa. Antimicrob Agents Chemother 56 : 4519.

29. SeverinovK, MarkovD, SeverinovaE, NikiforovV, LandickR, et al. (1995) Streptolydigin-resistant mutants in an evolutionarily conserved region of the beta' subunit of Escherichia coli RNA polymerase. J Biol Chem 270 : 23926–23929.

30. YeeYC, KisslingerB, YuVL, JinDJ (1996) A mechanism of rifamycin inhibition and resistance in Pseudomonas aeruginosa. J Antimicrob Chemother 38 : 133–137.

31. LiaoX, HancockRE (1995) Cloning and characterization of the Pseudomonas aeruginosa pbpB gene encoding penicillin-binding protein 3. Antimicrob Agents Chemother 39 : 1871–1874.

32. LomovskayaO, LewisK (1992) Emr, an Escherichia coli locus for multidrug resistance. Proc Natl Acad Sci U S A 89 : 8938–8942.

33. BieleckiP, LukatP, HuseckenK, DotschA, SteinmetzH, et al. (2012) Mutation in elongation factor G confers resistance to the antibiotic argyrin in the opportunistic pathogen Pseudomonas aeruginosa. Chembiochem 13 : 2339–2345.

34. HancockRE (1998) Resistance mechanisms in Pseudomonas aeruginosa and other nonfermentative gram-negative bacteria. Clin Infect Dis 27 Suppl 1: S93–99.

35. StratevaT, YordanovD (2009) Pseudomonas aeruginosa -⁠ a phenomenon of bacterial resistance. J Med Microbiol 58 : 1133–1148.

36. MoskowitzSM, ErnstRK, MillerSI (2004) PmrAB, a two-component regulatory system of Pseudomonas aeruginosa that modulates resistance to cationic antimicrobial peptides and addition of aminoarabinose to lipid A. J Bacteriol 186 : 575–579.

37. WangY, HaU, ZengL, JinS (2003) Regulation of membrane permeability by a two-component regulatory system in Pseudomonas aeruginosa. Antimicrob Agents Chemother 47 : 95–101.

38. NguyenD, SinghPK (2006) Evolving stealth: genetic adaptation of Pseudomonas aeruginosa during cystic fibrosis infections. Proc Natl Acad Sci U S A 103 : 8305–8306.

39. MoskowitzSM, BrannonMK, DasguptaN, PierM, SgambatiN, et al. (2012) PmrB mutations promote polymyxin resistance of Pseudomonas aeruginosa isolated from colistin-treated cystic fibrosis patients. Antimicrob Agents Chemother 56 : 1019–1030.

40. PhilippeN, CrozatE, LenskiRE, SchneiderD (2007) Evolution of global regulatory networks during a long-term experiment with Escherichia coli. Bioessays 29 : 846–860.

41. WaineDJ, HoneybourneD, SmithEG, WhitehouseJL, DowsonCG (2008) Association between hypermutator phenotype, clinical variables, mucoid phenotype, and antimicrobial resistance in Pseudomonas aeruginosa. J Clin Microbiol 46 : 3491–3493.

42. FerroniA, GuillemotD, MoumileK, BernedeC, Le BourgeoisM, et al. (2009) Effect of mutator P. aeruginosa on antibiotic resistance acquisition and respiratory function in cystic fibrosis. Pediatr Pulmonol 44 : 820–825.

43. CiofuO, RiisB, PresslerT, PoulsenHE, HoibyN (2005) Occurrence of hypermutable Pseudomonas aeruginosa in cystic fibrosis patients is associated with the oxidative stress caused by chronic lung inflammation. Antimicrob Agents Chemother 49 : 2276–2282.

44. Hoiby N, Frederiksen B (2000) Microbiology of cystic fibrosis. In: Hodson M, Geddes D, editors. Cystic fibrosis. 2nd ed. London, United Kingdom: Arnold. pp. 83–107.

45. KrawitzP, RodelspergerC, JagerM, JostinsL, BauerS, et al. (2010) Microindel detection in short-read sequence data. Bioinformatics 26 : 722–729.

46. LiH, HandsakerB, WysokerA, FennellT, RuanJ, et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25 : 2078–2079.

47. DrummondAJ, SuchardMA, XieD, RambautA (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29 : 1969–1973.