Archive for the 'Molecular Biology' Category

Assessing the Conservation of Mammalian Gene Expression Using High-Density Exon Arrays

Thursday, January 1st, 1970

Microarray data from multiple species have been used to study evolutionary constraints on gene expression. Expression measurements from conventional microarray platforms such as the 3' expression arrays are strongly affected by platform-dependent probe effects that may introduce apparent but misleading discrepancies between species. In this manuscript, we assess the conservation of mammalian gene expression in adult tissues using data from a high-density exon array platform. The exon arrays have more than 6 million probes on a single array targeting all exons in a genome. We find that, unlike 3' array data, gene expression measurements from exon arrays reveal patterns of gene expression that are highly conserved between humans and mice in multiple tissues. Our analysis provides strong evidence for widespread stabilizing selection pressure on transcript abundance during mammalian evolution.

A Reversible Jump Method for Bayesian Phylogenetic Inference with a Nonhomogeneous Substitution Model

Thursday, January 1st, 1970

Nonhomogeneous substitution models have been introduced for phylogenetic inference when the substitution process is nonstationary, for example, when sequence composition differs between lineages. Existing models can have many parameters, and it is then difficult and computationally expensive to learn the parameters and to select the optimal model complexity. We extend an existing nonhomogeneous substitution model by introducing a reversible jump Markov chain Monte Carlo method for efficient Bayesian inference of the model order along with other phylogenetic parameters of interest. We also introduce a new hierarchical prior which leads to more reasonable results when only a small number of lineages share a particular substitution process. The method is implemented in the PHASE software, which includes specialized substitution models for RNA genes with conserved secondary structure. We apply an RNA-specific nonhomogeneous model to a structure-based alignment of rRNA sequences spanning the entire tree of life. A previous study of the same genes from a similar set of species found robust evidence for a mesophilic last universal common ancestor (LUCA) by inference of the G + C composition at the root of the tree. In the present study, we find that the helical GC composition at the root is strongly dependent on the root position. With a bacterial rooting, we find that there is no longer strong support for either a mesophile or a thermophile LUCA, although a hyperthermophile LUCA remains unlikely. We discuss reasons why results using only RNA helices may differ from results using all aligned sites when applying nonhomogeneous models to RNA genes.

Tracing Past Human Male Movements in Northern/Eastern Africa and Western Eurasia: New Clues from Y-Chromosomal Haplogroups E-M78 and J-M12

Thursday, January 1st, 1970

Detailed population data were obtained on the distribution of novel biallelic markers that finely dissect the human Y-chromosome haplogroup E-M78. Among 6,501 Y chromosomes sampled in 81 human populations worldwide, we found 517 E-M78 chromosomes and assigned them to 10 subhaplogroups. Eleven microsatellite loci were used to further evaluate subhaplogroup internal diversification.

The geographic and quantitative analyses of haplogroup and microsatellite diversity are strongly suggestive of a northeastern African origin of E-M78, with a corridor for bidirectional migrations between northeastern and eastern Africa (at least 2 episodes between 23.9–17.3 ky and 18.0–5.9 ky ago), trans-Mediterranean migrations directly from northern Africa to Europe (mainly in the last 13.0 ky), and flow from northeastern Africa to western Asia between 20.0 and 6.8 ky ago.

A single clade within E-M78 (E-V13) highlights a range expansion in the Bronze Age of southeastern Europe, which is also detected by haplogroup J-M12. The phylogeographic pattern of molecular radiation and the coalescence estimates for both haplogroups are similar and reveal that the genetic landscape of this region is, to a large extent, the consequence of recent population growth in situ rather than the result of a mere flow of western Asian migrants in the early Neolithic.

Our results not only provide a refinement of previous evolutionary hypotheses but also well-defined time frames for past human movements both in northern/eastern Africa and western Eurasia.

A Likelihood Framework to Measure Horizontal Gene Transfer

Thursday, January 1st, 1970

We suggest a likelihood-based approach to estimate an overall rate of horizontal gene transfer (HGT) in a simplified setting. To this end, we assume that the number of HGT events occurring within a given time interval follows a Poisson process. To obtain estimates for the rate of HGT, we simulate the distribution of tree topologies for different numbers of HGT events on a clocklike species tree. Using these simulated distributions, we estimate an HGT rate for a collection of gene trees representing a set of taxa. As an illustrative example, we use the "Clusters of Orthologous Groups of proteins" (COGs). We also perform a correction of the estimated rate taking into account the inaccuracies due to gene tree reconstructions. The results suggest a corrected HGT rate of about 0.36 per gene and unit time; in other words, 11 HGT events have occurred on average among the 44 taxa of the COG species tree. A software package to estimate an HGT rate is available online (http://www.cibiv.at/software/hgt/).
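
The Poisson assumption in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the paper infers HGT events indirectly from simulated tree-topology distributions, whereas this sketch assumes the number of events per gene is observed directly. The function names and the example counts are hypothetical and chosen only for illustration.

```python
import math


def poisson_log_likelihood(rate, counts, interval=1.0):
    """Log-likelihood of observed event counts under a Poisson process.

    Each count is assumed to come from an interval of the same length,
    so every count has mean rate * interval.
    """
    mean = rate * interval
    return sum(k * math.log(mean) - mean - math.lgamma(k + 1) for k in counts)


def poisson_rate_mle(counts, interval=1.0):
    """Maximum-likelihood rate: total events divided by total observed time."""
    return sum(counts) / (len(counts) * interval)


# Hypothetical number of HGT events for ten genes over one unit of time.
example_counts = [0, 1, 0, 2, 0, 0, 1, 0, 0, 0]
rate_hat = poisson_rate_mle(example_counts)
print(f"estimated rate per gene and unit time: {rate_hat:.2f}")
```

The closed-form estimate is simply the mean count per unit time; the log-likelihood function is included so that alternative rates can be compared against it.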

Positive Selection for Single Amino Acid Change Promotes Substrate Discrimination of a Plant Volatile-Producing Enzyme

Thursday, January 1st, 1970

We used a combined evolutionary and experimental approach to better understand enzyme functional divergence within the SABATH gene family of methyltransferases (MTs). These enzymes catalyze the formation of a variety of secondary metabolites in plants, many of which are volatiles that contribute to floral scent and plant defense, such as methyl salicylate and methyl jasmonate. A phylogenetic analysis of functionally characterized members of this family showed that salicylic acid methyltransferase (SAMT) forms a monophyletic lineage of sequences found in several flowering plants. Most members of this lineage preferentially methylate salicylic acid (SA) as compared with the structurally similar substrate benzoic acid (BA). To investigate if positive selection promoted functional divergence of this lineage of enzymes, we performed a branch-sites test. This test showed statistically significant support (P < 0.05) for positive selection in this lineage of MTs (dN/dS = 10.8). A high posterior probability (pp = 0.99) identified an active site methionine as the only site under positive selection in this lineage. To investigate the potential catalytic effect of this positively selected codon, site-directed mutagenesis was used to replace Met with the alternative amino acid (His) in a Datura wrightii floral-expressed SAMT sequence. Heterologous expression of wild-type and mutant D. wrightii SAMT in Escherichia coli showed that both enzymes could convert SA to methyl salicylate and BA to methyl benzoate. However, competitive feeding with equimolar amounts of SA and BA showed that the presence of Met in the active site of wild-type SAMT resulted in a >10-fold higher amount of methyl salicylate produced relative to methyl benzoate. The Met156His mutant exhibited little differential preference for the 2 substrates because nearly equal amounts of methyl salicylate and methyl benzoate were produced.
Evolution of the ability to discriminate between the 2 substrates by SAMT may be advantageous for efficient production of methyl salicylate, which is important for pollinator attraction as well as pathogen and herbivore defense. Because BA is a likely precursor for the biosynthesis of SA, SAMT might increase methyl salicylate levels directly by preferential methylation and indirectly by leaving more BA to be converted into SA.

The Mitochondrial Genome of the Lizard Calotes versicolor and a Novel Gene Inversion in South Asian Draconine Agamids

Thursday, January 1st, 1970

A complete mitochondrial DNA (mtDNA) sequence was determined for the lizard Calotes versicolor (Reptilia; Agamidae). The 16,670-bp genome, with notably shorter genes for some protein-coding and tRNA genes, had the same gene content as that found in other vertebrates. However, a novel gene arrangement was found in which the proline tRNA (trnP) gene is located on the light strand instead of its typical heavy-strand position, providing the first known example of gene inversion in vertebrate mtDNAs. A segment of mtDNA encompassing the trnP gene and its flanking genes and the control region was amplified and sequenced for various agamid taxa to investigate timing and mechanism of the gene inversion. The inverted trnP gene organization was shared by all South Asian draconine agamids examined but by none of the other Asian and African agamids. Phylogenetic analyses including clock-free Bayesian analyses for divergence time estimation suggested a single occurrence of the gene inversion on a lineage leading to the draconine agamids during the Paleogene period. This gene inversion could not be explained by the tandem duplication/random loss model for mitochondrial gene rearrangements. Our available sequence data did not provide evidence for remolding of the trnP gene by an anticodon switch in a duplicated tRNA gene. Based on results of sequence comparisons and other circumstantial evidence, we hypothesize that inversion of the trnP gene was originally mediated by a homologous DNA recombination and that the de novo gene organization that does not disrupt expression of mitochondrial genes has been maintained in draconine mtDNAs for such a long period of time.

Exceptionally High Density of NUMTs in the Honeybee Genome

Thursday, January 1st, 1970

The available genome sequences of 4 insects (the fruit fly, the African malaria mosquito, the flour beetle, and the honeybee) are used to compare the amount of mitochondrial DNA transferred to the nuclear genome (NUMTs). The data from the beetle and the bee show frequent transfer of NUMTs, whereas NUMTs in the 2 other insects are rare. The density of NUMTs in the honeybee (>1.0 bp transferred DNA per 1 kb of the nuclear sequence) is the highest in any animal studied, about ten times higher than in humans and comparable to the densities in plant genomes. The density of NUMTs in the beetle (0.056 bp/kb) is of the same order of magnitude as that in humans. The analysis of the honeybee genome indicates that NUMTs originate from all parts of the mitochondrial genome, that about two-thirds of the nuclear copies result from secondary transpositions within the nuclear genome, that the copies are significantly associated with "mariner" type transposons, and that the NUMTs consist mainly of short and fragmented copies.
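
The density unit used in this abstract (bp of transferred mitochondrial DNA per 1 kb of nuclear sequence) can be made concrete with a short Python sketch. The function name and the input totals are hypothetical; they are round numbers picked to reproduce densities of the magnitude quoted above and are not the genome sizes or NUMT totals reported in the paper.

```python
def numt_density_bp_per_kb(numt_bp, nuclear_genome_bp):
    """Return bp of NUMT sequence per 1 kb of nuclear sequence."""
    if nuclear_genome_bp <= 0:
        raise ValueError("nuclear_genome_bp must be positive")
    return numt_bp / (nuclear_genome_bp / 1000.0)


# Hypothetical totals, for illustration only (not values from the paper).
high_density = numt_density_bp_per_kb(numt_bp=200_000, nuclear_genome_bp=200_000_000)
low_density = numt_density_bp_per_kb(numt_bp=8_400, nuclear_genome_bp=150_000_000)

print(f"high-density example: {high_density:.3f} bp/kb")
print(f"low-density example:  {low_density:.3f} bp/kb")
```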

Adaptive Evolution of Metabolic Pathways in Drosophila

Thursday, January 1st, 1970

The adaptive significance of enzyme variation has been of central interest in population genetics. Yet, how natural selection operates on enzymes in the larger context of biochemical pathways has not been broadly explored. A basic expectation is that natural selection on metabolic phenotypes will target enzymes that control metabolic flux, but how adaptive variation is distributed among enzymes in metabolic networks is poorly understood. Here, we use population genetic methods to identify enzymes responding to adaptive selection in the pathways of central metabolism in Drosophila melanogaster and Drosophila simulans. We report polymorphism and divergence data for 17 genes that encode enzymes of 5 metabolic pathways that converge at glucose-6-phosphate (G6P). Deviations from neutral expectations were observed at 5 loci. Of the 10 genes that encode the enzymes of glycolysis, only aldolase (Ald) deviated from neutrality. The other 4 genes that were inconsistent with neutral evolution (glucose-6-phosphate dehydrogenase [G6pd], phosphoglucomutase [Pgm], trehalose-6-phosphate synthetase [Tps1], and glucose-6-phosphatase [G6pase]) encode G6P branch point enzymes that catalyze reactions at the entry point to the pentose-phosphate, glycogenic, trehalose synthesis, and gluconeogenic pathways. We reconcile these results with population genetics theory and existing arguments on metabolic regulation and propose that the incidence of adaptive selection in this system is related to the distribution of flux control. The data suggest that adaptive evolution of G6P branch point enzymes may have special significance in metabolic adaptation.
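
The abstract does not say which neutrality tests were applied to the polymorphism and divergence data. One widely used test on data of this kind is the McDonald-Kreitman test, sketched below in Python as an assumption rather than as the authors' procedure. It compares replacement and synonymous counts within species (polymorphism) and between species (fixed differences) in a 2 x 2 table, here evaluated with a two-sided Fisher exact test. The function names and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


def neutrality_index(poly_repl, poly_syn, fixed_repl, fixed_syn):
    """Neutrality index (Pn/Ps) / (Dn/Ds); values below 1 indicate an
    excess of replacement divergence relative to polymorphism."""
    return (poly_repl / poly_syn) / (fixed_repl / fixed_syn)


# Hypothetical counts for one gene (not data from the paper).
fixed_repl, fixed_syn = 7, 17   # fixed differences between species
poly_repl, poly_syn = 2, 42     # polymorphisms within species

p_value = fisher_exact_two_sided(fixed_repl, fixed_syn, poly_repl, poly_syn)
ni = neutrality_index(poly_repl, poly_syn, fixed_repl, fixed_syn)
print(f"Fisher exact p = {p_value:.4f}, neutrality index = {ni:.3f}")
```

Under neutrality the ratio of replacement to synonymous changes is expected to be similar within and between species, so a small p-value together with a neutrality index below 1 is conventionally read as a signal of adaptive fixation.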

The Yersinia kristensenii O11 O-Antigen Gene Cluster was Acquired by Lateral Gene Transfer and Incorporated at a Novel Chromosomal Locus

Thursday, January 1st, 1970

We have sequenced the O-antigen gene clusters for the Escherichia coli O98 and Yersinia kristensenii O11 O antigens. The basic structures of these O antigens are identical, and the sequence data indicate that Y. kristensenii O11 gained its O-antigen gene cluster by lateral gene transfer (LGT). Escherichia coli O98 has a typical O-antigen gene cluster between galF and gnd, as is usual in E. coli. However, the O-antigen gene cluster of Y. kristensenii O11 is not located at the traditional Yersinia O-antigen gene cluster locus, between hemH and gsk, but at a novel chromosomal locus between aroA and cmk, where it is flanked by remnant galF and gnd genes that indicate the probable source of the gene cluster. Phylogenetic analysis indicated that the source was not E. coli itself but a species in the Escherichia, Salmonella, and Klebsiella group of genera. Although other O-antigen studies imply LGT on the basis of the hypervariability of the loci and GC content, this report also identifies a potential donor and provides evidence for the mechanism involved. Remnant insertion sequence (IS) elements flank the galF and gnd remnants and suggest that LGT of the gene cluster was IS mediated.

Chloroplast Genome (cpDNA) of Cycas taitungensis and 56 cp Protein-Coding Genes of Gnetum parvifolium: Insights into cpDNA Evolution and Phylogeny of Extant Seed Plants

Thursday, January 1st, 1970

Phylogenetic relationships among the 5 groups of extant seed plants are presently unsettled. To reexamine this long-standing debate, we determined the complete chloroplast genome (cpDNA) of Cycas taitungensis and 56 protein-coding genes encoded in the cpDNA of Gnetum parvifolium. The cpDNA of Cycas is a circular molecule of 163,403 bp with 2 typical large inverted repeats (IRs) of 25,074 bp each. We inferred phylogenetic relationships among major seed plant lineages using 56 concatenated protein-coding genes in 37 land plants. Phylogenies, generated by the use of 3 independent methods, provide concordant and robust support for the monophylies of extant seed plants, gymnosperms, and angiosperms. Within the modern gymnosperms are 2 highly supported sister clades: Cycas–Ginkgo and Gnetum–Pinus. This result agrees with both the "gnetifer" and "gnepines" hypotheses. The sister relationships in the Cycas–Ginkgo and Gnetum–Pinus clades are further reinforced by cpDNA structural evidence. Branch lengths of Cycas–Ginkgo and Gnetum were consistently the shortest and the longest, respectively, in all separate analyses. However, the Gnetum relative rate test revealed this tendency only for the 3rd codon positions and the transversional sites of the first 2 codon positions. A tufA located between the psbE and petL genes is detected here for the first time in Anthoceros (a hornwort), cycads, and Ginkgo. We demonstrate that the tufA is a footprint descended from the chloroplast tufA of green algae. The duplication of ycf2 genes and their shift into IRs should have taken place at least in the common ancestor of seed plants more than 300 MYA, and the tRNAPro-GGG gene was lost from the angiosperm lineage at least 150 MYA. Additionally, from cpDNA structural comparison, we propose an alternative model for the loss of large IR regions in black pine.
More cpDNA data from non-Pinaceae conifers are necessary to determine whether the gnetifer or gnepines hypothesis is valid and to generate solid structural evidence for the monophyly of extant gymnosperms.