Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/139910
Title

Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability.

Authors: Arenas, Miguel; Sánchez-Cobos, Agustin; Bastolla, Ugo
Keywords: folding stability; maximum-likelihood estimate; structurally constrained substitution models; misfolded state
Issue Date: 2-Apr-2015
Publisher: Oxford University Press
Citation: Molecular Biology and Evolution 32: 2195-2207 (2015)
Abstract: Despite intense work, incorporating constraints on protein native structures into the mathematical models of molecular evolution remains difficult, because most models and programs assume that protein sites evolve independently, whereas protein stability is maintained by interactions between sites. Here, we address this problem by developing a new mean-field substitution model that generates independent site-specific amino acid distributions with constraints on the stability of the native state against both unfolding and misfolding. The model depends on a background distribution of amino acids and one selection parameter that we fix maximizing the likelihood of the observed protein sequence. The analytic solution of the model shows that the main determinant of the site-specific distributions is the number of native contacts of the site and that the most variable sites are those with an intermediate number of native contacts. The mean-field models obtained, taking into account misfolded conformations, yield larger likelihood than models that only consider the native state, because their average hydrophobicity is more realistic, and they produce on the average stable sequences for most proteins. We evaluated the mean-field model with respect to empirical substitution models on 12 test data sets of different protein families. In all cases, the observed site-specific sequence profiles presented smaller Kullback–Leibler divergence from the mean-field distributions than from the empirical substitution model. Next, we obtained substitution rates combining the mean-field frequencies with an empirical substitution model. The resulting mean-field substitution model assigns larger likelihood than the empirical model to all studied families when we consider sequences with identity larger than 0.35, plausibly a condition that enforces conservation of the native structure across the family.
We found that the mean-field model performs better than other structurally constrained models with similar or higher complexity. With respect to the much more complex model recently developed by Bordner and Mittelmann, which takes into account pairwise terms in the amino acid distributions and also optimizes the exchangeability matrix, our model performed worse for data with small sequence divergence but better for data with larger sequence divergence. The mean-field model has been implemented into the computer program Prot_Evol that is freely available at http://ub.cbm.uam.es/software/Prot_Evol.php
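The abstract compares observed site-specific sequence profiles with model distributions by their Kullback–Leibler divergence. The following is a minimal sketch of that kind of comparison, not code from Prot_Evol: the function names `kl_divergence` and `mean_site_kl`, the four-letter toy alphabet, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return D_KL(p || q) in nats for two discrete distributions.

    p and q are sequences of probabilities over the same alphabet.
    Terms with p[i] == 0 contribute nothing; q[i] is floored at eps
    (an assumption made here to avoid division by zero).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log(pi / max(qi, eps))
    return total


def mean_site_kl(observed_profiles, model_profiles):
    """Average the per-site divergence over all sites of an alignment."""
    if len(observed_profiles) != len(model_profiles):
        raise ValueError("need one model distribution per site")
    per_site = [
        kl_divergence(obs, mod)
        for obs, mod in zip(observed_profiles, model_profiles)
    ]
    return sum(per_site) / len(per_site)


# Hypothetical toy data: two sites, four-letter alphabet.
observed = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
# Hypothetical site-specific model: a different distribution at each site.
site_specific = [
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
# Hypothetical site-independent model: one shared background distribution.
background = [0.25, 0.25, 0.25, 0.25]

print("site-specific model:", mean_site_kl(observed, site_specific))
print("shared background:  ", mean_site_kl(observed, [background] * len(observed)))
```

A smaller mean divergence means the model distributions sit closer to the observed profiles. In this toy case the site-specific model wins only because its numbers were picked to resemble the observed ones; it says nothing about the results reported in the paper.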
URI: http://hdl.handle.net/10261/139910
DOI: 10.1093/molbev/msv085
Identifiers: doi: 10.1093/molbev/msv085; issn: 0737-4038
Appears in Collections: (CBM) Artículos
Files in This Item: Bastolla U Maximum Likelihood.pdf (457,03 kB, Adobe PDF)
WARNING: Items in Digital.CSIC are protected by copyright, with all rights reserved, unless otherwise indicated.