PHYML is software that implements a fast and accurate heuristic for estimating maximum likelihood phylogenies from DNA and protein sequences. The tool provides a number of options, e.g. nonparametric bootstrap and estimation of various evolutionary parameters, so that comprehensive phylogenetic analyses can be performed on large datasets in reasonable computing time.
An option is available to assess the reliability of internal branches using nonparametric bootstrap, which is feasible even for large datasets thanks to the speed of the PHYML optimization algorithm. The number of bootstrap replicates is set by the user. The bootstrap values are displayed on the maximum likelihood phylogeny estimated from the original dataset. Trees estimated from each bootstrap replicate, as well as the corresponding substitution parameters, can also be saved in separate files for further analysis, e.g. computation of confidence intervals for the substitution parameters or estimation of a consensus bootstrap tree, as performed by PHYLIP's CONSENSE.
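The resampling step behind a nonparametric bootstrap can be sketched as follows. This is a minimal illustration, not PHYML's own code: it assumes the alignment is held as a dictionary mapping taxon names to equal-length sequence strings, and the function names `bootstrap_alignment` and `bootstrap_replicates` are hypothetical. Each replicate is built by drawing alignment columns with replacement; a tree would then be estimated from every replicate and the frequency of each internal branch across those trees reported as its bootstrap value.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement.

    `alignment` maps taxon name -> aligned sequence (assumed layout).
    """
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    if any(len(alignment[name]) != n_sites for name in names):
        raise ValueError("all sequences must have the same length")
    # Draw n_sites column indices; the same column may be drawn several times.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same column draw to every taxon so sites stay aligned.
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Return a list of `n_replicates` resampled alignments."""
    rng = random.Random(seed)  # fixed seed makes the replicates reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGTAC"}
    for replicate in bootstrap_replicates(toy, n_replicates=3):
        print(replicate)
```

Resampling whole columns, rather than individual characters, keeps the homology between taxa intact at every site, which is what makes each replicate a valid pseudo-dataset for tree estimation.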
The PHYML software is implemented in ANSI C and is available under the GNU General Public License. Sources are available upon request. Binaries, example datasets, sources and documentation are distributed free of charge for academic purposes only.
PhyML has two distinct user interfaces. The first is a PHYLIP-like text interface that makes the choice of options self-explanatory. The second is a command-line interface that is well suited to users who are familiar with PhyML options or who want to run PhyML in batch mode.
The main tool in this module builds phylogenies under the maximum likelihood criterion. It implements a large number of substitution models coupled with efficient options for searching the space of phylogenetic tree topologies. PhyTime is another tool in the PhyML package that focuses on divergence date estimation in a Bayesian setting. The main strengths of PhyTime lie in its ability to accommodate uncertainty in the placement of fossil calibrations and in its use of realistic models of rate variation along the tree.
Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology, providing a simple, fast and accurate way to estimate large phylogenies by maximum likelihood. PhyML has been widely used because of its simplicity and its fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, a new algorithm has been developed to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, the article describes a new test to assess the support of the data for internal branches of a phylogeny.
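The idea of using parsimony as a cheap filter before the more expensive likelihood evaluation can be illustrated with a short sketch. It is an assumption-laden toy, not PhyML's implementation: trees are rooted binary trees written as nested tuples of taxon names, parsimony is scored with the standard Fitch algorithm, and the helper names `fitch_site`, `parsimony_score` and `filter_candidates` are hypothetical. In a real search the candidates would be the topologies produced by subtree pruning and regrafting moves, and only the survivors of the filter would be passed on to likelihood optimization.

```python
def fitch_site(tree, states):
    """Fitch pass for one site: return (possible root states, change count)."""
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_changes + right_changes
    # Children disagree: take the union and count one substitution.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


def filter_candidates(candidates, alignment, keep):
    """Keep the `keep` candidate topologies with the lowest parsimony score."""
    ranked = sorted(candidates, key=lambda t: parsimony_score(t, alignment))
    return ranked[:keep]


if __name__ == "__main__":
    aln = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAT"}
    trees = [(("A", "C"), ("B", "D")), (("A", "B"), ("C", "D"))]
    for t in trees:
        print(t, parsimony_score(t, aln))
    print("kept:", filter_candidates(trees, aln, keep=1))
```

Parsimony scores are much cheaper to compute than likelihoods, so ranking candidate moves this way lets the search discard most unpromising rearrangements before any branch lengths have to be optimized.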