Empirical Numerical Properties of Maximum Likelihood Phylogenetic Inference

  • Subject: Phylogenetic Inference, Efficient Evolutionary Bioinformatics
  • Type: Master's thesis
  • Date: October 2020
  • Supervisor: Lukas Hübner, Alexandros Stamatakis
  • Student: Julia Haag
  • Links: PDF

Phylogenetic trees represent hypothetical evolutionary relationships between organisms. One approach for inferring phylogenetic trees is the Maximum Likelihood (ML) method, which relies on numerical optimization routines that use internal numerical thresholds. We analyze the influence of these thresholds on the likelihood scores and runtimes of tree inferences for the ML inference tools RAxML-NG, IQ-Tree, and FastTree. Using 22 empirical datasets, we show that tree inference in RAxML-NG and IQ-Tree can be sped up by changing the default values of two such numerical thresholds. Using 15 additional simulated datasets, we show that these changes do not affect the accuracy of the inferred phylogenetic trees. For RAxML-NG, increasing the likelihood thresholds lh_epsilon and spr_lh_epsilon to 10 and 10^3, respectively, results in an average speedup of 1.9 ± 0.6. For IQ-Tree, increasing the likelihood threshold lh_epsilon results in an average speedup of 1.3 ± 0.4.

In addition to the numerical analysis, we attempt to predict the difficulty of datasets, with the aim of avoiding an unnecessarily large number of tree inferences for datasets that are easy to analyze. We present our prediction experiments and discuss why this task proved more challenging than anticipated.
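
To illustrate the kind of threshold studied here, the following is a minimal sketch of a likelihood-epsilon stopping rule: an iterative optimizer stops as soon as one round improves the score by less than the threshold. It is not code from RAxML-NG or IQ-Tree. The function names, the quadratic stand-in for a log-likelihood, the gradient step, and the tight value of 0.1 are all assumptions made for illustration; only the parameter name lh_epsilon and the value 10 are taken from the abstract.

```python
from typing import Callable, Dict, Tuple

OPTIMUM = 3.0  # hypothetical location of the maximum of the toy score


def toy_log_likelihood(x: float) -> float:
    """Concave stand-in for a log-likelihood (NOT a phylogenetic likelihood)."""
    return -1000.0 * (x - OPTIMUM) ** 2


def gradient_step(x: float, learning_rate: float = 1e-4) -> float:
    """One plain gradient-ascent step on the toy score."""
    gradient = -2000.0 * (x - OPTIMUM)
    return x + learning_rate * gradient


def optimize_until_converged(
    score: Callable[[float], float],
    step: Callable[[float], float],
    x0: float,
    lh_epsilon: float,
    max_rounds: int = 10_000,
) -> Tuple[float, float, int]:
    """Repeat `step` until one round gains less than `lh_epsilon`.

    Returns (best parameter, best score, number of rounds performed).
    """
    x, current = x0, score(x0)
    for rounds in range(1, max_rounds + 1):
        candidate = step(x)
        candidate_score = score(candidate)
        gain = candidate_score - current
        if gain > 0:  # keep only improving moves
            x, current = candidate, candidate_score
        if gain < lh_epsilon:  # improvement too small: declare convergence
            return x, current, rounds
    return x, current, max_rounds


# Compare a tight threshold with a looser one on the same toy problem.
results: Dict[float, Tuple[float, float, int]] = {}
for eps in (0.1, 10.0):
    results[eps] = optimize_until_converged(
        toy_log_likelihood, gradient_step, x0=0.0, lh_epsilon=eps
    )
    _, final_score, rounds = results[eps]
    print(f"lh_epsilon={eps}: {rounds} rounds, final score {final_score:.4f}")
```

In this toy setting the looser threshold stops after fewer rounds and ends at a slightly lower score. That is the runtime versus likelihood trade-off the thesis measures on real inference tools, where the effect on the final tree has to be checked empirically.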