Frequently Asked Questions

Large clades

Q: I have really large clades (say 1000+ species); is there any way that I can use the "unresolved clade BiSSE" with my tree?

I don't think that the approach that I use for the unresolved clades will scale to any more than about 200 species (even at that limit, it can be pretty poorly behaved). There are a couple of reasons for this:

First, the time requirements grow exponentially (from memory, space grows something like n^4, which is pretty nasty, too). Basically, this means that even if you could find the computational power to do a clade of size n, a clade of size n+1 could take a thousand years, perhaps. This is the bane of a huge number of computational problems.

Second, and more subtly, there is a machine precision issue. Floating point arithmetic can be unreliable for numbers smaller than about 1e-8 relative to the quantities they are combined with, and totally useless for numbers smaller than 1e-15 (try (1 + 1e-16) - 1 in R on most platforms to see what I mean). Because you are spreading a single unit of probability over an increasingly large space, a huge number of the cells (almost certainly including the one that you end up caring about) will be these really unreliable numbers. I've seen this creep in for clades that are smaller than 200 species where there are moderate extinction rates.
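The precision limit mentioned above can be checked directly in base R. This is only a small illustration of double-precision rounding, not the calculation that diversitree performs; the variable names are made up for the example.

```r
# Adding 1e-16 to 1 is below the resolution of a double near 1,
# so the addition is lost and the difference comes back as zero
# (on most platforms).
lost <- (1 + 1e-16) - 1
print(lost)

# Machine epsilon: the smallest x for which 1 + x is distinguishable from 1.
eps <- .Machine$double.eps
print(eps)

# The limit is relative, not absolute: a tiny number is fine on its own,
# but is lost when added to something about 1e17 times larger.
tiny_alone <- 1e-37 > 0
tiny_added <- (1e-20 + 1e-37) - 1e-20
print(c(tiny_alone = tiny_alone, tiny_added_survives = tiny_added != 0))
```

When one unit of probability is spread over a very large number of cells, most cells end up many orders of magnitude smaller than the largest ones, which is the situation in which this kind of rounding starts to matter.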

Positive log-likelihood values

Q: I am getting positive log likelihood values! I thought that log likelihoods had to be negative - is this a problem with my tree or with diversitree?

This is not a problem. The likelihood is only proportional to the probability of observing the data, up to some unknown normalising constant, so it can be greater than 1 and the log likelihood can be positive.

For BiSSE-style models, positive log likelihoods generally arise in trees that have a very short root-tip distance. This means that, per unit of tree time, the speciation rates must be very large (on the order of log(N)/t for a tree with N tips and root-tip distance t). At each node, the conditional likelihoods are multiplied by the speciation rate, so over the N-1 internal nodes there is a multiplication by λ^(N-1).
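To get a feel for the size of this term, here is a rough back-of-the-envelope calculation in R. The tree size and depth are made-up values, and this is only the contribution of the node multiplications, not a full BiSSE likelihood.

```r
# Hypothetical tree: N tips and a short root-tip distance (arbitrary units).
N <- 100
depth <- 0.01

# Rough speciation rate needed to produce N tips over that depth.
lambda <- log(N) / depth

# Log of the node term: one multiplication by lambda at each of the
# N - 1 internal nodes of a bifurcating tree.
log_node_term <- (N - 1) * log(lambda)
print(c(lambda = lambda, log_node_term = log_node_term))
```

With a rate this large, the node term alone adds several hundred log units, which can easily outweigh the negative contributions from the rest of the calculation.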

If these bother you, just multiply the branch lengths of your tree:

         phy$edge.length <- phy$edge.length * 100
    

The estimated rates will now be a factor of 100 smaller, and the log likelihoods will probably be negative.
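If you rescale, remember that the rates are then expressed per rescaled time unit. A minimal sketch of converting them back, using a plain list as a stand-in for the tree and made-up rate estimates (in practice phy would be your ape tree and the rates would come from your fit):

```r
scale <- 100

# Stand-in for a tree object: only the edge.length element matters here.
phy <- list(edge.length = c(0.002, 0.005, 0.001, 0.004))
phy$edge.length <- phy$edge.length * scale

# Hypothetical rates estimated on the rescaled tree.
rates_rescaled <- c(lambda = 4.6, mu = 0.5)

# Rates are per unit time, so multiply by the same factor to express
# them in the original time units.
rates_original <- rates_rescaled * scale
print(rates_original)
```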

deSolve version

Q: I get this warning message when running diversitree:

         Warning message:
         In make.ode(derivs, dll, initfunc, neq, safe) :
         diversitree is not known to work with deSolve>   X.XX.X
         falling back on slow version

It does seem to run slow. Is there a solution to this, and is this why it is running slow?

This warning appears whenever the installed deSolve version is more recent than the versions that the current diversitree is known to work with, and yes, it is why things are running slowly.

deSolve's ode solvers look up the memory address of the derivative functions every time they are evaluated. This is a nontrivial operation, and happens for every branch on a tree -- thousands of times during an ML search or MCMC chain. To get around this, I use non-exported Fortran functions in deSolve directly, and remember the address of the derivative functions after the first lookup. However, if the definitions of these functions change, R will crash (not an error, but a complete crash). When the deSolve version is not known to work, this caching is skipped, and the calculations slow down.

To work around this problem, there are two options:

  1. Downgrade to the previous deSolve version indicated in the warning message. Older versions can be installed from this source archive.
  2. For Linux and OS X users, pass method="CVODES" to the make.xxx function to use a different backend. This does not work on Windows.
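Before downgrading, it can help to confirm which deSolve version is actually installed. The helper below is a hypothetical convenience function, not part of diversitree or deSolve; it only uses base R. The commented-out line shows one possible way to install an older version, assuming the remotes package is available, with the version left as the placeholder from the warning message.

```r
# Return the installed deSolve version as a string, or NA if deSolve
# is not installed. (Hypothetical helper; not part of diversitree.)
installed_desolve_version <- function() {
  if (requireNamespace("deSolve", quietly = TRUE)) {
    as.character(utils::packageVersion("deSolve"))
  } else {
    NA_character_
  }
}

desolve_version <- installed_desolve_version()
print(desolve_version)

# One possible way to downgrade, assuming the remotes package is installed.
# Replace the placeholder with the version named in the warning message:
# remotes::install_version("deSolve", version = "X.XX.X")
```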