mcmc {diversitree}    R Documentation

Simple Markov Chain Monte Carlo with Slice Sampling

Description

Run a simple-minded MCMC using slice sampling (Neal 2003) for independent updating of each variable.

Usage

mcmc(lik, x.init, nsteps, ...)
## Default S3 method:
mcmc(lik, x.init, nsteps, w, prior=NULL,
             sampler=sampler.slice, fail.value=-Inf, lower=-Inf,
             upper=Inf, print.every=1, control=list(),
             save.every=0, save.file, ...)

sampler.slice(lik, x.init, y.init, w, lower, upper, control)
sampler.norm(lik, x.init, y.init, w, lower, upper, control)

Arguments

lik

Likelihood function to run MCMC on. This must return the log likelihood (or the log of a value proportional to the likelihood).

x.init

Initial parameter location (vector).

nsteps

Number of MCMC steps to take.

w

Tuning parameter for the sampler. See Details below for more information.

prior

An optional prior probability distribution function. This must be a function that returns the log prior probability, given a parameter vector. See make.prior for more information. If no prior is given, unbounded (and therefore “improper”) priors are used for all parameters, which can cause the MCMC to fail in some situations.

sampler

Sampler to use for the MCMC. Only two are currently implemented: sampler.slice (the default, and generally recommended) and sampler.norm (Gaussian updates, provided mostly for illustrative purposes).

lower

Lower bounds on parameter space (scalar or vector of same length as x.init).

upper

Upper bounds on parameter space (scalar or vector of same length as x.init).

fail.value

Value to use where function evaluation fails. The default (negative infinity) corresponds to zero probability. Most values that fail are invalid for a given model (negative rates, etc.) or have negligible probability, so this is reasonable. Set to NULL to turn off checking.

print.every

The position and its probability will be printed every print.every generations. Set this to 0 to disable printing.

control

List with additional control parameters for the sampler. Not currently used.

save.every

Number of steps between saves of progress into save.file. By default this is 0, which disables saving. Low nonzero values will slow things down, but may be useful during long runs.

save.file

Name of csv file to save temporary output in. Contents will be rewritten at each save.

...

Arguments passed to the function lik.

y.init

Likelihood evaluated at x.init.
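
As a rough sketch of how several of these optional arguments fit together (purely illustrative: it assumes a likelihood function lik and starting point x.init are already defined, and uses make.prior.exponential from this package with an arbitrary rate; none of these particular values are recommendations):

prior <- make.prior.exponential(0.5)   # log prior; the rate 0.5 is arbitrary here
samples <- mcmc(lik, x.init, nsteps = 1000, w = 0.1,
                prior = prior,         # log prior probability function
                lower = 0,             # hard lower bound for every parameter
                print.every = 100,     # report progress every 100 steps
                save.every = 100,      # write progress every 100 steps...
                save.file = "mcmc-progress.csv")  # ...into this csv file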

Details

There are two samplers implemented: a slice sampler (Neal 2003) and a basic Gaussian sampler. In general, only the slice sampler should be used; the Gaussian sampler is provided for illustration and as a starting point for future samplers.

For slice sampling (sampler.slice), the tuning parameter w affects how many function evaluations are required between sample updates, but in almost all cases it does not affect how fast the MCMC “mixes” (Neal 2003). In particular, w is not analogous to the step sizes used in conventional Metropolis-Hastings updaters that use some fixed kernel for updates (see below). Ideally, w would be set to approximately the width of the high probability region. I find that choosing the distance between the 5% and 95% quantiles of the marginal distribution of each parameter, computed from a preliminary set of samples, works well (see Examples).

For the Gaussian updates (sampler.norm), the tuning parameter w is the standard deviation of the normal distribution centred on each parameter as it is updated.

For both samplers, if a single value of w is given, it is shared across all parameters. If a vector is given, it must be the same length as x.init, and parameter i will use w[i].
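
To make concrete why w matters so much for the Gaussian updates, here is a stand-alone sketch of a single random-walk Metropolis step with a Gaussian proposal of standard deviation w. This is only an illustration of the idea (and it updates all parameters jointly, whereas mcmc updates each variable independently); it is not the code used by sampler.norm:

gaussian.update <- function(lik, x, y, w) {
    ## Propose a new point from a normal centred on the current point
    x.new <- rnorm(length(x), mean = x, sd = w)
    y.new <- lik(x.new)
    ## Accept with probability min(1, exp(y.new - y)), working on the log scale
    if (log(runif(1)) < y.new - y)
        list(x = x.new, y = y.new)   # accepted: move to the new point
    else
        list(x = x, y = y)           # rejected: stay put
}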

If the MCMC is stopped by an interrupt (Escape on GUI versions of R, Control-C on the command-line version), it will return a truncated chain with as many points as were completed so far.

This is far from the most efficient MCMC function possible, as it was designed to work with likelihood functions that are relatively expensive to compute. The overhead for 10,000 slice samples is on the order of 5s on a 2008 Mac Pro (0.0005 s / sample).

The sampler functions sampler.norm and sampler.slice should not generally be called directly (though this is possible); they exist to be passed in to mcmc.

Author(s)

Richard G. FitzJohn

References

Neal R.M. 2003. Slice sampling. Annals of Statistics 31:705-767.

See Also

make.bd, make.bisse, make.geosse, and make.mkn, all of which provide likelihood functions that are suitable for use with this function. The help page for make.bd has further examples of using MCMC, and make.bisse has examples of using priors with MCMC.

Examples

To demonstrate, start with a simple bivariate normal. The function 'make.mvn' creates a likelihood function for the multivariate normal distribution given 'mean' (a vector) and 'vcv' (the variance-covariance matrix). This is based on dmvnorm in the package mvtnorm, but will be faster where the vcv does not change between calls.

make.mvn <- function(mean, vcv) {
    ## Precompute the constant terms and the inverse of the vcv matrix
    logdet <- as.numeric(determinant(vcv, TRUE)$modulus)
    tmp <- length(mean) * log(2 * pi) + logdet
    vcv.i <- solve(vcv)

    ## Return the log density of the multivariate normal at x
    function(x) {
        dx <- x - mean
        -(tmp + rowSums((dx %*% vcv.i) * dx))/2
    }
}

Our target distribution has mean 0, and a VCV with positive covariance between the two parameters.

vcv <- matrix(c(1, 0.25, 0.25, 0.75), 2, 2)
lik <- make.mvn(c(0, 0), vcv)
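
As a quick sanity check (not part of the original example; it requires the mvtnorm package), lik should agree with dmvnorm at any point, for example the origin:

lik(c(0, 0))
mvtnorm::dmvnorm(c(0, 0), mean = c(0, 0), sigma = vcv, log = TRUE)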

Sample 10,000 points from the distribution, starting at c(0, 0).

set.seed(1)
samples <- mcmc(lik, c(0, 0), 10000, 1, print.every = 1000)
## 1000: {-0.2770, -0.1155} -> -1.69046
## 2000: {-0.2669, -0.6291} -> -1.91619
## 3000: {-0.9096, -1.1534} -> -2.68786
## 4000: {-0.5028, -0.5651} -> -1.91738
## 5000: {0.5887, 0.2555} -> -1.83234
## 6000: {-1.6867, -0.2918} -> -3.08529
## 7000: {0.8322, 0.6029} -> -2.11014
## 8000: {-0.0344, 0.8899} -> -2.23830
## 9000: {-1.2724, -0.4611} -> -2.47496
## 10000: {-0.2465, 1.6853} -> -3.90028
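
The result is a data frame; a quick look shows a step index, the two parameter columns (V1 and V2, used below as samples[2:3]), and the log probability p (output not shown here):

head(samples)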

The marginal distribution of V1 (the first axis of the distribution) should be a normal distribution with mean 0 and variance 1:

curve(dnorm, xlim = range(samples$V1), ylim = c(0, 0.5), col = "red")
hist(samples$V1, 30, add = TRUE, freq = FALSE)

plot(V2 ~ V1, samples, pch = 19, cex = 0.2, col = "#00000055", asp = 1)

The estimated variance here matches nicely with the true VCV:

var(samples[2:3])
##        V1     V2
## V1 1.0025 0.2362
## V2 0.2362 0.7323
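
The sample means should likewise be close to the true mean of zero (values not shown; they will vary with the seed):

colMeans(samples[2:3])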

The above uses slice sampling. We can use simple Gaussian updates instead. This performs updates with standard deviation '1' in each direction. Unlike slice sampling, the 'w' parameter here will matter a great deal in determining how fast the chain will mix.

samples.norm <- mcmc(lik, c(0, 0), 10000, 1, print.every = 1000, 
    sampler = sampler.norm)
## 1000: {-0.4445, -0.1761} -> -1.75240
## 2000: {-1.6756, -2.2118} -> -5.39205
## 3000: {1.3478, 0.5958} -> -2.60753
## 4000: {1.7179, -0.0141} -> -3.26917
## 5000: {-0.7712, -0.1732} -> -1.94816
## 6000: {0.6816, 1.4551} -> -3.08319
## 7000: {1.8971, 0.5115} -> -3.45100
## 8000: {0.1182, -0.8298} -> -2.19456
## 9000: {0.7103, -0.6993} -> -2.46191
## 10000: {-0.4434, -0.9854} -> -2.30514

This appears to run much faster than the slice-sampling-based approach above, but the effective sample size of the Gaussian run is much lower. The 'effectiveSize' function in the coda package reports that for 10,000 samples using slice sampling the effective sample size (the equivalent number of independent samples) is about 8,500, but for the Gaussian updates it is only about 1,200. This can be seen by comparing the autocorrelation of samples from the two runs.
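
Those effective sample sizes can be computed directly; this snippet (not part of the original example) requires the coda package, and the exact numbers will vary with the seed:

library(coda)
effectiveSize(as.mcmc(as.matrix(samples[2:3])))       # roughly 8,500 (slice sampling)
effectiveSize(as.mcmc(as.matrix(samples.norm[2:3])))  # roughly 1,200 (Gaussian updates)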

op <- par(oma = c(0, 0, 2, 0))
acf(samples[2:3])
title(main = "Slice sampling", outer = TRUE)


acf(samples.norm[2:3])
title(main = "Gaussian updates", outer = TRUE)

The autocorrelation is negligible after just 2 lags under slice sampling, but remains appreciable out to about 15 lags with Gaussian updates.

[This section is not run by default] Next, a diversitree likelihood example, using a 203-species phylogeny simulated under the BiSSE model. This takes substantially longer to run, so it is not evaluated by default.

pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01)
set.seed(2)
phy <- tree.bisse(pars, max.t = 60, x0 = 0)

First, create a likelihood function:

lik <- make.bisse(phy, phy$tip.state)
lik(pars)

This produces about one sample per second, so it takes a while. The upper limit is a hard bound above which the sampler will never let a parameter go (in effect putting a uniform prior on the range lower..upper, and returning the joint distribution conditional on the parameters lying in this range).

tmp <- mcmc(lik, pars, nsteps = 100, w = 0.1)
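
The bounds are passed directly to mcmc; for instance (illustrative values only, not part of the original example), restricting every parameter to the range 0..1 in effect places a uniform prior over that box:

tmp <- mcmc(lik, pars, nsteps = 100, w = 0.1, lower = 0, upper = 1)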

The argument 'w' works best when it is about the width of the "high probability" region for that parameter. Here we take the width of the 90% quantile range (between the 5% and 95% quantiles) of the preliminary samples, which should work better than the first guess. Samples are generated at about one per second; allow 15 minutes to generate 1000 samples.

w <- diff(sapply(tmp[2:7], quantile, c(0.05, 0.95)))
out <- mcmc(lik, pars, nsteps = 1000, w = w)

You can do several things with this. Look for candidate ML points:

out[which.max(out$p), ]

Or look at the marginal distribution of parameters

profiles.plot(out["lambda0"], col.line = "red")

Or look at the joint marginal distribution of pairs of parameters

plot(lambda0 ~ mu0, out)

[Ends not run section]


[Package diversitree version 0.9-4]