Forecasts
Here are my latest forecasts, made on 2016-02-03:
Quantity | Best fit | Lower confidence bound | Upper confidence bound |
---|---|---|---|
Outbreak expected to end | 2016-10-19 | 2016-09-27 | 2016-11-10 |
Final number of cases | 28,400 | 26,400 | 30,500 |
Final number of deaths | 11,400 | 10,500 | 12,300 |
I've been running my model out almost every day since January 2015 to see how consistent the forecasts are. Forecasts change as I make changes to the model, as new data come in, and just due to chance when curve fitting. Here are my historical forecasts:
I try to forecast three things:
- The date the epidemic will be declared over (end date),
- The final number of cases, and
- The final number of deaths.
End date
My model uses differential equations to describe the progress of the disease. That means the cumulative numbers of cases and deaths are approximated as real numbers, and they keep increasing indefinitely as the epidemic winds down. In reality, of course, the number of cases or deaths can only be a whole number. To determine when a real epidemic ends, you just look for the last time the number of cases increased. But that won't work in this kind of model, where the numbers keep increasing (just slower and slower). So, how do I forecast the end date of the epidemic? I use the same criterion as the World Health Organization: 42 days without a new case. For my model, that just means I take the end date to be when the rate of new cases drops below 1 every 42 days, or $dC/dt < 1/42$, where $C$ is the cumulative number of cases and $t$ is time in days.
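To make the end-date criterion concrete, here is a minimal Python sketch. It is not my actual model: it assumes a plain logistic curve for the cumulative cases, and the parameters `K`, `R`, and `T_PEAK` are made-up values chosen only for illustration. The only part taken from the text above is the threshold $dC/dt < 1/42$.

```python
import math

# Hypothetical logistic parameters, chosen only for illustration.
K = 28_400.0     # assumed final number of cases
R = 0.03         # assumed growth rate (per day)
T_PEAK = 300.0   # assumed day on which new cases peak

THRESHOLD = 1.0 / 42.0  # one new case per 42 days


def cumulative_cases(t):
    """Cumulative cases C(t) under the assumed logistic curve."""
    return K / (1.0 + math.exp(-R * (t - T_PEAK)))


def case_rate(t):
    """Rate of new cases dC/dt = R * C * (1 - C / K)."""
    c = cumulative_cases(t)
    return R * c * (1.0 - c / K)


def end_day(start=T_PEAK, max_days=10_000):
    """First whole day at or after `start` on which dC/dt < 1/42."""
    day = int(math.ceil(start))
    for _ in range(max_days):
        if case_rate(day) < THRESHOLD:
            return day
        day += 1
    raise ValueError("rate never dropped below the threshold")


forecast_day = end_day()
print("Modelled end of epidemic: day", forecast_day)
```

The search starts at the peak because the rate is also tiny at the very start of an epidemic; only the crossing on the way down counts as the end.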
Here's the same graph on a log scale, showing the threshold of 1 new case per 42 days:
Extrapolated forecasts like these rely on some unlikely assumptions so they tend to be unreliable:
- They assume that the underlying model is “good”: that it captures the essence of the problem. George E. P. Box wrote “all models are wrong” to make the point that modeling is a process of “throwing away” details. Hopefully, only the irrelevant details are thrown out, but it is hard to be sure. I can only hope that I haven't thrown away any details that are essential for determining the dynamics of this epidemic.
- They assume that the past is a good predictor of the future; things won't change. But of course they will. People learn and adapt; diseases evolve; environmental conditions change. There are many reasons to expect that the future will be different from the past. Maybe a vaccine will be distributed. Maybe the disease will evolve to become less virulent. It's impossible to anticipate every change that could make a previously good model useless.
- Even if this is a good model and the trend continues, the predicted end date could be wildly wrong just due to chance. As a differential equation model, it only predicts the average behaviour of the system; imagine taking the average of many repeated epidemics. It neglects details about individual events, such as exactly when the last infection occurs. In this way, it's kind of like saying “Because this is a fair coin I predict that on the next toss it will land half heads-up, half tails-up.” (What does that even mean? On its edge?) It is possible to account for these stochastic sources of error by building an agent-based model, but I haven't done that.
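To give a feel for how much chance alone can move the date of the last case, here is a rough Python sketch. It is not an agent-based model and not my model: it assumes the case rate decays exponentially and treats individual cases as random events arriving at that rate (a nonhomogeneous Poisson process, sampled by thinning). The values `R0_RATE`, `DECAY`, and `HORIZON` are hypothetical.

```python
import math
import random

# Hypothetical values, chosen only for illustration.
R0_RATE = 5.0    # assumed case rate at day 0 (cases per day)
DECAY = 0.05     # assumed exponential decay of the rate (per day)
HORIZON = 400.0  # stop simulating after this many days


def deterministic_end_day():
    """Day on which the smooth rate R0_RATE * exp(-DECAY * t) hits 1/42."""
    return math.log(42.0 * R0_RATE) / DECAY


def last_case_day(rng):
    """Simulate one random epidemic tail; return the day of its last case."""
    t = 0.0
    last = 0.0
    while True:
        # Candidate events arrive at the constant maximum rate R0_RATE...
        t += rng.expovariate(R0_RATE)
        if t > HORIZON:
            return last
        # ...and each is kept with probability rate(t) / R0_RATE (thinning).
        if rng.random() < math.exp(-DECAY * t):
            last = t


rng = random.Random(1)  # fixed seed so the run is repeatable
last_days = sorted(last_case_day(rng) for _ in range(500))

print("Smooth-model threshold day:", round(deterministic_end_day(), 1))
print("Last case, 5th percentile: ", round(last_days[25], 1))
print("Last case, median:         ", round(last_days[250], 1))
print("Last case, 95th percentile:", round(last_days[475], 1))
```

Every simulated epidemic here has exactly the same average behaviour, yet the day of the last case differs from run to run. That spread is the part a differential equation model cannot see.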