@MichaelO
TL;DR: Yes, you can forecast stationary data with fast.ai (or any other deep learning API), but for forecasting non-stationary data, especially with variable variance, you do better with Bayesian-based deep learning, for not-so-obvious reasons.
And just FYI, a 30-day forecast on non-stationary data, while possible, comes with an error so large that it is pretty much useless in practice, so you'd better have stationary data or settle for a smaller forecast window.
Long story:
Machine learning is rooted in statistics, and statistics, for the most part, relies on “frequencies” in the sense of which values occur how often in the data. When the data are big, you use a sample distribution to approximate the real frequencies, and that all works pretty well; thus, no prior knowledge of the data is required.
The Bayesian point of view starts with a prior probability which is based on some previous belief or knowledge you already have. However, with each sample you draw from the data, you update that previous belief to approximate reality as closely as possible.
Matthew Stewart points out that the “fundamental difference between the Bayesian and frequentist approach is about where the randomness is present. In the frequentist domain, the data is considered random and the parameters (e.g. mean, variance) are fixed. In the Bayesian domain, the parameters are considered random and the data is fixed.”
With statistical / frequentist deep learning you get just a single value for each parameter out of your estimator (the data are random, the parameters are fixed), but with the Bayesian approach you get a probability distribution over the parameters (the parameters are random, the data are fixed), so you need to integrate over that distribution to obtain the distribution over your data. That makes the math kinda cumbersome and the modeling a bit harder to understand, but that is what you have to deal with whenever complexity increases.
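To make that concrete, here is one common way to write the two views down (my notation, not from any particular source): the frequentist estimator returns a single point estimate, while the Bayesian posterior predictive integrates the likelihood over the whole posterior.

$$
\hat{\theta} = \arg\max_{\theta} \, p(D \mid \theta)
\qquad \text{vs.} \qquad
p(y \mid x, D) = \int p(y \mid x, \theta)\, p(\theta \mid D)\, d\theta
$$

That integral is exactly the part that rarely has a closed form, which is why Bayesian deep learning leans on approximations such as variational inference or Monte Carlo sampling.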
Fundamentally, the parameter frequency in statistics and the parameter probability distribution in the Bayesian view are really two different ways to look at the same data. And that immediately raises an important question:
When would you use statistics-based deep learning and when would you use Bayesian-based deep learning?
Statistics / frequency-based deep learning excels when:
- You have a ton of data. (Law of large numbers)
- A single value for each parameter is sufficient to approximate the underlying function. (Universal approximation theorem)
- There is zero prior knowledge of the data (distribution)
When you think about the implications, it makes perfect sense that NNs excel at image data because, quite often, you have a lot of images, single values for parameters can be learned extremely well, and since an image is really just a 2D grid of numeric RGB values, you have no clue about the data distribution, its properties, or anything else. Luckily, you don’t have to, because of the universal approximation theorem.
Speaking of the forecast problem: whenever you have sufficient data, or you can generate more data with augmentation, an FCN can do remarkably well. I frequently measure a root mean squared percentage error in the high eighties or low nineties with the fabulous tabular learner. However, that only works well with stationary or semi-stationary data.
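For reference, a minimal sketch of what that looks like with the fastai v2 tabular API. The dataframe, column names, and the validation split are hypothetical placeholders, not the Rossmann setup:

```python
from fastai.tabular.all import *
import pandas as pd

# Hypothetical dataframe with engineered time features and a numeric target "sales".
df = pd.read_csv("my_timeseries.csv")

dls = TabularDataLoaders.from_df(
    df,
    procs=[Categorify, FillMissing, Normalize],
    cat_names=["store", "day_of_week", "promo"],       # example categorical columns
    cont_names=["lag_1", "lag_7", "rolling_mean_30"],  # example continuous features
    y_names="sales",
    y_block=RegressionBlock(),
    valid_idx=list(range(len(df) - 60, len(df))),      # hold out the most recent rows
)

learn = tabular_learner(dls, layers=[200, 100], metrics=rmse)
learn.fit_one_cycle(5)
```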
Stationary data revert around a constant long-term mean and have a variance that is constant over time. Non-stationary data, by contrast, have a mean and/or variance that drift over time, which makes them dramatically harder to predict.
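If you want to check which bucket your series falls into, an augmented Dickey-Fuller test is a quick, if rough, first pass (this uses statsmodels and a hypothetical series y; it is a sanity check, not proof of stationarity):

```python
from statsmodels.tsa.stattools import adfuller

# y is a 1-D array-like of observations, e.g. df["sales"].values
stat, p_value, *_ = adfuller(y)
print(f"ADF statistic: {stat:.3f}, p-value: {p_value:.3f}")
# A small p-value (e.g. < 0.05) lets you reject the unit-root hypothesis,
# i.e. the series looks mean-stationary; a large one suggests it does not.
```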
When you generate delta-mean features, which measure the difference between your y value and some moving average, you capture the stationary (mean-reverting) part of the data, and that is technically what you need in order to predict y at a future step (y at t+n). That time shift is the one thing the tabular learner will not do for you: it always uses x to predict y, as the underlying regression setup dictates, so you have to build the shifted target and the delta features yourself, as sketched below.
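A minimal pandas sketch of that feature engineering (column names, window sizes, and the 30-step horizon are placeholders):

```python
import pandas as pd

def add_delta_mean_features(df: pd.DataFrame, col: str = "y",
                            windows=(5, 20, 60), horizon: int = 30) -> pd.DataFrame:
    """Add mean-reversion features and a shifted target for a t+horizon forecast."""
    out = df.copy()
    for w in windows:
        # Distance of the current value from its w-step moving average.
        out[f"{col}_delta_ma{w}"] = out[col] - out[col].rolling(w).mean()
    # Row t now carries the value we actually want to predict: y at t + horizon.
    out[f"{col}_target_t_plus_{horizon}"] = out[col].shift(-horizon)
    return out.dropna()
```

The delta columns then go in as cont_names and the shifted column as y_names in a tabular learner like the one sketched above.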
As a rule of thumb, whenever you deal with time-dependent and semi-stationary data, it’s going to be really, really hard. It is possible, but there ain’t no free lunch here.
In the Rossmann example, you have plenty of exogenous data that are largely stationary, and so a normal FCN deep network does well at predicting sales.
When modeling financial markets, you don’t have that luxury because at least variance (volatility) isn’t exactly constant in any asset class.
That brings us to the use case of Bayesian-based deep learning. You use it whenever you have:
- Relatively little data (that’s true in finance)
- (Strong) prior intuitions, from pre-existing observations or models, about how things work (that’s mostly true in finance)
- High levels of uncertainty, or a strong need to quantify the level of uncertainty about a particular model or comparison of models
The last point is the actual selling point because in quant finance, your day job is to model risk and therefore you must know the degree of uncertainty.
With PyTorch / Pyro you get the best of both worlds: you do probabilistic parameter sampling and feed the result into a nice FCN to make predictions, all with GPU acceleration.
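Here is a minimal sketch of that idea in Pyro: a small Bayesian feed-forward regressor trained with stochastic variational inference. Layer sizes, priors, and the learning rate are placeholder choices, not a recommendation:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.nn import PyroModule, PyroSample
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDiagonalNormal
from pyro.optim import Adam


class BayesianFCN(PyroModule):
    """Two-layer fully connected net with Normal priors over weights and biases."""

    def __init__(self, in_features, hidden=32):
        super().__init__()
        self.fc1 = PyroModule[torch.nn.Linear](in_features, hidden)
        self.fc1.weight = PyroSample(dist.Normal(0., 1.).expand([hidden, in_features]).to_event(2))
        self.fc1.bias = PyroSample(dist.Normal(0., 1.).expand([hidden]).to_event(1))
        self.fc2 = PyroModule[torch.nn.Linear](hidden, 1)
        self.fc2.weight = PyroSample(dist.Normal(0., 1.).expand([1, hidden]).to_event(2))
        self.fc2.bias = PyroSample(dist.Normal(0., 1.).expand([1]).to_event(1))

    def forward(self, x, y=None):
        # The observation noise (variance) is itself a random variable here.
        sigma = pyro.sample("sigma", dist.HalfNormal(1.))
        mean = self.fc2(torch.relu(self.fc1(x))).squeeze(-1)
        with pyro.plate("data", x.shape[0]):
            pyro.sample("obs", dist.Normal(mean, sigma), obs=y)
        return mean


model = BayesianFCN(in_features=8)
guide = AutoDiagonalNormal(model)          # mean-field variational approximation
svi = SVI(model, guide, Adam({"lr": 1e-3}), loss=Trace_ELBO())

# x_train, y_train are float tensors of shape (N, 8) and (N,); one SVI step:
# loss = svi.step(x_train, y_train)
```

Because sigma is sampled rather than fixed, the posterior predictive gives you an uncertainty band around each forecast, which is the part that matters for risk.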
To answer your question: yes, you can forecast stationary data with fast.ai (or any other deep learning API), but for forecasting non-stationary data, especially with variable variance, you do better with Bayesian-based deep learning, because then you model a distribution over the variance instead of a single fixed value.
Hope that helps.