I am working on time series data (basically predicting a number in the future). What might be the best approach besides RNNs? I think I’ve heard Jeremy say that he obtained better results using a traditional tabular approach, but I couldn’t find the course lesson related to that.
The state of the art in time series forecasting is achieved using different variants of the LSTM architecture. Yoshua Bengio’s group recently published a new architecture, N-BEATS, that uses a multi-layer FC (fully connected) network. @takotab implemented N-BEATS for fastai2. You can find his package fastseq here.
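To make the “multi-layer FC network” idea concrete, here is a minimal NumPy sketch of the forward pass of N-BEATS-style generic blocks with doubly residual stacking. It is not takotab’s fastseq code or the paper’s reference implementation: the weights are random (no training), the names `GenericBlock` and `nbeats_forward` are hypothetical, and each block’s θ projection plus basis layer is collapsed into a single linear head for brevity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class GenericBlock:
    """Sketch of one N-BEATS-style 'generic' block: a stack of fully connected
    ReLU layers followed by two linear heads (backcast and forecast)."""

    def __init__(self, backcast_len, forecast_len, hidden=64, n_layers=4, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        sizes = [backcast_len] + [hidden] * n_layers
        # Hidden FC layers: randomly initialised weights, zero biases (untrained).
        self.layers = [
            (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]
        # Linear heads mapping the hidden state to the backcast and the forecast.
        self.w_back = rng.normal(0.0, 0.1, size=(hidden, backcast_len))
        self.w_fore = rng.normal(0.0, 0.1, size=(hidden, forecast_len))

    def __call__(self, x):
        h = x
        for w, b in self.layers:
            h = relu(h @ w + b)
        return h @ self.w_back, h @ self.w_fore

def nbeats_forward(x, blocks):
    """Doubly residual stacking: each block subtracts its backcast from the
    input it receives and adds its partial forecast to the running total."""
    residual = x
    forecast = 0.0
    for block in blocks:
        backcast, block_forecast = block(residual)
        residual = residual - backcast
        forecast = forecast + block_forecast
    return forecast

rng = np.random.default_rng(42)
blocks = [GenericBlock(backcast_len=24, forecast_len=6, rng=rng) for _ in range(3)]
history = rng.normal(size=(8, 24))       # batch of 8 windows, 24 past steps each
prediction = nbeats_forward(history, blocks)
print(prediction.shape)                  # (8, 6)
```

The input is just a fixed-length window of past values, so no recurrence is involved; the residual subtraction lets later blocks focus on whatever earlier blocks did not explain.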
You might check this post for an overview of the different time series projects that we are developing for fastai in this forum.
Here is the Time Series Thread (for fastai2) where we are discussing the implementation of a common time series package for fastai2.
PS: The case treated in Rossmann is a regression, i.e., a kind of single-point forecasting. In the present project, we are more interested in multi-point forecasting and, more specifically, in probabilistic forecasting. The latter gives a confidence interval around your forecast (prediction). Single-point forecasting only gives either the median or the mean of the forecast, which is insufficient because it tells us nothing about the width of the confidence interval.
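One common way to get those intervals is to train on a quantile (pinball) loss instead of MSE. Below is a minimal NumPy sketch, assuming synthetic normally distributed data: minimizing the pinball loss for q = 0.1, 0.5 and 0.9 recovers the corresponding quantiles, i.e., an 80% interval plus a median. This is a generic illustration, not the loss used by any particular fastai package; `pinball_loss` and `best_constant` are hypothetical helper names.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for quantile level q in (0, 1).
    Under-predictions are weighted by q, over-predictions by (1 - q)."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

rng = np.random.default_rng(0)
y = rng.normal(loc=100.0, scale=10.0, size=10_000)   # synthetic "observed" values
grid = np.linspace(50.0, 150.0, 1001)                # candidate constant forecasts

def best_constant(q):
    """Grid-search the constant forecast that minimizes the pinball loss."""
    losses = [pinball_loss(y, c, q) for c in grid]
    return grid[int(np.argmin(losses))]

low, median, high = best_constant(0.1), best_constant(0.5), best_constant(0.9)
print(f"80% interval: [{low:.1f}, {high:.1f}], median: {median:.1f}")
```

A real model would output one value per quantile level (or sample paths, as DeepAR does) for every future time step, but the principle is the same: the loss, not the architecture, decides whether you get a mean or a distribution.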
I am looking to forecast daily traffic to our website, for which I have about 1,200 days’ worth of data. Does this seem like a solvable problem? I have been using Facebook’s Prophet model. Eager to see what I can do with fastai!
I haven’t started on this yet, but I think one issue will be that our paid marketing channels are a significant contributor to site traffic, and our paid marketing spend varies over time.
We are a large online fashion resale site, so I currently have a dataframe with the following columns:
Date
Number of Visitors
Fashion item release day? (0 or 1)
Would it be of value to add daily paid marketing spend as another variable? Probably, right?
@jwithing Yes, it would be good to include daily paid marketing spend as a covariate (feature time series). Your model will learn the impact of that data and incorporate it into its forecasts.
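As a rough sketch of what that looks like in practice, assuming pandas and made-up column names and values (`visitors`, `release_day`, `paid_spend` are hypothetical), the data splits into one target series plus a matrix of covariates aligned on the same dates:

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: column names and values are illustrative only.
dates = pd.date_range("2020-01-01", periods=10, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": dates,
    "visitors": rng.integers(900, 1100, size=len(dates)),   # target
    "release_day": rng.integers(0, 2, size=len(dates)),     # 0/1 flag
    "paid_spend": rng.uniform(200, 500, size=len(dates)).round(2),
})

# The model forecasts the target; covariates are extra inputs on the same dates.
target = df["visitors"].to_numpy(dtype=float)
covariates = df[["release_day", "paid_spend"]].to_numpy(dtype=float)
print(target.shape, covariates.shape)   # (10,) (10, 2)
```

One design point worth keeping in mind: covariates are fed alongside the target for the days being predicted, so at forecast time you need their future values too (e.g., the planned marketing budget and the release calendar).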
Below is a brief illustrated explanation of how it works at a high level:
A model is trained by randomly sampling several training examples from each of the time series in the training dataset. Each training example consists of a pair of adjacent context and prediction windows with fixed predefined lengths. The context_length hyperparameter controls how far in the past the network can see, and the prediction_length hyperparameter controls how far in the future predictions can be made.
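The sampling step above can be sketched in a few lines of NumPy. This is a minimal illustration assuming a single 1-D series and the 12-hour context / 6-hour prediction setup from the figure; `sample_windows` is a hypothetical helper, not a GluonTS or fastai function.

```python
import numpy as np

def sample_windows(series, context_length, prediction_length, n_samples, rng):
    """Randomly draw (context, target) pairs of adjacent windows from one series."""
    window = context_length + prediction_length
    max_start = len(series) - window
    if max_start < 0:
        raise ValueError("series is shorter than context_length + prediction_length")
    # Start positions in [0, max_start]; integers() excludes the upper bound.
    starts = rng.integers(0, max_start + 1, size=n_samples)
    contexts = np.stack([series[s : s + context_length] for s in starts])
    targets = np.stack([series[s + context_length : s + window] for s in starts])
    return contexts, targets

rng = np.random.default_rng(0)
hourly = np.sin(np.arange(24 * 14) * 2 * np.pi / 24)   # two weeks of synthetic hourly data
ctx, tgt = sample_windows(hourly, context_length=12, prediction_length=6, n_samples=5, rng=rng)
print(ctx.shape, tgt.shape)   # (5, 12) (5, 6)
```

The network only ever sees `context_length` past steps as input and is scored on the next `prediction_length` steps, which is exactly what the two hyperparameters control.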
The following figure represents five samples with context lengths of 12 hours and prediction lengths of 6 hours drawn from element i. The feature time series are x_{i,1,t} and u_{i,2,t} (also called covariates in the literature).
To capture seasonality patterns, a model can also automatically feed in lagged values of the target time series. In the example with hourly frequency, for each time index t = T, the model exposes the z_{i,t} values that occurred approximately one, two, and three days in the past.
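Here is a minimal NumPy sketch of building those lagged inputs for hourly data, assuming lags of exactly 24, 48 and 72 hours (one, two and three days). Real libraries choose their own lag sets; `add_lags` is a hypothetical helper.

```python
import numpy as np

def add_lags(series, lags):
    """Return a (T, len(lags)) matrix where column j holds series[t - lags[j]].
    Positions without enough history are NaN. All lags must be positive."""
    out = np.full((len(series), len(lags)), np.nan)
    for j, lag in enumerate(lags):
        out[lag:, j] = series[:-lag]
    return out

hourly = np.arange(24 * 4, dtype=float)          # four days of toy hourly values
lagged = add_lags(hourly, lags=[24, 48, 72])      # 1, 2 and 3 days back
print(lagged[80])   # holds hourly[56], hourly[32] and hourly[8]
```

The leading NaN rows are why models that use long lags need extra history before the first context window.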
Got it, those illustrations are very helpful! I’m reaching out to @takotab over email… I can’t figure out how to install fastseq in a Colab notebook. I have fastai2 installed there.
Well, I’m still doing something wrong. Somehow I am not installing the correct packages. I think this is the issue because I get an error about not having nbdev. So I add !pip install nbdev, then get another missing-package error, and so on, until I get to TSDataLoader, which can’t be installed with !pip install.
from fastai import *
from fastai2.basics import *
from fastseq.all import *
from fastseq.nbeats.model import *
from fastseq.nbeats.learner import *
from fastseq.nbeats.callbacks import *
Do you have any idea why the training/validation loss is NaN? I tried filling in the missing values manually (without going through FillMissing), but I still got NaN.
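A NaN loss has many possible causes, and this thread doesn’t pin down which one applies here. Below is a minimal NumPy sketch of two common culprits that survive manual filling of missing values: a constant column whose standard deviation is 0 (normalization then computes 0/0), and a log of a zero value (which gives -inf, and inf can turn into NaN in later arithmetic, e.g. inf - inf). `count_non_finite` and `safe_standardize` are hypothetical helpers, not fastai API.

```python
import numpy as np

def count_non_finite(arr):
    """Return (number of NaN entries, number of +/-inf entries)."""
    arr = np.asarray(arr, dtype=float)
    return int(np.isnan(arr).sum()), int(np.isinf(arr).sum())

def safe_standardize(x, eps=1e-8):
    """Column-wise standardization that guards against zero variance.
    A constant column has std 0, so plain (x - mean) / std gives 0/0 = NaN."""
    mean = np.nanmean(x, axis=0)
    std = np.nanstd(x, axis=0)
    return (x - mean) / np.maximum(std, eps)

# Toy data with no missing values at all, but the second column is constant.
x = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
with np.errstate(invalid="ignore"):           # silence the 0/0 warning for the demo
    naive = (x - x.mean(axis=0)) / x.std(axis=0)
print("naive:", count_non_finite(naive))       # NaNs appear even though x has none
print("safe:", count_non_finite(safe_standardize(x)))

# Log of a zero value is another silent source: it yields -inf.
with np.errstate(divide="ignore"):
    print("log:", count_non_finite(np.log(np.array([0.0, 10.0]))))
```

Running a check like this on the inputs and targets right before they go into the model is usually quicker than inspecting the training loop.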
Can someone point me to an example or blog post on multivariate time series forecasting using fastai, in which we can also pass in other categorical columns like day of week…
I looked at the fastseq example, but that is a univariate example. I have 2 months of data and I need to predict the next fifteen days.
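With that little history, it helps to count how many training windows you actually get, and day of week is a natural categorical covariate because it is known for the future days too. A minimal pandas sketch, assuming 60 days of daily data with an arbitrary start date and a hypothetical context length of 30 days:

```python
import pandas as pd

# Assumed setup: ~2 months (60 days) of daily history; the start date is arbitrary.
dates = pd.date_range("2020-01-01", periods=60, freq="D")

# Day of week as an integer category (0 = Monday ... 6 = Sunday), usable as an
# embedding index. It is known in advance for the 15 days being forecast.
day_of_week = dates.dayofweek.to_numpy()
future_dates = pd.date_range(dates[-1] + pd.Timedelta(days=1), periods=15, freq="D")
future_day_of_week = future_dates.dayofweek.to_numpy()

prediction_length = 15
context_length = 30        # hypothetical choice; a shorter context gives more windows
n_windows = len(dates) - (context_length + prediction_length) + 1
print(n_windows)           # 16
```

Sixteen heavily overlapping windows is a small training set for a deep model, which is one reason simpler baselines are worth comparing against here.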
You might check out AWS Labs’ time series forecasting repo, GluonTS.
GluonTS uses Apache MXNet (instead of PyTorch or TensorFlow). They implemented many state-of-the-art architectures (DeepFactor, DeepAR, DeepState, GP Forecaster, GPVAR, LSTNet, N-BEATS, NPTS, Prophet, R Forecast, seq2seq, Simple FeedForward, Transformer, Trivial, and WaveNet). Many of them (DeepFactor, DeepAR, DeepState) can also use categorical data (covariates) and produce probabilistic forecasts.