TimeSeries

This! We have @oguiza on board…
We should maybe organize a call to be able to talk/share ideas.
+1 for anomaly detection (not sure what you mean by it, though).
I think the underlying solution is very important, because we need it to be fast, and fitting in memory is not always an option.
Visualizing multichannel timeseries is not always easy or necessary, but for univariate it is a must. Probably for univariate the pandas DataFrame is the way to go, but for multivariate maybe it should be something else.
I have a current project where I will need to combine images and time series, and I was hoping to build on v2.

1 Like

Welcome on board @oguiza!


Awesome progress. You also added a lot of documentation, nicely done.

I think it makes a lot of sense to collaborate and try to come up with a structure that covers the 4 problems. I feel like anomaly detection is the same as classification, but feel free to correct me on that one. The 4 problems I arrive at are:

  • Classification
  • Regression
  • Forecast
  • Imputation (interpolation)

The first 3 are clear, I think; the last one becomes clear with this picture:

So this is filling in the blanks. This can be done nicely with an N-BEATS-style model.

This makes sense. I’m not sure I’m ready for it; I would like to go a little deeper into the library @farid built first (could you also publish the docs on a GitHub page?). Then I can ask better questions during the call.

Very interesting to hear you made the same call. Would love to talk about that. I think (I already discussed this with @tcapelle) it can be done if we store the time series in a cell of the DataFrame, either as a Series or just as an array, like it is done below in the columns ts_1 and pred respectively:

import numpy as np
import pandas as pd

# Each row is one instance; each ts_* / pred cell holds a whole series
# (as an array or a pd.Series), next to ordinary scalar columns.
df = pd.DataFrame(data={'pred': [np.arange(10.),
                                 np.arange(12.)],
                        'ts_0': [np.ones(10)[None, :],
                                 np.ones(12)[None, :]],
                        'ts_1': [pd.Series(np.arange(1, 11) + np.random.randn(10)),
                                 pd.Series(np.arange(1, 13) + np.random.randn(12))],
                        'var_0': [0., 1.],
                        'con_0': [0, 1]})
df

I think either the ts_1 or the pred style is the way forward: a column for every feature/time series, where every ts in the same row must have the same length. This way you can do multivariate for different instances (rows).

This way also has the potential to unite it with the current tabular module. However, the forecasting DataLoader becomes quite complicated, so that part will need to be separate. But things like transforms and procs could/should be shared, I think.
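
To make the layout concrete, here is a minimal sketch of pulling one row of the df above back out as a (channels, length) array. The row_to_array helper and its ts_cols default are hypothetical, not part of any library:

import numpy as np

def row_to_array(row, ts_cols=('ts_0', 'ts_1')):
    # Stack the series cells of one row into a (n_channels, seq_len) array.
    # Assumes every ts column in this row stores a series of the same length.
    return np.stack([np.asarray(row[c]).squeeze() for c in ts_cols])

x0 = row_to_array(df.iloc[0])  # shape (2, 10)
x1 = row_to_array(df.iloc[1])  # shape (2, 12) -- lengths may differ between rows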

1 Like

@sachinruk built a PyTorch/fastai implementation of Facebook’s Prophet, maybe interesting to keep in the loop: https://github.com/sachinruk/ProFeTorch

1 Like

I’m ok for a call. I think it’d be helpful to discuss scope and potential approaches.

What I mean is that this is a task where you use unsupervised learning techniques to identify outliers in your data. I know it’s a task some people are interested in, but it’s not my case. I’m mostly interested in classification/regression.

Not sure what you mean. It’d be interesting to discuss during the call.

1 Like

@takotab, awesome job developing a fastai v2 version of the N-BEATS paper.

In my repo, I included documentation that I generated using nbdev_build_docs, but I have some issues that prevent generating the css, js, and images folders. I will post the issue in the nbdev thread later.

A couple of months ago, I played with Amazon Labs’ time series forecasting repo called GluonTS (maybe you are already familiar with it; if so, please ignore the rest of the post). As everybody might guess, their implementation uses Amazon MXNet. I must say it’s quite impressive what they have achieved. They implemented almost every architecture you can think of (DeepFactor, DeepAR, DeepState, GP Forecaster, GP Var, LST Net, N-BEATS, NPTS, Prophet, R Forecast, seq2seq, Simple FeedForward, Transformer, Trivial, and WaveNet). I think it’s worth exploring for those interested in time series forecasting.

They use their models for both univariate and multivariate time series (with millions of time series, as they describe in some of their papers). They also leverage covariate information like day of week, month, year, or any other time-series-related information.

GluonTS Tutorials
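
To give a flavor of the API, here is a minimal sketch along the lines of the GluonTS tutorials: train a DeepAR estimator on a single univariate series and get back a predictor for forecasting. The file name and column are placeholders, and exact module paths/arguments may differ between GluonTS versions:

import pandas as pd
from gluonts.dataset.common import ListDataset
from gluonts.model.deepar import DeepAREstimator
from gluonts.trainer import Trainer

# Any univariate series with a DatetimeIndex will do; "my_series.csv" / "value" are placeholders.
df = pd.read_csv("my_series.csv", index_col=0, parse_dates=True)
training_data = ListDataset([{"start": df.index[0], "target": df["value"].values}], freq="5min")

estimator = DeepAREstimator(freq="5min", prediction_length=12, trainer=Trainer(epochs=10))
predictor = estimator.train(training_data=training_data)  # Predictor for forecasting the next 12 steps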

6 Likes

Sounds like we might want to aim for a port to fastai2 - is that what you’re thinking, @farid?

2 Likes

@jeremy, I think it would be fantastic to have something like that in fastai2. That was my initial thought when I discovered GluonTS, but it’s a lot of work. I think it would be doable if we pool our efforts. We can choose the most efficient architectures and gradually port them to fastai2. @takotab has already implemented the N-BEATS model in fastai v2, so that’s a good start :slightly_smiling_face:

1 Like

@farid I believe you’re part of the fast.ai Live group - this might make for a great project during the course…

3 Likes

@jeremy, yes I am, and yes indeed that would be a great project to implement during the course, and to share with the fastai community.

2 Likes

Thank you for sharing GluonTS, I didn’t know about it! Is the N-BEATS model implemented there? I cannot find it.

1 Like

You will find the N-BEATS model here. You will notice there are 2 files (_network.py and _estimator.py) for almost all the models. The following are the ones for N-BEATS:

  • _network.py: used to train the model
  • _estimator.py: used for forecasting a sequence

In the N-BEATS case, you also have _ensemble.py, and this is because it uses an ensembling technique.

1 Like

Got it, thank you!

1 Like

It rang a bell somewhere, but I never knew they were this far along. It looks very extensive. Would be awesome to work on, quite the challenge. I’m certainly willing to collaborate on porting it to fastai2.

Maybe start with N-BEATS (plus 1 or 2 more) and wait for the results of M5 to spot the best models before adding them all.

2 Likes

Sounds great!

I just found a nice Python matrix profile implementation.

It’s a time series preprocessing technique that might be of interest to people doing any of the following (a minimal sketch follows the list):

  • pattern/motif (approximately repeated subsequences within a longer time series) discovery
  • anomaly/novelty (discord) discovery
  • shapelet discovery
  • semantic segmentation
  • density estimation
  • time series chains (temporally ordered set of subsequence patterns)
  • and more …
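
As a quick illustration, here is a minimal sketch assuming the stumpy package (one of the available matrix profile implementations, which may or may not be the one meant above): compute the matrix profile of a univariate series and read off the best motif and the top discord.

import numpy as np
import stumpy

ts = np.random.randn(1000)               # any 1-D float array
m = 50                                   # subsequence (window) length

mp = stumpy.stump(ts, m)                 # column 0: matrix profile, column 1: nearest-neighbor index
profile = mp[:, 0].astype(float)

motif_idx = np.argmin(profile)           # start of the best repeated subsequence (motif)
discord_idx = np.argmax(profile)         # start of the most anomalous subsequence (discord)
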
6 Likes

I pushed a univariate time series classification example notebook. It uses the Yoga Univariate Dataset.

I also added all the UCR univariate dataset URLs in timeseries.data. Now we have the URLs for all the UCR univariate and multivariate datasets. All the univariate URLs have the UNI_ prefix, which makes them easier to find.

Example:

path = unzip_data(URLs_TS.UNI_YOGA)
4 Likes

The documentation of the timeseries library is now live. It is pretty extensive and easier to navigate thanks to the nbdev built-in features.

@takotab, I think you expressed interest in checking out the doc.

8 Likes

Mimicking fastai2.vision, I created ts_learner with the following defaults:

model = inception_time
opt_func = Ranger
loss_func = LabelSmoothingCrossEntropy()
metrics = accuracy

So now, we can train any time series dataset end-to-end with 4 lines of code (Raise your hand if this looks familiar to you :wink:):

path = unzip_data(URLs_TS.NATOPS)
dls = TSDataLoaders.from_files(bs=32, fnames=[path/'NATOPS_TRAIN.arff', path/'NATOPS_TEST.arff'], batch_tfms=[Normalize()])
learn = ts_learner(dls)
learn.fit_one_cycle(25, lr_max=1e-3) 

While checking the accuracy on the NATOPS dataset, I was amazed that, using just the fastai2 default settings, you can achieve above 97% accuracy (even 98.6% now and then) in only 18 epochs. Awesome! … and this is just another normal day using :muscle: fastai2

I added a notebook called training_using_default_settings where you can play with it.

If you like the timeseries library and/or find it useful, please star it on GitHub and share it. Any constructive feedback is very welcome.

9 Likes

Super!
How do we proceed? Let’s make a unified repo to work together.
I think it may be better to fork fastai v2 and develop a branch for timeseries. Doing so would let us propose a merge into master in the future. We can create issues and track the progress of various features. It will also let us ask the fastai pros for advice on how to do things in the most fastai-ish way possible.
I made a fork with a branch called timeseries here; both of you are admins of the repo. Notebooks starting from 100 are my simple classification example. Can you move your core functionalities here, @takotab and @farid?

1 Like