TimeSeries

This is a thread to discuss porting timeseries algorithms to fastai. We currently have a very polished library for working with timeseries, tsai, actively developed by @oguiza et al.
This lib follows the fastai style of coding and is based on nbdev. Please help us develop it!

  • Ignacio is currently working on self-supervised learning to get pretrained models from unlabeled data.

Resources:

This thread was originally created to port fastai V1 timeseries algos to V2. As of 2021, fastai V2 is the standard and is released everywhere.

25 Likes

Great @tcapelle! I’ve turned your post into a wiki. Perhaps you can add a link to the time series study group thread in it, along with links to any other useful resources?

1 Like

I think you should start experimenting with a time series notebook that works, then gradually refactor. If another notebook with more basic functionality is needed later, it can be added at that time.

3 Likes

Good luck with this! We are working on an audio module and Ignacio has already reached out to us as we will probably have a lot of overlap in what we’re working on. Our approach has been to make two notebooks, audio_core and audio_augment.

We agreed to all do our own implementations so that we could work through the complexity of v2 and all understand the various pieces better, instead of just having the most advanced person building the base and then the rest of us having gaps in understanding. We’ve also cheated a bit and taken bits and pieces from each other’s code. I used both @scart97 and @hiromi’s code in some spots, but tried to vary my approach as much as possible. Here are some examples:

@scart97 - Lucas’s implementation

My Implementation (first nb doesn’t display properly, not sure why)

These are just us playing around. We don’t have any end-to-end training examples yet and I’m sure the final version will change a lot, but hopefully this can help get you guys started. The plan is then to discuss what we all tried, what worked and what didn’t, and hopefully be able to come to consensus on a decent base design, build out the features on top, and then once we see the big picture, come back and iteratively refactor the base stuff until it feels right. Hope this helps, feel free to reach out if there’s anything else we can do to help get time series off the ground!

7 Likes

Wow, nice work!

This is so smart. Normally people try to split work up too much, which doesn’t really save time in my experience, and leads to a worse result than everyone doing their best on the whole project and then combining the best bits of each and refactoring at the end.

5 Likes

I just went through @oguiza’s intro notebook and holy crap is it good. Well documented and everything worked perfectly out of the box. Would highly recommend for anyone looking to get started.

4 Likes

Yep, it is

Thanks a lot for your comment, @GiantSquid! :grinning: It’s very encouraging!
I’m working on some additional time series functionality, and will continue to create notebooks in the next few days/weeks.
I just hope that some day (not far off) all of this will be ported to v2!

3 Likes

Awesome! Do you have anything in the works for regression? I’d be happy to pitch in there as I’m currently working on a regression problem.

I don’t right now, but TS Regression should be relatively easy to build, as it uses the same models as TSC, just with the number of classes set to 1. Forecasting (more than 1 step into the future) is different, as you need different models.
Do you know of any TS Regression dataset? It’d be good to find at least one, along with its state-of-the-art performance, so we can try to match/beat it.

As a trial dataset for regression I was planning to use NBA box score stats (e.g. rebounds) and the state-of-the-art I’m trying to beat is projections for daily fantasy sports sites.

I can confirm that any of the proposed architectures works for regression as I am using InceptionTime for regression myself. Just set n_out=1.
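For illustration, a minimal sketch following the intro notebook’s pattern (here arch, arch_kwargs, and db come from that notebook’s context; treat this as a hedged sketch, not the exact notebook code):

from fastai.basics import Learner, MSELossFlat

# Same architecture as for classification, but with a single output unit
# and a regression loss. Assumes db's labels are floats (label_cls=FloatList).
model = arch(db.features, 1, **arch_kwargs)  # n_out=1 instead of db.c
learn = Learner(db, model, loss_func=MSELossFlat())
learn.fit_one_cycle(10, 1e-3)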

1 Like

Fantastic, thank you!

I have recently been playing with MixedItemList to work with Tabular + Tensor data (like a time series, but not a TS).
I think it might be interesting to have a base class like TensorList, where one can put any type of tensory data (from numpy, pandas, torch, …) and quickly build a model on top. Of course, TimeSeries is a special case of this.
In V1 I would build:

class TensorItem(ItemBase):
    ...

class TensorItemList(ItemList):
    ...
    def get(self, i):
        item = super().get(i)
        return TensorItem(item)

What would be the right approach in V2 to do this?
PS: Maybe ItemBase is good enough already…
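For what it’s worth, here is what I imagine the v2 equivalent could look like: a semantic tensor type plus a Transform that creates it (a sketch assuming fastai2’s TensorBase and fastcore’s Transform; ToTensorItem is a made-up name, not a confirmed API):

import numpy as np
import torch
from fastcore.transform import Transform
from fastai2.torch_core import TensorBase

class TensorItem(TensorBase):
    # Semantic subclass: v2 dispatches show/transform behaviour on tensor types
    pass

class ToTensorItem(Transform):
    # Convert any array-like (numpy, pandas values, torch) into a TensorItem
    def encodes(self, o):
        return TensorItem(torch.as_tensor(np.asarray(o), dtype=torch.float32))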

1 Like

Has anyone in this group tried video classification? I’m thinking of first extracting features from each frame (using a ResNet, for example), then stacking these features to create a 2D array, and then training on that time series “image” to classify the video.

This is what I imagine; however, the per-frame features may not suit this purpose well, so the technique might not give good results.
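A rough sketch of that pipeline in plain PyTorch/torchvision (the choice of resnet34 and the shapes are illustrative assumptions):

import torch
import torchvision

# Encode each frame with a pretrained ResNet, then stack the per-frame
# feature vectors into a (channels, timesteps) array that a time series
# classifier such as InceptionTime could consume.
encoder = torchvision.models.resnet34(pretrained=True)
encoder = torch.nn.Sequential(*list(encoder.children())[:-1])  # drop fc head -> 512-d pooled features
encoder.eval()

@torch.no_grad()
def video_to_series(frames):            # frames: (n_frames, 3, H, W), normalized
    feats = encoder(frames).flatten(1)  # (n_frames, 512)
    return feats.t()                    # (512, n_frames) = channels x timesteps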

I’m trying to use InceptionTime for regression and it seems I’m getting something wrong. I’m working from Ignacio’s intro notebook. The changes I’ve made are:

  1. loss_func = LabelSmoothingCrossEntropy() -> loss_func = MSELossFlat()
  2. model = arch(db.features, db.c, **arch_kwargs).to(device) -> model = arch(db.features, 1, **arch_kwargs).to(device)

I think 2. above is the n_out you were referring to, but I’m not sure. When I try to create the model I get a ‘CUDA error: device-side assert triggered’ error, which I think indicates a mismatch between the number of classes and the number of output units, so it seems there’s something I haven’t adjusted properly.

If you can spot what I’ve done wrong, or let me know the changes you made to get regression working, I would really appreciate it!

1 Like

It’s difficult to say without a complete example; maybe you are constructing your databunch incorrectly, who knows.
Be sure to pass label_cls=FloatList to the labeller.
Another nice trick is to always check shapes before feeding your NN:

x, y = db.one_batch()
x.shape, y.shape
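To make the FloatList point concrete, here is a v1-style databunch sketch (TSList, df, feature_cols, and 'target' are placeholder names for illustration, not the notebook’s exact API):

from fastai.basics import FloatList

# label_cls=FloatList marks the target as continuous, so data.c == 1
# and a single output unit on the model matches the labels.
data = (TSList.from_df(df, cols=feature_cols)
        .split_by_rand_pct(0.2)
        .label_from_df(cols='target', label_cls=FloatList)
        .databunch(bs=64))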

I’d like to suggest we continue this conversation in the Time Series/ Seq study group thread, since this one is more focused on v2.

I’m very excited to share with you a new module called timeseries for fastai v2. It’s a work in progress. The timeseries project is inspired by the fantastic work of @oguiza. It supports the InceptionTime model (from @hfawaz). I combined @tcapelle’s (thank you for sharing) and Ignacio’s InceptionTime implementations. I will gradually add other NN architectures.

To build timeseries, I chose a different approach: creating an API that is similar to fastai2.vision. In a way, a timeseries is similar to a 1D image. I also created a new class TSData to download and store timeseries datasets. This class avoids both handling individual files (one file per channel) and creating intermediary files (csv files). I will create another post explaining in more detail what I did (like why I abandoned the idea of using Tabular data and created TensorTS (similar to TensorImage)).

I would like to thank @jeremy and @sgugger for creating such a powerful and concise library. Thanks to fastai v2, timeseries package is really both compact and fast.

This extension mimics the unified fastai v2 APIs used for vision, text, and tabular. Those familiar with fastai2.vision will feel at home using timeseries. It uses Datasets, DataBlock, and the newly introduced TSDataLoaders and TensorTS.

timeseries presently reads arff files (see the TSData class); ts files will be supported soon as well. Each timeseries notebook also includes a set of tests (more will be added in the future). Below is an example of end-to-end training using TSDataLoaders.

from timeseries.all import *
path = unzip_data(URLs_TS.NATOPS)
fnames = [path/"NATOPS_TRAIN.arff", path/"NATOPS_TEST.arff"]
# Normalize at batch time
batch_tfms = [Normalize(scale_subtype='per_sample_per_channel', scale_range=(0, 1))]
dls = TSDataLoaders.from_files(fnames=fnames, batch_tfms=batch_tfms)
dls.show_batch(max_n=9, chs=range(0,12,3)) # chs : list of chosen channels to display

c_in = get_n_channels(dls.train); c_out = dls.c
model = inception_time(c_in, c_out).to(device=default_device())
opt_func = partial(Adam, lr=3e-3, wd=0.01); loss_func = LabelSmoothingCrossEntropy()
learn = Learner(dls, model, opt_func=opt_func, loss_func=loss_func, metrics=accuracy)
epochs=30; lr_max=1e-3; pct_start=.7; moms=(0.95,0.85,0.95); wd=1e-2
learn.fit_one_cycle(epochs, lr_max=lr_max, pct_start=pct_start, moms=moms, wd=wd)
learn.show_results(max_n=9, chs=range(0,12,3))

In the index.ipynb notebook, I show four different methods to create dataloaders (using Datasets, DataBlock, and TSDataLoaders: that’s just 3! => the guy doesn’t know how to count, be aware :wink:). I think the best way to quickly explore this extension is to run it in Google Colab, using either the full, detailed index.ipynb notebook or the Colab_timeseries_Tutorial. The latter shows a minimal example using TSDataLoaders (similar to the code snippet above). It’s better to turn the GPU on in Colab; on Windows, the GPU is detected automatically. It’s pretty fast even on CPU.

I tested timeseries on Windows, Linux (WSL), and Google Colab, using fastai2 0.0.11 and 0.0.12 with fastcore 0.1.13 and 0.1.14.

Please feel free to reach out, and to contribute. Hopefully, we will build the best deep learning module for timeseries classification and regression for fastai v2.

Looking forward to your feedback!

17 Likes