TimeSeries

I have recently been playing with MixedItemList to work with Tabular + Tensor data (like a time series, but not a TS).
I think it might be interesting to have a base class like TensorList, where one can put any type of tensor-like data (from numpy, pandas, torch, …) and quickly build a model on top. Of course, TimeSeries is a special case of this.
In V1 I would build:

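from fastai.basics import *  # fastai v1

class TensorItem(ItemBase):
    ...  # body elided in the original post

class TensorItemList(ItemList):
    ...  # other methods elided in the original post
    def get(self, i):
        item = super().get(i)
        return TensorItem(item)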

What would be the right approach in V2 to do this?
PS: Maybe ItemBase is good enough already…


Has anyone in this group tried video classification? I’m thinking about first extracting features from each frame (using a ResNet, for example), then stacking these features to create a 2D array, and finally training a time series model on that array to classify the video.

This is what I can imagine. However, the features of individual images may not be well suited for this purpose, so the technique may not give good results.
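A minimal NumPy sketch of the stacking step described above. It assumes a hypothetical `extract_features` function standing in for a pretrained CNN (for example a ResNet with its classification head removed); here it is only a fixed random projection so the example runs without model weights. The output is a (n_features, n_frames) array, i.e. a multivariate time series with one channel per feature.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 512  # assumption: e.g. the size of ResNet-18's pooled features

# Hypothetical stand-in for a pretrained frame encoder: a fixed random projection.
_projection = rng.standard_normal((3 * 32 * 32, N_FEATURES)).astype(np.float32)

def extract_features(frame):
    """Map one (3, 32, 32) frame to a feature vector of length N_FEATURES."""
    return frame.reshape(-1) @ _projection

def video_to_series(frames):
    """Stack per-frame features into a (n_features, n_frames) time series."""
    feats = np.stack([extract_features(f) for f in frames])  # (n_frames, n_features)
    return feats.T  # channels first, time along the last axis

video = rng.random((16, 3, 32, 32), dtype=np.float32)  # 16 frames of 3x32x32
series = video_to_series(video)
print(series.shape)  # (512, 16)
```

Putting time on the last axis matches the (channels, seq_len) layout that 1D-convolutional models such as InceptionTime consume.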

I’m trying to use InceptionTime for regression and it seems I’m getting something wrong. I’m working from Ignacio’s intro notebook. The changes I’ve made are:

  1. loss_func = LabelSmoothingCrossEntropy() -> loss_func = MSELossFlat()
  2. model = arch(db.features, db.c, **arch_kwargs).to(device) -> model = arch(db.features, 1, **arch_kwargs).to(device)

I think change 2 above is the n_out you were referring to, but I’m not sure. When I try to create the model I’m getting the error ‘CUDA error: device-side assert triggered’, which I think indicates a mismatch between the number of classes and the number of output units, so it seems there’s something I haven’t adjusted properly.

If you can spot what I’ve done wrong, or let me know the changes you made to get regression working, I would really appreciate it!


It’s difficult to tell without a complete example; maybe you are constructing your databunch incorrectly, who knows.
Be sure to pass label_cls=FloatList to the labeller.
Another nice trick is to always check the shapes of a batch before feeding it to your NN:

x, y = db.one_batch()
x.shape, y.shape
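Checking shapes matters especially for regression: if the model outputs (bs, 1) and the targets are (bs,), an unflattened loss silently broadcasts to a (bs, bs) matrix. A NumPy sketch of the problem, and of flattening both arrays first, which is the idea behind a "flat" loss such as MSELossFlat (this is an illustration, not fastai’s code):

```python
import numpy as np

def mse_naive(pred, targ):
    # (bs, 1) - (bs,) broadcasts to (bs, bs): every prediction vs every target
    return np.mean((pred - targ) ** 2)

def mse_flat(pred, targ):
    # flatten both so predictions and targets are compared element-wise
    return np.mean((pred.reshape(-1) - targ.reshape(-1)) ** 2)

pred = np.array([[1.0], [2.0], [3.0]])  # model output, shape (3, 1)
targ = np.array([1.0, 2.0, 3.0])        # targets, shape (3,)

print((pred - targ).shape)    # (3, 3)
print(mse_flat(pred, targ))   # 0.0
print(mse_naive(pred, targ))  # 1.333..., even though the predictions are perfect
```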

I’d like to suggest we continue this conversation in the Time Series/ Seq study group thread, since this one is more focused on v2.

I’m very excited to share with you a new module called timeseries for fastai v2. It’s a work in progress. This timeseries project is inspired by the fantastic work of @oguiza. It supports the InceptionTime model (from @hfawaz). I combined the InceptionTime implementations of @tcapelle (thank you for sharing) and Ignacio. I will gradually add other NN architectures.

To build timeseries, I chose another approach by creating an API that is similar to fastai2.vision. In a way, a time series is similar to a 1D image. I also created a new class TSData to download and store timeseries datasets. This class avoids both handling individual files (one file per channel) and creating intermediary files (csv files). I will create another post where I will explain in more detail what I did (like why I abandoned the idea of using Tabular data and created TensorTS, which is similar to TensorImage).

I would like to thank @jeremy and @sgugger for creating such a powerful and concise library. Thanks to fastai v2, the timeseries package is both compact and fast.

This extension mimics the unified fastai v2 APIs used for vision, text, and tabular. Those familiar with fastai2.vision will feel at home using timeseries. It uses Datasets, DataBlock, and the newly introduced TSDataLoaders and TensorTS.

timeseries presently reads arff files (see the TSData class), and soon ts files will also be supported. Each timeseries notebook also includes a set of tests (more will be added in the future). Below is an example of end-to-end training using TSDataLoaders.

from timeseries.all import *
path = unzip_data(URLs_TS.NATOPS)
fnames = [path/"NATOPS_TRAIN.arff", path/"NATOPS_TEST.arff"]
# Normalize at batch time
batch_tfms = [Normalize(scale_subtype='per_sample_per_channel', scale_range=(0, 1))]
dls = TSDataLoaders.from_files(fnames=fnames, batch_tfms=batch_tfms)
dls.show_batch(max_n=9, chs=range(0,12,3)) # chs : list of chosen channels to display

c_in = get_n_channels(dls.train); c_out = dls.c
model = inception_time(c_in, c_out).to(device=default_device())
opt_func = partial(Adam, lr=3e-3, wd=0.01); loss_func = LabelSmoothingCrossEntropy()
learn = Learner(dls, model, opt_func=opt_func, loss_func=loss_func, metrics=accuracy)
epochs=30; lr_max=1e-3; pct_start=.7; moms=(0.95,0.85,0.95); wd=1e-2
learn.fit_one_cycle(epochs, lr_max=lr_max, pct_start=pct_start, moms=moms, wd=wd)
learn.show_results(max_n=9, chs=range(0,12,3))
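For readers wondering what scale_subtype='per_sample_per_channel' with scale_range=(0, 1) does, here is a NumPy sketch of one reading of it: each channel of each sample is min-max scaled independently along the time axis. This is an assumption based on the parameter names, not the timeseries package’s implementation. The batch is assumed to have the (batch, channels, seq_len) layout, the 1D analogue of an image batch; the sizes below are arbitrary.

```python
import numpy as np

def scale_per_sample_per_channel(x, scale_range=(0.0, 1.0), eps=1e-8):
    """Min-max scale a (batch, channels, seq_len) array along the time axis only."""
    lo, hi = scale_range
    x_min = x.min(axis=-1, keepdims=True)  # (batch, channels, 1)
    x_max = x.max(axis=-1, keepdims=True)
    scaled = (x - x_min) / (x_max - x_min + eps)  # eps guards against constant channels
    return lo + scaled * (hi - lo)

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 24, 50))  # 4 samples, 24 channels, 50 time steps
out = scale_per_sample_per_channel(batch)
print(out.min(axis=-1).round(3).max(), out.max(axis=-1).round(3).min())  # 0.0 1.0
```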

In the index.ipynb notebook, I show 4 different methods to create dataloaders (using Datasets, DataBlock, and TSDataLoaders: that’s just 3! => the guy doesn’t know how to count, be aware :wink:). I think the best way to quickly explore this extension is to run it in Google Colab, using either the full, detailed index.ipynb notebook or the Colab_timeseries_Tutorial. The latter shows a minimal example using TSDataLoaders (similar to the code snippet above). On Colab, it’s better to turn on the GPU; on Windows, the GPU is detected automatically. It’s pretty fast even on CPU.

I tested timeseries in Windows, Linux (WSL), and Google Colab using fastai2 0.0.11 and 0.0.12 and fastcore 0.1.13 and 0.1.14.

Please feel free to reach out, and to contribute. Hopefully, we will build the best deep learning module for timeseries classification and regression for fastai v2.

Looking forward to your feedback


Thank you for this library! I’ll try it next week.


Now added to the unofficial fastai extensions repository :slight_smile:


@farid @oguiza I’d love to see a blog post giving a little guided tour of this module. Perhaps using fastpages or some other notebook blogging thing, for instance… :slightly_smiling_face:

If you do that, please at-mention me so I don’t miss it.


Yes indeed. A couple of days ago, I cloned fastpages template for this purpose. I’m planning to write a couple of blog posts in a top-down style. There will be an introduction for those who are not familiar with timeseries, and gradually followed by more in-depth posts. The goal is to show the users, with different background, how to easily process their own data using the timeseries module. Let me know what you think about that @jeremy.

Meanwhile, for those who are interested, there is an illustrated example in the README that shows how to process the NATOPS dataset. There is a brief description of the dataset, and a couple of images that show the time series and the labels (classes) of the dataset. Unfortunately, I didn’t have time to add more prose (it’s on my to-do list :slightly_smiling_face:).

Timeseries

The data is generated by sensors on the hands, elbows, wrists and thumbs. It consists of the x, y, z coordinates of each of the eight locations (24 channels in total).
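Since the 24 channels are 8 locations x 3 coordinates, they can be regrouped per sensor location. A NumPy sketch, assuming (this is not stated above) that the channels are ordered location by location, i.e. x, y, z of location 0, then x, y, z of location 1, and so on; check the dataset’s documentation before relying on that ordering.

```python
import numpy as np

N_LOCATIONS, N_COORDS = 8, 3  # 8 sensor locations, x/y/z each -> 24 channels

def split_by_location(x):
    """Reshape (n_samples, 24, seq_len) into (n_samples, 8, 3, seq_len).

    Assumes location-major channel order: [loc0_x, loc0_y, loc0_z, loc1_x, ...].
    """
    n, c, t = x.shape
    if c != N_LOCATIONS * N_COORDS:
        raise ValueError(f"expected 24 channels, got {c}")
    return x.reshape(n, N_LOCATIONS, N_COORDS, t)

x = np.arange(2 * 24 * 5).reshape(2, 24, 5)  # toy data: 2 samples, 5 time steps
per_loc = split_by_location(x)
print(per_loc.shape)  # (2, 8, 3, 5)
```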

Classes (Labels)

The six classes are separate actions, with the following meaning:
1: I have command
2: All clear
3: Not clear
4: Spread wings
5: Fold wings
6: Lock wings
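The classes are numbered 1 to 6, whereas PyTorch’s cross-entropy expects class indices in the range [0, n_classes), and an out-of-range target on the GPU is a common cause of ‘CUDA error: device-side assert triggered’. fastai’s category labelling normally builds this mapping for you; the sketch below only makes it explicit. The dictionary and helper names are hypothetical, and the class names are the ones listed above.

```python
import numpy as np

NATOPS_CLASSES = {
    1: "I have command",
    2: "All clear",
    3: "Not clear",
    4: "Spread wings",
    5: "Fold wings",
    6: "Lock wings",
}

# Sorted vocab: the position in this list is the 0-based index the model predicts.
vocab = [NATOPS_CLASSES[k] for k in sorted(NATOPS_CLASSES)]
label_to_idx = {k: i for i, k in enumerate(sorted(NATOPS_CLASSES))}

raw_labels = np.array([1, 6, 3, 3])  # labels numbered as in the list above
targets = np.array([label_to_idx[int(lab)] for lab in raw_labels])
print(targets)                        # [0 5 2 2]
print([vocab[i] for i in targets])
```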


Sounds amazing!


This is great!
@takotab and I have been working on something similar: he has been working on a package based on V2 for TS forecasting, and I have been porting my repo timeseries_fastai to V2 as an exercise to get familiar with V2.
We should probably pool all this awesome work to get something smooth and concise.
What I was discussing the other day with @takotab is that there are mainly 3 tasks on TS:

  1. Classification of independent time series, as with images (vision API): input: TS, output: CategoryLabel. This is what I was doing.
  2. Regression of independent time series, as with images: input: TS, output: Float or Tensor.
  3. Forecasting/Generation: input: TS, output: TS (can be future, past or equivalent). This is similar to Image-to-Image or Image-to-Mask tasks.

For these 3 tasks, the underlying time series item should be the same, but the preprocessing that constructs the (x, y) pairs for supervised learning is different.
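A minimal sketch of that point, assuming a univariate series stored as a NumPy array: the same series yields different (x, y) pairs depending on the task. The function names are hypothetical, and the forecasting case uses a plain sliding window with a fixed lookback and horizon.

```python
import numpy as np

def classification_pair(series, label):
    """Task 1: the whole series is the input, the target is a class label."""
    return series, label

def regression_pair(series, value):
    """Task 2: same input, the target is a float."""
    return series, float(value)

def forecasting_pairs(series, lookback, horizon):
    """Task 3: slide a window over one long series; x = past, y = future."""
    xs, ys = [], []
    for start in range(len(series) - lookback - horizon + 1):
        xs.append(series[start:start + lookback])
        ys.append(series[start + lookback:start + lookback + horizon])
    return np.stack(xs), np.stack(ys)

series = np.arange(10.0)
x, y = forecasting_pairs(series, lookback=4, horizon=2)
print(x.shape, y.shape)  # (5, 4) (5, 2)
print(x[0], y[0])        # [0. 1. 2. 3.] [4. 5.]
```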
After reading more and more of fastai V2, I think we should build something similar to the TabularPandas structure, storing time series in an underlying pandas DataFrame. For multichannel time series it may be challenging.


It’s great that you are already collaborating.

I need to go back to the initial notebooks I created a couple of months ago to recall the different challenges I faced when I was using TabularPandas for multivariate time series, as that was my main focus. My reasoning was that if I could solve the multivariate case, the univariate case would automatically be solved.

We can explore how to find common ground for both Classification/Regression and Forecasting of time series. I’m also interested in Probabilistic Forecasting. I gathered some information about that topic a couple of months ago. As soon as I have some time, I can share it with those who are interested.


Thanks for sharing all this @farid, @tcapelle, @takotab ! It’s a great start for TimeSeries in v2.

Starting this month, I’ll have more time available, and would like to start to port functionality I’ve built in the past to v2. This includes timeseriesAI as well as other fastai extensions, mainly focused on semi-supervised/ self-supervised learning. I have to do it since it’s what I use in my daily work.

I think it’d be great if we join efforts and develop the TimeSeries for fastai v2 as a team instead of working in parallel.

I’d be more than happy to share all the code I have (core functionality, augmentations, models, etc), as well as ideas and learnings from all the time I’ve been using timeseriesAI (for example how to work with larger than memory datasets).

If you agree, I think we should start discussing the scope of the project, and how to organize the work.

For example, in terms of scope we should decide things like:

  • Tasks included: agree with the ones @tcapelle has mentioned. Maybe include anomaly detection(?). In my case, I’m very focused on classification/ regression.

  • Type of time series:

    • Time series: univariate, multivariate
    • Sequential data
    • Space-time series
    • Temporal sequences (like Rossman)

    Personally I’d start with univariate and multivariate time series/ sequential data, and maybe expand in the future.

  • Which approaches should we use?

    • Raw data
    • Raw to image data?
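As one concrete example of the "raw to image" approach, here is a NumPy sketch of the Gramian Angular Summation Field (GASF), a common way of turning a univariate series into a 2D image. It is just one option among several (recurrence plots and Markov transition fields are others), not something the thread has settled on, and it assumes a non-constant series.

```python
import numpy as np

def gasf(series):
    """Gramian Angular Summation Field of a non-constant 1D series -> (n, n) image."""
    x = np.asarray(series, dtype=float)
    # rescale to [-1, 1] so that arccos is defined
    x_scaled = (2 * x - x.max() - x.min()) / (x.max() - x.min())
    x_scaled = np.clip(x_scaled, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    phi = np.arccos(x_scaled)  # angle of each time step in polar coordinates
    return np.cos(phi[:, None] + phi[None, :])  # G[i, j] = cos(phi_i + phi_j)

img = gasf(np.sin(np.linspace(0, 2 * np.pi, 64)))
print(img.shape)  # (64, 64)
```

A multivariate series would give one such image per channel, which could then be stacked like the channels of an ordinary image.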

Please let me know what you think, and if you agree we may raise this in the Time Series thread to check whether anybody else is interested.


I think you should proceed with the blog post you are creating and publish it, @farid, since it seems you’ve made substantial progress. I don’t have any problem with that :slightly_smiling_face:

This! We have @oguiza on board…
Maybe we should organize a call to talk and share ideas.
+1 for anomaly detection (though I’m not sure what you mean).
I think the underlying solution is very important, because we need it to be fast, and fitting in memory is not always an option.
Visualizing multichannel timeseries is not always easy or necessary, but for univariate it is a must. Probably for univariate the pandas DataFrame is the way to go, but for multivariate maybe it should be something else.
I have a current project where I will need to assemble Images plus Timeseries, and I was hoping to build on V2.


Welcome on board @oguiza!


Awesome progress. You also added a lot of documentation, nicely done.

I think it makes a lot of sense to collaborate and try to make a structure for the 4 problems. I feel like anomaly detection is the same as Classification, but feel free to correct me on that one. The 4 problems I come to are:

  • Classification
  • Regression
  • Forecast
  • Imputation (interpolation)

The first 3 are clear; I think the last one becomes clear with this picture:

So this is filling in the blanks. This can be done nicely with an N-BEATS-style model.
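To make the imputation task concrete: a training pair is built by hiding part of a series, and the target is the hidden values. A NumPy sketch, assuming a univariate series and a single contiguous gap; linear interpolation (np.interp) is used only as a simple baseline standing in for a learned model such as an N-BEATS-style network. The helper names are hypothetical.

```python
import numpy as np

def make_imputation_pair(series, gap_start, gap_len):
    """Hide series[gap_start:gap_start + gap_len]; return (input, mask, target)."""
    mask = np.zeros(len(series), dtype=bool)
    mask[gap_start:gap_start + gap_len] = True
    x_in = series.copy()
    x_in[mask] = np.nan  # a model would see NaNs (or zeros plus the mask) in the gap
    return x_in, mask, series[mask]

def interpolate_baseline(x_in, mask):
    """Fill the gap linearly from the observed neighbours."""
    t = np.arange(len(x_in))
    return np.interp(t[mask], t[~mask], x_in[~mask])

series = np.arange(10.0) * 2  # 0, 2, 4, ..., 18
x_in, mask, target = make_imputation_pair(series, gap_start=3, gap_len=4)
pred = interpolate_baseline(x_in, mask)
print(target, pred)  # both are 6, 8, 10, 12 because the series is linear
```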

This makes sense. I’m not sure I’m ready for it yet; I would like to go a little deeper into the library @farid built (could you also publish the docs on a GitHub page?). Then I can ask better questions during the call.

Very interesting to hear you made the same call. Would love to talk about that. However, I think (I already discussed this with @tcapelle) it can be done if we store the time series in a cell of the DataFrame, either as a Series or just as an array, as done below in the columns ts_1 and pred, respectively:

import numpy as np
import pandas as pd

# one row per instance; each time series lives in a single cell
df = pd.DataFrame(data={'pred':[np.arange(10.),
                           np.arange(12.)],
                   'ts_0':[np.ones(10)[None,:],
                           np.ones(12)[None,:]],
                   'ts_1':[pd.Series(np.arange(1,11)+np.random.randn(10)),
                           pd.Series(np.arange(1,13)+np.random.randn(12))],
                   'var_0':[0.,1.],
                   'con_0':[0,1]})
df

I think either ts_1 or pred is the way forward, with one column for every feature/time series. Every time series in the same row must have the same length. This way you can do multivariate for different instances (rows).
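A sketch of how one row in that layout could become a multivariate sample: stack the time series cells of the row into a (n_channels, seq_len) array and check that they share a length. The helper and column names are hypothetical; flattening each cell with reshape(-1) lets it accept both 1D cells (like ts_1 or pred) and (1, n) cells (like ts_0).

```python
import numpy as np
import pandas as pd

def row_to_array(row, ts_cols):
    """Stack the time series cells of one DataFrame row into (n_channels, seq_len)."""
    channels = [np.asarray(row[c], dtype=float).reshape(-1) for c in ts_cols]
    lengths = {len(ch) for ch in channels}
    if len(lengths) != 1:
        raise ValueError(f"channels in one row must share a length, got {sorted(lengths)}")
    return np.stack(channels)

df = pd.DataFrame({
    "ts_a": [np.arange(10.0), np.arange(12.0)],
    "ts_b": [pd.Series(np.ones(10)), pd.Series(np.ones(12))],
    "var_0": [0.0, 1.0],
})
samples = [row_to_array(r, ["ts_a", "ts_b"]) for _, r in df.iterrows()]
print([s.shape for s in samples])  # [(2, 10), (2, 12)]
```

Rows can still differ in length from each other, so batching them would need padding or cropping.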

This approach also has the potential to unify it with the current tabular module. However, the forecasting DataLoader becomes quite complicated, so that part will need to be separate. But things like transforms and procs could/should be shared, I think.


@sachinruk built a PyTorch/fastai implementation of Facebook’s Prophet; it may be interesting to keep him in the loop. https://github.com/sachinruk/ProFeTorch


I’m ok for a call. I think it’d be helpful to discuss scope and potential approaches.

What I mean is that this is a task where you use unsupervised learning techniques to identify outliers in your data. I know some people are interested in it, but it’s not my case: I’m mostly interested in classification/ regression.
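For reference, the simplest unsupervised baseline for this is to flag points far from the rest of the series, e.g. with a z-score threshold. This is only an illustrative baseline, not a deep-learning approach; the threshold of 3 standard deviations is an arbitrary assumption, and the sketch assumes a non-constant series.

```python
import numpy as np

def zscore_anomalies(series, threshold=3.0):
    """Flag points whose distance from the mean exceeds `threshold` standard deviations."""
    x = np.asarray(series, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

series = np.sin(np.linspace(0, 4 * np.pi, 200))
series[50] += 5.0  # inject a single spike
print(np.flatnonzero(zscore_anomalies(series)))  # [50]
```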

Not sure what you mean. It’d be interesting to discuss during the call.
