Time series/sequential data study group

Lol, yeah:

batch_size = int(min(x_train.shape[0] / 10, 16))
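
That heuristic uses roughly a tenth of the training set as the mini-batch, capped at 16. The `int(...)` is needed because `/` returns a float in Python 3. Below is a minimal sketch wrapping it in a helper. The name `mini_batch_size` is hypothetical, and the `max(1, ...)` floor is an added assumption (not part of the original line) so that training sets with fewer than 10 samples don't produce a batch size of 0.

```python
def mini_batch_size(n_train: int, cap: int = 16) -> int:
    """Hypothetical helper: ~10% of the training set, capped at `cap`.

    The max(1, ...) floor is an added assumption so that tiny training
    sets (fewer than 10 samples) do not yield a batch size of 0.
    """
    return max(1, int(min(n_train / 10, cap)))


for n in (5, 30, 100, 500):
    print(n, mini_batch_size(n))  # prints: 5 1 / 30 3 / 100 10 / 500 16
```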

Hi @hfawaz,

Welcome to our study group! It's a privilege to have a world-class time series researcher joining us!

I hope you'll find the experience as useful and rewarding as I have. I can say that, for me, the fastai community has been the best learning and collaborative environment I've found in ML.

I'd really like to thank you and the rest of the team for the quality of the work you are producing and for openly sharing your code. I think you're raising the standard of research in TS.

I also work in the area of Time Series Classification and Regression (not Forecasting), mainly with multivariate datasets.

I have a few comments on your previous post:

  • InceptionTime: I read your paper when it was made public and found it super interesting, so I created a PyTorch version. I've been using it for a couple of weeks, and results on my own datasets are better than with ResNet. So thanks a lot for developing it! Personally, I think that the idea of using larger receptive fields goes in the right direction. I'm building a Practical Time Series repo that I'll be able to share either today or tomorrow. It contains everything required to train TS models with fastai, as well as a collection of some of the state-of-the-art TS architectures (FCN, ResNet, ResCNN, InceptionTime, etc.). I'm currently investigating ways to improve the performance of the InceptionTime network using the fastai framework.
  • Imaging Time Series: I'm with you and Jeremy that encoding TS as images seems like a waste of time, since all the information is contained in the raw data. However, I've seen that in some datasets imaging works really well, even if the dataset is tiny, as you can benefit from computer vision transfer learning. I have tried multiple encodings (Gramian, MTF, Recurrence Plots, Wavelets, etc.) with mixed results. I believe that in the end raw-input models should prevail, but it's also true that our brain is far better at identifying patterns in charts than in numerical data.
  • Recurrent models: In all the comparisons I've made, I've always found CNN models far superior to RNNs, and they are much faster to train. I gave up on RNNs some time ago.
  • Regression: I'm also working in this area, but my datasets are proprietary, so I can't share them. Sorry about that!
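
For readers unfamiliar with the imaging encodings listed above, here is a minimal NumPy sketch of one of them, the Gramian Angular Summation Field (GASF). It assumes a univariate, non-constant series. The series is min-max rescaled to [-1, 1], each value is mapped to an angle φ = arccos(x̃), and the image is G[i, j] = cos(φ_i + φ_j). The function name `gasf` is illustrative; libraries such as pyts ship production implementations.

```python
import numpy as np

def gasf(series):
    """Gramian Angular Summation Field of a 1-D, non-constant series (sketch)."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()  # assumes hi > lo (a constant series would divide by zero)
    # Min-max rescale to [-1, 1]; the clip guards against tiny float overshoot
    x_tilde = np.clip(2 * (x - lo) / (hi - lo) - 1, -1.0, 1.0)
    phi = np.arccos(x_tilde)                     # polar-angle encoding of each value
    return np.cos(phi[:, None] + phi[None, :])   # G[i, j] = cos(phi_i + phi_j)

ts = np.sin(np.linspace(0, 2 * np.pi, 64))
img = gasf(ts)
print(img.shape)  # (64, 64)
```

The resulting 2-D array can be treated as a single-channel image, which is what makes pretrained vision models applicable.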

Just to give you an idea, here are a few areas I'm currently testing with multivariate TS (everything using fastai):

  • Impact of LSUV (and related) initialization
  • New optimizers (like Ranger, developed by some great fastai colleagues - thread)
  • New activation functions (also developed by some great fastai colleagues - thread)
  • Data augmentation: cutout, mixup, cutmix, …
  • Semi-supervised learning: MixMatch, UDA, S4L
  • Training: progressive resizing
  • Ensembles vs multi-branch models vs hybrids
  • New hybrid Time-Frequency models
  • Inception architecture tweaks: 'bag of tricks'
  • Visualization of activations
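
As an illustration of the data augmentation item in that list, here is a NumPy sketch of two of the techniques named there, applied to a batch of time series shaped (batch, channels, length). Cutout zeroes a random contiguous time window, and mixup blends each sample and its one-hot label with a randomly chosen partner, using a weight drawn from Beta(α, α). The function names, the maximum window width and α = 0.4 are illustrative assumptions, not settings from any experiment in this thread. For simplicity, cutout uses the same window for the whole batch.

```python
import numpy as np

def cutout_1d(x, max_len, rng):
    """Zero one random contiguous time window (same window for the whole batch)."""
    x = x.copy()
    length = x.shape[-1]
    w = rng.integers(1, max_len + 1)           # window width in time steps
    start = rng.integers(0, length - w + 1)    # window start, so start + w <= length
    x[..., start:start + w] = 0.0
    return x, start, w

def mixup(x, y_onehot, alpha, rng):
    """Convex combination of each sample (and label) with a randomly permuted partner."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam

rng = np.random.default_rng(0)
xb = rng.normal(size=(8, 3, 100))              # 8 series, 3 channels, 100 time steps
yb = np.eye(4)[rng.integers(0, 4, size=8)]     # one-hot labels for 4 classes
xc, start, w = cutout_1d(xb, max_len=20, rng=rng)
xm, ym, lam = mixup(xb, yb, alpha=0.4, rng=rng)
```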

I'll post any significant insights I get during my experiments.

I'm more than happy to discuss any of this with anybody who's interested. I'll also create notebooks to demonstrate this functionality.

@oguiza I am also very glad to be here, thanks for taking this great initiative and creating this study group!
I find it great to be able to discuss with everyone interested in such an important topic.
I will be eagerly waiting for your results and implementation of InceptionTime in fastai.

As for imaging time series, I think that for some datasets (and maybe most of them) adding domain knowledge to the design of an architecture is going to help improve accuracy. That is the case for some datasets where imaging (in the frequency domain, for example) acts as a form of domain knowledge that helped improve accuracy.
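
One simple way to bring in a frequency-domain view without turning the series into an image is to compute a magnitude spectrum with a real FFT and feed it to the model as an extra input or branch. Here is a minimal NumPy sketch. `magnitude_spectrum` is a hypothetical helper, and whether this helps at all is dataset-dependent.

```python
import numpy as np

def magnitude_spectrum(x):
    """|rFFT| along the time axis, normalised by series length.

    x: array of shape (..., length). Returns shape (..., length // 2 + 1).
    """
    return np.abs(np.fft.rfft(x, axis=-1)) / x.shape[-1]

t = np.arange(128)
sig = np.sin(2 * np.pi * 5 * t / 128)   # exactly 5 cycles over 128 samples
spec = magnitude_spectrum(sig)
print(spec.shape, spec.argmax())         # (65,) 5
```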

I am also working on multivariate, semi-supervised, data augmentation, ensembling and some architecture tweaks. I will keep everyone up-to-date once I have something concrete to show.

Thanks again for all of this!

@oguiza I implemented the Inception module today, it looks like this:

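import torch
import torch.nn as nn
from fastai.layers import Flatten, MergeLayer, SequentialEx  # used by create_inception below

def noop(x): return x  # identity, used when the bottleneck is skipped

# `conv` was not shown in the post; assumed here to be a 'same'-padded Conv1d without bias
def conv(ni, nf, ks): return nn.Conv1d(ni, nf, ks, padding=ks//2, bias=False)

class InceptionModule(nn.Module):
    def __init__(self, ni, use_bottleneck=True, kss=[41, 21, 11], bottleneck_size=32, nb_filters=32, stride=1):
        super().__init__()
        # Optional 1x1 bottleneck that reduces the number of input channels
        self.conv0 = nn.Conv1d(ni, bottleneck_size, 1, bias=False) if use_bottleneck else noop
        nc = bottleneck_size if use_bottleneck else ni  # channels seen by the branches below
        # Three parallel convolutions with different kernel sizes
        self.conv1 = conv(nc, nb_filters, kss[0])
        self.conv2 = conv(nc, nb_filters, kss[1])
        self.conv3 = conv(nc, nb_filters, kss[2])
        # Max-pool branch followed by a 1x1 convolution
        self.conv_bottle = nn.Sequential(nn.MaxPool1d(3, stride, padding=1),
                                         nn.Conv1d(nc, nb_filters, 1, bias=False))
        self.bn_relu = nn.Sequential(nn.BatchNorm1d(4*nb_filters),
                                     nn.ReLU())
    def forward(self, x):
        x = self.conv0(x)
        # Concatenate the 4 branches along the channel axis -> 4*nb_filters channels
        return self.bn_relu(torch.cat([self.conv1(x), self.conv2(x), self.conv3(x), self.conv_bottle(x)], dim=1))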

and to create the network:

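def create_inception(ni, nout, kss=[41, 21, 11], stride=1, depth=6, bottleneck_size=32, nb_filters=32, head=True):
    layers = [InceptionModule(ni, kss=kss, use_bottleneck=False, nb_filters=nb_filters, stride=stride), MergeLayer(), nn.ReLU()]
    # New instances per block (list multiplication would reuse the same modules, i.e. shared weights)
    for _ in range(depth-1):
        layers += [InceptionModule(4*nb_filters, kss=kss, bottleneck_size=bottleneck_size, nb_filters=nb_filters, stride=stride), MergeLayer(), nn.ReLU()]
    # AdaptiveConcatPool1d is a custom layer (not shown): max + avg pooling -> 2*4*nb_filters features
    head = [AdaptiveConcatPool1d(), Flatten(), nn.Linear(8*nb_filters, nout)] if head else []
    return SequentialEx(*layers, *head)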

I think it can be simplified a bit. @hfawaz, can you check if it is correct? From my initial testing, it is not training that well: the 40 epochs that are enough for ResNet barely do anything for InceptionTime, so I probably have a bug somewhere.
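
A plain-Python pitfall worth ruling out when stacking blocks like these: multiplying a list, e.g. `n*[Module()]`, repeats references to one object instead of creating new ones, so every repeated "layer" would share the same weights. The sketch below uses a hypothetical stand-in class `Block` in place of a real `nn.Module`, so it runs without PyTorch. It compares list multiplication with a comprehension.

```python
class Block:
    """Hypothetical stand-in for an nn.Module: each instance would own its own weights."""
    pass

depth = 6
shared = (depth - 1) * [Block()]              # one object, referenced 5 times
fresh = [Block() for _ in range(depth - 1)]   # 5 independent objects

print(len({id(b) for b in shared}))  # 1
print(len({id(b) for b in fresh}))   # 5
```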

Nice, that was fast!
I'm not quite sure. Is there an output similar to Keras's model.summary()?

ni=1, bottleneck=32, nb_filters=32

InceptionModule(
  (conv1): Conv1d(1, 32, kernel_size=(41,), stride=(1,), padding=(20,), bias=False)
  (conv2): Conv1d(1, 32, kernel_size=(21,), stride=(1,), padding=(10,), bias=False)
  (conv3): Conv1d(1, 32, kernel_size=(11,), stride=(1,), padding=(5,), bias=False)
  (conv_bottle): Sequential(
    (0): MaxPool1d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
    (1): Conv1d(1, 32, kernel_size=(1,), stride=(1,), bias=False)
  )
  (bn_relu): Sequential(
    (0): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): ReLU()
  )
)

This would be the 1st layer for reading a 1-channel TS. The problem with this display method is that you don't see that the 3 convs + the conv_bottle are concatenated. You can only infer it from the BatchNorm1d(128) layer that comes afterwards (4 branches × 32 filters = 128 channels).
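
The shapes in that printout can be checked by hand. For a 1-D convolution (or max-pool) with kernel size k, stride s, padding p and dilation 1, the output length is ⌊(L + 2p − k)/s⌋ + 1. So padding = k // 2 (20, 10 and 5 for kernels 41, 21 and 11) keeps the length unchanged for odd kernels at stride 1. That is what allows the four branches to be concatenated along the channel axis. Here is a small sketch of that arithmetic. The helper name and the series length of 500 are illustrative.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution/pooling (formula from the PyTorch docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

L = 500  # illustrative series length
for ks in (41, 21, 11):
    assert conv1d_out_len(L, ks, padding=ks // 2) == L   # 'same' length for odd kernels
# The max-pool branch (kernel 3, stride 1, padding 1) also preserves the length
assert conv1d_out_len(L, 3, stride=1, padding=1) == L

nb_filters, n_branches = 32, 4   # three convs + the max-pool/1x1 branch
print(n_branches * nb_filters)   # 128, matching BatchNorm1d(128)
```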

I guess here you are applying a bottleneck operation for the first layer. You can see here that I skip it for the first layer explicitly.

Thanks, I will change that. Would you mind checking here if I got it right?

TimeseriesAI

For those of you interested in the area of Time Series Classification, I've created a new repo called "Practical Deep Learning for Time Series" based on the fastai library.
It's based on an idea I've been developing for quite some time. I plan to share a lot of code that I've created over the last few months, as well as some notebooks that demo how that code can be used. You will see that everything is focused on Time Series (Classification and Regression in particular).
The first commit of this repo contains the following:

  • Fastai time series library called fast_timeseries. It contains lots of things I'll be demoing in notebooks over the next few weeks. In the first one, we'll make use of custom TSItem, TSItemLists, TSDataBunch, etc. You'll see that it makes using time series in fastai really easy.

  • I've also included a PyTorch model library called torchtimeseries.models. It contains some of the state-of-the-art models for time series classification (based on raw data): FCN, ResNet, ResCNN and InceptionTime. I have other models, but I believe these work really well on small/medium datasets. I'll add more models in the near future.

  • I've also created a first notebook (Intro to Time Series Classification) to demo how to integrate all this in a simple way, so that you can create state-of-the-art models in just a few minutes.

In future notebooks, I'll try to explain how you can start using more advanced initialization schemes, data augmentation for time series, visualization techniques, and many other topics related to Time Series.

I'd love to receive some feedback, especially if there's anything that doesn't work as expected, is unclear, or is missing.

Thanks so much for this. I look forward to getting stuck in! It's work like this, willingly shared, that makes the fast.ai community such an amazing place!

@oguiza, thank you very much for a very clear and concise notebook to follow for time series. I've been iffy about getting my feet wet with it, but your notebook has made things very clear for me. Thanks :slight_smile:

Thanks so much @AnthonyHolmes! I've learned so much from Jeremy, Rachel, and the great fastai community that I wanted to give something back.

Excellent! You have helped me understand so many things that I'm very glad you found the repo clear and useful. I value your opinion a lot! Thanks for sharing!

Great work, and thanks for reacting so quickly! I believe both implementations (@tcapelle's and @oguiza's) should achieve almost the same results.
Is anyone willing/planning to run the fastai implementation on the whole 128-dataset UCR archive?

BTW, today I updated the InceptionTime repository, which now contains the results for the 128 UCR datasets as well as for the multivariate ones.

Has anyone tried using the Mish activation instead of ReLU for time series yet? (Just out of curiosity: I want to play with it myself later in the week, but I can't at the moment.)
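
For reference, Mish is defined as mish(x) = x · tanh(softplus(x)), with softplus(x) = ln(1 + eˣ). Here is a minimal stdlib sketch of the scalar function. In a PyTorch model it would be applied element-wise in place of nn.ReLU(), and recent PyTorch releases include `torch.nn.Mish`. The overflow-safe softplus form is an implementation choice of this sketch, not something from the thread.

```python
import math

def softplus(x: float) -> float:
    """ln(1 + e^x), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

for v in (-2.0, 0.0, 2.0):
    print(v, round(mish(v), 4))
```

Unlike ReLU, Mish is smooth and lets small negative values through (its minimum is about -0.31), which is the property usually cited in its favour.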

Now you understand why this is called fast.ai! :wink:
No merit on my side. I developed the architecture a couple of weeks ago and have been using it on my datasets. @tcapelle has really reacted very quickly!
I have not compared the implementations yet. I'll take a look at them tomorrow and let you both know if I have any questions.
As for testing the fastai implementation, I'd love to, but I don't have the time or resources to do it: I just have a single cloud GPU. So if you want to go ahead and run the test, I'll be more than happy to assist in any way I can, but I won't be able to run long tests. I think it'd be good to benchmark against your TensorFlow implementation as a starting point, although there are a few approaches that could further improve the result.

No, not yet. I've already tested the Ranger + flat + cosine annealing setup, and it seems to work better than one_cycle. I have a few ideas on how to tweak the InceptionTime architecture that I'm planning to test over the next few days, but if you have the time before me, just go for it!
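
To make the "flat + cosine annealing" schedule concrete: the learning rate is held constant for the first part of training and then cosine-annealed towards zero. Here is a minimal sketch that only computes the learning rate per step; Ranger itself is a separate optimizer choice. The function name and the 75% flat fraction are illustrative assumptions, not necessarily the settings used in the notebook.

```python
import math

def flat_cos_lr(step: int, total_steps: int, base_lr: float, pct_flat: float = 0.75) -> float:
    """Learning rate at `step`: constant, then cosine-annealed to 0 (illustrative)."""
    flat_steps = int(total_steps * pct_flat)
    if step < flat_steps:
        return base_lr
    # Progress through the annealing phase, in [0, 1]
    pct = (step - flat_steps) / max(1, total_steps - flat_steps)
    return base_lr * (1 + math.cos(math.pi * pct)) / 2

schedule = [flat_cos_lr(s, 100, 4e-3) for s in range(101)]
```

Compared with one_cycle, there is no warm-up phase: training starts at the full learning rate and only decays near the end.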

Impressive work Ignacio!
@hfawaz, my RTX is running the full archive right now; I will post results tomorrow. I also ran a hybrid Inception-ResNet for fun.
@oguiza you really went full fastai implementing the TimeSeriesList, nice work! I may finally get rid of the TensorDataset.
I really think we should test L-BFGS, since the UCR datasets fit in memory. It is a better method than any SGD variant, as it uses approximations of second derivatives.
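
For anyone who wants to try the idea: L-BFGS builds an approximation of the inverse Hessian from the last m pairs of position and gradient differences (the "two-loop recursion"). It is usually run full-batch, which is why it only makes sense when the whole dataset fits in memory. The NumPy sketch below implements the textbook two-loop recursion with a simple Armijo backtracking line search on a small least-squares problem, as an illustration only. In PyTorch, the ready-made version is `torch.optim.LBFGS`, whose `step()` takes a closure that recomputes the loss.

```python
import numpy as np

def lbfgs(f_and_grad, x0, m=10, max_iter=100, tol=1e-8):
    """Minimise f with L-BFGS (two-loop recursion + Armijo backtracking)."""
    x = np.asarray(x0, dtype=float).copy()
    f, g = f_and_grad(x)
    s_hist, y_hist = [], []  # recent steps s = x_{k+1} - x_k and gradient changes y
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # First loop: newest to oldest curvature pair
        q = g.copy()
        coeffs = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):
            rho = 1.0 / y.dot(s)
            alpha = rho * s.dot(q)
            coeffs.append((rho, alpha))
            q -= alpha * y
        # Initial inverse-Hessian scaling gamma = s'y / y'y from the newest pair
        gamma = s_hist[-1].dot(y_hist[-1]) / y_hist[-1].dot(y_hist[-1]) if s_hist else 1.0
        r = gamma * q
        # Second loop: oldest to newest
        for (s, y), (rho, alpha) in zip(zip(s_hist, y_hist), reversed(coeffs)):
            beta = rho * y.dot(r)
            r += (alpha - beta) * s
        d = -r  # quasi-Newton direction, approximately -H^{-1} g
        t = 1.0
        while True:  # backtrack until the Armijo sufficient-decrease condition holds
            x_new = x + t * d
            f_new, g_new = f_and_grad(x_new)
            if f_new <= f + 1e-4 * t * g.dot(d) or t < 1e-10:
                break
            t *= 0.5
        s, y = x_new - x, g_new - g
        # Keep only pairs with positive curvature so the approximation stays positive definite
        if y.dot(s) > 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, f, g = x_new, f_new, g_new
    return x

# Small least-squares problem: minimise 0.5 * ||A x - b||^2
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
x_true = rng.normal(size=5)
b = A @ x_true

def f_and_grad(x):
    r = A @ x - b
    return 0.5 * r.dot(r), A.T @ r

x_hat = lbfgs(f_and_grad, np.zeros(5))
```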

@hfawaz, I wanted to ask you something about the UCR univariate and multivariate datasets.
Since many of us do not have the infrastructure to test our ideas against 100+ datasets, is there a subset of them that we could use in our tests to gauge the potential value of our ideas?
In computer vision, there are some large datasets that are used for benchmarking (ImageNet), but Jeremy built a subset of it (Imagenette, ~1%) that is useful for quickly testing ideas.
What do you think?

Edit: I guess, another way to put it is what type of result would really catch your attention?

Thanks! Yes I wanted to experiment with it and learned quite a bit building it.

That's great @tcapelle! You are fast, Thomas!
Out of curiosity, are you using your model with one_cycle?
I compared one_cycle to the setup I included in the nb (Ranger + flat + cosine annealing), and one_cycle is worse on 2-3 datasets.

@oguiza, with Mish and a suitable learning rate (4e-3) I was able to match your 88% (reached around epoch 72) at epoch 26, and your 89% at epoch 64! I know it is just one run, but it seems promising :slight_smile:

Your finish: 88.3%
Mish: 89.73%

(You're a better judge than I am with this dataset: is that a significant improvement?)

notebook

(Had a bit of time between classes :wink: )