Time series / sequential data study group

@oguiza I implemented the Inception module today, it looks like this:

import torch
import torch.nn as nn

# assumes the fastai-style helpers used elsewhere in this thread:
# noop is the identity function, and conv(ni, nf, ks) builds a padded
# Conv1d(ni, nf, ks, padding=ks//2, bias=False)
class InceptionModule(nn.Module):
    def __init__(self, ni, use_bottleneck=True, kss=[41, 21, 11], bottleneck_size=32, nb_filters=32, stride=1):
        super().__init__()
        if use_bottleneck:
            self.conv0 = nn.Conv1d(ni, bottleneck_size, 1, bias=False)
        else:
            self.conv0 = noop
        # the parallel convs only see bottleneck_size channels if the bottleneck is applied
        nf = bottleneck_size if use_bottleneck else ni
        self.conv1 = conv(nf, nb_filters, kss[0])
        self.conv2 = conv(nf, nb_filters, kss[1])
        self.conv3 = conv(nf, nb_filters, kss[2])
        self.conv_bottle = nn.Sequential(nn.MaxPool1d(3, stride, padding=1),
                                         nn.Conv1d(nf, nb_filters, 1, bias=False))
        self.bn_relu = nn.Sequential(nn.BatchNorm1d(4 * nb_filters),
                                     nn.ReLU())

    def forward(self, x):
        x = self.conv0(x)
        # the four branches are concatenated on the channel dim -> 4 * nb_filters channels
        return self.bn_relu(torch.cat([self.conv1(x), self.conv2(x), self.conv3(x), self.conv_bottle(x)], dim=1))

and to create the network:

def create_inception(ni, nout, kss=[41, 21, 11], stride=1, depth=6, bottleneck_size=32, nb_filters=32, head=True):
    # no bottleneck on the first module, which reads the raw ni-channel input
    layers = [InceptionModule(ni, kss=kss, use_bottleneck=False, stride=stride), MergeLayer(), nn.ReLU()]
    # build each block in a loop: multiplying a list of modules by (depth-1)
    # would reuse the same instances, i.e. all blocks would share weights
    for _ in range(depth - 1):
        layers += [InceptionModule(4 * nb_filters, kss=kss, bottleneck_size=bottleneck_size, stride=stride),
                   MergeLayer(), nn.ReLU()]
    head = [AdaptiveConcatPool1d(), Flatten(), nn.Linear(8 * nb_filters, nout)] if head else []
    return SequentialEx(*layers, *head)
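
A quick smoke test (hypothetical sizes: a batch of 1-channel series of length 100, 10 classes), just to check the pieces fit together:

x = torch.randn(16, 1, 100)     # (batch size, channels, time steps)
model = create_inception(1, 10)
print(model(x).shape)           # should be torch.Size([16, 10])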

I think it can be simplified a bit. @hfawaz, can you check whether it is correct? From my initial testing, it is not training that well: the 40 epochs that resnet needs do almost nothing for the InceptionTime, so I probably have a bug somewhere.

Nice, that was fast!
Not quite sure; is there an output similar to Keras’s model.summary()?

With ni=1, bottleneck_size=32, nb_filters=32, printing the module gives:

InceptionModule(
  (conv1): Conv1d(1, 32, kernel_size=(41,), stride=(1,), padding=(20,), bias=False)
  (conv2): Conv1d(1, 32, kernel_size=(21,), stride=(1,), padding=(10,), bias=False)
  (conv3): Conv1d(1, 32, kernel_size=(11,), stride=(1,), padding=(5,), bias=False)
  (conv_bottle): Sequential(
    (0): MaxPool1d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
    (1): Conv1d(1, 32, kernel_size=(1,), stride=(1,), bias=False)
  )
  (bn_relu): Sequential(
    (0): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): ReLU()
  )
)

This would be the first layer for reading a 1-channel TS. The problem with this display method is that you can’t see that the 3 convs + the conv_bottle are stacked together; you can only guess it from the BatchNorm1d(128) layer that comes afterwards.
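
One quick way to see the stacking is to pass a dummy batch through the module and check the output shape (a minimal sketch; the sequence length of 128 is arbitrary):

x = torch.randn(8, 1, 128)                   # (batch size, channels, time steps)
m = InceptionModule(1, use_bottleneck=False)
print(m(x).shape)                            # torch.Size([8, 128, 128]): 4*nb_filters channels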

I guess here you are applying a bottleneck operation for the first layer. You can see here that I skip it for the first layer explicitly.

Thanks, I will change that. Would you mind checking here if I got it right?

TimeseriesAI

For those of you interested in the area of Time Series Classification, I’ve created a new repo called “Practical Deep Learning for Time Series” based on the fastai library.
It’s based on an idea I’ve been developing for quite some time. What I plan to do is to share a lot of code that I’ve created over the last few months, as well as some notebooks to demo how that code can be used. You will see that everything is focused on Time Series (Classification and Regression in particular).
The first commit of this repo contains the following:

  • A fastai time series library called fast_timeseries. It contains lots of things I’ll be demoing in notebooks over the next few weeks. In the first one we’ll make use of custom TSItem, TSItemLists, TSDataBunch, etc. You’ll see that it makes using time series in fastai really easy.

  • I’ve also included a PyTorch model library called torchtimeseries.models. It contains some of the state-of-the-art models for time series classification (based on raw data): FCN, ResNet, ResCNN and InceptionTime. I have other models, but I believe these work really well on small/medium datasets. I’ll add more models in the near future.

  • I’ve also created a first notebook (Intro to Time Series Classification) to demo how to integrate all this in a simple way, so that you can create a state-of-the-art model in just a few minutes (see the sketch just below).
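
A minimal sketch of the kind of end-to-end run the notebook demos, assuming a fastai v1 Learner and the create_inception model posted earlier in this thread (data stands for whatever DataBunch you build with the library):

from fastai.basics import *

model = create_inception(ni=1, nout=data.c)     # data.c = number of classes
learn = Learner(data, model, metrics=accuracy)
learn.fit_one_cycle(40, 1e-3)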

In future notebooks, I’ll try to explain how you can start using more advanced initialization schemes, data augmentation for time series, visualization techniques, and many other topics related to Time Series.

I’d love to receive some feedback, especially if there’s anything that doesn’t work as expected, or is not clear, or is missing.

Thanks so much for this. I look forward to getting stuck in! It’s work like this, willingly shared, that makes the fast.ai community such an amazing place!

@oguiza thank you very much for a very clear and concise notebook to follow for time series. I’ve been iffy about getting my feet wet with it, but your notebook has made things very clear for me. Thanks :slight_smile:

Thanks so much @AnthonyHolmes! I’ve learned so much from Jeremy, Rachel, and the great fastai community that I wanted to give something back.

Excellent! You have helped me understand so many things, so I’m very glad you found the repo clear and useful. I value your opinion a lot! Thanks for sharing!

Great work, and thanks for the very fast reaction! I believe both implementations, @tcapelle’s and @oguiza’s, should achieve almost the same results.
Is anyone willing/planning to run the fastai implementation on the whole 128-dataset archive?

BTW, I updated the InceptionTime repository today; it now contains the results for the 128 UCR datasets as well as the multivariate ones.

Has anyone tried using the Mish activation instead of ReLU for time series yet? (Out of curiosity; I want to play with it myself later in the week, but I can’t at the moment.)
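
For anyone who wants to try it, Mish is easy to drop in; a minimal PyTorch version you could swap for the nn.ReLU() calls in the modules above:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    # Mish(x) = x * tanh(softplus(x)), from Misra (2019)
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))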

Now you understand why this is called fast.ai! :wink:
No merit on my side: I developed the architecture a couple of weeks ago and have been using it on my datasets. @tcapelle has reacted really quickly!
I have not compared the implementations; I’ll take a look at them tomorrow and let you both know if I have any questions.
As for testing the fastai implementation, I’d love to, but I don’t have the time or resources to do it; I just have a single cloud GPU. So if you want to go ahead and run the test, I’ll be more than happy to assist in any way I can, but I won’t be able to run long tests. I think it’d be good to benchmark against your TensorFlow implementation as a starting point, although there are a few approaches that could further improve the results.

No, not yet. I’ve already tested the Ranger + flat + cosine annealing setup, and it seems to work better than one_cycle. I have a few ideas on how to tweak the InceptionTime arch that I’m planning to test over the next few days, but if you have the time before me, just go for it!
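
For reference, a rough sketch of such a “flat + cosine annealing” LR schedule in plain Python; the 70% flat fraction is an assumption, and Ranger itself is a third-party optimizer (RAdam + Lookahead):

import math

def flat_cos_lr(step, total_steps, base_lr, flat_pct=0.7):
    # constant LR for the first flat_pct of training, then cosine decay to 0
    flat_steps = int(total_steps * flat_pct)
    if step < flat_steps:
        return base_lr
    frac = (step - flat_steps) / max(1, total_steps - flat_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * frac))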

Impressive work, Ignacio!
My RTX is running the full dataset right now, @hfawaz; I will post results tomorrow. I also ran a hybrid incept-resnet for fun.
@oguiza, you really went full fastai implementing the TimeSeriesList, nice work! I may finally get rid of the TensorDataset.
I really think we should test LBFGS, as the UCR datasets fit in memory. It is a better method than any SGD, as it uses second-derivative approximations.
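
If anyone wants to try it, torch.optim.LBFGS requires a closure that re-evaluates the loss on each step (a minimal sketch; model, loss_fn, xb and yb are placeholders):

opt = torch.optim.LBFGS(model.parameters(), lr=0.1)

def closure():
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    return loss

opt.step(closure)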

@hfawaz, I wanted to ask you something about the UCR univariate and multivariate datasets.
Since many of us don’t have the resources to test our ideas against 100+ datasets, is there a subset of them that we could use in our tests to gauge the potential value of our ideas?
In computer vision there are some large datasets used for benchmarking (ImageNet), but Jeremy built a subset of it (Imagenette, ~1% of ImageNet) that is useful for quickly testing ideas.
What do you think?

Edit: I guess another way to put it is: what kind of result would really catch your attention?

Thanks! Yes, I wanted to experiment with it and learned quite a bit building it.

That’s great, @tcapelle! You are fast, Thomas!
Out of curiosity, are you using your model with one_cycle?
I compared one_cycle to the setup I included in the nb (Ranger + FlatCosAnnealing), and one_cycle was worse on 2-3 datasets.

@oguiza, with Mish and a proper learning rate (4e-3) I was able to match your 88% (reached around epoch 72) by epoch 26, and your 89% by epoch 64. I know it’s just one run, but it seems promising :slight_smile:

Your final result: 88.3%
Mish: 89.73%

(You’re a better judge of this dataset than I am: is that improvement significant?)

notebook

(Had a bit of time between classes :wink: )

Hey, you are also very quick!! :rofl:
I’m impressed!!
Yes, I think so. I’ve run many tests on the ChlorineConcentration dataset (I just picked it randomly), including some ensembles (which improved the result, but not as much as Mish!).
So it looks pretty good to me!
I think it’s something really worth exploring in more depth on other datasets.
My impression is that we may be able to come up with a few ideas that, used jointly, could significantly improve the outcome, as happened with Imagenette.
Well done, Zachary!!
PS: I had the feeling that you would not wait until “later in the week”… :wink:

I agree. I’ve seen my own interesting behavior with Mish on tabular data, and on time series specifically, so this result doesn’t surprise me.

General TS question: if I have an input with 12 variables over a small time window of 5 ms, at 1 ms intervals, I have 60 input values. For my dataframe, is it as simple as putting those 60 values on one row, and then I can use your API? And could you explain the difference between feat and target? Or what feat is?

Thanks! :slight_smile:

(By the way, I reran it and got 90%; definitely worth running ~5 times and reporting a CI. I’ll edit this back in with it :wink: )

5 Runs:
mean: 0.9002084
std: 0.0024
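
(For anyone reporting runs like these, a small helper sketch; the 1.96 factor assumes a normal approximation for a ~95% CI:)

import numpy as np

def summarize(accs):
    accs = np.asarray(accs)
    mean, std = accs.mean(), accs.std(ddof=1)
    ci = 1.96 * std / np.sqrt(len(accs))  # approx. 95% CI half-width
    return mean, std, ci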

So is this a time series with just 5 time steps? If so, you won’t be able to use InceptionTime as it has large kernels. I’ve never used a model with such short time series.

But to answer your question in general: in the end, you need a 3D tensor with shape (batch size, features, time steps).
In your example, that’d be (bs, 12 features, 5 time steps).

target is used to identify the column in the pandas df that contains the dependent variable. Since pandas dataframes are 2D and we need to create a 3D tensor, we also need to indicate which column (feat) identifies all the samples for each feature, so that we can then stack them into the 3D array. If you only have one feature, you can leave it as None, as there’s nothing to stack. Not sure if this is clear; if not, please let me know.
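
A hypothetical sketch of that reshaping, assuming (for illustration only, this is not the library’s actual API) a df with a feat column, a target column, and one column per time step:

import numpy as np
import torch

def df_to_3d(df, feat='feat', target='target'):
    # one 2D (samples x time steps) array per feature, stacked on a new channel axis
    feats = df[feat].unique()
    ys = df[df[feat] == feats[0]][target].values
    xs = [df[df[feat] == f].drop(columns=[feat, target]).values for f in feats]
    return torch.tensor(np.stack(xs, axis=1), dtype=torch.float), ys  # (bs, features, steps)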
