Time series/ sequential data study group

vrodriguezf · October 3, 2019, 2:01pm

Makes sense! I will use mean imputation then. Thanks!

tcapelle · October 4, 2019, 10:42am

How would you go about doing so?

Creating a new Timeseries core like notebook 40_tabular?
A more basic TensorData one?
Creating a DataBlock to read Timeseries?
I would like to start implementing this on v2, but I don’t know where to start.

jeremy · October 4, 2019, 12:59pm

Best to ask these questions in the v2 thread you just created.

oguiza · October 4, 2019, 4:05pm

New notebook: Time Series Data Augmentation (cutout, mixup, cutmix)

As you know, one of the best ways to improve performance (especially when a dataset is small) is to do some type of data augmentation.
In the last couple of years, cutout, mixup and cutmix have emerged as very powerful ways to improve performance in computer vision problems.
I’ve created a notebook to show you how easily you can start using them with time series. In many cases, performance can be improved. If you want to see how it works, you can go to timeseriesAI and open notebook 03_new_Time_Series_data_augmentations.ipynb
I hope you find it useful in your time series problems.
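For readers who have not met mixup before, here is a minimal NumPy sketch of the general idea: each sample in a batch is blended with a randomly chosen partner, and the labels are blended with the same weight. This is not the code from the notebook mentioned above; the function name `mixup_batch`, the default `alpha=0.4` and the array shapes are assumptions made purely for illustration.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.4, rng=None):
    """Blend every sample with a random partner from the same batch.

    x:        array of shape (batch, channels, length)
    y_onehot: array of shape (batch, n_classes)
    alpha:    Beta distribution parameter (illustrative default, an assumption)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)       # one mixing weight for the whole batch
    perm = rng.permutation(len(x))     # partner index for every sample
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mixed, y_mixed, lam

# Toy batch: 8 univariate series of length 100, 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 1, 100))
y = np.eye(3)[rng.integers(0, 3, size=8)]
x_mixed, y_mixed, lam = mixup_batch(x, y, rng=rng)
```

The mixed labels remain valid probability vectors (each row still sums to 1), so they can be fed to a cross-entropy loss that accepts soft targets.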

tcapelle · October 4, 2019, 5:22pm

I saw the notebook earlier today, it is awesome! Let’s build core.series on fastai V2!

oguiza · October 4, 2019, 6:35pm

I’ve completed 3-4 runs of the InceptionTime baseline after fixing the conv layer bias (thanks to @tcapelle for pointing that out). The results after 500 epochs are now better, and almost the same as the ones @hfawaz has with the TensorFlow version after 1500 epochs. So I think it’s a good start. Here are the results:

You’ll notice that the average performance is very close in 11 datasets, but there are 2 where it’s actually lower (Wine: -9%, and Herring: -7% accuracy compared to the avg of 5 TF InceptionTime runs).
I’ve dug a bit deeper to try to understand why, and have run Wine for 1500 epochs (with and without mixup). There’s a huge jump in performance (+23% without mixup, and +27% with mixup):

All these performance results use the same architecture as the original in TensorFlow. I have not made a single change yet. Any difference would mainly come from the training process/optimizer.

So a few comments:

some datasets benefit from a much larger number of epochs (Wine).
mixup increases performance even after a long run. In the case of Wine, by an average of 3.7%. Actually, with longer training and mixup, Wine goes from being the bottom performer to being relatively close to the state of the art.
accuracy at the end of training tends to be the same as or slightly better than the accuracy at the lowest training loss. This seems to indicate that there is little or no overfitting even after 1500 epochs.

farid · October 4, 2019, 6:44pm

Thank you very much @oguiza for sharing your fantastic work and inspiring so many people to get on board.

oguiza · October 4, 2019, 6:53pm

Thank you so much for your feedback Farid. I really appreciate it!
I just try to learn and contribute as much as I can

oguiza · October 4, 2019, 7:12pm

Hi @tcapelle,

I’d love to see the work that some of us have done ported to v2. That’d be awesome!
However, I have a big time limitation. I work with time series, and I’m looking for ways to get better performance on some proprietary datasets that I use.
I’m looking at different data augmentation techniques (mixup, cutmix, etc.), semi-supervised learning, architectures, training approaches, initialization, etc., so my priority is really to make as much progress as I can in these areas. This is pretty time consuming, and I don’t see the time pressure going down for at least a few weeks.
Having said that, please let me know how I can help.
I’m willing to collaborate as much as I can with ideas, discussing potential approaches, etc. I have already shared all my fastai v1 code (timeseriesAI), and will continue to share any insights/new code that I get in this area.

hfawaz · October 5, 2019, 7:37am

Cool results, thanks for sharing. These data augmentation techniques seem promising. Can you explain what you mean by fixing the conv layer bias? I ask because in InceptionTime I removed the bias from the convolutional layers.

oguiza · October 5, 2019, 8:15am

Thanks!
I meant fixing it in my PyTorch implementation of InceptionTime. So now the PyTorch and TensorFlow implementations are equivalent. Neither of them uses bias in the conv layers.

vrodriguezf · October 9, 2019, 4:08pm

Hi,

I know from reading the InceptionTime paper that long time series benefit from long filter sizes. But is there any kind of rule or limitation in this regard?

I have a dataset of time series with length 20000. What do you think would be a good set of filter lengths to experiment with?

Best!

hfawaz · October 9, 2019, 7:46pm

Hi,

I believe that it really depends on your dataset.
If I were you, I would start with InceptionTime and tweak the hyperparameters. Having a very long receptive field is usually good, but you risk overfitting if you do not regularize properly.
Finally, if you have such long time series, try subsampling to gain speed.
Hope this helps.

vrodriguezf · October 10, 2019, 8:13am

Thank you! I will play a little bit with the InceptionTime hyperparameters. I am currently using the implementation in @oguiza’s repository. Very instructive!

BTW, do you guys know whether there have been any advances in time series forecasting with fastai?

oguiza · October 10, 2019, 8:42am

Thank you!

I’d like to add something to Fawaz’s response.

I think it’s always important to try to understand the scale of the differences between classes in your time series.
Sometimes, the difference between classes is determined by a few data points (micro view). This is usually the case in short time series.
In other cases, the difference is due to the structure of the whole time series (macro view).
And lastly, you may have a hybrid of both.
In the first case, you will need to extract local features to be able to assign a TS to a particular class. In the second one you’ll need to extract global features to do that.
InceptionTime, due to the large kernel sizes, is capable of extracting features from longer sections of the time series, which proves to be a benefit.
However, in the case of very long sequences (like yours, at 20k), if the difference between classes is determined by global features, InceptionTime’s longer kernels may not be long enough to identify the required global features. In that case, it may be useful to subsample the TS to create shorter views of it to which you apply the model.
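To put rough numbers on that argument, here is a small sketch. It uses the standard formula for the receptive field of stacked stride-1, undilated convolutions, 1 + sum(k - 1), and plain decimation for subsampling. The layer count (6), the largest kernel size (40) and the subsampling factor (20) are assumptions chosen for illustration, not values confirmed in this thread.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, undilated convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)

def subsample(series, factor):
    """Keep every `factor`-th point (plain decimation, no low-pass filter)."""
    return series[::factor]

# Hypothetical settings: 6 stacked layers whose largest kernel is 40.
rf = receptive_field([40] * 6)

series = list(range(20000))       # stand-in for one 20k-step series
short = subsample(series, 20)     # shorter view of the same series

# Fraction of the series that one output position can "see".
coverage_full = rf / len(series)
coverage_short = rf / len(short)
```

Under these assumptions a single output position sees about 1% of the original series but roughly a quarter of the subsampled one. Note that plain decimation can alias high-frequency content, so averaging over each window before dropping points may be a safer choice for signals such as ECG.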

Could you please explain what you mean by this, Victor?

vrodriguezf · October 10, 2019, 8:59am

Thank you @oguiza! I guess I should definitely go for subsampling… the data comes from physiological signals (like ECG), which are measured every millisecond.

By time series forecasting, I mean the task of predicting future values of a sequence (or a set of sequences) for a given future horizon. I’ve read some papers recently (Guokun2018 and Shun-Yao2019) that use DL successfully for this task, but I am unaware of any implementation of this in the fastai style.
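As a concrete illustration of that task definition, forecasting data is commonly prepared by sliding a window over the sequence and pairing each stretch of past values with the values that follow it. The sketch below is not taken from the cited papers or from fastai; the helper name `make_windows` and its parameters `lookback` and `horizon` are hypothetical.

```python
def make_windows(series, lookback, horizon):
    """Split a sequence into (past, future) training pairs."""
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        past = series[start:start + lookback]
        future = series[start + lookback:start + lookback + horizon]
        pairs.append((past, future))
    return pairs

# Toy sequence 0..9: use 4 past values to predict the next 2.
series = list(range(10))
pairs = make_windows(series, lookback=4, horizon=2)
```

Each pair can then be treated as one supervised example, with `past` as the model input and `future` as the target.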

oguiza · October 10, 2019, 4:29pm

Hi Victor,
I don’t currently work on time series forecasting, so I’m afraid I won’t be able to help. There may be others in the forum who might be able to help you.
The only thing I remember reading was about 2 DL models that perform well in M4, which is a very important time series forecasting competition, held every few years. I think they have been mentioned before in this thread. Here are 2 links to models that perform well:

n-Beats: https://github.com/philipperemy/n-beats
Fast ES-RNN: https://github.com/damitkwr/ESRNN-GPU

Both of them have PyTorch implementations on GitHub.
As to the fastai style, the models are in pure PyTorch. So AFAIK, you’ll need to prepare the data, and then use the standard training process.

vrodriguezf · October 10, 2019, 4:47pm

Thank you!! Very helpful links

Best!

stevevaius · October 11, 2019, 5:50am

Hi, I tried your code but I have some questions. First, there is an error in the 12th cell:

ValueError: Item wrong length 100 instead of 48000.

How can we fix it? And the last but most important question is “How can I test prediction on a single univariate time series with this colab notebook?” I’m really looking forward to solving this. Thanks for your code and efforts!

vrodriguezf · October 15, 2019, 6:06pm

Hi!

I am struggling a bit to get results with my dataset. I have around 3000 samples with a time series length of 1000. My intention is to use multivariate data, but for now I am using just one variable (ECG signal). There are 3 classes to learn.

I am trying to use @oguiza’s implementation of InceptionTime, with a long kernel size (100). I get a curve in lr_find that looks good to me:

But during training, the network basically does not learn.

I’ve played with the parameters of the network (kernel size, depth, even the architecture), but I still don’t see that the network is learning during the first 5 epochs.

So, I am starting to ask myself whether there is something to learn at all. How can I run an easy classifier (aside from DL) to use as a baseline for checking if there is something to learn?

Best!
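One possible way to get such a baseline, sketched here under assumptions, is to compare two very cheap classifiers: always predicting the most frequent class, and 1-nearest-neighbour with Euclidean distance (nearest-neighbour classifiers are a common reference point in time series classification). If 1-NN does not clearly beat the majority-class accuracy, the raw signal may carry little class information at that resolution. The function names and the synthetic sine-wave data below are illustrative stand-ins, not the dataset discussed above.

```python
import numpy as np

def majority_baseline(y_train, y_valid):
    """Accuracy of always predicting the most frequent training class."""
    classes, counts = np.unique(y_train, return_counts=True)
    return float(np.mean(y_valid == classes[np.argmax(counts)]))

def one_nn_accuracy(x_train, y_train, x_valid, y_valid):
    """Accuracy of 1-nearest-neighbour with Euclidean distance."""
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b,
    # which avoids building a (n_valid, n_train, length) array.
    d2 = ((x_valid ** 2).sum(axis=1)[:, None]
          + (x_train ** 2).sum(axis=1)[None, :]
          - 2.0 * x_valid @ x_train.T)
    preds = y_train[d2.argmin(axis=1)]
    return float(np.mean(preds == y_valid))

# Synthetic stand-in: 3 classes of noisy sine waves with different frequencies.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
y_all = rng.integers(0, 3, size=300)
x_all = np.sin(2 * np.pi * (y_all[:, None] + 1) * t) + 0.3 * rng.normal(size=(300, 200))

x_train, y_train = x_all[:240], y_all[:240]
x_valid, y_valid = x_all[240:], y_all[240:]

acc_majority = majority_baseline(y_train, y_valid)
acc_1nn = one_nn_accuracy(x_train, y_train, x_valid, y_valid)
```

On real data, the arrays would be replaced by the training and validation splits, each of shape (n_samples, length), with integer class labels.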