Time series/ sequential data study group

I am using fastai Mixup :relaxed: It works, but I'm not yet sure if it performs well enough.
Here are the results for Res2Net50:

|                              |   epochs |        loss |   val_loss |   accuracy |   accuracy_ts |   max_accuracy |   time (s) |
|:-----------------------------|---------:|------------:|-----------:|-----------:|--------------:|---------------:|-----------:|
| Wine                         |      100 | 0.0472185   |   0.836495 |   0.75     |      0.75     |       0.895833 |         52 |
| BeetleFly                    |      100 | 0.000209084 |   1.11402  |   0.8      |      0.85     |       0.9      |         61 |
| InlineSkate                  |      100 | 0.0516765   |   2.72746  |   0.42     |      0.42     |       0.423636 |         98 |
| MiddlePhalanxTW              |      100 | 0.000941262 |   2.88568  |   0.532468 |      0.551948 |       0.571429 |         97 |
| OliveOil                     |      100 | 0.292994    |   0.38972  |   0.9      |      0.9      |       0.9      |         68 |
| SmallKitchenAppliances       |      100 | 0.00445449  |   1.60318  |   0.741333 |      0.741333 |       0.778667 |        105 |
| WordSynonyms                 |      100 | 0.00383701  |   3.29962  |   0.534483 |      0.532915 |       0.543887 |         84 |
| MiddlePhalanxOutlineAgeGroup |      100 | 0.00181109  |   3.13823  |   0.493506 |      0.493506 |       0.623377 |         92 |
| MoteStrain                   |      100 | 0.00239903  |   1.1964   |   0.744409 |      0.744409 |       0.794728 |        694 |
| Phoneme                      |      100 | 0.00118895  |   5.0102   |   0.224156 |      0.224156 |       0.244198 |        145 |
| Herring                      |      100 | 0.00332776  |   2.34432  |   0.546875 |      0.546875 |       0.65625  |         66 |
| ScreenType                   |      100 | 0.000509472 |   2.89669  |   0.482667 |      0.482667 |       0.581333 |         96 |
| ChlorineConcentration        |      100 | 0.00138494  |   0.806829 |   0.851823 |      0.851823 |       0.854427 |        196 |

How can I compute the number of params?
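
In case it helps anyone reading, a quick way to count parameters in PyTorch (assuming a fastai Learner called `learn`, so `learn.model` is a plain `nn.Module`; the names here are just placeholders):

```python
# Count total and trainable parameters of a PyTorch model.
# Works for any nn.Module, e.g. a fastai Learner's learn.model.
def count_params(model):
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

total, trainable = count_params(learn.model)
print(f"total: {total:,}  trainable: {trainable:,}")
```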


No, there are no missing values. I guess you could delete some randomly if you wanted to try it.

I’ve never dealt with missing values, to be honest with you. I don’t know if they would even work.
Usually you would replace those missing values with a constant, an average, a median, etc.
Sorry I can’t help more.

I’d think you’d want an external preprocessing step on your data frame to handle this with the average. There are a few different methods, but I usually use the average, and here I’d compute it over the particular series instance (row). That’s how I’d go about missing values in this case :slight_smile:
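
For reference, a minimal pandas sketch of that idea (assuming each row of `df` is one series instance and missing time steps are NaN; the frame below is just a toy example):

```python
import numpy as np
import pandas as pd

# Toy frame: each row is one time series instance, columns are time steps.
df = pd.DataFrame([[1.0, np.nan, 3.0],
                   [4.0, 5.0, np.nan]])

# Replace each NaN with the mean of its own row (series instance).
df_imputed = df.apply(lambda row: row.fillna(row.mean()), axis=1)
print(df_imputed)
```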


Makes sense! I will use mean imputation then. Thanks!

How would you go about doing so?

  • Creating a new Timeseries core like notebook 40_tabular?
  • A more basic TensorData one?
  • Creating a DataBlock to read Timeseries?

I would like to start implementing this on V2, but don’t know where to start.

Best to ask these questions in the v2 thread you just created.

New notebook: Time Series Data Augmentation (cutout, mixup, cutmix)

As you know, one of the best ways to improve performance (especially when a dataset is small) is to do some type of data augmentation.
In the last couple of years, cutout, mixup, and cutmix have appeared as very powerful ways to improve performance in computer vision problems.
I’ve created a notebook to show how easily you can start using them. In many cases, performance can be improved. If you want to see how it works, you can go to timeseriesAI and open notebook 03_new_Time_Series_data_augmentations.ipynb.
I hope you find it useful in your time series problems.
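
For anyone who just wants the gist, here is a minimal sketch of mixup applied to a batch of time series tensors (illustrative only, not the notebook’s exact implementation; fastai’s MixUp callback handles this for you during training):

```python
import torch

def mixup_batch(x, y, alpha=0.4):
    """Mix a batch of series x (bs, channels, seq_len) with one-hot labels y (bs, n_classes)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()  # mixing coefficient
    perm = torch.randperm(x.size(0))                        # shuffled batch indices
    x_mix = lam * x + (1 - lam) * x[perm]                   # blend the series
    y_mix = lam * y + (1 - lam) * y[perm]                   # blend the labels accordingly
    return x_mix, y_mix
```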

I saw the notebook earlier today, it’s awesome! Let’s build core.series on fastai V2!

I’ve completed 3-4 runs of the InceptionTime baseline after fixing the conv layer bias (thanks to @tcapelle for pointing that out). The results after 500 epochs are now better, and almost the same as the ones @hfawaz gets with the TensorFlow version with 1500 epochs. So I think it’s a good start. Here are the results:

You’ll notice that the average performance is very close in 11 datasets, but there are 2 where it’s actually lower (Wine: -9%, and Herring: -7% accuracy compared to the average of 5 TF InceptionTime runs).
I’ve dug a bit more to try to understand why, and have run Wine for 1500 epochs (with and without mixup). There’s a huge jump in performance (+23% without mixup, and +27% with mixup):

All these performance results use the same architecture as the original TensorFlow one. I have not made a single change yet. Any difference would mainly come from the training process / optimizer.

So a few comments:

  1. Some datasets benefit from a much larger number of epochs (Wine).
  2. Mixup increases performance even after a long run; in the case of Wine, by an average of 3.7%. Actually, with longer training and mixup, Wine goes from being the bottom performer to being relatively close to the state of the art.
  3. Accuracy at the end of training tends to be the same as or slightly better than the accuracy at the lowest training loss. This seems to indicate that there is little/no overfitting even after 1500 epochs.

Thank you very much @oguiza for sharing your fantastic work and inspiring so many people to get on board.

Thank you so much for your feedback Farid. I really appreciate it!
I just try to learn and contribute as much as I can :smiley:

Hi @tcapelle,

I’d love to see the work that some of us have done ported to v2. That’d be awesome!
However, I have a big time limitation. I work with time series, and I’m looking for ways to get better performance on some proprietary datasets that I use.
I’m looking at different data augmentation techniques (mixup, cutmix, etc.), semi-supervised learning, architectures, training approaches, initialization, etc., so my priority is really to make as much progress as I can in these areas. This is pretty time-consuming. And I don’t see the time pressure going down for at least a few weeks.
Having said that, please let me know how I can help.
I’m willing to collaborate as much as I can with ideas, discussing potential approaches, etc. I have already shared all my fastai v1 code (timeseriesAI), and will continue to share any insights/new code that I get in this area.


Cool results, thanks for sharing; these data augmentation techniques seem promising. Can you explain what you mean by fixing the conv layer bias? Because in InceptionTime I removed the bias from the convolutional layers.

Thanks!
I meant fixing it in my PyTorch implementation of InceptionTime. So now both the PyTorch and TensorFlow implementations are equivalent: neither of them uses bias in the conv layers.
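
Concretely, in PyTorch that just means passing `bias=False` when building the conv layers (a minimal illustration, not the exact code from either repo; the channel and kernel values are examples only):

```python
import torch.nn as nn

# InceptionTime-style conv block without bias: the BatchNorm right after the conv
# makes a bias term redundant, so it is dropped in both implementations.
conv = nn.Conv1d(in_channels=128, out_channels=128, kernel_size=39,
                 padding=19, bias=False)
bn = nn.BatchNorm1d(128)
```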


Hi,

I know from reading the InceptionTime paper that long time series benefit from long filter sizes. But is there any kind of rule or limitation in this sense?

I have a dataset of time series of length 20000. What do you think would be a good set of filter lengths to play with?

Best!

Hi,

I believe that it really depends on your dataset.
If I were you, I would suggest starting with InceptionTime and tweaking its hyperparameters. Having a very long receptive field is usually good, but you risk overfitting if you do not regularize properly.
Finally, try subsampling if you have such very long time series, in order to gain speed.
Hope this helps :slight_smile:
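
To put rough numbers on the receptive field point: for stacked stride-1, dilation-1 convolutions, the receptive field grows as 1 plus the sum of (kernel_size - 1) over the layers, so you can estimate how much of a 20,000-step series the network actually "sees" (a back-of-the-envelope sketch; the depth and kernel sizes below are examples, not InceptionTime's exact configuration):

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, dilation-1 convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)

# e.g. 6 Inception-style modules, each with a maximum kernel size of 39:
print(receptive_field([39] * 6))   # 229 time steps out of 20,000
```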

Thank you! I will play a little bit with the InceptionTime hyperparameters. I am currently using the implementation in @oguiza’s repository. Very instructive!

BTW, do you guys know if there are any advances in fastai for time series forecasting?

Thank you!

I’d like to add something to Fawaz’s response.

I think it’s always important to try to understand the scale of the differences between classes in your time series.
Sometimes, the difference between classes is determined by a few data points (micro view). This is usually the case in short time series.
In other cases, the difference is due to the structure of the whole time series (macro view).
And lastly, you may have a hybrid of both.
In the first case, you will need to extract local features to be able to assign a TS to a particular class. In the second, you’ll need to extract global features to do that.
InceptionTime, due to its large kernel sizes, is capable of extracting features from longer sections of the time series, which proves to be a benefit.
However, in the case of very long sequences (like your 20k), if the difference between classes is determined by global features, InceptionTime’s longer kernels may not be long enough to identify the required global features. In that case, it may be useful to subsample the TS to create shorter views of it on which to apply the model.
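
As a concrete illustration of that last point, subsampling can be as simple as keeping every k-th step, or averaging non-overlapping windows. A quick numpy sketch (the factor of 10 is chosen purely for illustration):

```python
import numpy as np

x = np.random.randn(20000)               # one long series, e.g. 20k steps

# Naive subsampling: keep every 10th point -> length 2000
x_strided = x[::10]

# Smoother alternative: average non-overlapping windows of 10 steps
x_pooled = x[:len(x) // 10 * 10].reshape(-1, 10).mean(axis=1)
```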

Could you please explain what you mean by this, Victor?


Thank you @oguiza! I guess I should definitely go for subsampling… the data comes from physiological signals (like ECG) which are measured every millisecond.

By time series forecasting, I mean the task of predicting future values of a sequence (or a set of sequences) over a given future horizon. I’ve read some papers recently ( Guokun2018 and Shun-Yao2019 ) that use DL successfully for this task, but I am unaware of any implementation of this in the fastai style.

Hi Victor,
I don’t currently work on time series forecasting, so I’m afraid I won’t be able to help. There may be others in the forum who might be able to help you.
The only thing I remember reading was about 2 DL models that perform well in M4, which is a very important time series forecasting competition, held every few years. I think they have been mentioned before in this thread. Here are 2 links to models that perform well:

Both of them have PyTorch implementations on GitHub.
As to the fastai style, the models are in pure PyTorch. So AFAIK, you’ll need to prepare your data and then use the standard training process.
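
In practice, "fastai style" here mostly means wrapping the plain PyTorch model in a Learner once your data is ready. A minimal sketch, assuming fastai v1 and that `data` is a DataBunch you have already built for your forecasting task (`model`, the loss, and the schedule values are all illustrative placeholders):

```python
from fastai.basic_train import Learner
import torch.nn as nn

# `model` is any pure PyTorch nn.Module, e.g. one of the forecasting models above.
# `data` is a fastai v1 DataBunch you have prepared for your own dataset.
learn = Learner(data, model, loss_func=nn.MSELoss())
learn.fit_one_cycle(10, max_lr=1e-3)
```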
