Actually I might be struggling with some scaling problems right now. I have a multivariate time series with 10 channels, 8 of which are related (their values sum to 1). I have 2 problems:
As you mention in your nb, using per_channel scaling would break the ratio between the dependent channels.
Some of the variables have values close to zero almost all the time, which would be a problem when using standardize, because the standard deviation would be really close to zero. Is there a way to add an epsilon value to that scaling in your repo? I remember Jeremy doing that in some of the lessons of the course.
Adding epsilon seems like a good idea to avoid issues. I'll add it to the code.
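For reference, a minimal sketch of what an epsilon-guarded standardization could look like (the function name and the 1e-8 default are illustrative, not the repo's actual API):

```python
import numpy as np

def standardize_eps(x, eps=1e-8, axis=-1):
    """Standardize along `axis`; eps keeps near-constant channels from
    dividing by a standard deviation that is ~0."""
    mean = x.mean(axis=axis, keepdims=True)
    std = x.std(axis=axis, keepdims=True)
    return (x - mean) / (std + eps)
```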
In your case you may consider scaling the data before creating the databunch. An option would be to standardize the 8 features one way and the other 2 in a different way, create the databunch, and set the scale type to None (or remove it). To test this quickly, you could pass only the 8 features, select a scaling method, and train the model. That way you could check whether maintaining the ratio between them makes sense or not.
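Something along these lines could work as a pre-scaling step before building the databunch (a rough sketch only; the shapes and the shared-statistics choice for the 8 dependent channels are assumptions, used so those channels all go through the same affine map and keep their relative structure):

```python
import numpy as np

def prescale(X, eps=1e-8):
    """X: (n_samples, 10, seq_len); channels 0-7 are the dependent ones."""
    X = X.astype(np.float64)
    # One shared mean/std for the 8 dependent channels, so they all get the
    # same transform and stay on a common scale
    dep = X[:, :8]
    X[:, :8] = (dep - dep.mean()) / (dep.std() + eps)
    # Standardize the remaining 2 channels independently
    for c in range(8, 10):
        ch = X[:, c]
        X[:, c] = (ch - ch.mean()) / (ch.std() + eps)
    return X
```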
Well, I wasn't thinking of combining 2 different databunches, but of testing them separately to quickly learn which scaling strategy could work best for your problem.
I'm continuing to try out different things with my NBA forecasting project. One thing I'm unsure of is how to deal with different players. Basically, the history of each player is its own time series. I'd like to train my model on all the players, in order to leverage all the data, and because there ought to be many common patterns. At the same time, players are clearly not identical. Any ideas on how to deal with this? My first thought is to create player embeddings, but I'm not sure if this is the best approach.
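For what it's worth, here is one way the embedding idea could look in plain PyTorch (all names and sizes are made up for illustration, not a recommendation of a specific architecture):

```python
import torch
import torch.nn as nn

class PlayerForecaster(nn.Module):
    """Toy sketch: a per-player embedding concatenated with the encoded
    series before the forecasting head."""
    def __init__(self, n_players, emb_dim=16, n_features=10, hidden=64, horizon=1):
        super().__init__()
        self.player_emb = nn.Embedding(n_players, emb_dim)
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden + emb_dim, horizon)

    def forward(self, x, player_id):
        # x: (batch, seq_len, n_features); player_id: (batch,)
        _, h = self.encoder(x)            # h: (1, batch, hidden)
        emb = self.player_emb(player_id)  # (batch, emb_dim)
        return self.head(torch.cat([h[-1], emb], dim=1))
```

The embedding lets the model share weights across all players while still learning a per-player representation; whether that beats training separate models is an empirical question.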
This paper has just hit ArXiv, and looks promising:
ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels
I was just coming here to comment on this; it is super effective. I am trying it on my regression problem, and the results are amazing. Could the code be ported to PyTorch to make use of the GPU?
For my dataset (1M curves of 800 points) it is slow.
That's the other question I had. My time series have 2 channels, so I just reshaped them to one channel with .view(N, -1), but probably the generate_kernels function would need to be applied per channel. Also, since I am doing regression, I'm using RidgeCV instead of RidgeClassifierCV.
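In case it helps, this is roughly what that workaround looks like, assuming the generate_kernels / apply_kernels functions from the ROCKET reference code (the kernel count and alpha grid are arbitrary):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# X: (N, 2, seq_len) array of 2-channel series; y: (N,) regression targets.
# Flattening concatenates the two channels end-to-end, so the random kernels
# see them as one long univariate series -- a workaround, not a true
# multi-channel convolution.
X_flat = X.reshape(X.shape[0], -1).astype(np.float64)

kernels = generate_kernels(X_flat.shape[1], 10_000)  # ROCKET reference function
features = apply_kernels(X_flat, kernels)

reg = RidgeCV(alphas=np.logspace(-3, 3, 10))
reg.fit(features, y)
```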
Fair warning: while I was able to get the model to train, it does not currently learn anything useful. Even when I use a small subset of training data, it isn't able to overfit. So something (or more likely several things) is seriously wrong in my setup.
Currently I've gone back to the drawing board with much simpler models that are easier for me to understand and debug. If you can get fastai working for a forecasting problem, I would love to learn what you did!
@oguiza I just finished adapting Rocket for multi-channel classification/regression.
You can check it here
I have zero experience with numba, but it appears to work (I had to remove some functions like mean, sum, etc. that numba didn't handle and replace them with for loops).
If I understand it correctly, you need to add the channel dimension to the convolutional kernels, so I added a channel dimension to each kernel.
Then I had to adjust how the kernel is applied, so I added an extra for loop to perform the convolution channel-wise and then sum the results across channels. I tried it with some data I had on hand, and it gets good results, so it should be safe to use.
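A rough sketch of that idea (not the actual code in the repo; the ppv/max feature extraction is omitted and the explicit loops stand in for the numpy reductions numba wouldn't accept):

```python
import numpy as np
from numba import njit

@njit
def apply_kernel_multichannel(x, weights, bias, dilation, padding):
    """x: (n_channels, length); weights: (n_channels, kernel_length).
    Convolve each channel with its slice of the kernel and sum the
    responses, as described above."""
    n_channels, length = x.shape
    klen = weights.shape[1]
    out_len = length + 2 * padding - (klen - 1) * dilation
    out = np.zeros(out_len)
    for i in range(out_len):
        acc = bias
        for c in range(n_channels):          # extra loop over channels
            for j in range(klen):
                idx = i - padding + j * dilation
                if 0 <= idx < length:
                    acc += weights[c, j] * x[c, idx]
        out[i] = acc
    return out
```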
Thanks so much, @tcapelle! That was fast, again!!
I don't think I'll have time today, but will take a deeper look at this tomorrow. Looks like a radically different approach to time series, but much faster than traditional approaches.
I've noticed they haven't uploaded the code they used for the Satellite Image Time Series dataset (the one they recommend for larger datasets) yet. It'll also be interesting to see how they apply this idea using logistic regression.