Time series/ sequential data study group

It works great so far! I’ve just run the example code from your repo on some of my data, both minirocket and mini_dv.

I’m still absorbing the paper and code and plan to add multivariate support. Is there anything in minirocket that would prevent using ideas/code from the excellent multivariate extensions of rocket here?

Dear Experts,
I have downloaded @hfawaz's InceptionTime model from GitHub and I can see that @oguiza has modified it to run with PyTorch, but I have a problem.
For about 3 weeks, I've been trying to tweak the model to suit my FOREX data in order to predict its price movement, but to no avail. I am familiar with LSTMs, but I want something better, and I believe anything CNN-based will give me a better result.
I have used transfer learning in MATLAB before, but that was for images, and as you all know, MATLAB's license is quite expensive, which is my main reason for switching to Python. Finding InceptionTime, which builds on the Inception architecture, warms my heart.
I'd like an InceptionTime model that can work on my FOREX data instead of "UCR_TS_Archive_2015".

1 Like

Hi @Interneuron, great question.

The short answer is sort of. Minirocket is a bit different under the hood, but you can take a very similar approach to multivariate.

The ‘official’ multivariate implementation of minirocket, to the extent there is such a thing, uses essentially the same approach as the ‘official’ multivariate implementation of rocket (see sktime/rocket), i.e., channels are assigned randomly to each kernel. (For minirocket, channels are assigned randomly to each kernel/dilation combination.) Note that this is very much a ‘naive’ approach to multivariate input. It seems to work ok, but it hasn’t really been tuned, and in the case of minirocket I actually don’t know a lot about how it stacks up against other potential approaches, except that superficially it seems to be close to multivariate rocket in accuracy.
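If it helps to make the idea concrete, the random channel assignment can be sketched in a few lines of NumPy. This is purely illustrative: the function name and the way the subset size is drawn are my own simplifications, not the actual sktime/minirocket code.

```python
import numpy as np

rng = np.random.default_rng(0)

def assign_channels(num_kernels, num_channels):
    """For each kernel, pick a random subset of the input channels.
    (Illustrative only; the real implementations draw the subset size
    differently and pair assignments with dilations as well.)"""
    assignments = []
    for _ in range(num_kernels):
        # how many channels this kernel 'sees', between 1 and all of them
        n = rng.integers(1, num_channels + 1)
        assignments.append(rng.choice(num_channels, size=n, replace=False))
    return assignments

# e.g. 5 kernels over a 3-channel input
channel_map = assign_channels(num_kernels=5, num_channels=3)
# each kernel then convolves only its assigned channels (e.g. summed)
```

The key point is just that the channel subsets are chosen at random per kernel, rather than learned or tuned.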

The multivariate implementation of minirocket is currently in progress as a pull request here. You should be able to just copy/paste the functions in minirocket_multivariate.py. Note that sktime uses black, so the code formatting can look pretty crazy, and some of the variable names have been changed a little to conform to sktime style.

If it’s helpful, I can make a ‘clean’ multivariate version available on the github repo.

I forgot to add: the multivariate version of minirocket has not (yet) been optimised properly, so it’s not as fast as it could be, but it should still be a lot faster than multivariate rocket.

3 Likes

Hi @angus,
It's great to have you back in the forum, especially after another great contribution like MINI-ROCKET!!
I must confess I’ve just scanned through the paper, and it looks very interesting, but haven’t really delved into it. I’ll do that in the next few days, and will probably have questions, comments, feedback.

1 Like

Hi @oguiza, no worries, I hope you’re doing well. No rush, happy to discuss whenever it suits you.

1 Like

@oguiza Please scroll up two posts and you will see my request.

If you can help me, that’d be nice.

Hi @austineaero,

Sorry for my late reply!

First of all, the problem you are trying to tackle is really hard, as FOREX quotes don't move based on previous prices alone. In addition, the signal-to-noise ratio is very low, so I'm not sure how successful you or anybody else may be with this approach.

If you want to use the tsai library, I'd recommend taking a look at the documentation, in particular the data preparation section and the tutorial notebooks.

To use any of the models in tsai you will need to create an array of shape [samples x variables x timesteps]. To do this you can use the SlidingWindow function provided with the library. I don't know if you are planning to use Close prices only, or OHLC, or something else; those would be your variables. As for the timesteps, they will be determined by your window_len parameter. You will also need to decide whether you want to run a classification or regression task.
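Conceptually, what a sliding window gives you can be sketched with plain NumPy. This is only a minimal stand-in to show the [samples x variables x timesteps] shape; the real SlidingWindow in tsai has many more options (targets, horizon, padding, etc.).

```python
import numpy as np

def sliding_window(data, window_len, stride=1):
    """Turn a [timesteps x variables] table into an array of shape
    [samples x variables x window_len]. Minimal illustration only."""
    n_steps, n_vars = data.shape
    windows = [
        data[start:start + window_len].T  # -> [variables x window_len]
        for start in range(0, n_steps - window_len + 1, stride)
    ]
    return np.stack(windows)

# e.g. 100 timesteps of OHLC data (4 variables)
ohlc = np.random.rand(100, 4)
X = sliding_window(ohlc, window_len=20, stride=5)
print(X.shape)  # (17, 4, 20)
```

Here window_len sets the timesteps dimension and stride controls how much consecutive windows overlap.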

Once you have the data ready, you can create a TSDataLoaders object, then use ts_learner with an architecture of your choice, and start training.

There are MANY decisions you need to make:

  • Which input data to use? All currencies? Just one? A few?
  • How many variables? Close? OHLC?
  • Window length? Stride? Overlapping windows?
  • Classification or regression?
  • Architecture?

etc.

As I said, this will be an extremely difficult task, and I don't know how much success you will have, but you can certainly use tsai to test your ideas.

PS. There's a very interesting book called "Advances in Financial Machine Learning" by Marcos Lopez de Prado that might give you some ideas on how you can approach this task.

5 Likes

Hi oguiza hope all is well!

I like your answer. My son is a successful Forex trader; if the models were easy to create, I'd clone him.

Cheers mrfabulous1 :smiley: :smiley:

2 Likes

Hi austineaero, hope you are having a jolly day!

Does the above quote mean you are getting zero success or something like 10% success rate with your current models?

Cheers mrfabulous1 :smiley: :smiley:

1 Like

Hello @mrfabulous1,
I can say zero success, as I'm greeted with an error each time I try to run it.
It seems the InceptionTime model was built with a particular time series in mind, so it doesn't work for all time series.

1 Like

Yes @oguiza, it is difficult, and that is why I came here for help. Maybe someone has done it and would want to share.
I plan to use OHLC.
I’ll take a look at the book you recommended.
Thank you.

1 Like

I am looking at 10_nlp from fastbook and I want to train some embeddings on numerical data instead of text. The step that is stumping me is related to the adaption of defaults.text_proc_rules to the numerical time series domain. I have stock data from trading days and I want to put in a special character analogous to xxbos to indicate that one trading day ended and another is to begin. Are you aware of any brilliant ideas to add, shall we say, nuance to the data series in numerical analysis?
As an example, maybe since they’re prices, I could use some negative numbers as flags.

Thanks – aaron

@oguiza when using https://github.com/timeseriesAI/tsai with a non default metric like https://docs.fast.ai/metrics#CohenKappa are the train/validation loss columns directly representing the chosen metric? Or only the last one? And if not, what are they representing instead? As I want to perform anomaly detection, I guess I must change the default metric. So far, I have the feeling that the loss values (= used by the optimizer) are not representing the metric I would want to optimize for.

edit

Meanwhile I have found: https://docs.fast.ai/losses.html as well as how to apply them:
loss_func = CrossEntropyLossFlat(weight=class_weights)

However, I still need to figure out how to properly do this with class weights in the imbalanced context.

Secondly, I would like to understand how to use learn.predict(new_data). As I operate in the context of panel data (SlidingWindowPanel) I want to:

  • get predictions
  • but keep the ID / key

for each panel time series.

Furthermore:

How can I specify a threshold (in the case of binary classification), i.e. to determine the class cutoff when evaluating the metrics (e.g. transform 0.6 -> class label 1, or decide to already take a probability of 0.2 as class label 1)?

1 Like

Hi @geoHeil,
Sorry for the late reply. I’ll try to briefly answer your questions:

  1. When you select a metric in fastai, it's displayed in its own column, but it doesn't have any impact on the loss. The loss is set by default depending on your target, but you can always pass a different one if you want to change it. So it's normal that when you add a metric, the loss doesn't change.

  2. As for class weights, a common practice is to use the inverse of the class frequency. In tsai this is automatically calculated: when you create a DataLoaders object, it's available as an attribute called cws. So you can use: loss_func=CrossEntropyLossFlat(weight=dls.cws).
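As a rough illustration of the inverse-frequency idea (this is not the exact code tsai uses for cws, and its normalization may differ):

```python
import numpy as np

def inverse_freq_weights(labels):
    """Class weights as the inverse of each class's frequency,
    normalized here so the weights sum to the number of classes.
    Illustrative only; tsai's cws may normalize differently."""
    classes, counts = np.unique(labels, return_counts=True)
    inv = 1.0 / counts
    return inv / inv.sum() * len(classes)

y = np.array([0, 0, 0, 0, 1])  # heavily imbalanced: 4 vs 1
w = inverse_freq_weights(y)
print(w)  # the rare class gets the larger weight
```

Passing such weights to the loss makes mistakes on the rare class cost more, which is usually what you want in an imbalanced setting.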

  3. When you use SlidingWindowPanel you can set return_key=True to get the keys you have used. As for the predictions, you can just do learn.get_X_preds(X_array). You should get them in the same order as you pass them, so you'd have both the predictions and the keys.

  4. As to the threshold, there’s no way to pass it to the get_preds function AFAIK. What you can do in a binary task is the following:

    • Extract the probabilities probas using learn.get_X_preds(X_array)
    • preds_with_thr = (probas[:, 1] > thr)
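Putting those two steps together with dummy numbers (the probas array below is made up purely for illustration; in practice it would come from get_X_preds):

```python
import numpy as np

# Probabilities for a binary task: one row per sample,
# columns = [P(class 0), P(class 1)]. Dummy values for illustration.
probas = np.array([[0.90, 0.10],
                   [0.45, 0.55],
                   [0.20, 0.80]])

thr = 0.7  # e.g. only flag class 1 when we're fairly confident
preds_with_thr = (probas[:, 1] > thr).astype(int)
print(preds_with_thr)  # [0 0 1]
```

Note that with thr=0.7 the middle sample (0.55) is no longer labeled as class 1, which it would be under the default 0.5 argmax behavior.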
2 Likes

I’d like to share with you a new self-supervised callback I’ve added to the tsai library. It’s called TSBERT.

It allows you to pretrain any time series model in a self-supervised manner, i.e. without labels. You can then fine-tune or train on a labeled dataset. It's based on the paper "A Transformer-based Framework for Multivariate Time Series Representation Learning".

I've tested it on a few datasets and it seems to work pretty well.

I’ve also added a notebook to demonstrate how it works.

This implementation can be used with any time series model (whether a transformer or not). In the notebook, for example, I’ve used InceptionTime.

I’d encourage you to use it. It’s very easy to use!

10 Likes

Hi @oguiza,
First, thanks a lot for your awesome work with tsai! It is really useful for my work and already saved me a lot of time and trouble! I love it! :smiley:
I also love the new addition of the TSBERT callback, but it seems I found a bug concerning fine-tuning tsai models: the model parameters don't get frozen if you call freeze() on them. I've already opened an issue on GitHub, and I think I also found the root of the problem! :slight_smile: The model has to be initialized as a Sequential(init, body, head) for the optimizer to freeze the parameters correctly. I would try to commit a fix, but I am still quite new to fastai and not that fluent in it yet.

3 Likes

Hi @kamisoel,
Thanks for your comments and for raising this issue. I've just seen it on GitHub. I hadn't noticed it before since I had not tried pre-trained models, but with TSBERT this needs to be fixed. I'll look into it within the next few days.

2 Likes

For those interested in staying up to date with the time series literature, I've been managing a Zotero library for a while, storing new references that I come across on arXiv. You can access it here:

9 Likes

I’ve already fixed this issue and updated the TSBERT tutorial accordingly. You can now use pretrained models in tsai with all architectures that include Plus in their name.
The results have not changed: pretraining a model results in better performance in all cases I've tested so far, so I'd encourage you to use this self-supervised approach.

2 Likes

Hi Oguiza and everyone,
Thank you very much for this forum and for this library which has been so useful to me.
I am learning to use tsai/fastai and it has been very straightforward.
Currently I am using the "Imaging time series" approach to perform activity recognition from different sensors. My issue is that the dataset is multivariate and very large, so I cannot transform all the data windows from the sensors into images and process them without running into memory errors. Any hints on how I could fix this?