I’m new to deep learning, and right now I’m stuck on a solo project where I want to predict from an EEG signal whether the eyes are open or closed. The problem is that the accuracy stays at 64.5% from start to end. I’ve tried changing the loss function and the batch size, but nothing changed. I have no idea what to do. Please help me, thanks.
I think you have time-series data here, but use a tabular learner. If I understood correctly, that learner mainly learns embeddings, not the dependencies between timesteps. If you look carefully, your training loss goes down but your validation loss goes up quite a bit. I think you are using the wrong model for this task, which is why it does not work.
The classic thing to try first here is an LSTM, I guess. I did not find a good fast.ai resource for doing that with sensor data, only pure PyTorch (I have some PyTorch code here: https://github.com/joergsimon/xai-tutorial-april-2020 ). So maybe some fast.ai experts can help you further here.
As a side note: you picked a learning rate more or less at the end of the plot, where there is also a valley. For this tabular learner you might also try an lr of e.g. 1e-4. But again, I don’t think this is the right model for the task anyway.
If I interpret your data correctly, you have an EEG with 14 channels, around 15k timesteps long, plus a column for the output. I guess the data is equally spaced and indexed in time.
If that is the case, the choice you made is not very helpful.
It would be better to use a time series model that takes subsequences of the entire sequence.
But the first thing you need to do is to convert the data you have into samples that can be processed by a time series model.
You may want to take a look at the data preparation section in the tsai library. You would need to create an array with shape [n samples x channels x timesteps] using the SlidingWindow function. There are examples that show how to use it.
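For illustration, here is a minimal numpy sketch of the kind of transformation SlidingWindow performs. The array shapes are the point here; the real tsai function has more options (stride, horizon, etc.), and all the sizes below are just assumptions matching the data described above:

```python
import numpy as np

# Hypothetical EEG-like data: 14 channels recorded for 15000 timesteps,
# stored as [timesteps x channels] as in a typical CSV export.
n_timesteps, n_channels = 15000, 14
data = np.random.randn(n_timesteps, n_channels)
labels = np.random.randint(0, 2, n_timesteps)  # eyes open/closed per timestep

window_len = 100  # length of each subsequence fed to the model
stride = 100      # non-overlapping windows

# Build an array of shape [n samples x channels x timesteps]
starts = range(0, n_timesteps - window_len + 1, stride)
X = np.stack([data[s:s + window_len].T for s in starts])
# One label per window, e.g. the label at the window's last timestep
y = np.array([labels[s + window_len - 1] for s in starts])

print(X.shape)  # (150, 14, 100)
print(y.shape)  # (150,)
```

With a stride smaller than the window length you would get overlapping windows and more samples, which is often useful when data is scarce.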
That output is something you can then use to create TSDataLoaders and a model like InceptionTime. This approach tends to work much better. Please let me know if that works.
This tsai preparation library is amazing. This type of library has been noodling in my head for over a year. I can’t believe you have built it. Really amazing.
Yeah, I am interested. I have seen similar competitions on Kaggle before, but I feel the Deep Learning community was never in a better position to solve sequential data.
I do not know if the batch size is too small or large when creating the model (I have a P100), but so far plenty of memory is left. However, the losses for each iteration of fit_one_cycle are always NaN with my own dataset, unlike with the example data. What is wrong or needs to be changed? Note: I am operating in a highly unbalanced binary classification setting (anomaly detection) with weak labels (i.e. sometimes missing and not always 100% correct).
I think it has something to do with batch size and data loading: with a very small batch size (and very slow training) the visualization at least shows some time series, whereas if the batch size gets larger it seems to be empty:
But even when changing the data loading code to: dls = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=[2048, 4096], batch_tfms=[], num_workers=0) # images defined
which results in some time series being displayed, the result is the same: the losses are all NaN.
Hi @geoHeil,
Thanks for your comments about the tsai library!
I’d say the issue you have likely comes from the missing labels. In a supervised setting like the one you are using, you need a label (which may be weak or noisy) for every sample. Otherwise there’s nothing to be learned, and the loss returns NaN.
This matches your comment on the batch size: larger batches increase the chance of having an unlabeled sample in the batch.
Try removing the samples with missing labels before creating the dataset, and it should work. Or you may fill the missing labels with a new value (like ‘na’ for categories, or 0, the mean, the median, etc. for regression). But do this before creating the dataset.
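As a concrete sketch of both options in plain numpy (variable names and sizes here are assumptions, not tsai API):

```python
import numpy as np

# Toy data: 6 samples, 2 channels, 10 timesteps; two labels are missing (NaN)
X = np.random.randn(6, 2, 10)
y = np.array([0.0, 1.0, np.nan, 0.0, np.nan, 1.0])

# Option 1: drop the samples whose label is missing
mask = ~np.isnan(y)
X_clean, y_clean = X[mask], y[mask].astype(int)

# Option 2: fill missing labels with a new category (here: class 2 = "unknown")
y_filled = np.where(np.isnan(y), 2, y).astype(int)

print(X_clean.shape)        # (4, 2, 10)
print(np.unique(y_filled))  # [0 1 2]
```

Either way, the cleaned X and y are what you would then pass on when creating the datasets.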
Unfortunately, I had already filled NaN / any unlabelled values with 0! It is highly unbalanced though: in fact I only have some weak labels for class 1. So this did not solve it.
However, I have observed that there are some X values with NaNs.
I can confirm that NaNs in X were the problem! There should not have been any, so I need to double-check the pre-processing. But from here on, tsai does a great job!
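For anyone hitting the same issue: a quick numpy sanity check on X before creating the datasets catches this (the array here is just a toy stand-in):

```python
import numpy as np

# Toy batch of samples: [n samples x channels x timesteps]
X = np.random.randn(100, 14, 50)
X[3, 0, 10] = np.nan  # simulate a bad value slipping through preprocessing

n_bad = np.isnan(X).sum()                          # total NaN count
bad_samples = np.unique(np.where(np.isnan(X))[0])  # which samples are affected

print(n_bad, bad_samples)  # 1 [3]
```

Running a check like this right after preprocessing makes NaN losses much easier to trace back to the data.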
I wonder if the existing pre-processing functions (normalization) can simply be used with panel data, @oguiza?
To answer your question on preprocessing: it depends. tsai provides multiple ways to scale data. You can normalize or standardize, and do it using the entire training set (the default), by_sample, by_var (i.e. per channel across the training set), or by_sample and by_var (per channel in each sample). What it doesn’t provide is a way to standardize data by group (based on an id, like you may have in panel data). In that case, you will need to standardize the data before creating the TSDatasets.
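A minimal sketch of that per-group standardization using a pandas groupby (the column names are assumptions for illustration, not part of tsai):

```python
import numpy as np
import pandas as pd

# Hypothetical panel data: sensor readings from two subjects (the group id)
df = pd.DataFrame({
    "subject_id": ["a"] * 4 + ["b"] * 4,
    "sensor": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

# Standardize the sensor column within each group,
# before the data is turned into TSDatasets
grouped = df.groupby("subject_id")["sensor"]
df["sensor_std"] = (df["sensor"] - grouped.transform("mean")) / grouped.transform("std")
```

After this, each subject’s readings have mean 0 and unit standard deviation within their own group, so differences in baseline between subjects no longer dominate the signal.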
The preprocessing provided will probably cover most cases.
What I’ve seen in my experiments is that the choice of preprocessing is very data dependent, so you may need to experiment with different approaches.