Time series / sequential data study group

oguiza · April 11, 2020, 11:15am

30% faster training than v1 with a scikit-learn-like API for numpy arrays

During the last couple of weeks I’ve been porting TimeseriesAI to fastai v2.

The new API looks like this:

dsets = TSDatasets(X, y=None, tfms=tfms, sel_vars=None, sel_steps=None, splits=splits, inplace=True)

If you want to try it there are 3 tutorial_nbs (all available in Colab).

JonathanSum · April 11, 2020, 12:52pm

Thank you for your hard work. I will definitely try out your Colab notebook this week. Until now, I only knew about using RNNs for time series prediction.

With your work, I think I’ll have a more powerful tool for it.

AjayStark · April 11, 2020, 1:06pm

Has anyone worked on Time series regression? Is there any library for this purpose?

Thanks,

SidNg · April 13, 2020, 3:01am

@oguiza,
I am running ROCKET on fast.ai v1. My time series is just a long vector X(0), X(1), X(2), …
My code is:

# reshape my 1D vector to 3D tensors

X_train = torch.tensor(X_train, dtype=torch.float32, device=device).reshape(X_train.shape[0], 1, 1)
X_valid = torch.tensor(X_valid, dtype=torch.float32, device=device).reshape(X_valid.shape[0], 1, 1)
features=1
seq_len=1

But when I run:
n_kernels=10_000
kss=[7, 9, 11]
model = ROCKET(features, seq_len, n_kernels=n_kernels, kss=kss).to(device)

I get this error:
ValueError Traceback (most recent call last)
in ()
1 n_kernels=10_000
2 kss=[7, 9, 11]
----> 3 model = ROCKET(features, seq_len, n_kernels=n_kernels, kss=kss).to(device)

in __init__(self, c_in, seq_len, n_kernels, kss)
17 convs = nn.ModuleList()
18 for i in range(n_kernels):
---> 19 ks = np.random.choice(kss)
20 dilation = 2**np.random.uniform(0, np.log2((seq_len - 1) // (ks - 1)))
21 padding = int((ks - 1) * dilation // 2) if np.random.randint(2) == 1 else 0

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: 'a' cannot be empty unless no samples are taken
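
The message says that the list handed to np.random.choice was already empty when line 19 ran, even though kss=[7, 9, 11] was passed in. One plausible reading, which the traceback alone does not confirm, is that the constructor drops kernel sizes that do not fit inside seq_len, and with reshape(N, 1, 1) every sample has seq_len=1. The sketch below reproduces that situation with plain NumPy; `usable_kernel_sizes` is a hypothetical helper written for illustration and is not taken from the ROCKET code.

```python
import numpy as np

def usable_kernel_sizes(kss, seq_len):
    # Hypothetical filter (an assumption, not confirmed from the ROCKET source):
    # keep only kernel sizes that fit inside one sequence.
    return [ks for ks in kss if ks < seq_len]

kss = [7, 9, 11]

# reshape(N, 1, 1) means: N samples, 1 variable, 1 time step per sample
series = np.arange(1000, dtype=np.float32)
X = series.reshape(series.shape[0], 1, 1)
seq_len = X.shape[-1]  # 1

# No kernel of size 7, 9 or 11 fits in a single time step
candidates = usable_kernel_sizes(kss, seq_len)

try:
    np.random.choice(candidates)
    error_message = None
except ValueError as err:
    # Same exception type as in the traceback above
    error_message = str(err)

# With longer sequences every kernel size survives the filter
candidates_long = usable_kernel_sizes(kss, seq_len=100)
```

If this reading is right, the fix is on the data side: each sample needs to be a window of many time steps, not a single value.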

dhoa · April 17, 2020, 8:59am

Hi, can I ask about time series classification for multivariate inputs using images? I found this technique in one of oguiza’s notebooks, but the code is long enough that I still don’t understand how multiple signals can be encoded into an image. With 3 or fewer variables it is easy, since we can use a 3-channel image, but what about more than 3 variables? Packing extra variables into the same color channels seems likely to mix them up.

How good is this technique? Is there a limit on the number of variables for it to work?

The notebook I am referring to is this one: https://gist.github.com/oguiza/26020067f499d48dc52e5bcb8f5f1c57

Thanks,

vrodriguezf · April 17, 2020, 9:45am

I think the image encoders used in that notebook come from the pyts library. You can have a look here.

dhoa · April 17, 2020, 11:10am

Great! Thanks @vrodriguezf. I’m quite new to time series. I found this in the pyts library, and maybe we can use it for multivariate TS: https://pyts.readthedocs.io/en/stable/auto_examples/multivariate/plot_joint_rp.html#sphx-glr-auto-examples-multivariate-plot-joint-rp-py

Thanks again for your help,

cc111222 · April 24, 2020, 2:07pm

Hello everyone!
I am currently working on TS prediction and classification.
For the classification part, I am a bit confused, as I have a multivariate time series made up of more than 30 time series.
I have just a single feature associated with them.
How can I implement clustering of TS? The data may have varying lengths and missing values too.
I am confused because the literature suggests using DTW as a similarity measure. Has anyone used DTW with CNNs or other DL methods? If so, could you walk me through it?

Thanks a lot.
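
For reference, DTW itself is a short dynamic program, and it accepts series of different lengths. The sketch below is the textbook algorithm written in plain NumPy, not code from any library mentioned in this thread. It assumes 1D series without missing values (those would need to be imputed or dropped first), and the three example series are made up.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def dtw_matrix(series_list):
    """Symmetric matrix of pairwise DTW distances."""
    k = len(series_list)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = dtw_distance(series_list[i], series_list[j])
    return D

series = [
    np.array([0.0, 1.0, 2.0, 1.0, 0.0]),
    np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]),  # same shape, stretched in time
    np.array([2.0, 2.0, 2.0, 2.0]),                 # flat series, different length
]
D = dtw_matrix(series)
```

A pairwise distance matrix like `D` is the usual input to distance-based clustering such as hierarchical clustering. The double loop is quadratic in the series length, so long series typically call for an optimized implementation.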

vrodriguezf · April 24, 2020, 2:22pm

Hi!

If you are OK with using R for time series clustering, the TSclust package has a large collection of dissimilarity measures that you can use for clustering, and some of them accept time series with different lengths. If you go down that path, the imputeTS package can help you with the missing values.

dovlex · April 25, 2020, 1:56pm

Hi @vrodriguezf, I had the same question and I contacted the author. He responded:

“I guess you can if you slightly change the output of the dense layer to 1, and also change the input of the AR part to the series of your target variable. However, as you know, it needs experimental comparison to make sure whether DSANet will perform better than other methods on your data.”

I am working on exactly the same problem as you: learning from multiple series that correlate with or influence a target series, which is the series I want to forecast.

It seems to me that DSANet and Temporal Pattern Attention for MTS Forecasting are SOTA in the field of MTS forecasting. Have you found and tried some other approaches?

Best,
Vladimir

vrodriguezf · April 27, 2020, 8:47am

Hi @dovlex. Not really, right now I would like to explore how to add multiple time-dependent input variables to a SOTA architecture for univariate forecasting such as N-beats.

remapears · May 1, 2020, 2:27pm

Dear @oguiza, first of all, thank you so much for such impressive work!

Second, I have been using your TS library for a multi-label classification problem of univariate TS following your intro classification notebook, yet when I want to add a data augmentation technique (such as cutout or cutmix) I get the following problem:

db = (TimeSeriesList.from_df(df, '.', cols=df.columns.values[:-1], feat=None)
      .split_by_rand_pct(valid_pct=0.2, seed=seed)
      .label_from_df(cols='labels', label_cls=MultiCategoryList, label_delim=' ')
      .databunch(bs=bs, val_bs=bs*2, num_workers=0, device=torch.device('cuda'))
      .scale(scale_type=scale_type, scale_by_channel=scale_by_channel, scale_by_sample=scale_by_sample, scale_range=scale_range)
     )

arch = InceptionTime
model = arch(db.features, db.c, **arch_kwargs).to(device)
learn = Learner(db, model, metrics=[acc_02, f_score], opt_func=opt_func, loss_func=loss_func).cutout().show_tfms()

I get: No transformation has been applied. I was expecting to see visualizations of augmentation examples…

I tried setting tfms=None in the db definition, but I still got the same problem.

Am I doing something wrong?

remapears · May 1, 2020, 2:48pm

In addition, if I do not use show_tfms(), I get an error when starting training or searching for the best LR (learn.lr_find):

TypeError: Normal() received an invalid combination of arguments - got (int, float, tuple, device=torch.device, dtype=torch.dtype), but expected one of:

(Tensor mean, Tensor std, torch.Generator generator, Tensor out)

(Tensor mean, float std, torch.Generator generator, Tensor out)

(float mean, Tensor std, torch.Generator generator, Tensor out)

oguiza · May 2, 2020, 6:14am

Thank you so much!

I think the issue you mention is due to the fact that you are using multi-label classification. When I designed this function, I didn’t have any multi-label data and couldn’t test it. So I’d assume it doesn’t work for multi-label problems.

I’m currently working on porting the timeseriesAI repo to fastai2. (BTW, there are many benefits in using fastai2, like 1.5x faster training when using numpy arrays).

Fixing the issue you mention would require some testing on my side, whereas this is already handled in a simple way in fastai2 (show_batch(unique=True)).

I’m really sorry, @remapears, that I can’t help more, but I don’t have any plans to spend much time updating the v1-based timeseriesAI. I’d actually encourage you to try the updated version of timeseriesAI based on fastai2. There is an updated intro notebook that might be interesting to you, although some components (cutout, cutmix, etc.) are still missing because I have not yet added or tested them.

remapears · May 2, 2020, 1:14pm

Oh sure! I will move to fastai2 right away! And I’ll be keeping an eye on any updates from your side!
Thank you again!

marteen · May 7, 2020, 1:17pm

Hi guys,

I need a quick sanity check on something.

I am measuring a signal at a high sampling rate (4 MHz) for 1 second per recording, so I have 4 million data points per sample. The signal is a periodic stochastic signal, so it contains a lot of noise but also recurrent patterns.

Since my datasets have only around 300 samples each, and each sample potentially contains a lot of information, my strategy is to subsample each sample (say, 200 subsamples of length 2000) and then use a random train/test split with a classifier (e.g. a 1D CNN).

Am I missing something here, or is this an acceptable approach? Test set performance is good. The only thing I can imagine is that through subsampling I am encoding something in my model, but then again, if each subsample has the same length, that would be a relevant feature rather than something I would want to keep out of the model. I hope this is understandable :D.
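
One thing that may be worth checking here, offered as a suggestion and not a verdict: with a random split over subsamples, windows cut from the same recording can end up in both train and test, which can make test performance look better than it would on a genuinely new recording. A minimal sketch of a split done at the recording level follows. The counts mirror the numbers in the post, the 20% validation share is an assumption, and the variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_recordings = 300   # original samples (1-second recordings)
n_windows = 200      # subsamples cut from each recording

# One group id per window: the recording it was cut from
groups = np.repeat(np.arange(n_recordings), n_windows)

# Split at the recording level, then map that split back to the windows
shuffled = rng.permutation(n_recordings)
n_valid = int(0.2 * n_recordings)
valid_recordings = shuffled[:n_valid]

valid_mask = np.isin(groups, valid_recordings)
train_idx = np.flatnonzero(~valid_mask)
valid_idx = np.flatnonzero(valid_mask)
```

If the score holds up under this split, the random-split result was probably not inflated by shared recordings. scikit-learn’s GroupShuffleSplit offers the same idea as a ready-made splitter.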

SidNg · May 8, 2020, 6:38am

Hi all,
I’m interested in using time series tools like InceptionTime & Rocket but have some queries on data preparation.

I have difficulty understanding the terms often used in the notebooks & UCR datasets.
What exactly do Samples, Features and Sequence length mean?

Let’s say I have a univariate time series X(0), X(1), X(2), … X(N).

How do I construct my data so I can feed it into models like InceptionTime?

My understanding:
--> Samples = N (length of the time series)
--> Seq length = 10 (like NLP: a lookback sequence, so for X(10) I look back at X(0) to X(9)).
Does that mean I have to create 10 extra columns [lookback X(9) to X(0)] at each time step X(t)?
--> What about Features? Do I have 1 feature for a univariate series?

Any help is much appreciated
cheers
sid
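
One common reading of these terms, sketched under assumptions and not taken from the library docs: a sample is one window cut from the series, features is the number of variables (1 for a univariate series), and sequence length is the number of time steps in each window. Under that reading the lookback columns do not have to be built by hand, because index arithmetic produces every window at once. `make_windows` and its `horizon` parameter are hypothetical names used only for this illustration.

```python
import numpy as np

def make_windows(series, seq_len, horizon=1):
    """Cut a 1D series into lookback windows X and future targets y.

    Returns X with shape (samples, 1, seq_len) and y with shape (samples,).
    """
    series = np.asarray(series, dtype=np.float32)
    n_samples = len(series) - seq_len - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this seq_len and horizon")
    # Row i holds the indices i, i+1, ..., i+seq_len-1
    idx = np.arange(seq_len)[None, :] + np.arange(n_samples)[:, None]
    X = series[idx][:, None, :]          # add the variables axis (1 for univariate)
    y = series[seq_len + horizon - 1:]   # value `horizon` steps after each window ends
    return X, y

series = np.arange(100)                  # stand-in for X(0), X(1), ..., X(99)
X, y = make_windows(series, seq_len=10)
```

Here the first sample is X(0) to X(9) with target X(10), matching the lookback described above, and the number of samples is the number of windows, not the length N of the series.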

Shiva_K · May 8, 2020, 3:53pm

This is really amazing stuff! Are there any ways we could extend Gramian Angular Fields, or are there any other techniques for transforming a multivariate time series into 2D images for a CNN? Any links or resources would be much appreciated. Thanks.

vrodriguezf · May 8, 2020, 4:49pm

pyts has joint recurrence plots for this.
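
The idea behind a joint recurrence plot is that each variable gets its own binary recurrence plot and the plots are multiplied elementwise, so any number of variables collapses into one single-channel image. The sketch below is a simplified NumPy version that compares raw values directly. It is not the pyts API, which works on phase-space trajectories and has its own parameters, and the threshold choice here is an arbitrary assumption.

```python
import numpy as np

def recurrence_plot(x, threshold):
    """Binary recurrence plot of a 1D series: 1 where two time steps are close."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= threshold).astype(np.uint8)

def joint_recurrence_plot(X, thresholds):
    """Elementwise product of the per-variable recurrence plots.

    X has shape (variables, length); the result has shape (length, length).
    """
    X = np.asarray(X, dtype=float)
    plots = [recurrence_plot(x, t) for x, t in zip(X, thresholds)]
    return np.prod(plots, axis=0).astype(np.uint8)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 64))          # 5 variables, 64 time steps (made-up data)
thresholds = 0.5 * X.std(axis=1)      # hypothetical per-variable thresholds
jrp = joint_recurrence_plot(X, thresholds)
```

A pixel is on only when every variable recurs at that pair of time steps, so the image tends to get sparser as more variables are added.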

jerron · May 14, 2020, 6:18pm

Not sure if this question was answered. I did a search here but didn’t find similar questions although I feel it is quite common.

I am trying to use timeseriesAI: https://github.com/timeseriesAI

I have a multivariate time series in tabular form. Each column is a time series for one variable. The first column is the timestamp. The last column is the target. Each row holds the values of all the variables at a given time. How do we transform such data into the expected 3D array format:

Samples
Variables
Length (aka time or sequence steps)

I guess I need to split the rows into smaller pieces to get multiple samples. Say, if my original data has 10k rows, I need to break it into smaller pieces of, for example, 100 timestamps each. Can or should I make the samples overlap? For example, the first sample would hold the values for time 0 to 99, and the second sample the values for time 1 to 100. Or should I make them disjoint, so that the second sample holds the values for time 100 to 199?

Thank you!
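
Both options can be expressed with a single stride parameter: a stride equal to the window length gives disjoint samples, and a stride of 1 gives maximally overlapping ones. The sketch below assumes the table has already been converted to a NumPy array with the time column dropped (for example via DataFrame.to_numpy()), and it labels each window with the target at its last time step, which is only one of several reasonable choices. `table_to_windows` is a hypothetical helper, not part of timeseriesAI.

```python
import numpy as np

def table_to_windows(values, target, window_len, stride):
    """Slice a (time, variables) table into (samples, variables, length) windows."""
    values = np.asarray(values, dtype=np.float32)
    target = np.asarray(target)
    starts = np.arange(0, len(values) - window_len + 1, stride)
    # Stack to (samples, length, variables), then swap to (samples, variables, length)
    X = np.stack([values[s:s + window_len] for s in starts]).transpose(0, 2, 1)
    # Assumption: the label of a window is the target at its last time step
    y = target[starts + window_len - 1]
    return X, y

rng = np.random.default_rng(0)
values = rng.normal(size=(10_000, 4))      # 10k rows, 4 variables (made-up data)
target = rng.integers(0, 2, size=10_000)   # stand-in for the last column

X_disjoint, y_disjoint = table_to_windows(values, target, window_len=100, stride=100)
X_overlap, y_overlap = table_to_windows(values, target, window_len=100, stride=1)
```

Overlapping windows give many more samples, but neighbouring samples are nearly identical. It is therefore usually safer to split train and validation by time before windowing, so that near-duplicates do not land on both sides of the split.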