Time series/sequential data study group

@oguiza Thank you for the reply. Say I had 4 years of data and wanted to predict the next half year. Can this be used with the data bunch to train in minibatches within each epoch? Even if I can perform computations on it, can it be piped into the learner without causing memory issues? My understanding of memory usage during an epoch is that the whole dataset is stored in memory, and the current minibatch is then sent to the GPU to update the weights.

If I only had 8 GB of RAM and wanted to train on, say, a 50 GB file, could this method be used with epochs and minibatches to train the tabular learner? I am still fairly new to this.

I’ve used np.memmap as a replacement for numpy arrays. In my case I also have 8 GB of RAM, and a 20 GB dataset. The data is stored on disk, and the dataloader creates each batch on the fly.

A limitation of this approach is that an np.memmap array can only store data of a single dtype, so it’d be more complex if you want to use multiple dtypes in your tabular data.

So, for example, if I wanted to use Rossmann-type data that is 50 GB and has several different types of columns, some numerical and some categorical, could an np.memmap hold only one column of data, or could it hold all of the character columns? I am also wondering how things like entity embeddings would work in this case. Sorry, I am trying to avoid upgrading my hardware while still training on bigger data.

The datasets I use only contain 1 dtype. For example, all your continuous data could be in a single array. If you have multiple dtypes, you would need multiple np.memmap arrays (one for continuous and one for categorical, for example). This would require a custom dataset to which you pass those arrays, so it could probably be done, but it’s more complex. As I said, I have not investigated this approach.
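To make the "multiple np.memmap arrays plus a custom dataset" idea concrete, here is a minimal sketch. It is not taken from any of the repos mentioned in this thread: the class name `MemmapTabularDataset`, the helper `iter_batches`, and the file names are all hypothetical, and the demo writes tiny arrays to a temporary directory instead of a real 50 GB file. The class only defines `__len__` and `__getitem__`, which is the shape of a map-style dataset, so something like it could presumably be wrapped by a dataloader.

```python
import os
import tempfile

import numpy as np


class MemmapTabularDataset:
    """Map-style dataset backed by two on-disk arrays (hypothetical helper)."""

    def __init__(self, cont_path, cat_path, n_rows, n_cont, n_cat):
        # mode="r" maps the files read-only; rows are read from disk only when indexed.
        self.cont = np.memmap(cont_path, dtype=np.float32, mode="r", shape=(n_rows, n_cont))
        self.cat = np.memmap(cat_path, dtype=np.int64, mode="r", shape=(n_rows, n_cat))

    def __len__(self):
        return self.cont.shape[0]

    def __getitem__(self, idx):
        # np.array(...) copies only the requested rows into RAM.
        return np.array(self.cat[idx]), np.array(self.cont[idx])


def iter_batches(ds, batch_size):
    """Yield (categorical, continuous) batches in order, one slice at a time."""
    for start in range(0, len(ds), batch_size):
        yield ds[start:start + batch_size]


# Tiny demo: write two small arrays to disk, then read them back in batches.
tmp_dir = tempfile.mkdtemp()
cont_path = os.path.join(tmp_dir, "cont.dat")
cat_path = os.path.join(tmp_dir, "cat.dat")
n_rows, n_cont, n_cat = 10, 3, 2

cont_mm = np.memmap(cont_path, dtype=np.float32, mode="w+", shape=(n_rows, n_cont))
cont_mm[:] = np.arange(n_rows * n_cont, dtype=np.float32).reshape(n_rows, n_cont)
cont_mm.flush()

cat_mm = np.memmap(cat_path, dtype=np.int64, mode="w+", shape=(n_rows, n_cat))
cat_mm[:] = np.arange(n_rows * n_cat).reshape(n_rows, n_cat) % 7  # fake category codes
cat_mm.flush()

ds = MemmapTabularDataset(cont_path, cat_path, n_rows, n_cont, n_cat)
batches = list(iter_batches(ds, batch_size=4))
```

The categorical array holds integer codes rather than strings, which is also what an embedding layer expects as input, so the string-to-code mapping would have to be done once when the files are written.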

Fastseq is now added to the unofficial fastai extension repository :slight_smile:

(I have also put links to the two time series v2 implementations while waiting for convergence to a single official repository)

The use case you described above falls under time series (multipoint) forecasting. The case treated in Rossmann is a regression: it is a kind of single-point forecasting. There are many deep learning models used in time series forecasting: some are listed here.

You don’t need to load all your data into RAM at once, and you can mix continuous data with categorical data (also called covariates, such as day-of-the-week, hour-of-day, promo dates, etc.)

You can use lazy loading techniques to load only the chunks needed to build each batch, then train and update your model batch by batch. As the illustration shows, the model only needs access to 2 small windows: 1) the context window, also called the lookback window (green rectangle), and 2) the prediction window, also called the forecast window (cyan)

z_{i,t}: the curve that we would like to forecast (e.g. energy demand, sales, etc.). The forecast starts at the end of the time series.

x_{i,1,t} and u_{i,2,t}: feature time series or covariates (respectively categorical and continuous data)

u_{i,2,t}: represents the day of the week in this example. It is categorical data in this case, so an embedding is used when we train a given model.
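The two-window idea above can be sketched in a few lines. This is only an illustration with assumed names (`sample_window`, `context_len`, `pred_len` are not from any library in this thread), and the series here is synthetic. Because it uses plain slicing, the same function should also work if `target` and `covariates` are np.memmap arrays, in which case only the sliced chunk is read from disk.

```python
import numpy as np


def sample_window(target, covariates, start, context_len, pred_len):
    """Slice one training example out of a long series (hypothetical helper).

    target:     1D array, the series to forecast
    covariates: 2D array of shape (time, n_features), known over the whole window
    """
    mid = start + context_len
    end = mid + pred_len
    if start < 0 or end > len(target):
        raise IndexError("window falls outside the series")
    context = np.asarray(target[start:mid])    # context / lookback window
    future = np.asarray(target[mid:end])       # prediction / forecast window (labels)
    covs = np.asarray(covariates[start:end])   # covariates span both windows
    return context, covs, future


# Demo with a synthetic hourly series; day-of-week is the categorical covariate.
n_steps = 24 * 28
target = np.sin(np.arange(n_steps) * 2 * np.pi / 24).astype(np.float32)
day_of_week = (np.arange(n_steps) // 24) % 7
covariates = day_of_week[:, None]

context, covs, future = sample_window(
    target, covariates, start=48, context_len=72, pred_len=24
)
```

A batch is then just a stack of such examples drawn at random `start` positions, so memory use depends on the window and batch sizes rather than on the length of the full series.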

@oguiza,

When asked to plot a CUDA Tensor, matplotlib.pyplot complains

TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

But after
from fastai_timeseries import *

matplotlib and scikit can handle CUDA Tensors directly.

What kind of magic is this?

Hi there,
I work with time-stamped log data and I want to build a model that can generalize user “habits”. For example, I have a huge dataset collected during an edX MOOC course (Online Video-Based Education - 6 weeks). Every user has a log history, and based on this I want to predict the user outcome after the first, second, etc. week. Another example: I have a customer purchase history, and I want to predict the amount of the next purchase.
In the first example I have only categorical data (play, stop, pause, site viewing, etc.), while in the second I have continuous and discrete variables.
As an initial project I used GRU and LSTM, but I want to use a more sophisticated model.
My goal is regression (prediction) on time series data.
My first idea is to use BERT for the first and InceptionTime or TCNN for the second.
Do you have any suggestions?
InceptionTime is made for time series classification. Can it be used for regression?

Yes you can use InceptionTime for regression. Just adapt the last layer and the loss function. I think there are some examples in the timeseries_fastai repo.
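As a rough illustration of "adapt the last layer and the loss function", here is a sketch in plain PyTorch. The `TinyConvClassifier` below is a made-up stand-in, not the real InceptionTime, and the attribute name `fc` is an assumption: in an actual implementation you would need to check what the final layer is called.

```python
import torch
import torch.nn as nn


class TinyConvClassifier(nn.Module):
    """Stand-in for a time series classifier; NOT the real InceptionTime."""

    def __init__(self, c_in, n_classes):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(c_in, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # global average pooling over time
            nn.Flatten(),
        )
        self.fc = nn.Linear(16, n_classes)  # classification head

    def forward(self, x):
        return self.fc(self.backbone(x))


model = TinyConvClassifier(c_in=3, n_classes=5)

# 1) Swap the last layer for a single-output linear layer.
model.fc = nn.Linear(model.fc.in_features, 1)
# 2) Swap the classification loss for a regression loss.
loss_func = nn.MSELoss()

x = torch.randn(8, 3, 50)    # (batch, variables, time steps)
y = torch.randn(8)           # one continuous target per series
pred = model(x).squeeze(-1)  # (batch, 1) -> (batch,)
loss = loss_func(pred, y)
loss.backward()
```

MSE is just one possible choice here; MAE or another regression loss would slot in the same way.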

I came here to find what I can use to solve https://www.kaggle.com/c/liverpool-ion-switching and all these repos made me a bit confused.

Here https://github.com/tcapelle/timeseries_fastai the data format in examples is different. As I see it, in the competition we have different batches, and we aim to predict x for each time step and y.

Is there anything handy I could use?

Hi all

I am a medical student and new to DL, trying to make my way through the part 2 course.

I find that much of the medical data that I want to use is irregularly sampled.

How do you guys deal with irregularly sampled data?

I found this paper, which deals with irregularly sampled data: https://arxiv.org/abs/1909.12064
Have any of you implemented something like this?

As far as I know there is nothing implemented yet for irregular time series in fastai. What I normally do is to make the data regular by splitting it into evenly spaced intervals, and filling missing values properly.
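A minimal numpy sketch of that kind of regularization, under assumptions of my own: samples are averaged within each evenly spaced interval, and empty intervals are forward-filled from the last observed one. The function name `regularize` is hypothetical, and forward-filling is only one option; what counts as "properly" filled depends on the data (for clinical measurements, interpolation or an explicit missingness mask may be more appropriate).

```python
import numpy as np


def regularize(times, values, step):
    """Average irregular samples into bins of width `step`, then forward-fill gaps."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    # Index of the evenly spaced interval each sample falls into.
    bins = np.floor((times - times.min()) / step).astype(int)
    n_bins = bins.max() + 1
    sums = np.bincount(bins, weights=values, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    grid = np.full(n_bins, np.nan)
    filled = counts > 0
    grid[filled] = sums[filled] / counts[filled]
    # Forward-fill: each empty bin takes the value of the last non-empty bin.
    # Bin 0 always contains the earliest sample, so there is always something to copy.
    last_seen = np.maximum.accumulate(np.where(filled, np.arange(n_bins), 0))
    return grid[last_seen]


# Irregular measurements (time in hours), regularized to one value per hour.
times = [0.0, 0.4, 1.2, 4.5, 4.9]
values = [10.0, 12.0, 20.0, 30.0, 34.0]
regular = regularize(times, values, step=1.0)
# regular -> [11., 20., 20., 20., 32.]
```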

30% faster training than v1 with a scikit-learn-like API for numpy arrays

During the last couple of weeks I’ve been porting TimeseriesAI to fastai v2.

The new API looks like this:

dsets = TSDatasets(X, y=None, tfms=tfms, sel_vars=None, sel_steps=None, splits=splits, inplace=True)

If you want to try it there are 3 tutorial_nbs (all available in Colab).

Thank you for your hard work. I will definitely try out your Colab notebooks this week. Previously, I only knew about using RNNs for time series prediction.

But with your work, I think I will have a more powerful tool for it.

Has anyone worked on Time series regression? Is there any library for this purpose?

Thanks,

@oguiza,
I am running ROCKET on fast.ai v1. My time series is just a long vector X(0), X(1), X(2), …
My code is:

# reshape my 1D vector to 3D tensors

X_train = torch.tensor(X_train, dtype=torch.float32, device=device).reshape(X_train.shape[0], 1, 1)
X_valid = torch.tensor(X_valid, dtype=torch.float32, device=device).reshape(X_valid.shape[0], 1, 1)
features=1
seq_len=1

But when I run:
n_kernels=10_000
kss=[7, 9, 11]
model = ROCKET(features, seq_len, n_kernels=n_kernels, kss=kss).to(device)

I got this error:
ValueError Traceback (most recent call last)
in ()
1 n_kernels=10_000
2 kss=[7, 9, 11]
----> 3 model = ROCKET(features, seq_len, n_kernels=n_kernels, kss=kss).to(device)

in __init__(self, c_in, seq_len, n_kernels, kss)
17 convs = nn.ModuleList()
18 for i in range(n_kernels):
---> 19 ks = np.random.choice(kss)
20 dilation = 2**np.random.uniform(0, np.log2((seq_len - 1) // (ks - 1)))
21 padding = int((ks - 1) * dilation // 2) if np.random.randint(2) == 1 else 0

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: 'a' cannot be empty unless no samples are taken
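A hedged reading of this traceback: the message comes from np.random.choice being handed an empty kss. The traceback does not show where kss becomes empty, but a plausible cause (an assumption, not checked against the ROCKET source) is that kernel sizes that do not fit into seq_len are dropped, and with seq_len=1 none of 7, 9, 11 fit. Line 20 points the same way: with seq_len=1, (seq_len - 1) // (ks - 1) is 0, so the dilation cannot be sampled sensibly either. Reshaping a long vector to (N, 1, 1) makes every sample a series of length 1; cutting it into windows gives (n_samples, 1, seq_len) instead. The sketch below reproduces the error and shows one way to window the series; `seq_len=100` and `stride=50` are arbitrary illustrative values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# The error itself: np.random.choice refuses to sample from an empty list.
try:
    np.random.choice([])
    raised = False
except ValueError:
    raised = True

# A long 1D series X(0), X(1), X(2), ...
series = np.arange(1000, dtype=np.float32)

seq_len = 100  # hypothetical window length, chosen larger than max(kss) = 11
stride = 50    # hypothetical step between consecutive windows

windows = sliding_window_view(series, seq_len)[::stride]  # (n_samples, seq_len)
X = np.ascontiguousarray(windows[:, None, :])             # (n_samples, 1, seq_len)
features = X.shape[1]
```

With data shaped like this, `features` stays 1 and `seq_len` is 100. How to attach a target to each window (for example the value right after it) depends on the task and is left out here.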

Hi, can I ask about time series classification for multivariable inputs using images? I found this technique in a notebook by oguiza, but the code is quite long and I still don’t understand how multiple signals can be encoded into an image. It is easy when the number of variables is ≤ 3, since we can use 3-channel images, but what about more than 3 variables? The colors could get mixed up when variables are correlated.

How good is this technique? Is there a limit on the number of variables for it to work?

The notebook I referred to is this one: https://gist.github.com/oguiza/26020067f499d48dc52e5bcb8f5f1c57 .

Thanks,

I think the image encoders used in that notebook come from the pyts library. You can have a look here.

Great! Thanks @vrodriguezf. I’m quite new to time series. I found this https://pyts.readthedocs.io/en/stable/auto_examples/multivariate/plot_joint_rp.html#sphx-glr-auto-examples-multivariate-plot-joint-rp-py in the pyts library, and maybe we can use it for multivariate TS.

Thanks again for your help,
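For anyone wondering how a joint recurrence plot gets around the 3-channel limit: as far as I understand it, each variable gets its own recurrence plot and the plots are combined by element-wise product into a single image, so the number of variables does not have to match the number of image channels. The sketch below is a simplified numpy version written for illustration, not the pyts implementation (it skips the time-delay embedding and uses a fixed threshold); the function names are hypothetical.

```python
import numpy as np


def recurrence_plot(x, threshold):
    """Binary recurrence plot of a 1D series: 1 where |x[i] - x[j]| <= threshold."""
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= threshold).astype(np.uint8)


def joint_recurrence_plot(X, threshold):
    """X has shape (n_variables, seq_len); returns one (seq_len, seq_len) image."""
    plots = np.stack([recurrence_plot(x, threshold) for x in X])
    # Element-wise product: a pixel is on only if every variable recurs there.
    return plots.prod(axis=0)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 64))  # 5 variables, 64 time steps
image = joint_recurrence_plot(X, threshold=0.5)
```

One trade-off of the product is that the image gets sparser as variables are added, since a pixel survives only when all variables agree.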

Hello everyone!
I am currently working on TS prediction and classification.
For the classification part, I am a bit confused, as I have multivariate time series made up of more than 30 time series.
I just have a single feature associated with them.
How can I implement clustering of TS? There may be varied lengths and missing values in the data too.
I am confused because the literature suggests using DTW as the measure of similarity. Has anyone used DTW with CNNs or other DL methodologies? If so, can you walk me through it?

Thanks a lot.
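For reference, here is a minimal sketch of the textbook DTW distance and a pairwise distance matrix, which is the usual starting point for clustering series of different lengths. It does not involve any CNN or DL model, it is quadratic in the series length, and it does not handle missing values (those would need to be filled or dropped first). The function names are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as the local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: match, insertion, deletion.
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[n, m]


def dtw_matrix(series_list):
    """Pairwise DTW distances between series that may differ in length."""
    k = len(series_list)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(series_list[i], series_list[j])
    return dist


series_list = [
    np.array([0.0, 1.0, 2.0, 1.0, 0.0]),
    np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0]),  # same shape, shifted and longer
    np.array([5.0, 5.0, 5.0, 5.0]),
]
dist = dtw_matrix(series_list)
```

Here the first two series come out at distance 0 because warping absorbs the shift, while the flat third series is far from both. The resulting matrix can be passed to any clustering method that accepts precomputed distances.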