Time series / sequential data study group

Where does it crash exactly in the pipeline? Have you tried to reduce the batch size?

Hi vrodriguez,

Well, I am not using the fastai library.
I just have an array containing segmented data, and when I apply, for example, the Markov transition field or the Gramian angular field algorithm from the pyts library, it crashes the RAM because the resulting images are huge.
When using your library it doesn’t! So I just want to know what your trick was! Are you doing batch processing? If so, how did you do it? My code is similar to the one below:

Gmaf = mkv_transformation(array)

where array has shape (19000, 100, 3).
I should finally have 19000 images of size (224, 224).

Hi @el3oss,
I’m not sure what makes your system crash. But these image-like objects (they are not proper images, as they have one channel per variable) are pretty big, so you won’t be able to store many of them in memory. For instance, 19,000 float32 images of shape (3, 224, 224) take 19000 × 3 × 224 × 224 × 4 bytes ≈ 11.4 GB.
tsai creates these representations online, when the batch is created. The process is slow though, especially if you have a large dataset, so I always try to use the raw data instead of these expensive transforms. Having said that, if you still want to apply a time-series-to-image transform, you have 2 options:

  • perform the transform online (per batch). This will slow down the process, as you will need to create the images on the fly. It has the benefit that you may be able to apply data augmentation. You will need to manage the batch size to ensure you don’t get an OOM error.
  • preprocess all the data and save the output to disk (as a numpy array, for example). Then you’ll need to create batches from the data on disk. (This is not implemented in tsai; a minimal sketch is below.)
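Here is a rough, untested sketch of option 2 using only pyts and numpy (all names and shapes are placeholders, not tsai code). It transforms the data in chunks and writes straight into a disk-backed memmap, so RAM never holds all the images at once. Note that pyts' GramianAngularField cannot output images larger than the series length, so with 100 time steps you get at most 100 × 100 images unless you resample the series to ≥ 224 points first:

    import numpy as np
    from pyts.image import GramianAngularField

    X = np.random.randn(19000, 100, 3).astype(np.float32)  # placeholder: (samples, steps, channels)
    n_samples, n_steps, n_channels = X.shape
    img_size = 100                        # must be <= n_steps for pyts
    gaf = GramianAngularField(image_size=img_size)

    # disk-backed array: only the slices you touch are kept in RAM
    out = np.memmap('gaf_images.dat', dtype=np.float32, mode='w+',
                    shape=(n_samples, n_channels, img_size, img_size))
    bs = 256                              # chunk size; tune to your available RAM
    for i in range(0, n_samples, bs):
        chunk = X[i:i + bs]
        for c in range(n_channels):       # pyts expects 2D (samples, steps), one channel at a time
            out[i:i + len(chunk), c] = gaf.fit_transform(chunk[:, :, c])
    out.flush()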

Dear oguiza,
Many thanks for your answer.
Yes, I want to write a paper on different encoding techniques and see if they improve activity recognition.
As you said, I tried the tsai approach but it was very slow. I also followed your advice to save the images in an array and then use memmap from numpy and classify them.
Just one question: once I have the images on my disk, can I load them in batches and do my classification, to avoid crashing my RAM again? If so, would using memmap from numpy, as you kindly explained in your notebook, work? Something like the sketch below is what I have in mind.
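(Untested sketch, with hypothetical filename and shapes matching the array saved above:)

    import numpy as np

    # open the file written earlier in read-only mode; nothing is loaded yet
    images = np.memmap('gaf_images.dat', dtype=np.float32, mode='r',
                       shape=(19000, 3, 100, 100))
    bs = 64
    for i in range(0, len(images), bs):
        batch = np.asarray(images[i:i + bs])  # copies only this batch into RAM
        # ... feed `batch` to the classifier here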

Regards

A Discord server would be good. I have so many questions because of the learning curve, but it’s hard to get them answered.


How can I plot the 10 most confusing time series examples in tsai? I know that in fastai something along these lines seems to work: Wanting to plot most confused classes - #9 by muellerzr. But I wonder if learn.show_results() could somehow include this directly? tsai/core.py at 57a79ddb8ebf8a187277303be0ef2a7853673060 · timeseriesAI/tsai · GitHub
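Untested sketch: since tsai builds on fastai, fastai's interpretation API may work directly on a trained tsai Learner (`learn` below is assumed to be your trained model):

    from fastai.interpret import ClassificationInterpretation

    interp = ClassificationInterpretation.from_learner(learn)
    interp.plot_top_losses(10)  # the 10 items with the highest loss, i.e. the most "confused"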


Hi guys
I have hit a brick wall and need your help.
My model is only ever using the last value it encounters to make predictions.
I tested this by giving it only the minimum 9 data points required by MiniRocket. These 9 points usually contain only 1 distinct value due to the high sampling frequency.

This gave the best accuracy, which turns out to exactly match the random movement of the time series (i.e. no predictive ability beyond random guessing).

When I add more prior data, all of it relevant, the accuracy only goes down. In other words, the more I ask the model to use data other than the last encountered value, the worse it gets.

I can’t understand what’s going wrong. Just relying on the last value is useless as I need to predict movements in the data.

Hi @shado,
I’m afraid it’s impossible to help you with the information you have provided. Is there any code you could share to further explain what your issue is?


@oguiza thank you. I am using this code:

wl = 20
stride = 20
print('input shape:', X_train_full.shape)
X_train, y_train = SlidingWindow(wl, stride=stride, check_leakage=True, get_y=[0])(X_train_full)

input shape: (16800, 16)
((839, 16, 20), (839,))

I’m not sure how many steps ahead get_y is giving me. It looks like only the first value of the next window, but I need at least one step ahead (20 values); otherwise it’s not predicting t+1, just t0.

E.g. say these are two of my windows:
w1 = aaaaaaaaaaaaabbbbbbb
w2 = bbbbbbbcccccccccccccc
then for my first y prediction it will give me b since that is the last value it saw.
Whereas I need it to give me c, since that is the next value that is actually useful to me.

Hi @shado,

There are 2 parameters in SlidingWindow that may be useful to you:

  • horizon = number of future datapoints to predict:
    * 0 for last step in each sub-window.
    * n > 0 for a range of n future steps (1 to n).
    * n < 0 for a range of n past steps (-n + 1 to 0).
    * a list for those exact time steps.
    In your case, it seems the horizon should be set to 20.

  • y_func = function used to calculate the ys based on the get_y col/s and each y sub-window. y_func must be a function applied along axis=1!
    A simple example of y_func would be the following, where the mode is calculated along axis=1 (in your case, across the 20 values):

import scipy.stats
def y_func(o): return scipy.stats.mode(o, axis=1).mode

You may want to take a look at tsai's Data Preparation documentation for more details.
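Putting both parameters together, a sketch for your case (20-step windows, labels taken as the mode of the next 20 steps; assumes the same X_train_full as in your snippet):

    import scipy.stats
    from tsai.data.preparation import SlidingWindow

    def y_func(o): return scipy.stats.mode(o, axis=1).mode

    X_train, y_train = SlidingWindow(
        20, stride=20, horizon=20, get_y=[0],
        y_func=y_func, check_leakage=True,
    )(X_train_full)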


Hey, could you please share the link to the dataset you used?

@oguiza do you know the impact of weight initialization in tsai? I have the feeling that for my data it greatly changes the outcome simply by restarting the script (the dataset splits are the same, so I guess it must be the weight initialization). Are there any suggestions or best practices?


Hi everyone! @KevinB and I are playing with Informer for time series forecasting. Both of us are relatively new to the time series stuff.
Many of the results presented in the paper are obtained on the authors’ own datasets, and the comparison is done against popular models trained by the authors themselves. It would be nice to compare the results of Informer on some well-established benchmark. It would be of great help if someone experienced with the topic could suggest a dataset and reasonable baseline models for forecasting with a horizon of 20-100 time steps, or point us in the right direction.
Thanks!


@geoHeil at one point I struggled to produce consistent results over multiple runs, so I decided to seed all my runs:

    import random
    import numpy as np
    import torch

    def random_seed(seed_value, use_cuda):
        np.random.seed(seed_value)                 # numpy RNG
        torch.manual_seed(seed_value)              # torch CPU RNG
        random.seed(seed_value)                    # Python stdlib RNG
        if use_cuda:
            torch.cuda.manual_seed(seed_value)
            torch.cuda.manual_seed_all(seed_value) # all GPU RNGs
            torch.backends.cudnn.deterministic = True  # needed for reproducibility
            torch.backends.cudnn.benchmark = False

    random_seed(77, True)

I’m not sure it is the best way, though, but I get exactly the same numbers in training and results for every run. Seeding allows me to do hyperparameter tuning without worrying that improvements in the results come from random factors.


Many thanks!
This is very useful as a starting point (at least to get fully reproducible results).


Interesting new time series paper in the field of self-supervised learning!

Learning Timestamp-Level Representations for Time Series with Hierarchical Contrastive Loss.

The code is available here:


@oguiza @muellerzr @vrodriguezf
Hi everyone, many thanks for this fantastic library, which has helped me a lot in my studies on time series.

I am facing a challenge and would like to see if there is an interesting solution using tsai.

I have a small dataset containing different-sized windows of data for four different activities. The dataset is imbalanced: class A has only 4 chunks, class B has 5, class C has 30 chunks and class D has 35.

I need to balance the data and also to generate many other chunks or surrogate data, in order to apply gmaf and then use deep learning.

Does the tsai library have some way of doing this?

Thanks for your feedback @el3oss! I’m really glad tsai has helped you with your time series tasks.

As to the dataset you mention, I’m not sure I understand your challenge. We might be able to help if you provide a more visual example or maybe a gist. I’m not sure what you mean by chunks and the different sizes of windows. Do you mean instances or samples of each class?
Also, what do you mean by gmaf?

Hi @oguiza,
many thanks for your feedback.
By chunks I mean windows of data from a triaxial accelerometer (X, Y, Z), each corresponding to a data acquisition over a certain period of time.

There are different participants performing a specific rehabilitation activity consisting of gripping an object and putting it somewhere else (a total of 78 trials), and each of these trials is labeled with a performance score that can be 0, 1, 2 or 3, depending on the performance.
You can see an example of a chunk for the x-axis of an accelerometer for one trial below:
[image: accelerometer x-axis signal for one trial]
This particular activity lasted 2 seconds, but the length differs from one trial to another; I will pad all of them with zeros later on so they all have the same length.
The issue is that the data is very small, as you can see in the table below:
[table: number of trials per class]
So, as you can see:

  • The dataset is imbalanced.
  • The number of chunks per class is very small.
    Since I cannot do new acquisitions, I thought about creating some artificial data in order (1) to balance the classes and (2) to augment the dataset size maybe 100 times or more, so that I can encode the different windows as images and classify them using some DL algorithm.

Are there any ways to do this with tsai?
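For concreteness, here is the kind of thing I mean (a hypothetical sketch in plain numpy, not a tsai API): oversampling a minority class with jittered and rescaled copies of the existing windows.

    import numpy as np

    def make_surrogate(window, rng, sigma=0.03, scale_range=(0.9, 1.1)):
        # add Gaussian noise and rescale the amplitude of one (steps, 3) window
        noise = rng.normal(0.0, sigma, size=window.shape)
        scale = rng.uniform(*scale_range)
        return (window + noise) * scale

    rng = np.random.default_rng(42)
    class_a = np.random.randn(4, 200, 3)   # placeholder for the 4 class-A trials
    surrogates = np.stack([make_surrogate(class_a[i % len(class_a)], rng)
                           for i in range(100)])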

Thank you

I think there is something broken in the image transforms. In the univariate time series tutorial they all come out wonky, and the same happens with my personal data.