Time series / sequential data study group

What implementation are you trying? One of my colleagues improved performance a lot by moving from the fastai version to the PyTorch one. By using generic blocks instead of interpretable ones, and ensembling more or less as explained in the paper (Section 3.4), the results started to be competitive.

1 Like

Did your colleague compare N-BEATS to simpler architectures?

I’m using the generic version of N-BEATS. Ensembling obviously helps. Still, in my case, a basic ResNet ensemble slightly outperforms an N-BEATS ensemble.

So far we have compared against a single univariate LSTM (not ensembled), and N-BEATS does better than that. However, compared to other multivariate approaches from the literature of that application domain, N-BEATS is not the best, probably because it only uses one variable.

I have a dataset composed of several time series. The problem is that the columns identifying a time series sometimes cover only part of the complete series. For example, I might have a time series that runs from January to March and then another that continues it from March to May. I don’t have an easy way to identify that one series is a continuation of the other, and was wondering if you all had suggestions on techniques that could help connect the two automatically.
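
In case it helps frame the question, this is the kind of boundary-matching heuristic I’ve been sketching (a rough sketch only; df, series_id, and date are made-up names for my data):

import pandas as pd

# hypothetical layout: one row per observation, with a series identifier
# and a date column
meta = (df.groupby('series_id')
          .agg(start=('date', 'min'), end=('date', 'max'))
          .reset_index())

# candidate continuations: a series that starts within a week of another ending
tol = pd.Timedelta('7D')
pairs = []
for _, a in meta.iterrows():
    cand = meta[(meta.start >= a.end) &
                (meta.start <= a.end + tol) &
                (meta.series_id != a.series_id)]
    pairs += [(a.series_id, b) for b in cand.series_id]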

Hi, is there anybody who has used fastai to solve the following problem?

  1. I have 10 sequential images for one category and 10 sequential images for a second category, with 32 such samples in total.
  2. My aim is a model that takes a sequence of images as input and classifies whether that sequence belongs to the first or the second category.

Could anybody who has faced a similar problem throw some light on this issue?

IIRC @tcapelle has some work on image sequence classification with fastai2 for the UCF 101 dataset
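
The usual baseline for that task is to encode each frame with a CNN and pool the features over the sequence. A minimal PyTorch sketch of that pattern (not tcapelle’s code, just the general idea):

import torch
import torch.nn as nn
from torchvision.models import resnet18

class SequenceClassifier(nn.Module):
    """Encode each frame with a CNN, then average-pool over the sequence."""
    def __init__(self, n_classes):
        super().__init__()
        enc = resnet18(pretrained=True)
        enc.fc = nn.Identity()          # keep the 512-d features
        self.encoder = enc
        self.head = nn.Linear(512, n_classes)

    def forward(self, x):               # x: (batch, seq_len, 3, H, W)
        b, s = x.shape[:2]
        feats = self.encoder(x.view(b * s, *x.shape[2:]))  # (b*s, 512)
        feats = feats.view(b, s, -1).mean(dim=1)           # pool over frames
        return self.head(feats)

model = SequenceClassifier(n_classes=2)
out = model(torch.randn(4, 10, 3, 224, 224))  # 4 samples, 10 frames each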

2 Likes

Greetings, and thanks a lot for the great work. These days transformers have been influential for NLP tasks and sequential data, and I wonder if you have any thoughts on their application to time series, i.e. classification and forecasting. Thank you.

ICLR 2021 gives us some interesting advances in Transformers for Multivariate Time Series representation learning:

The proposed approach seems to outperform ROCKET in many regression & classification datasets, and I guess that, unlike ROCKET, here you can also obtain interpretability insights from the trained model.

2 Likes

Thanks a lot for sharing Victor! :clap::clap::clap:
It looks very interesting. I’ll surely take a closer look as soon as I find some time.

It’s a shame though that there’s no code attached. I’d really like to have a well-performing implementation of a Transformer (I have one in the new version of timeseriesAI that I’m about to release, but I haven’t been able to achieve good results on my own datasets). I wonder if they have at least shared enough details to replicate their architecture.

1 Like

That’s a good point. I was very surprised not to see a GitHub repo linked in the paper; fortunately, that is becoming a common thing at high-level conferences like this one. Hopefully they will update the paper and include it!

Hello!

I’m trying to use the AWD-LSTM on my own dataset, but I’m having problems with the TextDataLoaders. I studied this example and formatted my own data to be similar.

In my Paperspace notebook, I first test everything with a slightly modified version of the example, and it works fine.

But when I generate my own data, formatted in the same way, I get “Could not do one pass in your dataloader, there is something wrong in it”.

dls.one_batch() results in "AttributeError: 'tuple' object has no attribute 'shape'".

Any suggestions?

# Imports
import pandas as pd
from fastbook import *
from fastai.text.all import *

# Create a local csv file from the IMDB sample
# (the output file name is just a literal string here, not a real URL)
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
df = df.drop(['is_valid'], axis=1)
df.to_csv('URLs.IMDB_SAMPLE', sep=',', index=False)

# Working example
df = pd.read_csv('URLs.IMDB_SAMPLE')
dls = TextDataLoaders.from_df(df, text_col='text', label_col='label')
learn = text_classifier_learner(dls, AWD_LSTM)

# Functions for dataset generation
import random

def int_series_as_str(n, a, b):
    # returns n+1 random integers in [a, b] as a space-separated string
    r = str(random.randint(a, b))
    for i in range(n):
        r += ' ' + str(random.randint(a, b))
    return r


def create_csv_data(n_samples, n_ints, rng_frm, rng_to, classes):
    # builds a csv string: one random label and one quoted "sentence"
    # of integers per row
    data = 'label,text\n'
    for i in range(n_samples - 1):
        data += random.choice(classes) + ','
        data += '"data_start ' + int_series_as_str(n_ints, rng_frm, rng_to) + ' data_end"' + '\n'
    # last row is written without quotes (pandas parses both forms)
    data += random.choice(classes) + ','
    data += 'data_start ' + int_series_as_str(n_ints, rng_frm, rng_to) + ' data_end'
    return data

# Generate data, save it, and load it
n_samples = 1000
n_ints = 100
rng_frm = -10
rng_to = 100
classes = ['negative', 'positive']

d = create_csv_data(n_samples, n_ints, rng_frm, rng_to, classes)
file_name = "data.csv"
with open(file_name, "w") as text_file:
    text_file.write(d)

df = pd.read_csv(file_name)
dls = TextDataLoaders.from_df(df, text_col='text', label_col='label')
learn = text_classifier_learner(dls, AWD_LSTM)

# try one batch
dls.one_batch()

Originally posted here with more background information, but I flagged the post as it was suggested that I should move it here. I edited the post to focus on the main issue I’m having, but I’ll gladly elaborate more on the project if my other post gets removed and that’s desirable :wink:

I got it to work. The problems seemed to arise from too few samples and a mismatched batch size parameter.
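
For anyone hitting the same error, the fix was along these lines (bs=32 is just an example value; the point is to keep the batch size below the number of samples in each split):

# more samples, and an explicit batch size smaller than either split
d = create_csv_data(10_000, n_ints, rng_frm, rng_to, classes)
with open("data.csv", "w") as text_file:
    text_file.write(d)

df = pd.read_csv("data.csv")
dls = TextDataLoaders.from_df(df, text_col='text', label_col='label', bs=32)
learn = text_classifier_learner(dls, AWD_LSTM)
dls.one_batch()  # now returns a batch instead of raising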

Another paper, this time from IBM Research, that showcases the boom of transformers across different data types:

2 Likes

I once worked for a customer on monitoring data, looking for unusual events (anomaly detection).
I tried ElasticSearch, which has some machine learning and anomaly detection features. I studied the theory of seasonality hard, but in the end I decided to go for a simple solution based on common sense.

Read more about it in my blog:

2 Likes

A question on cleaning up datasets:
Hello guys, I’m trying to get into the world of time series classification, and the forum here has been amazing and enriching for me so far. Thanks so much for that!

I am currently working on a classification project, and with the help of InceptionTime I am getting high success rates on the validation set, but I always fail when it comes to the test set.

My hypothesis is that there is “noise” in the training set: mislabeled data, or data whose class cannot be determined at all.

My question, regardless of whether my hypothesis is correct, is: is there a way to detect such noise in the datasets and clean it up? In image training, for example, a quick human look at the data can do the job, but it is much more complicated in our case.

I guess there may be a technique for modeling the distance between examples in a dataset, thus getting a picture of the latent space in which anomalies could be detected. Do you know of such a thing?

Works like https://www.researchgate.net/publication/332989762_TimeCluster_dimension_reduction_applied_to_temporal_data_for_visual_analytics can help you visualize a 2D picture of your time series dataset.
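
As a quick illustration of that latent-space idea (a sketch only: it assumes you can dump a fixed-length feature vector per training sample from your trained model, and the file names are made up):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# activations from the model's penultimate layer, one row per training sample
feats = np.load('train_features.npy')   # shape (n_samples, n_features)
labels = np.load('train_labels.npy')    # the (possibly noisy) integer labels

emb = PCA(n_components=2).fit_transform(feats)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=5, cmap='coolwarm')
plt.title('2D projection of the latent space')
plt.show()
# points sitting deep inside the "wrong" cluster are good mislabeling candidates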

Regarding your drop in performance, I would make sure that your validation set reflects the characteristics of the test data, to minimize the surprise you get when moving the model to the test set.

1 Like

Hi all,

I just wanted to let you know that during the last few weeks I’ve been updating the timeseriesAI/tsai library to make it work with fastai v2 and PyTorch 1.7. I’ve also added new functionality and tutorial nbs that may address some of the issues/questions raised in this forum.

These are the main changes made to the library:

  • New tutorial nbs have been added to demonstrate the use of new functionality, like:
    • Time series data preparation
    • Intro to time series regression
    • TS archs comparison
    • TS to image classification
    • TS classification with transformers
  • Some existing tutorial nbs have also been updated, like Time Series transforms.
  • More ts data transforms have been added, including ts to images.
  • New callbacks, like the state-of-the-art noisy_student, which lets you use unlabeled data.
  • New state-of-the-art time series models are now available, like:
    • XceptionTime
    • RNN_FCN (like LSTM_FCN, GRU_FCN)
    • TransformerModel
    • TST (Transformer)
    • OmniScaleCNN
    • mWDN (multi-wavelet decomposition network)
    • XResNet1d
  • Some of the models (those ending in Plus) have additional, experimental functionality (like coordconv, zero_norm, squeeze-and-excitation, etc.).

The best way to discover and understand how to use this new functionality is to use the tutorial nbs. I encourage you to use them!

You can find the tsai library here: https://github.com/timeseriesAI/tsai

You’ll be able to clone the repo or pip install the library.
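
For reference, a minimal end-to-end example looks roughly like this (based on the tutorial nbs; check them for the exact, up-to-date API):

from tsai.all import *

# load a UCR dataset and build the dataloaders
X, y, splits = get_UCR_data('NATOPS', return_split=False)
dsets = TSDatasets(X, y, tfms=[None, Categorize()], splits=splits)
dls = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=64,
                               batch_tfms=[TSStandardize()])

# any of the models listed above can be dropped in here
model = InceptionTime(dls.vars, dls.c)
learn = Learner(dls, model, metrics=accuracy)
learn.fit_one_cycle(10, lr_max=1e-3)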

I hope you’ll find it useful.

25 Likes

Wow, this is really a good gift for Singles’ Day!!! :slight_smile: Thank you so much @oguiza!!! I can’t wait to have a look at all those notebooks.

Best!

3 Likes

Look at this paper

Although it does not provide a comparison to the state of the art in the field, the idea of framing time series forecasting as a computer vision task looks quite fun and interesting!

2 Likes

Hi vrodriguezf, hope you are having a beautiful day!

Great post!

The above sentence made me laugh, as it seems that we at fastai love it when we can turn things into an image classification problem. :smiley: :smiley: :smiley: :smiley: :smiley:

I don’t know if it’s because it’s generally the first model we create, or if we as humans just love images.
I wonder, if Jeremy had taught GANs as the first model, whether GANs would be as popular as image classification is on this forum.

Cheers mrfabulous1 :smiley: :smiley: