IIRC @tcapelle has some work on image sequence classification with fastai2 for the UCF 101 dataset
Greetings, and thanks a lot for the great work. Transformers have recently been very influential for NLP tasks and sequential data, and I wonder if you have any thoughts on their application to time series tasks, i.e. classification and forecasting. Thank you.
ICLR 2021 gives us some interesting advances in Transformers for multivariate time series representation learning:
The proposed approach seems to outperform ROCKET on many regression & classification datasets, and I guess that, unlike ROCKET, here you can also obtain interpretability insights from the trained model.
Thanks a lot for sharing Victor!
It looks very interesting. I’ll surely take a closer look as soon as I find some time.
It’s a shame, though, that there’s no code attached. I’d really like to have a well-performing implementation of a Transformer (I have one in the new version of timeseriesAI that I’m about to release, but haven’t been able to achieve good results on my own datasets). I wonder if they have at least shared enough details to be able to replicate their architecture.
That’s a good point. I was very surprised not to see a GitHub repo linked in the paper; fortunately, linking one is becoming common at high-level conferences like this one. Hopefully they will update the paper and include it!
I’m trying to use the AWD-LSTM on my own dataset, but I’m having problems with TextDataLoaders. I studied this example, and formatted my own data to be similar.
In my Paperspace notebook, I first test everything with a slightly modified version of the example and it works fine.
But when I generate my own data, formatted in the same way, I get “Could not do one pass in your dataloader, there is something wrong in it”.
dls.one_batch() results in "AttributeError: ‘tuple’ object has no attribute ‘shape’ "
    # Imports
    import pandas as pd
    from fastbook import *
    from fastai.text.all import *

    # Create a local csv file of the IMDB example
    path = untar_data(URLs.IMDB_SAMPLE)
    df = pd.read_csv(path/'texts.csv')
    df = df.drop(['is_valid'], axis=1)
    df.to_csv('URLs.IMDB_SAMPLE', sep=',', index=False)

    # Working example
    df = pd.read_csv('URLs.IMDB_SAMPLE')
    dls = TextDataLoaders.from_df(df, text_col='text', label_col='label')
    learn = text_classifier_learner(dls, AWD_LSTM)

    # Functions for dataset generation
    import random

    def int_series_as_str(n, a, b):
        r = str(random.randint(a,b))
        for i in range(n):
            r += ' ' + str(random.randint(a, b))
        return r

    def create_csv_data(n_samples, n_ints, rng_frm, rng_to, classes):
        data = 'label,text\n'
        for i in range(n_samples-1):
            data += random.choice(classes) + ','
            data += '"data_start ' + int_series_as_str(n_ints, rng_frm, rng_to) + ' data_end"'+ '\n'
        data += random.choice(classes) + ','
        data += 'data_start ' + int_series_as_str(n_ints, rng_frm, rng_to) + ' data_end'
        return data

    # Generate data, and load
    n_samples = 1000
    n_ints = 100
    rng_frm = -10
    rng_to = 100
    classes = ['negative', 'positive']
    d = create_csv_data(n_samples, n_ints, rng_frm, rng_to, classes )
    file_name = "data.csv"
    with open(file_name, "w") as text_file:
        text_file.write(d)
    df = pd.read_csv(file_name)
    dls = TextDataLoaders.from_df(df, text_col='text', label_col='label')
    learn = text_classifier_learner(dls, AWD_LSTM)

    # try one batch
    dls.one_batch()
Originally posted here with more background information, but I flagged the post as it was suggested that I should move it here. I edited the post to focus on the main issue I’m having, but I’ll gladly elaborate more on the project here if my other post gets removed and that’s desirable.
I got it to work. The problems seemed to arise from too few samples and a mismatched batch size parameter.
Another paper, this time from IBM research, that showcases the booming of transformers across different data types:
I once worked for a customer on monitoring data, looking for unusual events (anomaly detection).
I tried to use ElasticSearch, which has some machine learning and anomaly detection features, and studied the theory of seasonality hard. But in the end I decided to go for a simple solution based on common sense.
Read more about it in my blog:
question on cleaning up the datasets:
Hello guys, I’m trying to get into the world of time series classification, and the forum here has been amazing and enriching for me so far! Thanks so much for that!
I am currently working on a classification project, and with the help of InceptionTime I am getting high success rates on the validation set, but the model always fails when it comes to the test set.
My hypothesis is that there is “noise” in the training set: mislabeled data, or data for which the class cannot be determined at all.
My question, regardless of whether my hypothesis is correct: is there a way to detect such noise in the datasets and clean it up? In image training, for example, a quick human look at the data can do the job. But it is much more complicated in our case.
I guess there may be a technique of modeling the distance between examples in a dataset - and thus get a picture of the latent space, and then it will be possible to detect anomalies in it. Do you know such a thing?
Work such as TimeCluster (https://www.researchgate.net/publication/332989762_TimeCluster_dimension_reduction_applied_to_temporal_data_for_visual_analytics) can help you visualize your time series dataset as a 2D picture.
Regarding your drop in performance, I would ensure that your validation set reflects the characteristics of the test data, to minimize the surprise you get when moving the model to the test set.
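One rough way to try the distance idea from the question is to score each sample by how many of its nearest neighbours carry a different label. The sketch below is only an illustration, not something proposed in this thread: it assumes plain Euclidean distance on the flattened raw series (DTW or distances between learned embeddings could be swapped in), and the function name `knn_label_disagreement` and the toy data are made up for the example.

```python
import numpy as np

def knn_label_disagreement(X, y, k=5):
    """Fraction of each sample's k nearest neighbours that carry a different label.

    X: array of shape (n_samples, channels, timesteps)
    y: integer labels of shape (n_samples,)
    """
    flat = X.reshape(len(X), -1).astype(float)
    # Pairwise squared Euclidean distances between flattened series
    sq = (flat ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * flat @ flat.T
    np.fill_diagonal(d2, np.inf)          # a sample is not its own neighbour
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    return (y[nn_idx] != y[:, None]).mean(axis=1)

# Toy data (hypothetical): two well-separated classes, one label flipped on purpose
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.normal(0.0, 0.1, size=(20, 1, 10)),
    rng.normal(5.0, 0.1, size=(20, 1, 10)),
])
y = np.array([0] * 20 + [1] * 20)
y[3] = 1  # simulated labelling error

scores = knn_label_disagreement(X, y, k=5)
suspects = np.argsort(scores)[::-1][:5]   # highest disagreement first
print(suspects, scores[suspects])
```

Samples with a high score are candidates for a manual look rather than automatic deletion, since a hard but correctly labeled example can also sit close to the other class.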
I just wanted to let you know that during the last few weeks I’ve been updating the timeseriesAI/tsai library to make it work with fastai v2 and PyTorch 1.7. I’ve also added new functionality and tutorial nbs that may address some of the issues/questions raised in this forum.
These are the main changes made to the library:
- New tutorial nbs have been added to demonstrate the use of new functionality like:
  - Time series data preparation
  - Intro to time series regression
  - TS archs comparison
  - TS to image classification
  - TS classification with transformers
- Some tutorial nbs have also been updated, like Time Series transforms.
- More ts data transforms have been added, including ts to images.
- New callbacks, like the state-of-the-art noisy_student, which will allow you to use unlabeled data.
- New state-of-the-art time series models are now available, like:
  - RNN_FCN (like LSTM_FCN, GRU_FCN)
  - TST (Transformer)
  - mWDN (multilevel wavelet decomposition network)
- Some of the models (those ending in Plus) have additional, experimental functionality (like coordconv, zero_norm, squeeze and excitation, etc.).
The best way to discover and understand how to use this new functionality is to use the tutorial nbs. I encourage you to use them!
You can find the tsai library here: https://github.com/timeseriesAI/tsai
You’ll be able to clone the repo or pip install the library.
I hope you’ll find it useful.
Wow, this is really a good gift for Singles’ Day!!! Thank you so much @oguiza!!! I can’t wait to have a look at all those notebooks.
Look at this paper
Although it does not provide a comparison to state of the art in the field, the idea of doing time series forecasting as a computer vision task looks quite funny and interesting!
Hi vrodriguezf hope you are having a beautiful day!
The above sentence made me laugh, as it seems that we at fastai love it when we can turn things into an image classification problem.
I don’t know if it’s because it’s generally the first model we create or if we as humans just love images.
I wonder, if Jeremy had taught GANs as the first model, whether GANs would be as popular as image classification is on this forum.
Can anybody help me?
I’m new to deep learning and right now I’m stuck on a project I’m doing alone, where I want to predict from an EEG pattern whether the eyes are open or closed. The problem is that the accuracy stays the same from start to end, at 64.5%. I’ve tried changing the loss and the batch size, but nothing changed. I have no idea what to do. Please help me, thanks.
here’s the code(click on it):
I think you have a time series here, but use a tabular learner. That one is, if I understood it correctly, mainly learning embeddings but not the dependencies between the timesteps. If you look carefully, your training loss goes down but the validation loss goes up quite a bit. I think you are using the wrong model, so it does not work.
The classic thing to try first here is an LSTM model I guess. I did not find a good fast.ai resource, only pure PyTorch, for doing that with sensor data (I have some PyTorch code here: https://github.com/joergsimon/xai-tutorial-april-2020 ). So maybe some more fast.ai experts might help you here.
As a side note: you picked a learning rate more or less at the end of the plot, where there is also a valley. For this tabular learner, you might also try a lower lr, e.g. 1e-4. But again, I think this is not the right model for the task anyway.
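For reference, a minimal pure-PyTorch sketch of the kind of LSTM classifier suggested above could look like the following. It is an illustration only: the class name `LSTMClassifier`, the hidden size, the window length of 128 and the random input batch are all assumptions, not taken from the linked tutorial or from the poster’s notebook.

```python
import torch
from torch import nn

class LSTMClassifier(nn.Module):
    """Single-layer LSTM followed by a linear head on the last hidden state."""

    def __init__(self, n_channels, hidden_size, n_classes):
        super().__init__()
        # batch_first=True means inputs are shaped (batch, timesteps, channels)
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # h_n has shape (num_layers, batch, hidden_size); take the last layer
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])

# Hypothetical batch: 8 windows of 128 timesteps from a 14-channel sensor
model = LSTMClassifier(n_channels=14, hidden_size=32, n_classes=2)
batch = torch.randn(8, 128, 14)
logits = model(batch)
print(logits.shape)
```

Unlike a tabular learner, this model reads the timesteps in order, so it can pick up dependencies between consecutive readings.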
Interesting new (to me at least) package below, which is an implementation of the popular Prophet but with PyTorch, and the AR-Net is using fastai2.
If I interpret your data correctly, you have an EEG with 14 channels that is around 15k timesteps long. There’s a column for the output. I guess the data is equally spaced and indexed by time.
If that is the case, a tabular model is not a good choice.
It would be better to use a time series model that takes subsequences of the entire sequence.
But the first thing you need to do is to convert the data you have into samples that can be processed by a time series model.
You may want to take a look at data preparation in the tsai library. You would need to create an array with shape [n samples x channels x timesteps] using the SlidingWindow function. There are examples that show how to use this function.
That output is something you can then use to create a TSDataLoaders and a model like InceptionTime. This approach tends to work much better. Please let me know if that works.
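To make the target shape concrete, here is a plain NumPy sketch of the windowing step. It does not use tsai’s SlidingWindow; `make_windows` is a hypothetical helper, the random data only stands in for the EEG recording, and labeling each window with the label of its last timestep is an assumption you may want to change.

```python
import numpy as np

def make_windows(data, labels, window_len, stride):
    """Cut one long recording into overlapping samples.

    data:   array of shape (timesteps, channels)
    labels: array of shape (timesteps,)
    Returns X with shape (n_samples, channels, window_len) and y with
    shape (n_samples,), where each window gets the label of its last timestep.
    """
    starts = np.arange(0, data.shape[0] - window_len + 1, stride)
    # Transpose each slice so channels come before timesteps
    X = np.stack([data[s:s + window_len].T for s in starts])
    y = labels[starts + window_len - 1]
    return X, y

# Stand-in for the EEG data: 15,000 timesteps, 14 channels, binary eye state
rng = np.random.default_rng(0)
eeg = rng.normal(size=(15000, 14))
eye_state = rng.integers(0, 2, size=15000)

X, y = make_windows(eeg, eye_state, window_len=128, stride=64)
print(X.shape, y.shape)
```

Because the windows overlap, splitting train and validation by time (rather than randomly) helps avoid nearly identical windows ending up on both sides of the split.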
This tsai preparation library is amazing. This type of library has noodled in my head for over a year. I can’t believe you have done this. Really amazing.