Time series/ sequential data study group

Hi, I’m very interested in a series of competitions like this.

For the future, I’d like to mention another potential source of datasets: a relatively high-profile forecasting competition is organized every year. The last one, M4, was run on a dataset of 100,000 different time series, at different intervals and of different natures. You can read more about the dataset here.

@tukun previously mentioned the results of this competition and the fact that although the leaderboard was dominated by pure statistical methods, the winner used a hybrid ES-RNN method.

6 Likes

Hi @oguiza,

I will participate! I appreciate your comments! The choice to take 3 subsequences is just to be efficient with preprocessing; Jeremy explains this well in one of the part 1 (v2) videos of his course, in the NLP lesson. The RNN doesn’t just take timesteps 1 to N-1 and output the value at step N: it outputs a sequence of length N-1 (aiming to predict timesteps 2 through N), and the RMSE is computed over all N-1 of those values. The time series is cut into (say) 3 pieces because feeding the whole series in at once makes it very memory-intensive to backpropagate errors through that many time steps (and if we decide that information from very long ago is not relevant, then it shouldn’t matter how the state was updated back then). So picking every possible subsequence of length N-1 and predicting the Nth is equivalent to the setup we have, except it amounts to running more epochs. I will try to find the bit of that video where Jeremy explains this.

For a long time when I was training RNNs I would do exactly as you suggested: extract every sequence of a fixed (or slightly random) length, predict only the next timestep, and compute the RMSE on only that one prediction. I found this did not improve the final loss, made the model train significantly slower, and was much more memory-intensive (the dataset becomes huge!).
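In case it helps make the setup concrete, here is a minimal sketch of what I mean (the names and toy data are mine, not from the notebook): the RNN emits a prediction at every timestep, the loss covers all N-1 outputs, and the series is split into chunks so gradients never flow through the full history:

```python
import torch
import torch.nn as nn

# Sketch: train a GRU to predict x[t+1] from x[t] at every step,
# splitting a long series into chunks (truncated BPTT).
class SeqPredictor(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, h=None):               # x: (batch, steps, 1)
        out, h = self.rnn(x, h)
        return self.head(out), h                # a prediction at *every* step

series = torch.randn(1, 3000, 1)                # toy univariate series
model, loss_fn = SeqPredictor(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters())

h = None
for chunk in series.split(1000, dim=1):         # e.g. 3 subsequences
    opt.zero_grad()
    preds, h = model(chunk[:, :-1], h)
    loss = loss_fn(preds, chunk[:, 1:])         # loss over all N-1 outputs
    loss.backward()
    opt.step()
    h = h.detach()                              # stop gradients at the chunk boundary
```

Detaching the hidden state at each chunk boundary is what keeps the memory cost bounded while still carrying state forward across the whole series.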

Now I have a problem I was hoping you could help me investigate. Maybe @marcmuc or someone else in the thread would be kind enough to help as well. I’m attaching a notebook which demonstrates the problem. This is in regards to my idea of implementing multi-image classification. Say you have two (224x224) RGB images and batch_size is one: we can concatenate the two (1,3,224,224) tensors into one tensor (1,6,224,224), feed the first 3 channels into one resnet core and the next 3 into a second, and concatenate the outputs to feed into a linear classifier.

I notice you have had very good success with combining 3 images for the olive oil classification task, and your method involves sending each as a greyscale image (one in each channel), is that correct? I think this makes sense, since the color here is not relevant (it’s generated by a somewhat arbitrary heatmap for our GADF/RP/MTF etc.). However I still want to experiment with my setup, because it would let us use arbitrarily many images (within the limits of memory) rather than being capped at 3 channels.

Long story short: I have defined the “stacked resnet core” architecture and the “stacked images” dataset, and I’m ready to start testing except for one big problem… the architecture doesn’t seem to work on GPU for some reason! As you can see in the gist, when I pull the model back onto the CPU it works fine, but on GPU it throws an error. So if anyone has any ideas about what’s going on here, I would greatly appreciate it!
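For anyone curious, here is roughly what I mean by a “stacked resnet core” (a minimal sketch; the class and its names are illustrative, not the exact notebook code):

```python
import torch
import torch.nn as nn
from torchvision import models

# Rough sketch of the "stacked resnet core" idea: split a
# (bs, 3*k, 224, 224) input into k RGB images, run each through its own
# resnet body, then concatenate the pooled features for a linear head.
class StackedResnet(nn.Module):
    def __init__(self, k=2, n_classes=10):
        super().__init__()
        self.k = k
        # nn.ModuleList (not a plain list!) so .cuda()/.to() moves every body
        self.bodies = nn.ModuleList(
            nn.Sequential(*list(models.resnet18(pretrained=False).children())[:-1])
            for _ in range(k))                  # drop the final fc layer
        self.head = nn.Linear(512 * k, n_classes)

    def forward(self, x):                       # x: (bs, 3*k, 224, 224)
        feats = [body(x[:, 3 * i:3 * (i + 1)]).flatten(1)
                 for i, body in enumerate(self.bodies)]
        return self.head(torch.cat(feats, dim=1))

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = StackedResnet(k=2).to(device)
out = model(torch.randn(4, 6, 224, 224).to(device))   # (4, n_classes)
```

For what it’s worth, a classic cause of works-on-CPU/fails-on-GPU errors is holding submodules in a plain Python list instead of nn.ModuleList, so .cuda() never moves their parameters.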

EDIT: @marcmuc has solved the problem :slight_smile: thank you very much! Will post a working notebook with some results a little later this week; will take a couple days off for the holiday, cheers everyone!

2 Likes

I’m glad to hear you’ll participate!
Thanks for the explanation on the RNNs. I hadn’t understood that you were predicting a whole sequence; it makes sense now.
On the multi-image classification, I think it’s a great idea. I was discussing something along those lines, but I don’t know enough PyTorch yet to implement this type of approach. It makes a lot of sense to be able to combine multiple images and feed them into a NN. I guess there is an alternative to your idea: combine the images first, pass them through a convolutional layer, and then load the activations into a resnet model.
I have no idea what the issue with your implementation is. I haven’t had any issues, but I’ve only worked with the official prepackaged learner and resnet. I’ve just created images by stacking up to 3 arrays as you describe (ts —> encoder —> 2D array —> 1 channel). Sorry I can’t help (yet!)
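In case it’s useful to anyone following along, the stacking step itself is as simple as this (a hypothetical sketch; `encoders` stands in for whatever GADF/RP/MTF implementations you use):

```python
import numpy as np

# Hypothetical sketch: each encoder maps a 1D series to a 2D array,
# and each array fills one channel of a single multi-channel image.
def encode_to_image(ts, encoders):
    chans = [enc(ts) for enc in encoders]   # up to 3 arrays, shape (sz, sz)
    img = np.stack(chans, axis=0)           # (n_channels, sz, sz)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)    # scale to [0, 1] for the CNN
```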

Excellent Henri!
I agree with you that we’ll need to increase the complexity of the time series challenge to make it more interesting. (Although it won’t be easy to beat the SOTA, I would guess…)
I just thought it’d be good to start with a relatively simple (univariate) time series problem, and then move on to more complex (multivariate) tasks. We’ll see how this ‘learning competition’ goes, and we can then decide which one to do next.

1 Like

OK guys, based on the feedback from @henripal and @mb4310 on my message about the time series ‘learning competition’, and the :blue_heart: from @marcmuc and @Daniel.R.Armstrong, I’ll go ahead and create another thread for the Earthquakes problem with the rules I mentioned before. That way, people who don’t necessarily visit this thread will also see it and might get involved. At least 5 of us should participate, and hopefully we’ll learn from each other. Let’s see how it goes!

1 Like

@mb4310,
I’ve just created a thread to launch the TS learning competition. Since you’ve created a pretty useful notebook for extracting data from UCR, I was thinking you might want to create a gist with the minimum content so that participants can easily load the data into their notebooks. It’s just so people don’t waste time on something that’s already been built. What do you think? At the very least, you’ll have a like from me!

2 Likes

I will do that when I get home from work this afternoon!

Great! Thanks!!

I will also be interested.

Great @tukun! Thanks! Looking forward to reading your posts!

1 Like

Hi group,

I was thinking that, to make this a more interesting experience, we should shorten the timeline of this competition. We could relabel these competitions ‘fast learning competitions’, like they do in chess with ‘fast chess’. Are you OK with reducing it to just one week from announcement? Otherwise I believe participants will relax a bit, and it won’t be as good a learning experience as I think it could be.
In this way we could have a new competition every week, and will thus be able to learn and share more about time series.
Agree? If so please click like, otherwise share your thoughts.

I’d also like to ask you to please send your first posts to the learning competition, to make this really interesting. I’m sure we all have something to learn and contribute.

1 Like

I guess you had an issue and couldn’t create the notebook. Since I’m a few hours ahead of you, I’ve created one and uploaded it to a repo I’ve set up for this competition.
I have taken inspiration from the CNN notebook you created, although I didn’t include anything related to images, as this is just one of the multiple approaches you could take to this problem.
Thanks in any case!

Proposed TSSD study group goals

Hi all,

Based on some of the insights we are getting from the Earthquakes competition, I think it’s a good time to start a discussion on what the ultimate goals of this TS project should be.
I can share what I’d personally like to achieve, but please feel free to challenge, add your goals, agree, etc.

Goals:

  1. Identify a set of proven, reproducible, state-of-the-art techniques for TS data that can be applied to other real-world datasets (this is what we can achieve with competitions).
  2. Generate and test new ideas to improve the current state of the art.
  3. Create a notebook/collection of notebooks containing the identified best practices.
  4. Develop the TSSD study group members’ expertise in this area.

Stretch goals:

  1. Maintain/increase the community of people interested in TSSD beyond this course.
  2. If we can generate a solid notebook (or notebooks) with best practices in this area, we could propose extending the current fastai library to add time series. There are currently 4 applications: vision, text, tabular and collaborative filtering. I don’t know if Jeremy would be interested in adding time series to the list, but that way TS best practices/state-of-the-art techniques could be applied with fastai out of the box.

Please, let me know your thoughts.

5 Likes

Large benchmark Time Series Classification studies: summary

After reading @henripal’s post I reviewed that paper and 4 other recent Time Series Classification papers that are very interesting:

  1. The Great Time Series Classification Bake Off: An Experimental Evaluation of Recently Proposed Algorithms. Extended Version (Bagnall, 2016)
  2. Deep Learning for Time-Series Analysis (Gamboa, 2017)
  3. Deep learning for time series classification: a review (Fawaz, 2018)
  4. Proximity Forest: An effective and scalable distance-based classifier for time series (Lucas, 2018)
  5. Is rotation forest the best classifier for problems with continuous features? (Bagnall, 2018)

I’ve prepared a summary for myself that I thought I’d share.

Summary:

  1. 1-NN dynamic time warping with a warping window set through cross-validation (DTW) has been extremely difficult to beat for over a decade, but it’s no longer considered state of the art. (A minimal code sketch of this baseline appears after this list.)

  2. SOTA TSC algorithms for univariate problems:
    2.1. HIVE-COTE: the current state of the art, but hugely computationally intensive. It combines the predictions of 35 individual classifiers built on four representations of the data, and the HIVE version uses a hierarchical vote. Impractical for many problems.
    2.2. Resnet: the same performance as COTE but much faster. python code

  3. Other interesting TSC algorithms in these studies:
    3.1. Shapelet Transform (ST): extracts discriminative subsequences (shapelets) and builds a new representation of the time series that is fed to an ensemble of 8 classifiers. While it is considered a state-of-the-art classifier, it has little potential to scale to large datasets given its training complexity. python code
    3.2. BOSS (Bag-of-SFA-Symbols): forms a discriminative bag of words by discretizing the TS using a Discrete Fourier Transform and then building a nearest-neighbor classifier with a bespoke distance measure. It is of limited use on large datasets as it has a high training complexity. The authors produced a similar approach with improved scalability, the BOSS in Vector Space (BOSS-VS). The same authors recently proposed WEASEL, which improves on the computation time of BOSS and on the accuracy of BOSS-VS, but has a very high memory complexity (it doesn’t scale beyond 10k TS). python code
    3.3. Proximity Forest (PF): a new algorithm presented in Aug 2018. It is similar to Random Forest but replaces the attribute-based splitting criteria with a random similarity measure. java code. I don’t think there is any python code yet.
    3.4. Rotation Forest (RotF): an algorithm that has recently been used with very good results in TSC. I’ve contacted Dr. Bagnall, and he’s sent me this python code. They are still trying to optimize it.
    3.5. Fully Convolutional Network (FCN): python code (a sketch of the architecture appears after this list)
    3.6. Encoder: an architecture inspired by FCN, the main difference being that the GAP layer is replaced with an attention layer. python code

  4. The studies are inconclusive as to the best algorithms for multivariate TS, due to the small number of datasets used. However, FCN, Encoder and Resnet also seem to work well there.

  5. Most non-DL state-of-the-art algorithms do not scale to large time series datasets. This still needs to be confirmed with Proximity Forest and Rotation Forest.
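Since item 1 comes up so often, here is a minimal sketch of the 1-NN DTW baseline with a Sakoe-Chiba warping window (all names are mine; in practice the window size w would be set by cross-validation, as the papers describe):

```python
import numpy as np

# DTW distance restricted to a band of width w around the diagonal.
def dtw(a, b, w):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - w), min(m, i + w)   # only cells inside the band
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

# 1-NN classification: label of the closest training series under DTW.
def knn1_dtw(X_train, y_train, x, w=10):
    dists = [dtw(x, xt, w) for xt in X_train]
    return y_train[int(np.argmin(dists))]
```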
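And for item 3.5, a rough sketch of the FCN baseline for TSC as described in the Fawaz review: three conv blocks followed by global average pooling. The 128/256/128 filter counts and 8/5/3 kernel sizes follow the commonly cited configuration; the exact hyperparameters may differ from the linked code.

```python
import torch.nn as nn

# FCN for time series classification: conv blocks + GAP, no pooling layers.
class FCN(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        def block(ci, co, k):
            return nn.Sequential(nn.Conv1d(ci, co, k, padding=k // 2),
                                 nn.BatchNorm1d(co), nn.ReLU())
        self.net = nn.Sequential(block(in_ch, 128, 8),
                                 block(128, 256, 5),
                                 block(256, 128, 3),
                                 nn.AdaptiveAvgPool1d(1),   # GAP over time
                                 nn.Flatten(),
                                 nn.Linear(128, n_classes))

    def forward(self, x):       # x: (batch, channels, length)
        return self.net(x)
```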

20 Likes

Hi group,

These sound like great goals. I am quite new to time series forecasting and still surprised by how difficult and varied this seemingly simple task can be (univariate vs. multivariate, short vs. long, well predictable vs. mostly noise, seasonal, with change points, …).

Another great dataset and source of information not mentioned above (correct me if I am wrong) is Kaggle’s Wikipedia Traffic Forecasting Challenge. There are two reasons why it is exceptional: 7 of the 8 best solutions use deep learning (the remaining one is a very smart application of Kalman filtering), and if you look into the discussion, many of them are well described there. I wonder how much work it would be to create a simple pytorch/fastai solution like this one.

Another interest of mine (as somebody coming originally from statistics & probability) is when it is actually a good idea to use deep learning for time series forecasting. It is not a simple question, and many ML applications here are questionable, as shown in this highly accessed PLOS One paper:

Statistical and Machine Learning forecasting methods: Concerns and ways forward

8 Likes

A question for @mb4310,

I have attempted to play with https://github.com/mb4310/ts/blob/master/CNN.ipynb on my lilliputian 4 GB GPU.
It runs into OOM for most combinations of image size and batch size.
What size of GPU are you running it on?
I apologize if this was mentioned elsewhere.
Thanks

If I remember correctly I believe he was using a 1080Ti

Thanks @whamp

But the same applies here as in Jeremy’s notebooks: you can always reduce the batch size if you have memory problems, and if that is not enough, reduce the image size (with potential effects on accuracy in ts-to-image use cases). Keep in mind that @mb4310’s approach stacks 3 resnets, so it is pretty “expensive”; on a 1050 that is maybe not the ideal arch. :wink: But I still think it is an interesting idea to play around with…
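For example, something along these lines (fastai v1 names as of this thread; the path, image size and batch size are placeholders):

```python
from fastai.vision import *

path = Path('data/ts_images')    # placeholder dataset location
# If the notebook OOMs: cut bs first (roughly linear memory saving),
# then the image size (roughly quadratic saving in activation memory).
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(),
                                  size=112,   # down from 224
                                  bs=16)      # down from the default 64
learn = create_cnn(data, models.resnet18, metrics=accuracy)
```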

Thanks for linking the Kaggle dataset and the paper. Just wanted to point out that 90% of what is currently being discussed here in the channel is time series classification, not time series forecasting, which at least in most cases is something very different. But I would also be interested in looking at TS forecasting more closely, especially multistep forecasting.