Time series/ sequential data study group

Are you guys mainly interested in classification applications, or also forecasting? Can anyone here recommend good resources for time series forecasting? I'm interested in multivariate forecasting, and also in forecasting a univariate target using multivariate inputs with LSTMs, NNs, etc. There is so much material out there for ARIMA and the like…

3 Likes

I am working on Predictive Maintenance use cases where we try to predict the failure of various kinds of devices in advance (days to weeks). It's mostly a multivariate classification problem with tabular data: based on various factors (temporal data: a device's performance parameters on a particular day, days since last repair, age of the device, external weather conditions, etc.; static data: model of the device, geographical location of the device, etc.), the model tries to predict whether the device is going to fail within the next X days (label: 1) or not (label: 0). I would be interested to understand how deep learning can be used for such a use case. I'm not aware of any comparative study of deep learning vs. tree-based approaches for this kind of problem. Happy to hear from others.
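One concrete piece of that setup is constructing the "fails within the next X days" label from raw failure dates. A minimal sketch of that labelling step (the function name and toy data are mine, purely for illustration):

```python
from datetime import date, timedelta

def label_days(observation_days, failure_days, horizon_days=7):
    """Label an observation day 1 if a failure occurs within the
    next `horizon_days`, else 0 (the 'fails within X days' target)."""
    failures = set(failure_days)
    labels = []
    for day in observation_days:
        # Look at the horizon window strictly after the observation day
        window = {day + timedelta(days=d) for d in range(1, horizon_days + 1)}
        labels.append(1 if window & failures else 0)
    return labels

# Toy example: ten daily observations, one failure on Jan 10
obs = [date(2024, 1, d) for d in range(1, 11)]
y = label_days(obs, [date(2024, 1, 10)], horizon_days=3)
# -> days Jan 7-9 are labelled 1, everything else 0
```

Getting this window right matters: the observation day itself is excluded, so the model never sees the failure it is predicting.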

8 Likes

Could you please describe the nature of the public dataset you are looking for?

I have not gone into the details, but Uber has a nice blog post about this very topic (Link). The result is a little surprising:

…pure machine learning and neural network (NN) methods performed worse than standard algorithms like ARIMA or Exponential Smoothing (ES), and still worse against various combinations of these base statistical methods. However, the winner of the competition, with a solid margin, was Slawek’s hybrid Exponential Smoothing-Recurrent Neural Networks (ES-RNN) method. It mixes hand-coded parts like ES formulas with a black-box recurrent neural network (RNN) forecasting engine.

8 Likes

Electrical energy consumption and generation data for different households, businesses, offices, and industries, as well as PV, wind, and CHP generation, at a resolution of at least 15-minute readings, although as mentioned 1h would also be okay for demonstration purposes. I am aware of the London households datasets, and there are several datasets from utilities or whole countries for consumption data, but I am looking for installation-level / site-level data.

2 Likes

Very good thread! Very interesting to learn about using CNNs to classify time-series from images, will definitely give that a go myself.

Personally I work with financial data through my institution and cannot share any of the datasets, but since watching v2 of this course some time ago I have adopted Jeremy's ULMFiT approach to time series classification. Briefly, the idea behind ULMFiT is that before training a classifier (on some NLP task), one trains a language model (essentially a forecaster: a model which takes a sequence as input and tries to predict the next item in the sequence). Once the LM is strong enough, one slaps a linear classifier on the end and fine-tunes it for the classification task.
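ULMFiT proper uses fastai's AWD-LSTM, but the pretrain-a-forecaster-then-reuse-it idea can be illustrated with a deliberately tiny numpy analogy: synthetic data, a linear one-step forecaster standing in for the LM, and a trivial "head". Nothing below is actual ULMFiT code; it only mirrors the two-stage structure:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_series(freq, n=200):
    """Noisy sinusoid standing in for a real series."""
    return np.sin(2 * np.pi * freq * np.arange(n)) + 0.05 * rng.standard_normal(n)

# "Pretraining corpus": many unlabelled series from the normal regime
train = [make_series(0.02) for _ in range(20)]

# Stage 1 (the "language model"): fit a linear one-step-ahead forecaster
k = 8  # input window length
X = np.concatenate([np.lib.stride_tricks.sliding_window_view(s[:-1], k) for s in train])
y = np.concatenate([s[k:] for s in train])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def residual_energy(s):
    """Mean squared one-step forecast error under the pretrained model."""
    Xs = np.lib.stride_tricks.sliding_window_view(s[:-1], k)
    return np.mean((Xs @ w - s[k:]) ** 2)

# Stage 2 (the "classifier head"): series the pretrained forecaster
# explains well look normal; series it cannot forecast stand out
normal = residual_energy(make_series(0.02))
odd = residual_energy(make_series(0.11))
```

In real ULMFiT the head is a trained classifier and the encoder is fine-tuned rather than frozen; the sketch only shows why a good forecaster carries information a downstream classifier can exploit.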

This method works exactly the same way for time series, and I have used it to improve classifiers that were trained from scratch in the domains of risk management (probability of default on a loan, where the data consists of individuals' deposit activity), asset pricing, and cash-flow optimization. I will try to take a look at some of those datasets in the link @oguiza was kind enough to share and put together a notebook illustrating the procedure.

I think it would be interesting if we try to put together a notebook as a team that compares some of these different methods:

i) CNN classification of un-altered time-series image
ii) CNN classification of transformed time-series image (Gramian Angular Field or maybe a 2d plot in time-frequency domain after wavelet transform… just a thought, see https://www.mathworks.com/examples/wavelet/mw/wavelet-ex11554099-continuous-wavelet-analysis for an example of what this might look like)
iii) Direct RNN-based approach without ULMFiT pretraining
iv) RNN-based approach with ULMFiT
v) CNN based approach directly on time series with 1d convolution along the temporal dimension
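For (ii), the Gramian Angular Field in particular is easy to sketch: rescale the series to [-1, 1], map each value to an angle, and take pairwise angle sums. A minimal numpy version (not tied to any specific library):

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian Angular Summation Field of a 1-D series:
    rescale to [-1, 1], phi = arccos(x), G[i, j] = cos(phi_i + phi_j)."""
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1  # min-max to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))           # polar encoding
    return np.cos(phi[:, None] + phi[None, :])

# A 64-sample series becomes a 64x64 "image" a CNN can classify
img = gramian_angular_field(np.sin(np.linspace(0, 4 * np.pi, 64)))
```

The result is a symmetric matrix with values in [-1, 1], so it can be fed to an image classifier with little extra preprocessing.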

Happy to add more to the list! A good place to start would be agreeing on one or two datasets/tasks, implementing one (or more) of the above methods, and sharing a jupyter notebook here. As we go, someone can combine these into one notebook, and from there we can use it to quickly pit these methods against each other on a wide range of tasks.

28 Likes

There was a competition on DrivenData (in collaboration with Schneider Electric) for forecasting energy consumption (Link). As per the data description, it had site-level data at 15-minute intervals. Unfortunately, the competition is closed now and the dataset is no longer available. Not sure if there is any other source for the data.

2 Likes

Wow, I think those are excellent ideas!
All perfectly aligned with my goals for this course:

  • Learn new DL approaches, techniques, resources, tricks, etc. that help achieve state-of-the-art results, in particular as they apply to time series (TS).
  • Practice/ code as much as possible.
  • Gain confidence in the use of DL.
  • Get some concrete examples of how these techniques can be applied to TS (notebooks). Develop some proof of concept of how these techniques could be applied.
  • Create a TS study group where we can all share our experience and benefit from each other. Hopefully we can maintain this collaboration beyond the course.

Based on feedback so far, I've created a repo proposal where we can collaborate in an organized way. I've created a mock-up for you to review and comment on.
I’ve structured it in 2 sections:

1. Practical DL applied to TS notebook collection: here we could share our own notebooks.
  2. DL applied to TS resources: this section would collect useful third-party resources like repos, websites, papers, etc.

The idea is that all those interested could collaborate in the creation and maintenance
of the repo. I think this could provide a good learning opportunity for many of us.

Please, let me know what you think.

19 Likes

By the way, I've included a few references to papers as an example of how we could build a library. I have a list of other interesting papers as well. Something I think is very useful is that these come with code in a GitHub repo, so they should be relatively easy (?) to replicate or apply to other problems.

2 Likes

I have put together a notebook illustrating approaches (iii) and (iv) from my earlier post on the TSC datasets (happily, I have found they are all in the same format, so the same notebook should work by just switching the dataset name!). I am debugging it this afternoon and adding some comments so it makes sense.

What I have found is that the LM/forecaster pretraining approach doesn't really work at all on datasets where the number of time series is very small or the series are very short. At work, I generally apply this technique to problems where I have hundreds or thousands of datasets to train on.

I will work on adjusting the datasets I've defined so they work seamlessly on multivariate time series too. Going to look into some of those CNN approaches next, applied to wavelet transforms of the data. Will keep you updated, cheers!

2 Likes

Hi! I think this thread is great as well!! I am working with time series which are signals coming from devices in a sort of industrial complex, with the idea of finding anomalies in their behavior in order to do preventive maintenance before they cause further problems.

I am training an LSTM-based NN on data from healthy devices, then comparing its predictions against new device data to check whether the signal follows what the NN learned or deviates from it (i.e., there is an anomaly).
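The thresholding step of that setup can be sketched independently of the network: compare the forecaster's predictions to the actual signal and flag samples whose error is far outside the typical error distribution. In this toy sketch a naive "previous value" forecaster stands in for the LSTM, and the fault is injected synthetically:

```python
import numpy as np

def flag_anomalies(signal, predictions, z_thresh=4.0):
    """Indices where the forecast error is an extreme outlier
    relative to the error distribution on this signal."""
    err = np.abs(signal - predictions)
    z = (err - err.mean()) / err.std()
    return np.flatnonzero(z > z_thresh)

t = np.linspace(0, 8 * np.pi, 400)
signal = np.sin(t)                 # "healthy" device signal
signal[250] += 2.0                 # injected fault

preds = np.roll(signal, 1)         # persistence forecast (LSTM stand-in)
idx = flag_anomalies(signal[1:], preds[1:])  # drop the undefined first step
```

Both edges of the spike get flagged, since the persistence forecast is wrong going into and coming out of the fault; with a trained LSTM the same z-score logic applies to its residuals.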

My starting point was this paper and master's thesis from KTH:

I like this approach more than a classifier because a classifier needs labelled data, and it is very difficult to get proper labels for this kind of problem: failures can be very different, and you cannot tell when a device starts to behave anomalously just by looking at the signals. Not to mention that there are usually not many examples of the broken behaviour.

Another approach that looks interesting is this: LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection, but I didn’t have the time to try it.

10 Likes

Hi! I've been lurking here mostly but am very interested in this. I've done a little TS work, a lot without DL and a little with DL.

I’d very importantly add that the repo needs non-DL benchmarks :slight_smile:

  • traditional stats (SARIMAX, …)
  • boosting methods (these were dominating the leaderboards for the Wikipedia Time Series Kaggle comp)
  • other non DL ML methods (FB’s prophet is a good example)
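Agreed; the cheapest of those benchmarks is worth spelling out, since any DL entry in the repo should at least beat it. A seasonal-naive forecast (repeat the last full season) takes a few lines of numpy (the function name and toy data are mine, just for illustration):

```python
import numpy as np

def seasonal_naive(history, season, horizon):
    """Forecast `horizon` steps by repeating the last full season."""
    last = np.asarray(history)[-season:]
    reps = -(-horizon // season)        # ceil(horizon / season)
    return np.tile(last, reps)[:horizon]

# Hourly data with a daily (24-step) cycle: one week of history
hist = np.tile(np.arange(24), 7)
fc = seasonal_naive(hist, season=24, horizon=36)
```

It is trivial, but on strongly seasonal data (like the energy series discussed above) it is surprisingly hard to beat and makes skill scores easy to interpret.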
5 Likes

Thanks for your replies.
I don't have any problem extending the scope of the group to non-DL approaches. That fits the 'Time series/ sequential data study group' name as initially stated. I'll go ahead and change the repo name to reflect this.
The only thing I'd like to highlight is that I'd like to keep this very practical, and I think the best way to achieve that is by making sure entries to the repo have code attached: basically our own notebooks, or third-party notebooks, repos, etc.
I'd prefer to steer away from less practical approaches where you see lists of papers without any code. Those are often interesting, but not as useful as code we can quickly implement.
If we agree on this proposed approach, we could really end up with a pretty good collection of notebooks, benefiting from the knowledge and experience of others.
What do you think?

7 Likes

Hi guys, my notebook is ready but I need push permission to upload it to the repo. Can anyone help with that please? (I don't have much experience with git, lol.)

1 Like

Wow, that was quick!
I don’t have much experience either.
For those with more experience, how should we do this? What's the best way to have others upload their work to a repo? Grant them collaborator status?

Very interesting. I work in a similar problem space. Will go through the papers!

1 Like

@oguiza Check out this link:

My GitHub username is mb4310; you can find me here: https://github.com/mb4310. Thanks!

1 Like

I love the project! I see two different issues that might make this process a little harder than it needs to be. The first is the whole git process, and the other is the notebook strip-out that needs to be done; apparently things can get very messy if you don't strip out notebook output before pushing to GitHub.
Here is an article about using notebooks with git.
fast.ai has a nice tutorial on notebook strip-out describing how to do it for the fastai library; I assume we could adapt that for your repo.
If you are interested in a fairly easy way to get started with git, I made a tutorial for fast.ai (don't use this for notebooks) on how to create a PR (pull request).
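For the strip-out issue specifically, one common setup (assuming the open-source nbstripout package; the repo path below is a placeholder) registers it as a git filter so outputs are removed automatically on commit:

```shell
# Install nbstripout into the active environment
pip install nbstripout

# From inside the cloned repo, register it as a git filter:
# notebook outputs are then stripped automatically on every commit
cd timeseries-repo/        # placeholder path
nbstripout --install

# Verify the filter is active for *.ipynb files
nbstripout --status
```

This keeps diffs readable without requiring each contributor to remember to clear outputs by hand.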

I think it might be easier to have one person update the notebook in the repo, and everyone who wants to make additions could create a gist and share it here; the maintainer of the repo could then update the notebook manually. How to create a gist

It might be even easier to use Google Colab instead of GitHub. Everyone could just share the file. There are limitations, but it is far less complicated for beginners.

7 Likes

Speaking of non-DL methods: can anyone here point me to good resources (preferably runnable code :wink: ) on (Hidden) Markov Models for time series prediction?
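While better resources turn up, the core computation behind HMM-based prediction, the forward algorithm, fits in a few lines of numpy. This is a from-scratch sketch for a discrete-emission HMM (all the toy parameters are made up), not taken from any particular library:

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Log-likelihood of an observation sequence under a discrete HMM,
    computed with the forward algorithm and per-step rescaling.
    pi: (S,) initial state probs; A: (S, S) transition probs;
    B: (S, O) emission probs; obs: list of symbol indices."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()          # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate, then weight by emission
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

# Two hidden states, two observable symbols
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])
ll = forward_loglik(pi, A, B, [0, 0, 1, 1])
```

For one-step prediction you would propagate `alpha @ A` once more and take the expected emission; libraries like hmmlearn wrap all of this, including Baum-Welch training.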

I’ve already added you as a collaborator @mb4310.

I’ve read your replies and here’s my proposal to make it as easy as possible for anyone to contribute.
If you want to collaborate with a notebook I guess the easiest thing to do would be:

  1. Create a gist.
  2. Ask me in this forum to grant you collaborator status to the shared repo (if you don’t have it yet)
  3. Post the link to your gist in the README.md doc in section 1 of the repo.

In case you want to share some third party work:

  1. Ask me in this forum to grant you collaborator status to the shared repo (if you don’t have it yet)
  2. Post the link to your source doc in the README.md doc in section 2 of the repo.

But if anyone has a better idea, please let us know!