Time series/ sequential data study group

oguiza · November 14, 2018, 1:15pm

Wow, I think those are excellent ideas!
All perfectly aligned with my goals for this course:

Learn new DL approaches, techniques, resources, tricks, etc helpful to achieve state of the art results, in particular as they apply to time series (TS).
Practice/ code as much as possible.
Gain confidence in the use of DL.
Get some concrete examples of how these techniques can be applied to TS (notebooks). Develop some proof of concept of how these techniques could be applied.
Create a TS study group where we can all share our experience and benefit from each each other. Hopefully we could maintain this collaboration beyond this course.

Based on feedback so far, I’ve created a repo proposal where we can collaborate in an organized way. I’ve create a mock-up for you to review and comment.
I’ve structured it in 2 sections:

1. Practical DL applied to TS notebook collection: here we could share our own notebooks.
2. DL applied to TS resources: third party resources: this would be used to upload useful third party resources like repos, webs, papers, etc.

The idea is that all those interested could collaborate in the creation and maintenance
of the repo. I think this could provide a good learning opportunity for many of us.

Please, let me know what you think.

oguiza · November 14, 2018, 1:24pm

By the way, I’ve included a few references to papers as an example of how we could build a library. I have a list with other papers also interesting. Something I think is very useful is the fact that these ones comes with the code in a github repo, so they should be relatively easy (?) to replicate/ apply to other problems.

mb4310 · November 14, 2018, 1:47pm

I have put together a notebook illustrating approaches (iii) and (iv) from my earlier post on TSC datasets (happily I have found they are all the same format, so the same notebook should work by just switching the name of the dataset around!) I am debugging it this afternoon and adding some comments so it makes sense.

What I have found is the LM/forecaster pretrain approach doesn’t really work at all on datasets where the number of time-series is very small or the length of the time-series is very small. Generally at work, I apply this technique on problems where i have hundreds or thousands of datasets to train off.

I will work on adjusting the datasets I’ve defined so they work seamlessly on multivariate time-series too. Going to be looking into some of those CNN approaches next applied to wavelet transforms of the data. Will keep you updated, cheers!

rpicatoste · November 14, 2018, 1:56pm

Hi! I think this thread is great as well!! I am working with time series which are signals coming from devices in a sort-of industrial complex, with the idea of finding anomalies in their behavior in order to do preventive maintenance before those can provoke further problems.

I am training NN based on LSTM with devices that are healthy and using predictions from this network with data of the devices to check if the predictions are ok or if they deviate from what the NN learned (there is an anomaly).

My starting point was this paper and master thesis from KTH:

I like this approach more than a classifier because for a classifier we need labelled data. And it is very difficult to get proper labels in this kind of problem: failures can be very different, and you cannot know when the device starts to behave anomalously before you can appreciate it just by looking at the signals. Not to mention that normally there are not that many examples of the broken behaviour.

Another approach that looks interesting is this: LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection, but I didn’t have the time to try it.

henripal · November 14, 2018, 2:00pm

Hi! Have been lurking here mostly but v. interested in this. Have done a little TS work, a lot without DL and a little with DL.

I’d very importantly add that the repo needs non-DL benchmarks

traditional stats (SARIMAX, …)
boosting methods (these were dominating the leaderboards for the Wikipedia Time Series Kaggle comp)
other non DL ML methods (FB’s prophet is a good example)

oguiza · November 14, 2018, 2:12pm

Thanks for your replies.
I don’t have any problem extended the scope of the group to non-DL approaches. That’s fit the ‘Time series/ sequential data study group’ as initially stated. I’ll go ahead and change the repo name to reflect this change.
The only thing I’d like to highlight though is that I’d like to keep this very practical. And I think the best way to achieve this is by making sure that entries to the repo have code attached. So basically are our own notebooks that we have created, or third party notebooks, repos, etc.
I’d prefer to steer away from other approaches that are not so practical where you see lists of papers without any code. Those are many times interesting, but not as useful as having code we can quickly implement.
If we agree that this proposed approach, we could really end up with a pretty good collection of notebooks benefiting from the knowledge and experience of others.
What do you think?

mb4310 · November 14, 2018, 3:06pm

Hi guys, my notebook is ready but I need push permission to upload it to the repo. Anyone can help with that please (I don’t have much experience with git, lol).

oguiza · November 14, 2018, 4:04pm

Wow, that was quick!
I don’t have much experience either.
For those with more experience, ,how should we do this? What’s the best way to have other upload their work to a repo? Grant them collaborator status?

tukun · November 14, 2018, 4:06pm

Very interesting. I work in similar problem space. Will go through the papers!

mb4310 · November 14, 2018, 5:23pm

@oguiza Check out this link :

My github username is mb4310 can find it here https://github.com/mb4310. Thanks!

Daniel.R.Armstrong · November 14, 2018, 5:47pm

I love the project! I see two different issues that might make this process a little harder than it needs to be. The first one is the whole git process, and the other is the notebook stripout that needs to be done, apparently it can get very messy if you don’t strip-out the notebook when you push them to github.
here is an article about using using notebooks and git.
fast.ai has a nice tutorial on notebook stripout on how to do it for the fast ai library, I am assuming that we could modify that for your repo.
If you are interested in a fairly easy process to get started with git, I made a tutorial for fast.ai(don’t use this for notebooks) how to create a PR(pull request).

I think it might be easier to have one person update the notebook in the repo,and everyone that wants to make additions, could create a gist and share it here, the maintainer of the repo could update the notebook manually. How to create a gist

It might be even easier to use use google colab instead of github. Everyone could just share the file. There are limitation but it is far less complicated for beginners.

marcmuc · November 14, 2018, 6:41pm

Speaking of non-DL methods: Can anyone here point me to good ressources (preferably runnable code ) wrt (Hidden) Markov Models for time series prediction?

oguiza · November 15, 2018, 5:47am

I’ve already added you as a collaborator @mb4310.

I’ve read your replies and here’s my proposal to make it as easy as possible for anyone to contribute.
If you want to collaborate with a notebook I guess the easiest thing to do would be:

Create a gist.
Ask me in this forum to grant you collaborator status to the shared repo (if you don’t have it yet)
Post the link to your gist in the README.md doc in section 1 of the repo.

In case you want to share some third party work:

Ask me in this forum to grant you collaborator status to the shared repo (if you don’t have it yet)
Post the link to your source docin the README.md doc in section 2 of the repo.
But if anyone has a better idea, please let us know!

oguiza · November 15, 2018, 6:18am

I certainly can’t Mark. I’m not familiar with this technique.

you got the message!

mb4310 · November 15, 2018, 2:41pm

Hi guys, I’ve created two gists but for me they are not rendering when I view the page; either way you should be able to download the notebook and view it locally.

update: find both files here https://github.com/mb4310/ts ; the gist was not working

The first demonstrates how ULMFiT-type approach works in the context of time-series classification. A couple of comments: I have not experimented extensively but have found the results are pretty much consistently worse across the board than the convolutional approach, perhaps because there is not enough time-series to train a strong forecaster on (my experience is approach works well on domains with 1000s of training time-series that are quite long). Anyways hope you find it interesting!

The second repackages the work done by @oguiza using a CNN and transformation for classification. I have included it because it automates the transform process and works seamlessly on any of the UCF univariate time-series ‘out-of-the-box’ (you basically just give the name of the dataset and the transformation you want and run three cells) in case anyone would like to experiment further and compare results on different datasets.

I am working on experimenting with an approach similar to that taken in the papers shared by @rpicatoste above; basically train a separate forecaster for each separate class, and given a new time-series have all the forecasters try to predict, measure the errors and either train a classifier on the different error timeseries or else just pick the class with the lowest MSE. Should have that up by later tonight/tomorrow.

Next I want to add continuous wavelet transform to the list of transforms available in the CNN notebook just to see how it compares. The problem is there are some parameters you generally have to tune which will change from problem to problem and I’m trying to find a good “rule of thumb” so that the notebook is still easy to run without having a background in sig proc.

Finally I’d like to add the following functionality to the transform notebook which I think will ultimately yield the best results: multi-image classification. So take exactly the lesson-1 approach but instantiate several resnets (well, minus the last few layers), feed each one a separate image, concatenate the output of the cores and train a classifier on top of the concatenation. This will require tinkering a bit with the fastai library or else just some ugly hacking and it could be a couple days.

Anyways, welcome any and all feedback— hope someone finds some of this helpful, cheers!

EDIT: The gist was not loading properly even when I downloaded the data and try to run locally… It worked it seems when I uploaded it directly to a git repo so I have uploaded it to mine, if you guys would like I can upload it to the repo created by @oguiza as well.

marcmuc · November 15, 2018, 5:02pm

Thanks for sharing that! One question: I think I don’t quite understand where you are going with the Multiimage Multinet Classifier?! What would be the seperate images? Could you explain a little more?!

Daniel.R.Armstrong · November 15, 2018, 5:02pm

Thank you for sharing when you did the ULMFiT did you see the error rate change as you trained the model, when I look at the notebook is shows the same error rate for each epoch.

In your plan to add have it be multi images, is that so it will work for multivariate analysis? Would that be like covering each categorical variables to a seperate image? Would that make is a higher dementional tensor, or would it be wider/longer?

Very interesting! Thank you for sharing?

mb4310 · November 15, 2018, 5:14pm

@marcmuc @Daniel.R.Armstrong
Multi-image would let us take a univariate time-series, do multiple transformations (a wavelet transform into time-frequency domain, a gramian angular field to capture temporal interaction, a recurrence plot, etc.) and feed all of this information into a connected sequence of resnets. The input would be have more channels (I would just stack them so instead of 3 channels, 2 pictures would be 6 channels, and separate them out to feed each one into a different resnet, then concat their outputs upstream). Also yes as Daniel pointed out this would enable us to extend the technique seamlessly to multivariate time-series as well.

As for the ULMFiT results, I just ran the notebook with some random hyperparams to make sure it worked on a different dataset (the point is you can jump from task to task in the SCP datasets). You can see the model is completely ineffective (it predicts everything is 1s) in that example, the point of the notebook is to share the code so others can play around with it and not demonstrate a particular result.

edit: as an aside, the ULMFiT part doesn’t seem to help very much with most of these datasets (it typically achieves the same accuracy as the un-aided classifier, but sometimes more quickly), I am exploring the idea of passing on not the output of the cores but their last hidden state or some pooling of their hidden states which might help capture “context” maybe? I have a theory about why it’s not helping but I have to do a lot more experimentation to hash it out. In the meanwhile I am just trying to set up the pavement to get all these different kinds of methods going easily.

I will keep you posted on specific results once they start coming in!

Daniel.R.Armstrong · November 15, 2018, 5:55pm

Please excuse my naeivete, a granian angular field capture the effect of time, on the output, right? Can this be used to capture the interaction with other features besides time? And a recurrence plot measures nonstationarity, so that would show how unstable the time series is?

I am trying to understand how I could use the CNN approach to non continuous time series like sales data, like the rossmann kaggle data set.

mb4310 · November 15, 2018, 7:40pm

Recurrence plots don’t just describe nonstationarity, it can detect periodicity (and lack thereof) of a dynamical system. It takes a point in time t1 and asks what are the other points in time (t_i) when the object has gotten very close to where it was at time t1, so in some sense this captures -strict- patterns in time (when the value in the future is VERY close value now, the corresponding pixel is black, else it is white).

Gramian angular field does a similar thing but instead of being black-and-white the value is continuous with the strength of the interaction at two points in time being measured by an inner product (which loosely speaking an inner product measures how much two vectors overlap), so it can encode more subtle information (examples of gramian angular field tends to be much more ‘colorful’ than recurrence plot) but is also vulnerable to noisy data.

I am not sure how to answer your question about whether you can use these techniques for analyzing relationship between features other than time and some output. I have thought about it but I am having difficulty articulating the conclusions in a coherent way haha but am happy to discuss on skype or something. Anyways hope that helped somewhat.