Forecasting and anomaly detection from streaming time series (online learning)


#1

Hi,
I’m looking into building a system that receives a continuous data stream representing a 1D time series (say, some performance metric from a machine, once per minute) and outputs two things:

  • A forecast for that metric for the next hour
  • An alert, if the data point we just received is anomalous (e.g., a probability that the last value is close to what we expected)

As this is an online model which gets updated at each new data point, my first thought is to use an LSTM/GRU for sequence prediction, but I’m not sure:

  1. whether that’s still the state of the art? My literature search so far has been inconclusive, and I looked at this thread, where @Rolfe was wondering about using a CNN instead: CNN better than LSTM/GRU for time series
  2. how to predict further than the next item (i.e., 60 minutes/data points rather than 1) with an LSTM/GRU
  3. what’s the best way to bring this to production. I read about the beta of Cloud ML Engine’s online predictions: https://cloud.google.com/ml-engine/docs/how-tos/online-predict

Would anyone like to share ideas or good pointers?
Thanks so much everyone, this forum is great.


#2

An update, as I expect this topic may interest many users.
Regarding item #2 above, I’m trying to do something as follows:

# First, build the model:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, return_sequences=True, stateful=False, batch_input_shape=(1, 5, 1)))
model.add(LSTM(32, return_sequences=True, stateful=False))
model.add(LSTM(16, stateful=False))
model.add(Dense(1, activation='linear'))

# 'accuracy' is a classification metric and is meaningless for regression,
# so just track the MSE loss itself:
model.compile(loss='mean_squared_error',
              optimizer='rmsprop')

Then, every time a new data point 'current' comes in, I do:

import numpy as np

# 'current' is the new data point
# 'sequence' is a list with the previous 5 data points
# Keras expects numpy arrays shaped (batch, timesteps, features):
x = np.array(sequence).reshape(1, 5, 1)
y = np.array([[current]])
model.fit(x, y, epochs=1, batch_size=1, shuffle=False)

# remove the oldest item from 'sequence', append 'current' at the end,
# then predict the next point:
sequence = sequence[1:] + [current]
x = np.array(sequence).reshape(1, 5, 1)
future = model.predict(x, batch_size=1)
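On item #2 (predicting 60 steps ahead rather than 1), one common option with a single-step model like this is recursive forecasting: predict one step, feed the prediction back into the window, and repeat. A minimal sketch (the `forecast` helper and its window handling are my own, not from the post):

```python
import numpy as np

def forecast(model, sequence, horizon=60):
    """Roll a one-step model forward by feeding its predictions back in."""
    window = list(sequence)
    out = []
    for _ in range(horizon):
        x = np.array(window).reshape(1, len(window), 1)
        nxt = float(model.predict(x, batch_size=1)[0, 0])
        out.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the prediction
    return out
```

Note that errors compound over the horizon, so the far end of the 60-step forecast will usually be much less reliable than the near end; the alternative is a model with 60 outputs trained directly on the whole horizon.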

However, that doesn’t seem to converge at all. Leaving aside the structure (depth, number of parameters, dropout, etc.) of the model itself, is this approach fundamentally wrong? :slight_smile:

Regarding #3, I’m not sure how to save a model in Keras. Apparently not every approach saves everything (e.g., the optimizer state and learning rate). Does anyone have experience to share, please?
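For what it’s worth, Keras’ built-in `model.save()` does store the architecture, weights, and optimizer state (including the learning rate) in a single file, so training can resume where it left off after `load_model()`. A minimal sketch with a toy model:

```python
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense

# Build and briefly train a toy model
model = Sequential()
model.add(Dense(1, input_shape=(5,)))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
model.fit(np.zeros((4, 5)), np.zeros((4, 1)), epochs=1, verbose=0)

# save() writes architecture + weights + optimizer state to one HDF5 file
model.save('model.h5')

# load_model() restores all of it, so fit() continues from the saved state
restored = load_model('model.h5')
```

By contrast, `model.save_weights()` stores only the weights, which is likely the "doesn’t save everything" behaviour mentioned above.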


(Greg McKenzie) #3

I’m definitely interested in the same problem!
You’re well ahead of me (I’m just starting part 1), but when I gain more insight I will share it here, and I’ll be checking back as well.

Thanks for posting.


(jerry liu) #4

@mino do you have a sample dataset?


#5

Done. At least, sort of :slight_smile:
I built the online-learning model in Keras as follows:

  1. preprocessing: linear scaling to [0, 1] and differencing each data point against the previous one
  2. LSTM recurrent layer of 128 neurons
  3. dropout with p=0.4 to avoid overfitting
  4. LSTM recurrent layer of 64 neurons
  5. dropout with p=0.2 to avoid overfitting
  6. LSTM recurrent layer of 32 neurons
  7. dropout with p=0.1 to avoid overfitting
  8. Fully connected layer with 1 output

It seems to work great on my datasets, which are decently autocorrelated and show multiple seasonalities (i.e., a daily pattern plus day-of-the-week effects). In fact, the stacked LSTM seems to learn the seasonalities automagically, as long as training runs long enough.
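The recipe above could be sketched in Keras roughly as follows (the series, the scaling bounds, and the window length are my stand-ins; layer sizes and dropout rates are from the list):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

# Step 1: fixed-range scaling to [0, 1], then first-order differencing.
# 'series' and the bounds [0, 100] are stand-ins for the real metric.
series = np.sin(np.linspace(0, 20, 500)) * 40 + 50
scaled = (series - 0.0) / (100.0 - 0.0)
diffed = np.diff(scaled)

# Steps 2-8: stacked LSTMs with dropout (window length is an assumption)
window = 60
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(window, 1)))
model.add(Dropout(0.4))
model.add(LSTM(64, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(32))
model.add(Dropout(0.1))
model.add(Dense(1))  # linear output: the next differenced, scaled value
model.compile(loss='mean_squared_error', optimizer='rmsprop')
```

Because the model predicts differenced, scaled values, each prediction has to be un-differenced (cumulative sum from the last observed point) and un-scaled to get back to the original units.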

Next aims:

  • implement an anomaly measure

  • as the dataset is relatively small and I’m using batch_size=1 (because time series…), GPU training is highly inefficient. In fact, training on CPU is even faster. I think I could dramatically increase efficiency by pre-loading the dataset in the GPU’s RAM (e.g., using tf.constant), right?

  • find out how to effectively “serve” this in production at scale. I’m still confused by Cloud ML Engine’s pricing scheme for online predictions.
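On the GPU-efficiency point: besides `tf.constant`, one option is to build the (window, target) pairs once with `tf.data` and `cache()` them, so later epochs read prepared examples from memory instead of redoing the slicing and host-to-device feeding. A rough sketch of the idea (series, window length, and batch size are my assumptions):

```python
import numpy as np
import tensorflow as tf

series = np.random.rand(10000).astype(np.float32)  # stand-in for the real data
window = 5

# Slide a (window + 1)-long view over the series; the first `window` values
# become the input and the last one the target. cache() keeps the prepared
# examples in memory so subsequent epochs skip the pipeline work.
ds = tf.data.Dataset.from_tensor_slices(series)
ds = ds.window(window + 1, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda w: w.batch(window + 1))
ds = ds.map(lambda w: (tf.reshape(w[:-1], (window, 1)), w[-1:]))
ds = ds.cache().batch(32)
```

Note this also lifts the `batch_size=1` restriction: the windows overlap, so batching them does not break the temporal ordering within each window, which is usually what makes GPU training worthwhile again.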

I should probably do a nice write up once I’m done.


#6

Regarding:

implement an anomaly measure

At first I was trying to do that with autoencoders, taking the reconstruction error as a form of anomaly measure. I’m now experimenting with WaveNet instead: quantizing the time series and taking softmax predictions over the quantization levels, so that I get a probability value for each new point.

I got this tip from a reddit user; it seems like a great idea!
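To make the quantization idea concrete, here is a minimal sketch: map each value to one of a fixed number of bins, and read the anomaly measure off the model’s softmax as the probability mass assigned to the bin that actually occurred. The bin count, bounds, and the uniform `probs` (standing in for a real WaveNet softmax output) are all my assumptions:

```python
import numpy as np

def quantize(x, lo=0.0, hi=1.0, n_bins=256):
    """Map a value in [lo, hi] to one of n_bins integer levels (clipped)."""
    frac = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    return int(min(int(frac * n_bins), n_bins - 1))

# Suppose the network outputs a softmax over the 256 levels for the next
# step; the anomaly measure for the observed value is the probability
# mass assigned to its bin (low probability = surprising point).
probs = np.full(256, 1.0 / 256)   # stand-in for a real softmax output
anomaly_prob = probs[quantize(0.42)]
```

In practice one would probably sum the mass over a few neighbouring bins as well, since a value landing one bin away from the mode is hardly anomalous.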

Suggestions?


#7

When new points come in, do you do the scaling with the current window, or do you use fixed values from your training? (The second option seems safer to me in terms of keeping the input scale consistent; however, you might get values outside the [0, 1] range for unseen data points.)
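For the fixed-values option, clipping is one simple way to keep unseen out-of-range points inside [0, 1]. A sketch (the bounds are placeholders):

```python
import numpy as np

# Bounds fixed once at training time (placeholder values)
train_min, train_max = 0.0, 100.0

def scale(x):
    # A fixed range keeps the input scale consistent across the stream;
    # clipping keeps unseen out-of-range values inside [0, 1].
    return float(np.clip((x - train_min) / (train_max - train_min), 0.0, 1.0))
```

The trade-off is that clipped points all look identical to the model, which may actually be acceptable here, since values outside the training range are exactly the ones an anomaly detector should flag anyway.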


(Mohammad) #8

Hi Mino,

I’m working on a similar project: predicting future values of time series data.
I’m curious about the use of WaveNet for getting the anomaly score. Could you post the link to the reddit comment you mentioned? Or, if you actually managed to apply the idea, I’d appreciate it if you could elaborate on it a bit.

Many thanks


#9

Hi @msp, sorry for the delay. I didn’t get a notification for your post:

When new points come in, do you do the scaling with the current window, or do you use fixed values from your training?

I was using fixed values, because in my application domain the metric I was modelling stayed within a well-bounded range.


Hi @mriazi, I was referring to this comment.

I’m not working on that approach anymore to be honest, but if you manage to build anything please let us know! Thanks.