Forecasting and anomaly detection from streaming time series (online learning)


#1

Hi,
I’m looking into building a system that receives a continuous data stream representing a 1D time series (say, some performance metric from a machine, once per minute) and outputs two things:

  • A forecast for that metric for the next hour
  • An alert, if the data point we just received is anomalous (e.g., a probability that the last value is close to what we expected)

As this is an online model which gets updated at each new data point, my first thought is to use an LSTM/GRU for sequence prediction, but I’m not sure:

  1. whether that’s still the state of the art? My literature search so far has been inconclusive, and I looked at this thread, where @Rolfe was wondering about using a CNN instead: CNN better than LSTM/GRU for time series
  2. how to predict further than the next item (i.e., 60 minutes/data points rather than 1) with an LSTM/GRU
  3. what’s the best way to bring this to production. I read about the beta of Cloud ML Engine’s online predictions: https://cloud.google.com/ml-engine/docs/how-tos/online-predict

Would anyone like to share ideas or good pointers?
Thanks so much everyone, this forum is great.


#2

An update, as I expect this topic may interest many users.
Regarding item #2 above, I’m trying to do something as follows:

# First, build the model:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, return_sequences=True, stateful=False, batch_input_shape=(1, 5, 1)))
model.add(LSTM(32, return_sequences=True, stateful=False))
model.add(LSTM(16, stateful=False))
model.add(Dense(1, activation='linear'))

# 'accuracy' is a classification metric and is meaningless for regression,
# so just track the MSE loss itself:
model.compile(loss='mean_squared_error',
              optimizer='rmsprop')

Then, every time a new data point 'current' comes in, I do:

import numpy as np

# 'current' is the new data point
# 'sequence' is a list with the previous 5 data points
# Keras expects numpy arrays shaped (batch, timesteps, features):
x = np.array(sequence).reshape(1, 5, 1)
y = np.array([[current]])
model.fit(x, y, epochs=1, batch_size=1, shuffle=False)

# remove the oldest item from 'sequence', append 'current' at the end,
# then predict the next point:
sequence = sequence[1:] + [current]
x = np.array(sequence).reshape(1, 5, 1)
future = model.predict(x, batch_size=1)
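On item #2 (predicting 60 steps ahead rather than 1), one common option with a single-step model like this is recursive forecasting: predict one step, feed the prediction back into the window, and repeat. A minimal sketch (the `forecast` helper and its window handling are my own, not from the post):

```python
import numpy as np

def forecast(model, sequence, horizon=60):
    """Roll a one-step model forward by feeding its predictions back in."""
    window = list(sequence)
    out = []
    for _ in range(horizon):
        x = np.array(window).reshape(1, len(window), 1)
        nxt = float(model.predict(x, batch_size=1)[0, 0])
        out.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the prediction
    return out
```

Note that errors compound over the horizon, so the far end of the 60-step forecast will usually be much less reliable than the near end; the alternative is a model with 60 outputs trained directly on the whole horizon.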

However, that doesn’t seem to converge at all. Leaving aside the structure (depth, number of parameters, dropout, etc.) of the model itself, is this approach fundamentally wrong? :slight_smile:

Regarding #3, I’m not sure how to save a model in Keras. Apparently not every approach saves everything (e.g., the optimizer state and learning rate). Does anyone have experience to share, please?
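For what it’s worth, Keras’ built-in `model.save()` does store the architecture, weights, and optimizer state (including the learning rate) in a single file, so training can resume where it left off after `load_model()`. A minimal sketch with a toy model:

```python
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense

# Build and briefly train a toy model
model = Sequential()
model.add(Dense(1, input_shape=(5,)))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
model.fit(np.zeros((4, 5)), np.zeros((4, 1)), epochs=1, verbose=0)

# save() writes architecture + weights + optimizer state to one HDF5 file
model.save('model.h5')

# load_model() restores all of it, so fit() continues from the saved state
restored = load_model('model.h5')
```

By contrast, `model.save_weights()` stores only the weights, which is likely the "doesn’t save everything" behaviour mentioned above.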


(Greg McKenzie) #3

I’m definitely interested in the same problem!
You’re well ahead of me (I’m just starting part 1), but when I gain more insight I will share it here, and I’ll be checking back as well.

Thanks for posting.


(jerry liu) #4

@mino do you have a sample dataset?


#5

Done. At least, sort of :slight_smile:
I built the online-learning model in Keras as follows:

  1. preprocessing: linear scaling to [0, 1] and differencing each data point against the previous one
  2. LSTM recurrent layer of 128 neurons
  3. dropout with p=0.4 to avoid overfitting
  4. LSTM recurrent layer of 64 neurons
  5. dropout with p=0.2 to avoid overfitting
  6. LSTM recurrent layer of 32 neurons
  7. dropout with p=0.1 to avoid overfitting
  8. Fully connected layer with 1 output

It seems to work great on my datasets, which are decently autocorrelated and show multiple seasonalities (i.e., a daily pattern plus day-of-the-week effects). In fact, the stacked LSTM seems to learn the seasonalities automagically, as long as training runs long enough.
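The recipe above could be sketched in Keras roughly as follows (the series, the scaling bounds, and the window length are my stand-ins; layer sizes and dropout rates are from the list):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

# Step 1: fixed-range scaling to [0, 1], then first-order differencing.
# 'series' and the bounds [0, 100] are stand-ins for the real metric.
series = np.sin(np.linspace(0, 20, 500)) * 40 + 50
scaled = (series - 0.0) / (100.0 - 0.0)
diffed = np.diff(scaled)

# Steps 2-8: stacked LSTMs with dropout (window length is an assumption)
window = 60
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(window, 1)))
model.add(Dropout(0.4))
model.add(LSTM(64, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(32))
model.add(Dropout(0.1))
model.add(Dense(1))  # linear output: the next differenced, scaled value
model.compile(loss='mean_squared_error', optimizer='rmsprop')
```

Because the model predicts differenced, scaled values, each prediction has to be un-differenced (cumulative sum from the last observed point) and un-scaled to get back to the original units.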

Next aims:

  • implement an anomaly measure

  • as the dataset is relatively small and I’m using batch_size=1 (because time series…), GPU training is highly inefficient. In fact, training on CPU is even faster. I think I could dramatically increase efficiency by pre-loading the dataset in the GPU’s RAM (e.g., using tf.constant), right?

  • find out how to effectively “serve” this in production at scale. I’m still confused by Cloud ML Engine’s pricing scheme for online predictions.
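On the GPU-efficiency point: besides `tf.constant`, one option is to build the (window, target) pairs once with `tf.data` and `cache()` them, so later epochs read prepared examples from memory instead of redoing the slicing and host-to-device feeding. A rough sketch of the idea (series, window length, and batch size are my assumptions):

```python
import numpy as np
import tensorflow as tf

series = np.random.rand(10000).astype(np.float32)  # stand-in for the real data
window = 5

# Slide a (window + 1)-long view over the series; the first `window` values
# become the input and the last one the target. cache() keeps the prepared
# examples in memory so subsequent epochs skip the pipeline work.
ds = tf.data.Dataset.from_tensor_slices(series)
ds = ds.window(window + 1, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda w: w.batch(window + 1))
ds = ds.map(lambda w: (tf.reshape(w[:-1], (window, 1)), w[-1:]))
ds = ds.cache().batch(32)
```

Note this also lifts the `batch_size=1` restriction: the windows overlap, so batching them does not break the temporal ordering within each window, which is usually what makes GPU training worthwhile again.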

I should probably do a nice write up once I’m done.


#6

Regarding:

implement an anomaly measure

At first I was trying to do that with autoencoders, taking the reconstruction error as a form of anomaly measure. I’m now experimenting with WaveNet instead: quantizing the time series and taking softmax predictions over the quantization levels, so that I get a probability value for each new point.

I got this tip from a reddit user; it seems like a great idea!
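To make the quantization idea concrete, here is a minimal sketch: map each value to one of a fixed number of bins, and read the anomaly measure off the model’s softmax as the probability mass assigned to the bin that actually occurred. The bin count, bounds, and the uniform `probs` (standing in for a real WaveNet softmax output) are all my assumptions:

```python
import numpy as np

def quantize(x, lo=0.0, hi=1.0, n_bins=256):
    """Map a value in [lo, hi] to one of n_bins integer levels (clipped)."""
    frac = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    return int(min(int(frac * n_bins), n_bins - 1))

# Suppose the network outputs a softmax over the 256 levels for the next
# step; the anomaly measure for the observed value is the probability
# mass assigned to its bin (low probability = surprising point).
probs = np.full(256, 1.0 / 256)   # stand-in for a real softmax output
anomaly_prob = probs[quantize(0.42)]
```

In practice one would probably sum the mass over a few neighbouring bins as well, since a value landing one bin away from the mode is hardly anomalous.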

Suggestions?


#7

When new points come in, do you do the scaling with the current window, or do you use fixed values from your training? (The second option seems safer to me in terms of keeping the input scale consistent; however, you might get values outside the [0, 1] range for unseen data points.)
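For the fixed-values option, clipping is one simple way to keep unseen out-of-range points inside [0, 1]. A sketch (the bounds are placeholders):

```python
import numpy as np

# Bounds fixed once at training time (placeholder values)
train_min, train_max = 0.0, 100.0

def scale(x):
    # A fixed range keeps the input scale consistent across the stream;
    # clipping keeps unseen out-of-range values inside [0, 1].
    return float(np.clip((x - train_min) / (train_max - train_min), 0.0, 1.0))
```

The trade-off is that clipped points all look identical to the model, which may actually be acceptable here, since values outside the training range are exactly the ones an anomaly detector should flag anyway.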


(Mohammad) #8

Hi Mino,

I’m working on a similar project: predicting future values of time series data.
I’m curious about the use of WaveNet for getting the anomaly score. Could you post the link to the reddit comment you mentioned? Or, if you actually managed to apply the idea, I’d appreciate it if you could elaborate on it a bit.

Many thanks


#9

Hi @msp, sorry for the delay. I didn’t get a notification for your post:

When new points come in, do you do the scaling with the current window, or do you use fixed values from your training?

I was using fixed values, because in my application domain the metric I was modelling stayed within a well-bounded range.


Hi @mriazi, I was referring to this comment.

I’m not working on that approach anymore to be honest, but if you manage to build anything please let us know! Thanks.