Time series data

Hi! I’ve been looking for a Fast.ai example with tabular data where each row depends on the previous one, but couldn’t find a suitable one. I’m trying to forecast the weather based on previous hourly temperature data in several cities. The data looks like this:

day        time    Paris  Berlin  London  Amsterdam
7.3.2019    8:00    15.3    13.4    12.9       13.2
7.3.2019    9:00    15.4    13.5    13.1       13.3
7.3.2019   10:00    15.7    13.7    13.4       13.5
7.3.2019   11:00    15.5    13.9    13.6       13.9
7.3.2019   12:00    15.9    13.4    13.2       13.5
7.3.2019   13:00    16.0    14.0    13.9       13.9
7.3.2019   14:00    15.7    14.1    14.0       14.0
7.3.2019   15:00    15.6    13.9    14.1       13.8
7.3.2019   16:00       ?       ?       ?          ?

I’ve searched for time series and LSTM examples, but I’m not sure those are the right keywords, since I only found more advanced pieces of code or NLP-related examples. Which data loader and learner should I use to predict the temperature in each city for the next hour? Thanks a bunch!


It seems there isn’t a ready-made method in Fast.ai for this type of data. Have any of you stumbled upon a relatively beginner-friendly implementation of this with non-standard Fast.ai methods?

You could try something like this. In pandas, put the next row’s values on the same row as the previous one. Then for dep_var pass a list of those columns. I have no idea if it’ll have good results, but that’s how I’d do it.
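A minimal pandas sketch of that idea: shift each city column up by one so the next hour’s reading sits on the current row, then use those new columns as the targets. The column names and the toy data here are made up for illustration.

```python
import pandas as pd

# Toy frame mirroring the thread's data: one temperature column per city.
df = pd.DataFrame({
    "Paris":  [15.3, 15.4, 15.7, 15.5],
    "Berlin": [13.4, 13.5, 13.7, 13.9],
})

# Put the *next* hour's readings on the same row as the current hour
# by shifting each city column up by one.
cities = ["Paris", "Berlin"]
target_cols = [f"{c}_next" for c in cities]
df[target_cols] = df[cities].shift(-1)

# The last row has no "next hour", so drop it before training.
df = df.dropna()

# dep_var would then be the list of target columns:
dep_var = target_cols
```

To look further back than one step, you can repeat the trick with `shift(1)`, `shift(2)`, … to add lagged input columns as well.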


Two options other than the RNN approach:

  1. convert the date and hour to categorical variables and use the tabular learner (same as the Rossmann example, but without the log of the dependent variable)

  2. use a convolutional approach with transformations like the Gramian Angular Field; for details see the Time series/sequential data study group
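For option 1, a small sketch of what "convert the date & hour to categorical variables" might look like in pandas (fastai's `add_datepart` does something similar for the date portion); the column names here are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-03-07 08:00", "2019-03-07 09:00", "2019-03-08 08:00"]),
    "Paris": [15.3, 15.4, 14.9],
})

# Expand the timestamp into parts that the tabular model can treat as
# categorical variables (learned via embedding matrices).
df["hour"]      = df["timestamp"].dt.hour
df["dayofweek"] = df["timestamp"].dt.dayofweek
df["month"]     = df["timestamp"].dt.month
```

These derived columns would then go into `cat_names` for the tabular learner, just like the date features in the Rossmann notebook.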


In theory flattening the rows should work, since the learner then gets all the information from the previous time steps. I’ll very quickly hit a performance wall, though, since I’m planning to run it with hundreds of cities and to include other features too, such as wind speed and humidity. One time step already has nearly a thousand features, which leads to very slow training when stacked up 5–10 times. I’ll try this today anyway, thanks!

Doesn’t the tabular learner only read rows as individual observations and not as a sequence? I could concatenate the sequences into single rows as Mueller suggested above, but that would lead to a massive number of features. I could be missing something here, though. I’ll have a look into the Gramian Angular Field.
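For reference, the Gramian Angular Field mentioned above turns a 1-D series into a 2-D image a CNN can consume. A minimal numpy sketch of the summation variant (GASF), with the function name my own:

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian Angular Summation Field of a 1-D series."""
    x = np.asarray(x, dtype=float)
    # Rescale to [-1, 1] so arccos is defined.
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))   # polar-angle encoding
    # GASF: cos(phi_i + phi_j) for every pair of time steps.
    return np.cos(phi[:, None] + phi[None, :])

# One day of Paris temperatures becomes a (T, T) "image".
g = gramian_angular_field([15.3, 15.4, 15.7, 15.5])
```

The resulting symmetric matrix encodes the pairwise temporal correlations, which is what the convolutional approach then learns from.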

Yes, it reads one row at a time, but the network (linear layers and embedding matrices) learns the relationships between the inputs over time.

Give it a try: in the Rossmann paper they address the same problem, but instead of forecasting weather they predict sales, feeding it a lot of rows one at a time.


There’s nothing wrong with a massive number of features. As Jeremy says in the intro to ML, the curse of dimensionality isn’t really a thing. I’ve done research where I had 100+ features, and I engineered 80% of those.
