Going mad with RNNs outside NLP

(Daniel Ayala) #1

Hello everyone. I have recently started to use fastai, and as a personal exercise I wanted to create a model for regression on tabular data, but with a recurrent model. For example, we can have a CSV file with auction bids, where rows are sorted by auction, with a column that identifies it. Auctions may have a variable number of bids. Of course I could double or triple the size of the input vector and just feed a fixed number of samples, but I want a proper RNN.

The creation of the model itself should be quite easy: add some memory cells here and there, and assume the input is not a batch of individual elements but a batch of sequences of elements, as happens with the language model. The difference is that while the language model has a fixed sequence length for the training data, given by the bptt parameter, the sequences in the tabular context usually have a variable, potentially small length.

I started studying the data input classes. The data items from which to make predictions are the same: rows from the table, each with a label, so the ItemList class should ideally be the tabular one without changes. I started digging into the classes that take care of creating a batch, and I got to the sampler classes from PyTorch. However, these classes use, as samples, indices into the original data source, while we need groups of the original data (rows from the dataframe) that correspond to sequences. I also feel I am getting too low-level for what should be pretty straightforward processing, since these classes start to involve some multi-threading complexity.
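For illustration, the grouping I have in mind could look something like this minimal sketch: collect row indices per sequence id, then batch whole sequences together. Everything here (the function name, the batching policy) is a hypothetical sketch, not fastai or PyTorch API:

```python
from collections import defaultdict

def sequence_batches(seq_ids, seqs_per_batch):
    """Group row indices by their sequence id, then yield batches that
    each contain up to `seqs_per_batch` whole sequences."""
    groups = defaultdict(list)
    for idx, sid in enumerate(seq_ids):
        groups[sid].append(idx)
    sequences = list(groups.values())  # insertion order, i.e. file order
    for i in range(0, len(sequences), seqs_per_batch):
        yield sequences[i:i + seqs_per_batch]
```

A real sampler would subclass PyTorch's `Sampler` and yield these index groups, but the grouping logic itself is this simple.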

So my question is: is there any support for recurrent neural networks beyond language models/text? I haven’t seen anything so far: not a single recurrent network for tabular data (feature-rich time series), images (such as video), or anything beyond NLP, which uses the class LanguageModelPreLoader, which doesn’t seem to be applicable to anything but text. I feel like there is no actual support for RNNs, but I may be missing something. Has anyone implemented a non-NLP RNN with the corresponding data-loading classes?

I think the most likely explanation is that the data shouldn’t be fed in the way I imagined. Maybe I should consider each sequence an entire data point retrieved by the dataset, with an additional dimension in both the features and the label corresponding to the position in the sequence. Then the network and the loss/metric functions would just deal with the dimensions of the data in any convenient way.
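Treating each sequence as one data point usually means padding to a common length and keeping a mask so the loss can ignore the padded positions. A minimal sketch in plain Python (the function name and the 0/1 mask convention are my own assumptions):

```python
def pad_batch(sequences, pad_value=0.0):
    """Pad variable-length sequences (lists of values) to the longest
    one, returning the padded batch plus a 0/1 mask marking which
    positions hold real data (1) versus padding (0)."""
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_value] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask
```

A masked loss would then multiply the per-position errors by the mask before averaging, so the padding never contributes to the gradient.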

0 Likes

(Kaspar Lund) #2

It would be easier to help if you could share plots of your data series :)
And yes, you are right: the way you serve the data is important, and probably the first thing to clarify.

0 Likes

(Daniel Ayala) #3

Hi again! I didn’t have specific data in mind, but this would be a good example of what the net would be applied to (I also have some real data about bids somewhere, I think):

[image: example table of bid rows grouped by an item ID column]

As you can see, the sequences are defined by a column (item ID in this case) that groups the rows. The prediction for each row would be the bid of the next row, which could be added as an additional column, removing the row with the last bid. Each row has potentially many features, as is the case with the Rossmann sales dataset, which would be another good example of a case where a memory layer could be added alongside the other layers.

The most intuitive way to represent the data would be with a sample per row, and an individual prediction per sample. Each sequence would be a group of samples (of variable size), and each batch would be a group of sequences. But this doesn’t seem to be possible, at least in an elegant way, unless the DataBunch class is altered to return batches in a customised way (I haven’t given much thought to this possibility, since I am not too familiar with the class). There is also the idea I mentioned: making each sample a sequence, and each ‘y’ a sequence of predictions. This is not intuitive and could cause problems when it comes to metrics and loss, I think.

0 Likes

(antoine mercier) #4

I am facing the same problem, but in a different context, namely supply chain / inventory optimization. I work in the manufacturing industry, and I would like to create an RNN model to represent the sequence of manufacturing steps from raw material to finished part.

At each step, the RNN would predict the advance/delay of the execution of the manufacturing operation compared to the planned execution date. The goal is to reduce the buffers between two manufacturing steps (thus reducing overall stock and save on working capital) while still delivering the part on time at the last step of the manufacturing process.

Right now I am using a plain tabular structure similar to the Rossmann example. But the RNN would be useful because then I would be able to simulate a decrease in time between two successive manufacturing steps (in other words: decreasing the buffers) and see whether the part would still be delivered on time at the last step.

I have access to a large amount of historical data coming from my company’s ERP system. Each line in the dataset represents the validation of one specific step of the manufacturing process (execution date, planned date, part id, manufacturing step id, plus lots of additional metadata).

One key aspect is that the number of steps in the manufacturing process is not fixed: it can go roughly from 5 to 15 depending on the type and complexity of the part. This is why I am relating to your post. I haven’t started to look into the details of the RNN but I understand from your post that this seems difficult to implement right now.

Therefore I am very interested in a solution to both of these RNN applications.

0 Likes

(Daniel Ayala) #5

Another idea I had was that the network would just receive individual samples (rows), and it would take care of resetting the memory layers when a new sequence is detected. To do this, rows would have to be sorted first by sequence and secondarily by whatever orders them within the sequence. Then, the batches should feed the data in perfect order, which may be possible with PyTorch’s SequentialSampler, used by the BatchSampler, used by the DataLoader, used by the DataBunch.

In the forward function, the dimension representing the element of the batch would become the dimension representing the position in the sequence. One of the problems is that the same batch could contain several sequences, so this would have to be dealt with. A sequence could also span several batches.
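The reset-on-boundary idea can be sketched independently of any framework: walk the rows in order and reinitialise the recurrent state whenever the sequence id changes. All names here are hypothetical, and `step` stands in for one application of a recurrent cell:

```python
def stateful_forward(rows, seq_ids, step, init_state):
    """Process rows in order, resetting the recurrent state at every
    sequence boundary; `step(x, state)` returns `(output, new_state)`."""
    state, prev_id, outputs = init_state, object(), []
    for x, sid in zip(rows, seq_ids):
        if sid != prev_id:          # new sequence detected: reset memory
            state, prev_id = init_state, sid
        out, state = step(x, state)
        outputs.append(out)
    return outputs
```

Keeping `state` and `prev_id` alive between calls would also cover the case of a sequence spanning two batches.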

Overall, all these solutions I come up with seem really forced, and I doubt a non-NLP RNN can be trained and applied as long as there is no support for it in the form of some class that feeds batches of sequences in a generic way.

0 Likes

(Kaspar Lund) #6

Interesting applications, but not the easiest ones to start with if you are new to AI :)

For the auction I think you would need a custom model where you mix an RNN (fastai’s LSTM or attention model) with input from:

  • embeddings of 1) day of week, 2) category, and possibly time of day (hours),
  • a scalar for the difference between bids,
  • are auctions limited to a certain number of days/hours?

Anyway, I would start simpler just to get going. Could it make sense to reduce the time aspect to three variables: 1) first bid, 2) middle bid? The variable to predict would then be the last bid.
If there is a defined interval in which buyers have to bid, then that info might also be useful.
This way you would avoid treating it as a time series, and you could start testing and getting a feeling for what is important.

0 Likes

(Daniel Ayala) #7

The creation of the model would be very easy, but training it with a DataBunch that feeds it batches of sequences, not so much!

Cramming information about a couple of former bids into each row is, of course, a possibility. But my focus is not on solving this specific problem, but on covering what seems to me like a very common use case: RNNs for tabular data, which should ideally also generalise to images and pretty much any input from which we can create a sequence.

1 Like

(Daniel Ayala) #8

Can’t a dev answer, so that we can at least confirm that fastai does not support RNNs beyond the included NLP case?

0 Likes

#9

I don’t know what you are talking about; regression is fully supported, as explained in various topics on the forums and in the Rossmann lesson.
As for your problem, you want a custom DataLoader that feeds batches in a certain order, so just write it! You can then pass it directly to the DataBunch init method.
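One possible shape for such a loader, sketched in plain PyTorch (the dataset class, its contents, and the hand-written index groups are all invented for illustration): `DataLoader` accepts a `batch_sampler` that yields lists of indices, so whole sequences can be kept together per batch.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class BidRows(Dataset):
    """Toy dataset: one row of features and one label per bid (hypothetical)."""
    def __init__(self, features, labels):
        self.features, self.labels = features, labels
    def __len__(self):
        return len(self.features)
    def __getitem__(self, i):
        return torch.tensor(self.features[i]), torch.tensor(self.labels[i])

def collate_rows(batch):
    # stack the rows of one batch; here each batch is one whole auction
    xs = torch.stack([x for x, _ in batch])
    ys = torch.stack([y for _, y in batch])
    return xs, ys

ds = BidRows([[1.0], [2.0], [3.0], [4.0], [5.0]], [1.0, 2.0, 3.0, 4.0, 5.0])
# each inner list holds the full set of row indices for one auction
dl = DataLoader(ds, batch_sampler=[[0, 1], [2, 3, 4]], collate_fn=collate_rows)
```

Iterating `dl` then yields one variable-length auction per batch; the index groups would in practice come from grouping the dataframe by its id column rather than being written by hand.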

0 Likes