Going mad with RNNs outside NLP

(Daniel Ayala) #1

Hello everyone. I have recently started to use fastai, and as a personal exercise I wanted to create a model for regression on tabular data, but with a recurrent model. For example, we can have a CSV file with auction bids, where rows are sorted by auction, with a column that identifies it. Auctions may have a variable number of bids. Of course I could double or triple the size of the input vector and just feed a fixed number of samples, but I want a proper RNN.

The creation of the model itself should be quite easy: add some memory cells here and there, and assume the input is not a batch of individual elements but a batch of sequences of elements, as happens with the language model. The difference is that while the language model has a fixed sequence length for the training data, given by the bptt parameter, the sequences in the tabular context usually have a variable, potentially small length.

I started studying the data input classes. The data items from which to make predictions are the same: rows from the table, each with a label, so the ItemList class should ideally be the tabular one without changes. I started digging into the classes that take care of creating a batch, and I got to the sampler classes from PyTorch. However, these classes use, as samples, indices into the original data source, while we need groups of the original data (rows from the dataframe) that correspond to sequences. I also feel I am getting too low-level for what should be pretty straightforward processing, since these classes start to involve some multi-threading complexity.
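For illustration, the grouping I have in mind could look something like this minimal sketch: collect row indices per sequence id, then batch whole sequences together. Everything here (the function name, the batching policy) is a hypothetical sketch, not fastai or PyTorch API:

```python
from collections import defaultdict

def sequence_batches(seq_ids, seqs_per_batch):
    """Group row indices by their sequence id, then yield batches that
    each contain up to `seqs_per_batch` whole sequences."""
    groups = defaultdict(list)
    for idx, sid in enumerate(seq_ids):
        groups[sid].append(idx)
    sequences = list(groups.values())  # insertion order, i.e. file order
    for i in range(0, len(sequences), seqs_per_batch):
        yield sequences[i:i + seqs_per_batch]
```

A real sampler would subclass PyTorch's `Sampler` and yield these index groups, but the grouping logic itself is this simple.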

So my question is: is there any support for recurrent neural networks beyond language models/text? I haven’t seen anything so far: not a single recurrent network for tabular data (feature-rich time series), images (such as video), or anything beyond NLP, which uses the class LanguageModelPreLoader, which doesn’t seem to be applicable to anything but text. I feel like there is no actual support for RNNs, but I may be missing something. Has anyone implemented a non-NLP RNN with the corresponding data-loading classes?

I think the most likely explanation is that the data shouldn’t be fed in the way I imagined. Maybe I should consider each sequence an entire data point retrieved by the dataset, with an additional dimension in both the features and the label corresponding to the position in the sequence. Then the network and the loss/metric functions would just deal with the dimensions of the data in any convenient way.
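Treating each sequence as one data point usually means padding to a common length and keeping a mask so the loss can ignore the padded positions. A minimal sketch in plain Python (the function name and the 0/1 mask convention are my own assumptions):

```python
def pad_batch(sequences, pad_value=0.0):
    """Pad variable-length sequences (lists of values) to the longest
    one, returning the padded batch plus a 0/1 mask marking which
    positions hold real data (1) versus padding (0)."""
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_value] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask
```

A masked loss would then multiply the per-position errors by the mask before averaging, so the padding never contributes to the gradient.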

0 Likes

(Kaspar Lund) #2

It would be easier to help if you could share plots of your data series :)
And yes, you are right: the way you serve the data is important, and probably the first thing to clarify.

0 Likes

(Daniel Ayala) #3

Hi again! I didn’t have specific data in mind, but this would be a good example of what the net would be applied to (I also have some real data about bids somewhere, I think):

[image: example table of bid rows grouped by an item ID column]

As you can see, the sequences are defined by a column (item ID in this case) that groups the rows. The prediction for each row would be the bid of the next row, which could be added as an additional column, removing the row with the last bid. Each row has potentially many features, as is the case with the Rossmann sales dataset, which would be another good example of a case where a memory layer could be added alongside the other layers.

The most intuitive way to represent the data would be with a sample per row, and an individual prediction per sample. Each sequence would be a group of samples (of variable size), and each batch would be a group of sequences. But this doesn’t seem to be possible, at least in an elegant way, unless the DataBunch class is altered to return batches in a customised way (I haven’t given much thought to this possibility, since I am not too familiar with the class). There is also the idea I mentioned: making each sample a sequence, and each ‘y’ a sequence of predictions. This is not intuitive and could cause problems when it comes to metrics and loss, I think.

0 Likes

(antoine mercier) #4

I am facing the same problem, but in a different context, namely supply chain / inventory optimization. I work in the manufacturing industry, and I would like to create an RNN model to represent the sequence of manufacturing steps from raw material to finished part.

At each step, the RNN would predict the advance/delay of the execution of the manufacturing operation compared to the planned execution date. The goal is to reduce the buffers between two manufacturing steps (thus reducing overall stock and save on working capital) while still delivering the part on time at the last step of the manufacturing process.

Right now I am using a plain tabular structure similar to the Rossmann example. But the RNN would be useful because then I would be able to simulate a decrease in time between two successive manufacturing steps (in other words: decreasing the buffers) and see whether the part would still be delivered on time at the last step.

I have access to a large amount of historical data coming from my company’s ERP system. Each line in the dataset represents the validation of one specific step of the manufacturing process (execution date, planned date, part id, manufacturing step id, plus lots of additional metadata).

One key aspect is that the number of steps in the manufacturing process is not fixed: it can go roughly from 5 to 15 depending on the type and complexity of the part. This is why I am relating to your post. I haven’t started to look into the details of the RNN but I understand from your post that this seems difficult to implement right now.

Therefore I am very interested in a solution to both of these RNN applications.

0 Likes

(Daniel Ayala) #5

Another idea I had was that the network would just receive individual samples (rows), and it would take care of resetting the memory layers when a new sequence is detected. To do this, rows would have to be sorted first by sequence and secondarily by whatever orders them within the sequence. Then, the batches should feed the data in perfect order, which may be possible with PyTorch’s SequentialSampler, used by the BatchSampler, used by the DataLoader, used by the DataBunch.

In the forward function, the dimension representing the element of the batch would become the dimension representing the position in the sequence. One of the problems is that the same batch could contain several sequences, so this would have to be dealt with. A sequence could also span several batches.
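The reset-on-boundary idea can be sketched independently of any framework: walk the rows in order and reinitialise the recurrent state whenever the sequence id changes. All names here are hypothetical, and `step` stands in for one application of a recurrent cell:

```python
def stateful_forward(rows, seq_ids, step, init_state):
    """Process rows in order, resetting the recurrent state at every
    sequence boundary; `step(x, state)` returns `(output, new_state)`."""
    state, prev_id, outputs = init_state, object(), []
    for x, sid in zip(rows, seq_ids):
        if sid != prev_id:          # new sequence detected: reset memory
            state, prev_id = init_state, sid
        out, state = step(x, state)
        outputs.append(out)
    return outputs
```

Keeping `state` and `prev_id` alive between calls would also cover the case of a sequence spanning two batches.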

Overall, all these solutions I come up with seem really forced, and I doubt a non-NLP RNN can be trained and applied as long as there is no support for it in the form of some class that feeds batches of sequences in a generic way.

0 Likes

(Kaspar Lund) #6

Interesting applications, but not the easiest ones to start with if you are new to AI :)

For the auction I think you would need a custom model where you mix an RNN (fastai’s LSTM or attention model) with input from:

  • embeddings of 1) day of week, 2) category, and possibly time of day (hours),
  • a scalar for the difference between bids,
  • are auctions limited to a certain number of days/hours?

Anyway, I would start simpler just to get going. Could it make sense to reduce the time aspect to three variables: 1) first bid, 2) middle bid? The variable to predict would then be the last bid.
If there is a defined interval in which buyers have to bid, then that info might also be useful.
This way you would avoid treating it as a time series, and you could start testing and getting a feeling for what is important.

0 Likes

(Daniel Ayala) #7

The creation of the model would be very easy, but training it with a DataBunch that feeds it batches of sequences, not so much!

Cramming information about a couple of former bids into each row is, of course, a possibility. But my focus is not on solving this specific problem, but on covering what seems to me like a very common use case: RNNs for tabular data, which should ideally also generalise to images and pretty much any input from which we can create a sequence.

1 Like

(Daniel Ayala) #8

Can’t a dev answer, so that we can at least confirm that fastai does not support RNNs beyond the included NLP case?

0 Likes

#9

I don’t know what you are talking about; regression is fully supported, as explained in various topics on the forums and in the Rossmann lesson.
As for your problem, you want a custom DataLoader that feeds batches in a certain order, so just write it! You can then pass it directly to the DataBunch init method.
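One possible shape for such a loader, sketched in plain PyTorch (the dataset class, its contents, and the hand-written index groups are all invented for illustration): `DataLoader` accepts a `batch_sampler` that yields lists of indices, so whole sequences can be kept together per batch.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class BidRows(Dataset):
    """Toy dataset: one row of features and one label per bid (hypothetical)."""
    def __init__(self, features, labels):
        self.features, self.labels = features, labels
    def __len__(self):
        return len(self.features)
    def __getitem__(self, i):
        return torch.tensor(self.features[i]), torch.tensor(self.labels[i])

def collate_rows(batch):
    # stack the rows of one batch; here each batch is one whole auction
    xs = torch.stack([x for x, _ in batch])
    ys = torch.stack([y for _, y in batch])
    return xs, ys

ds = BidRows([[1.0], [2.0], [3.0], [4.0], [5.0]], [1.0, 2.0, 3.0, 4.0, 5.0])
# each inner list holds the full set of row indices for one auction
dl = DataLoader(ds, batch_sampler=[[0, 1], [2, 3, 4]], collate_fn=collate_rows)
```

Iterating `dl` then yields one variable-length auction per batch; the index groups would in practice come from grouping the dataframe by its id column rather than being written by hand.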

0 Likes