RNNTrainer in a model having an RNN and other kinds of layers

I am trying to combine tabular and text data. I am not getting better results than with tabular data alone, and I am wondering if my code is doing something wrong…

One piece of RNNLearner I had to modify for training to work is the RNNTrainer, specifically the methods on_loss_begin and on_backward_begin… My custom model used to output just a tensor for classification. I have now modified it so that it also returns what RNNTrainer needs from the RNN sub-model. Here is my model, where the forward method now returns two things: my binary classification tensor and the outputs from the RNN for RNNTrainer:

class TabularTextModel(nn.Module):
    "Model combining a tabular sub-model with an RNN text encoder."
    def __init__(self, emb_szs:ListSizes, n_cont:int, layers:Collection[int], vocab_sz:int, encoder):
        super().__init__()
        # 400 * 3: concat pooling (last hidden state + max pool + avg pool) of a 400-dim encoder
        l = [400 * 3] + [256]
        ps = [.4]
        self.lm_encoder = SequentialRNN(encoder[0], PoolingLinearClassifier(l, ps))

        self.tab = TabularModelEx(emb_szs, n_cont, 256, layers)

        # merge the two 256-dim latents, then classify into 2 classes
        self.merge = nn.Sequential(*bn_drop_lin(256 + 256, 128, bn=True, p=0.5, actn=nn.ReLU(inplace=True)))
        self.final = nn.Sequential(*bn_drop_lin(128, 2, bn=True, p=0., actn=None))

    def forward(self, x:Tensor, text:Tensor) -> Tuple[Tensor, Tuple[Tensor,Tensor,Tensor]]:
        tabLatent = self.tab(x[0], x[1])      # (bs, 256) tabular latent
        textLatent = self.lm_encoder(text)    # (pooled output, raw_outputs, outputs)

        cat = torch.cat([tabLatent, textLatent[0]], dim=1)

        # return the classification output plus the RNN outputs for RNNTrainer
        return self.final(self.merge(cat)), textLatent

    def reset(self):
        "Reset the hidden state of any child module that supports it."
        for c in self.children():
            if hasattr(c, 'reset'): c.reset()

But I don’t get any performance gain from adding this RNN to the model… Just using the tabular part seems to do better, so I am wondering about the impact of RNNTrainer and whether it might be the reason this is not really working…

Here is the code of my RNNTrainerCustom for reference; I simply changed where it takes the tensors it needs, so it does the same job as the normal RNNTrainer.

class RNNTrainerCustom(LearnerCallback):
    "`Callback` that regroups lr adjustment to seq_len, AR and TAR."
    def __init__(self, learn:Learner, alpha:float=0., beta:float=0.):
        super().__init__(learn)
        self.not_min += ['raw_out', 'out']
        self.alpha,self.beta = alpha,beta

    def on_epoch_begin(self, **kwargs):
        "Reset the hidden state of the model."
        self.learn.model.reset()

    def on_loss_begin(self, last_output:Tuple[Tensor, Tuple[Tensor,Tensor,Tensor]], **kwargs):
        "Save the extra outputs for later and only return the true output."
        # last_output[1] is the textLatent tuple: (pooled output, raw_outputs, outputs)
        self.raw_out,self.out = last_output[1][1],last_output[1][2]
        return last_output[0]

    def on_backward_begin(self, last_loss:Rank0Tensor, last_input:Tensor, **kwargs):
        "Apply AR and TAR to `last_loss`."
        # AR: activation regularization on the (dropped-out) outputs
        if self.alpha != 0.:  last_loss += self.alpha * self.out[-1].float().pow(2).mean()
        # TAR: temporal activation regularization on the raw outputs
        if self.beta != 0.:
            h = self.raw_out[-1]
            if len(h)>1: last_loss += self.beta * (h[:,1:] - h[:,:-1]).float().pow(2).mean()
        return last_loss
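
For context, here is roughly how I wire the model and callback into a Learner (a sketch; data, emb_szs, n_cont, vocab_sz and encoder come from my own pipeline, layers=[256] is a placeholder, and alpha/beta are the values fastai uses by default):

from functools import partial

model = TabularTextModel(emb_szs, n_cont, layers=[256], vocab_sz=vocab_sz, encoder=encoder)
learn = Learner(data, model, loss_func=CrossEntropyFlat(),
                callback_fns=[partial(RNNTrainerCustom, alpha=2., beta=1.)])
learn.fit_one_cycle(5, 1e-3)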

Something else I was wondering about: the encoder in my model uses a MultiBatchEncoder… But I am wondering if this makes sense, considering that if a text spans multiple batches, it would not align with the corresponding tabular data…

I have no idea what your texts are. MultiBatchEncoder is there to feed a whole sequence, not just a fixed seq_len (it’s not a good name), but it feeds the same batch.
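Conceptually it just walks the sequence dimension of the same batch in bptt-sized chunks, roughly like this simplified sketch (not the actual fastai source; shapes assume a batch-first encoder output):

import torch

def encode_long_sequence(encoder, input_ids, bptt):
    "Simplified sketch of the MultiBatchEncoder idea: chunk the sequence, same batch."
    outputs = []
    for i in range(0, input_ids.size(1), bptt):
        outputs.append(encoder(input_ids[:, i:i + bptt]))  # (bs, chunk_len, nh)
    # rows never move, so each text still lines up with its own tabular row
    return torch.cat(outputs, dim=1)                       # (bs, sl, nh)
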
Note that the RNNTrainer is only there for regularization, so I’d add it only once you start overfitting. One thing that should be useful is using pooling on the text output, and not just the textLatent. Look at PoolingLinearClassifier (or something like that).
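
For reference, the concat pooling it does is roughly this (simplified sketch, assuming hidden states of shape (bs, sl, nh)):

import torch

def concat_pool(hidden):
    "Sketch of concat pooling over the RNN hidden states."
    avg  = hidden.mean(dim=1)           # average over the sequence
    mx   = hidden.max(dim=1).values     # max over the sequence
    last = hidden[:, -1]                # last time step
    return torch.cat([last, mx, avg], dim=1)  # (bs, 3*nh) -- hence the 400 * 3 above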

Yep, I already use PoolingLinearClassifier in my model; textLatent is calculated with that:

self.lm_encoder = SequentialRNN(encoder[0], PoolingLinearClassifier(l, ps))

Also, here are some sample texts:

'Proper bake, top and bottom (include pictures)',
'Cleanliness Front of House',
'Proper bake, top and bottom (include pictures)',
'Are floors, walls and baseboards clean & in good repair?',
'Walk in coolers are clean, well maintained and meet storage standards?',
'Pop Fridge- Temperature',
'Daily Duties/Responsibilities Staff',
'Authorized Menus (Picture)',
'Nonfood-contact surfaces of equipment and utensils are durable, non-toxic, easily cleanable and in good condition.',
'Dust pans, mop buckets,  and mop heads are clean and in good condtion',

Those are conformity questions for inspectors. Each row in the dataset contains a question in text form, a bunch of tabular data about the question, the actual result, and possibly multiple pictures. Just for simplicity’s sake, right now I am testing a model with just tabular data and text.

My model with just tabular data is around 2% better than the model combining tabular with text: 88% vs 86% accuracy. So adding the RNN penalizes the model.

In the model using only tabular data, I fitted an LDA model on the text to do topic modeling, and created a new categorical variable from it called TextTopic. Each question is classified into one of 10 categories… So this is how I use the text data there.
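
That feature was built roughly like this (a sketch using sklearn; df and the question_text column name are placeholders for my own data):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

vec = CountVectorizer(stop_words='english')
counts = vec.fit_transform(df['question_text'])
lda = LatentDirichletAllocation(n_components=10, random_state=42)
topics = lda.fit_transform(counts)          # (n_rows, 10) document-topic weights
df['TextTopic'] = topics.argmax(axis=1)     # dominant topic used as a category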

I removed this variable from the model using tabular + text to see if adding the RNN could get results as good, but so far that hasn’t been the case.

Happy to report that removing RNNTrainer like you suggested helped greatly. I also adjusted a bunch of things inside my model which I think were preventing it from converging properly. And I had forgotten to fine-tune the encoder on my text data before using it in my sub-model.
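
The fine-tuning step I had skipped is just the usual ULMFiT flow, roughly (data_lm being a language-model DataBunch built from my texts; the schedule is a sketch):

learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.unfreeze()
learn_lm.fit_one_cycle(3, 1e-3)
learn_lm.save_encoder('ft_enc')   # this encoder is what goes into TabularTextModel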

All those little things added up I think.

But now both models use identical structured variables (including TextTopic), and I get better performance from the tabular + text model. It hasn’t finished training yet, but it has already surpassed the tabular-only model.

Thanks for your help!


Hi Everyone,

I have sequential images of coronary CTA cross-sections that I would like to use with a CNN+RNN to predict a binary outcome. Has anyone used fastai to implement such a problem?

I have trained a CNN to predict on the 2D cross-sections independently of their relationship to each other, but I would like to add an RNN since the images are sequential along the centerline of the vessel. I am using fastai v2 for this problem, but I can convert everything to v1 if necessary.
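
Roughly, the pattern I have in mind is a per-slice CNN encoder feeding an RNN over the slice sequence (a plain-PyTorch sketch; names and sizes are placeholders, and cnn_encoder is assumed to map an image to a feat_dim vector):

import torch
import torch.nn as nn

class CNNRNN(nn.Module):
    "Sketch: encode each slice with a CNN, then run an LSTM over the slice sequence."
    def __init__(self, cnn_encoder, feat_dim=512, hidden=256):
        super().__init__()
        self.encoder = cnn_encoder                    # e.g. a pretrained resnet body + pooling
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)              # binary outcome

    def forward(self, x):                             # x: (bs, seq_len, ch, h, w)
        bs, sl = x.shape[:2]
        feats = self.encoder(x.view(bs * sl, *x.shape[2:]))  # (bs*sl, feat_dim)
        out, _ = self.rnn(feats.view(bs, sl, -1))
        return self.head(out[:, -1])                  # classify from the last step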

Any thoughts/help is appreciated.