Get activations of second to last layer in Tabular Model

I am training a tabular learner and would like to save the outputs of the next-to-last layer so that, after training is complete, I can use them to train a random forest. I did this before with a previous version of fastai and the result was a significant accuracy boost. In my specific case the dataset was quite small, and doing this was significantly better than using an RF or NN alone. After looking through the forums it appears this should be simple, but I can’t seem to figure out how to save these activations. Is there an example of something similar anywhere?

Thanks,

Bob

I’ve done something like this before, and at first I came to similar results, but now I have a doubt. A question for you: how do you measure accuracy in the Random Forest case and in the NN one?

I was doing regression, so I simply calculated percent accuracy.

First I’ll answer your initial question, and then I’ll try to explain what I mean by my own :slight_smile:

You can get the activations of a particular layer by putting a hook on it.
First you have to determine which layer it is. In my case, for example, it is the -5th one, because the end of the model looks like this:

(4): Linear(in_features=1000, out_features=500, bias=True)
(5): ReLU(inplace)
(6): BatchNorm1d(500, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): Dropout(p=0.1)
(8): Linear(in_features=500, out_features=1, bias=True)

So, I get the -5th layer with this:
layer = list(learn.model.modules())[-5]
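
If you are not sure which index you need, you can simply print every flattened module together with its negative index, for example:

modules = list(learn.model.modules())
for i, m in enumerate(modules):
    print(i - len(modules), m)   # negative index you can use to pick the layer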

The function for getting the activations is:

activation = {}
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()   # save this layer's output under the given name
    return hook                              # return the hook function so it can be registered

Then you can register a hook

hook1 = layer.register_forward_hook(get_activation('name_youve_chosen'))

Then run the model with something like
learn.predict(row)
And don’t forget to remove the hook afterwards:
hook1.remove()

And now your activations are stored in
activation['name_youve_chosen']

If you have more than one row ( :slight_smile: ) you can put it in a for loop (or any other method that lets you run the model on the data without losing the connection between the independent and dependent variables) and pull the data out of activation['name_youve_chosen'].
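
For example, a rough sketch of that loop (df and dep_var are placeholder names for your DataFrame and target column):

import numpy as np

feats, targets = [], []
for i in range(len(df)):
    row = df.iloc[i]
    learn.predict(row)                  # the forward pass fires the hook and fills `activation`
    feats.append(activation['name_youve_chosen'].cpu().numpy().ravel())
    targets.append(row['dep_var'])      # keep the target aligned with its activations

X = np.stack(feats)   # feature matrix for the random forest
y = np.array(targets)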


I’ll try to explain my own question :slight_smile:
When I first tried to use embeddings from my NN in an RF, I trained the NN on every example (without a validation set), because I thought I wouldn’t use this NN itself, I just needed its embeddings. And I needed embeddings for every example (otherwise my validation set would definitely contain samples without proper embeddings, i.e. with category values that were not in the training set).
After that I got a really good RF result (with the help of the embeddings alone), much better than the NN’s validation error.
That was too good to be true, and I think I now know where my mistake was.
The way I see it:
Training high-cardinality categorical values means turning each value into many floats (tens of values per category, the embedding vector). So we effectively move part of the trained model into this embedding layer. I then fed that part of the trained model to the RF as input, but it is an overfitted model, in the sense that it was trained on every example. In my case, whichever rows the RF chose for validation, it used an embedding layer that was already overfitted on them and holds much more information about the dependent variable (because it was trained that way). So it’s much easier for the RF to get the right answer. In fact, I think that if you trained an NN model with this input you might get even better results :slight_smile:
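
To illustrate what I mean by turning each category value into many floats, a toy example in plain PyTorch (the sizes here are made up, not from the actual fastai model):

import torch
import torch.nn as nn

# a categorical column with 5000 distinct values, embedded into 50 floats each
emb = nn.Embedding(num_embeddings=5000, embedding_dim=50)

idx = torch.tensor([1234])   # one category value, encoded as an integer index
vec = emb(idx)               # shape (1, 50): fifty learned floats describing that value
print(vec.shape)             # torch.Size([1, 50])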

TLDR;
So my point is that we should use a separate set (OK, let’s call it a test set) that was seen neither by our NN model nor by the RF, to calculate a real-world error for this composite approach (data -> input of NN -> activations from the x-th layer of the NN -> input of RF -> output of RF).
(And it looks like there will be some interesting edge cases.)
To be clear, I have not yet tried using a separate test set in this scenario.
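
Roughly what I have in mind (just a sketch with placeholder names; get_nn_features stands for the activation-extraction loop described above):

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Hold out a test set that neither the NN nor the RF ever sees during training
df_trainval, df_test = train_test_split(df, test_size=0.2, random_state=42)

# 1) train the tabular NN (and therefore its embeddings) on df_trainval only
# 2) run the trained NN over both splits and grab the hooked activations
X_trainval, y_trainval = get_nn_features(df_trainval)   # hypothetical helper wrapping the hook loop
X_test, y_test = get_nn_features(df_test)

# 3) fit the RF on the NN-derived features and score it on the untouched test set
rf = RandomForestRegressor(n_estimators=100)
rf.fit(X_trainval, y_trainval)
print(rf.score(X_test, y_test))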

Thanks for your example code. However, it causes a TypeError when I run this code:

layer = list(learn.model.modules())[-4]

activation = []
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
        return hook

hook = layer.register_forward_hook(get_activation('act_output'))

row = df.iloc[0]
learn.predict(row)

I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>
      1 row = df.iloc[0]
----> 2 learn.predict(row)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_train.py in predict(self, item, **kwargs)
    360         "Return predicted class, label and probabilities for item."
    361         batch = self.data.one_item(item)
--> 362         res = self.pred_batch(batch=batch)
    363         pred,x = res[0],batch[0]
    364         norm = getattr(self.data,'norm',False)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_train.py in pred_batch(self, ds_type, batch, reconstruct)
    340         cb_handler = CallbackHandler(self.callbacks)
    341         xb,yb = cb_handler.on_batch_begin(xb,yb, train=False)
--> 342         preds = loss_batch(self.model.eval(), xb, yb, cb_handler=cb_handler)
    343         res = _loss_func2activ(self.loss_func)(preds[0])
    344         if not reconstruct: return res

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     23     if not is_listy(xb): xb = [xb]
     24     if not is_listy(yb): yb = [yb]
---> 25     out = model(*xb)
     26     out = cb_handler.on_loss_begin(out)
     27

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/tabular/models.py in forward(self, x_cat, x_cont)
     35             x_cont = self.bn_cont(x_cont)
     36             x = torch.cat([x, x_cont], 1) if self.n_emb != 0 else x_cont
---> 37         x = self.layers(x)
     38         if self.y_range is not None:
     39             x = (self.y_range[1]-self.y_range[0]) * torch.sigmoid(x) + self.y_range[0]

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
--> 491             hook_result = hook(self, input, result)
    492         if hook_result is not None:
    493             raise RuntimeError(

TypeError: 'NoneType' object is not callable

The last two lines produce a correct prediction if I don’t run the code you suggested, so I’m not sure where I am going wrong here.

Yes, I completely agree with holding out a test set from both the NN and the RF.

First of all,
activation = []
should be a dictionary, so:
activation = {}

But as I understand it, that’s not the source of the error.
What does ‘layer’ contain?

Linear(in_features=100, out_features=100, bias=True)

This appears to work. Let me know if you think it is incorrect.

from fastai.callbacks.hooks import *
nn_module = list(learn.model.modules())[-5]   # the layer whose output we want to capture
hook = hook_output(nn_module)                 # fastai hook that stores this layer's output
row = df.iloc[0]
learn.predict(row)                            # forward pass populates the hook
hook.stored                                   # the saved activations
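
If you then want the activations for every row (to feed the random forest), a rough extension of the same idea; df, dep_var and the sklearn part are placeholders and assumptions, not something from the snippet above:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

feats = []
for i in range(len(df)):
    learn.predict(df.iloc[i])                    # each prediction refreshes hook.stored
    feats.append(hook.stored.cpu().numpy().ravel())
hook.remove()                                    # detach the hook when done

X = np.stack(feats)
rf = RandomForestRegressor(n_estimators=100)
rf.fit(X, df['dep_var'])                         # 'dep_var' is a placeholder column name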

Yes, that seems even better, as it uses fastai hooks, which handle much of the heavy machinery.
Please share your results on whether it was really worth using NN->RF instead of the NN alone, in terms of separate test-set error in your case :slight_smile:

By the way, I’ve just tried using embeddings from the NN model in an RF myself and got pretty much the same error as with the NN alone.
So in my case it didn’t help much.