Splitting the FastAI Tabular Learner

I have attempted to split my Tabular Learner in two, as I am trying to do some internal transformations decoupled from the network. To do so, I created a hook that lets me grab the output of the softmax layer in my model. After doing some transformations on this softmax output, I would like to apply the last few steps of my learner object to get a prediction. But even if I don’t change anything, I still do not get the same predictions when running learn.get_preds vs. applying the steps manually, as can be seen below:

Learner object:
… (9) Softmax -> (10) BatchNorm1d -> (11) Dropout -> (12) Linear

softmax_activations = torch.Tensor(softmax_layer.features)

batch_layer = learn.model.layers[-3]
dropout_layer = learn.model.layers[-2]
linear_layer = learn.model.layers[-1]

batch_transformed = batch_layer(softmax_activations)
dropout_transformed = dropout_layer(batch_transformed)
softmax_y_hat = linear_layer(dropout_transformed)

Even when I don’t alter anything, applying the last steps manually does not give me the predictions that I am looking for.

Why is softmax_y_hat not giving me the same predictions as learn.get_preds() does?

Would you post the output of learn.summary()?

Absolutely,
TabularModel
======================================================================
Layer (type) Output Shape Param # Trainable
======================================================================
Embedding [10] 280 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Embedding [3] 9 True
______________________________________________________________________
Dropout [73] 0 False
______________________________________________________________________
BatchNorm1d [53] 106 True
______________________________________________________________________
Linear [100] 12,700 True
______________________________________________________________________
ReLU [100] 0 False
______________________________________________________________________
BatchNorm1d [100] 200 True
______________________________________________________________________
Dropout [100] 0 False
______________________________________________________________________
Linear [50] 5,050 True
______________________________________________________________________
ReLU [50] 0 False
______________________________________________________________________
BatchNorm1d [50] 100 True
______________________________________________________________________
Dropout [50] 0 False
______________________________________________________________________
Linear [10] 510 True
______________________________________________________________________
Softmax [10] 0 False
______________________________________________________________________
BatchNorm1d [10] 20 True
______________________________________________________________________
Dropout [10] 0 False
______________________________________________________________________
Linear [1] 11 True
______________________________________________________________________

Total params: 19,166
Total trainable params: 19,166
Total non-trainable params: 0
Optimized with 'torch.optim.adam.Adam', betas=(0.9, 0.99)
Using true weight decay as discussed in https://www.fast.ai/2018/07/02/adam-weight-decay/ 
Loss function : L1LossFlat
======================================================================
Callbacks functions applied

To make sure that you use those last layers in eval mode, you can do:

model_head = learn.model.layers[-3:]
model_head.eval()
softmax_y_hat = model_head(softmax_activations)

Also, I would print model_head to make sure it contains the layers I expect.
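
Putting both suggestions together, the check could look roughly like this (just a sketch, reusing the softmax_activations tensor from your first snippet and assuming it is on the same device as the model):

model_head = learn.model.layers[-3:]
print(model_head)    # per your summary: BatchNorm1d(10) -> Dropout -> Linear(10, 1)
model_head.eval()    # run BatchNorm1d / Dropout in inference mode
with torch.no_grad():
    softmax_y_hat = model_head(softmax_activations)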

Just tested your suggested approach. It gives me the exact same output - model_head.eval() doesn’t seem to change anything. Do you have any other ideas I could try?

In that case, the first thing I would do is make sure that the softmax_activations tensor contains what you expect. Instead of using a hook, you can compute that layer’s outputs by simply running your data through the first part of your model, learn.model.layers[:-3].
Also, it’s hard to tell what the problem might be without the code. If you share your notebook, I can take a look at it.
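
Note that the embeddings and the continuous-feature batchnorm sit outside learn.model.layers in fastai v1’s TabularModel, so the slice alone is not quite enough. A minimal sketch of the idea, assuming the attribute names embeds, emb_drop and bn_cont from TabularModel and a model with both categorical and continuous inputs:

import torch

m = learn.model
m.eval()
x_cat, x_cont = next(iter(learn.data.valid_dl))[0]   # one preprocessed batch
with torch.no_grad():
    # mirror TabularModel.forward up to (and including) the Softmax layer
    x = torch.cat([e(x_cat[:, i]) for i, e in enumerate(m.embeds)], 1)
    x = m.emb_drop(x)
    x = torch.cat([x, m.bn_cont(x_cont)], 1)          # assumes both cat and cont features
    softmax_activations = m.layers[:-3](x)

You can then compare this tensor with what the hook captured for the same batch.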

I guess you could grab the head and the tail that way, but then I suppose I am not really utilizing the FastAI library, which applies the preprocessing procs you define at the start, etc.

My notebook is extremely messy, but I think the relevant code is highlighted below. Do let me know if anything is missing:

procs = [FillMissing, Categorify, Normalize]

# Test tabularlist
test = TabularList.from_df(df_test, cat_names=cat_names+bin_names, cont_names=cont_names, procs=procs)

# Train data bunch
data = (TabularList.from_df(df_train, path='.', cat_names=cat_names+bin_names, cont_names=cont_names, procs=procs)
                        .split_by_idx(valid_idx=val_idx)
                        .label_from_df(cols = dep_var, label_cls = FloatList)
                        .add_test(test)
                        .databunch(bs=5000))
                        
layer_1 = 100
layer_2 = 50
layer_softmax = 10 
ps = [0.001,0.01,0.01]
emb_drop = 0.04

layers=[layer_1,layer_2,(layer_softmax,nn.Softmax(dim=1))]

min_y = np.min(df_train['target'])*1.2
max_y = np.max(df_train['target'])*1.2
y_range = torch.tensor([min_y, max_y], device=defaults.device)

learn = tabular_learner(data, layers=layers, ps=ps, emb_drop=emb_drop, y_range=y_range, metrics=mae)

class L1LossFlat(nn.L1Loss):
    def forward(self, input:Tensor, target:Tensor) -> Rank0Tensor:
        return super().forward(input.view(-1), target.view(-1))

learn.loss_func = L1LossFlat()

max_lr = 1e-3
epochs = 2
learn.fit_one_cycle(epochs, slice(max_lr), wd = 1e-2)

class SaveFeatures():
    "Forward hook that collects a module's outputs across batches."
    features = None
    def __init__(self, m):
        self.hook = m.register_forward_hook(self.hook_fn)
        self.features = None
    def hook_fn(self, module, input, output):
        # detach the captured activations and stack them row-wise
        out = output.detach().cpu().numpy()
        if self.features is None:
            self.features = out
        else:
            self.features = np.row_stack((self.features, out))
    def remove(self):
        self.hook.remove()
        
## Output before the last FC layer
softmax_layer = SaveFeatures(learn.model.layers[9])

batch_size = learn.data.batch_size

# Normal predictions
y_hat, _ = learn.get_preds(ds_type=DatasetType.Test, n_batch=n_batch)


# Manual predictions from Softmax layer
profit_states_tensor = torch.Tensor(softmax_layer.features)
model_head = learn.model.layers[-3:]
model_head.train()
softmax_y_hat = model_head(profit_states_tensor)
softmax_y_hat = softmax_y_hat.detach()

You can iterate manually over your databunch to calculate the softmax output while still using the preprocessed data. I don’t remember exactly how you would do that in code with fastai v1 (which, as I understand, you are using here), but it shouldn’t be very difficult; there is a rough sketch at the end of this post.
Two points I would recheck in the above code:

## Output before the last FC layer
softmax_layer = SaveFeatures(learn.model.layers[9])

Here you need the output right after the softmax layer, before the batchnorm, not before the last FC layer (which is what the comment suggests).

The second point is model_head.train(): you need model_head.eval() instead, but I guess you’ve already tried it both ways.
Otherwise I cannot see why this code wouldn’t work. Maybe I’m also missing something, and someone else will find other potential problems.
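
A rough sketch of the manual iteration over the preprocessed test data in fastai v1 (it reuses the SaveFeatures hook and the layer index 9 from the notebook above, and assumes the test set was added with add_test as in your code):

device = next(learn.model.parameters()).device
learn.model.eval()
hook = SaveFeatures(learn.model.layers[9])            # the Softmax module
with torch.no_grad():
    for xb, yb in learn.data.dl(DatasetType.Test):    # preprocessed batches
        learn.model(*xb)                              # the hook accumulates the softmax outputs
hook.remove()

softmax_activations = torch.Tensor(hook.features).to(device)
model_head = learn.model.layers[-3:].eval()           # BatchNorm1d -> Dropout -> Linear
with torch.no_grad():
    softmax_y_hat = model_head(softmax_activations)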

I’ll dig into the manual part, I think. It’s FastAI v1, yep.

If all steps in the learner object are accessible by index when referring to the layers, then I don’t really see how I could miss anything this way. That said, I haven’t fully investigated the hooks.

Regarding

model_head.eval()

I’ve tried both, yes. There was only a slight difference between the two. Judging from the numbers, it seems some kind of transformation is missing, as the predictions from the manual softmax path are much closer to zero.

Thanks!

–> Solved the problem

The y_range I specified was what gave me issues with the hook & manual approach. I believe it is because y_range affects the final prediction inside the model’s forward pass, and that step is not applied when doing it “manually”: applying only the tail of the model will not spit out the same predictions, because the y_range scaling is never applied. May dig further into this.
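
For reference, fastai v1’s TabularModel applies y_range inside its forward pass as a scaled sigmoid after the last Linear layer, which is exactly the step a manual pass through layers[-3:] skips. A minimal sketch of adding it back, reusing softmax_layer and the head from the code above:

# y_range is applied in TabularModel.forward as: (max - min) * sigmoid(x) + min
device = next(learn.model.parameters()).device
model_head = learn.model.layers[-3:].eval()
with torch.no_grad():
    raw = model_head(torch.Tensor(softmax_layer.features).to(device))
    y_min, y_max = learn.model.y_range.to(device)
    softmax_y_hat = (y_max - y_min) * torch.sigmoid(raw) + y_min

With that rescaling applied, the manual predictions should line up with what learn.get_preds() returns (up to the train/eval differences discussed above).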