After some effort, I got it partially working. It reached 92.1% accuracy (0.3% below my previous model). Some remarks:
- I think there is a bug in my code: when I run fit_one_cycle after unfreezing and lr_find, I get a “can’t optimize a non-leaf Tensor” error (it only happens when unfreezing and running lr_find before training). I wasn’t able to find out what is causing it.
- I only created two layer groups: the first containing the tabular and NLP models, and the second containing the last linear layers. So it only partially leverages discriminative learning rates, using the same LR for all the layers inside the tabular and NLP models. I think this can be especially harmful for the NLP model, since updating the early layers too aggressively could destroy part of the pre-trained weights.
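The two-group split above can be sketched in plain PyTorch via per-parameter-group learning rates. This is a minimal illustration, not my actual model: `encoder` and `head` are hypothetical stand-ins for the combined tabular/NLP models and the final linear layers.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the pre-trained encoders and the new head.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
head = nn.Sequential(nn.Linear(32, 8), nn.ReLU(), nn.Linear(8, 2))

# Discriminative learning rates: one optimizer param group per layer group.
# The pre-trained encoder gets a gentler LR than the freshly initialized head,
# so its weights are not destroyed early in training.
opt = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-4},  # pre-trained layers
    {"params": head.parameters(), "lr": 1e-3},     # new linear layers
])

print([group["lr"] for group in opt.param_groups])
```

A finer split (e.g. one group per encoder stage) would simply add more entries to the list passed to the optimizer, each with its own LR.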
Edit: I managed to set the layer_groups properly. In addition to minor tweaks (increasing wd and dropout), I reached 92.3%.
I’d love to get some feedback and possible improvements.