I have an OCR model that takes an image and produces characters. It includes
- 1) a spatial transformer (https://arxiv.org/abs/1506.02025)
- 2) a feature extractor (the lower part of Inception)
- 3) an LSTM with attention
If I train the whole model end to end, 1) does not learn and gets stuck on a random transformation, while 2) and 3) learn fine.
In order to make 1) learn, I have to
- freeze 1) and train 2) and 3) for n steps
- freeze 2) and 3), unfreeze 1), and train for n steps
I came up with this training strategy after experimenting. I currently use n = 5000. I’m sure it is not optimal and I am searching for papers that study this subject. Do you have suggestions?
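For concreteness, here is a minimal PyTorch sketch of this two-phase freeze/unfreeze schedule. The module names (`stn`, `features`, `decoder`) are tiny linear stand-ins for the three sub-networks, not the real architecture, and `n` is shrunk for illustration:

```python
# Sketch of the alternating freeze/unfreeze schedule (stand-in modules).
import torch
import torch.nn as nn

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

class OCRModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stn = nn.Linear(8, 8)       # stand-in for 1) the spatial transformer
        self.features = nn.Linear(8, 8)  # stand-in for 2) the Inception trunk
        self.decoder = nn.Linear(8, 4)   # stand-in for 3) the attention LSTM

    def forward(self, x):
        return self.decoder(self.features(self.stn(x)))

def train_phase(model, trainable, frozen, n_steps):
    # Freeze one group, unfreeze the other, and optimize only the latter.
    for m in frozen:
        set_trainable(m, False)
    for m in trainable:
        set_trainable(m, True)
    opt = torch.optim.Adam(
        [p for m in trainable for p in m.parameters()], lr=1e-3)
    for _ in range(n_steps):
        x = torch.randn(16, 8)           # dummy batch for the sketch
        y = torch.randn(16, 4)
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

model = OCRModel()
n = 5  # 5000 in practice; tiny here
# Phase A: freeze 1), train 2) and 3)
train_phase(model, [model.features, model.decoder], [model.stn], n)
# Phase B: freeze 2) and 3), train 1)
train_phase(model, [model.stn], [model.features, model.decoder], n)
```

In practice the two phases would be repeated in a loop until convergence.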
This sounds pretty similar to how a GAN is trained: alternately freezing and training the generator and the discriminator.
Might be worth trying to abstract the process so that one “step” consists of doing one batch on each part of the network while the other part is frozen.
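In pseudocode, that abstraction could look like the sketch below. `train_stn_batch` and `train_rest_batch` are hypothetical stand-ins for the actual freeze-then-train-one-batch logic:

```python
# One combined "step" trains each part for one batch while the rest is frozen.
def make_combined_step(train_stn_batch, train_rest_batch):
    def step():
        train_rest_batch()  # 1) frozen, 2)+3) train on one batch
        train_stn_batch()   # 2)+3) frozen, 1) trains on one batch
    return step

calls = []
step = make_combined_step(lambda: calls.append("stn"),
                          lambda: calls.append("rest"))
for _ in range(2):
    step()
# calls is now ["rest", "stn", "rest", "stn"]
```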
Thanks for your response. Unlike GANs, my model has a single objective shared by all sub-modules of the network. The reason a GAN alternates at every step is that you want the generator and the discriminator to improve at the same pace.
I’ve tried training each sub-module for a small number of steps, but it did not work; n has to be somewhat large. I currently use 5000 steps.