What do you think of this approach to fine-tuning?
Freeze the first N layer blocks and leave every layer after them trainable, while keeping ALL batch normalization layers trainable.
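For illustration, here is a minimal sketch in PyTorch of what I mean (the ResNet-50 backbone and the cutoff `N_FROZEN_BLOCKS` are just placeholders for the example, not part of the original setup):

```python
import torch.nn as nn
from torchvision import models

N_FROZEN_BLOCKS = 6  # placeholder cutoff; pick N for your own task

# Example backbone only; any pretrained CNN with BatchNorm would work the same way.
model = models.resnet50(weights="IMAGENET1K_V1")

# Freeze the first N top-level blocks (conv1, bn1, relu, maxpool, layer1, ...).
for block in list(model.children())[:N_FROZEN_BLOCKS]:
    for param in block.parameters():
        param.requires_grad = False

# Blocks after the first N keep requires_grad=True by default, so they stay trainable.

# Re-enable gradients for every BatchNorm layer, even inside the frozen blocks,
# so their affine parameters (weight/bias) can still adapt to the new data.
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        for param in module.parameters():
            param.requires_grad = True

# Note: in model.train() mode, BatchNorm running statistics are updated during the
# forward pass regardless of requires_grad.
```

My guess is that the idea is that the BatchNorm statistics and affine parameters learned on the pretraining data may not match the new data distribution, so they are left free to adapt even inside otherwise frozen blocks, but I would like to hear other opinions.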
Could anyone explain this way of fine-tuning?