I suspect that fastai again provides very sensible defaults, so I could probably answer my own question just by looking at the library, but I wanted to ask: what are typical amounts of dropout people use in the fully connected part of a CNN that work well?
If I were to go by the fastai defaults, I believe it would be 0.25 between the activation layer and the layer before softmax, and 0.5 between the last layer and softmax.
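To make sure I'm describing that placement correctly, here is a minimal PyTorch sketch of what I understand the head to look like - the layer sizes (512, 256, 10) are just placeholders I picked for illustration, not the actual fastai values:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a CNN classifier head with the dropout
# placement described above: p=0.25 earlier in the head and
# p=0.5 just before the final (pre-softmax) linear layer.
head = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.25),   # drops 25% of the pooled activations
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),    # drops 50% right before the output layer
    nn.Linear(256, 10),   # 10-class logits; softmax comes from the loss
)

x = torch.randn(4, 512)   # dummy batch of pooled features
logits = head(x)
print(tuple(logits.shape))  # (4, 10)
```

Note that `nn.Dropout` is only active in training mode; calling `head.eval()` turns both dropout layers into identity ops, so inference is unaffected.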
The reason I am asking is that those amounts intuitively seem quite high - dropping half of the activations in the layer just before softmax sounds quite extreme!
I can see how the answer could well be that these values seem to have worked best across a vast spectrum of applications, and that in general additional experimentation could be of value. This is also likely very dataset specific (need to reduce bias -> remove dropout; need to reduce variance -> add dropout).
Maybe there is no quick and easy answer beyond the defaults being a good starting point and there not being much information on this subject, but if there is any reading on this topic, or any info one might share (e.g. some successful models), that would be greatly appreciated.
PS. As I was finishing this post I thought of googling this - I entered the name of one of my favorite writers along with the term dropout, and this is what I got:
I think the paper by Srivastava et al. is probably a great source of insight - I am planning to read it thoroughly, but if anyone has any other materials they found useful, or could shed some light on this, please do share.