I am working on dogsvscats, redoing what @jeremy presented in lecture 3.
I get a strange result that I cannot find an explanation for. What I do is the following:
- I save features, train the vgg16 FC layers with dropout at 0.5 (not touching that at all).
- Create a model of just the FC layers with dropout set to 0, adjusting the weights accordingly.
- I do not do any training; as a sanity check, I run evaluate_generator on both models with the validation data.
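For context, the weight adjustment in step 2 follows the lesson-3 idea of rescaling the trained weights when the dropout rate changes. A minimal numpy-only sketch (the helper name `proc_wgts` and the general scale formula `(1 - prev_p) / (1 - new_p)` are my assumptions; lesson 3 simply halves the weights for the 0.5 -> 0 case):

```python
import numpy as np

def proc_wgts(weights, prev_p=0.5, new_p=0.0):
    # Rescale weights trained under dropout prev_p for use under new_p.
    # For prev_p=0.5, new_p=0 this halves them, as in lesson 3.
    scale = (1.0 - prev_p) / (1.0 - new_p)
    return [w * scale for w in weights]

w = [np.ones((4, 2)), np.zeros(2)]  # stand-in for layer.get_weights()
w_new = proc_wgts(w)                # weights halved for p: 0.5 -> 0
```

The rescaled list would then be passed back via set_weights on the corresponding layer of the new model.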
Old FC model with Dropout gives me:
[0.14896876902467668, 0.98760000000000003]
(the first value is categorical cross-entropy, the second is accuracy).
With the new model without Dropout (p=0), I get:
[0.045561284964613716, 0.98760000000000003]
How is it possible that the loss values differ and yet the accuracy is identical?
It gets even weirder: when I repeat the same steps with p = 0.1 and p = 0.2, I get the results below:
p = 0.1: [0.057281951369761212, 0.98760000000000003]
p = 0.2: [0.072925884064820298, 0.98760000000000003]
It seems as if evaluate_generator were applying dropout to the log loss but not to the accuracy?
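One thing worth noting (a small numpy sketch, with made-up prediction values): two sets of predictions can rank the classes identically on every sample, giving the exact same accuracy, while still assigning different confidences, giving different cross-entropy. Accuracy only looks at argmax; log loss looks at the probabilities themselves.

```python
import numpy as np

def log_loss(probs, labels):
    # mean categorical cross-entropy for integer class labels
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

labels = np.array([0, 1, 1, 0])
# two prediction sets with the same argmax on every row,
# but different confidence levels
p_a = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]])
p_b = np.array([[0.99, 0.01], [0.05, 0.95], [0.45, 0.55], [0.8, 0.2]])

acc_a = np.mean(p_a.argmax(axis=1) == labels)
acc_b = np.mean(p_b.argmax(axis=1) == labels)
loss_a = log_loss(p_a, labels)
loss_b = log_loss(p_b, labels)
# acc_a == acc_b, but loss_a != loss_b
```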
And what is even stranger, if I have
- p = predictions of model with dropout
- pd = predictions of model without dropout
I get:
np.mean(p-pd)
-5.58385e-10
but:
np.sort(p.reshape((1, -1)) - pd.reshape((1, -1)))
array([[-0.36295992, -0.36100584, -0.36085624, …, 0.36085624,
0.3610059 , 0.36295998]], dtype=float32)
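A caveat about that comparison (a small numpy sketch with synthetic differences): the signed mean can be essentially zero even when individual differences are large, because positive and negative errors cancel. The mean absolute difference is a better check for whether two prediction sets actually agree.

```python
import numpy as np

rng = np.random.RandomState(0)
diff = rng.uniform(-0.36, 0.36, size=10000)
diff = diff - diff.mean()   # force the signed mean to ~0, like np.mean(p - pd)

signed_mean = np.mean(diff)          # tiny: cancellation hides the disagreement
max_abs = np.abs(diff).max()         # large: individual entries still differ a lot
mean_abs = np.mean(np.abs(diff))     # the more informative summary
```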
Sorry - I just realized this is hard to read. I will try a different way of posting going forward, either linking to the code on GitHub or something else.
If anyone would have any idea what might be happening here, I would really appreciate your help.
BTW, I cut off the FC layers at the bottom of the first FC layer, not where @jeremy did in the lecture (below the Flatten layer, IIRC), but I do not think that should make a difference.