Poor submissions for Dogs Vs. Cats Resnet version

Hi,

I’m trying to apply Resnet50 to the Dogs vs. Cats competition. Previously I used VGG19 to get a log loss of 0.088, and I was expecting better results with Resnet. I did get better results for the most part: my validation accuracy was 0.9890. But when I submit to the competition I get a loss of around 1.6, which is very weird. I’m reproducing the relevant bits of the code here. Please take a look and let me know if I’m missing something.

    test_batches = get_batches(test_path, shuffle=False, batch_size=64)
    test_features = rn0.predict_generator(test_batches, test_batches.nb_sample)

    preds = model.predict(test_features, batch_size=128)  # get predictions

    # How I'm clipping my predictions
    isdog = np.clip(preds[:, 1], 0.05, 0.975)
    isdog[:5]

I’m getting really confused here. It would be great if you could shed some light on this. :slight_smile:
Thanks in advance.

EDIT: I had made a mistake with matching filenames while generating the preds. After fixing it I got a slightly better result than VGG19 and moved up 5 places on the leaderboard. I got a further small improvement by adding batch norm to the average-pooling model, since I saw it was overfitting. Now trying data augmentation, etc.
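In case anyone else hits the same thing, this is roughly how I line the ids up with the predictions now (a minimal sketch; the 'unknown/' subfolder and the pandas approach are my own choices, not from the notebook):

    import numpy as np
    import pandas as pd

    # The rows of preds only line up with test_batches.filenames because the
    # test generator was created with shuffle=False.
    filenames = test_batches.filenames                 # e.g. 'unknown/1234.jpg'
    ids = [int(f.split('/')[-1].split('.')[0]) for f in filenames]
    isdog = np.clip(preds[:, 1], 0.05, 0.975)

    subm = pd.DataFrame({'id': ids, 'label': isdog}).sort_values('id')
    subm.to_csv('submission.csv', index=False)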

Not sure if it is the reason, but you seem to be doing a lot of clipping. Also, is there a reason why you are doing it asymmetrically? 0.05 vs 0.975?

EDIT: I looked at the math and the clipping can’t account for such a large difference, but it’s still something worth looking at.

I was following the default notebook when it comes to clipping the values. I tried a different range of values, which gave me different results on the leaderboard; this one seems to be giving a good result so far. But I didn’t understand what you mean by clipping it “asymmetrically”.

You’re clipping 0.05 from the bottom and 0.025 from the top. I go for the same margin in both directions, but I guess it could actually make sense to treat them separately.
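For what it’s worth, this is the quick check behind my earlier edit (my own numbers; the per-image penalty at a clip boundary is just the negative log of the clipped probability):

    import numpy as np

    # Worst-case per-image log-loss penalty at the clip boundaries:
    # -log(lo)     if the true label is dog but we predicted the lower bound,
    # -log(1 - hi) if the true label is cat but we predicted the upper bound.
    for lo, hi in [(0.05, 0.975), (0.05, 0.95), (0.025, 0.975)]:
        print(lo, hi, round(-np.log(lo), 3), round(-np.log(1 - hi), 3))

So even a confidently wrong image costs at most roughly 3 to 3.7 with these bounds; you’d need a large fraction of the test set confidently wrong before the average climbs to 1.6.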

Also, I’m noticing something quite weird and I don’t know why it happens. When I try to use just the Resnet50 model to finetune and fit the data, it’s pretty far off. Here’s a sample of my code. Let me know if you can see why this is happening.

batches = get_batches(path+'train',shuffle=False,batch_size=64)
val_batches = get_batches(path+'valid',shuffle=False,batch_size=64)
res_default = Resnet50()
res_default.finetune(batches)
res_default.fit(batches,val_batches)
Epoch 1/1
22800/22800 [==============================] - 329s - loss: nan - acc: 0.5000 - val_loss: nan - val_acc: 0.5000

Is it because I’ve set shuffle=False? If I understand shuffling right, it shouldn’t make a difference when both my train and val batches have shuffle=False set, right?


There is something more serious amiss - you are getting a nan value, which can result from, for example, dividing by 0 or taking the log of 0. Once you get a nan value it can pollute all your other calculations.
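To make the pollution point concrete, a quick numpy illustration (just a toy demo, not from your notebook):

    import numpy as np

    p = np.array([0.0, 0.5, 1.0])
    print(np.log(p))               # [-inf, -0.693,  0.]  -- log of an exact 0
    print(0 * np.log(p))           # [ nan, -0.   ,  0.]  -- 0 * -inf is nan
    print((0 * np.log(p)).mean())  # nan: one nan poisons the whole average

Cross-entropy is built out of exactly these y * log(p) terms, which is why an unclipped 0/1 prediction or a blown-up activation during training can turn the entire loss into nan.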

Exactly! I tried this in a new notebook and I’m still getting the exact same issue. I don’t have this problem when running VGG or the other custom convnets I’m using, only when running Resnet this way.

Can you try it on the redux dataset and see if you run into the same issue? That would help a lot.

@jeremy is this normal? Any insight on what I could be doing wrong?

Thanks!

No, not normal. Not shuffling is definitely an error, so you should at least fix that. I can’t tell what other errors you have without seeing all your code.

Right. So I thought that not shuffling could be it and created a new notebook where I’m just replicating the same code but with shuffling, and I’m still getting nan as the loss.
I’ve just created a gist for you to go through. I can’t seem to find an obvious error, or else I’m making a very, very stupid mistake. :confused:
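For reference, the only change from the earlier snippet is the training batches (same fast.ai helpers as before):

    batches = get_batches(path+'train', shuffle=True, batch_size=64)    # was shuffle=False
    val_batches = get_batches(path+'valid', shuffle=False, batch_size=64)
    res_default = Resnet50()
    res_default.finetune(batches)
    res_default.fit(batches, val_batches)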

I noticed this too and was just coming on to check. I think something’s incorrect with the resnet50.py implementation.

I am also getting loss=nan when using resnet50.py on Fisheries.

I did too! Did you have any luck figuring out the issue? I ran a couple of print statements, and the type of the Resnet model was actually keras.engine.training.Model, whereas the type of the Vgg16BN model was keras.models.Sequential. I didn’t understand why that was the case, but I could not perform the pop() operation on the Resnet50 model.

Please let me know if anybody else has had luck with resnet.

Thanks!

I used the built-in ResNet50 model from keras.applications successfully.

FYI it requires VGG preprocessing.

The other two (InceptionV3 and Xception) have their own preprocessing (just 2*(x/255 - 0.5)).
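Roughly how I use it (a sketch; the imgs array is mine, and this assumes TensorFlow dim ordering):

    from keras.applications.resnet50 import ResNet50, preprocess_input

    base = ResNet50(include_top=False, weights='imagenet')

    # imgs: a float32 array of shape (n, 224, 224, 3), RGB, values 0-255
    x = preprocess_input(imgs.astype('float32'))   # RGB->BGR + ImageNet mean subtraction, same as VGG
    features = base.predict(x, batch_size=64)

    # For InceptionV3 / Xception you'd instead scale with their own 2*(x/255 - 0.5)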

Silly mistake: I forgot to precompute the features and then add the remaining fully connected / GlobalAveragePooling2D layers on top. Jeremy got amazing results on cats vs dogs, but I’m trying it on the Fisheries contest and getting really bad results. Did anybody have any luck getting good results with the Resnet50 model on Fisheries?

Thanks!
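In case it helps, this is roughly the shape of what I meant by precomputing and then adding the head (a sketch; the feature arrays, dropout rate, and epoch count are my own choices):

    from keras.models import Sequential
    from keras.layers import GlobalAveragePooling2D, Dropout, Dense

    # trn_features / val_features: conv features saved from the ResNet50 base,
    # e.g. shape (n, 7, 7, 2048) for 224x224 inputs with TensorFlow dim ordering.
    head = Sequential([
        GlobalAveragePooling2D(input_shape=trn_features.shape[1:]),
        Dropout(0.5),
        Dense(8, activation='softmax'),   # 8 Fisheries classes
    ])
    head.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    head.fit(trn_features, trn_labels, validation_data=(val_features, val_labels),
             nb_epoch=3, batch_size=64)   # nb_epoch in Keras 1; epochs=3 in Keras 2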

It’s interesting to read the 4th-place interview:

https://blog.kaggle.com/2017/04/03/dogs-vs-cats-redux-playground-competition-winners-interview-bojan-tunguz/

I had a problem with pop(), but model.layers.pop() worked. Similarly, model.add() doesn’t work, but model.layers.append() works to some degree, and then it complains about the model not being built!!!
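An alternative that avoids pop()/add() entirely is the functional API: build a new Model that ends where you want and attach the new head (a sketch; rn0 here is the underlying Keras ResNet50, and layers[-2] assumes the final 1000-way fc layer is last):

    from keras.models import Model
    from keras.layers import Dense

    # Cut at the flattened 2048-d feature just before the ImageNet classifier,
    # attach a new 2-way head, then freeze the base layers for finetuning.
    out = Dense(2, activation='softmax')(rn0.layers[-2].output)
    model = Model(input=rn0.input, output=out)   # inputs=/outputs= in Keras 2

    for layer in model.layers[:-1]:
        layer.trainable = False
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])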

I am getting the same error! I added shuffle=True and it doesn’t help. I’m trying ResNet on a dataset other than cats vs dogs and the same issue remains. Was anybody able to solve this?

I’m getting nan as loss values as well on a different dataset, and I’ve not been able to solve it yet :confused:

I’m also getting nan loss values with the provided resnet class. I’m working on the invasive species Kaggle competition, which only has two classes, so I’ve tried:

  • Setting the final dense layer to have 1 output with sigmoid activation and setting the loss function to binary_crossentropy
  • Setting the final dense layer to have 2 outputs with softmax activation and setting the loss function to categorical_crossentropy/mean_squared_error (neither worked)

Is there something else I’m missing here?
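For concreteness, the two variants look roughly like this on top of precomputed 2048-dim ResNet features (the input_shape and optimizer are just my choices):

    from keras.models import Sequential
    from keras.layers import Dense

    # Variant 1: one sigmoid output, labels are 0/1, loss is binary_crossentropy
    head_sigmoid = Sequential([Dense(1, activation='sigmoid', input_shape=(2048,))])
    head_sigmoid.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    # Variant 2: two softmax outputs, labels are one-hot, loss is categorical_crossentropy
    head_softmax = Sequential([Dense(2, activation='softmax', input_shape=(2048,))])
    head_softmax.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Both give me nan either way, which makes me suspect the problem is somewhere upstream of the head (labels, preprocessing, or the features themselves) rather than in the choice of loss.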