Poor submissions for Dogs Vs. Cats Resnet version

Hi,

I’m trying to apply Resnet50 to the Dogs vs. Cats competition. Previously I used VGG19 to get a log loss of 0.088, and I was expecting better results with Resnet. I did get better results for the most part: my validation accuracy was 0.9890. But when I submit to the competition I get a loss of around 1.6, which is very weird. I’m reproducing the relevant bits of the code here. Please take a look and let me know if I’m missing something.

    test_batches = get_batches(test_path, shuffle=False, batch_size=64)
    test_features = rn0.predict_generator(test_batches, test_batches.nb_sample)

    preds = model.predict(test_features, batch_size=128)  # get predictions

    # How I'm clipping my predictions
    isdog = np.clip(preds[:, 1], 0.05, 0.975)
    isdog[:5]

I’m getting really confused here. It would be great if you could shed some light on this. :slight_smile:
Thanks in advance.

EDIT: I had made a mistake with matching filenames while generating the preds. After fixing it I got a slightly better result than VGG19 and moved up 5 places on the leaderboard. I got a further small improvement by adding batch norm to the average-pooling model, since I saw it was overfitting. Now trying data augmentation, etc.
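In case anyone else hits the same thing, this is roughly how I line the ids up with the predictions now (a minimal sketch; the 'unknown/' subfolder and the pandas approach are my own choices, not from the notebook):

    import numpy as np
    import pandas as pd

    # The rows of preds only line up with test_batches.filenames because the
    # test generator was created with shuffle=False.
    filenames = test_batches.filenames                 # e.g. 'unknown/1234.jpg'
    ids = [int(f.split('/')[-1].split('.')[0]) for f in filenames]
    isdog = np.clip(preds[:, 1], 0.05, 0.975)

    subm = pd.DataFrame({'id': ids, 'label': isdog}).sort_values('id')
    subm.to_csv('submission.csv', index=False)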

Not sure if it is the reason, but you seem to be doing a lot of clipping. Also, is there a reason why you are doing it asymmetrically? 0.05 vs 0.975?

EDIT: I looked at the math and the clipping can’t account for such a large difference, but it’s still something worth looking at.

I was following the default notebook when it comes to clipping the values. I tried a different range of values, which gave me different results on the leaderboard; this one seems to be giving a good result so far. But I didn’t understand what you mean by clipping it “asymmetrically”.

You’re clipping 0.05 from the bottom and 0.025 from the top. I go for the same margin in both directions, but I guess it could actually make sense to treat them separately.
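For what it’s worth, this is the quick check behind my earlier edit (my own numbers; the per-image penalty at a clip boundary is just the negative log of the clipped probability):

    import numpy as np

    # Worst-case per-image log-loss penalty at the clip boundaries:
    # -log(lo)     if the true label is dog but we predicted the lower bound,
    # -log(1 - hi) if the true label is cat but we predicted the upper bound.
    for lo, hi in [(0.05, 0.975), (0.05, 0.95), (0.025, 0.975)]:
        print(lo, hi, round(-np.log(lo), 3), round(-np.log(1 - hi), 3))

So even a confidently wrong image costs at most roughly 3 to 3.7 with these bounds; you’d need a large fraction of the test set confidently wrong before the average climbs to 1.6.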

Also, I’m noticing something quite weird and I don’t know why it happens. When I try to use just the Resnet50 model to finetune and fit the data, it’s pretty far off. Here’s a sample of my code. Let me know if you can see why this is happening.

batches = get_batches(path+'train',shuffle=False,batch_size=64)
val_batches = get_batches(path+'valid',shuffle=False,batch_size=64)
res_default = Resnet50()
res_default.finetune(batches)
res_default.fit(batches,val_batches)
Epoch 1/1
22800/22800 [==============================] - 329s - loss: nan - acc: 0.5000 - val_loss: nan - val_acc: 0.5000

Is it because I’ve set shuffle=False? If I understand shuffling right, it shouldn’t make a difference when both my train and val batches have shuffle=False set, right?


There is something more serious amiss - you are getting a nan value, which can result from, for example, dividing by 0 or taking the log of 0. Once you get a nan value it can pollute all your other calculations.
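To make the pollution point concrete, a quick numpy illustration (just a toy demo, not from your notebook):

    import numpy as np

    p = np.array([0.0, 0.5, 1.0])
    print(np.log(p))               # [-inf, -0.693,  0.]  -- log of an exact 0
    print(0 * np.log(p))           # [ nan, -0.   ,  0.]  -- 0 * -inf is nan
    print((0 * np.log(p)).mean())  # nan: one nan poisons the whole average

Cross-entropy is built out of exactly these y * log(p) terms, which is why an unclipped 0/1 prediction or a blown-up activation during training can turn the entire loss into nan.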

Exactly! I tried this in a new notebook and I’m still getting the exact same issue. I don’t have this problem when running VGG or the other custom convnets I’m using, only when running Resnet this way.

Can you try it on the redux dataset and see if you run into the same issue? That would help a lot.

@jeremy is this normal? Any insight on what I could be doing wrong?

Thanks!

No, not normal. Not shuffling is definitely an error, so you should at least fix that. I can’t tell what other errors you have without seeing all your code.

Right. So I thought that not shuffling could be it and created a new notebook where I’m just replicating the same code but with shuffling, and I’m still getting nan as the loss.
I’ve just created a gist for you to go through. I can’t seem to find an obvious error, or else I’m making a very, very stupid mistake. :confused:
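For reference, the only change from the earlier snippet is the training batches (same fast.ai helpers as before):

    batches = get_batches(path+'train', shuffle=True, batch_size=64)    # was shuffle=False
    val_batches = get_batches(path+'valid', shuffle=False, batch_size=64)
    res_default = Resnet50()
    res_default.finetune(batches)
    res_default.fit(batches, val_batches)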

I noticed this too and was just coming on to check. I think something’s incorrect with the resnet50.py implementation.

I am also getting loss=nan when using resnet50.py on Fisheries.

I did too! Did you have any luck figuring out the issue? I ran a couple of print statements, and the type of the Resnet model was actually keras.engine.training.Model, whereas the type of the Vgg16BN model was keras.models.Sequential. I didn’t understand why that was the case, but I could not perform the pop() operation on the Resnet50 model.

Please let me know if anybody else has had luck with resnet.

Thanks!

I used the built-in ResNet50 model from keras.applications successfully.

FYI it requires VGG preprocessing.

The other two (InceptionV3 and Xception) have their own preprocessing (just 2*(x/255 - 0.5)).
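Roughly how I use it (a sketch; the imgs array is mine, and this assumes TensorFlow dim ordering):

    from keras.applications.resnet50 import ResNet50, preprocess_input

    base = ResNet50(include_top=False, weights='imagenet')

    # imgs: a float32 array of shape (n, 224, 224, 3), RGB, values 0-255
    x = preprocess_input(imgs.astype('float32'))   # RGB->BGR + ImageNet mean subtraction, same as VGG
    features = base.predict(x, batch_size=64)

    # For InceptionV3 / Xception you'd instead scale with their own 2*(x/255 - 0.5)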

Silly mistake: I forgot to precompute the features and then add the remaining fully connected / GlobalAveragePooling2D layers on top. Jeremy got amazing results on cats vs dogs, but I’m trying it on the Fisheries contest and getting really bad results. Did anybody have any luck getting good results with the Resnet50 model on Fisheries?

Thanks!
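In case it helps, this is roughly the shape of what I meant by precomputing and then adding the head (a sketch; the feature arrays, dropout rate, and epoch count are my own choices):

    from keras.models import Sequential
    from keras.layers import GlobalAveragePooling2D, Dropout, Dense

    # trn_features / val_features: conv features saved from the ResNet50 base,
    # e.g. shape (n, 7, 7, 2048) for 224x224 inputs with TensorFlow dim ordering.
    head = Sequential([
        GlobalAveragePooling2D(input_shape=trn_features.shape[1:]),
        Dropout(0.5),
        Dense(8, activation='softmax'),   # 8 Fisheries classes
    ])
    head.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    head.fit(trn_features, trn_labels, validation_data=(val_features, val_labels),
             nb_epoch=3, batch_size=64)   # nb_epoch in Keras 1; epochs=3 in Keras 2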

It’s interesting to read the 4th-place interview:

https://blog.kaggle.com/2017/04/03/dogs-vs-cats-redux-playground-competition-winners-interview-bojan-tunguz/

I had a problem with pop(), but model.layers.pop() worked. Similarly, model.add() doesn’t work, but model.layers.append() works to some degree, and then it complains about the model not being built!!!
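An alternative that avoids pop()/add() entirely is the functional API: build a new Model that ends where you want and attach the new head (a sketch; rn0 here is the underlying Keras ResNet50, and layers[-2] assumes the final 1000-way fc layer is last):

    from keras.models import Model
    from keras.layers import Dense

    # Cut at the flattened 2048-d feature just before the ImageNet classifier,
    # attach a new 2-way head, then freeze the base layers for finetuning.
    out = Dense(2, activation='softmax')(rn0.layers[-2].output)
    model = Model(input=rn0.input, output=out)   # inputs=/outputs= in Keras 2

    for layer in model.layers[:-1]:
        layer.trainable = False
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])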

I am getting the same error! I added shuffle=True and it doesn’t help. I’m trying ResNet on a dataset other than cats vs dogs and the same issue remains. Was anybody able to solve this?

I’m getting nan as loss values as well on a different dataset, and I’ve not been able to solve it yet :confused:

I’m also getting nan loss values with the provided resnet class. I’m working on the invasive species Kaggle competition, which only has two classes, so I’ve tried:

  • Setting the final dense layer to have 1 output with sigmoid activation and setting the loss function to binary_crossentropy
  • Setting the final dense layer to have 2 outputs with softmax activation and setting the loss function to categorical_crossentropy/mean_squared_error (neither worked)

Is there something else I’m missing here?
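For concreteness, the two variants look roughly like this on top of precomputed 2048-dim ResNet features (the input_shape and optimizer are just my choices):

    from keras.models import Sequential
    from keras.layers import Dense

    # Variant 1: one sigmoid output, labels are 0/1, loss is binary_crossentropy
    head_sigmoid = Sequential([Dense(1, activation='sigmoid', input_shape=(2048,))])
    head_sigmoid.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    # Variant 2: two softmax outputs, labels are one-hot, loss is categorical_crossentropy
    head_softmax = Sequential([Dense(2, activation='softmax', input_shape=(2048,))])
    head_softmax.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Both give me nan either way, which makes me suspect the problem is somewhere upstream of the head (labels, preprocessing, or the features themselves) rather than in the choice of loss.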