It was something folks in the competition discussed on the Kaggle forum; just some trial and error, really…
Thanks, Jeremy. I did more training with an even larger learning rate for the earlier layers (assuming the images are very different from ImageNet, and wanting to experiment beyond the provided suggestions):
lrs = np.array([lr/4,lr/2,lr])
th = opt_th(val_prob, y, start=0.14, end=0.30, step=0.005)
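As I understand it, the three entries of lrs are learning rates for three layer groups: earliest layers first, the newly added head last. Here is a minimal numpy sketch of what that means for one plain SGD step. The weights and gradients for the three groups are made up, and sgd_step_per_group is a hypothetical helper, not a fastai function:

```python
import numpy as np

def sgd_step_per_group(weights, grads, lrs):
    """One plain SGD step in which each layer group uses its own learning rate.

    weights, grads: lists of arrays, one per layer group (earliest group first).
    lrs: one learning rate per group, e.g. np.array([lr/4, lr/2, lr]).
    """
    assert len(weights) == len(grads) == len(lrs)
    return [w - group_lr * g for w, g, group_lr in zip(weights, grads, lrs)]

lr = 0.01  # illustrative base learning rate, not a value from the thread
lrs = np.array([lr / 4, lr / 2, lr])

# Toy weights and identical gradients for three hypothetical layer groups.
weights = [np.zeros(2), np.zeros(3), np.zeros(4)]
grads = [np.ones(2), np.ones(3), np.ones(4)]

new_weights = sgd_step_per_group(weights, grads, lrs)
for group, w in enumerate(new_weights):
    print(group, w)
```

With the same gradient, the earliest group moves a quarter as far as the head. Moving the early-layer factors closer to 1 lets those layers drift further from their ImageNet weights, which is the point of the experiment when the images look nothing like ImageNet.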
This time I ended up with a threshold of 0.195, and the private score stayed the same: 0.93007. I love the idea of deriving the threshold empirically from validation data; it's a bit like probability calibration on hold-out data.
I have a number of questions. I didn't see them answered anywhere else, but sorry if I missed it!
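As far as I know, opt_th from the course code grid-searches a single threshold that maximizes F2 on the validation set, using sklearn's fbeta_score. The sketch below is a numpy-only stand-in under that assumption. The names f2_score and find_threshold are hypothetical, the toy data is made up, and samples with no true labels are handled more crudely than in sklearn:

```python
import numpy as np

def f2_score(y_true, y_pred, eps=1e-9):
    """Sample-averaged F-beta (beta=2) for multi-label 0/1 arrays."""
    beta2 = 4.0  # beta squared
    tp = (y_true * y_pred).sum(axis=1)
    precision = tp / (y_pred.sum(axis=1) + eps)
    recall = tp / (y_true.sum(axis=1) + eps)
    f2 = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
    return f2.mean()

def find_threshold(val_prob, y, start=0.14, end=0.30, step=0.005):
    """Grid-search one global threshold that maximizes F2 on validation data."""
    thresholds = np.arange(start, end, step)
    scores = [f2_score(y, (val_prob > th).astype(float)) for th in thresholds]
    return thresholds[int(np.argmax(scores))]  # first best threshold on ties

# Toy validation set: 2 samples, 3 labels (made-up numbers).
val_prob = np.array([[0.9, 0.2, 0.1], [0.25, 0.8, 0.05]])
y = np.array([[1, 1, 0], [1, 1, 0]], dtype=float)
th = find_threshold(val_prob, y)
print(th)
```

Tuning one global threshold rather than one per label keeps the search cheap and less prone to overfitting a small validation set.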
@jeremy mentioned in class that because the ImageNet models were trained on, for example, sz=299 images, unfreezing the layers and training them at a different size such as sz=64 will clobber the ImageNet weights and ruin the earlier layers. But that seems to be exactly what he does in the lesson 2 notebook: after training the last layers, he unfreezes and trains the whole network. Why doesn't that clobber the earlier weights, given that training happens at a different image size?
Why do we sometimes use sz=224 and sometimes sz=299? Does it depend on the size that particular architecture was trained on? So resnet34 requires sz=224 and resnext50 requires sz=299? (By “requires” I mean that this sz should be used to avoid clobbering the earlier weights if you unfreeze all layers.)
What is the point of changing the size of the images, freezing the layers, training, then unfreezing and fitting again? Is that intended to counter overfitting?
What happens to the layers of the network when we call learn.set_data()? Does it strip off the later layers and add new ones, or does it just add layers onto the end?
We only used data.resize() when we set sz=64. Since it's so much faster, why didn't we use it again for the other sizes? Is it only included in the notebook to show a faster way to resize, even though it isn't really necessary for this dataset?
- 224 and 299 are standard image sizes (Jeremy mentioned this once).
- We need to change the size because very tiny images won't make any sense…
- Regarding freezing and unfreezing: we are trying to improve our accuracy, prevent overfitting and minimise the loss…
That's right, but in this case our images (satellite) are very different from what ImageNet is trained on (standard photos), so I don't expect to need to keep all the details of the pretrained weights.
Yes, it depends on what it was originally trained on. We don't have to use the same size it was trained on, but sometimes you get better results if you do.
Yes, changing size is designed to avoid overfitting.
set_data doesn’t change the model at all. It just gives it new data to train with.
Once the input size is reasonably big, preprocessing is no longer the bottleneck, so resizing to 128x128 or larger doesn't really help.
Thanks @jeremy. Your responses, taken together with the fact that whenever we change the image size with set_data we freeze the convolutional layers, train, and then unfreeze and train again, suggest that we are still worried about the impact of different image sizes on the convolutional weights, even though these images are significantly different from ImageNet.
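The workflow described here is a loop over image sizes. Here is a minimal sketch of just that ordering, assuming a hypothetical fit(sz, frozen) callback. It stands in for learn.set_data(...) at the new size, then learn.freeze() or learn.unfreeze(), then learn.fit(...). The sizes are illustrative:

```python
def progressive_resizing(fit, sizes=(64, 128, 256)):
    """Run the freeze -> fit -> unfreeze -> fit cycle once per image size.

    `fit(sz, frozen)` is a hypothetical callback; its return values are collected.
    """
    log = []
    for sz in sizes:
        log.append(fit(sz, frozen=True))   # retune only the head at the new size
        log.append(fit(sz, frozen=False))  # then fine-tune the whole network
    return log

def fake_fit(sz, frozen):
    # Placeholder: a real version would set data at size sz, (un)freeze and fit.
    return (sz, "frozen" if frozen else "unfrozen")

schedule = progressive_resizing(fake_fit)
print(schedule)
```

Whether the frozen step is needed after the first size is exactly what the question asks. The sketch only makes the order explicit.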
Is the thinking behind freezing and unfreezing when we change sizes that the weights in the fully connected layers, although no longer random, should be tuned to the new image size before we unfreeze and train the convolutional layers at that size? Is this something you learned from trial and error, or is there a theory behind it that you can articulate?
I get that you shouldn't unfreeze the convolutional layers while the fully connected layers are still random, but I'm having trouble extending that insight to the case where we've already trained the fully connected layers, albeit on differently sized images.
Thanks for this. I'm trying to make my first submission for this competition, but I'm a bit confused about learn.TTA(). Its docstring says the outputs are log_preds, but you treat them as probabilities, since you later compare them with a threshold. And yet you seem to be getting great results. Why is that?
Jeremy made a few changes to learn.TTA() a few days back…
Have a look there; a search of the forum might also help…
This link might help…
Thanks. I looked at the link you provided, but it doesn't seem to be related to what I was asking! At any rate, do you remember whether learn.TTA() returned log_preds or probabilities when you posted the link above 16 days ago?
TTA now returns predictions for each of the n_aug augmentations, so you need to:
log_preds, y = learn.TTA()
preds = np.mean(np.exp(log_preds), 0)
This should work…
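A small numpy illustration of the shapes involved, assuming TTA gives an array of shape (n_aug, n_samples, n_classes) of log-probabilities. The shapes and random values are made up. Taking exp and then averaging over axis 0 gives one probability per sample and class:

```python
import numpy as np

rng = np.random.default_rng(0)
n_aug, n_samples, n_classes = 5, 4, 17  # illustrative shapes

# Stand-in for TTA output: per-augmentation log-probabilities,
# shape (n_aug, n_samples, n_classes).
probs_per_aug = rng.uniform(0.01, 0.99, size=(n_aug, n_samples, n_classes))
log_preds = np.log(probs_per_aug)

# Arithmetic mean in probability space over the augmentation axis.
preds = np.mean(np.exp(log_preds), 0)

# For comparison: averaging the logs first gives the geometric mean instead.
geo = np.exp(np.mean(log_preds, 0))
print(preds.shape)
```

The two averages are not interchangeable. The geometric mean is never larger than the arithmetic mean, so a threshold tuned on one will not transfer exactly to the other.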
I'm getting predictions with learn.TTA(), but the results are strange. I take the mean just like in lesson 2, but I get a very bad F2 metric on the validation set unless I use a threshold of 1.21. Does this mean that instead of log predictions TTA returns log(p) + 1?
Since we can't see the code where you use TTA, it's hard to know what's happening here…
Looks fine - something else is going on in your model… I don’t think it’s specific to TTA.
I see. Thank you so much. I was wondering why I was getting a three-dimensional output, but I didn't realize the code had been changed.
Have you resolved this issue? I was getting similar results (0.48 accuracy). It seems the newer version of fastai.learn.TTA() outputs probabilities, not log_probs. So instead of taking the exp and then the mean, I just took the mean (raw_preds = learn.TTA(), preds = np.mean(raw_preds, 0)), and accuracy improved to 0.92.
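If TTA already returns probabilities p in [0, 1], applying np.exp by mistake maps them into [1, e]. That would explain the threshold of 1.21 reported above: e^p > 1.21 is the same as p > ln 1.21 ≈ 0.19, which is close to the thresholds found earlier in the thread. Below is a sketch of a defensive check. It assumes the only two possibilities are log-probabilities (all ≤ 0) and probabilities (all in [0, 1]). as_probs is a hypothetical helper, not part of fastai:

```python
import math
import numpy as np

def as_probs(raw):
    """Return probabilities, exponentiating only if `raw` looks like log-probs.

    Heuristic: log-probabilities are all <= 0; probabilities are all in [0, 1].
    An all-zero array is ambiguous and is treated as probabilities here.
    """
    raw = np.asarray(raw, dtype=float)
    if raw.min() >= 0.0 and raw.max() <= 1.0:
        return raw
    if raw.max() <= 0.0:
        return np.exp(raw)
    raise ValueError("values are neither probabilities nor log-probabilities")

# Exponentiating probabilities shifts every value into [1, e]:
# a threshold of 1.21 on exp(p) corresponds to p = ln(1.21).
print(round(math.log(1.21), 3))  # 0.191
```

Used as preds = np.mean(as_probs(tta_out), 0), where tta_out is the per-augmentation array from learn.TTA(), the same line works whether the installed version returns log-probabilities or probabilities.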
@jeremy can you comment on this?
Also, I have 16 GB of RAM, but whenever I try to run learn.TTA(is_test=True), my kernel dies. I have tried adding up to 48 GB of swap files on my NVMe SSD, but it doesn't help. Any suggestions?
In this particular competition I had reduced my image size to 300x300, and then it didn't run out of memory…
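Whatever is actually filling memory during TTA, anything proportional to pixel count scales with the square of the side length, which is why shrinking images helps so much. Here is a back-of-the-envelope sketch. It assumes, hypothetically, that images are held in RAM as 3-channel float32 arrays. The image count and the 600x600 comparison size are purely illustrative:

```python
def image_bytes(n_images, side, channels=3, bytes_per_value=4):
    """Memory needed to hold n_images square float32 images in RAM."""
    return n_images * side * side * channels * bytes_per_value

n = 10_000  # illustrative image count, not the size of any real test set
for side in (300, 600):
    gb = image_bytes(n, side) / 1e9
    print(f"{side}x{side}: {gb:.1f} GB")
```

Halving the side length cuts this estimate by 4x. If every augmented copy is materialized at once, it also multiplies by the number of augmentations.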