Thanks Jeremy. I did more training with even larger learning rates for the earlier layers (on the assumption that the images are very different from ImageNet, and because I wanted to experiment beyond the suggested values):
lrs = np.array([lr/4,lr/2,lr])
th = opt_th(val_prob, y, start=0.14, end=0.30, step=0.005)
This time I ended up with a th of 0.195; the private score stayed the same at 0.93007. I love the idea of empirically deriving the threshold from validation data, kind of like performing probability calibration using held-out data.
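For readers who want to see what a validation-based threshold search looks like, here is a minimal NumPy sketch. It is not the opt_th function used above (its source is not shown in this thread); the names f2_samples and best_threshold are hypothetical, and the sketch assumes multi-label 0/1 targets and per-sample averaged F2, with rows that have no true and no predicted labels scored as 0.

```python
import numpy as np

def f2_samples(y_true, y_pred, beta=2.0):
    """Mean per-sample F-beta for 0/1 arrays of shape (n_samples, n_classes)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = (y_true & y_pred).sum(axis=1)
    fp = (~y_true & y_pred).sum(axis=1)
    fn = (y_true & ~y_pred).sum(axis=1)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Assumption: a row with no true and no predicted labels scores 0.
    scores = np.where(denom > 0, (1 + b2) * tp / np.maximum(denom, 1), 0.0)
    return float(scores.mean())

def best_threshold(probs, y_true, start=0.14, end=0.30, step=0.005):
    """Return the candidate threshold with the highest validation F2."""
    ths = np.arange(start, end, step)
    scores = [f2_samples(y_true, np.asarray(probs) > th) for th in ths]
    return float(ths[int(np.argmax(scores))])

# Tiny hand-made validation set: 2 samples, 2 classes.
y_val = np.array([[1, 0], [1, 0]])
p_val = np.array([[0.20, 0.10], [0.25, 0.16]])
th = best_threshold(p_val, y_val)
print(th, f2_samples(y_val, p_val > th))
```

Because the threshold is chosen on the same validation set it is scored on, the resulting F2 is slightly optimistic; the leaderboard score is the fairer check.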
I have a number of questions. I didn’t see these answered anywhere else but sorry if I missed it!
@jeremy mentioned in class that because the ImageNet models were trained on, for example, sz=299 images, if we unfreeze the layers and train them with a different size like sz=64, it will clobber the ImageNet weights and ruin the earlier layers. But it seems like in the lesson 2 notebook, that’s exactly what he does. After he trains the last layers, he then unfreezes and trains the whole thing. Why doesn’t that clobber the earlier weights since the training is happening with a different size image?
Why do we sometimes use sz=224 and sometimes use sz=299? Is it about which size that particular architecture was trained on? So resnet34 requires sz=224 and resnext50 requires sz=299? (By “required” I mean that sz should be used to avoid clobbering the earlier weights if you unfreeze all layers.)
What is the point of changing the size of the images, freezing the layers, training, unfreezing and fitting again? Is that intended to counter overfitting?
What is happening to the layers of the network when we call learn.set_data()? Is it stripping off later layers and adding new ones, or is it just adding layers onto the end?
We only used data.resize() when we set sz=64. Since it’s so much faster, why didn’t we use that function again when we had the other sizes? Is that only included in the notebook to show a faster way resizing could be done, even though it’s not really necessary on this data set?
That’s right - but in this case, our images (satellite) are very different to what imagenet is trained on (standard photos), so I don’t expect to need to keep all the details of the pretrained weights.
Thanks @jeremy. Taking your responses together with the fact that we freeze the convolutional layers when we change the image sizes with set_data, train, and then unfreeze the layers and train again, suggests that we are still worried about the impact of different image sizes on the weights in the convolutional layers, even though these images are significantly different from ImageNet.
Is the thinking behind the freezing and unfreezing when we change sizes that when you change the sizes of the images, the weights in the fully connected layers, although they aren’t random anymore, really should be tuned to the new image sizes before we unfreeze and train the convolutional layers on the new image sizes? Is this something you just learned from trial and error or is there a theory you can articulate behind this?
I get that you shouldn’t unfreeze the convolutional layers when the fully-connected layers are initially random, but I guess I’m having trouble getting comfortable extending that insight to when we’ve already trained the fully connected layers, albeit on differently sized images.
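As background for the size questions above, here is a small NumPy sketch of why the same convolutional weights can be applied to images of different sizes at all. It is not fastai code and does not describe what learn.set_data() does internally. It only assumes a network whose convolutional part ends in a global average pooling step; conv2d_valid and pooled_features are hypothetical helper names.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """'Valid' cross-correlation of a 2-D image with a 2-D kernel."""
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(kh):
        for j in range(kw):
            # Shifted copy of the image, weighted by one kernel entry.
            out += kernel[i, j] * img[i:i + h - kh + 1, j:j + w - kw + 1]
    return out

def pooled_features(img, kernels):
    """One global-average-pooled activation per kernel."""
    return np.array([conv2d_valid(img, k).mean() for k in kernels])

rng = np.random.default_rng(0)
kernels = rng.normal(size=(4, 3, 3))   # the "weights": shape is independent of image size
small = rng.random((64, 64))           # stand-in for an sz=64 image
large = rng.random((128, 128))         # stand-in for a larger image

print(conv2d_valid(small, kernels[0]).shape, conv2d_valid(large, kernels[0]).shape)
print(pooled_features(small, kernels).shape, pooled_features(large, kernels).shape)
```

The kernels have the same shape for both inputs, and after pooling both inputs give a feature vector of the same length, so the layers after the pooling step see the same input shape. Whether the weight values remain well tuned at the new size is the separate, empirical question raised in the posts above.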
Thanks for this. I am trying to make my first submission for this competition but I’m a bit confused about learn.TTA(). Its docstring indicates that the outputs are log_preds, but you are treating them as probs, since later you compare them with a threshold. And yet it seems you are getting great results. Why is that?
Thanks. I looked at the link you provided but it does not seem to be related to what I was asking! At any rate, do you remember if the output of learn.TTA() was log_pred or prob when you posted the above link 16 days ago?
I’m getting predictions with learn.TTA(), but I’m getting strange results. I take the mean just like in lesson2, but I get a very bad f2 metric on the validation set unless I use a threshold of 1.21. Does this mean that instead of logs of predictions, TTA returns log(p) + 1?
Have you resolved this issue? I was getting similar results (0.48 accuracy). It seems that the newer version of fastai.learn.TTA() does output probabilities, not log_probs. So instead of taking the exp and then the mean, I just took the mean (raw_preds = learn.TTA(), preds=np.mean(raw_preds,0) ) and then accuracy results improved to 0.92. @jeremy can you comment on this?
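Since the output format of learn.TTA() apparently differs between library versions, one defensive option is to inspect the array before averaging. The sketch below is a heuristic, not fastai behavior: average_tta is a hypothetical helper, and it assumes the array has the augmentations on axis 0 and that an array with no positive values holds log-probabilities.

```python
import numpy as np

def average_tta(raw_preds):
    """Average TTA outputs over axis 0, converting log-probs if needed.

    Assumption: if no value is positive, the array holds log-probabilities
    and is exponentiated first. An all-zero array is ambiguous under this rule.
    """
    raw_preds = np.asarray(raw_preds, dtype=float)
    if raw_preds.max() <= 0:
        raw_preds = np.exp(raw_preds)
    return raw_preds.mean(axis=0)

# Two augmentations, one sample, two classes.
probs = np.array([[[0.2, 0.8]], [[0.4, 0.6]]])
print(average_tta(probs))           # treated as probabilities
print(average_tta(np.log(probs)))   # treated as log-probabilities

# If exp() is applied to values that are already probabilities,
# a cut at 1.21 on exp(p) corresponds to this cut on p:
print(np.log(1.21))
```

The last line gives roughly 0.19, which is in the same range as the thresholds reported earlier in this thread. That is consistent with the explanation that TTA was already returning probabilities and the extra exp shifted the usable threshold to about 1.21.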
Also, I have 16 GB of RAM, but whenever I try to run learn.TTA(is_test=True), my kernel dies. I have tried adding up to 48 GB of swap files on my NVMe SSD, but it doesn’t help. Any suggestions?