Hello. Generally, ImageNet-pretrained models generalize well in most situations, except when the images for the current task look completely different from natural images (think spectrograms from audio files, or medical images like MRIs).

However, you should experiment with both and see what works. My intuition is that it's going to work out, but deep learning is best approached by actually running experiments and seeing whether they work.

I think this is correct. If the target is 1, then the output should also be 1, so that the loss -log(output) is zero (log(1)=0). And if the target is 0, then the output should also be 0, so that the loss -log(1-output) equals -log(1), which is zero.
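To make the two cases concrete, here is a minimal sketch in plain Python. The function name bce and the sample outputs are made up for illustration; this is not the notebook's or the library's actual implementation, just the per-item formula written out.

```python
import math

def bce(output, target):
    """Binary cross-entropy for a single prediction (illustrative helper)."""
    if target == 1:
        # Target is 1: the loss shrinks toward 0 as output approaches 1.
        return -math.log(output)
    # Target is 0: the loss shrinks toward 0 as output approaches 0.
    return -math.log(1 - output)

print(bce(0.9, 1))  # confident and right: small loss, about 0.105
print(bce(0.1, 1))  # confident and wrong: large loss, about 2.303
print(bce(0.2, 0))  # target 0, output 0.2: about 0.223
```

Note that the output itself is passed in, not the gap between output and target. The negative log already turns "close to the target" into "close to zero loss".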

Hi Lucas,
If the target is 1, assuming output is 0.9, isn't the loss supposed to be -log(1-0.9)?
If the target is 0, assuming output is 0.2, isn't the loss supposed to be -log(0.2-0) = -log(0.2)?
In other words, the loss is the difference between the output and the target.
So if the target is 1, it should be the difference between 1 and the output.
If the target is 0, it should be the difference between 0 and the output, which in that case is just the output itself.
Am I understanding it right or did I miss something?

Got it. Somehow I was fixated on the thought that the loss has to be the gap between output and target, and thus should be 1-0.9, but I didn't realize that the whole negative log thing already encapsulates the gap concept, so you just need to pass in the output. I should've looked at the negative log graph a little more. Thanks for the help, Lucas!

In 06_multicat.ipynb (Google Colab), after the face center coordinates data gets loaded, when you do:
xb,yb = dls.one_batch()
xb.shape,yb.shape

the result is: (torch.Size([64, 3, 240, 320]), torch.Size([64, 1, 2]))

I don't understand why there is a 3 in the independent variable. 64 means there are 64 items in a mini-batch, and 240 and 320 are the transformed image size, but where does the 3 come from? Can anybody explain?
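For what it's worth, the 3 is most likely the channel axis: PyTorch image batches are laid out as (batch, channels, height, width), and an RGB image has 3 channels. A small sketch of that reading follows; label_axes is a hypothetical helper written for this post, not a fastai or PyTorch function.

```python
def label_axes(shape):
    """Pair each dimension of an image batch shape with its assumed meaning."""
    # Assumes the (batch, channels, height, width) layout used for image tensors.
    names = ("batch", "channels", "height", "width")
    return dict(zip(names, shape))

print(label_axes((64, 3, 240, 320)))
```

Under that assumption, a grayscale batch of the same size would have shape (64, 1, 240, 320).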

In 06_multicat.ipynb (Google Colab), in the image regression part, after lr_find() is used to pick a learning rate of 1e-2, the notebook calls learn.fine_tune(3, lr). The result is:

Why is the validation loss consistently smaller than the training loss?
Is this because there is only one sample in the validation set, while there are many in the training set? And somehow this single validation sample is a good "average" of the training samples (that is, the model captures it quite well), so the validation loss becomes much smaller than the training loss?