Hello. Generally, ImageNet-pretrained models generalize well in most situations, except when the images for the current task look completely different from natural images (think spectrograms from audio files, or medical images like MRIs).
However, you should experiment with both and see what works. My intuition is that it will work out, but deep learning is best approached by actually running experiments and seeing whether they work.
I think this is correct. If the target is 1, then the output should also be 1, so that its log is zero (log(1) = 0). And if the target is 0, then the output should also be 0, so that the loss, log(1 - output) = log(1) = 0.
If the target is 1, assuming output is 0.9, isn’t the loss supposed to be -log(1-0.9)?
If the target is 0, assuming output is 0.2, isn’t the loss supposed to be -log(0.2-0) = -log(0.2)?
Aka, the loss is the difference between the output and the target.
So if the target is 1, it should be the difference between 1 and the output.
If the target is 0, it should be the difference between 0 and the output, in which case it's just the output itself.
Am I understanding it right or did I miss something?
Got it. Somehow I was fixated on the idea that the loss has to be the gap between the output and the target, and thus should be 1 - 0.9, but I didn't realize that the negative log already encapsulates that gap, so you just need to pass in the output. I should've looked at the negative log graph a little more. Thanks for the help, Lucas!
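To make the resolution above concrete, here is a minimal sketch of the per-example binary cross-entropy being discussed (the helper name `binary_cross_entropy` is just for illustration): with target 1 you take -log(output), and with target 0 you take -log(1 - output), so the "gap" is already baked into the log.

```python
import math

def binary_cross_entropy(output, target):
    """BCE for one prediction:
    -log(output) if target is 1, -log(1 - output) if target is 0."""
    return -math.log(output) if target == 1 else -math.log(1 - output)

# Target 1, output 0.9: loss is -log(0.9), small because the prediction is close.
print(round(binary_cross_entropy(0.9, 1), 4))  # 0.1054

# Target 0, output 0.2: loss is -log(1 - 0.2) = -log(0.8), again small.
print(round(binary_cross_entropy(0.2, 0), 4))  # 0.2231

# A confident wrong answer (target 1, output 0.1) is punished heavily.
print(round(binary_cross_entropy(0.1, 1), 4))  # 2.3026
```

Note how the loss grows toward infinity as the output drifts away from the target, which is exactly the behavior the negative log graph shows.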
In 06_multicat.ipynb (Google Colab), after the face center coordinates data gets loaded, when you do:
xb,yb = dls.one_batch()
the result is: (torch.Size([64, 3, 240, 320]), torch.Size([64, 1, 2]))
I don’t understand why there is a 3 in the independent variable. 64 means there are 64 items in a mini-batch, and 240 and 320 are the transformed image size, but where does the 3 come from? Can anybody explain?
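For anyone puzzling over the same shape: the 3 is the number of color channels (red, green, blue). PyTorch batches images in NCHW order, i.e. (batch, channels, height, width). A minimal sketch with NumPy standing in for a tensor (assuming NumPy is available):

```python
import numpy as np

# PyTorch image batches use the NCHW layout: (batch, channels, height, width).
# A dummy batch with the same shape as xb from one_batch():
batch = np.zeros((64, 3, 240, 320))
n, c, h, w = batch.shape
print(n, c, h, w)        # 64 3 240 320

# One image from the batch: 3 color planes, each 240x320.
image = batch[0]
print(image.shape)       # (3, 240, 320)

# The red channel alone is just a 240x320 grid of pixel values.
red = image[0]
print(red.shape)         # (240, 320)
```

A grayscale dataset would have a 1 in that position instead of a 3.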
Why is the validation loss consistently smaller than the training loss?
Is this because there is only one sample in the validation set, while there are many in the training set? And somehow this single validation sample is a good “average” of the training samples (aka, the model captures it quite well), so the validation loss ends up much smaller than the training loss?