Lesson 2 - Official Topic

Hello. Generally, ImageNet-pretrained models generalize well in most situations, except when the images for the current task look completely different from natural images (think spectrograms from audio files, or medical images like MRIs).

However, you should experiment with both and see what works. I have an intuition it's going to work out, but deep learning is best approached by actually running experiments and seeing if they work.

Yes on the above question. One image is augmented to multiple versions and is used for training.
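To make "one image, multiple versions" concrete, here is a minimal sketch using plain PIL rather than fastai. The helper name `augmented_versions` and the choice of a random flip plus a small rotation are assumptions for illustration only. fastai's own `aug_transforms` applies its transforms per batch during training, so this is not how the library does it internally.

```python
import random
from PIL import Image, ImageOps

def augmented_versions(img, n, seed=0):
    """Return n randomly altered copies of img (hypothetical helper)."""
    rng = random.Random(seed)
    versions = []
    for _ in range(n):
        out = img.copy()
        # Flip left-right about half of the time.
        if rng.random() < 0.5:
            out = ImageOps.mirror(out)
        # Rotate by a small random angle; the canvas size stays the same.
        out = out.rotate(rng.uniform(-10, 10))
        versions.append(out)
    return versions

# A stand-in image; in practice this would be one training photo.
source = Image.new("RGB", (64, 48), color=(200, 30, 30))
versions = augmented_versions(source, n=4)
print(len(versions), versions[0].size)
```

Each copy keeps the original label, so the model sees the same item in slightly different forms.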

A quick fix:

Fix Truncated Images Error:

  • Import the following and set the flag (the import alone changes nothing):
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
  • This is to force PIL to open truncated images that it would normally not open, such as:

    • An image with a file extension that does not match its type (e.g. jpg file extension with a .png file type)
    • An image with an alpha channel that PIL does not support (e.g. RGBA)
  • Add the import before running your learner:

    • learn = vision_learner(dls, resnet18, metrics=error_rate)

    • learn.fine_tune(4)
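A small self-contained check of the fix above, as a sketch. It builds a JPEG in memory, cuts off the second half of the file, and loads it with `ImageFile.LOAD_TRUNCATED_IMAGES` set to `True`. The helper names `make_truncated_jpeg` and `load_image` are made up for this example. With the flag left at its default, the `img.load()` call would raise an `OSError` instead.

```python
from io import BytesIO
from PIL import Image, ImageFile

# The actual fix: tell PIL to tolerate files whose data ends early.
ImageFile.LOAD_TRUNCATED_IMAGES = True

def make_truncated_jpeg(size=(128, 128)):
    """Build a JPEG in memory and drop the second half of its bytes."""
    width, height = size
    # Deterministic noisy pixels so the encoded file is reasonably large.
    pixels = bytes((i * 37) % 256 for i in range(width * height * 3))
    img = Image.frombytes("RGB", size, pixels)
    buf = BytesIO()
    img.save(buf, format="JPEG")
    data = buf.getvalue()
    return data[: len(data) // 2]

def load_image(data):
    """Open and fully decode image bytes."""
    img = Image.open(BytesIO(data))
    img.load()  # decoding happens here, not in Image.open
    return img

img = load_image(make_truncated_jpeg())
print(img.size)
```

The missing part of the image is simply left unfilled, so this hides the problem rather than repairing the file. Removing or re-downloading the broken images is the cleaner option when that is possible.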


Did anybody notice a mistake in the course book?

In Chapter 6, Multi-Category (06_multicat.ipynb):

def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()

Shouldn't 1-inputs come before inputs if you want to get the correct loss? Or am I wrong?

Somebody please clarify, thanks!

Hi @zhhisdn,

I think this is correct. If the target is 1, then the output should also be 1, so that the loss is -log(output) = -log(1) = 0. And if the target is 0, then the output should also be 0, so that the loss is -log(1-output) = -log(1) = 0.

Hi Lucas,
If the target is 1, assuming output is 0.9, isn’t the loss supposed to be -log(1-0.9)?
If the target is 0, assuming output is 0.2, isn’t the loss supposed to be -log(0.2-0) = -log(0.2)?
Aka, the loss is the difference between output and the target.
So if the target is 1, it should be the difference between 1 and output.
If the target is 0, it should be the difference between 0 and output, in such case, is just the output itself.
Am I understanding it right or did I miss something?


If the target is 1, assuming output is 0.9, isn’t the loss supposed to be -log(1-0.9)?

No it should be -log(0.9).

You should have a look at the shape of f(x) = -log(x). See here: -log(x) from x=0 to 1 - Wolfram|Alpha

So f(x) [the loss] goes to zero whenever x goes to 1.

So if the target = 1 then the output should be close to 1 as well, so that f(x) goes to zero.

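In base 10, -log(1-0.9) = 1 whereas -log(0.9) ≈ 0.046. With the natural log that torch's .log() uses, the values are about 2.30 and 0.105. Either way, the loss is much larger for the wrong choice.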
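To check the numbers by hand, here is a plain-Python sketch of the same per-element rule the book's function applies, without torch. It assumes the values passed in are already probabilities (that is, after the sigmoid), and the names `bce_per_element` and `bce_mean` are made up for this example.

```python
import math

def bce_per_element(prob, target):
    """Loss for one prediction: -log of the probability given to the true class."""
    # Same selection as torch.where(targets==1, inputs, 1-inputs).
    picked = prob if target == 1 else 1 - prob
    return -math.log(picked)  # natural log, like torch's .log()

def bce_mean(probs, targets):
    """Average the per-element losses, like .mean() in the book's version."""
    losses = [bce_per_element(p, t) for p, t in zip(probs, targets)]
    return sum(losses) / len(losses)

print(round(bce_per_element(0.9, 1), 3))  # target 1, confident and right: 0.105
print(round(bce_per_element(0.1, 1), 3))  # target 1, confident and wrong: 2.303
print(round(bce_per_element(0.2, 0), 3))  # target 0, output 0.2: 0.223
```

So the quantity inside the log is how much probability the model put on the correct answer, and the negative log turns "close to 1" into "close to zero loss".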


Got it, somehow I was fixated by the thought that the loss gotta be the gap between output and target, thus should be 1-0.9, but didn’t realize that the whole negative log thing already encapsulates the gap concept, so you just need to pass in the output. I should’ve looked at the negative log graph a little more. Thanks for help, Lucas!


In 06_multicat.ipynb (Google Colab), after the face center coordinates data gets loaded, when you do:
xb,yb = dls.one_batch()

the result is: (torch.Size([64, 3, 240, 320]), torch.Size([64, 1, 2]))

I don’t understand why there is a 3 in the independent variable. 64 means there are 64 items in a mini-batch, and 240 and 320 are the transformed image size, but where does the 3 come from? Can anybody explain?


The image has 3 color channels. Commonly RGB, but other formats exist too.
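A quick way to see the channel count for yourself, sketched with PIL only. The helper `chw_shape` is made up for this example. It arranges the numbers in the channels, height, width order that appears after the batch dimension in the shape above.

```python
from PIL import Image

def chw_shape(img):
    """Return (channels, height, width) for a PIL image."""
    width, height = img.size  # PIL reports size as (width, height)
    return (len(img.getbands()), height, width)

rgb = Image.new("RGB", (320, 240))
gray = Image.new("L", (320, 240))

print(rgb.getbands(), chw_shape(rgb))    # ('R', 'G', 'B') (3, 240, 320)
print(gray.getbands(), chw_shape(gray))  # ('L',) (1, 240, 320)
```

Stack 64 such RGB images and you get the [64, 3, 240, 320] batch from the question.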

I see. Thanks Allen!

In 06_multicat.ipynb (Google Colab), in the image regression part, after using lr_find() to pick a learning rate of 1e-2, the notebook calls learn.fine_tune(3, lr). The result is:


Why is the validation loss consistently smaller than the training loss?
Is this because there is only one sample in the validation set, and there are many in the training set? And somehow this single validation sample is a good “average” of the training samples (aka, the model captures it quite well), so the validation loss becomes much smaller than the training loss?