Lesson 2 - Official Topic

Hello. Generally, ImageNet-pretrained models generalize well in most situations, except when the current task's images look completely different from natural images (think spectrograms from audio files, or medical images like MRIs).

However, you should experiment with both and see what works. I have an intuition it's going to work out, but deep learning is best approached by actually running experiments and seeing if they work.

Yes to the above question. One image is augmented into multiple versions, and each version is used for training.
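The idea can be sketched in plain Python (a toy 2-D list as the "image" and a hypothetical augment helper — real pipelines use fastai's aug_transforms on tensors, but the principle is the same):

```python
import random

def augment(image, n=4, seed=0):
    """Return n randomly perturbed copies of one image.
    Here an 'image' is just a 2-D list of pixel values; the point is
    that one original yields several distinct training examples."""
    rng = random.Random(seed)
    versions = []
    for _ in range(n):
        # Randomly flip left-right; real pipelines also rotate, crop, warp...
        if rng.random() < 0.5:
            copy = [row[::-1] for row in image]
        else:
            copy = [row[:] for row in image]
        versions.append(copy)
    return versions

img = [[1, 2],
       [3, 4]]
copies = augment(img)
print(len(copies))  # 4 training examples from a single original image
```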

a quick fix:

Fix Truncated Images Error:

  • Add the following before creating your learner:

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

  • This forces PIL to open truncated or otherwise problematic images that it would normally refuse to load, such as:

    • An image whose file extension does not match its actual format (e.g. a .jpg extension on a PNG file)
    • An image whose data is cut off mid-file (e.g. an interrupted download)
  • With that in place, run your learner as usual:

    • learn = vision_learner(dls, resnet18, metrics=error_rate)

    • learn.fine_tune(4)


Did anybody notice a mistake in course book?

in Chapter 6, Multi-Category, (06_multicat.ipynb):

def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()

Shouldn't 1-inputs come before inputs, if you want to get the correct loss? Or am I wrong?

Somebody please clarify, thanks!

Hi @zhhisdn,

I think this is correct. If the target is 1, then the output should also be close to 1, so that -log(output) is close to zero (log(1) = 0). And if the target is 0, then the output should also be close to zero, so that the loss, -log(1-output), goes to -log(1) = 0.

Hi Lucas,
If the target is 1, assuming output is 0.9, isn’t the loss supposed to be -log(1-0.9)?
If the target is 0, assuming output is 0.2, isn’t the loss supposed to be -log(0.2-0) = -log(0.2)?
Aka, the loss is the difference between output and the target.
So if the target is 1, it should be the difference between 1 and output.
If the target is 0, it should be the difference between 0 and output, in such case, is just the output itself.
Am I understanding it right or did I miss something?


If the target is 1, assuming output is 0.9, isn’t the loss supposed to be -log(1-0.9)?

No it should be -log(0.9).

You should have a look at the shape of f(x) = -log(x). See here: -log(x) from x=0 to 1 - Wolfram|Alpha

So f(x) [the loss] goes to zero whenever x goes to 1.

So if the target = 1 then the output should be close to 1 as well, so that f(x) goes to zero.

-log(1-0.9) = -log(0.1) ≈ 2.3, whereas -log(0.9) ≈ 0.11 (natural log, which is what torch.log uses).
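To check the numbers concretely (plain Python, natural log, matching torch.log):

```python
import math

# Target is 1 and the model outputs 0.9 (a good prediction).
loss_right = -math.log(0.9)      # -log(output): small, as it should be
loss_wrong = -math.log(1 - 0.9)  # -log(1-output): large, punishing a good prediction

print(round(loss_right, 3))  # 0.105
print(round(loss_wrong, 3))  # 2.303
```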


Got it, somehow I was fixated on the thought that the loss had to be the gap between output and target, and thus should be 1-0.9, but didn’t realize that the negative log already encapsulates the gap concept, so you just need to pass in the output. I should’ve looked at the negative log graph a little more. Thanks for the help, Lucas!


In 06_multicat.ipynb (Google Colab), after the face center coordinates data gets loaded, when you do:
xb,yb = dls.one_batch()

the result is: (torch.Size([64, 3, 240, 320]), torch.Size([64, 1, 2]))

I don’t understand why there is a 3 in the independent variable. 64 means there are 64 items in a mini-batch, and 240 and 320 are the transformed image size, but where does the 3 come from? Can anybody explain?


The image has 3 color channels. Commonly RGB, but other formats exist too.
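So the shape unpacks like this (a sketch using the numbers from the batch above):

```python
# Shape of the independent variable: (batch, channels, height, width)
batch_shape = (64, 3, 240, 320)
batch_size, channels, height, width = batch_shape

print(channels)  # 3: one plane each for red, green, and blue
# Total pixel values in the batch:
print(batch_size * channels * height * width)  # 14745600
```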

I see. Thanks Allen!

In 06_multicat.ipynb (Google Colab), in the image regression part, after using lr_find() to pick a learning rate of 1e-2, it calls learn.fine_tune(3, lr). The result is:


Why is the validation loss consistently smaller than the training loss?
Is this because there is only one sample in the validation set while there are many in the training set, and this single validation sample happens to be a good “average” of the training samples (i.e. the model captures it quite well), so the validation loss ends up much smaller than the training loss?

I asked ChatGPT the above question, and it gave a pretty good answer:

The situation where the validation loss is consistently smaller than the training loss is quite unusual. Typically, you would expect the training loss to be lower than the validation loss, as the model has seen the training data and should be better at fitting it.

In the case you’ve described, the validation loss being consistently smaller than the training loss might be due to several reasons:

  1. Small validation set: As you pointed out, if the validation set is too small, the validation loss might not be representative of the model’s performance on unseen data. In this case, the single validation sample might just happen to be easy for the model to predict, leading to a lower validation loss.
  2. Regularization and data augmentation: During training, the model may be subjected to regularization techniques such as dropout or weight decay, which can increase the training loss. Additionally, data augmentation techniques may be applied to the training set, making the training samples more challenging to predict. These techniques are generally not applied to the validation set, which can lead to lower validation loss in some cases.
  3. Unbalanced data: If the training and validation sets are not well-balanced, the validation set might be composed of easier samples, leading to a lower validation loss. Ideally, you would want the training and validation sets to have similar distributions so that the model generalizes well to unseen data.

In your case, it seems that the small validation set might be the main reason for the observed discrepancy between the training and validation losses. To get a better understanding of your model’s performance, you could try increasing the size of the validation set or using k-fold cross-validation to ensure that the model’s performance is assessed on multiple validation sets.
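Point 1 is easy to illustrate: the mean loss estimated from a tiny validation set is just noisy. A quick sketch (made-up per-example losses, not the notebook's actual data):

```python
import random
import statistics

rng = random.Random(42)
# Pretend every example's loss is drawn from the same distribution (mean 1.0):
losses = [rng.gauss(1.0, 0.3) for _ in range(10_000)]

# A 1,000-example validation set estimates the true mean loss well...
print(round(statistics.mean(losses[:1000]), 2))

# ...but five different 1-example "validation sets" can land almost anywhere,
# including well below the true mean:
print([round(rng.choice(losses), 2) for _ in range(5)])
```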

When will ChatGPT be able to fetch its own training data and improve its algorithms so as to evolve itself? Seems not far away :upside_down_face: