Wiki: Lesson 3

(As always, this is a wiki thread, so please edit it to help make it better).

<<< Wiki: Lesson 2 | Wiki: Lesson 4 >>>

Lesson resources


Extra Links

Lesson notes

How to create a submission file

Video timelines for Lesson 3

(Thanks to @EricPB)

  • 00:00:05 Cool guides & posts made by classmates
    • tmux, summary of lesson 2, learning rate finder, guide to Pytorch, learning rate vs batch size,
    • decoding ResNet architecture, beginner’s forum
  • 00:05:45 Where we go from here
  • 00:08:20 How to complete last week’s assignment “Dog breeds detection”
  • 00:08:55 How to download data from Kaggle (Kaggle CLI) or anywhere else
  • 00:12:05 Cool tip to download only the files you need: using CurlWget
  • 00:13:35 Dogs vs Cats example
  • 00:17:15 What “Precompute = True” and “learn.bn_freeze” mean
  • 00:20:10 Intro & comparison to Keras with TensorFlow
  • 00:30:10 Porting PyTorch library to Keras+TensorFlow project
  • 00:32:30 Create a submission to Kaggle
  • 00:39:30 Making an individual prediction on a single file
  • 00:42:15 The theory behind Convolutional Networks, and Otavio Good demo (Word Lens)
  • 00:49:45 ConvNet demo with Excel,
    • filter, Hidden layer, Maxpool, Dense weights, Fully-Connected layer
  • Pause
  • 01:08:30 ConvNet demo with Excel (continued)
    • output, probabilities adding to 1, activation function, Softmax
  • 01:15:30 The mathematics you really need to understand for Deep Learning
    • Exponentiation & Logarithm
  • 01:20:30 Multi-label classification with Amazon Satellite competition
  • 01:33:35 Example of improving a “washed-out” image
  • 01:37:30 Setting different learning rates for different layers
  • 01:38:45 ‘data.resize()’ for speed-up, and ‘metrics=[f2]’ or ‘fbeta_score’ metric
  • 01:45:10 ‘sigmoid’ activation for multi-label
  • 01:47:30 Question on “Training only the last layers, not the initial freeze/frozen ones from ImageNet models”
    • ‘learn.unfreeze()’ advanced discussion
  • 01:56:30 Visualize your model with ‘learn.summary()’, shows ‘OrderedDict()’
  • 01:59:45 Working with Structured Data “Corporacion Favorita Grocery Sales Forecasting”
    • Based on the Rossman Stores competition
  • 02:05:30 Book: Python for Data Analysis, by Wes McKinney
  • 02:11:50 We save the dataframe with ‘Joined.to_feather()’ from Pandas, use ‘df = pd.read_feather()’ to load.
  • 02:13:30 Split Rossman columns in two types: categorical vs continuous

Due to the update of learn.TTA, minor changes are required for the first two kernels of the submission file.


Which notebook(s) are you referring to?

I ran into one missing line item (it could be the first or the last item) when running learn.TTA(is_test = True) from various notebooks. This one is from the (replicated) Dog Breeds notebook.

After updating via git pull (#78), learn.TTA() is working again. :smile: Thanks @FabianHertwig

At 1:39:55 Jeremy said that he is not going to be using any images bigger than
int(sz * 1.3) = int(64 * 1.3) = 83
But later he uses sz = 128 and sz = 256 and never calls data.resize again. Are the images in this case scaled down from the original sizes to 128 and 256 pixels respectively, or scaled up from 83 pixels to 128 and 256?
Why has he not used a bigger coefficient than 1.3?


I suppose that he uses the data.resize when size is 64 because it takes too much time for the data loader to perform an on-demand scaling with a target dimension of 64.

On the other hand, on-demand data loader scaling from the original size to 128 or to 256 may take less time, so it’s pointless to pre-scale all the images with data.resize. However, nothing prevents you from doing that. The resize is just there to speed things up.

So, in the end, images are scaled by the DL from the original sizes to 128 and 256. As for the coefficient, I think it’s a matter of trial and error to find a balance between quality and speed.

(These are my opinions, bear in mind I could be totally wrong :slight_smile: )

Edit: found Jeremy’s answer here.
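
The size arithmetic from the question above can be checked with a few lines of Python. This is only a sketch of the numbers involved: max_prescale_size is a hypothetical helper name, and the 1.3 factor is the one quoted from the lecture, not a value read from the fastai source.

```python
def max_prescale_size(sz, factor=1.3):
    """Size cap quoted in the lecture for a target size `sz`.

    Hypothetical helper: it only reproduces the int(sz * 1.3) arithmetic
    from the question and does not call fastai.
    """
    return int(sz * factor)


# Caps for the three target sizes mentioned in the thread.
for sz in (64, 128, 256):
    print(sz, "->", max_prescale_size(sz))
```

For sz = 64 this gives the 83 pixels mentioned in the question; the same factor applied to 128 and 256 would give larger caps, which is why the cap computed for 64 does not carry over to the bigger sizes.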


Thought I’d share my notes for this lesson. These contain some screenshots from videos and a few notes/explanations from other sources I found useful. I also tried to make variable naming a bit easier to understand too (e.g., instead of “y” use “target_values”, instead of “lrs” use “differential_learning_rates”, etc.).

Minimum code to run image prediction:

Dogs vs. Cats implemented in Keras:

How to download data from Kaggle, run image recognition, create a submission:

Convolutional neural network theory:

Multi-label classification:


I am a little confused with this part:

Do I need to execute line 31, then jump over 54, execute all the sorting, reindexing and merging, then save the result in feather for the train data, then go back to line 54, execute it, and do the same operations for the test dataset? Am I understanding this correctly?

That’s exactly right! :slight_smile:

Sounds like you are running the crestle-only lines when not using crestle…

Yep and deleted the post. Conflated that and getting the kaggle images downloaded into the correct locations.

in the otavio demo, after the (first) maxpool layer, it overlays 2 (8x8?) images and then runs filters over that combination.
Can someone confirm how the images are combined, e.g. added? multiplied? (edit: jeremy adds in excel afterwards, so thats most likely the same)
thanks in advance

SOLVED: I got this working. I believe the code below is correct, but apparently you also need to make sure to call learn.predict() at least once BEFORE using learn.predict_array()? Otherwise the predictions from learn.predict_array() seem to be totally incorrect.

I’m playing with the dog breeds data, and specifically trying to make an individual prediction on a single file as described in the video around 00:39:30. I was having some problems getting the code in the video to work, but after searching the forums, I’ve come up with this (had to convert the image to np.array and make sure precompute=False):

im = val_tfms(np.array(
learn.precompute = False
preds = learn.predict_array(im[None])

What I really want is the name of the breed of the prediction, instead of just a number. Is the code below a correct way to get the breed name for the predicted value?

[np.argmax(preds)]

That returns a label, such as ‘chihuahua’, but I’m not sure it is the correct way to get the breed name? Especially since when I’ve been playing with this I’ve tried about 20 different images, and NONE of the dogs in any of the images I’ve tried are actually of the breed that the above code predicts, even though the reported accuracy is greater than 0.924. So what am I doing wrong?
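
For reference, mapping a prediction vector to a label is just an index lookup. The sketch below uses plain Python and made-up values: classes and log_preds are hypothetical stand-ins for the learner’s class list and the output of learn.predict_array(...), and it assumes the order of the class list matches the order of the model outputs. It also assumes the outputs are log-probabilities, in which case math.exp turns them back into probabilities; the argmax is the same either way.

```python
import math

# Hypothetical stand-ins, not values from the notebook.
classes = ["beagle", "chihuahua", "pug"]
log_preds = [math.log(0.10), math.log(0.85), math.log(0.05)]


def predicted_label(outputs, class_names):
    """Return (label, index) of the largest output.

    Plain-Python equivalent of class_names[np.argmax(outputs)].
    """
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return class_names[best], best


label, idx = predicted_label(log_preds, classes)
prob = math.exp(log_preds[idx])  # only meaningful if outputs are log-probabilities
print(label, round(prob, 2))
```

If the labels coming back look wrong for every image, it is worth checking that the class list used for the lookup is the same one (in the same order) that the model was trained with.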

Might be silly, but I get an error when running the “Individual prediction” code against the model I created on different data in exercise1.ipynb:

PATH = "/home/paperspace/data/"
fn = "exercise1/27752471_10215241041800086_6867930642288620096_n.png"; fn;
im =; im.load(); im
trn_tfms, val_tfms = tfms_from_model(arch, sz)
im_array = val_tfms(im)
preds = learn.predict_array(im_array[None])

It seems like it’s trying to get the shape of the image, but the property is just not there for some reason:

~/fastai/courses/dl1/fastai/ in do_transform(self, x)
    271     def do_transform(self, x):
--> 272         return scale_min(x,

~/fastai/courses/dl1/fastai/ in scale_min(im, targ)
     10         targ (int): target size
     11     """
---> 12     r,c,*_ = im.shape
     13     ratio = targ/min(r,c)
     14     sz = (scale_to(c, ratio, targ), scale_to(r, ratio, targ))

AttributeError: 'PngImageFile' object has no attribute 'shape'

I think I could benefit from some general understanding of application. I understand (I think) pulling in the training and test data, fitting the model to the training data and verifying the model with the test data. What I am missing is how to take it to the step of taking the trained model and using it (applying it to a new image). For example, we have a cat/dog image classification model - I take a new photo - How do I pass the new image to the model and get back if it is a cat or a dog?

And what if it was a picture of a horse? Will the model incorrectly classify it as a dog or cat (with a low probability), or will it say ‘this is not a cat or dog’?

I got stuck in the satellite notebook, can anybody tell me how the sigmoid works for multi-label problem?
As Jeremy mentioned, we have to replace the softmax of the last layer with a sigmoid function, but what I don’t understand is how a sigmoid could output an N-d vector. Or we have to train N sigmoids, each for one of the N labels?
So the question is how to calculate the loss of multi-label classification?
If you got any clue about this, please tell me some about the details, thanks!

hi @ibunny,

Yes you have it correct when you said:

Or we have to train N sigmoids, each for one of the N labels?

For multi-label problems (is_multi=True in ConvnetBuilder), fastai/pytorch automatically sets up N sigmoid activations in the final output layer training against data.classes (i.e. (‘agriculture’, 1.0), (‘artisinal_mine’, 0.0), (‘bare_ground’, 0.0),…).
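
To make the difference concrete, here is a small plain-Python sketch (not fastai code) that applies both activations to the same made-up output vector. The logits values and the 0.5 threshold are illustrative assumptions.

```python
import math


def sigmoid(x):
    """Squash one raw output into (0, 1), independently of the others."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Turn raw outputs into probabilities that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw outputs of the final layer for three labels.
logits = [2.0, -1.0, 0.5]

multi_label = [sigmoid(x) for x in logits]  # one independent score per label
single_label = softmax(logits)              # scores compete with each other

predicted = [p > 0.5 for p in multi_label]  # several labels can be "on" at once
print(multi_label, predicted)
print(single_label, sum(single_label))
```

With sigmoids, each label gets its own yes/no score, so the scores need not sum to 1 and more than one label can pass the threshold. With softmax, raising one score lowers the others, which suits single-label problems.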

Note that we also switch to F.binary_cross_entropy as the loss function vs negative log likelihood loss (F.nll_loss) since it’s a binary classification task for each label class:

class ConvLearner(Learner):
    def __init__(self, data, models, precompute=False, **kwargs):
        self.precompute = False
        super().__init__(data, models, **kwargs)
        if hasattr(data, 'is_multi'):
            self.crit = F.binary_cross_entropy if data.is_multi else F.nll_loss
            if data.is_reg: self.crit = F.l1_loss
            elif self.metrics is None:
                self.metrics = [accuracy_thresh(0.5)] if data.is_multi else [accuracy]
        if precompute: self.save_fc1()
        self.precompute = precompute

Every time, I get exactly what I want to know from you :star_struck:
So when I take a closer look at the structure of the NN in the satellite notebook, I found:

So does that mean we have 17 sigmoids and we train them separately? For example, if the real label is [1, 0, 1, 1, ....] and the sigmoid output is [0.8, 0.2, 0.7, 0.6, ...], then the loss for the 1st sigmoid is -(1*log(0.8) + 0*log(0.2)), the loss for the 2nd sigmoid is -(0*log(0.2) + 1*log(0.8)), and so on… Do we just use binary_cross_entropy for each sigmoid to train them separately, or do we have to add the losses up?
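
The per-label arithmetic in this question can be written out as a short plain-Python sketch. It uses the standard binary cross-entropy formula on the four example values given above; it is not the PyTorch implementation, and the helper name is only for illustration.

```python
import math


def binary_cross_entropy(target, prob):
    """Loss for a single label: -(y*log(p) + (1-y)*log(1-p))."""
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


# Example values from the question (first four labels only).
targets = [1, 0, 1, 1]
probs = [0.8, 0.2, 0.7, 0.6]

per_label = [binary_cross_entropy(t, p) for t, p in zip(targets, probs)]
for i, loss in enumerate(per_label, start=1):
    print(f"label {i}: {loss:.4f}")
```

The first two labels end up with the same loss, -log(0.8): the first because the label is 1 and the output is 0.8, the second because the label is 0 and the output is 0.2, so 1 - 0.2 = 0.8 is the probability assigned to the correct answer.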

Good idea to look at the NN final layer outputs - confirms that it’s creating 17 sigmoids (for the 17 different label classes in the Planet dataset).

I believe pytorch automatically calculates the loss of each sigmoid output separately, takes the average of the 17, and displays that average value as trn_loss and val_loss. Looks like you can set it to sum losses instead of averaging them based on my reading of the pytorch 0.3.1 docs for binary_cross_entropy, size_average=True/False setting:

In the newest pytorch 0.4, it looks like there’ll be a new parameter (reduce=True/False) which, when set to False, “returns a loss per input/target element instead” of averaging or summing the losses.
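
As a rough illustration of the averaging-versus-summing choice described above, the sketch below reduces a list of per-label losses in three ways. It is plain Python rather than PyTorch: the reduce_losses helper and its mode argument are hypothetical names, and the loss values are made up.

```python
def reduce_losses(losses, mode="mean"):
    """Combine per-label losses (hypothetical helper, not a PyTorch API).

    mode="mean": average the losses (the size_average=True behaviour described above)
    mode="sum":  add the losses up  (size_average=False)
    mode="none": return the per-element losses unchanged (reduce=False)
    """
    if mode == "mean":
        return sum(losses) / len(losses)
    if mode == "sum":
        return sum(losses)
    if mode == "none":
        return list(losses)
    raise ValueError(f"unknown mode: {mode}")


per_label_losses = [0.25, 0.25, 0.5, 1.0]  # made-up values

print(reduce_losses(per_label_losses, "mean"))
print(reduce_losses(per_label_losses, "sum"))
print(reduce_losses(per_label_losses, "none"))
```

Averaging and summing differ only by a constant factor (the number of elements), so the choice mostly changes the scale of the reported loss and of the gradients, not which outputs are being pushed up or down.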