Wiki: Lesson 3

rachel · January 2, 2018, 11:36pm

(As always, this is a wiki thread, so please edit it to help make it better).

<<< Wiki: Lesson 2 ｜ Wiki: Lesson 4 >>>

Lesson resources

Links

Notes from @reshama on AWS Setup and tmux and many more
Summary of lesson 2 from @apil.tamang
Learning rate finder article from @surmenok
Convolutional Neural Networks in 5 Minutes by @init_27
Visualizing Learning Rate vs Batch Size by @miguel_perez
Decoding the ResNet Architecture by @anandsaha
A practitioner’s guide to PyTorch by @radek

Extra Links

Lesson notes

How to create a submission file

Video timelines for Lesson 3

(Thanks to @EricPB)

00:00:05 Cool guides & posts made by Fast.ai classmates
- tmux, summary of lesson 2, learning rate finder, guide to Pytorch, learning rate vs batch size,
- decoding ResNet architecture, beginner’s forum
00:05:45 Where we go from here
00:08:20 How to complete last week’s assignment “Dog breeds detection”
00:08:55 How to download data from Kaggle (Kaggle CLI) or anywhere else
00:12:05 Cool tip to download only the files you need: using CurlWget
00:13:35 Dogs vs Cats example
00:17:15 What “Precompute = True” and “learn.bn_freeze” mean
00:20:10 Intro & comparison to Keras with TensorFlow
00:30:10 Porting PyTorch fast.ai library to Keras+TensorFlow project
00:32:30 Create a submission to Kaggle
00:39:30 Making an individual prediction on a single file
00:42:15 The theory behind Convolutional Networks, and Otavio Good demo (Word Lens)
00:49:45 ConvNet demo with Excel,
- filter, Hidden layer, Maxpool, Dense weights, Fully-Connected layer
Pause
01:08:30 ConvNet demo with Excel (continued)
- output, probabilities adding to 1, activation function, Softmax
01:15:30 The mathematics you really need to understand for Deep Learning
- Exponentiation & Logarithm
01:20:30 Multi-label classification with Amazon Satellite competition
01:33:35 Example of improving a “washed-out” image
01:37:30 Setting different learning rates for different layers
01:38:45 ‘data.resize()’ for speed-up, and ‘metrics=[f2]’ or ‘fbeta_score’ metric
01:45:10 ‘sigmoid’ activation for multi-label
01:47:30 Question on “Training only the last layers, not the initial freeze/frozen ones from ImageNet models”
- ‘learn.unfreeze()’ advanced discussion
01:56:30 Visualize your model with ‘learn.summary()’, shows ‘OrderedDict()’
01:59:45 Working with Structured Data “Corporacion Favorita Grocery Sales Forecasting”
- Based on the Rossmann Stores competition
02:05:30 Book: Python for Data Analysis, by Wes McKinney
02:11:50 We save the dataframe with ‘Joined.to_feather()’ from Pandas, use ‘df = pd.read_feather()’ to load.
02:13:30 Split Rossmann columns in two types: categorical vs continuous

Moody · January 3, 2018, 6:00am

Due to the update of learn.TTA, minor changes are required for the first two kernels of the submission file.

jeremy · January 4, 2018, 1:06am

Which notebook(s) are you referring to?

Moody · January 4, 2018, 1:33am

I ran into a missing row (it could be the first or the last item) when running learn.TTA(is_test = True) from various notebooks. This is from the (replicated) Dog Breeds notebook.

Moody · January 5, 2018, 7:50am

After updating with git pull (#78), learn.TTA() is working again. Thanks @FabianHertwig

grez911 · January 21, 2018, 6:31am

At 1:39:55 Jeremy said that he is not going to use any images bigger than
int(sz * 1.3) = int(64 * 1.3) = 83
But later he used sz = 128 and sz = 256 and never called data.resize again. Are the images in this case scaled down from the original sizes to 128 and 256 pixels respectively, or scaled up from 83 pixels to 128 and 256?
Why did he not use a bigger coefficient than 1.3?
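
The arithmetic in the question can be checked directly. This is only a sketch: resize_cap is a hypothetical helper (it is not part of fastai) that reproduces the int(sz * 1.3) expression quoted above for the three sizes mentioned in the post.

```python
def resize_cap(sz, scale=1.3):
    """Largest image side kept when pre-resizing for a target size sz.

    Hypothetical helper: it only mirrors the int(sz * 1.3) expression
    quoted from the lesson and does not call fastai.
    """
    return int(sz * scale)


# Caps for the three target sizes mentioned in the post.
for sz in (64, 128, 256):
    print(sz, resize_cap(sz))
```

Since int() truncates, the cap is always rounded down to a whole number of pixels.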

giusvit · January 22, 2018, 12:11pm

I suppose that he uses the data.resize when size is 64 because it takes too much time for the data loader to perform an on-demand scaling with a target dimension of 64.

On the other hand, on-demand data loader scaling from the original size to 128 or to 256 may take less time, so it’s pointless to pre-scale all the images with data.resize. However, nothing prevents you from doing that. The resize is just there to speed things up.

So, in the end, images are scaled by the DL from the original sizes to 128 and 256. As for the coefficient, I think it’s trial and error to find a balance between quality and speed.

(These are my opinions; bear in mind I could be totally wrong.)

Edit: found Jeremy’s answer here.

pekoto · January 28, 2018, 2:15am

Thought I’d share my notes for this lesson. These contain some screenshots from videos and a few notes/explanations from other sources I found useful. I also tried to make variable naming a bit easier to understand too (e.g., instead of “y” use “target_values”, instead of “lrs” use “differential_learning_rates”, etc.).

Minimum code to run image prediction:

Dogs vs. Cats implemented in Keras:

How to download data from Kaggle, run image recognition, create a submission:

Convolutional neural network theory:

Multi-label classification:

grez911 · January 28, 2018, 6:05am

I am a little confused with this part:

Do I need to execute line 31, skip line 54, run all the sorting, reindexing, and merging, then save the result in feather format for the train data, then go back to line 54, execute it, and do the same operations for the test dataset? Am I understanding this correctly?

jeremy · January 28, 2018, 2:35pm

That’s exactly right!

jeremy · January 31, 2018, 1:11am

Sounds like you are running the crestle-only lines when not using crestle…

jmoney · January 31, 2018, 1:45am

Yep and deleted the post. Conflated that and getting the kaggle images downloaded into the correct locations.

Brad_S · February 2, 2018, 2:42pm

In the Otavio demo, after the (first) maxpool layer, it overlays 2 (8x8?) images and then runs filters over that combination.
Can someone confirm how the images are combined, e.g. added? multiplied? (Edit: Jeremy adds them in Excel afterwards, so that’s most likely the same.)
Thanks in advance

MattRC · February 14, 2018, 7:30pm

SOLVED: I got this working. I believe the code below is correct, but apparently you also need to make sure to call learn.predict() at least once BEFORE using learn.predict_array()? Otherwise the predictions from learn.predict_array() seem to be totally incorrect.

I’m playing with the dog breeds data, and specifically trying to make an individual prediction on a single file as described in the video around 00:39:30. I was having some problems getting the code in the video to work, but after searching the forums, I’ve come up with this (had to convert the image to np.array and make sure precompute=False):

im = val_tfms(np.array(Image.open(PATH+file_name)))
learn.precompute = False
preds = learn.predict_array(im[None])
np.argmax(preds)

What I really want is the name of the breed of the prediction, instead of just a number. Is the code below a correct way to get the breed name for the predicted value?

learn.data.classes[np.argmax(preds)]

That returns a label, such as ‘chihuahua’, but I’m not sure it is the correct way to get the breed name? Especially since when I’ve been playing with this I’ve tried about 20 different images, and NONE of the dogs in any of the images I’ve tried are actually of the breed that the above code predicts, even though the reported accuracy is greater than 0.924. So what am I doing wrong?
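
For checking a single prediction like the one above, it can help to look at the top few classes with their probabilities rather than only the argmax. The sketch below is plain Python and does not call fastai. The class names and scores are made-up stand-ins for learn.data.classes and one row of learn.predict_array(...), and it assumes the row holds log-probabilities (the lesson notebooks apply np.exp to prediction output), which is worth verifying for your own model.

```python
import math

# Hypothetical stand-ins: in the post these would be learn.data.classes and
# the single row returned by learn.predict_array(im[None]).
classes = ["beagle", "chihuahua", "pug"]
log_preds = [[-2.30, -0.22, -2.10]]


def top_k(log_row, class_names, k=3):
    """Return the k most likely (class name, probability) pairs, best first."""
    # Assumption: the row holds log-probabilities, so exp() recovers probabilities.
    probs = [math.exp(v) for v in log_row]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(class_names[i], probs[i]) for i in order[:k]]


ranked = top_k(log_preds[0], classes)
best_name, best_prob = ranked[0]
print(best_name, round(best_prob, 3))
```

Indexing the class list with the argmax, as in the post, gives the same name as the first entry of ranked. The extra entries show how confident the model is about the runners-up.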

micheledicosmo · February 26, 2018, 4:36pm

Might be silly, but I get an error when running the “Individual prediction” code against the model I created on different data in exercise1.ipynb:

PATH = "/home/paperspace/data/"
fn = "exercise1/27752471_10215241041800086_6867930642288620096_n.png"; fn;
im = Image.open(PATH+fn); im.load(); im
trn_tfms, val_tfms = tfms_from_model(arch, sz)
im_array = val_tfms(im)
preds = learn.predict_array(im_array[None])
np.argmax(preds)

It seems like it’s trying to get the shape of the image, but the property is just not there for some reason:

~/fastai/courses/dl1/fastai/transforms.py in do_transform(self, x)
    270 
    271     def do_transform(self, x):
--> 272         return scale_min(x, self.sz)
    273 
    274 

~/fastai/courses/dl1/fastai/transforms.py in scale_min(im, targ)
     10         targ (int): target size
     11     """
---> 12     r,c,*_ = im.shape
     13     ratio = targ/min(r,c)
     14     sz = (scale_to(c, ratio, targ), scale_to(r, ratio, targ))

AttributeError: 'PngImageFile' object has no attribute 'shape'
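
The traceback shows scale_min reading im.shape, an attribute that a PIL image object does not have. An earlier post in this thread wraps Image.open(...) in np.array(...) before applying val_tfms. The sketch below only illustrates that difference, using a small in-memory image as a stand-in for the PNG file. It does not call fastai, and it does not show whether the resulting predictions are correct.

```python
import numpy as np
from PIL import Image

# A small in-memory image stands in for the PNG opened in the post.
im = Image.new("RGB", (4, 3))      # PIL's size argument is (width, height)

# PIL images expose .size, not .shape, which is what the traceback reports.
print(hasattr(im, "shape"))

# Converting to a NumPy array gives an object laid out as (rows, cols, channels).
im_array = np.array(im)
rows, cols, *_ = im_array.shape    # the same unpacking scale_min performs
print(rows, cols)
```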

InfinityCliff · March 8, 2018, 3:06pm

I think I could benefit from some general understanding of application. I understand (I think) pulling in the training and test data, fitting the model to the training data and verifying the model with the test data. What I am missing is how to take it to the step of taking the trained model and using it (applying it to a new image). For example, we have a cat/dog image classification model - I take a new photo - How do I pass the new image to the model and get back if it is a cat or a dog?

And what if it was a picture of a horse? Would the model incorrectly classify it as a dog or cat (with a low probability), or would it say ‘this is not a cat or dog’?
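
On the horse question, the lesson timeline mentions a softmax output whose probabilities add to 1. Assuming a two-class cat/dog model ends in such a softmax, it has no "neither" output, so whatever the input is, the two probabilities still sum to 1. The sketch below uses made-up scores and plain Python to show this. It is not taken from the lesson code.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for an image that resembles neither class well.
cat_prob, dog_prob = softmax([0.3, 0.1])
print(cat_prob, dog_prob, cat_prob + dog_prob)
```

Under that assumption, a horse photo would still come back as cat or dog, most likely with probabilities closer to each other than for a clear cat or dog image.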

ibunny · March 12, 2018, 11:05am

Hi!
I got stuck in the satellite notebook. Can anybody tell me how the sigmoid works for a multi-label problem?
As Jeremy mentioned, we have to replace the softmax of the last layer with a sigmoid function, but what I don’t understand is how a sigmoid could output an N-d vector. Or we have to train N sigmoids, each for one of the N labels?
So the question is how to calculate the loss for multi-label classification.
If you have any clue about this, please tell me some of the details, thanks!

daveluo · March 12, 2018, 9:45pm

hi @ibunny,

Yes you have it correct when you said:

Or we have to train N sigmoids, each for one of the N labels?

For multi-label problems (is_multi=True in ConvnetBuilder), fastai/pytorch automatically sets up N sigmoid activations in the final output layer training against data.classes (i.e. (‘agriculture’, 1.0), (‘artisinal_mine’, 0.0), (‘bare_ground’, 0.0),…).

Note that we also switch to F.binary_cross_entropy as the loss function vs negative log likelihood loss (F.nll_loss) since it’s a binary classification task for each label class:

class ConvLearner(Learner):
    def __init__(self, data, models, precompute=False, **kwargs):
        self.precompute = False
        super().__init__(data, models, **kwargs)
        if hasattr(data, 'is_multi'):
            self.crit = F.binary_cross_entropy if data.is_multi else F.nll_loss
            if data.is_reg: self.crit = F.l1_loss
            elif self.metrics is None:
                self.metrics = [accuracy_thresh(0.5)] if self.data.is_multi else [accuracy]
        if precompute: self.save_fc1()
        self.freeze()
        self.precompute = precompute
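
To illustrate the N-sigmoid output described above, here is a minimal sketch in plain Python. It does not call fastai or pytorch, and the logits are made-up values. Each label gets its own sigmoid, so the outputs are independent probabilities that need not sum to 1, and any label above a threshold (0.5 here, matching accuracy_thresh(0.5) in the snippet above) counts as predicted.

```python
import math


def sigmoid(x):
    """Squash one raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


# Label names taken from the post; the raw scores are hypothetical.
labels = ["agriculture", "artisinal_mine", "bare_ground"]
logits = [2.0, -1.5, 0.4]

# One independent sigmoid per label.
probs = [sigmoid(v) for v in logits]

# Every label whose probability clears the threshold is predicted.
predicted = [name for name, p in zip(labels, probs) if p > 0.5]
print(predicted, sum(probs))
```

Unlike softmax, several labels can be "on" at once, which is what a multi-label problem needs.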

ibunny · March 13, 2018, 2:06am

Everytime I can get exactly what I wanna know from your
So when I take a closer look at the structure of the NN in the satellite notebook, I found:

So does that mean we have 17 sigmoids that we train separately? For example, if the real label is [1, 0, 1, 1, ....] and the sigmoid output is [0.8, 0.2, 0.7, 0.6, ...], then the loss for the 1st sigmoid is -(1*log(0.8) + 0*log(0.2)), the loss for the 2nd sigmoid is -(0*log(0.2) + 1*log(0.8)), and so on. Do we just use binary_cross_entropy for each sigmoid to train them separately, or do we have to add the losses up?

daveluo · March 13, 2018, 11:24pm

Good idea to look at the NN final layer outputs - confirms that it’s creating 17 sigmoids (for the 17 different label classes in the Planet dataset).

I believe pytorch automatically calculates the loss of each sigmoid output separately, takes the average of the 17, and displays that average value as trn_loss and val_loss. Looks like you can set it to sum losses instead of averaging them based on my reading of the pytorch 0.3.1 docs for binary_cross_entropy, size_average=True/False setting:
http://pytorch.org/docs/0.3.1/nn.html#binary-cross-entropy

In the newest pytorch 0.4, it looks like there’ll be a new parameter (reduce=True/False) that, per the docs, “returns a loss per input/target element instead” of averaging or summing the losses: http://pytorch.org/docs/master/nn.html#binary-cross-entropy
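
The per-label losses discussed in this exchange can be worked through by hand. The sketch below is plain Python, not the pytorch implementation. It applies the binary cross-entropy formula to the first four label/output pairs from the example above and then shows both reductions mentioned here, the average and the sum.

```python
import math


def binary_cross_entropy(target, prob):
    """Loss for one label: -(y*log(p) + (1-y)*log(1-p))."""
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


# First four values from the example in the thread.
targets = [1, 0, 1, 1]
probs = [0.8, 0.2, 0.7, 0.6]

# One loss per sigmoid output.
per_label = [binary_cross_entropy(t, p) for t, p in zip(targets, probs)]

# The two reductions discussed above: average and sum.
mean_loss = sum(per_label) / len(per_label)
sum_loss = sum(per_label)
print(per_label, mean_loss, sum_loss)
```

The two reductions differ only by a constant factor (the number of labels), so the choice rescales the gradients but does not change which outputs are penalized.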