(As always, this is a wiki thread, so please edit it to help make it better).
<<< Wiki: Lesson 2 ｜ Wiki: Lesson 4 >>>
How to create a submission file
Video timelines for Lesson 3
(Thanks to @EricPB)
00:00:05 Cool guides & posts made by Fast.ai classmates
- tmux, summary of lesson 2, learning rate finder, guide to Pytorch, learning rate vs batch size,
- decoding ResNet architecture, beginner’s forum
00:05:45 Where we go from here
00:08:20 How to complete last week's assignment “Dog breeds detection”
00:08:55 How to download data from Kaggle (Kaggle CLI) or anywhere else
00:12:05 Cool tip to download only the files you need: using CurlWget
00:13:35 Dogs vs Cats example
00:17:15 What “Precompute = True” and “learn.bn_freeze” mean
00:20:10 Intro & comparison to Keras with TensorFlow
00:30:10 Porting PyTorch fast.ai library to Keras+TensorFlow project
00:32:30 Create a submission to Kaggle
00:39:30 Making an individual prediction on a single file
00:42:15 The theory behind Convolutional Networks, and Otavio Good demo (Word Lens)
00:49:45 ConvNet demo with Excel,
- filter, Hidden layer, Maxpool, Dense weights, Fully-Connected layer
01:08:30 ConvNet demo with Excel (continued)
- output, probabilities adding to 1, activation function, Softmax
01:15:30 The mathematics you really need to understand for Deep Learning
- Exponentiation & Logarithm
01:20:30 Multi-label classification with Amazon Satellite competition
01:33:35 Example of improving a “washed-out” image
01:37:30 Setting different learning rates for different layers
01:38:45 ‘data.resize()’ for speed-up, and ‘metrics=[f2]’ or ‘fbeta_score’ metric
01:45:10 ‘sigmoid’ activation for multi-label
01:47:30 Question on “Training only the last layers, not the initial freeze/frozen ones from ImageNet models”
- ‘learn.unfreeze()’ advanced discussion
01:56:30 Visualize your model with ‘learn.summary()’, shows ‘OrderedDict()’
01:59:45 Working with Structured Data “Corporacion Favorita Grocery Sales Forecasting”
- Based on the Rossmann Stores competition
02:05:30 Book: Python for Data Analysis, by Wes McKinney
02:11:50 We save the dataframe with ‘Joined.to_feather()’ from Pandas, use ‘df = pd.read_feather()’ to load.
02:13:30 Split Rossmann columns in two types: categorical vs continuous
Due to the update of learn.TTA, minor changes are required for the first two kernels of the submission file.
Which notebook(s) are you referring to?
I ran into one missing line item (it could be the first or the last item) when running learn.TTA(is_test = True) from various notebooks. This is from the (replicated) Dogs Breeds notebook.
After updating with git pull (#78), learn.TTA() is working again. Thanks @FabianHertwig
At 1:39:55 Jeremy said that he is not going to use any images bigger than int(sz * 1.3) = int(64 * 1.3) = 83. But later he used sz = 128 and sz = 256 and never called the data.resize function again. Are the images in this case scaled down from the original sizes to 128 and 256 pixels respectively, or scaled up from 83 pixels to 128 and 256?
Why did he not use a bigger coefficient than 1.3?
I suppose he uses data.resize when the size is 64 because it takes too much time for the data loader to perform on-demand scaling with a target dimension of 64.
On the other hand, on-demand scaling from the original size to 128 or 256 may take less time, so there is little point in pre-scaling all the images with data.resize. Nothing prevents you from doing that, though; the resize is just there to speed things up.
So, in the end, the data loader scales images from their original sizes to 128 and 256. As for the coefficient, I think it is a matter of trial and error to find a balance between quality and speed.
(These are my opinions; bear in mind I could be totally wrong.)
Edit: found Jeremy’s answer here.
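The size arithmetic in the question can be checked with a few lines of plain Python. This is only a sketch: `max_resize_target` and `scaled_shape` are hypothetical helper names, not fastai functions, and the second one assumes the shorter side is scaled to the target with simple rounding.

```python
def max_resize_target(sz, factor=1.3):
    """Largest side length kept after pre-resizing, as in int(sz * 1.3)."""
    return int(sz * factor)


def scaled_shape(rows, cols, targ):
    """Shape after scaling so the shorter side equals targ (aspect ratio kept)."""
    ratio = targ / min(rows, cols)
    return round(rows * ratio), round(cols * ratio)


print(max_resize_target(64))        # pre-resize target used for sz = 64
print(scaled_shape(200, 400, 100))  # a 200x400 original scaled for targ = 100
```

On the reading given in the reply above, the later runs with sz = 128 and sz = 256 start again from the original files, so those images are scaled down from their original size rather than up from 83 pixels.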
Thought I’d share my notes for this lesson. These contain some screenshots from videos and a few notes/explanations from other sources I found useful. I also tried to make variable naming a bit easier to understand too (e.g., instead of “y” use “target_values”, instead of “lrs” use “differential_learning_rates”, etc.).
Minimum code to run image prediction:
Dogs vs. Cats implemented in Keras:
How to download data from Kaggle, run image recognition, create a submission:
Convolutional neural network theory:
I am a little confused with this part:
Do I need to execute line 31, then jump over line 54, execute all the sorting, reindexing and merging, and save the result in feather for the train data, then go back to line 54, execute it, and do the same operations for the test dataset? Am I understanding this correctly?
Sounds like you are running the crestle-only lines when not using crestle…
Yep and deleted the post. Conflated that and getting the kaggle images downloaded into the correct locations.
In the Otavio demo, after the (first) maxpool layer, it overlays 2 (8x8?) images and then runs filters over that combination.
Can someone confirm how the images are combined, e.g. added? multiplied? (Edit: Jeremy adds them in Excel afterwards, so that's most likely the same.)
Thanks in advance.
SOLVED: I got this working. I believe the code below is correct, but apparently you also need to make sure to call learn.predict() at least once BEFORE using learn.predict_array()? Otherwise the predictions from learn.predict_array() seem to be totally incorrect.
I’m playing with the dog breeds data, and specifically trying to make an individual prediction on a single file as described in the video around 00:39:30. I was having some problems getting the code in the video to work, but after searching the forums, I’ve come up with this (had to convert the image to np.array and make sure precompute=False):
im = val_tfms(np.array(Image.open(PATH+file_name)))
learn.precompute = False
preds = learn.predict_array(im[None])
What I really want is the name of the breed of the prediction, instead of just a number. Is the code below a correct way to get the breed name for the predicted value?
That returns a label, such as ‘chihuahua’, but I’m not sure it is the correct way to get the breed name? Especially since when I’ve been playing with this I’ve tried about 20 different images, and NONE of the dogs in any of the images I’ve tried are actually of the breed that the above code predicts, even though the reported accuracy is greater than 0.924. So what am I doing wrong?
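For reference, here is a minimal sketch of mapping a prediction back to a class name. It assumes the prediction array holds log-probabilities with shape (1, n_classes) and that the class list is in the same order the model was trained with; `classes` and `log_preds` below are made-up stand-ins for data.classes and the output of learn.predict_array(im[None]).

```python
import numpy as np

# Hypothetical stand-ins: in the notebook these would come from
# data.classes and learn.predict_array(im[None]).
classes = ["beagle", "chihuahua", "pug"]
log_preds = np.log(np.array([[0.05, 0.90, 0.05]]))  # shape (1, n_classes)

probs = np.exp(log_preds)                    # back from log-probabilities
best = int(np.argmax(log_preds, axis=1)[0])  # index of the most likely class
print(classes[best], float(probs[0, best]))
```

If the names look wrong for every image, it may be worth checking that the index is looked up in the same class list the learner was built from, and that the image went through the same validation transforms.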
Might be silly, but I get an error when running the “Individual prediction” code against the model I created on different data in
PATH = "/home/paperspace/data/"
fn = "exercise1/27752471_10215241041800086_6867930642288620096_n.png"; fn;
im = Image.open(PATH+fn); im.load(); im
trn_tfms, val_tfms = tfms_from_model(arch, sz)
im_array = val_tfms(im)
preds = learn.predict_array(im_array[None])
It seems like it’s trying to get the shape of the image, but the property is just not there for some reason:
~/fastai/courses/dl1/fastai/transforms.py in do_transform(self, x)
271 def do_transform(self, x):
--> 272 return scale_min(x, self.sz)
~/fastai/courses/dl1/fastai/transforms.py in scale_min(im, targ)
10 targ (int): target size
---> 12 r,c,*_ = im.shape
13 ratio = targ/min(r,c)
14 sz = (scale_to(c, ratio, targ), scale_to(r, ratio, targ))
AttributeError: 'PngImageFile' object has no attribute 'shape'
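The traceback says the transform was handed a PIL image object, which has no `.shape` attribute, while the transform code reads `im.shape`. A minimal sketch of the difference, using a small in-memory image in place of Image.open(PATH+fn); the RGBA mode and the conversion to RGB are assumptions for illustration.

```python
import numpy as np
from PIL import Image

# A small in-memory image stands in for Image.open(PATH + fn).
im = Image.new("RGBA", (40, 30))        # PNG files often carry an alpha channel

print(hasattr(im, "shape"))             # PIL images expose .size, not .shape
im_array = np.array(im.convert("RGB"))  # drop alpha, convert to an ndarray
print(im_array.shape)                   # rows, cols, channels
```

Passing an array like `im_array` to the transforms avoids the AttributeError. The library's own image loader may also rescale pixel values, so treat this as a starting point rather than a drop-in replacement.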
I think I could benefit from some general understanding of application. I understand (I think) pulling in the training and test data, fitting the model to the training data and verifying the model with the test data. What I am missing is how to take it to the step of taking the trained model and using it (applying it to a new image). For example, we have a cat/dog image classification model - I take a new photo - How do I pass the new image to the model and get back if it is a cat or a dog?
And, What if it was a picture of a horse? Will the model incorrectly classify as a dog or cat (with a low probability) or will it say ‘this is not a cat or dog’?
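On the horse question, one relevant fact is that a softmax output over two classes always sums to 1, so there is no built-in "neither" answer. The sketch below shows this with made-up raw scores; the numbers are hypothetical and not taken from any trained model.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for (cat, dog) from a picture of a horse.
horse_probs = softmax([-0.3, 0.1])
print(horse_probs, sum(horse_probs))
```

Whatever the scores are, the two probabilities are forced to share a total of 1, so the model picks cat or dog; at best the split is closer to even for an unfamiliar image.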
I got stuck in the satellite notebook. Can anybody tell me how the sigmoid works for a multi-label problem?
As Jeremy mentioned, we have to replace the softmax of the last layer with a sigmoid function, but what I don’t understand is, how could a sigmoid output an N-d vector? Or we have to train N sigmoids, each for one of the N labels?
So the question is how to calculate the loss for multi-label classification.
If you have any clue about this, please share some of the details. Thanks!
Yes you have it correct when you said:
Or we have to train N sigmoids, each for one of the N labels?
For multi-label problems (is_multi=True in ConvnetBuilder), fastai/pytorch automatically sets up N sigmoid activations in the final output layer training against data.classes (i.e. (‘agriculture’, 1.0), (‘artisinal_mine’, 0.0), (‘bare_ground’, 0.0),…).
Note that we also switch to F.binary_cross_entropy as the loss function vs negative log likelihood loss (F.nll_loss) since it’s a binary classification task for each label class:
    def __init__(self, data, models, precompute=False, **kwargs):
        self.precompute = False
        super().__init__(data, models, **kwargs)
        if hasattr(data, 'is_multi'):
            self.crit = F.binary_cross_entropy if data.is_multi else F.nll_loss
            if data.is_reg: self.crit = F.l1_loss
            elif self.metrics is None:
                self.metrics = [accuracy_thresh(0.5)] if self.data.is_multi else [accuracy]
        if precompute: self.save_fc1()
        self.precompute = precompute
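To make the "N sigmoids" idea concrete: the sigmoid is applied element by element to the N raw outputs of the last layer, so an N-d input gives an N-d output. A minimal sketch in plain Python, with made-up raw outputs for four labels (the values and the 0.5 threshold are for illustration only):

```python
import math


def sigmoid(x):
    """Squash one raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical raw outputs of the final linear layer, one per label.
logits = [1.4, -1.4, 0.85, 0.4]
probs = [sigmoid(z) for z in logits]  # one independent probability per label
labels = [p > 0.5 for p in probs]     # threshold each label separately
print(probs, labels)
```

Unlike softmax, these probabilities are not forced to sum to 1, which is what allows several labels to be "on" for the same image.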
Everytime I can get exactly what I wanna know from your
So when I take a closer look at the structure of the NN in the satellite notebook, I found:
So does that mean we have 17 sigmoids and train them separately? For example, if the real label is [1, 0, 1, 1, ...] and the sigmoid output is [0.8, 0.2, 0.7, 0.6, ...], then the loss for the 1st sigmoid is -(1*log 0.8 + 0*log 0.2), the loss for the 2nd sigmoid is -(0*log 0.2 + 1*log 0.8), and so on. Do we just use binary_cross_entropy for each sigmoid to train them separately, or do we have to add the losses up?
Good idea to look at the NN final layer outputs - confirms that it’s creating 17 sigmoids (for the 17 different label classes in the Planet dataset).
I believe pytorch automatically calculates the loss of each sigmoid output separately, takes the average of the 17, and displays that average value as trn_loss and val_loss. Looks like you can set it to sum losses instead of averaging them based on my reading of the pytorch 0.3.1 docs for binary_cross_entropy, size_average=True/False setting:
In the newest pytorch 0.4, it looks like there’ll be a new parameter (reduce=True/False) that “returns a loss per input/target element instead” of averaging or summing the losses: http://pytorch.org/docs/master/nn.html#binary-cross-entropy
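The three options (per-element, averaged, summed) can be worked through by hand for the first four values from the question above. This is a plain-Python sketch of the binary cross-entropy formula, not a call into pytorch; `bce_per_element` is a hypothetical helper name.

```python
import math


def bce_per_element(targets, probs):
    """Binary cross-entropy for each label: -(y*log(p) + (1-y)*log(1-p))."""
    return [-(y * math.log(p) + (1 - y) * math.log(1 - p))
            for y, p in zip(targets, probs)]


targets = [1, 0, 1, 1]
probs = [0.8, 0.2, 0.7, 0.6]

losses = bce_per_element(targets, probs)  # one loss per sigmoid output
mean_loss = sum(losses) / len(losses)     # averaging the per-label losses
sum_loss = sum(losses)                    # summing them instead
print(losses, mean_loss, sum_loss)
```

Either way the network is trained on the single reduced number, so the sigmoids are not trained separately; averaging versus summing only changes the scale of the loss and its gradients.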