Part 2 Lesson 9 wiki

I’ve found that if you just don’t run the lines that move PyTorch Variables to the CPU with .cpu() (or make sure those lines are commented out) in the original pascal-multi nb, it should all run correctly on the GPU. Specifically these 3 cells:

x,y = next(iter(md.val_dl))
# x,y = V(x).cpu(),V(y)
x,y = V(x),V(y)
#for i,o in enumerate(y): y[i] = o.cpu()
learn.model#.cpu()
#anchors = anchors.cpu(); grid_sizes = grid_sizes.cpu(); anchor_cnr = anchor_cnr.cpu()

I found it easiest to restart the original notebook’s kernel, check that those lines are commented out, and run through to confirm that it works.

By default, I believe variables are placed on CUDA (the GPU) when they’re first defined. What’s happening is that when you run the lines above (the ones I’ve commented out), those PyTorch Variables get moved to the CPU while the other Variables stay on the GPU, so they can no longer be used together later on when a function that needs both is called.
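For example, here’s a toy illustration (not from the notebook) of the kind of error you get when a CPU tensor and a GPU tensor end up in the same operation:

# Toy example (assumes a CUDA device is available); not notebook code.
import torch

a = torch.randn(3)          # lives on the CPU
b = torch.randn(3).cuda()   # lives on the GPU
try:
    a + b
except RuntimeError as e:
    print(e)   # complains that the tensors are on different devices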

6 Likes

This small tweak worked for me as well.

1 Like

The lines that convert existing tensors into CPU versions aren’t meant to be run - they are there to enable testing on the CPU (since errors on the GPU are much harder to debug).

1 Like

Yup exactly. This is done by fastai. You can override this behavior with fastai.core.USE_GPU=False BTW. (You need to run that before you start creating your models or dataloaders).
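For reference, that would look something like this (old fastai 0.7 API; USE_GPU is the flag mentioned above, the rest is just illustrative ordering):

import fastai.core
fastai.core.USE_GPU = False   # must run before creating any ModelData / learner objects

# ...then build dataloaders and models as usual; everything stays on the CPU.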

4 Likes

How Confident are we?

The confidence threshold hard-coded into show_nmf is 0.25; I got some interesting results by making that a parameter. It seems that for some objects (person) increasing it helps, and for others (dog) it hurts. In the SSD paper they quoted a threshold of 0.1, but I’m guessing that this threshold should somehow depend on the size of the ground-truth object relative to the anchor boxes. Any thoughts?

Advice: if you want to go through pascal-multi.ipynb step by step, executing as you go, use Tim David Lee’s version, not Jeremy’s. TDL’s version has more comments, discussion, and disambiguation, and it actually runs straight through without your having to edit it to deal with CPU/CUDA issues.

I was reading about the Focal Loss that was discussed in class. So, it handles the class imbalance problem for single-stage object detectors like YOLO/SSD by up-weighting the observations that are difficult to classify. Is it right to think of this as a neural-net version of an ensemble of boosted trees? If so, it would be a beautifully simple tweak that people hadn’t thought of before for CNNs, even though everyone was using it for tree models. Please correct me if I am wrong.

val_ds2 = ConcatLblDataset(md.val_ds, val_mcs)

In this line, I couldn’t figure out how md.val_ds and val_mcs could line up, since val_mcs is split at random by:

((val_mcs,trn_mcs),) = split_by_idx(val_idxs, mcs)

and md.val_ds comes from ImageClassifierData.from_csv() creating a different validation set at random.

Well, it turns out the validation sets are not exactly random: the split uses a default seed, so if you don’t specify a seed yourself, the same records end up in both validation sets.
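To illustrate the idea (this is just a toy sketch of a seeded split, not fastai’s actual code):

import numpy as np

def split_idxs(n, val_pct=0.2, seed=42):
    # a fixed seed makes the "random" split reproducible
    np.random.seed(seed)
    return np.random.permutation(n)[:int(n * val_pct)]

a = split_idxs(1000)
b = split_idxs(1000)
assert (a == b).all()   # same seed -> same validation records both times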

Hope this saves someone else some time.

1 Like

For

def one_hot_embedding(labels, num_classes):
    return torch.eye(num_classes)[labels.data.cpu()]

why is labels.data put on the CPU?

Because pytorch doesn’t like it otherwise, it’s the error you quoted earlier:

Performing basic indexing on a tensor and encountered an error indexing dim 0
with an object of type torch.cuda.LongTensor. The only supported types are
integers, slices, numpy scalars, or if indexing with a torch.LongTensor or
torch.ByteTensor only a single Tensor may be passed.

PyTorch doesn’t allow indexing by a CUDA tensor; only integers, slices, numpy scalars, or a torch.LongTensor/ByteTensor are accepted.
labels.data is a torch.cuda.LongTensor because it’s stored on the GPU during training, so we have to move it back to the CPU to turn it into a torch.LongTensor.
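To see what that indexing is actually doing, here’s a small CPU-only example (toy values, not from the notebook):

import torch

labels = torch.tensor([0, 2, 1])
print(torch.eye(3)[labels])   # each label picks out a row of the identity matrix
# tensor([[1., 0., 0.],
#         [0., 0., 1.],
#         [0., 1., 0.]])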

…and I have no idea why not. It looks like a bug to me, or at least a missing feature. I see no reason why pytorch shouldn’t support indexing with a cuda tensor.

Good point!

I don’t think either of these things are true - especially not the latter bit.

The alpha parameter in focal loss does weight positive vs negative labels differently (but weights observations the same).

But that’s a minor tweak. The main difference is what’s shown in fig 1 in the paper - by multiplying the input to cross-entropy loss by the factor shown there, it results in a steeper “hook” in the curve. Make sure you understand that figure in the paper, since that’s key. Try to reproduce that figure yourself. Then, try to make sure you understand why the curves with gamma>0 are what we probably want.
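If you want a starting point for reproducing it, here’s a rough sketch of my own (assuming the paper’s form FL(p_t) = -(1 - p_t)^gamma * log(p_t), where gamma=0 gives ordinary cross-entropy); it is not code from the lesson notebook:

import numpy as np
import matplotlib.pyplot as plt

p_t = np.linspace(0.01, 1, 200)   # probability assigned to the ground-truth class
for gamma in [0, 0.5, 1, 2, 5]:
    plt.plot(p_t, -(1 - p_t)**gamma * np.log(p_t), label=f'gamma={gamma}')
plt.xlabel('probability of ground-truth class (p_t)')
plt.ylabel('loss')
plt.legend()
plt.show()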

(If anyone tries this and gets stuck, let us know! And if you figure it out, tell us what your understanding is :slight_smile: )

1 Like

You can change the default seed to some other value, but if you want to get the same random split, you just have to reuse that same seed.

I am trying to understand BCE_Loss class we covered in the class.

After watching the video several times, I do understand the complexity of predicting background as a class itself. What I do not understand is, in OutConv, why do we do:

self.oconv1 = nn.Conv2d(nin, (len(id2cat)+1)*k, 3, padding=1)

This is the conv layer for classification. If our custom BCE loss is just going to chop off the last column of the activation, why don’t we just set the out_channels of Conv2d to be (len(id2cat))*k? I’ve gone through the model layer by layer and cannot quite figure out the reason.

Any help would be greatly appreciated!! :pray:

Here is the timestamp where Jeremy is talking about this

4 Likes

I also don’t understand why we add one and then chop it off.

Yup this is confusing! And quite possibly I’m doing it in a sub-optimal way…

Let’s first discuss the loss function. We want to 1-hot encode, but if the target is ‘background’, then we want all-zeros. So in the loss function we use a regular 1-hot encoding function, then remove the last column. There are of course other ways we could do this - if anyone has a suggestion for something that is more concise and/or easier to understand, please say so. :slight_smile:
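A tiny illustration of that “one-hot then drop the last column” idea (toy numbers, not the notebook’s actual BCE_Loss):

import torch

def one_hot_embedding(labels, num_classes):
    return torch.eye(num_classes)[labels]

num_classes = 3                  # real classes; background gets index 3
targ = torch.tensor([0, 3, 2])   # the middle box is background
t = one_hot_embedding(targ, num_classes + 1)[:, :-1]   # drop the background column
print(t)
# tensor([[1., 0., 0.],
#         [0., 0., 0.],   <- background becomes all zeros
#         [0., 0., 1.]])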

As for why we add one in the convolutional output - well frankly I can’t remember! I suspect it is a redundant hold-over from when I used softmax (which is what I did when I started on this). My guess is that you could remove it and get just as good results (if not better). If you try this, please let us know how you go!

4 Likes

Thank you so much, Jeremy!!

To me, it is easier to understand having a “background class” in the target value, because I would rather see a target class index than keep track of it in the one-hot encoding.

For convolutional output, I will certainly try without +1 and report back on how it does :slight_smile: Thank you very much for the quick response and clarifications!!

1 Like

I’m working on the defaultdict thing, and I get what it’s doing now that I’m breaking everything down. I just have a question about lambda (I know, not technically related to defaultdict). Is there a way to write lambda: x+1 so that x is whatever the key is? So

trn_anno = collections.defaultdict(lambda:x+1)

would look like this:

trn_anno[12] would give you 13. Is that something you can do with a lambda?

I’ve tried to google it, but I haven’t seen anything that does this.

It is possible to create a lambda function that increments its input by 1:

my_lambda = lambda x: x + 1
my_lambda(12)

I am not sure what use case you have in mind, but you probably don’t want to put that in a defaultdict: if, say, trn_anno[12] doesn’t exist, the defaultdict calls its factory with no arguments, so a one-argument lambda fails because its required parameter was never passed. I might be able to help more if you could explain what you are trying to do. Sorry :frowning:
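To make that concrete, and to show one way to get key-dependent defaults if that’s what you’re after (my own example, nothing to do with the notebook):

import collections

trn_anno = collections.defaultdict(lambda x: x + 1)
try:
    trn_anno[12]
except TypeError as e:
    print(e)   # <lambda>() missing 1 required positional argument: 'x'

# defaultdict never passes the key to its factory; if you want the default to
# depend on the key, override __missing__ on a dict subclass instead:
class KeyPlusOne(dict):
    def __missing__(self, key):
        return key + 1

print(KeyPlusOne()[12])   # 13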