Part 2 Lesson 9 wiki


(Kaitlin Duck Sherwood) #345

I got this error too. The line

anchors = anchors.cpu(); grid_sizes = grid_sizes.cpu(); anchor_cnr = anchor_cnr.cpu()

happens after the error about weight type and input type.

I found .cpu() in a number of places:

def one_hot_embedding(labels, num_classes):
    return torch.eye(num_classes)[labels.data.cuda()]

for i,o in enumerate(y): y[i] = o.cpu()
learn.model.cpu()

I changed them, and I no longer get the runtime error about CUDAFloatTensor vs. CPUFloatTensor.


(Kaitlin Duck Sherwood) #346

Now I get the error

Performing basic indexing on a tensor and encountered an error indexing dim 0 with an object of type torch.cuda.LongTensor. The only supported types are integers, slices, numpy scalars, or if indexing with a torch.LongTensor or torch.ByteTensor only a single Tensor may be passed.

I did a little looking into it, but I have to go right now. More later.


(Dave Luo) #347

I’ve found that if you just don’t run the lines that set pytorch Variables to .cpu() (or make sure those lines are commented out) in the original pascal-multi nb, it should all run correctly on GPU. Specifically these 3 cells:

x,y = next(iter(md.val_dl))
# x,y = V(x).cpu(),V(y)
x,y = V(x),V(y)
#for i,o in enumerate(y): y[i] = o.cpu()
learn.model#.cpu()
#anchors = anchors.cpu(); grid_sizes = grid_sizes.cpu(); anchor_cnr = anchor_cnr.cpu()

I found it easiest to restart the original notebook kernel, check that these 3 lines are commented out, and run through to confirm that it works.

By default, I believe variables are placed on CUDA (GPU) when they’re first created. When you run the lines above (the ones I’ve commented out), they move those PyTorch Variables to the CPU while the other Variables stay on the GPU. Any later function that needs both then fails, because the two sets of tensors live on different devices.


(Phani Srikanth) #348

This small tweak worked for me as well.


(Jeremy Howard) #349

The lines that convert existing tensors into cpu versions aren’t meant to be run - they are there to enable testing on the CPU (since if you have errors on the GPU, they’re much harder to debug).


(Jeremy Howard) #350

Yup exactly. This is done by fastai. You can override this behavior by setting fastai.core.USE_GPU=False, BTW. (You need to set it before you start creating your models or data loaders.)


(Bart Fish) #351

How confident are we?

The confidence threshold hard-coded into show_nmf is 0.25, and I got some interesting results by making it a parameter. For some objects (person) increasing it helps, and for others (dog) it hurts. The SSD paper quotes a threshold of 0.1, but I’m guessing that this threshold should somehow depend on the size of the ground-truth object relative to the anchor boxes. Any thoughts?
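Here is a minimal sketch of what "making the threshold a parameter" could look like. It assumes the detections arrive as parallel lists of boxes, scores and labels. The helper `filter_by_confidence` and the sample values are hypothetical, not taken from show_nmf. It also accepts a per-class dict, since the observation above suggests different classes may want different cutoffs. In a detector, a filter like this would typically run before non-maximum suppression.

```python
# Hypothetical helper (not from the notebook): keep detections whose score
# clears a threshold that is either a single float or a per-class dict.
def filter_by_confidence(boxes, scores, labels, thresh=0.25, default=0.25):
    def cutoff(label):
        # A dict gives per-class thresholds; unlisted classes fall back to `default`.
        return thresh.get(label, default) if isinstance(thresh, dict) else thresh
    return [(b, s, l) for b, s, l in zip(boxes, scores, labels) if s >= cutoff(l)]

# Made-up detections for illustration only.
boxes = [(10, 10, 50, 80), (12, 8, 48, 82), (100, 40, 160, 120)]
scores = [0.62, 0.18, 0.31]
labels = ["person", "person", "dog"]

for t in (0.1, 0.25, 0.5):
    kept = filter_by_confidence(boxes, scores, labels, thresh=t)
    print(t, [label for _, _, label in kept])
# 0.1 ['person', 'person', 'dog']
# 0.25 ['person', 'dog']
# 0.5 ['person']

# Per-class cutoffs: stricter for person, looser for dog.
per_class = filter_by_confidence(boxes, scores, labels, thresh={"person": 0.5, "dog": 0.1})
print([label for _, _, label in per_class])
# ['person', 'dog']
```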


(Kaitlin Duck Sherwood) #352

Advice: if you want to go through pascal-multi.ipynb step-by-step, executing as you go, use Tim David Lee’s version, not Jeremy’s. TDL’s version has more comments/discussions, disambiguations, and it actually runs straight through without having to edit to deal with CPU/CUDA.


(Prince Grover) #353

I was reading about Focal Loss, which was discussed in class. As I understand it, it handles the class imbalance problem in single-stage object detectors like YOLO/SSD by giving more weight to the observations that are difficult to classify. Is it right to think of this as a neural-net version of an ensemble of boosted trees? If so, it would be a beautifully simple tweak that people hadn’t thought of before for CNNs, even though everyone was already using it for tree models. Please correct me if I am wrong.


(Kaitlin Duck Sherwood) #354

val_ds2 = ConcatLblDataset(md.val_ds, val_mcs)

In this line, I couldn’t figure out how md.val_ds and val_mcs could line up, since val_mcs is split at random by:

((val_mcs,trn_mcs),) = split_by_idx(val_idxs, mcs)

and md.val_ds comes from ImageClassifierData.from_csv(), which creates a separate validation set at random.

Well, it turns out that the validation sets are not exactly random. They use a default seed, so if you don’t specify one, the same records are chosen for the validation set every time.
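To see why two independently made splits line up, here is a small NumPy sketch of a seeded validation split. `split_idxs` is a hypothetical stand-in, not fastai’s actual function, and the default seed of 42 is an assumption chosen for illustration. The key point is that a fixed seed produces the same permutation on every call, so any two datasets built from the same rows with the same seed select the same validation records.

```python
import numpy as np

def split_idxs(n, val_pct=0.2, seed=42):
    """Hypothetical seeded split: shuffle 0..n-1 with a fixed seed and take
    the first val_pct of the permutation as validation indices."""
    rng = np.random.RandomState(seed)      # private generator seeded identically each call
    idxs = rng.permutation(n)
    return idxs[: int(val_pct * n)]

# Two separate calls with the default seed pick exactly the same records.
a = split_idxs(100)
b = split_idxs(100)
print(len(a), np.array_equal(a, b))        # 20 True

# A different seed gives a different (but equally reproducible) split.
c = split_idxs(100, seed=1)
print(np.array_equal(c, split_idxs(100, seed=1)))   # True
```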

Hope this saves someone else some time.


(Kaitlin Duck Sherwood) #355

For

def one_hot_embedding(labels, num_classes):
    return torch.eye(num_classes)[labels.data.cpu()]

why is labels.data put on the CPU?


#356

Because pytorch doesn’t like it otherwise, it’s the error you quoted earlier:

Performing basic indexing on a tensor and encountered an error indexing dim 0 with an object of type torch.cuda.LongTensor. The only supported types are integers, slices, numpy scalars, or if indexing with a torch.LongTensor or torch.ByteTensor only a single Tensor may be passed.

PyTorch doesn’t accept indexing with a CUDA tensor, only with integers, slices, numpy scalars or a torch.LongTensor/ByteTensor.
labels.data is a torch.cuda.LongTensor because it’s stored on the GPU during training, so we have to move it back to the CPU to turn it into a torch.LongTensor.


(Jeremy Howard) #357

…and I have no idea why not. It looks like a bug to me, or at least a missing feature. I see no reason why pytorch shouldn’t support indexing with a cuda tensor.


(Jeremy Howard) #358

Good point!


(Jeremy Howard) #359

I don’t think either of these things are true - especially not the latter bit.

The alpha parameter in focal loss does weight positive vs negative labels differently (but weights observations the same).

But that’s a minor tweak. The main difference is what’s shown in fig 1 in the paper - by multiplying the input to cross-entropy loss by the factor shown there, it results in a steeper “hook” in the curve. Make sure you understand that figure in the paper, since that’s key. Try to reproduce that figure yourself. Then, try to make sure you understand why the curves with gamma>0 are what we probably want.

(If anyone tries this and gets stuck, let us know! And if you figure it out, tell us what your understanding is :slight_smile: )
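For anyone who wants to reproduce Fig. 1, here is a minimal sketch in plain Python. It assumes the paper’s definition FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. With γ = 0 and α_t = 1 this reduces to ordinary cross-entropy. The α_t argument is the positive/negative weighting mentioned above. The function name and the grid of p_t values are my own choices. The code prints the curves as a table, and passing the same numbers to any plotting library would give the figure.

```python
import math

def focal_loss(p_t, gamma=2.0, alpha_t=1.0):
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).
    gamma=0, alpha_t=1 gives plain cross-entropy, -log(p_t)."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

gammas = [0, 0.5, 1, 2, 5]                   # the gamma values plotted in Fig. 1
probs = [0.1, 0.3, 0.5, 0.7, 0.9, 0.99]      # probability of the ground-truth class

print("p_t   " + "".join(f"g={g:<7}" for g in gammas))
for p in probs:
    print(f"{p:<6}" + "".join(f"{focal_loss(p, g):<9.4f}" for g in gammas))
```

Reading across a row shows the steeper "hook". For well-classified examples (p_t close to 1), the factor (1 - p_t)^γ shrinks the loss sharply. At p_t = 0.9 with γ = 2, for example, it scales cross-entropy by (1 - 0.9)^2 = 0.01. As a result, the many easy negatives contribute little and the hard examples dominate the total loss.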


(James Requa) #360

You can change the default seed to some other seed but if you want to get the same random split you just have to use that same seed.


(Hiromi Suenaga) #361

I am trying to understand the BCE_Loss class we covered in class.

After watching the video several times, I do understand the complexity of predicting background as a class itself. What I do not understand is, in OutConv, why do we do:

self.oconv1 = nn.Conv2d(nin, (len(id2cat)+1)*k, 3, padding=1)

This is the conv layer for classification. If our custom BCE loss is just going to chop off the last column of the activation, why don’t we just set the out_channels of Conv2d to be (len(id2cat))*k? I’ve gone through the model layer by layer and cannot quite figure out the reason.
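To see where the +1 ends up, here is a NumPy sketch of the flattening step. It assumes the conv output has shape (batch, (classes+1)*k, grid_h, grid_w) and is flattened channels-last into one row of scores per anchor box. `flatten_conv` here is an illustrative re-implementation, and the shapes (20 Pascal classes, k = 9, a 4x4 grid) are assumptions. With the +1, each anchor gets 21 scores, and the last of those is the column the BCE loss chops off. Without it, each anchor gets exactly the 20 scores the loss uses.

```python
import numpy as np

def flatten_conv(x, k):
    """Sketch: reshape a conv output of shape (bs, nf*k, gh, gw) into
    (bs, gh*gw*k, nf), i.e. one row of nf scores per anchor box."""
    bs, c, gh, gw = x.shape
    assert c % k == 0, "channel count must be a multiple of k"
    x = x.transpose(0, 2, 3, 1)           # channels last: (bs, gh, gw, nf*k)
    return x.reshape(bs, -1, c // k)      # (bs, gh*gw*k, nf)

n_classes, k = 20, 9                      # assumed: 20 classes, 9 anchors per grid cell
with_bg = np.zeros((2, (n_classes + 1) * k, 4, 4))     # 189 channels
without_bg = np.zeros((2, n_classes * k, 4, 4))        # 180 channels
print(flatten_conv(with_bg, k).shape)     # (2, 144, 21)
print(flatten_conv(without_bg, k).shape)  # (2, 144, 20)
```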

Any help would be greatly appreciated!! :pray:

Here is the timestamp where Jeremy is talking about this


(Kaitlin Duck Sherwood) #362

I also don’t understand why we add one and then chop it off.


(Jeremy Howard) #363

Yup this is confusing! And quite possibly I’m doing it in a sub-optimal way…

Let’s first discuss the loss function. We want to 1-hot encode, but if the target is ‘background’, then we want all-zeros. So in the loss function we use a regular 1-hot encoding function, then remove the last column. There are of course other ways we could do this - if anyone has a suggestion for something that is more concise and/or easier to understand, please say so. :slight_smile:
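The target construction described here can be sketched in NumPy as follows. This `one_hot_embedding` is a NumPy analogue of the PyTorch function quoted earlier in this thread, not the notebook code itself. The convention that classes are 0..n_classes-1 and that index n_classes means "background" is assumed for illustration. `bce_targets_alt` is one possible more concise alternative: compare each label against the real class indices directly, so background matches nothing and becomes an all-zero row without ever creating the extra column.

```python
import numpy as np

def one_hot_embedding(labels, num_classes):
    # Row i of the identity matrix is the 1-hot vector for class i.
    return np.eye(num_classes)[labels]

def bce_targets(labels, n_classes):
    """1-hot encode with n_classes+1 columns (the last one is background),
    then drop that column so background rows become all zeros."""
    return one_hot_embedding(labels, n_classes + 1)[:, :-1]

def bce_targets_alt(labels, n_classes):
    # Alternative: label == n_classes (background) matches no column.
    return (labels[:, None] == np.arange(n_classes)).astype(np.float32)

labels = np.array([0, 2, 3])   # with n_classes=3, label 3 is background
print(bce_targets(labels, 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
```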

As for why we add one in the convolutional output - well frankly I can’t remember! I suspect it is a redundant hold-over from when I used softmax (which is what I did when I started on this). My guess is that you could remove it and get just as good results (if not better). If you try this, please let us know how you go!


(Hiromi Suenaga) #364

Thank you so much, Jeremy!!

To me, it is easier to understand having a “background class” in the target value, because I would rather see a target class index than keep track of it in a 1-hot encoding.

For convolutional output, I will certainly try without +1 and report back on how it does :slight_smile: Thank you very much for the quick response and clarifications!!