Part 2 Lesson 9 wiki

I’ve found that if you just don’t run the lines that move PyTorch Variables to the CPU with .cpu() (or make sure those lines are commented out) in the original pascal-multi nb, it should all run correctly on the GPU. Specifically these 3 cells:

x,y = next(iter(md.val_dl))
# x,y = V(x).cpu(),V(y)
x,y = V(x),V(y)
#for i,o in enumerate(y): y[i] = o.cpu()
learn.model#.cpu()
#anchors = anchors.cpu(); grid_sizes = grid_sizes.cpu(); anchor_cnr = anchor_cnr.cpu()

I found it easiest to restart the original notebook’s kernel, check that those lines are commented out, and run through to confirm that it works.

By default, I believe variables are placed on CUDA (the GPU) when they’re first defined. What’s happening is that when you run the lines above (the ones I’ve commented out), those PyTorch Variables get moved to the CPU while the other Variables stay on the GPU, so they can no longer be used together later on when a function that needs both is called.
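For example, here’s a toy illustration (not from the notebook) of the kind of error you get when a CPU tensor and a GPU tensor end up in the same operation:

# Toy example (assumes a CUDA device is available); not notebook code.
import torch

a = torch.randn(3)          # lives on the CPU
b = torch.randn(3).cuda()   # lives on the GPU
try:
    a + b
except RuntimeError as e:
    print(e)   # complains that the tensors are on different devices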

6 Likes

This small tweak worked for me as well.

1 Like

The lines that convert existing tensors into CPU versions aren’t meant to be run - they are there to enable testing on the CPU (since errors on the GPU are much harder to debug).

1 Like

Yup exactly. This is done by fastai. You can override this behavior with fastai.core.USE_GPU=False BTW. (You need to run that before you start creating your models or dataloaders).
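For reference, that would look something like this (old fastai 0.7 API; USE_GPU is the flag mentioned above, the rest is just illustrative ordering):

import fastai.core
fastai.core.USE_GPU = False   # must run before creating any ModelData / learner objects

# ...then build dataloaders and models as usual; everything stays on the CPU.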

4 Likes

How Confident are we?

The confidence threshold hard-coded into show_nmf is 0.25; I got some interesting results by making that a parameter. It seems that for some objects (person) increasing it helps, and for others (dog) it hurts. In the SSD paper they quoted a threshold of 0.1, but I’m guessing that this threshold should somehow depend on the size of the ground-truth object relative to the anchor boxes. Any thoughts?

Advice: if you want to go through pascal-multi.ipynb step by step, executing as you go, use Tim David Lee’s version, not Jeremy’s. TDL’s version has more comments, discussion, and disambiguation, and it actually runs straight through without your having to edit it to deal with CPU/CUDA issues.

I was reading about the Focal Loss that was discussed in class. So, it handles the class imbalance problem for single-stage object detectors like YOLO/SSD by up-weighting the observations that are difficult to classify. Is it right to think of this as a neural-net version of an ensemble of boosted trees? If so, it would be a beautifully simple tweak that people hadn’t thought of before for CNNs, even though everyone was using it for tree models. Please correct me if I am wrong.

val_ds2 = ConcatLblDataset(md.val_ds, val_mcs)

In this line, I couldn’t figure out how md.val_ds and val_mcs could line up, since val_mcs is split at random by:

((val_mcs,trn_mcs),) = split_by_idx(val_idxs, mcs)

and md.val_ds comes from ImageClassifierData.from_csv() creating a different validation set at random.

Well, it turns out the validation sets are not exactly random: the split uses a default seed, so if you don’t specify a seed yourself, the same records end up in both validation sets.
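To illustrate the idea (this is just a toy sketch of a seeded split, not fastai’s actual code):

import numpy as np

def split_idxs(n, val_pct=0.2, seed=42):
    # a fixed seed makes the "random" split reproducible
    np.random.seed(seed)
    return np.random.permutation(n)[:int(n * val_pct)]

a = split_idxs(1000)
b = split_idxs(1000)
assert (a == b).all()   # same seed -> same validation records both times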

Hope this saves someone else some time.

1 Like

For

def one_hot_embedding(labels, num_classes):
    return torch.eye(num_classes)[labels.data.cpu()]

why is labels.data put on the CPU?

Because pytorch doesn’t like it otherwise, it’s the error you quoted earlier:

Performing basic indexing on a tensor and encountered an error indexing dim 0
with an object of type torch.cuda.LongTensor. The only supported types are
integers, slices, numpy scalars, or if indexing with a torch.LongTensor or
torch.ByteTensor only a single Tensor may be passed.

PyTorch doesn’t allow indexing by a CUDA tensor; only integers, slices, numpy scalars, or a torch.LongTensor/ByteTensor are accepted.
labels.data is a torch.cuda.LongTensor because it’s stored on the GPU during training, so we have to move it back to the CPU to turn it into a torch.LongTensor.
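To see what that indexing is actually doing, here’s a small CPU-only example (toy values, not from the notebook):

import torch

labels = torch.tensor([0, 2, 1])
print(torch.eye(3)[labels])   # each label picks out a row of the identity matrix
# tensor([[1., 0., 0.],
#         [0., 0., 1.],
#         [0., 1., 0.]])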

…and I have no idea why not. It looks like a bug to me, or at least a missing feature. I see no reason why pytorch shouldn’t support indexing with a cuda tensor.

Good point!

I don’t think either of these things are true - especially not the latter bit.

The alpha parameter in focal loss does weight positive vs negative labels differently (but weights observations the same).

But that’s a minor tweak. The main difference is what’s shown in fig 1 in the paper - by multiplying the input to cross-entropy loss by the factor shown there, it results in a steeper “hook” in the curve. Make sure you understand that figure in the paper, since that’s key. Try to reproduce that figure yourself. Then, try to make sure you understand why the curves with gamma>0 are what we probably want.
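If you want a starting point for reproducing it, here’s a rough sketch of my own (assuming the paper’s form FL(p_t) = -(1 - p_t)^gamma * log(p_t), where gamma=0 gives ordinary cross-entropy); it is not code from the lesson notebook:

import numpy as np
import matplotlib.pyplot as plt

p_t = np.linspace(0.01, 1, 200)   # probability assigned to the ground-truth class
for gamma in [0, 0.5, 1, 2, 5]:
    plt.plot(p_t, -(1 - p_t)**gamma * np.log(p_t), label=f'gamma={gamma}')
plt.xlabel('probability of ground-truth class (p_t)')
plt.ylabel('loss')
plt.legend()
plt.show()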

(If anyone tries this and gets stuck, let us know! And if you figure it out, tell us what your understanding is :slight_smile: )

1 Like

You can change the default seed to some other value, but if you want to get the same random split, you just have to reuse that same seed.

I am trying to understand BCE_Loss class we covered in the class.

After watching the video several times, I do understand the complexity of predicting background as a class itself. What I do not understand is, in OutConv, why do we do:

self.oconv1 = nn.Conv2d(nin, (len(id2cat)+1)*k, 3, padding=1)

This is the conv layer for classification. If our custom BCE loss is just going to chop off the last column of the activation, why don’t we just set the out_channels of Conv2d to be (len(id2cat))*k? I’ve gone through the model layer by layer and cannot quite figure out the reason.

Any help would be greatly appreciated!! :pray:

Here is the timestamp where Jeremy is talking about this

4 Likes

I also don’t understand why we add one and then chop it off.

Yup this is confusing! And quite possibly I’m doing it in a sub-optimal way…

Let’s first discuss the loss function. We want to 1-hot encode, but if the target is ‘background’, then we want all-zeros. So in the loss function we use a regular 1-hot encoding function, then remove the last column. There are of course other ways we could do this - if anyone has a suggestion for something that is more concise and/or easier to understand, please say so. :slight_smile:
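A tiny illustration of that “one-hot then drop the last column” idea (toy numbers, not the notebook’s actual BCE_Loss):

import torch

def one_hot_embedding(labels, num_classes):
    return torch.eye(num_classes)[labels]

num_classes = 3                  # real classes; background gets index 3
targ = torch.tensor([0, 3, 2])   # the middle box is background
t = one_hot_embedding(targ, num_classes + 1)[:, :-1]   # drop the background column
print(t)
# tensor([[1., 0., 0.],
#         [0., 0., 0.],   <- background becomes all zeros
#         [0., 0., 1.]])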

As for why we add one in the convolutional output - well frankly I can’t remember! I suspect it is a redundant hold-over from when I used softmax (which is what I did when I started on this). My guess is that you could remove it and get just as good results (if not better). If you try this, please let us know how you go!

4 Likes

Thank you so much, Jeremy!!

To me, it is easier to understand having a “background class” in the target value, because I would rather see a target class index than keep track of it in the one-hot encoding.

For convolutional output, I will certainly try without +1 and report back on how it does :slight_smile: Thank you very much for the quick response and clarifications!!

1 Like

I’m working on the defaultdict thing, and I get what it’s doing now that I’m breaking everything down. I just have a question about lambda (I know, not technically related to defaultdict). Is there a way to write lambda: x+1 so that x is whatever the key is? So

trn_anno = collections.defaultdict(lambda:x+1)

would look like this:

trn_anno[12] would give you 13. Is that something you can do with a lambda?

I’ve tried to google it, but I haven’t seen anything that does this.

It is possible to create a lambda function that increments its input by 1:

my_lambda = lambda x: x + 1
my_lambda(12)

I am not sure what use case you have in mind, but you probably don’t want to put that in a defaultdict: if, say, trn_anno[12] doesn’t exist, the defaultdict calls its factory with no arguments, so a one-argument lambda fails because its required parameter was never passed. I might be able to help more if you could explain what you are trying to do. Sorry :frowning:
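To make that concrete, and to show one way to get key-dependent defaults if that’s what you’re after (my own example, nothing to do with the notebook):

import collections

trn_anno = collections.defaultdict(lambda x: x + 1)
try:
    trn_anno[12]
except TypeError as e:
    print(e)   # <lambda>() missing 1 required positional argument: 'x'

# defaultdict never passes the key to its factory; if you want the default to
# depend on the key, override __missing__ on a dict subclass instead:
class KeyPlusOne(dict):
    def __missing__(self, key):
        return key + 1

print(KeyPlusOne()[12])   # 13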