Part 2 Lesson 9 wiki

Yup this is confusing! And quite possibly I’m doing it in a sub-optimal way…

Let’s first discuss the loss function. We want to 1-hot encode, but if the target is ‘background’, then we want all-zeros. So in the loss function we use a regular 1-hot encoding function, then remove the last column. There are of course other ways we could do this - if anyone has a suggestion for something that is more concise and/or easier to understand, please say so. :slight_smile:
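
A minimal sketch of that idea in plain Python (not the notebook's PyTorch code). The function name and the convention that background is the extra, last index are assumptions for illustration:

```python
def one_hot_without_background(labels, num_classes):
    """Return one row per label, each with `num_classes` columns.

    Assumption: real classes are 0..num_classes-1 and the background
    label is the extra index `num_classes`.
    """
    rows = []
    for label in labels:
        # Regular 1-hot over num_classes + 1 columns (background is last).
        row = [0] * (num_classes + 1)
        row[label] = 1
        # Dropping the last column turns a background target into all zeros.
        rows.append(row[:-1])
    return rows


# Labels 0 and 2 are real classes; label 3 is background when num_classes=3.
targets = one_hot_without_background([0, 2, 3], num_classes=3)
print(targets)
```

The background row comes out as all zeros, while every other row keeps a single 1 in its class column.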

As for why we add one in the convolutional output - well frankly I can’t remember! I suspect it is a redundant hold-over from when I used softmax (which is what I did when I started on this). My guess is that you could remove it and get just as good results (if not better). If you try this, please let us know how you go!


Thank you so much, Jeremy!!

To me, it is easier to understand having a “background class” in the target value, because I would rather see a target class index than keep track of it in the 1-hot encoding.

For convolutional output, I will certainly try without +1 and report back on how it does :slight_smile: Thank you very much for the quick response and clarifications!!

I’m working on the defaultdict thing, and now that I’m breaking everything down I get what it is doing. I just have a question about lambda (I know, not technically related to defaultdict). Is there a way to write lambda:x+1 so that x is whatever the key is? For example,

trn_anno = collections.defaultdict(lambda:x+1)

would look like this:

trn_anno[12] would give you 13. Is that a thing you can do with lambda?

I’ve tried to google it, but I haven’t seen anything that does this.

It is possible to create a lambda function that increments its input by 1:

my_lambda = lambda x: x + 1

I am not sure what use case you have in mind, but you probably don’t want to put that in a defaultdict. The reason is that if, say, trn_anno[12] doesn’t exist, defaultdict calls the lambda with no arguments, and the call fails with a TypeError because the required parameter was not passed. I might be able to help more if you could explain what you are trying to do. Sorry :frowning:

Well, we are using this to say what the default is, so my thought was: what if the key had something to do with the value? Let’s say I want a lookup table of squares, where each square is calculated only once and afterwards read from the dictionary. My defaultdict would be: squares = collections.defaultdict(lambda x:x**2). I’m thinking this could be useful in embedded systems where you don’t have much space: you could still have the speed of a lookup table without having to store the whole table. So when I use this defaultdict and call squares[5], it should give me 25, but I don’t know how to pass the key through, if that makes sense.
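
One way to get this behaviour, sketched here as an assumption rather than anything from the lesson notebooks, is to subclass dict and define `__missing__`, which Python calls with the missing key. The class name `KeyDefaultDict` is hypothetical:

```python
class KeyDefaultDict(dict):
    """dict that builds a missing value from the key itself (hypothetical helper)."""

    def __init__(self, factory):
        super().__init__()
        self.factory = factory

    def __missing__(self, key):
        # dict.__getitem__ calls this when `key` is absent. Unlike
        # defaultdict's default_factory, this hook receives the key.
        value = self.factory(key)
        self[key] = value  # store it so it is computed only once
        return value


squares = KeyDefaultDict(lambda x: x ** 2)
print(squares[5])     # computed on first access
print(dict(squares))  # the computed entry is now stored
```

Only the keys that have actually been looked up are stored, so the table grows with use instead of being built up front.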

Done. Now, I can see the gamma value in relation to the loss.


Sounds like you are trying to implement a cache mechanism. I am not familiar with what kind of approaches people use in Python though :confused:

No problem. I don’t think it’s important for this anyways.

I think what you’re looking for is available here. I quickly tested it and it seems to work.


Hiromi- thanks for putting this together!!!

Sure :slight_smile: It’s just some scribble I did a while ago.

There is an LRU cache decorator in the standard library (functools.lru_cache).
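
A minimal sketch of how that decorator could cover the squares example from earlier in the thread. The function name `square` and the `maxsize` value are arbitrary choices for illustration:

```python
from functools import lru_cache


@lru_cache(maxsize=128)  # keep at most 128 results; the size is arbitrary here
def square(x):
    return x ** 2


square(5)  # first call: computed and stored
square(5)  # second call: served from the cache

info = square.cache_info()
print(info.hits, info.misses)
```

Because the cache is bounded, the least recently used entries are dropped once `maxsize` is reached, which may suit memory-constrained settings better than an ever-growing dictionary.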

Dovetailing with @daveluo's awesome whiteboarding of SSD_MultiHead (thank you for that!) - I also found it really helpful to spend time diagramming / visualizing the forward pass line by line. Attaching a screenshot here in case it's helpful for anyone else…


Hey guys,
For those who want a quick recap on cross-entropy, watch this YouTube video


I’d love to show that in class (with credit of course!) - would that be OK? If so, would you prefer me to credit your forum user name, or your real name (if the latter, please tell me your real name)?


Hey everyone…
I ran into a problem while reading the SSD paper :sweat: … Can anyone tell me how “Detections: 8732 per class” is calculated in the SSD network? :slightly_smiling_face:

Thank you… :slight_smile:

Sure! But you try first :slight_smile: How many detections do you calculate based on what you’ve read? Take us through your thinking and we’ll figure this out together.
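
For anyone checking their own count afterwards: the figure can be reproduced by summing the default boxes over the six prediction feature maps of SSD300. The feature map sizes and boxes per location below are taken from the SSD paper's architecture description; the variable names are just for illustration:

```python
# (feature map side length, default boxes per location) for SSD300
feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

# Each map contributes side * side locations, each with its own set of boxes.
total = sum(side * side * boxes for side, boxes in feature_maps)
print(total)  # 8732
```

That is 5776 + 2166 + 600 + 150 + 36 + 4 = 8732 boxes, each scored for every class.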

Hey everyone.
I think I found a little bug in pascal-multi.ipynb.
It is a piece of code at the very beginning, where we predict multiple classes and plot pictures with one or more predicted labels:

for i,ax in enumerate(axes.flat):
    ya = np.nonzero(y[i]>0.4)[0]
    b = '\n'.join(md.classes[o] for o in ya)
    ax = show_img(ima, ax=ax)
    draw_text(ax, (0,0), b)

I found that ya = np.nonzero(y[i]>0.4)[0] is a single object, so the code always plots only one class instead of several.
So I removed [0] and added int(o) in the definition of b (to convert from the torch.cuda.LongTensor dtype), like this:

for i,ax in enumerate(axes.flat):
    ya = np.nonzero(y[i]>0.4)
    b = '\n'.join(md.classes[int(o)] for o in ya)
    ax = show_img(ima, ax=ax)
    draw_text(ax, (0,0), b)

Now it works well.
Should I create a pull request for this?

What is the best way to deal with images that have a different aspect ratio? For example, I have a dataset with images of size 375x1242, so the width is 3.3 times larger than the height. Resizing the images to the square shape expected by pretrained ImageNet models will lead to unrealistic-looking images.
Is there any way to leverage pretrained image classification models in this case, or is it better to create an SSD model with an appropriate aspect ratio and train it from scratch?

Good list of Criterions for better understanding