Part 2 Lesson 9 wiki

Good point!

I don’t think either of these things is true - especially not the latter bit.

The alpha parameter in focal loss does weight positive vs negative labels differently (but weights observations the same).

But that’s a minor tweak. The main difference is what’s shown in fig 1 in the paper - multiplying the cross-entropy loss by the factor shown there gives a steeper “hook” in the curve. Make sure you understand that figure in the paper, since that’s key. Try to reproduce that figure yourself. Then, try to make sure you understand why the curves with gamma>0 are what we probably want.

(If anyone tries this and gets stuck, let us know! And if you figure it out, tell us what your understanding is :slight_smile: )
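For anyone who wants a starting point for reproducing that figure, here is a minimal sketch. It assumes the per-example form of focal loss from the paper, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the probability the model gives to the true class. The function names are made up for this example, and the plotting helper assumes matplotlib is installed.

```python
import math

def focal_loss(p_t, gamma=0.0, alpha=1.0):
    """Focal loss for a single example.

    p_t is the predicted probability of the ground-truth class.
    With gamma=0 and alpha=1 this is plain cross-entropy, -log(p_t).
    """
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

def focal_curves(gammas=(0, 0.5, 1, 2, 5), steps=99):
    """Return probabilities in (0, 1) and one loss curve per gamma."""
    ps = [(i + 1) / (steps + 1) for i in range(steps)]
    return ps, {g: [focal_loss(p, g) for p in ps] for g in gammas}

def plot_focal_curves():
    """Draw the curves (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt
    ps, curves = focal_curves()
    for g, losses in curves.items():
        plt.plot(ps, losses, label=f"gamma = {g}")
    plt.xlabel("probability of ground-truth class")
    plt.ylabel("loss")
    plt.ylim(0, 5)
    plt.legend()
    plt.show()
```

Calling plot_focal_curves() should show the gamma=0 curve as ordinary cross-entropy, with the higher-gamma curves dropping towards zero much sooner as the probability of the true class rises.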

You can change the default seed to some other seed but if you want to get the same random split you just have to use that same seed.
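As a generic illustration of the seed point (this is not the library function being discussed, just a sketch with a made-up helper name), a seeded shuffle gives the same split every time it is called with the same seed:

```python
import random

def split_indices(n, val_frac=0.2, seed=42):
    """Return (train_idxs, val_idxs) for n items; the same seed gives the same split."""
    idxs = list(range(n))
    # A private RNG keeps the global random state untouched.
    random.Random(seed).shuffle(idxs)
    n_val = int(n * val_frac)
    return idxs[n_val:], idxs[:n_val]

train_a, val_a = split_indices(100, seed=42)
train_b, val_b = split_indices(100, seed=42)   # identical to the first call
```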

I am trying to understand the BCE_Loss class we covered in class.

After watching the video several times, I do understand the complexity of predicting background as a class itself. What I do not understand is, in OutConv, why do we do:

self.oconv1 = nn.Conv2d(nin, (len(id2cat)+1)*k, 3, padding=1)

This is the conv layer for classification. If our custom BCE loss is just going to chop off the last column of the activation, why don’t we just set the out_channels of Conv2d to be (len(id2cat))*k? I’ve gone through the model layer by layer and cannot quite figure out the reason.

Any help would be greatly appreciated!! :pray:

Here is the timestamp where Jeremy is talking about this


I also don’t understand why we add one and then chop it off.

Yup this is confusing! And quite possibly I’m doing it in a sub-optimal way…

Let’s first discuss the loss function. We want to 1-hot encode, but if the target is ‘background’, then we want all-zeros. So in the loss function we use a regular 1-hot encoding function, then remove the last column. There are of course other ways we could do this - if anyone has a suggestion for something that is more concise and/or easier to understand, please say so. :slight_smile:

As for why we add one in the convolutional output - well frankly I can’t remember! I suspect it is a redundant hold-over from when I used softmax (which is what I did when I started on this). My guess is that you could remove it and get just as good results (if not better). If you try this, please let us know how you go!
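To make the "1-hot encode, then remove the last column" step concrete, here is a small pure-Python sketch. It assumes the real classes are numbered 0..num_classes-1 and background uses index num_classes; the function name is invented for this example, and the notebook version works on tensors rather than lists.

```python
def one_hot_drop_background(targets, num_classes):
    """One-hot encode class indices, then drop the final (background) column.

    Assumes real classes are 0..num_classes-1 and background is index num_classes.
    """
    rows = []
    for t in targets:
        row = [0.0] * (num_classes + 1)   # one extra slot for background
        row[t] = 1.0
        rows.append(row[:-1])             # chop off the background column
    return rows

targets = [0, 2, 3]   # with 3 real classes, index 3 means background
encoded = one_hot_drop_background(targets, num_classes=3)
# encoded == [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

The background row comes out as all zeros, which is the target we want for it.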


Thank you so much, Jeremy!!

To me, it is easier to understand having a “background class” in the target value, because I would rather see a target class index than keep track of it in the 1-hot encoding.

For convolutional output, I will certainly try without +1 and report back on how it does :slight_smile: Thank you very much for the quick response and clarifications!!

I’m working on the defaultdict thing, and now that I’m breaking everything down I get what it is doing. I just have a question on lambda (I know, not technically related to defaultdict). Is there a way to write lambda:x+1 so that x is whatever the key is? So

trn_anno = collections.defaultdict(lambda:x+1)

would look like this:

trn_anno[12] would give you 13. Is that a thing you can do with lambda?

I’ve tried to google it, but I haven’t seen anything that does this.

It is possible to create a lambda function that increments the input by 1:

my_lambda = lambda x: x + 1

I am not sure what use case you have in mind, but you probably don’t want to put that in a defaultdict. The reason is that if, say, trn_anno[12] doesn’t exist, defaultdict calls the lambda with no arguments, and the call fails because the required parameter was never passed. I might be able to help more if you could explain what you are trying to do. Sorry :frowning:
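A quick way to see this for yourself (variable names here are only for illustration):

```python
import collections

# default_factory is called with NO arguments, so a zero-argument lambda works...
counts = collections.defaultdict(lambda: 0)
counts["a"] += 1            # missing key -> factory() -> 0, then incremented

# ...but a lambda that expects the key does not.
plus_one = collections.defaultdict(lambda x: x + 1)
try:
    plus_one[12]
    error = None
except TypeError as exc:
    error = exc             # the lambda was called without its argument x
```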

Well, my thought is that since we are using this to say what the default is, the key could have something to do with the value. Let’s say I want a lookup table of squares, where each one is calculated only once and then read from the dictionary. My defaultdict would be: squares = collections.defaultdict(lambda x:x**2). I’m thinking this could be useful in embedded systems where you don’t have much space: you could still have the speed of a lookup table without having to store the whole table. When I use this defaultdict, I want to call squares[5], which should give me 25, but I don’t know how to pass the key through, if that makes sense.
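One way to get this behaviour, sketched here under the assumption that a plain dict subclass is acceptable, is to define `__missing__`, which Python calls with the key whenever a lookup on a dict subclass fails. The class name below is made up for this example.

```python
class KeyDefaultDict(dict):
    """Dict that computes a missing value from the key itself (hypothetical name)."""

    def __init__(self, factory):
        super().__init__()
        self.factory = factory          # a function of one argument: the key

    def __missing__(self, key):
        value = self.factory(key)       # compute once...
        self[key] = value               # ...store it for later lookups
        return value

squares = KeyDefaultDict(lambda x: x ** 2)
result = squares[5]                     # computed on first access, then cached
```

Only the keys that are actually looked up get stored, so the table grows with use rather than being built up front.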

Done. Now, I can see the gamma value in relation to the loss.


Sounds like you are trying to implement a cache mechanism. I am not familiar with what kind of approaches people use in Python though :confused:

No problem. I don’t think it’s important for this anyways.

I think what you’re looking for is available here. I quickly tested it and it seems to work.


Hiromi- thanks for putting this together!!!

Sure :slight_smile: It’s just some scribble I did a while ago.

There is an LRU cache decorator in the standard library (functools.lru_cache).
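A minimal sketch of how that could look for the squares example (the function name and maxsize value are just illustrative choices):

```python
import functools

@functools.lru_cache(maxsize=128)   # keep at most 128 results, evicting the least recently used
def square(x):
    return x ** 2

first = square(5)                   # computed
second = square(5)                  # served from the cache
info = square.cache_info()          # hit/miss counters kept by the decorator
```

Because maxsize bounds the number of stored results, this also fits the limited-memory case mentioned earlier in the thread.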

Dovetailing with @daveluo 's awesome whiteboarding of SSD_MultiHead (thank you for that!) - I also found it really helpful to spend time diagramming / visualizing the forward line-by-line. Attaching screenshot here in case helpful for anyone else…


Hey Guys,
For those who want a quick recap on Cross-Entropy, watch this YouTube video


I’d love to show that in class (with credit of course!) - would that be OK? If so, would you prefer me to credit your forum user name, or your real name (if the latter, please tell me your real name)?