Part 2 Lesson 9 wiki

jeremy · March 28, 2018, 4:39am

That is all exactly right. But this earlier statement isn’t: each of those parts would be compared against 4+c filters to see which one activated it the most.

KevinB · March 28, 2018, 4:43am

Ok, I think I’m starting to understand this. Just have to rewind a lot. Glad I have study group tomorrow to bounce some ideas off hiromi and metachi!

jeremy · March 28, 2018, 4:47am

Note that the loss function is nearly identical to the single bounding box loss function we used at the end of pascal.ipynb. The only significant difference is that we first have to solve the matching problem.

(There’s also a much more minor difference that we use binary cross entropy and ignore background, instead of categorical cross entropy, but it’s fine to totally ignore that difference for now)
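
As a rough illustration of the "binary cross entropy and ignore background" idea, here is a minimal pure-Python sketch. It assumes that background is encoded as the last class index, so it becomes an all-zeros target once that column is dropped. The helper names are hypothetical and this is not the notebook's actual code.

```python
import math

def one_hot_without_background(label, num_classes):
    """One-hot encode `label` over the foreground classes only.

    Assumption: background has index `num_classes`, so it maps to all zeros.
    """
    return [1.0 if label == c else 0.0 for c in range(num_classes)]

def bce_with_logits(logits, targets):
    """Mean binary cross entropy, treating each class as its own yes/no decision."""
    total = 0.0
    for x, t in zip(logits, targets):
        # Numerically stable form: max(x, 0) - x*t + log(1 + exp(-|x|))
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

num_classes = 3
background_target = one_hot_without_background(3, num_classes)  # [0.0, 0.0, 0.0]
foreground_target = one_hot_without_background(1, num_classes)  # [0.0, 1.0, 0.0]

# A confident, correct prediction for class 1 gives a small loss.
loss = bce_with_logits([-5.0, 5.0, -5.0], foreground_target)
```

With this encoding a background anchor simply asks every class output to say "no", which is why no separate background output is needed in the sketch.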

Sree · March 28, 2018, 6:25am

Getting this error for the notebooks on AWS. Did a git pull and conda env update.

suvash · March 28, 2018, 9:52am

A while ago, I ran into this post, which helped clarify the idea of BN rather well for me. Now and then, I go back to it to refresh my memory when I’m confused. Hopefully it helps others too. https://towardsdatascience.com/understanding-batch-normalization-with-examples-in-numpy-and-tensorflow-with-interactive-code-7f59bb126642

lgvaz · March 28, 2018, 12:56pm

Oh, that’s very nice, I will look for more information about this. I actually face this scenario very often: in deep reinforcement learning it is common to have one network body with multiple heads.

hiromi · March 28, 2018, 2:17pm

Thanks

nachiket273 · March 28, 2018, 3:50pm

Check whether git pull produced any conflicts. If so, resolve them, commit, and reopen the notebook.

Combalgorythm · March 28, 2018, 4:44pm

I faced this problem when I had some commit conflicts on pascal.ipynb file. After resolving those, the notebook opened fine.

bhollan · March 28, 2018, 5:19pm

@rachel and @binga is the gist linked to at the top of the page private/correct? It’s going 404 for me. Is there a group or something I need to join or be admitted to?

Sree · March 28, 2018, 5:21pm

Yes, found git pull conflicts and all okay now. Thanks!

binga · March 28, 2018, 5:24pm

I fixed a bug yesterday and created a new gist. Updated the link now. Thanks!

Even · March 28, 2018, 7:11pm

I did. But when you rerecorded, that part was lost.

jeremy · March 28, 2018, 8:43pm

Oh I understand now - sorry! I highlighted @sermakarevich’s Kaggle gold medal in the Jigsaw Toxic Comments competition.

sermakarevich · March 29, 2018, 8:26am

Oops, I missed that part too. Any chance the link to the live stream is still valid?

This was just amazing:

radek · March 29, 2018, 3:00pm

While presenting this slide @jeremy mentions two Conv2d operations performed in succession:

The first one is a Conv2d that takes the outputs from the resnet model of shape (7, 7, <num channels>) to a new shape of (4, 4, 4+<num_classes>).

In the lecture we are not given the other settings for the convolution, but I guess they would be easy to figure out by looking at the notebook. My guess is that it uses a (3, 3) kernel and a padding of 1. These are quite common settings that preserve feature map size with a stride of 1, and I assume they are what we use here with a stride of 2, which gives us the (4, 4) feature maps.

Here, however, we perform another set of convolutions going from (4, 4) to (2, 2). Given how these two convolutions seem to be doing roughly the same thing, I would expect their parameters to be the same. But I don’t see how we can go from (4, 4) to (2, 2) with a filter size of (3, 3) and a stride of 2. We could do one side padding, but that sounds absolutely horrible.

The only settings that seem reasonable here for the 2nd convolution would be padding of 0 and a (2, 2) kernel.

But is this really what is happening here? More interestingly, if these convolutions don’t share parameters, why is that?

I was really blown away by the observation that a receptive field will ‘look more’ at what is in the center. (This is nicely shown using Excel, where there are more values feeding into the center of a receptive field than its sides.) Could this be a factor that plays into the conv params here? If we want to look as best as we can at a square, we should probably look at the center given the nature of a receptive field, and the padding of 1 is counterproductive. Going from (4, 4) to (2, 2) seems to be doing just that.

But why the earlier convolution?

Or maybe this whole reasoning is wrong and there is something else happening here?
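
The receptive-field observation in the post above can be checked with a small count. This is a simplified sketch, assuming a 1D input and stacked stride-1 convolutions with a kernel of size 3. It counts how many distinct paths lead from each input position to a single output value. The function name is hypothetical.

```python
def contribution_counts(n_layers, kernel=3):
    """Count paths from each input position to one output value
    after `n_layers` stacked stride-1 convolutions (1D simplification)."""
    counts = [1]  # the single output value we trace back from
    for _ in range(n_layers):
        wider = [0] * (len(counts) + kernel - 1)
        for i, c in enumerate(counts):
            for k in range(kernel):
                # each position is fed by `kernel` positions in the layer below
                wider[i + k] += c
        counts = wider
    return counts

print(contribution_counts(1))  # [1, 1, 1]
print(contribution_counts(2))  # [1, 2, 3, 2, 1]
```

After two layers the center position feeds the output through three paths and the edges through only one, which matches the "more values feeding into the center" picture.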

wdhorton · March 29, 2018, 5:24pm

Not sure I understand—why wouldn’t you get from (4,4) to (2,2) with a (3,3) conv, a stride of 2, and padding 1? Those settings should exactly halve the size of the input

radek · March 29, 2018, 5:40pm

I think you would need a stride of 3 for a 3x3 kernel with a padding of 1. But then you end up looking at a lot of zeros.

wdhorton · March 29, 2018, 6:03pm

The formula for the output size is (W−F+2P)/S+1, where W is the input size, F is the size of the conv kernel, P is the padding, and S is the stride. For this case I guess it would come down to how you do the rounding. You’d get (4-3+2)/2 + 1, or 3/2 + 1, and I think you do integer division (round down), so you’d get 1 + 1 or 2 as the output size.

Source for formula: http://cs231n.github.io/convolutional-networks/
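
The arithmetic above can be written as a short Python sketch. It assumes the division rounds down, as suggested in the post. The function name `conv_output_size` is made up for illustration.

```python
def conv_output_size(w, f, p, s):
    """Output size along one dimension: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1

# 3x3 kernel, padding 1, stride 2, applied to the 7x7 feature map
print(conv_output_size(7, 3, 1, 2))  # 4

# same settings applied to the 4x4 feature map
print(conv_output_size(4, 3, 1, 2))  # 2

# the alternative discussed earlier: 2x2 kernel, padding 0, stride 2
print(conv_output_size(4, 2, 0, 2))  # 2
```

Under these assumptions the same (3, 3) kernel, padding 1, stride 2 settings take 7 to 4 and 4 to 2, so the two convolutions could share their settings.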

wdhorton · March 29, 2018, 6:05pm

I think I see what you’re saying though—in that case you wouldn’t be using the right or bottom padding, so in that sense it would be a “one side padding” like you mentioned.
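
To make the "one side padding" point concrete, here is a small sketch that lists which positions of the padded input each kernel window covers along one dimension. It assumes symmetric zero padding and a window that starts at the left edge. The function names are hypothetical.

```python
def window_positions(w, f, p, s):
    """For each output position, list the padded-input indices its window covers."""
    padded = w + 2 * p
    n_out = (padded - f) // s + 1
    return [list(range(i * s, i * s + f)) for i in range(n_out)]

def unused_positions(w, f, p, s):
    """Padded-input indices that no window ever touches."""
    used = {i for window in window_positions(w, f, p, s) for i in window}
    return [i for i in range(w + 2 * p) if i not in used]

# 4 inputs with padding 1 give padded indices 0..5 (0 and 5 are padding)
print(window_positions(4, 3, 1, 2))  # [[0, 1, 2], [2, 3, 4]]
print(unused_positions(4, 3, 1, 2))  # [5]

# 7 inputs with padding 1 give padded indices 0..8, and every index is used
print(unused_positions(7, 3, 1, 2))  # []
```

In this sketch the 4 to 2 step never reads the right-hand padding, while the 7 to 4 step uses the padding on both sides.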