Help me understand Lesson 10 (Part 2)! :)

marco_b · November 20, 2019, 8:23am

That’s what I did at first, but the subset of LaTeX that works on the Markdown implementation of this forums is apparently not enough to render my equations, that’s why I had to switch to codecogs. They’re simple GIF now, so they should render on any device … maybe you looked at the post before I edited and switched to the external tool?

By the way, it’s not like codecogs is a speech/handwriting recognition app, I still had to type those in there so I’m not sure what you mean by suggesting a LaTeX Cheatsheet

Back on topic,

The same could be said for the two 3x3 convs versus a 5x5 conv as well though. Most likely a combination of reasons…

I think point 2 is probably “solved” by now, I still don’t understand point 1, aka “you’re just shuffling the numbers” but If not proven wrong I’ll just assume that what he meant was similar to what I expressed here

marco_b:

I think you want to gradually increase the portion of the image you’ve looked at and the number of filters/features ‘together’, because the 3x3 kernels look at a very small part of the image and (in that space, i.e. the original image) there’s only so much you can “catch” with a 3x3 kernel (edges, small corners, … ) and it’s better in my opinion to use more computations at higher layers where the receptive fields capture representations of bigger patches of the original image!

This has more to do with the “semantics” of the image classification (or representation rather) rather then the actual computations done. It has to do with the fact that we’re talking about the actual input image and not just a 1 x width x height tensor of activations of some kind!

so in the end we do agree even on this topic, just expressed in a way that made it hard for me to rally see it!