While reading up on conv nets, I came across this really neat paper that I wanted to share with you.
I think this is the first ever description of what can be considered a CNN. I really liked it because the components are very simple and the paper is written in an approachable fashion. The course also mentions several times that we should start reading scientific papers, but some of them can be quite intimidating - this one is a nice pick for getting your feet wet, and it gives you a good sense of how all of this began.
The paper also speaks very explicitly (more so than later papers, I think) to the power of a CNN: by applying convolutions we are encoding domain knowledge (knowledge of images), which reduces the number of parameters and makes the task easier for our neural net to handle. This struck me as a key insight. Yes, by the universal approximation theorem, with enough examples and enough compute we could get equal (actually even better) results using only fully connected layers. But here we are saying: hey, we know that the same feature can appear in different parts of an image, so let's use that knowledge to make the task at hand easier, share weights across positions, and have a go at the task with significantly fewer trainable parameters than a fully connected neural net would require!
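To make the parameter-count argument concrete, here is a quick back-of-the-envelope sketch. The layer sizes (a 28x28 image, 100 hidden units, six 5x5 filters) are my own illustrative assumptions, not numbers from the paper:

```python
# Assumed sizes for illustration only - not taken from the paper.
H, W = 28, 28  # a small grayscale image

# Fully connected: every input pixel connects to every hidden unit.
hidden_units = 100
fc_params = (H * W) * hidden_units + hidden_units  # weights + biases

# Convolutional: one small filter is slid over the whole image, so its
# weights are shared across every position it visits.
num_filters = 6
kernel = 5
conv_params = num_filters * (kernel * kernel + 1)  # weights + one bias each

print(f"fully connected: {fc_params:,} parameters")  # 78,500
print(f"convolutional:   {conv_params:,} parameters")  # 156
```

Even in this toy setting the convolutional layer needs orders of magnitude fewer trainable parameters, which is exactly the "constrain the weights using what we know about images" idea the paper is making.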
Power of simplicity
Well, I got a kick out of reading this paper, so I thought I would share. Enjoy!