Hi guys, I’m pretty new to deep learning. My question is about this common statement: “we want the initial layers to train less, because these layers mostly represent low-level features like edges and corners (and even if we let them learn more, they would eventually end up learning edges and corners anyway), so we give them a lower learning rate.” I want to understand how a lower learning rate makes the initial layers learn less. Any help is really appreciated.
You can think of every complicated pattern as being built from simpler patterns. For example, a rectangle is built from 4 lines that meet at right angles. We take the same point of view for recognizing complicated objects (a dog is described by 4 legs, a head, a tail, …).
Deep learning works in this manner. The initial layers are responsible for recognizing the simplest patterns (edges and corners). This is the same for all types of images: satellite images and dog images are both represented, at the lowest layers, by edges, corners, …
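To make "recognizing an edge" a bit more concrete, here is a minimal sketch in plain Python. Assumption: the filter [-1, 1] is hand-picked for illustration and is not taken from any real network. First-layer filters learned by CNNs often end up looking like such difference (edge) filters, but a real network learns its own values, in 2D and over several color channels.

```python
# Hand-crafted 1D "edge detector" (illustrative assumption, not learned weights).

def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` without padding and return the dot products.
    (Strictly this is cross-correlation, which is what CNN "convolution" layers compute.)"""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# A row of pixel intensities: a dark region followed by a bright region.
row = [0, 0, 0, 0, 9, 9, 9, 9]
edge_kernel = [-1, 1]  # responds where intensity increases from left to right

response = conv1d_valid(row, edge_kernel)
print(response)  # [0, 0, 0, 9, 0, 0, 0]
```

The output is zero over the flat regions and non-zero only at the position where dark turns into bright, which is exactly what "this unit detects an edge" means. The same kind of response appears whether the row comes from a satellite image or a photo of a dog.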
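Deep learning models take a long time to train because they have millions of parameters. So people usually take advantage of parameters that have already been trained by someone else (transfer learning). The pretrained model was not trained on the same type of images, but the initial layers are very similar across image types, so you don’t want to change them too much. This is done by giving the initial layers a small learning rate: in gradient descent each weight is updated as w ← w − lr × gradient, so a smaller lr means every update moves those weights less, and they stay close to their pretrained values. They still learn, just more slowly than the later layers.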
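The effect of the learning rate can be seen with a minimal plain-Python sketch of gradient descent using a different learning rate per layer. Assumptions (hypothetical, for illustration only): each "layer" is a single scalar weight, both start from the same pretrained value, and both receive the same constant gradient, so the only difference between them is the learning rate.

```python
# Minimal sketch: plain gradient descent with a separate learning rate per "layer".
# Hypothetical setup: one scalar weight per layer, identical constant gradients.

def sgd_step(weights, grads, lrs):
    """One gradient-descent update per layer: w <- w - lr * grad."""
    return {name: weights[name] - lrs[name] * grads[name] for name in weights}

pretrained = {"early_layer": 1.0, "late_layer": 1.0}
lrs = {"early_layer": 0.001, "late_layer": 0.1}  # small lr for the early layer
grads = {"early_layer": 0.5, "late_layer": 0.5}  # same gradient for both layers

weights = dict(pretrained)
for _ in range(10):
    weights = sgd_step(weights, grads, lrs)

for name in weights:
    change = abs(weights[name] - pretrained[name])
    print(f"{name}: moved {change:.4f} from its pretrained value")
```

After 10 steps the early layer has moved about 0.005 while the late layer has moved about 0.5: a 100 times smaller learning rate gives a 100 times smaller change. In a real network the gradients differ per layer and optimizers such as Adam rescale them, but the learning rate still scales the step size. In PyTorch, for example, the same idea is usually expressed with optimizer parameter groups (a list of dicts, each with its own "lr"), and setting requires_grad=False on the early layers freezes them completely.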
Hope my explanation is clear.