Live coding 16

Daniel · July 14, 2022, 2:52pm

Yes. So can we propose that the non-linear activation is the key to turning ordinary linear layers/neurons into a magical neural net? Is that true? I think the Excel experiment below provides some support for this proposal. @Moody

The observations from the experiment:

After adding a ReLU to the first neuron, the 2-neuron model can train freely without the error exploding (it must be the magic of ReLU, right?)
Within the first epoch, 3 out of 4 weights found their optimum and stopped updating, whereas both weights of the 1-neuron model are still updating without settling (again, it must be the magic of ReLU, right?)
However, although it improves steadily, the error is far worse than that of the 1-neuron model. (Interesting!)
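For anyone who wants to poke at the same comparison outside the spreadsheet, here is a minimal sketch in plain Python. It is not the spreadsheet's exact setup: the data (x = 0..9), the starting weights (all 1.0), the learning rate, the step count, and the use of full-batch gradient descent are all my assumptions, so the numbers it prints need not match the Excel results.

```python
def relu(z):
    return z if z > 0.0 else 0.0

# Assumed toy data for the target y = 2x + 30 (not the spreadsheet's values).
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 30.0 for x in xs]

def train_linear(xs, ys, lr, steps):
    """1-neuron model: pred = a*x + b, trained with full-batch gradient descent on MSE."""
    a, b = 1.0, 1.0  # assumed starting weights
    n = len(xs)
    losses = []
    for _ in range(steps):
        errs = [a * x + b - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errs) / n)  # loss before this update
        grad_a = sum(2.0 * e * x for e, x in zip(errs, xs)) / n
        grad_b = sum(2.0 * e for e in errs) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b), losses

def train_relu(xs, ys, lr, steps):
    """2-neuron model: pred = w2 * relu(w1*x + b1) + b2 (4 weights in total)."""
    w1, b1, w2, b2 = 1.0, 1.0, 1.0, 1.0  # assumed starting weights
    n = len(xs)
    losses = []
    for _ in range(steps):
        loss = g_w1 = g_b1 = g_w2 = g_b2 = 0.0
        for x, y in zip(xs, ys):
            z = w1 * x + b1
            h = relu(z)
            e = w2 * h + b2 - y
            loss += e * e
            g_w2 += 2.0 * e * h
            g_b2 += 2.0 * e
            # The gradient only flows back through the ReLU when z > 0.
            dz = 2.0 * e * w2 if z > 0.0 else 0.0
            g_w1 += dz * x
            g_b1 += dz
        losses.append(loss / n)  # loss before this update
        w1 -= lr * g_w1 / n
        b1 -= lr * g_b1 / n
        w2 -= lr * g_w2 / n
        b2 -= lr * g_b2 / n
    return (w1, b1, w2, b2), losses

linear_weights, linear_losses = train_linear(xs, ys, lr=1e-4, steps=500)
relu_weights, relu_losses = train_relu(xs, ys, lr=1e-4, steps=500)

print("1-neuron  :", linear_weights, "last loss", linear_losses[-1])
print("ReLU model:", relu_weights, "last loss", relu_losses[-1])
```

Changing the learning rate, the starting weights, or the range of x is a quick way to see how sensitive each model is, which seems to be what the spreadsheet experiment is probing.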

try the spreadsheet yourself

Then come more questions:

Why can't a more complex and smarter model (4 neurons with ReLU) beat a single-neuron model without an activation function?
Can such a 2-neuron ReLU sandwich ever beat a simple linear neuron model at finding this simple y=2x+30 target? What can we do to achieve that?
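One small check that may help with the second question: as long as the value going into the ReLU stays positive, the sandwich w2 * relu(w1*x + b1) + b2 collapses to (w1*w2)*x + (w2*b1 + b2), so it can at least represent y=2x+30 exactly on those inputs. The sketch below uses hand-picked weights (my assumption, not weights taken from the spreadsheet) to show this, and also shows where it breaks for a negative input.

```python
def relu(z):
    return max(0.0, z)

def sandwich(x, w1, b1, w2, b2):
    """2-neuron ReLU sandwich: a linear neuron, a ReLU, then another linear neuron."""
    return w2 * relu(w1 * x + b1) + b2

# Hand-picked weights (an assumption for illustration): w1*w2 = 2 and w2*b1 + b2 = 30.
w1, b1, w2, b2 = 1.0, 0.0, 2.0, 30.0

xs = [float(i) for i in range(10)]  # non-negative inputs only
errors = [sandwich(x, w1, b1, w2, b2) - (2.0 * x + 30.0) for x in xs]
max_abs_error = max(abs(e) for e in errors)
print("max abs error on x >= 0:", max_abs_error)

# For a negative input the ReLU clips to zero, so the output is stuck at b2.
negative_output = sandwich(-5.0, w1, b1, w2, b2)
negative_target = 2.0 * -5.0 + 30.0
print("x = -5:", negative_output, "vs target", negative_target)
```

So with these weights the sandwich can match the linear neuron on non-negative x (it cannot do better, since the linear neuron can also reach zero error). That suggests the gap seen in the experiment may have more to do with how training gets to the weights than with what the model is able to express.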