Exploring fastai with Excel and Python

Experiment and contemplation

on graddesc.xlsx

Questions explored through experiments

Jeremy’s original graddesc.xlsm, and my experiment workbook graddesc-daniel.xlsm

How does ReLU (an activation function) make a linear layer/neuron non-linear? (demo by Sarada)
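A minimal sketch of the idea in plain Python (weights chosen arbitrarily for illustration): a single linear neuron is a straight line in x, and wrapping it in ReLU clamps the negative half to zero, bending the line at one point into a piecewise-linear, hence non-linear, shape.

```python
def linear(x, w, b):
    # a linear neuron: a straight line in x
    return w * x + b

def relu(z):
    # ReLU clamps negative values to zero
    return max(0.0, z)

def neuron(x, w=2.0, b=-4.0):  # hypothetical weights for illustration
    return relu(linear(x, w, b))

# Below x = 2 the line 2x - 4 is negative, so the output is clamped to 0;
# above x = 2 the output follows the line. The "bend" is the non-linearity.
outputs = [neuron(x) for x in [0, 1, 2, 3, 4]]
print(outputs)  # [0.0, 0.0, 0.0, 2.0, 4.0]
```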

Does converting a linear neuron into a non-linear one make training much easier? (experiments)

How can you calculate the derivatives of the error with respect to the weights without an analytical derivative formula?
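This is what a spreadsheet can do numerically: nudge one weight by a tiny amount, recompute the error, and divide the change in error by the nudge (a finite difference). A sketch in plain Python, with made-up data and starting weights:

```python
def error(w, b, xs, ys):
    # mean squared error of the model y_hat = w*x + b over the data
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def numeric_grad(f, v, eps=1e-6):
    # (f(v + eps) - f(v)) / eps approximates df/dv without any calculus
    return (f(v + eps) - f(v)) / eps

xs, ys = [1, 2, 3], [32, 34, 36]  # data from the target function y = 2x + 30
w, b = 1.0, 0.0                   # arbitrary starting weights

dw = numeric_grad(lambda v: error(v, b, xs, ys), w)
db = numeric_grad(lambda v: error(w, v, xs, ys), b)
print(dw, db)  # both negative: the error drops if w and b increase
```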

What happens to a 2-neuron model when you give it a ReLU?

What happens when you train a 2-neuron model to find a simple linear function y = 2x + 30?

How important is the learning rate in getting training started and keeping it going?

The derivatives of the changing weights seem unpredictable, so how does SGD use the learning rate and the derivatives to steer the weights toward the optimum in most cases?
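The questions above can be sketched together in one training loop. SGD's rule is simply `w <- w - lr * dE/dw`: however noisy an individual derivative looks, each step moves slightly downhill, and a small enough learning rate keeps the steps from overshooting. A hypothetical 1-neuron model fitting y = 2x + 30 (inputs, learning rate, and step count are my choices, not from the workbook):

```python
xs = [x / 10 for x in range(10)]   # small inputs keep training stable
ys = [2 * x + 30 for x in xs]      # the target function y = 2x + 30

w, b, lr = 0.0, 0.0, 0.1

for step in range(2000):
    # analytical gradients of mean squared error for y_hat = w*x + b
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # the SGD update: step downhill, scaled by the learning rate
    w -= lr * dw
    b -= lr * db

print(round(w, 2), round(b, 2))  # approaches 2.0 and 30.0
```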

How does SGD lose control, with derivatives and errors exploding?
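One way to see the explosion in a sketch (data and learning rates are illustrative choices): when the learning rate is too large, every step overshoots the minimum by more than it corrects, so each derivative is bigger than the last and the error grows without bound.

```python
xs = [1.0, 2.0, 3.0]
ys = [2 * x + 30 for x in xs]      # target y = 2x + 30

def final_error(lr, steps=20):
    # train y_hat = w*x + b with SGD and return the final mean squared error
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
        w, b = w - lr * dw, b - lr * db
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(final_error(0.01))  # small lr: error shrinks each step
print(final_error(0.5))   # large lr: each overshoot is worse; error explodes
```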

A 2-linear-layer model without a non-linear function in between is just another linear model. So why, in the experiment, does the 2-linear-layer model perform much worse than the 1-linear-layer model?
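The collapse is plain algebra: w2*(w1*x + b1) + b2 = (w2*w1)*x + (w2*b1 + b2), i.e. one linear layer with w = w2*w1 and b = w2*b1 + b2. A sketch with arbitrary weights (and one plausible, not definitive, reason for the worse training: the effective weight is a *product* of parameters, which makes the loss surface over (w1, w2) non-convex even though the function computed is linear):

```python
w1, b1 = 3.0, 1.0    # arbitrary first-layer weights
w2, b2 = -2.0, 5.0   # arbitrary second-layer weights

def two_layers(x):
    # second linear layer applied to the first, no activation in between
    return w2 * (w1 * x + b1) + b2

def collapsed(x):
    # the single equivalent linear layer
    return (w2 * w1) * x + (w2 * b1 + b2)

# the two models compute exactly the same function
assert all(two_layers(x) == collapsed(x) for x in [-1.0, 0.0, 2.5])
```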

What happens when you add a ReLU to the 1st neuron of a 2-neuron model? (When trained freely, 3 weights become fixed and their derivatives stay zero; does a 1-neuron model do the same?)
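A hypothetical illustration of why weights can freeze with zero derivatives (a "dead ReLU"): if w*x + b is negative for every training input, the ReLU outputs a constant 0, so nudging the weights changes nothing and their finite-difference derivatives are exactly zero. SGD then has no signal to move them.

```python
def relu(z):
    return max(0.0, z)

def numeric_grad(f, v, eps=1e-6):
    return (f(v + eps) - f(v)) / eps

xs = [1.0, 2.0, 3.0]
w, b = -1.0, 0.0   # w*x + b < 0 for every input: the neuron is "dead"

def error(w_, b_):
    # mean squared error of relu(w*x + b) against the target y = 2x + 30
    return sum((relu(w_ * x + b_) - (2 * x + 30)) ** 2 for x in xs) / len(xs)

dw = numeric_grad(lambda v: error(v, b), w)
db = numeric_grad(lambda v: error(w, v), b)
print(dw, db)  # both 0.0: a tiny nudge leaves every output clamped at 0
```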

What does momentum look like, and what is the intuition behind it?
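A minimal sketch of one common formulation (function, learning rate, and beta are my choices for illustration): momentum keeps a running, exponentially decayed sum of past gradients, v <- beta*v + grad, and steps along v instead of the raw gradient. The intuition: directions the gradient keeps agreeing on accumulate speed, while directions it flip-flops on cancel out.

```python
def grad(w):
    # gradient of the toy loss f(w) = (w - 3)**2, minimized at w = 3
    return 2 * (w - 3)

def sgd(lr=0.1, steps=100):
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w)           # plain SGD: step along the raw gradient
    return w

def sgd_momentum(lr=0.1, beta=0.9, steps=100):
    w, v = 0.0, 0.0
    for _ in range(steps):
        v = beta * v + grad(w)      # decayed running sum of past gradients
        w -= lr * v                 # step along the accumulated velocity
    return w

print(sgd(), sgd_momentum())  # both converge toward 3.0
```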

Implement average_grad in Excel
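A plain-Python paraphrase of the idea behind fastai's `average_grad` (my reading of it, not fastai's exact tensor code): maintain grad_avg = mom * grad_avg + damp * grad, where damp is 1 - mom if dampening is on and 1 otherwise. Reduced to a single number per cell, this is exactly the running average a spreadsheet column can carry down row by row.

```python
def average_grad(grad, grad_avg, mom=0.9, dampening=False):
    # one update of the exponentially decayed gradient average;
    # with dampening the result is a weighted average, without it a decayed sum
    damp = (1 - mom) if dampening else 1.0
    return mom * grad_avg + damp * grad

# feeding in a constant gradient of 1.0, the undampened average grows as
# 1 + 0.9 + 0.81 + 0.729 over four steps
avg = 0.0
for g in [1.0, 1.0, 1.0, 1.0]:
    avg = average_grad(g, avg, mom=0.9)
print(avg)  # 3.439
```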