I was playing with the first notebook and noticed that each epoch of training the last layer takes much longer when a cyclic learning rate is used than when it is not (minutes vs. seconds). Does anyone here know why?
Thanks!
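It is slower because you are no longer using precompute=True. With precompute=True, the activations of the frozen pretrained layers are computed once and cached, so each epoch only trains the last layer on those stored features. Without it, every image goes through the whole network again on every epoch. The cyclic learning rate itself adds almost no cost per batch. Jeremy will probably go into the details in the next class.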
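Here is a toy sketch of why the cache matters. `expensive_body` is a hypothetical stand-in for the frozen pretrained layers and `train_head` is a hypothetical stand-in for a last-layer update. Neither is fastai API. The sketch only counts how often the expensive part runs:

```python
# Toy sketch (hypothetical functions, not fastai code): count how often the
# expensive frozen layers run with and without precomputed activations.

body_calls = 0  # number of times the expensive frozen "body" has run


def expensive_body(image):
    """Hypothetical stand-in for the frozen pretrained conv layers."""
    global body_calls
    body_calls += 1
    return sum(image)  # pretend these are the extracted features


def train_head(features):
    """Hypothetical stand-in for one cheap update of the last layer."""
    return features * 0.5


def train(images, epochs, precompute):
    """Return how many times the frozen body ran during training."""
    global body_calls
    body_calls = 0
    if precompute:
        # Run the frozen body once per image, then reuse the cached features.
        cached = [expensive_body(img) for img in images]
        for _ in range(epochs):
            for feats in cached:
                train_head(feats)
    else:
        # Every epoch pushes every image through the full network again.
        for _ in range(epochs):
            for img in images:
                train_head(expensive_body(img))
    return body_calls


images = [[1, 2], [3, 4], [5, 6]]
print(train(images, epochs=3, precompute=True))   # 3
print(train(images, epochs=3, precompute=False))  # 9
```

With the cache, the body runs once per image in total. Without it, the body runs once per image per epoch, and that is where the minutes come from. This also explains why data augmentation has no effect while precompute=True: the cached features never change between epochs.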