Tricks for networks that are hard to train?

Hey team,

I wanted to collect your wisdom on what steps you usually take when a neural net seems impossible to train on a given dataset. I’m currently working on classifying audio spectrograms into 176 categories, but my accuracy just doesn’t get above random guessing.
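For reference, here is roughly what "random guessing" means in numbers for 176 classes. This is only a back-of-the-envelope sketch and it assumes the classes are about equally frequent and that the loss is cross-entropy with the natural log (neither is stated above):

```python
import math

n_classes = 176  # number of categories mentioned in the post

# Assumption: balanced classes, so a uniform guess is right 1 time in 176.
chance_accuracy = 1 / n_classes

# Assumption: cross-entropy loss with natural log. A model that predicts a
# uniform distribution over all classes scores ln(n_classes).
chance_loss = math.log(n_classes)

print(f"chance accuracy:      {chance_accuracy:.2%}")
print(f"chance cross-entropy: {chance_loss:.2f}")
```

So an accuracy stuck near 0.57% and a loss stuck near 5.17 would both point to a model that has learned nothing yet.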

Jeremy touches on this topic in one of the lessons, mentioning that starting with a very low LR for a couple of epochs and then increasing the LR again can do some good. I’ve never fully understood why this works, and sadly it doesn’t seem to help in my current project.
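To make the idea concrete, this is a minimal sketch of such a warm-up schedule as I understand it. The function name `lr_for_epoch` and all the numbers (2 warm-up epochs, 1e-4 and 1e-2) are made up for illustration and are not taken from the lesson:

```python
def lr_for_epoch(epoch, warmup_epochs=2, low_lr=1e-4, base_lr=1e-2):
    """Hypothetical helper: very low LR during warm-up, normal LR afterwards."""
    return low_lr if epoch < warmup_epochs else base_lr


# Learning rate for each of the first five epochs (epochs counted from 0).
schedule = [lr_for_epoch(epoch) for epoch in range(5)]
print(schedule)  # [0.0001, 0.0001, 0.01, 0.01, 0.01]
```

In a real training loop you would set the optimizer's learning rate to `lr_for_epoch(epoch)` at the start of each epoch.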

Any other ideas? I don’t think I made any mistakes when preprocessing the data, and other people have solved this problem using similar architectures.

Thank you


I found a little piece of info in Stanford’s excellent CNNs for Visual Recognition course (CS231n). They recommend taking a tiny portion of the data (like 20 samples) and overfitting the hell out of it. Only when you reach zero loss should you move on to the full dataset.

To me this sounds similar to Jeremy’s approach of pushing the learning process in the right direction before firing heavy learning rates at it.
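Here is a rough sketch of that tiny-subset check. It is written in plain Python so it runs anywhere, and it uses a stand-in setup that is entirely my own assumption: 20 random samples with random labels, 64 features, 4 classes and a linear softmax classifier. With a real project you would use 20 real samples and your actual model instead. The point is only that a correctly wired training loop should be able to memorise a handful of samples:

```python
import math
import random

random.seed(0)

N_SAMPLES, N_FEATURES, N_CLASSES = 20, 64, 4  # tiny, made-up stand-in dataset

# Hypothetical data: random features with random labels.
X = [[random.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(N_SAMPLES)]
y = [random.randrange(N_CLASSES) for _ in range(N_SAMPLES)]

# Linear softmax classifier (no bias); weights start at zero.
W = [[0.0] * N_FEATURES for _ in range(N_CLASSES)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x):
    return softmax([sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W])


def evaluate():
    """Mean cross-entropy and accuracy on the tiny training set."""
    probs = [predict_proba(x) for x in X]
    loss = -sum(math.log(p[t]) for p, t in zip(probs, y)) / N_SAMPLES
    acc = sum(p.index(max(p)) == t for p, t in zip(probs, y)) / N_SAMPLES
    return loss, acc


def train_step(lr):
    """One full-batch gradient descent step on the mean cross-entropy."""
    grad = [[0.0] * N_FEATURES for _ in range(N_CLASSES)]
    for x, t in zip(X, y):
        p = predict_proba(x)
        for c in range(N_CLASSES):
            err = p[c] - (1.0 if c == t else 0.0)  # dLoss/dlogit for class c
            for j in range(N_FEATURES):
                grad[c][j] += err * x[j] / N_SAMPLES
    for c in range(N_CLASSES):
        for j in range(N_FEATURES):
            W[c][j] -= lr * grad[c][j]


first_loss, first_acc = evaluate()
for _ in range(300):
    train_step(lr=0.1)
final_loss, final_acc = evaluate()

print(f"before: loss={first_loss:.3f} acc={first_acc:.2f}")
print(f"after:  loss={final_loss:.3f} acc={final_acc:.2f}")
```

If the loss on the tiny subset refuses to drop towards zero, the problem is more likely in the data pipeline, the labels or the model wiring than in the learning rate.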