Since we’re covering optimization in the lessons, I decided to record a video over the weekend talking through how cool it is. The first half talks (far too quickly) about how optimization forms the heart of many deep learning approaches; the second half runs through a notebook abusing the idea to make pretty pictures.

Great video! @johnowhitaker Are you aware of any in-depth resources on loss functions specifically?
I’m looking for something that can guide me in creating my own loss functions based on my problem domain and what I am trying to achieve. Trying random stuff and seeing what works is fairly educational, but I’m sure there must be some solid theory that can indicate why and when certain loss functions work or don’t.

I’m mostly just generally interested, but this came about because I was working on a regression problem with a rather long tail. Mean squared error seemed to do the trick, but I wanted the loss to take into account what I thought was relevant. In this problem most of my values are very close to zero (0.0001 to 0.1) and a small fraction are much higher (up to several hundred). These high values are of particular concern to me, as they are the more important data. Anyway, I noticed that if y_true is 0.01 and y_pred is 1.0, the prediction is off by a factor of 100 but the squared error is only ~1.0. Alternatively, if y_true is 100 and y_pred is 102, the squared error is 4.0, even though in relative terms this is a much better estimate.
Given the above, I tried logarithmic errors as well as custom relative-difference errors, but these did not converge at all and the loss blew up to massive values. I assume there is some theoretical basis for why what I was doing is wrong?
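
To make the arithmetic in the two cases above concrete, here is a minimal sketch in plain Python. The function names are made up for illustration and are not from any library, and the exact form of the "logarithmic" and "relative difference" errors that were tried is not given in the post, so the definitions below are assumptions.

```python
import math

# Hypothetical helper names, chosen only for this illustration.

def squared_error(y_true, y_pred):
    # Per-sample term of mean squared error.
    return (y_pred - y_true) ** 2

def relative_error(y_true, y_pred):
    # Absolute difference scaled by the target.
    # Assumes y_true != 0; tiny targets make this very large.
    return abs(y_pred - y_true) / abs(y_true)

def squared_log_error(y_true, y_pred):
    # Squared difference of logs, i.e. the squared log of the ratio.
    # Assumes both values are strictly positive.
    return (math.log(y_pred) - math.log(y_true)) ** 2

# The two cases from the post: (y_true, y_pred).
cases = [(0.01, 1.0), (100.0, 102.0)]

for y_true, y_pred in cases:
    print(
        f"y_true={y_true:g} y_pred={y_pred:g} "
        f"squared={squared_error(y_true, y_pred):.4g} "
        f"relative={relative_error(y_true, y_pred):.4g} "
        f"squared_log={squared_log_error(y_true, y_pred):.4g}"
    )
```

Squared error ranks the second prediction as worse, while the relative and log-based errors rank the first as far worse. Note that the relative error divides by y_true and the log error needs positive inputs, so targets near 0.0001 or a model that outputs zero or negative values can produce huge or undefined losses. That is one plausible, unconfirmed reason such losses might blow up during training.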

Maybe someone here can speak to the theoretical side, but for something like your example, predicting log(y) would seem to make sense to me. Is that what you mean by logarithmic errors?
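
A rough sketch of what "predicting log(y)" could look like, in plain Python: transform the targets once, train with ordinary MSE in log space, and exponentiate the predictions afterwards. The helper names and the sample values are hypothetical, and the "model output" is faked with a constant offset purely to show the bookkeeping. Plain log assumes every target is strictly positive, which matches the range described above (0.0001 up to several hundred).

```python
import math

# Hypothetical helpers for illustration only.

def to_log_space(ys):
    # Requires every y > 0.
    return [math.log(y) for y in ys]

def from_log_space(zs):
    # Inverse transform: maps log-space predictions back to original units.
    return [math.exp(z) for z in zs]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Made-up targets spanning the range described in the thread.
y_true = [0.0001, 0.01, 0.1, 5.0, 300.0]
z_true = to_log_space(y_true)

# Stand-in for a model's output in log space: every prediction is
# off by the same additive amount (0.05) in log units.
z_pred = [z + 0.05 for z in z_true]

loss = mse(z_true, z_pred)       # ordinary MSE, computed on the logs
y_pred = from_log_space(z_pred)  # back to original units for reporting

for t, p in zip(y_true, y_pred):
    print(f"y_true={t:g} y_pred={p:.6g} ratio={p / t:.4f}")
print(f"log-space MSE: {loss:.6f}")
```

Because the model outputs the log directly, no logarithm is ever taken of a prediction, so the loss stays defined whatever the network emits. An equal error in log space corresponds to an equal ratio in original units, so being off by a factor of 100 on a small value is penalised much more than being off by 2% on a large one. If the large values matter most, a per-sample weight on top of this may still be needed.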