Negative R^2?

I am trying to optimize R^2 = 1 - SS_res/SS_tot, where

SS_res = sum_k (Y_k - Y_pred,k)^2
SS_tot = sum_k (Y_k - mean(Y))^2.
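(For concreteness, here is a minimal NumPy sketch of that definition; the function name `r_squared` is just for illustration.)

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, per the definition above."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```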

Now, in classical linear regression, if one adds an intercept to the model, then SS_res is always less than or equal to SS_tot, and hence R^2 >= 0.
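To illustrate that point, here is a small scikit-learn sketch (my own example, not part of the original post): with an intercept the training R^2 is non-negative, while forcing the fit through the origin can drive it below zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(scale=0.5, size=100)  # data with a large offset

# With an intercept, least squares can never do worse than predicting mean(y),
# so the training R^2 is >= 0.
with_b = LinearRegression(fit_intercept=True).fit(X, y)
print(r2_score(y, with_b.predict(X)))     # close to 1

# Forced through the origin, the fit cannot absorb the offset and R^2 goes negative.
without_b = LinearRegression(fit_intercept=False).fit(X, y)
print(r2_score(y, without_b.predict(X)))  # well below 0
```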

In one of the InceptionResNetV2 models I am fitting, I am getting negative R^2 values during optimization, so I am wondering how I can add an intercept term to the model. Can this be done close to the last dense layer…?

I am not saying for sure you are wrong here (somebody who knows more may), but if you have a really bad predictor, R^2 actually can be less than 0. Maybe that isn’t the case for your particular model, but as I understand it, R^2 just says how good the fit is: R^2 = 0 means the model does no better than random, and R^2 = 1 means it fits perfectly. So an R^2 below 0 would just mean you are missing more predictions than a random function would. I am mostly writing this out to check my own understanding; if it’s wrong, I expect somebody to rip the explanation apart and I will learn something. :slight_smile:

The way I understand it, a negative R^2 indicates the predictions are worse than simply predicting the average value of the entire target, i.e. the residual error sum_k (Y_k - Y_pred,k)^2 is greater than the total error sum_k (Y_k - mean(Y))^2. This usually means the slope of the fitted line points in the opposite direction to the desired slope. A tiny worked example of that is shown below.
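(Numbers chosen only for illustration: if the predictions trend in the opposite direction to the target, SS_res exceeds SS_tot and R^2 goes negative.)

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([5.0, 4.0, 3.0, 2.0, 1.0])   # opposite slope to the target

ss_res = np.sum((y_true - y_pred) ** 2)         # 40.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 10.0
print(1 - ss_res / ss_tot)                      # -3.0: worse than just predicting the mean
```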

If this is happening in neural nets, it could be because your learning rate is too big and the gradients are exploding, which drives the loss up. Try reducing the learning rate.
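For example, with a Keras setup along the lines described (this is only a sketch; the regression head, loss, and learning-rate values are my assumptions, not the original code), lowering the optimizer’s learning rate is a one-line change:

```python
import tensorflow as tf

# Hypothetical regression head on top of InceptionResNetV2 (names/shapes are placeholders).
base = tf.keras.applications.InceptionResNetV2(
    include_top=False, pooling="avg", input_shape=(299, 299, 3)
)
model = tf.keras.Sequential([base, tf.keras.layers.Dense(1)])  # Dense already adds a bias (intercept) term

# If the loss is diverging (and R^2 is going negative), try a smaller learning rate,
# e.g. 1e-4 instead of the Adam default of 1e-3.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss="mse")
```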


Exactly. We discuss this in the ML course for anyone interested in learning more.

Ah, so it’s not worse than random, but worse than predicting the average for all of them.