Now, in classical least-squares linear regression, if one adds an intercept to the model, then SS_res is always less than or equal to SS_tot on the data used for fitting, and hence R^2 >= 0.
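As a quick check of that claim, here is a minimal NumPy sketch on synthetic data. The data, the seed, and the helper name `r_squared` are assumptions made for illustration only; nothing here comes from the actual model being discussed. It fits the same data once with an intercept column and once through the origin.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data with a large offset (assumed purely for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = 5.0 - 2.0 * x + rng.normal(0.0, 0.1, size=200)

# Least squares WITH an intercept: design matrix has a column of ones.
X_with = np.column_stack([np.ones_like(x), x])
beta_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)
r2_with = r_squared(y, X_with @ beta_with)

# Least squares WITHOUT an intercept: the line is forced through the origin.
X_without = x[:, None]
beta_without, *_ = np.linalg.lstsq(X_without, y, rcond=None)
r2_without = r_squared(y, X_without @ beta_without)

print(f"with intercept:    R^2 = {r2_with:.3f}")
print(f"without intercept: R^2 = {r2_without:.3f}")
```

With the intercept, the fit can always fall back to predicting the mean, so R^2 on the fitted data cannot drop below 0. Without it, a large offset in the target can push R^2 well below 0.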

In one of the InceptionResNetV2 models I am fitting, I am getting negative R^2 values during optimization, so I am wondering how I can add an intercept term to the model. Can this be done close to the last dense layer…?

I am not saying for sure that you are wrong here (somebody who knows more may), but if you have a really bad predictor, R^2 actually could be less than 0. Maybe this isn’t true for your particular case, but if I understand things correctly, R^2 just says how good the fit is: an R^2 of 0 means the model is the same as random, and an R^2 of 1 means it fits perfectly. So an R^2 of less than 0 would just mean that you are missing more predictions than a random function would. I am mostly trying to explain this to see whether I understand it; if I don’t, I expect somebody to rip my explanation apart, and I will learn something.

The way I understand it, a negative R^2 indicates that the predictions are worse than “predicting the average value” of the entire target, i.e., the residual error, the sum of (actual value - predicted value)^2, is greater than the total error, the sum of (actual value - average value)^2. This can happen, for example, when the slope of the fitted line is in the opposite direction to the desired slope.
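A small worked example may make the comparison against the average concrete. This is only a sketch on five made-up target values; the helper `r_squared` and the numbers are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual error) / (total error around the average)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up targets, average 3, total error 10

r2_perfect = r_squared(y, y)                        # 1.0: no residual error
r2_mean = r_squared(y, np.full_like(y, y.mean()))   # 0.0: same as predicting the average
r2_reversed = r_squared(y, y[::-1])                 # -3.0: slope in the opposite direction
r2_offset = r_squared(y, y + 10.0)                  # -49.0: right slope, badly wrong offset

print(r2_perfect, r2_mean, r2_reversed, r2_offset)
```

The last case shows that an opposite slope is not the only route to a negative value: predictions with the right trend but a large constant offset also do worse than the average.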

If this is happening in neural nets, it could be because your learning rate is too big and it’s exploding the gradients, which increases the loss. Try reducing the learning rate.
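To see the learning-rate effect in isolation, here is a toy sketch using plain gradient descent on a one-variable linear model, not a neural net and not InceptionResNetV2. The data, the function names `fit_gd` and `r_squared`, and the two learning rates are all assumptions picked for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_gd(x, y, lr, steps=50):
    """Fit y ~ w * x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = w * x + b - y
        grad_w = 2.0 * np.mean(err * x)  # d(MSE)/dw
        grad_b = 2.0 * np.mean(err)      # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data (assumed for illustration): y = 3x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, size=200)

w_ok, b_ok = fit_gd(x, y, lr=0.1)    # small step size: converges
w_bad, b_bad = fit_gd(x, y, lr=1.5)  # step size too large: each update overshoots

r2_ok = r_squared(y, w_ok * x + b_ok)
r2_bad = r_squared(y, w_bad * x + b_bad)

print(f"lr=0.1: R^2 = {r2_ok:.4f}")
print(f"lr=1.5: R^2 = {r2_bad:.3e}")
```

With the larger step size the parameters overshoot further on every update, the loss grows instead of shrinking, and R^2 ends up far below 0. Whether the same thing is happening in a deep network would need to be checked by watching the training loss as the learning rate is lowered.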