[ML 1 Video 1] Why does having RMSE as the evaluation metric mean that ratios are important?

(Aseem Bansal) #1

Jeremy mentions in ML 1 Video 1 that for the “Blue Book for Bulldozers” competition, ratios are important because RMSE is used as the evaluation metric, so we take the log of the target variable. I don’t understand why having RMSE as the evaluation metric would mean ratios are important. Can someone explain that?

(satish) #2

@jeremy would it be possible to open up the machine learning forums?

Yes, I had the same question too.

(Aseem Bansal) #3

I think Jeremy mentioned in the intro thread that those forums were only for the Masters students.

(WG) #4

It’s pretty easy to test …

Run the following calculations:

200 - 100
10 - 5
log(200) - log(100)
log(10) - log(5)

Notice that the last two operations return the same results. The reason it is important here is at least twofold:

  1. The competition specifically states that it is using RMSLE and not RMSE.
  2. Depending on the problem, it may be more important to know how far off your predictions are relative to the target than to know the actual difference between the two. Looking at my example above, the actual differences without the log are very different (100 and 5), whereas using the log function makes it clear that you are off by the same amount proportionally, irrespective of the actual differences.
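
For anyone who wants to run the four calculations above, here is a minimal sketch in plain Python. It assumes the natural log (`math.log`); the variable names are only for illustration.

```python
import math

# Absolute differences: very different in size.
abs_diff_large = 200 - 100   # 100
abs_diff_small = 10 - 5      # 5

# Log differences: identical, because log(a) - log(b) = log(a / b)
# and both pairs have the same ratio of 2.
log_diff_large = math.log(200) - math.log(100)
log_diff_small = math.log(10) - math.log(5)

print(abs_diff_large, abs_diff_small)
print(log_diff_large, log_diff_small, math.log(2))
```

Both log differences come out as log(2), since each pair differs by a factor of 2.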

(Aseem Bansal) #5

I understand that with logs, ratios are what matter, basically because

log a - log b = log (a/b)

I also understand that for many use cases ratios may matter more than absolute differences. But my question is this: if the evaluation metric is RMSLE (which has a log in its formula), why does that make ratios important for the target variable? I understand there is a log in the RMSLE formula, but that log is of the errors, not the target variable. Right?

(WG) #6

I think Jeremy converts the target variable more as a convenience than a necessity, as it allows you to use the standard R/MSE loss functions already defined in the library.

(Aseem Bansal) #7

I watched the first few minutes of Video 2 and the explanation there cleared up my confusion. So the only reason we are taking the log is that the evaluation metric has a log in it, but the metric we are optimising does not. Taking the log of the target variable bridges that gap.
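
To make that concrete, here is a rough sketch in plain Python showing that ordinary RMSE computed on log-transformed values gives the same number as RMSLE computed on the raw values. The prices are made up for illustration, the helper names `rmse` and `rmsle` are my own, and I am assuming RMSLE is defined with a plain `log` as in the calculations earlier in the thread (some definitions use `log(1 + x)` instead).

```python
import math

def rmse(actual, predicted):
    # Root mean squared error on whatever values are passed in.
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def rmsle(actual, predicted):
    # Root mean squared log error: RMSE of the differences of the logs.
    squared = [(math.log(a) - math.log(p)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical sale prices; every prediction is off by a factor of 2.
actual = [100.0, 5.0, 2000.0]
predicted = [200.0, 10.0, 1000.0]

# Take the log of the target (and of the predictions), then use plain RMSE.
log_actual = [math.log(a) for a in actual]
log_predicted = [math.log(p) for p in predicted]

rmse_on_logs = rmse(log_actual, log_predicted)
rmsle_on_raw = rmsle(actual, predicted)

print(rmse_on_logs, rmsle_on_raw)
```

So a model trained on the logged target with a standard RMSE loss is effectively being optimised for RMSLE on the original prices.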

(Jeremy Howard (Admin)) #8

We’ll be opening them soon.