[ML 1 Video 1] Why does having RMSE as the evaluation metric mean that ratios are important?

anshbansal · January 10, 2018, 4:08am

Jeremy mentions in ML 1 Video 1 that for the “Blue Book for Bulldozers” competition, since RMSE is used as the evaluation metric, ratios are important, so we take the log of the target variable. I don’t understand why using RMSE as the evaluation metric would mean ratios are important. Can someone explain that?

satish860 · January 10, 2018, 4:17am

@jeremy would it be okay to open up the machine learning forums?

Yes, I had the same question too.

anshbansal · January 10, 2018, 4:20am

I think Jeremy mentioned in the intro thread that those forums were only for the Masters students.

wgpubs · January 10, 2018, 4:41am

It’s pretty easy to test …

Run the following calculations:

200 - 100
10 - 5
log(200) - log(100)
log(10) - log(5)

Notice that the last two operations return the same results. The reason it is important here is at least twofold:

The competition specifically states that it is using RMSLE and not RMSE.
Depending on the problem, it may be more important to know how far your predictions are off relative to the target rather than the actual difference between the two. Looking at my example above, the actual differences without the log are very different (100 and 5), whereas using the log function makes it clear that you are off by the same amount proportionally, irrespective of the actual differences.
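For anyone who wants to run the four calculations above directly, here is a minimal Python sketch. The variable names are just illustrative, and it assumes the natural log (any base gives the same conclusion, since only the ratio matters).

```python
import math

# Absolute differences depend on the scale of the numbers.
abs_diff_large = 200 - 100
abs_diff_small = 10 - 5

# Log differences depend only on the ratio: log(a) - log(b) = log(a / b).
log_diff_large = math.log(200) - math.log(100)
log_diff_small = math.log(10) - math.log(5)

print("absolute differences:", abs_diff_large, abs_diff_small)
print("log differences:", log_diff_large, log_diff_small)
```

Both log differences come out as log(2), because 200/100 and 10/5 are the same ratio.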

anshbansal · January 10, 2018, 2:11pm

I understand that with logs, ratios are important. Basically because

log a - log b = log (a/b)

I also understand that for many use cases ratios may matter more than absolute differences. But my question is more this: if the evaluation metric is RMSLE (which has a log in its formula), why does the ratio become important for the target variable because of the evaluation metric? I understand there is a log in the RMSLE formula, but that log is of errors, not of the target variable. Right?

wgpubs · January 10, 2018, 8:16pm

I think Jeremy converts the target variable more as a convenience than a necessity, as it allows you to use the standard R/MSE loss functions already defined in the library.

anshbansal · January 11, 2018, 4:01pm

Watched the first few minutes of Video 2 and the explanation there clears up my confusion. So the only reason we are taking the log is that the evaluation metric has a log but the metric we are optimising does not. Taking the log of the target variable bridges that gap.
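A rough sketch of that bridging idea, in case it helps others. The prices are made up, the helper names `rmse` and `rmsle` are my own, and it assumes RMSLE is computed with a plain log (some definitions use log(x + 1) instead, which would need the same transform on the target).

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def rmsle(predicted, actual):
    """RMSE computed on the logs of the predictions and the actual values."""
    return rmse([math.log(p) for p in predicted],
                [math.log(a) for a in actual])

# Made-up sale prices, purely illustrative.
actual = [10000.0, 50000.0, 20000.0]
predicted = [12000.0, 45000.0, 20000.0]

# Take the log of the target once, up front, and suppose the model
# predicts in log space.
log_actual = [math.log(a) for a in actual]
log_predicted = [math.log(p) for p in predicted]

# Plain RMSE in log space matches RMSLE in price space.
print("RMSE on log prices:", rmse(log_predicted, log_actual))
print("RMSLE on raw prices:", rmsle(predicted, actual))
```

So once the target is logged, an ordinary RMSE function scores the model the same way the competition metric would, under the assumptions above.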

jeremy · January 12, 2018, 12:52am

We’ll be opening them soon.