Why use np.log() (natural log) in Lesson 1?


This may be a strange question, and I’m full of those, but is there any particular reason to use log_e(x) as the “squishing function” in the Bulldozers notebook (Lesson 1+, for RMSLE)? Could you just as well use log_3(x), log_1337(x), or log_pi(x), or some other “squishing function” entirely? I’m not asking about this particular problem, since it stated that log_e should be used, but in general, for future problems.

I may have misunderstood the point of using RMSLE, but to my understanding it’s to even out the differences between training results and validation data into a slightly more manageable size.

Super course and website! Thanks!

Which log you use depends on the evaluation metric of your problem. If you check out Lesson 3, there is a discussion of a groceries competition whose evaluation metric asks for log_e(x + 1), so that is what is used there.

The difference between RMSE and RMSLE is in how errors are weighted. We use RMSLE because it focuses on relative (percentage) differences rather than absolute ones. Using Jeremy’s example: predicting 9 when the actual is 10 is a 10% error, and so is predicting 900k when the actual is a million. RMSLE penalizes both about equally, whereas plain RMSE would be completely dominated by the huge absolute error in the million case.
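To make that relative-error behaviour concrete, here’s a minimal sketch of RMSLE in NumPy (the function name `rmsle` is just for illustration; competitions typically use log1p, i.e. log_e(1 + x), to handle zero targets):

```python
import numpy as np

def rmsle(actual, predicted):
    """Root Mean Squared Logarithmic Error (sketch)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # log1p(x) = log_e(1 + x), which is defined even when x == 0
    return np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))

# A ~10% under-prediction is penalized about the same
# whether the target is small or large:
small = rmsle([10], [9])                # actual 10, predict 9
large = rmsle([1_000_000], [900_000])   # actual 1M, predict 900k
print(small, large)  # the two errors come out nearly equal
```

Under plain RMSE those two cases would differ by five orders of magnitude; under RMSLE they are nearly identical, which is exactly the “doesn’t care about scale, cares about ratio” property.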

Check this out to see why it would not matter.
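As for why the base would not matter: by the change-of-base identity, log_b(x) = log_e(x) / log_e(b), so switching bases just multiplies every logarithm (and hence the whole RMSLE score) by a constant. A constant rescaling never changes which model scores better. A quick sketch:

```python
import math

x = 42.0
# Change of base: log_b(x) = ln(x) / ln(b), so any other base
# only rescales the natural log by the constant 1 / ln(b).
assert math.isclose(math.log(x, 3), math.log(x) / math.log(3))
assert math.isclose(math.log(x, 1337), math.log(x) / math.log(1337))

# Because RMSLE is built from differences of logs, computing it in
# base 3 (or pi, or 1337) multiplies the metric by that same
# constant for every model -- the ranking of models is unchanged.
```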