Theoretical question: can performance be improved by training 2 models?

(Lucas) #1

Dear all,

I am trying to predict next-day sales by training random forests on historical data.

In my data, there is a very clear difference between weekdays and weekend days. I therefore have a one-hot encoded variable for the day of the week.

However, I have the feeling that I could get better performance by training one model for the weekdays and one model for the weekend days.

But is that even theoretically possible? I mean, the model has the day features as one-hot encoded variables, so in principle it should be able to figure it out “by itself”. On the other hand, I use grid search to tune the hyperparameters, and those might end up tuned more towards the weekdays (reducing the loss on those days) than towards the weekend days, since there are 5 weekdays and 2 weekend days (so the loss on the weekend days doesn't count as much as the loss on the weekdays…)
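
To make the weighting argument concrete, here is a minimal sketch in plain Python. The residuals are made-up numbers for illustration, not my actual data, and I am assuming the grid search scores each candidate with an unweighted mean squared error over all days. Under that assumption, the pooled loss is just the count-weighted average of the weekday loss and the weekend loss, so the weekend days contribute only 2/7 of the weight.

```python
import math


def mse(errors):
    """Mean squared error of a list of residuals (prediction minus actual)."""
    return sum(e * e for e in errors) / len(errors)


# Hypothetical residuals for one week: 5 weekdays and 2 weekend days.
# The weekend errors are deliberately larger to mimic a model that
# fits weekdays well and weekends poorly.
weekday_errors = [1.0, -1.0, 2.0, -2.0, 1.0]
weekend_errors = [4.0, -4.0]

n_wd, n_we = len(weekday_errors), len(weekend_errors)

# Loss as an unweighted scorer would see it: all days pooled together.
pooled = mse(weekday_errors + weekend_errors)

# The same quantity written as a count-weighted average of the two groups.
weighted = (n_wd * mse(weekday_errors) + n_we * mse(weekend_errors)) / (n_wd + n_we)

# Fraction of the pooled loss weight that belongs to weekend days.
weekend_share = n_we / (n_wd + n_we)

print("weekday MSE:", mse(weekday_errors))
print("weekend MSE:", mse(weekend_errors))
print("pooled MSE: ", pooled)
print("decomposition matches:", math.isclose(pooled, weighted))
print("weekend weight in pooled loss:", weekend_share)
```

So a hyperparameter setting that lowers the weekday error a little can win the search even if it makes the weekend error noticeably worse. Whether that actually happens with my data is exactly what I am unsure about.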

Would be great to hear an argument from a theoretical point of view here!