Tabular Data - Training parameters

Hello.

I have a question about a time series problem with tabular data.

I will use a simplified example for clarification.

Let’s say that I have four columns in my training data: weather, hour, date and price.

I want to predict the price at 11 am on 1/1/2020.

The problem is that I don’t have the weather for future dates. Can I still use the weather column during training?

It would be great to hear an explanation of how to handle this kind of issue.

Thanks a lot!
Offir

You could build two models: one that predicts weather from hour and date, and a second that predicts price from hour, date and weather. At inference, feed the first model’s weather prediction into the price model.
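
Here is a minimal sketch of that two-model idea, assuming plain least-squares models in NumPy and synthetic data. The data, the `time_features` helper and the cyclical encoding are all illustrative assumptions, not something from your dataset; in practice you would swap in your own tabular learner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): hour of day, day of year, weather, price.
n = 500
hour = rng.integers(0, 24, n)
day = rng.integers(0, 365, n)
weather = (10 + 8 * np.sin(2 * np.pi * day / 365)
           + 3 * np.sin(2 * np.pi * hour / 24) + rng.normal(0, 1, n))
price = 50 + 2 * weather + 5 * np.cos(2 * np.pi * hour / 24) + rng.normal(0, 1, n)


def time_features(hour, day):
    """Intercept plus cyclical encodings of hour and day (hypothetical helper)."""
    hour = np.asarray(hour, dtype=float)
    day = np.asarray(day, dtype=float)
    return np.column_stack([
        np.ones_like(hour),
        np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24),
        np.sin(2 * np.pi * day / 365), np.cos(2 * np.pi * day / 365),
    ])


X_time = time_features(hour, day)

# Model 1: weather from hour and date.
w_weather, *_ = np.linalg.lstsq(X_time, weather, rcond=None)

# Model 2: price from hour, date and the observed weather.
X_price = np.column_stack([X_time, weather])
w_price, *_ = np.linalg.lstsq(X_price, price, rcond=None)


def predict_price(hour, day):
    """Predict price for future timestamps, where weather is unknown."""
    X_t = time_features(hour, day)
    weather_hat = X_t @ w_weather  # model 1 fills in the missing column
    return np.column_stack([X_t, weather_hat]) @ w_price


pred = predict_price([11], [0])  # 11 am on day 0 of the year
print(pred)
```

Note that the price model is trained on observed weather but sees predicted weather at inference, so any error in the weather model carries over into the price prediction.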


Hi Ralph

Thank you for your answer.

I’m looking for an option to use this column without building another model.

Maybe using aggressive dropout on this column would help? I would be happy to hear about other options.

There is another example of this kind of issue in the Rossmann competition, where the customers column exists only in the training set (Jeremy didn’t use it in his model).

Thanks
Offir

You could train with weather but provide a mean value at inference. Perhaps your embeddings would be richer and more predictive with the extra training feature, but I don’t think that would work in every case.
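
As a rough sketch of what I mean, assuming a simple linear model in NumPy and made-up data (the `design` helper and the numbers are only illustrative): train with the real weather column, keep its training mean, and plug that mean in wherever weather is unknown at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption, not from the thread).
n = 500
hour = rng.integers(0, 24, n).astype(float)
weather = 15 + rng.normal(0, 5, n)
price = 50 + 2 * weather + 5 * np.cos(2 * np.pi * hour / 24) + rng.normal(0, 1, n)


def design(hour, weather):
    """Intercept, cyclical hour encoding and the weather column (hypothetical helper)."""
    hour = np.asarray(hour, dtype=float)
    weather = np.asarray(weather, dtype=float)
    return np.column_stack([
        np.ones_like(hour),
        np.sin(2 * np.pi * hour / 24),
        np.cos(2 * np.pi * hour / 24),
        weather,
    ])


# Train with the real weather values.
coef, *_ = np.linalg.lstsq(design(hour, weather), price, rcond=None)

# Remember the training mean to use as a stand-in later.
weather_mean = weather.mean()


def predict_price(hour):
    """Predict price when weather is unavailable by substituting its training mean."""
    hour = np.asarray(hour, dtype=float)
    filler = np.full_like(hour, weather_mean)
    return design(hour, filler) @ coef


pred = predict_price([11])  # 11 am, weather unknown
print(pred)
```

With a purely linear model like this one, the mean substitution just turns the weather term into a constant offset, so the prediction cannot react to the actual weather on that day. Whether a network with embeddings gains anything from the extra training column is something you would need to check on a validation set that has weather masked the same way.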
