I want to call m.predict() on some new data, but that data has to match the columns of X_train. After a lot of feature engineering, it is difficult to shape the new input so that it fits the model.
If the raw data looks like this (dataframe.info()):
> Data columns (total 5 columns):
> date 58764 non-null object
> vehicleCount 58764 non-null int64
> id 58764 non-null int64
> totalSpaces 58764 non-null int64
> garageCode 58764 non-null object
> dtypes: int64(3), object(2)
It might look like this at training time:
> Data columns (total 26 columns):
> Hourofday 47764 non-null int64
> Minutesofhour 47764 non-null int64
> Year 47764 non-null int64
> Month 47764 non-null int64
> Week 47764 non-null int64
> Day 47764 non-null int64
> Dayofweek 47764 non-null int64
> Dayofyear 47764 non-null int64
> Is_month_end 47764 non-null bool
> Is_month_start 47764 non-null bool
> Is_quarter_end 47764 non-null bool
> Is_quarter_start 47764 non-null bool
> Is_year_end 47764 non-null bool
> Is_year_start 47764 non-null bool
[...]
(11 one-hot columns, that's another issue as well)
[...]
> dtypes: bool(6), int64(8), uint8(12)
Are there any good practices for shaping the new input so that it matches X_train?
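
To make the problem concrete, here is a minimal sketch of one common pattern, not a definitive solution: put the feature engineering into a single function that is reused at training and prediction time, then reindex the new features to the training columns. The function names `engineer_features` and `align_to_training`, the sample rows, and the garage codes "A" and "B" are all hypothetical; only a few of the date features from the listing above are reproduced, and treating `vehicleCount` as the target is an assumption.

```python
import pandas as pd


def engineer_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the same steps run at training and prediction time."""
    ts = pd.to_datetime(raw["date"])
    out = pd.DataFrame(index=raw.index)
    out["Hourofday"] = ts.dt.hour
    out["Minutesofhour"] = ts.dt.minute
    out["Dayofweek"] = ts.dt.dayofweek
    out["Is_month_end"] = ts.dt.is_month_end
    out["totalSpaces"] = raw["totalSpaces"]
    # One-hot encode the garage; new data may contain only some categories.
    dummies = pd.get_dummies(raw["garageCode"], prefix="garageCode")
    return pd.concat([out, dummies], axis=1)


def align_to_training(features: pd.DataFrame, train_columns: list) -> pd.DataFrame:
    """Force the training column set and order: missing columns are
    filled with 0, columns unseen during training are dropped."""
    return features.reindex(columns=train_columns, fill_value=0)


# Made-up rows that follow the raw schema shown above.
train_raw = pd.DataFrame({
    "date": ["2014-05-31 10:15:00", "2014-06-01 11:30:00"],
    "vehicleCount": [10, 20],  # assumed to be the target, so not a feature
    "id": [1, 2],
    "totalSpaces": [65, 512],
    "garageCode": ["A", "B"],
})
X_train = engineer_features(train_raw)
train_columns = list(X_train.columns)  # save this next to the model

# A single new row only contains garage "B", so garageCode_A is missing
# until the frame is aligned to the training columns.
new_raw = pd.DataFrame({
    "date": ["2014-06-02 08:45:00"],
    "vehicleCount": [0],
    "id": [3],
    "totalSpaces": [512],
    "garageCode": ["B"],
})
X_new = align_to_training(engineer_features(new_raw), train_columns)
# X_new now has the same columns, in the same order, as X_train,
# so it could be passed to m.predict(X_new).
```

The point of the sketch is that the list of training columns is stored alongside the model, so any new input can be forced into that layout. Where scikit-learn is available, an alternative is to fit a `OneHotEncoder(handle_unknown="ignore")` inside a `Pipeline`, which remembers the training categories itself.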