Is this supposed to work this way?
cat_vars = ['Store', 'DayOfWeek', 'Year', 'Month', 'Day', 'StateHoliday', 'CompetitionMonthsOpen',
            'Promo2Weeks', 'StoreType', 'Assortment', 'PromoInterval', 'CompetitionOpenSinceYear', 'Promo2SinceYear',
            'State', 'Week', 'Events', 'Promo_fw', 'Promo_bw', 'StateHoliday_fw', 'StateHoliday_bw',
            'SchoolHoliday_fw', 'SchoolHoliday_bw']
But after running this:
data = (TabularList.from_df(df, path=path, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
        .split_by_idx(valid_idx)
        .label_from_df(cols=dep_var, label_cls=FloatList, log=True)
       )
data.x.cat_names
I see the following:
['Store',
'DayOfWeek',
'Year',
'Month',
'Day',
'StateHoliday',
'CompetitionMonthsOpen',
'Promo2Weeks',
'StoreType',
'Assortment',
'PromoInterval',
'CompetitionOpenSinceYear',
'Promo2SinceYear',
'State',
'Week',
'Events',
'Promo_fw',
'Promo_bw',
'StateHoliday_fw',
'StateHoliday_bw',
'SchoolHoliday_fw',
'SchoolHoliday_bw',
'CompetitionDistance_na',
'CloudCover_na']
The last two items aren’t in the cat_vars list, so this was unexpected.
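A likely explanation, assuming `procs` includes fastai's `FillMissing` (as in the Rossmann notebook, `procs = [FillMissing, Categorify, Normalize]`) with its default `add_col=True`: for every continuous column that contains missing values, that proc fills the gaps and adds a boolean `<col>_na` indicator column, which it registers as a categorical variable. The sketch below is a hypothetical pandas re-implementation of that behaviour (not fastai's actual code); the function name `fill_missing` and the toy values are made up for illustration.

```python
import pandas as pd

def fill_missing(df, cat_names, cont_names, add_col=True):
    """Hypothetical stand-in for a FillMissing-style proc (not fastai's code).

    For each continuous column with NaNs: optionally add a boolean
    `<col>_na` column and append it to the categorical names, then
    fill the NaNs with the column median.
    """
    df = df.copy()
    cat_names = list(cat_names)  # copy so the caller's list is not mutated
    for name in cont_names:
        if df[name].isnull().any():
            if add_col:
                # Record where values were missing *before* filling them.
                df[name + '_na'] = df[name].isnull()
                if name + '_na' not in cat_names:
                    cat_names.append(name + '_na')
            df[name] = df[name].fillna(df[name].median())
    return df, cat_names

# Toy data (made-up values): both continuous columns have a missing entry.
df = pd.DataFrame({
    'Store': [1, 2, 3],
    'CompetitionDistance': [570.0, None, 14130.0],
    'CloudCover': [1.0, 4.0, None],
})
cat_vars = ['Store']
cont_vars = ['CompetitionDistance', 'CloudCover']

df2, cat_names = fill_missing(df, cat_vars, cont_vars)
print(cat_names)                                   # ['Store', 'CompetitionDistance_na', 'CloudCover_na']
print([c for c in cat_names if c not in cat_vars])  # ['CompetitionDistance_na', 'CloudCover_na']
```

If that is what is happening, the extra names should correspond to continuous columns with missing values in the training data, and passing `add_col=False` to the proc (for example via `partial(FillMissing, add_col=False)`) should stop them from being added.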