Hi,
While working through the Rossmann example, I got very strange results from lr_find(), and during training PyTorch threw this exception:
RuntimeError: exp_vml_cpu not implemented for 'Int'
When I looked at the DataLoaders, I noticed that the variables I had identified as categorical in the ‘cat_names’ parameter were converted to floats:
import pandas as pd
from fastai.tabular.all import *
df = pd.read_csv(r'/home/tim/.kaggle/rossmann/small.csv')
dep_var = 'Sales'
procs = [Categorify, FillMissing, Normalize]
cont_vars = ['Customers', 'Store']
cat_vars = ['DayOfWeek', 'Open', 'Promo', 'StateHoliday', 'SchoolHoliday']
dls = TabularDataLoaders.from_df(df, cat_names=cat_vars, cont_names=cont_vars, y_names='Sales', procs=procs)
print(dls.show_batch(show=False))
(  DayOfWeek  Open  Promo  StateHoliday  SchoolHoliday  Customers      Store   Sales
0        7.0   1.0    1.0           1.0            1.0  -1.544242   0.510249     0.0
1        3.0   2.0    1.0           1.0            2.0  -0.833547  -0.212174  2670.0
2        1.0   2.0    1.0           1.0            1.0  -0.080632  -1.249099  6700.0
3        1.0   2.0    1.0           1.0            2.0   0.294653  -1.149454  6015.0
I tried converting these columns to the ‘category’ dtype in the dataframe, but got the same results:
maps = {'DayOfWeek': {1: 'Monday', 2: 'Tues', 3: 'Wed', 4: 'Thurs', 5: 'Fri', 6: 'Sat', 7: 'Sun'}, 'Open': {0: 'False', 1: 'True'},
        'Promo': {0: 'False', 1: 'True'}, 'StateHoliday': {0: 'False', 1: 'True'}, 'SchoolHoliday': {0: 'False', 1: 'True'}}
for key, val in maps.items():
    df[key] = df[key].map(val)
    df[key] = df[key].astype('category')
print(df.head())
print(df.info())
   Store  DayOfWeek        Date  Sales  Customers  Open  Promo  StateHoliday  SchoolHoliday
0      1        Fri  2015-07-31   5263        555  True   True         False           True
1      2        Fri  2015-07-31   6064        625  True   True         False           True
2      3        Fri  2015-07-31   8314        821  True   True         False           True
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Store          19999 non-null  int64
 1   DayOfWeek      19999 non-null  category
 2   Date           19999 non-null  object
 3   Sales          19999 non-null  int64
 4   Customers      19999 non-null  int64
 5   Open           19999 non-null  category
 6   Promo          19999 non-null  category
 7   StateHoliday   19999 non-null  category
 8   SchoolHoliday  19999 non-null  category
dtypes: category(5), int64(3), object(1)
dls2 = TabularDataLoaders.from_df(df, cat_names=cat_vars, cont_names=cont_vars, y_names='Sales', procs=procs)
print(dls2.show_batch(show=False))
(  DayOfWeek  Open  Promo  StateHoliday  SchoolHoliday  Customers      Store   Sales
0        4.0   1.0    1.0           1.0            1.0  -1.540567   0.531858     0.0
1        1.0   2.0    2.0           1.0            2.0  -0.652530   0.899076  5151.0
2        3.0   2.0    1.0           1.0            1.0   0.007624  -0.445317  5564.0
3        6.0   2.0    2.0           1.0            2.0   0.282493  -1.316683  7500.0
Yet the categorical variables are stored as integers in ‘dls2.train.xs’:
print(dls2.train.xs)
       DayOfWeek  Open  Promo  StateHoliday  SchoolHoliday  Customers      Store
5034           2     2      2             1              1   0.167377   0.058831
7481           3     2      1             1              1  -0.434045   0.734139
6811           3     2      1             1              1   1.020174  -1.350915
3448           6     2      2             1              2   1.391364  -1.406931
13939          4     1      1             1              1  -1.540567   0.012150
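For reference, here is a rough pure-Python sketch of how the 1-based integer codes in the categorical columns could arise. It is only an illustration under assumptions, not fastai's actual implementation: it assumes Categorify sorts the distinct values, numbers them from 1, and reserves code 0 (labelled '#na#' here) for missing entries. The helper name categorify_sketch is hypothetical.

```python
def categorify_sketch(values):
    """Encode values as 1-based integer codes, reserving 0 for missing entries.

    Hypothetical helper for illustration only; not part of fastai.
    """
    # Vocabulary of distinct non-missing values (sorted order is an assumption).
    vocab = sorted({v for v in values if v is not None})
    # Codes start at 1 so that 0 can stand for "missing".
    code_of = {v: i + 1 for i, v in enumerate(vocab)}
    codes = [code_of.get(v, 0) for v in values]
    return codes, ["#na#"] + vocab


codes, vocab = categorify_sketch(["True", "False", "True", None])
print(codes)  # [2, 1, 2, 0]
print(vocab)  # ['#na#', 'False', 'True']
```

Under these assumptions 'False' maps to 1 and 'True' maps to 2, which would be consistent with the Open column showing 1 on the row where Sales is 0.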
This problem has me stumped.
Am I doing something wrong?
Should I submit an issue on GitHub?
Thanks,
Tim