Thanks for the great library. I’ve been using it to look at some tabular data and I’ve run into an issue I can’t wrap my head around.
When I call get_tabular_data_from_df to get the DataBunch, it raises the following error:
TypeError: can’t convert np.ndarray of type numpy.int8. The only supported types are: double, float, float16, int64, int32, and uint8.
I traced the issue back to the dependent variable, which is being converted to an int8 by the function. I’ve tried loading the data as an int32 and as a category. If the dtype is int32, I get an error saying I can’t use .cat unless it’s on a categorical variable. However, I get the TypeError when it’s loaded as a category dtype.
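Both symptoms described above can be reproduced with plain pandas, outside of fastai. The snippet below is only a minimal sketch: the `dep` series is a hypothetical stand-in for the dependent variable column, not data from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the dependent variable, loaded as int32.
dep = pd.Series([0, 1, 1, 0, 1], dtype="int32")

# Symptom 1: the .cat accessor only exists on a 'category' dtype,
# so touching it on an integer column raises AttributeError.
try:
    dep.cat.codes
    cat_error = None
except AttributeError as err:
    cat_error = str(err)
print(cat_error)

# Symptom 2: once the column is categorical, pandas stores the codes
# in the smallest integer type that fits. With only a few categories
# that is int8, which is not in the list of types from the TypeError.
codes = dep.astype("category").cat.codes
print(codes.dtype)  # int8
```

This suggests the int8 comes from how pandas encodes category codes rather than from how the data was loaded.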
Hmm, I think there might be an astype(np.int64) missing at the end of line 24 of tabular.data. Do you have a developer install of fastai? If so, could you check whether this solves the issue? Thanks.
Thanks @sgugger. I pulled the developer version last night and tried adding it in. I now get the following error whether the dependent variable dtype is set to categorical or numeric before calling the method.
AttributeError: Can only use .cat accessor with a ‘category’ dtype
If the dtype is category before calling the method, it changes it to int64 and raises the error.
I don’t understand. You didn’t get that error in your notebook before, and the categorical variable wasn’t causing any problem. Could you share an updated notebook?
My apologies that this wasn’t clearer. I pulled the new dev version today and reran the notebook. Currently, the errors are as follows:
If dep_var is int64 on the call to TabularDataBunch.from_df and is left to be converted to category by the Categorify transform, the error is:
AttributeError: ‘CategoricalAccessor’ object has no attribute ‘astype’
If dep_var is set to a category dtype before calling TabularDataBunch.from_df, I get the same error:
AttributeError: ‘CategoricalAccessor’ object has no attribute ‘astype’
I saw in the new data.py you updated line 24 with df[dep_var].cat.astype(np.int64). I tweaked that line to df[dep_var].cat.codes.astype(np.int64). It seems to have resolved that issue, but now I get an error downstream on line 29 of data.py. The notebook with the error message is here:
I went back and checked the dtypes of the columns in cat_names and they’re all properly set as category.
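A dtype check like the one described above could be written as follows. This is a rough sketch: `non_category_columns` is a hypothetical helper, not part of fastai, and the column names are made up for illustration.

```python
import pandas as pd

def non_category_columns(df, cat_names):
    """Return the names in cat_names whose column is not a pandas category dtype."""
    return [name for name in cat_names
            if not isinstance(df[name].dtype, pd.CategoricalDtype)]

# Hypothetical frame: one proper category column, one left as plain strings.
df = pd.DataFrame({
    "workclass": pd.Series(["a", "b", "a"], dtype="category"),
    "education": ["x", "y", "x"],
    "age": [30, 41, 25],
})

bad = non_category_columns(df, ["workclass", "education"])
print(bad)  # ['education']
```

An empty list would mean every column in cat_names is already categorical, so the problem would have to lie elsewhere.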
Could you run %debug to find out which column is causing the problem? From your error message and the test you ran, this should work properly.
The missing .codes has been fixed.
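For anyone landing on this thread later, the difference between the two spellings discussed here can be checked with plain pandas. This is a small sketch on a hypothetical `dep` series, not the actual fastai source.

```python
import numpy as np
import pandas as pd

# Hypothetical dependent variable, already a category dtype.
dep = pd.Series(["no", "yes", "yes", "no"], dtype="category")

# The .cat accessor itself has no astype method, which is why
# df[dep_var].cat.astype(np.int64) raised AttributeError.
has_astype = hasattr(dep.cat, "astype")
print(has_astype)  # False

# Going through .codes first yields an integer Series that can be cast.
labels = dep.cat.codes.astype(np.int64)
print(labels.dtype)     # int64
print(labels.tolist())  # [0, 1, 1, 0]
```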