Dtype issue with get_tabular_data_from_df

Hi all,

Thanks for the great library. I’ve been using it to look at some tabular data and I’ve run into an issue I can’t wrap my head around.

When I call get_tabular_data_from_df to get the DataBunch, it returns an error saying:

TypeError: can’t convert np.ndarray of type numpy.int8. The only supported types are: double, float, float16, int64, int32, and uint8.

I traced the issue back to the dependent variable, which is being converted to an int8 by the function. I’ve tried loading the data as an int32 and as a category. If the dtype is int32, I get an error saying I can’t use .cat unless it’s on a categorical variable. However, I get the TypeError when it’s loaded as a category dtype.
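For anyone reading along, here is a minimal sketch of where the int8 and the .cat error probably come from on the pandas side. The toy frame and the column name "target" are made up for illustration and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical toy frame; "target" stands in for the dependent variable.
df = pd.DataFrame({"target": [0, 1, 1, 0, 2]})

# On a plain integer column the .cat accessor is not available.
try:
    df["target"].cat.codes
    cat_error = None
except AttributeError as err:
    cat_error = str(err)
print(cat_error)

# Once the column is a category, pandas stores the integer codes in the
# smallest signed integer dtype that fits the number of categories.
codes = df["target"].astype("category").cat.codes
print(codes.dtype)  # int8
```

With only a handful of categories the codes come back as int8, which matches the dtype named in the TypeError above.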

Let me know if I can provide any other info.

It would be great to have more code and your full error message.

Sure, the link to the notebook is here.

The error message is at the very bottom.

Hmm, I think there might be an astype(np.int64) missing at the end of line 24 of tabular.data. Do you have a developer install of fastai? If so, could you check whether this solves the issue? Thanks.

Thanks @sgugger. I pulled the developer version last night and tried adding it in. I now get the following error whether the dependent variable dtype is set to categorical or numeric before calling the method.

AttributeError: Can only use .cat accessor with a ‘category’ dtype

If the dtype is category before calling the method, it changes it to int64 and raises the error.

Raises which error?

The AttributeError. It happens whether the dependent variable is passed in as a numeric or a category dtype.

I don’t understand. You didn’t have that error in your notebook before, and the categorical variable wasn’t causing any problem. Could you share an updated notebook?

Hi @sgugger,

My apologies that this wasn’t clearer. I pulled the new dev version today and reran the notebook. Currently, the errors are as follows:

If dep_var is int64 when TabularDataBunch.from_df is called and is left to be converted to category by the Categorify transform, the error is:

AttributeError: ‘CategoricalAccessor’ object has no attribute ‘astype’

If dep_var is set to a category dtype before calling TabularDataBunch.from_df, I get the same error:

AttributeError: ‘CategoricalAccessor’ object has no attribute ‘astype’

I saw in the new data.py you updated line 24 with df[dep_var].cat.astype(np.int64). I tweaked that line to df[dep_var].cat.codes.astype(np.int64). It seems to have resolved that issue, but now I get an error downstream on line 29 of data.py. The notebook with the error message is here:

I went back and checked the dtypes of the columns in cat_names and they’re all properly set as category.
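A quick way to double-check this kind of thing is sketched below. The helper name non_categorical_columns and the example columns are hypothetical, chosen only for illustration; it simply lists any names in cat_names whose dtype is not category.

```python
import pandas as pd

def non_categorical_columns(df, cat_names):
    """Return the names in cat_names whose dtype is not 'category'."""
    return [name for name in cat_names
            if not isinstance(df[name].dtype, pd.CategoricalDtype)]

# Hypothetical example: one proper category column, one plain string column.
df = pd.DataFrame({
    "workclass": pd.Series(["private", "gov", "private"], dtype="category"),
    "education": ["hs", "college", "hs"],
})
result = non_categorical_columns(df, ["workclass", "education"])
print(result)  # ['education']
```

An empty list would mean every column in cat_names is already categorical, so the problem would have to be somewhere else.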

Could you run %debug to find out which column is causing the problem? From your error message and the test you ran, this should work properly.
The missing .codes has been fixed.
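For reference, the difference between the two versions of that line can be reproduced with plain pandas. This is only a sketch on a made-up series, not the actual fastai source.

```python
import numpy as np
import pandas as pd

# Hypothetical dependent variable stored as a category.
dep = pd.Series(["no", "yes", "yes", "no"], dtype="category")

# The accessor itself has no astype method, so this raises AttributeError.
try:
    dep.cat.astype(np.int64)
    accessor_error = None
except AttributeError as err:
    accessor_error = str(err)
print(accessor_error)

# Going through .codes first gives an integer Series that can be cast.
labels = dep.cat.codes.astype(np.int64)
print(labels.dtype, labels.tolist())  # int64 [0, 1, 1, 0]
```

The cast to int64 matters because the raw codes are int8 for a small number of categories, which is the dtype the original TypeError complained about.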