Help proc_df error

I can’t figure out the origin / scope of this error. Please help / advise.
Searching the forum doesn’t turn up a mention of it.
The error states:

AttributeError: Can only use .cat accessor with a 'category' dtype

This makes me think there’s some string / category behaviours being messed up.
However, my data is a time field and a bunch of floats. The floats were read in as strings because they contained text errors, so I called pd.to_numeric(..., errors='coerce') on them.
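For anyone following along, here is a minimal sketch of what that coercion step does. The sample values are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical column: numbers stored as strings, with one text error mixed in.
raw = pd.Series(["1.5", "2.0", "bad reading", "4.25"])

# errors="coerce" turns anything unparseable into NaN instead of raising.
clean = pd.to_numeric(raw, errors="coerce")

print(clean.dtype)         # dtype of the converted column
print(clean.isna().sum())  # how many values could not be parsed
```

The NaN values this leaves behind are what proc_df later handles through fix_missing.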

I see only the following dtypes on my columns:
datetime64[ns]
float64

float64

Detailed error traceback:

----> 2 train_all, y, nas = proc_df(train_all[1:24], 'target')

~\fastai\projs\theproj\fastai\structured.py in proc_df(df, y_fld, skip_flds, do_scale, na_dict, preproc_fn, max_n_cat, subset, mapper)
426 for n,c in df.items(): na_dict = fix_missing(df, c, n, na_dict)
427 if do_scale: mapper = scale_vars(df, mapper)
--> 428 for n,c in df.items(): numericalize(df, c, n, max_n_cat)
429 res = [pd.get_dummies(df, dummy_na=True), y, na_dict]
430 if do_scale: res = res + [mapper]

~\fastai\projs\theproj\fastai\structured.py in numericalize(df, col, name, max_n_cat)
315 """
316 if not is_numeric_dtype(col) and ( max_n_cat is None or col.nunique()>max_n_cat):
--> 317 df[name] = col.cat.codes+1
318
319 def scale_vars(df, mapper):

~\Anaconda2\envs\fastai\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
3608 if (name in self._internal_names_set or name in self._metadata or
3609 name in self._accessors):
-> 3610 return object.__getattribute__(self, name)
3611 else:
3612 if name in self._info_axis:

~\Anaconda2\envs\fastai\lib\site-packages\pandas\core\accessor.py in __get__(self, instance, owner)
52 # this ensures that Series.str.<method> is well defined
53 return self.accessor_cls
---> 54 return self.construct_accessor(instance)
55
56 def __set__(self, instance, value):

~\Anaconda2\envs\fastai\lib\site-packages\pandas\core\categorical.py in _make_accessor(cls, data)
2209 def _make_accessor(cls, data):
2210 if not is_categorical_dtype(data.dtype):
-> 2211 raise AttributeError("Can only use .cat accessor with a "
2212 "'category' dtype")
2213 return CategoricalAccessor(data.values, data.index,

AttributeError: Can only use .cat accessor with a 'category' dtype

Apply train_cats before…

Thanks for the suggestion.
There are no string or category columns. I tried it anyway in case I was missing something and, as expected, it made no difference (it wouldn’t do anything on numerical data).

Try passing the max_n_cat parameter to proc_df.
Also, can you share a screenshot of df.info()?

max_n_cat shouldn’t matter; there are no cats (categories, not felines).
df.info() shows the same as the dtypes for what matters (apart from the timestamp, they’re all floats):

timestamp 570300 non-null datetime64[ns]
data1 567587 non-null float64
data2 568993 non-null float64

target 568993 non-null float64

I’m trying to set a breakpoint on line 317 in structured.py to help figure out what’s going on.
I can’t recall which video Jeremy showed how to do this in, and I’m not finding it in search.
I quickly tried pdb and couldn’t get it to work (I’ve never used it).
So rather than debug that, and before I tag Jeremy: please point me to where Jeremy showed this, or let me know how debugging with breakpoints works in fastai / Jupyter. Thanks. (If it matters, I’m on a local Windows install.)


OK, so the debugger showed that the problem is the datetime64[ns] column: is_numeric_dtype returns False for it (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.api.types.is_numeric_dtype.html).
Hence numericalize tries to treat it as a categorical variable, but the column doesn’t have a 'category' dtype, so the .cat accessor fails.
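A minimal reproduction of that behaviour, using plain pandas rather than fastai. The frame and column names here are hypothetical stand-ins for the data in this thread.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical frame mirroring the dtypes discussed in this thread.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-01-01", "2018-01-02"]),
    "data1": [1.0, 2.0],
})

# One flag per column: the datetime column is not considered numeric.
numeric_flags = {name: is_numeric_dtype(col) for name, col in df.items()}
print(numeric_flags)

# A column without a 'category' dtype has no usable .cat accessor,
# so asking for .cat.codes raises the AttributeError from the traceback.
try:
    df["timestamp"].cat.codes
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

This matches the condition on line 316 of the traceback: any column that fails is_numeric_dtype falls through to col.cat.codes, whether or not it is actually categorical.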

For now I’ve used Jeremy’s helper function add_datepart() and dropped this column.
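If fastai isn’t to hand, the same idea can be sketched with the pandas .dt accessor. This is not add_datepart’s implementation, only a hand-rolled approximation; the derived column names and sample values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data with a single datetime column and a float target.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-01-01 06:00", "2018-07-15 18:30"]),
    "target": [0.5, 1.5],
})

# Expand the datetime into integer parts that count as numeric dtypes.
ts = df["timestamp"].dt
df["timestamp_year"] = ts.year
df["timestamp_month"] = ts.month
df["timestamp_day"] = ts.day
df["timestamp_dayofweek"] = ts.dayofweek  # Monday is 0
df["timestamp_hour"] = ts.hour

# Drop the original datetime column so only numeric columns remain.
df = df.drop(columns="timestamp")
print(df.dtypes)
```

With only numeric columns left, the numericalize step never reaches the .cat branch.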

Thanks for the help @ecdrid


What’s ns? Indexing?

Nanoseconds:
https://docs.scipy.org/doc/numpy-dev/reference/arrays.datetime.html

Thanks.
I’m new to datetime objects.

I had this same issue and figured it out when I ran df.info(verbose=True) and saw there was one datetime column remaining.
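On a wide frame it can be quicker to list the offending columns directly. Here is a small sketch; the helper name non_numeric_non_category and the sample frame are hypothetical, not part of fastai or pandas.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def non_numeric_non_category(df):
    """Return names of columns that are neither numeric nor 'category' dtype."""
    return [
        name for name, col in df.items()
        if not is_numeric_dtype(col) and col.dtype.name != "category"
    ]

# Hypothetical frame: one leftover datetime, one float, one proper category.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-01-01", "2018-01-02"]),
    "data1": [1.0, 2.0],
    "kind": pd.Series(["a", "b"], dtype="category"),
})

leftover = non_numeric_non_category(df)
print(leftover)
```

Any column this returns is one that would hit the .cat accessor error in this thread, so it needs converting or dropping first.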

Hello, how do I use proc_df on test data that does not include the dependent variable?