Problem with cont_cat_split() in Google Colab

Hi,

I am using Google Colab and trying to run the notebook 09_tabular.ipynb.

Starting today, I have a problem with this line:
cont,cat = cont_cat_split(df, 1, dep_var=dep_var)
It throws the following error:

TypeError                                 Traceback (most recent call last)
in ()
----> 1 cont,cat = cont_cat_split(df, 1, dep_var=dep_var)

1 frames
/usr/local/lib/python3.6/dist-packages/numpy/core/numerictypes.py in issubdtype(arg1, arg2)
    386     """
    387     if not issubclass_(arg1, generic):
--> 388         arg1 = dtype(arg1).type
    389     if not issubclass_(arg2, generic):
    390         arg2 = dtype(arg2).type

TypeError: Cannot interpret 'UInt32Dtype()' as a data type

I have already run this notebook before and it ran fine.
Has anybody else experienced this problem?
I can work around it by manually changing the data type of saleWeek to int64, but then it throws a similar error for ProductSize, which is a "category".
Tnx!

This comes from a PR that someone made recently. Could you open an issue for it in the fastai GitHub repo? (And, if possible, include a minimal reproducer of what's needed? It would be great if that were a Colab notebook.)
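For anyone putting such a reproducer together, here is a rough sketch of what it might look like. It assumes the failure comes from np.issubdtype being handed pandas extension dtypes, as the traceback suggests. The toy DataFrame, the YearMade column, and the helper numpy_can_interpret are made up for illustration and are not part of fastai.

```python
import numpy as np
import pandas as pd

def numpy_can_interpret(dtype):
    """Return True if np.issubdtype accepts `dtype`, False if it raises TypeError."""
    try:
        np.issubdtype(dtype, np.integer)
        return True
    except TypeError:
        return False

# Toy stand-in for the notebook's DataFrame; the values are invented
df = pd.DataFrame({
    "saleWeek": pd.array([1, 14, 52], dtype="UInt32"),          # pandas nullable integer
    "ProductSize": pd.Categorical(["Small", "Mini", "Small"]),  # pandas categorical
    "YearMade": np.array([2004, 1996, 2001], dtype="int64"),    # plain NumPy integer
})

# Report, per column, whether NumPy can interpret the dtype at all
for name in df.columns:
    print(name, df[name].dtype, numpy_can_interpret(df[name].dtype))
```

Columns for which the last value printed is False are the ones that would trip the unpatched function.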

I have opened an issue:


What is a PR? Pull request?
Tnx,
Maxim

If you just want to run the notebook for now:

Change saleWeek from UInt32 (the pandas nullable integer dtype) to uint32 (the plain NumPy dtype):
df = df.astype({'saleWeek' : 'uint32'})
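To see what this conversion does in isolation, here is a small sketch on a made-up DataFrame (the values are invented; only the column name comes from the notebook). Note that it assumes the column has no missing values, since a plain NumPy uint32 column cannot hold them.

```python
import numpy as np
import pandas as pd

# Made-up column using the pandas nullable dtype (capital "U")
df = pd.DataFrame({"saleWeek": pd.array([1, 14, 52], dtype="UInt32")})
print(df["saleWeek"].dtype)  # UInt32

# Convert to the plain NumPy dtype (lowercase "u")
df = df.astype({"saleWeek": "uint32"})
print(df["saleWeek"].dtype)  # uint32

# NumPy can now classify the column as an integer type
print(np.issubdtype(df["saleWeek"].dtype, np.integer))  # True
```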

Modify cont_cat_split so that it properly detects ProductSize, which has a CategoricalDtype, by checking the dtype's name property:

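import numpy as np
from fastcore.foundation import L  # both are normally already in scope after `from fastai.tabular.all import *`

def cont_cat_split(df, max_card=20, dep_var=None):
    "Split the columns of `df` into continuous and categorical names"
    cont_names, cat_names = [], []
    for label in df:
        if label in L(dep_var): continue

        # Mod: pandas categorical columns (e.g. ProductSize) are not NumPy dtypes,
        # so classify them by name before np.issubdtype sees them
        if df[label].dtype.name == 'category':
            cat_names.append(label)
            continue

        # Integers with more than max_card unique values, and all floats, are continuous
        if ((np.issubdtype(df[label].dtype, np.integer) and
             df[label].unique().shape[0] > max_card) or
            np.issubdtype(df[label].dtype, np.floating)):
            cont_names.append(label)
        else: cat_names.append(label)
    return cont_names, cat_names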
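As an alternative that avoids both steps, the same split can be sketched with the dtype helpers in pandas.api.types, which understand extension dtypes such as UInt32 and category. This is only an illustration, not the fix that went into fastai: the name split_cont_cat, the plain-Python handling of dep_var (instead of fastcore's L), and the toy data are all assumptions.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

def split_cont_cat(df, max_card=20, dep_var=None):
    """Hypothetical stand-in for cont_cat_split built on pandas dtype helpers."""
    # Normalise dep_var (None, a single name, or a list of names) into a set
    if dep_var is None:
        skip = set()
    elif isinstance(dep_var, str):
        skip = {dep_var}
    else:
        skip = set(dep_var)

    cont_names, cat_names = [], []
    for label in df.columns:
        if label in skip:
            continue
        col = df[label]
        # Floats are continuous; integers only when they exceed max_card unique values
        if is_float_dtype(col.dtype) or (
            is_integer_dtype(col.dtype) and len(col.unique()) > max_card
        ):
            cont_names.append(label)
        else:
            cat_names.append(label)
    return cont_names, cat_names

# Toy data with the two problematic dtypes from this thread
df = pd.DataFrame({
    "saleWeek": pd.array([1, 2, 3, 4], dtype="UInt32"),
    "ProductSize": pd.Categorical(["Small", "Mini", "Small", "Large"]),
    "MachineHours": [10.5, 20.0, 3.2, 7.7],
    "SalePrice": [100.0, 200.0, 150.0, 120.0],
})

cont, cat = split_cont_cat(df, max_card=1, dep_var="SalePrice")
print(cont)  # ['saleWeek', 'MachineHours']
print(cat)   # ['ProductSize']
```

Categorical columns fall through to the else branch because neither helper reports them as integer or float, so no special case on the dtype name is needed here.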

Great, tnx!

New issue and PR here, please review and let me know if you need any changes.


Hey @sylvaint, I did as you instructed, but I still have the same issue:

TypeError: Cannot interpret 'CategoricalDtype(categories=['Large', 'Large / Medium', 'Medium', 'Small', 'Mini', 'Compact'], ordered=True)' as a data type

Sounds like you did not redefine the function cont_cat_split.
Just copy, paste, and run the code above before using the function.

@sylvaint Initially, I had redefined cont_cat_split in the script core.py, which did not help. I then copied the function from the script and pasted it into my own code to redefine it there, and it worked. Thanks.

Thanks, @sylvaint. This solved my issue.

Do I understand correctly that the fix (checking the dtype name for categories) is not upstream yet? Is there an open pull request?

Answering my own question: the issue is fixed; it is just that the current PyPI release is still 2.2.3: https://github.com/fastai/fastai/issues/3220

When installed from git things work nicely for me. Thanks!
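If you are unsure which fastai release your environment actually has, a quick check along these lines may help. It is a sketch using only the standard library (Python 3.8+); the helper name installed_version is made up.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Prints the installed fastai version, or None if fastai is not installed
print("fastai:", installed_version("fastai"))
```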