Can't convert np.ndarray of type numpy.object_. while doing inference with my tabular learner

Pumsa · June 7, 2021, 11:52am

Hi all!

I am getting this error while doing inference with my tabular learner using the predict function.
At first I thought it could be a conflict between libraries, but after installing older versions I’m still getting the same result. My pandas version is 1.1.0 and my fastai version is 2.3.1. I then ran the same code in one of the Colab notebooks and got the same error, which makes me think the problem is in the way I manage my dataframe. Here are the dtypes (not all the variables are used in the model):
Date datetime64[ns]
pmc int8
growth float32
credit float32
revenue_last_m float32
revenue_last_2_m float32
revenue_last_3_m float32
credit_last_year float32
Year int16
Month int8
Week int8
Day int8
Dayofweek int8
Dayofyear int16
Is_month_end bool
Is_month_start bool
Is_quarter_end bool
Is_quarter_start bool
Is_year_end bool
Is_year_start bool
Elapsed float32
WeekOfMonth int8
is_holiday bool
next_holiday int8
previous_holiday int8
dtype: object

Can anyone help me with this? I’m at a loss and don’t really know what else to try.
Thanks in advance!

bsalita · August 27, 2021, 9:09am

There seems to be a bug in fastai.tabular.all.TabularDataLoaders.from_df() where bool is seen as an object instead of being accepted as-is. Try converting all bool columns to 'uint8'. The bug still exists in fastai 2.5.2 and pytorch 3.9.

import pandas as pd

# Workaround for the fastai/pytorch bug where bool columns are treated as object and raise an error.
for n in df:
    if pd.api.types.is_bool_dtype(df[n]):
        df[n] = df[n].astype('uint8')  # cast True/False to 1/0
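
The same cast can also be done without an explicit loop. The following is only a minimal sketch on a toy dataframe; the column names and values are made up for illustration, and it assumes nothing about fastai beyond what is said above.

```python
import pandas as pd

# Toy dataframe; the column names and values are illustrative only.
df = pd.DataFrame({
    "credit": [1.5, 2.0, 3.25],
    "is_holiday": [True, False, True],
    "Is_month_end": [False, False, True],
})

# Find every bool column, then cast them all to uint8 in one call.
bool_cols = df.select_dtypes(include="bool").columns
df = df.astype({col: "uint8" for col in bool_cols})

# The bool columns are now uint8; the other columns are untouched.
print(df.dtypes)
```

Running df.dtypes again before building the dataloaders is a quick way to confirm that no bool columns are left.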

Pumsa · August 30, 2021, 10:51am

Hi Robert,

You hit the mark! That was actually the issue. Removing the variables from the dataframe if they are not used in the model, or casting them to another type as you suggested, solves the problem.

Huge thanks!

drscotthawley · November 6, 2021, 5:29pm

Hmm… I’m getting this error despite removing any unused columns, and not having any bool columns, and even running the above type conversion code.

Specifically, the problematic line is this one from the tutorial (run with my own data):

row, clas, probs = learn.predict(df.iloc[0])

(where df is the dataframe after I removed unused columns and did the conversion; it is the same dataframe used to define the dls and the learner.)

I don’t understand why this error is occurring, given that when I run df.dtypes I get a listing of things like int64, float64 and object, the same as when I run df.dtypes in the original tabular data tutorial.

Mine:

>>> df.dtypes 
genre                object
popularity            int64
acousticness        float64
danceability        float64
duration_ms           int64
energy              float64
instrumentalness    float64
key                  object
liveness            float64
loudness            float64
mode                 object
speechiness         float64
tempo               float64
beats_per_bar         int64
valence             float64
dtype: object

Original:

>>> df.dtypes
age                 int64
workclass          object
fnlwgt              int64
education          object
education-num     float64
marital-status     object
occupation         object
relationship       object
race               object
sex                object
capital-gain        int64
capital-loss        int64
hours-per-week      int64
native-country     object
salary             object
dtype: object

My colab link: Google Colab

Any other suggestions?

erwald · July 8, 2022, 9:43am

@drscotthawley I’m running into the exact same issue – did you ever figure out a way to solve it?

erwald · July 9, 2022, 7:25am

So eventually I figured out what it was. I was giving learn.predict() one row (as a Pandas Series) from my test df. A Series extracted from a DataFrame doesn’t keep per-column type information (unless all columns have the same type); it just has the object dtype. Normally this is fine since fastai converts categorical and continuous columns to the right types, but in my case my row also included the y column. fastai didn’t know how to convert it, which produced the above error. The solution was to drop it from the series.
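
For anyone who wants to see this on a small example, here is a minimal sketch using plain pandas. The dataframe and the choice of "popularity" as the y column are made up for illustration, and the learn.predict() call is left as a comment because it needs a trained learner.

```python
import pandas as pd

# Toy dataframe; names and values are illustrative only.
df = pd.DataFrame({
    "genre": ["rock", "jazz"],   # categorical column
    "tempo": [120.0, 98.5],      # continuous column
    "popularity": [55, 71],      # hypothetical y (target) column
})

# A single row taken from a mixed-type dataframe is a Series,
# and a Series has one dtype for all its values: object here.
row = df.iloc[0]
print(row.dtype)

# Drop the y column before passing the row to predict.
y_name = "popularity"  # assumption: replace with your own target name
row_no_y = row.drop(labels=[y_name])
print(list(row_no_y.index))

# With a trained learner, the call would then be:
# row_out, clas, probs = learn.predict(row_no_y)
```

Checking row.dtype and row.index right before calling predict makes it easy to spot a leftover target column.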