Tabular Data Type Error

I am having issues with a tabular dataset: the learner finds Char values where it expects Float.

from fastai.tabular.all import *

procs_nn = [Categorify, FillMissing, Normalize]

df_nn = pd.read_csv(path/'20220315_Dataset_WindowShadingMLControlApp.csv', low_memory=False)

df_nn_final = df_nn[list(xs_final) + ['surfaceShadingDeviceIsOnTimeFraction_SouthFacingWindow_bool']]

cont_nn, cat_nn = cont_cat_split(df_nn_final, max_card=5, dep_var='surfaceShadingDeviceIsOnTimeFraction_SouthFacingWindow_bool')

to_nn = TabularPandas(df_nn_final, procs_nn, cat_nn, cont_nn, splits=splits, y_names="surfaceShadingDeviceIsOnTimeFraction_SouthFacingWindow_bool")

dls = to_nn.dataloaders(1024)

learn = tabular_learner(dls, y_range = (0, 1), layers = [500,250], n_out = 1, loss_func = F.mse_loss)

learn.lr_find()

The lr_find() utility throws the following error:

RuntimeError: Found dtype Char but expected Float

I am wondering if maybe the procs_nn aren’t working. Does anyone else have any ideas?

The following indicates that there are no categorical variables:

df_nn_final[cat_nn].nunique()
Series([], dtype: float64)

And this shows only eight continuous variables, even though there are nine columns in the set.

df_nn_final[cont_nn].nunique()
sunAngle_Azimuth_deg                52560
sunIntensity_Diffuse_WperSqm         2329
districtCooling_J                    9586
sunAngle_Hour_deg                   52560
indoorTemperature_degC              52560
sunIntensity_Direct_WPerSqM          4948
indoorTemperatureGrad_degCPerMin    52035
districtHeating_J                   37070
dtype: int64
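
One possible explanation for the count, offered as an assumption rather than a confirmed diagnosis: cont_cat_split skips whatever column is passed as dep_var, so if xs_final holds eight features, the ninth column is simply the target. The sketch below mimics that exclusion with plain pandas on a made-up frame (the column names, values, and the int8 target are all hypothetical) and prints the target's storage dtype, which is worth checking on the real data.

```python
import pandas as pd

dep_var = "target_bool"  # hypothetical stand-in for the real dependent variable

# Made-up frame: two features plus a target stored as an 8-bit integer.
df = pd.DataFrame({
    "feature_a": [0.1, 0.2, 0.3],
    "feature_b": [1.0, 2.0, 3.0],
    dep_var: pd.Series([0, 1, 1], dtype="int8"),
})

# Mimic the dep_var exclusion: every column except the target is a candidate feature.
feature_cols = [c for c in df.columns if c != dep_var]
left_out = [c for c in df.columns if c not in feature_cols]

print(left_out)           # the column that never appears in the cont/cat lists
print(df[dep_var].dtype)  # storage dtype of the target column
```

On the real frame, the equivalent check would be df_nn_final.dtypes, which lists the dtype of every column including the target.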

-Simon

Hi Simon,
Did you manage to fix this issue? I am facing the same error. I think it happens because to_nn.train.y is int8, which PyTorch calls Char, and dls.show_batch() shows the target with that same type and no categorical encoding. So the targets coming out of our dls have a different dtype than the Float output of the PyTorch model in our learner.

But I am not sure how to fix this. Can anyone help? Thanks in advance!
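
A fix that is commonly suggested for this error, sketched here under the assumption that the target column really is stored as an 8-bit integer: cast it to float32 before building TabularPandas, so the target has the same dtype as the float predictions that F.mse_loss compares it against. The frame below is a made-up stand-in for df_nn_final with hypothetical values; only the astype call is the point.

```python
import numpy as np
import pandas as pd

dep_var = "surfaceShadingDeviceIsOnTimeFraction_SouthFacingWindow_bool"

# Hypothetical stand-in for df_nn_final: one feature plus an int8 target.
df_nn_final = pd.DataFrame({
    "indoorTemperature_degC": [21.5, 22.0, 23.1],
    dep_var: pd.Series([0, 1, 1], dtype="int8"),
})

# Work on a copy, since the real df_nn_final is a column selection of df_nn.
df_nn_final = df_nn_final.copy()

# Cast the regression target to 32-bit float.
df_nn_final[dep_var] = df_nn_final[dep_var].astype(np.float32)

print(df_nn_final[dep_var].dtype)  # float32
```

After the cast, the TabularPandas, dataloaders, and tabular_learner lines from the first post can be rerun unchanged to see whether lr_find() still raises the error.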