I am having issues with a dataset for collaborative learning having chars instead of floats.
procs_nn = [Categorify, FillMissing, Normalize]
df_nn = pd.read_csv(path/'20220315_Dataset_WindowShadingMLControlApp.csv', low_memory=False)
df_nn_final = df_nn[list(xs_final) + ['surfaceShadingDeviceIsOnTimeFraction_SouthFacingWindow_bool']]
cont_nn, cat_nn = cont_cat_split(df_nn_final, max_card=5, dep_var='surfaceShadingDeviceIsOnTimeFraction_SouthFacingWindow_bool')
to_nn = TabularPandas(df_nn_final, procs_nn, cat_nn, cont_nn, splits=splits, y_names="surfaceShadingDeviceIsOnTimeFraction_SouthFacingWindow_bool")
dls = to_nn.dataloaders(1024)
learn = tabular_learner(dls, y_range = (0, 1), layers = [500,250], n_out = 1, loss_func = F.mse_loss)
learn.lr_find()
The lr_find() utility throws the following error:
RuntimeError: Found dtype Char but expected Float
I am wondering if maybe the procs_nn aren't working. Does anyone else have any ideas?
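One check that might narrow this down: PyTorch's Char dtype is int8, so the error may mean the target column reaches the loss as 8-bit integers while F.mse_loss expects floats. That is an assumption, not something the error message confirms. A minimal sketch on a toy frame (the column name target_bool and its values are hypothetical) that inspects the dtype and casts it to float32, as one might do before handing the data to TabularPandas:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; `target_bool` is a hypothetical column name.
toy = pd.DataFrame({
    "indoorTemperature_degC": [21.5, 22.0, 23.1],
    "target_bool": np.array([0, 1, 1], dtype=np.int8),
})

# int8 in pandas/NumPy is the type PyTorch reports as Char.
dtype_before = toy["target_bool"].dtype

# Cast the target to float32 so a regression loss would receive floats.
toy["target_bool"] = toy["target_bool"].astype(np.float32)
dtype_after = toy["target_bool"].dtype

print(dtype_before, "->", dtype_after)
```

If the real target column turns out to be int8 or bool, the same astype call on df_nn_final before building to_nn would be the thing to try.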
The following indicates that there are no categorical variables:
df_nn_final[cat_nn].nunique()
Series([], dtype: float64)
And this indicates that there are only eight continuous variables, even though there are nine columns in the set:
df_nn_final[cont_nn].nunique()
sunAngle_Azimuth_deg 52560
sunIntensity_Diffuse_WperSqm 2329
districtCooling_J 9586
sunAngle_Hour_deg 52560
indoorTemperature_degC 52560
sunIntensity_Direct_WPerSqM 4948
indoorTemperatureGrad_degCPerMin 52035
districtHeating_J 37070
dtype: int64
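The ninth column may simply be the dependent variable, since it is passed as dep_var and so would not be listed among the features; that is a guess worth verifying rather than a confirmed explanation. A minimal sketch of the check, using a toy frame with hypothetical column names (feat_a, feat_b, target_bool) in place of df_nn_final, cont_nn and cat_nn:

```python
import pandas as pd

# Toy stand-in for df_nn_final; all column names here are hypothetical.
toy = pd.DataFrame({
    "feat_a": [1.0, 2.0],
    "feat_b": [3.0, 4.0],
    "target_bool": [0, 1],
})

# Stand-ins for the lists returned by cont_cat_split.
cont_names = ["feat_a", "feat_b"]
cat_names = []

# Columns that appear in the frame but in neither feature list.
listed = set(cont_names) | set(cat_names)
unaccounted = [c for c in toy.columns if c not in listed]

print(unaccounted)
```

Running the same comparison with df_nn_final.columns, cont_nn and cat_nn would show which column is left out.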
-Simon