TabularDataBunch.from_df: Categorical cannot perform the operation median

I am trying to use TabularDataBunch.from_df for the London energy usage Kaggle problem. A sample row of the dataframe has the following fields:

LCLid MAC000002
energy_sum 7.098
stdorToU Std
Acorn_grouped Affluent
temperatureMax 11.53
windBearing 252
icon partly-cloudy-day
dewPoint 6.15
cloudCover 0.29
windSpeed 2.18
pressure 1004.92
apparentTemperatureHigh 11.53
precipType rain
visibility 12.94
humidity 0.84
apparentTemperatureLow 1.64
apparentTemperatureMax 11.53
uvIndex 2
temperatureLow 2.81
temperatureMin 6.41
temperatureHigh 11.53
summary Partly cloudy until evening.
apparentTemperatureMin 4.01
moonPhase 0.92
temperature_skewness -0.131944
temperature_kurtosis 1.82943
day_length 0.451424
day.of.week Friday
Type Normal
before_holiday 5
after_holiday 5
month 10
year 2012
temperatureMaxHour 14
temperatureMinHour 22
apparentTemperatureMinHour 22
apparentTemperatureHighHour 14
sunsetHour 17
uvIndexHour 12
sunriseHour 6
temperatureHighHour 14
temperatureLowHour 7
apparentTemperatureMaxHour 14
apparentTemperatureLowHour 6

I created cat_names:
cat_names = ['LCLid', 'Acorn', 'Acorn_grouped', "icon", "stdorToU", "Type", "day.of.week", 'precipType', 'summary',
'before_holiday', 'after_holiday', 'month', 'year']

dep_var = 'energy_sum'
cont_names = list(filter(lambda x: x not in cat_names, df.columns))

and when I create the data bunch:
data = TabularDataBunch.from_df(path, train_df, valid_df, dep_var,
tfms=[FillMissing, Categorify], cat_names=cat_names)

I obtain the following error:

TypeError Traceback (most recent call last)
in ()
1 data = TabularDataBunch.from_df(path, train_df, valid_df, dep_var,
----> 2 tfms=[FillMissing, Categorify], cat_names=cat_names)

~/anaconda3/envs/aind2/lib/python3.6/site-packages/fastai/tabular/ in from_df(cls, path, train_df, valid_df, dep_var, test_df, tfms, cat_names, cont_names, stats, log_output, **kwargs)
77 cat_names = ifnone(cat_names, [])
78 cont_names = ifnone(cont_names, list(set(train_df)-set(cat_names)-{dep_var}))
---> 79 train_ds = TabularDataset.from_dataframe(train_df, dep_var, tfms, cat_names, cont_names, stats, log_output)
80 valid_ds = TabularDataset.from_dataframe(valid_df, dep_var, train_ds.tfms, train_ds.cat_names,
81 train_ds.cont_names, train_ds.stats, log_output)

~/anaconda3/envs/aind2/lib/python3.6/site-packages/fastai/tabular/ in from_dataframe(cls, df, dep_var, tfms, cat_names, cont_names, stats, log_output)
61 else:
62 tfm = tfm(cat_names, cont_names)
---> 63 tfm(df)
64 tfms[i] = tfm
65 cat_names, cont_names = tfm.cat_names, tfm.cont_names

~/anaconda3/envs/aind2/lib/python3.6/site-packages/fastai/tabular/ in __call__(self, df, test)
13 "Apply the correct function to df depending on test."
14 func = self.apply_test if test else self.apply_train
---> 15 func(df)
17 def apply_train(self, df:DataFrame):

~/anaconda3/envs/aind2/lib/python3.6/site-packages/fastai/tabular/ in apply_train(self, df)
51 df[name+'_na'] = pd.isnull(df[name])
52 if name+'_na' not in self.cat_names: self.cat_names.append(name+'_na')
---> 53 if self.fill_strategy == FillStrategy.MEDIAN: filler = df[name].median()
54 elif self.fill_strategy == FillStrategy.CONSTANT: filler = self.fill_val
55 else: filler = df[name].dropna().value_counts().idxmax()

~/anaconda3/envs/aind2/lib/python3.6/site-packages/pandas/core/ in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
7313 skipna=skipna)
7314 return self._reduce(f, name, axis=axis, skipna=skipna,
-> 7315 numeric_only=numeric_only)
7317 return set_function_name(stat_func, name, cls)

~/anaconda3/envs/aind2/lib/python3.6/site-packages/pandas/core/ in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
2579 return delegate._reduce(op=op, name=name, axis=axis, skipna=skipna,
2580 numeric_only=numeric_only,
-> 2581 filter_type=filter_type, **kwds)
2583 def _reindex_indexer(self, new_index, indexer, copy):

~/anaconda3/envs/aind2/lib/python3.6/site-packages/pandas/core/ in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
1963 if func is None:
1964 msg = 'Categorical cannot perform the operation {op}'
-> 1965 raise TypeError(msg.format(op=name))
1966 return func(numeric_only=numeric_only, **kwds)

TypeError: Categorical cannot perform the operation median

I am not sure what causes this error.

I think you have one categorical variable flagged as continuous. When FillMissing tries to compute its median to fill the missing values, pandas throws this error.
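One way to check this is to list the columns that end up as continuous but do not have a numeric dtype. This is only a rough sketch using plain pandas: the helper name `non_numeric_columns` and the toy dataframe are made up for illustration, and I am assuming the continuous columns are whatever is left after removing cat_names and dep_var, as in the traceback.

```python
import pandas as pd

def non_numeric_columns(df, cont_names):
    """Return the names in cont_names whose dtype is not numeric."""
    # Hypothetical helper, not part of fastai or pandas.
    return [name for name in cont_names
            if not pd.api.types.is_numeric_dtype(df[name])]

# Toy data standing in for train_df (values are illustrative only).
train_df = pd.DataFrame({
    "energy_sum": [7.098, 5.2, 6.1],
    "temperatureMax": [11.53, None, 9.8],
    "icon": pd.Categorical(["partly-cloudy-day", "wind", "fog"]),
    "precipType": pd.Categorical(["rain", None, "snow"]),
})

dep_var = "energy_sum"
cat_names = ["icon"]  # "precipType" left out on purpose

# Everything that is neither categorical nor the target is treated as continuous.
cont_names = [c for c in train_df.columns
              if c not in cat_names and c != dep_var]

suspects = non_numeric_columns(train_df, cont_names)
print(suspects)
```

Any column this prints would need to go into cat_names (or be converted to a numeric dtype) before a median can be computed for it.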


Thanks. It was strange, but the next time I tried it, it worked.