I’m currently trying to replicate the ML1 course with the v1 API to familiarize myself with the new version. Lesson 1 demonstrates how to change the order of the categories in a categorical variable.
I’m not 100% sure, but I don’t see a way to modify individual columns in the new TabularDataBunch object. Do I have to do this manually up front?
At the moment I simply identify all non-numeric columns and pass them to the TabularDataBunch as the categorical columns.
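For illustration, a minimal sketch of that column split using plain pandas. The toy DataFrame and the `cat_names` / `cont_names` variable names are assumptions made for the example, not part of the actual dataset or of the fastai API:

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
df = pd.DataFrame({
    'UsageBand': ['High', 'Low', 'Medium', None],
    'YearMade': [2004, 1996, 2001, 2007],
    'state': ['Alabama', 'Texas', 'Ohio', 'Texas'],
})

# Treat every non-numeric column as categorical and the rest as continuous.
cat_names = df.select_dtypes(exclude='number').columns.tolist()
cont_names = df.select_dtypes(include='number').columns.tolist()
```

This only looks at dtypes, so integer-coded categories (IDs, for example) would still end up in `cont_names` and need to be moved by hand.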
I came up with this, but it feels clunky. How would I pass an optional cat_name dictionary to my Transform class?
import pandas as pd
from pandas import DataFrame
# Assumption: import path for the fastai v1 release in use at the time of writing.
from fastai.tabular.transform import TabularTransform

class CustomOrder:
    "Information for optional custom categorical ordering"
    cat_order = {'UsageBand': ['High', 'Medium', 'Low']}

class CategorifyWithCustomOrder(TabularTransform, CustomOrder):
    "Transform the categorical variables to that type."
    def apply_train(self, df:DataFrame):
        self.categories = {}
        for n in self.cat_names:
            df[n] = df[n].astype('category').cat.as_ordered()
            if n in CustomOrder.cat_order:
                # Reassign the result; newer pandas versions no longer accept inplace=True here.
                df[n] = df[n].cat.set_categories(CustomOrder.cat_order[n], ordered=True)
            self.categories[n] = df[n].cat.categories

    def apply_test(self, df:DataFrame):
        # Reuse the categories (and their order) learned on the training set.
        for n in self.cat_names:
            df[n] = pd.Categorical(df[n], categories=self.categories[n], ordered=True)
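One possible way to pass the ordering in as an optional argument instead of a mixin class is to take it in the constructor. The sketch below is pandas-only so it runs on its own: `OrderedCategorify` and its `cat_order` parameter are hypothetical names, and it deliberately does not subclass fastai's `TabularTransform`, so wiring it into a TabularDataBunch is left open.

```python
import pandas as pd

class OrderedCategorify:
    "Hypothetical stand-alone sketch: categorify with an optional per-column order."
    def __init__(self, cat_names, cat_order=None):
        self.cat_names = cat_names
        # Optional dict mapping column name -> list of categories in the desired order.
        self.cat_order = cat_order or {}
        self.categories = {}

    def apply_train(self, df):
        for n in self.cat_names:
            col = df[n].astype('category').cat.as_ordered()
            if n in self.cat_order:
                # Override the default (sorted) order for this column.
                col = col.cat.set_categories(self.cat_order[n], ordered=True)
            df[n] = col
            self.categories[n] = col.cat.categories

    def apply_test(self, df):
        # Apply the categories learned on the training set; unseen values become NaN.
        for n in self.cat_names:
            df[n] = pd.Categorical(df[n], categories=self.categories[n], ordered=True)

# Toy data, made up for the example.
train = pd.DataFrame({'UsageBand': ['Low', 'High', 'Medium'],
                      'state': ['Texas', 'Ohio', 'Texas']})
test = pd.DataFrame({'UsageBand': ['Medium', 'Low'],
                     'state': ['Ohio', 'Utah']})

tfm = OrderedCategorify(['UsageBand', 'state'],
                        cat_order={'UsageBand': ['High', 'Medium', 'Low']})
tfm.apply_train(train)
tfm.apply_test(test)
```

Columns without an entry in `cat_order` keep pandas' default sorted order, so the dictionary only needs to list the columns where the order matters.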