I am working on a
TabularPandas problem, and I am defining my data as follows:
data = TabularPandas(
    df,
    [Categorify, FillMissing],
    categorical_variables,
    continuous_variables,
    y_names=dependent_variable,
    splits=split_indices,
)
which results in a CrossEntropy loss function when instantiating a
tabular_learner, and works great.
At a later point I am resetting the data split (I am using ensembles). I don’t know if there is an easy way to do this, but what I am trying is first stitching my data back together into a single DataFrame:
data_df = pd.concat([data.train.xs, data.valid.xs])
data_df[data.y_names] = pd.concat([data.train.y, data.valid.y])
and then redeclaring a
TabularPandas object, exactly like before!
data = TabularPandas(
    data_df,
    [Categorify, FillMissing],
    data.cat_names,
    data.cont_names,
    y_names=data.y_names,
    splits=splits,
)
where I pass new lists of indices for splits.
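In case it helps to see what I mean by new split indices, here is a minimal plain-Python sketch of how I build them (a stand-in for fastai's RandomSplitter; the function name and defaults are my own, not fastai API):

```python
import random

def random_splits(n, valid_pct=0.2, seed=None):
    """Return (train_idx, valid_idx) index lists for a frame of n rows.

    Hypothetical helper used only to illustrate how the `splits`
    argument is built: shuffle all row positions, then cut off the
    last `valid_pct` fraction as the validation set.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * (1 - valid_pct))
    return idx[:cut], idx[cut:]

# e.g. splits = random_splits(len(data_df), valid_pct=0.2, seed=42)
```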
The DataFrames seem correct (
data.train.xs.head() gives the same result before and after, as long as I reuse the original split indices). But now my loss is
MSELoss, which is clearly not what I need. I guess I can manually fix my loss, but I need to understand why this is changing, because it may indicate other problems I am overlooking.
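One thing I noticed while poking at this (a pandas-only illustration, no fastai involved): a Categorify-style encoding stores the target as integer codes, so the rebuilt data_df carries a numeric target column rather than the original labels, and any dtype-based inference would then see it as continuous. I suspect this is related, but I am not sure:

```python
import pandas as pd

# Hypothetical toy target column, before and after category encoding.
raw = pd.DataFrame({"label": ["cat", "dog", "cat", "bird"]})
codes = raw["label"].astype("category").cat.codes  # integer codes, like a processed y

print(raw["label"].dtype)  # object -> looks categorical
print(codes.dtype)         # int8   -> looks continuous to a dtype-based check
```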
I double checked that
data.train.y and data.valid.y are both