I am working on a TabularPandas problem, and I am defining my data as follows:
data = TabularPandas(
    df,
    [Categorify, FillMissing],
    categorical_variables,
    continuous_variables,
    y_names=dependent_variable,
    splits=split_indices,
)
which results in a CrossEntropy loss function when instantiating a tabular_learner, and works great.
At a later point I reset the data split (I am using ensembles). I don’t know if there is an easy way to do this, but what I am trying is to first stitch my data back into a single DataFrame:
data_df = pd.concat([data.train.xs, data.valid.xs])
data_df[data.y_names[0]] = pd.concat([data.train.y, data.valid.y])
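For reference, here is a minimal, self-contained sketch of that stitching step on toy data (the column names and the tiny frames are hypothetical stand-ins for the real train/valid parts of the TabularPandas object; pandas aligns the target column by index, which is why the valid rows keep their original indices):

```python
import pandas as pd

# Hypothetical stand-ins for data.train.xs / data.valid.xs / the y Series.
train_xs = pd.DataFrame({"a": [1, 2], "b": [0.1, 0.2]})
valid_xs = pd.DataFrame({"a": [3], "b": [0.3]}, index=[2])
train_y = pd.Series([0, 1], dtype="int8")
valid_y = pd.Series([1], dtype="int8", index=[2])

# Stitch everything back into a single DataFrame, as in the question.
data_df = pd.concat([train_xs, valid_xs])
data_df["target"] = pd.concat([train_y, valid_y])

print(data_df.shape)  # (3, 3)
print(data_df["target"].dtype)  # int8
```

Note that the concatenated target keeps its int8 dtype, matching what I observe on the real data.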
and then redeclare a TabularPandas object, exactly like before:
data = TabularPandas(
    data_df,
    [Categorify, FillMissing],
    data.cat_names,
    data.cont_names,
    y_names=data.y_names[0],
    splits=splits,
)
where I pass new lists of indices for splits.
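The way I draw those fresh index lists for each ensemble member is roughly this (a stdlib-only sketch; the 80/20 cut and the row count are illustrative, not what I actually use):

```python
import random

# Hypothetical sketch: draw a fresh train/valid split for one ensemble member
# by shuffling row positions and cutting at 80%.
n = 100  # number of rows in data_df
idxs = list(range(n))
random.seed(0)  # vary the seed per ensemble member
random.shuffle(idxs)
cut = int(0.8 * n)
splits = (idxs[:cut], idxs[cut:])

print(len(splits[0]), len(splits[1]))  # 80 20
```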
The DataFrames seem correct (data.train.xs.head() gives the same result before and after, provided I don’t draw new indices for the split). But now my loss is MSELoss, which is clearly not what I need. I could fix the loss manually, but I need to understand why it changes, because it may indicate other problems I am overlooking.
I double-checked that data.train.y and data.valid.y are both dtype: int8.
Big thanks!