Hi,
I'm using FillMissing with the median fill strategy.
Using databunch.show_batch(), I've noticed that I still get some NaN values
(in continuous columns) after building my databunch.
Here's my code:
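from fastai.tabular import *  # fastai v1: FillMissing, Categorify, Normalize, TabularDataBunch

# FillMissing uses the median fill strategy by default
procs = [FillMissing, Categorify, Normalize]
databunch = TabularDataBunch.from_df(
    path=model_path,
    df=train_and_dev,
    valid_idx=valid_idx,
    cat_names=categorical_columns,
    cont_names=cont_columns,
    dep_var='outcome',
    procs=procs,
    num_workers=0)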
My question is:
Is the median calculated on the whole DataFrame, or only on the training part of the df (excluding valid_idx)?
I'm asking because I might have some columns that are NaN throughout the training part but have actual values in the validation part.
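To make the scenario concrete, here is a minimal pandas-only sketch of what I suspect is happening. It does not call fastai at all; it assumes (unconfirmed) that the median is taken from the training rows only, and the toy column "x" and the split indices are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): "x" is NaN in every training row,
# but has some values in the validation rows.
df = pd.DataFrame({"x": [np.nan, np.nan, np.nan, 1.0, np.nan, 3.0]})
valid_idx = [3, 4, 5]                       # made-up validation split
train_idx = df.index.difference(valid_idx)  # remaining rows are the training part

# Median from the training rows only: there are no non-missing values, so it is NaN.
train_median = df.loc[train_idx, "x"].median()

# Median from the whole frame: the validation values 1.0 and 3.0 are used.
full_median = df["x"].median()

# Filling with a NaN median changes nothing, so the NaNs survive.
filled_train_only = df["x"].fillna(train_median)
filled_full = df["x"].fillna(full_median)

print("NaNs left (train-only median):", filled_train_only.isna().sum())
print("NaNs left (whole-frame median):", filled_full.isna().sum())
```

If the median really is computed on the training part only, a column like this would keep its NaN values after filling, which would match what I see in show_batch().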
Thanks!