Hello there. While working on the Tabular Playground Series - May 2022 competition on Kaggle, I noticed something. I created a TabularPandas object for the training dataset like this:
to=TabularPandas(df_train, procs=[Categorify, Normalize], cat_names=cat,
cont_names=cont, y_names=dep_var, y_block=CategoryBlock(),
splits=RandomSplitter()(range_of(df_train)))
After training a decision tree model, I wanted to make predictions on the test set, so I created a TabularPandas object for the test dataset like this:
to_test=TabularPandas(df_test, procs=[Categorify, Normalize], cat_names=cat,
cont_names=cont)
tst_xs=to_test.train.xs
As you can see, these are two separate TabularPandas objects, so Normalize is fitted on each one independently and they end up with different means and standard deviations. On top of that, the model's performance drops sharply on the test set.
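To make the problem concrete, here is a minimal sketch of what I think is going on. It is plain Python rather than fastai code, the helper names `fit_stats` and `normalize` are my own, and the numbers are made up purely for illustration:

```python
from statistics import mean, pstdev

def fit_stats(values):
    """Return (mean, std) computed from one column of values."""
    return mean(values), pstdev(values)

def normalize(values, mu, sigma):
    """Scale values using the given mean and standard deviation."""
    return [(v - mu) / sigma for v in values]

# Toy columns, invented for illustration only.
train_col = [1.0, 2.0, 3.0, 4.0, 5.0]
test_col = [3.0, 4.0, 5.0, 6.0, 7.0]

train_mu, train_sigma = fit_stats(train_col)
train_norm = normalize(train_col, train_mu, train_sigma)

# What my code effectively does: the test column gets its own statistics.
test_own = normalize(test_col, *fit_stats(test_col))

# What the model was trained to expect: test data scaled with training statistics.
test_shared = normalize(test_col, train_mu, train_sigma)

# The raw value 3.0 appears in both columns but is encoded differently.
print("train encoding of 3.0:        ", train_norm[2])
print("test encoding (own stats):    ", test_own[0])
print("test encoding (train stats):  ", test_shared[0])
```

With separately fitted statistics, the same raw value ends up at a different normalized value than the one the model saw during training. If I read the fastai docs correctly, the usual way to avoid this is to build the test object from the training one, for example with `to.new(df_test)` followed by `process()`, or with `dls.test_dl(df_test)`, instead of calling TabularPandas on df_test again. I have not confirmed that this fixes my notebook.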
For more detail, I have made my notebook public: TabularPS May 2022 | Kaggle
P.S. I should also mention that I am not an expert practitioner and I am new here, so any resources and links for solving this problem are appreciated.
P.S. 2: Sorry for the title