Hello there. While working on the Tabular Playground Series - May 2022 competition on Kaggle, I noticed something. I created a TabularPandas object for the training dataset like this:
to=TabularPandas(df_train, procs=[Categorify, Normalize], cat_names=cat, cont_names=cont, y_names=dep_var, y_block=CategoryBlock(), splits=RandomSplitter()(range_of(df_train)))
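For context on the training object: `RandomSplitter()` holds out a random subset of the rows (20% by default) as a validation set. As far as I understand fastai's implementation, `Normalize` and `Categorify` then compute their statistics from the resulting training split only. The splitting idea can be sketched in plain Python as follows. `random_split` is a hypothetical helper written for illustration, not fastai's actual code.

```python
import random


def random_split(n_rows, valid_pct=0.2, seed=None):
    """Shuffle row indices and hold out `valid_pct` of them for validation.

    Hypothetical helper: a rough illustration of the idea behind
    fastai's RandomSplitter, not its real implementation.
    """
    rng = random.Random(seed)
    idxs = list(range(n_rows))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_rows)
    # The first `cut` shuffled indices go to validation, the rest to training.
    return idxs[cut:], idxs[:cut]


train_idx, valid_idx = random_split(10, seed=42)
print(len(train_idx), len(valid_idx))  # 8 2
```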
After training a decision tree model, I wanted to make predictions on the test set, so I created a separate TabularPandas object for the test dataset like this:
to_test=TabularPandas(df_test, procs=[Categorify, Normalize], cat_names=cat, cont_names=cont)
tst_xs=to_test.train.xs
As you may notice, these are two separate TabularPandas objects, each with its own means and standard deviations, and the model's performance drops considerably as a result.
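To illustrate the problem, here is a minimal pure-Python sketch (not fastai code) using a hypothetical `TrainFittedPreprocessor` class with one continuous and one categorical column. It shows that fitting the preprocessing separately on the test set changes both the normalized values and the category codes (the latter is what a second `Categorify` would do). Fitting once on the training data and reusing those statistics keeps the encodings consistent. In fastai itself, I believe the equivalent is to build the test object from the training one with `to_test = to.new(df_test)` followed by `to_test.process()`, which reuses the procs already set up on `to`. Please check the fastai tabular documentation before relying on this.

```python
from statistics import mean, pstdev


class TrainFittedPreprocessor:
    """Hypothetical helper: learns normalization stats and category codes
    from one dataset, then applies them unchanged to any other data."""

    def fit(self, conts, cats):
        # conts: list of floats, cats: list of strings (one column each, for brevity).
        self.mean = mean(conts)
        self.std = pstdev(conts) + 1e-7  # population std plus a small epsilon
        # Code 0 is reserved for categories never seen during fit (an assumption of this sketch).
        self.vocab = {c: i + 1 for i, c in enumerate(sorted(set(cats)))}
        return self

    def transform(self, conts, cats):
        normed = [(x - self.mean) / self.std for x in conts]
        codes = [self.vocab.get(c, 0) for c in cats]
        return normed, codes


train_conts, train_cats = [1.0, 2.0, 3.0, 4.0], ["a", "b", "c", "c"]
test_conts, test_cats = [3.0, 4.0], ["c", "b"]

# Problematic: a second preprocessor fitted on the test set itself,
# which is effectively what two independent TabularPandas objects do.
wrong = TrainFittedPreprocessor().fit(test_conts, test_cats).transform(test_conts, test_cats)

# Consistent: fit on the training data once, then reuse it for the test data.
prep = TrainFittedPreprocessor().fit(train_conts, train_cats)
right = prep.transform(test_conts, test_cats)

print([round(v, 3) for v in wrong[0]], wrong[1])  # [-1.0, 1.0] [2, 1]
print([round(v, 3) for v in right[0]], right[1])  # [0.447, 1.342] [3, 2]
```

The raw value 3.0 and the category "c" end up encoded differently depending on which data the statistics came from, so a model trained on the training encodings sees shifted inputs at prediction time.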
For more details, I have made my notebook public: TabularPS May 2022 | Kaggle
P.S. I should also mention that I am not an expert practitioner and I am new here, so any resources and links for solving this problem are appreciated.
P.S. 2: Sorry for the title