Why not merge train and test before applying train_cats and proc_df?



In apply_cats, we pass trn so that the same words get the same categories in test as they did in train. Similarly, proc_df takes nas so that the columns added for missing values in train can also be added in test. (See How to use proc_df on a test set?)
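To make the "same categories for the same words" point concrete, here is a minimal sketch in plain pandas rather than the fastai helpers. The `color` column and its values are made up for illustration, and this is only an approximation of what passing trn achieves, not the library's actual code.

```python
import pandas as pd

# Hypothetical toy data; column name and values are assumptions.
train = pd.DataFrame({"color": ["red", "blue", "red", "green"]})
test = pd.DataFrame({"color": ["green", "red", "purple"]})

# Fix the category set on the training data.
train["color"] = train["color"].astype("category")

# Reuse the training categories on the test set so each word keeps its code.
test["color"] = pd.Categorical(
    test["color"], categories=train["color"].cat.categories
)

train_codes = train["color"].cat.codes.tolist()  # [2, 0, 2, 1]
test_codes = test["color"].cat.codes.tolist()    # [1, 2, -1]
```

In this sketch "purple" never appears in train, so it gets code -1 in test.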

What about the following approach instead? Merge the train and test sets, apply train_cats and proc_df to the merged frame, and then split it back into train and test. This way we don’t have to pass arguments back and forth.

Is there any issue with this approach?
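For reference, the merge-then-split idea could look roughly like the sketch below. It uses plain pandas as a stand-in for train_cats and proc_df, so the category encoding and median fill here are assumptions about what those helpers do, and the `color` and `size` columns are invented for the example.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions.
train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, None, 3.0]})
test = pd.DataFrame({"color": ["green", "red"], "size": [2.0, 4.0]})

# Stack both sets, remembering where the training rows end.
n_train = len(train)
merged = pd.concat([train, test], ignore_index=True)

# Stand-in for train_cats: one shared category set over all rows.
merged["color"] = merged["color"].astype("category").cat.codes

# Stand-in for proc_df's missing-value handling: flag, then fill with the median.
merged["size_na"] = merged["size"].isna()
merged["size"] = merged["size"].fillna(merged["size"].median())

# Split back into the original two sets.
train_proc = merged.iloc[:n_train].copy()
test_proc = merged.iloc[n_train:].reset_index(drop=True)
```

Note that in this sketch both the category set and the fill median (2.5 here) are computed over the train and test rows together.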