Does the proc_df subset overlap with the validation data?


#1

As mentioned in the videos, I see that proc_df with the subset argument returns a random sample of rows from df_raw:
df_trn, y_trn, nas = proc_df(df_raw, 'SalePrice', subset=30000, na_dict=nas)
Wouldn't this training data then overlap with the validation data?


#2

I think we should build a pipeline that keeps the training and validation data from mixing.
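
Here is a minimal sketch of one way such a pipeline could look, using plain pandas and NumPy rather than proc_df itself. The helper name `split_then_sample`, its parameters, and the toy DataFrame are all made up for illustration, and it assumes the validation set is simply the last `n_valid` rows. The idea is to hold out the validation rows first and only then draw the random subset from what is left, so the two can never share a row.

```python
import numpy as np
import pandas as pd

def split_then_sample(df, n_valid, n_subset, seed=42):
    """Hypothetical helper: hold out the last n_valid rows as validation,
    then draw n_subset random rows from the remaining rows only."""
    n_trn = len(df) - n_valid
    train_part = df.iloc[:n_trn]   # rows eligible for training
    valid_part = df.iloc[n_trn:]   # held-out validation rows
    rng = np.random.default_rng(seed)
    # Sample positions without replacement; sort to keep the original row order.
    idxs = np.sort(rng.choice(n_trn, size=n_subset, replace=False))
    return train_part.iloc[idxs], valid_part

# Toy data standing in for df_raw (assumption, not the real dataset).
df_demo = pd.DataFrame({"SalePrice": np.arange(100), "feature": np.arange(100) * 2})

train_subset, valid = split_then_sample(df_demo, n_valid=20, n_subset=30)

# Rows shared by the two sets; should be empty by construction.
overlap = set(train_subset.index) & set(valid.index)
print(len(train_subset), len(valid), len(overlap))
```

The sampled training frame could then be handed to whatever preprocessing step you use, with the validation frame processed separately using the same na_dict.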