In proc_df(), why are the NA values replaced by the median value of the column?


(odysseus.kaziolas) #1

I was looking at the source code of proc_df(), which says that calling this function replaces the NA values with the median value of the column.
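To make the behaviour concrete, here is a minimal pandas sketch of what I understand the step to be. It is not the actual proc_df() code: the helper name `fill_na_with_median` and the toy data are made up for illustration, and I am assuming that only numeric columns are affected.

```python
import numpy as np
import pandas as pd


def fill_na_with_median(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill NaNs in numeric columns with the column median."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        # Assumption: non-numeric columns are skipped and keep their missing values.
        if pd.api.types.is_numeric_dtype(out[col]) and out[col].isna().any():
            # Series.median() ignores NaN by default (skipna=True).
            out[col] = out[col].fillna(out[col].median())
    return out


# Toy data: one numeric column with a missing value and an outlier (100).
df = pd.DataFrame(
    {
        "age": [20.0, np.nan, 40.0, 100.0],
        "city": ["a", "b", None, "d"],
    }
)
filled = fill_na_with_median(df)
print(filled)
```

In this toy example the missing age is filled with the median of the observed values (20, 40, 100), while the mean of those same values would be pulled upward by the 100.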

I am wondering what the rationale behind this step is. From a theoretical ML standpoint, what are the advantages of replacing NAs with the median value?