I have a question about filling in NA values in the Rossmann notebook.
The notebook states:
"many models have problems when missing values are present, so it’s always important to think about how to deal with them. In these cases, we are picking an arbitrary signal value that doesn’t otherwise appear in the data."
And the code following that looks like:
joined.CompetitionOpenSinceYear = joined.CompetitionOpenSinceYear.fillna(1900).astype(np.int32)
joined.CompetitionOpenSinceMonth = joined.CompetitionOpenSinceMonth.fillna(1).astype(np.int32)
joined.Promo2SinceYear = joined.Promo2SinceYear.fillna(1900).astype(np.int32)
joined.Promo2SinceWeek = joined.Promo2SinceWeek.fillna(1).astype(np.int32)
Looking at the initial data exploration, though, these values do appear in the data (e.g. the minimum of CompetitionOpenSinceYear is 1900).
How can they work as signal values when some rows in the original data already contain them?
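To make the concern concrete, here is a minimal sketch on a toy DataFrame. The values are made up for illustration and are not the real Rossmann data, and the `_na` indicator column is a hypothetical addition that is not part of the code quoted above. It counts how many rows already hold the fill value, then shows that after `fillna` those rows can no longer be told apart from the filled ones unless missingness was recorded separately.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `joined` DataFrame; values are invented for illustration.
joined = pd.DataFrame({
    "CompetitionOpenSinceYear": [1900.0, 2008.0, np.nan, 2013.0, np.nan],
})

col = "CompetitionOpenSinceYear"
fill_value = 1900

# Rows that already hold the fill value before any filling happens.
already_present = int((joined[col] == fill_value).sum())
# Rows that are actually missing.
missing = int(joined[col].isna().sum())

# Hypothetical extra step: record missingness in its own column before filling.
joined[col + "_na"] = joined[col].isna()
joined[col] = joined[col].fillna(fill_value).astype(np.int32)

# After filling, genuine 1900s and filled NAs look identical in the column itself.
after_fill = int((joined[col] == fill_value).sum())
print(already_present, missing, after_fill)  # prints: 1 2 3
```

In this toy case one genuine 1900 and two filled NAs end up as three indistinguishable rows in the column, and only the separate indicator column still says which ones were missing.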