Handling missing values in structured data

arjunrajkumar · December 2, 2017, 3:18am

Hey all.

I understand that we can fill missing values in a numeric column with an integer that does not appear in the dataset, like -999, or with the mean or median. But what if the column holds strings? E.g. a city column which has ‘New York’, ‘London’ etc., and the column also has missing values. How can I fill the missing values in this case?

Can I fill this with a unique string that does not appear elsewhere in the training/test datasets?
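
For concreteness, the two fills described in this question could look roughly like the sketch below. The DataFrame, its column names, and the `#MISSING#` placeholder are all made up for illustration; only `median` and `fillna` are real pandas calls.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "city": ["New York", "London", None, "London"],
})

# Numeric column: fill with the median (a sentinel such as -999 is the
# other option mentioned in the question).
age_median = df["age"].median()
df["age_filled"] = df["age"].fillna(age_median)

# String column: fill with a placeholder string, after checking that it
# does not already occur among the observed values.
placeholder = "#MISSING#"  # assumed name, not from the thread
assert placeholder not in set(df["city"].dropna())
df["city_filled"] = df["city"].fillna(placeholder)
```

With a placeholder like this, the missing entries simply become one more category in the column.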

jeremy · December 2, 2017, 5:15am

You can just use the pandas missing value (NaN). We discuss this at length in the ML course, FYI.
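
One possible reading of this reply, as a minimal sketch: leave the missing entries as NaN and convert the column to a pandas categorical, which gives missing values the code -1 without any manual fill. The toy series is hypothetical, and the shift by one at the end is an assumed convention (so that missing maps to 0), not something stated in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical city column with one missing entry left as NaN.
city = pd.Series(["New York", "London", np.nan, "London"])

# Converting to a categorical keeps NaN as "no category".
cat = city.astype("category")

# Integer codes: each category gets 0..n-1, and missing values get -1.
codes = cat.cat.codes

# Assumed convention: shift by one so that missing becomes 0 and the
# real categories start at 1.
codes_shifted = codes + 1
```

Here the missing value ends up with its own integer code, so a downstream model can treat "missing" as a level of its own.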