Machine Learning Lesson 1 function "proc_df"

Hello, I just completed the first lesson and coded every single line myself. I have two questions about proc_df.

First, how should I understand the parameter nas?

Second, I noticed that in the original DataFrame some columns are ‘text’, such as fiProductClassDesc. Does proc_df convert such a column to a numerical column, like any other categorical column?

Thank you!

nas is a dictionary with the column names as keys and the respective medians as values. You can easily print it out and check.
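To make the idea concrete, here is a minimal pandas-only sketch of how such a dictionary could be built and reused. It is not the fastai source: fill_missing_with_median is a hypothetical helper, and the column names and values are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def fill_missing_with_median(df, na_dict=None):
    """Hypothetical helper: fill numeric NaNs with the median and record the fillers."""
    na_dict = {} if na_dict is None else dict(na_dict)
    df = df.copy()
    for name in list(df.columns):
        col = df[name]
        # Only numeric columns that have missing values (or a stored filler) are touched.
        if is_numeric_dtype(col) and (col.isnull().any() or name in na_dict):
            filler = na_dict.get(name, col.median())
            df[name] = col.fillna(filler)
            na_dict[name] = filler
    return df, na_dict

# Made-up data: only YearMade has a missing value.
train = pd.DataFrame({"YearMade": [1990.0, None, 2000.0],
                      "Hours": [10.0, 20.0, 30.0]})
filled, nas = fill_missing_with_median(train)
print(nas)
```

In this sketch, passing the returned dictionary back in as na_dict would make a second DataFrame (for example a validation set) use the same medians instead of computing its own.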

You are right, proc_df replaces categorical columns’ values with their category codes. But you first have to set the column’s dtype to ‘category’, either manually or using train_cats().
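If it helps, this is roughly what the manual route looks like in plain pandas. It is only a sketch of the category-code idea, not what proc_df does internally, and the values in the column are invented for the example.

```python
import pandas as pd

# Made-up values for the text column mentioned in the question.
df = pd.DataFrame({"fiProductClassDesc": ["Backhoe", "Track Excavator", "Backhoe", None]})

# Set the dtype to 'category' manually.
df["fiProductClassDesc"] = df["fiProductClassDesc"].astype("category")

# Each distinct string now has an integer code; pandas uses -1 for missing values.
categories = list(df["fiProductClassDesc"].cat.categories)
codes = df["fiProductClassDesc"].cat.codes.tolist()
print(categories)
print(codes)
```

The codes are just positions in the category list, so the mapping back to the original strings stays available through .cat.categories.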

Also, this is all documented in the code, so you can just press Shift + Tab in the notebook and take a look at it. :slight_smile:
