Wiki thread: lesson 1

(Jonas) #88

Question about proc_df

Hi everyone,
I’ve got a question about how to handle train and test sets in which different columns contain NaN. I used the data from the Kaggle housing price competition. In the training data, neither column A nor column B has any NaN, but in the test data column B has a few, so proc_df creates a B_na column for the test set only. The test set now has one more column than the training set, so the trained model can’t be used on it.

To make it work, I just dropped all the feature_na columns that proc_df created in both the test and training sets. Is there a better way? Should I create a _na column for every column, filled with False when none of its values is NaN?
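One common approach is to decide which _na indicator columns to create from the training set alone, then apply exactly that decision, together with the training-set fill values, to the test set. fastai 0.7’s proc_df appears to support this through its na_dict argument (it returns the dictionary built for the training set, which you pass back in when processing the test set), but check the signature in your installed version. The sketch below is plain pandas, not fastai’s code; fit_missing and apply_missing are hypothetical helper names, and using the median as the fill value is an assumption.

```python
import numpy as np
import pandas as pd


def fit_missing(train: pd.DataFrame):
    """Learn everything about missing values from the TRAINING set only (hypothetical helper)."""
    medians = train.select_dtypes("number").median()  # per-column fill values
    # Only columns that had NaNs in training get a <col>_na indicator.
    na_cols = [c for c in medians.index if train[c].isnull().any()]
    return na_cols, medians


def apply_missing(df: pd.DataFrame, na_cols, medians) -> pd.DataFrame:
    """Apply the training-set decisions to any frame (hypothetical helper)."""
    df = df.copy()
    for col in na_cols:
        df[col + "_na"] = df[col].isnull()  # same indicator columns for train and test
    # Fill numeric NaNs with the *training* medians; test-only NaNs (like B here)
    # are filled without creating an extra B_na column.
    return df.fillna(medians)


train = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, 6.0]})
test = pd.DataFrame({"A": [2.0, 2.0], "B": [np.nan, 7.0]})

na_cols, medians = fit_missing(train)
train_p = apply_missing(train, na_cols, medians)
test_p = apply_missing(test, na_cols, medians)

print(list(train_p.columns))  # ['A', 'B', 'A_na']
print(list(test_p.columns))   # ['A', 'B', 'A_na']
```

A simpler after-the-fact variant is to reindex the processed test set to the training columns with test_df.reindex(columns=train_df.columns, fill_value=False), which drops extra _na columns and adds any missing ones as all-False; note that it still leaves the test set filled with its own statistics rather than the training ones.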

Thanks for your help,

(Jackson Isaac) #89

You can add a folder (e.g., custom) inside the fastai module directory. Your directory might then look something like this:

– tabular
– utils
– …
– custom

You can put user-defined modules in the custom folder, and git pull won’t overwrite or revert your changes there. If you want git to track your custom files as well, you may want to fork the fastai repo and push your custom code to the fork.

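In your code, you would then import the function like this:
from fastai.custom import function_name

(This form assumes function_name is defined or re-exported in custom/__init__.py. If it lives in a file such as custom/my_module.py, where my_module is just a placeholder name, use from fastai.custom.my_module import function_name instead.)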

You can add the module to an existing folder as well, but putting it in a separate folder makes it easier to keep user code apart from upstream code.
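A minimal, self-contained sketch of the layout above, using a throwaway stand-in package called mylib instead of fastai and a hypothetical helper file my_helpers.py. It shows the two import styles: the short from mylib.custom import function_name form only works because custom/__init__.py re-exports the function, while the longer form imports straight from the module file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a temporary package that mimics the suggested layout:
#   mylib/
#     __init__.py
#     custom/
#       __init__.py    <- re-exports the helper
#       my_helpers.py  <- user-defined code (hypothetical name)
root = Path(tempfile.mkdtemp())
custom = root / "mylib" / "custom"
custom.mkdir(parents=True)
(root / "mylib" / "__init__.py").write_text("")
(custom / "my_helpers.py").write_text("def function_name(x):\n    return x * 2\n")
# This re-export is what makes `from mylib.custom import function_name` work.
(custom / "__init__.py").write_text("from .my_helpers import function_name\n")

# Make the temporary package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from mylib.custom import function_name                      # via the re-export
from mylib.custom.my_helpers import function_name as direct  # via the module path

print(function_name(21), direct(21))  # 42 42
```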

(Abhinav Verma) #90

Yeah, sometimes the server is down. That happens quite often on Kaggle.