Rossmann - NaN in processed dataframe

The training sample contains NaNs, so the statement below fails with an exception complaining that the data contains NaNs.

df, y, nas, mapper = proc_df(joined, 'Sales', do_scale=True)

Has anyone faced a similar issue?

Note: I haven’t made any changes to the lesson 3 notebook, yet I still face this issue. I even tried restarting the kernel and re-running every cell to see if I could avoid the NaNs, but it didn’t work.

This is because you ran these two lines one after the other…

So df holds the columns of the test dataframe,
and thus you never calculated the AfterStateHoliday value for the training set.

Remove the line

df = test[columns]

and it should work.

But when you run this again on the test set - you’ll have to add the line again.

Another option is to change
df = test[columns] to df_test = test[columns]
But then you also have to change get_elapsed to take a df as an argument.
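A rough sketch of that second option, in case it helps. This is not the notebook's code: it assumes get_elapsed walks rows that are already sorted by Store and Date, and the tiny train/test frames are made up for illustration. The only real point is that the dataframe is passed in instead of being read from a global df.

```python
import numpy as np
import pandas as pd

def get_elapsed(df, fld, pre):
    """Add column pre+fld: days since fld was last True, per store.

    Assumes df is already sorted by Store, then Date.
    """
    last_date = np.datetime64("NaT")
    last_store = None
    res = []
    for s, v, d in zip(df["Store"].values, df[fld].values, df["Date"].values):
        if s != last_store:
            # New store: forget the previous store's last event date.
            last_date = np.datetime64("NaT")
            last_store = s
        if v:
            last_date = d
        # Rows before the first event of a store come out as NaN.
        res.append((d - last_date) / np.timedelta64(1, "D"))
    df[pre + fld] = res
    return df

# Hypothetical miniature stand-ins for the notebook's train and test frames.
train = pd.DataFrame({
    "Store": [1, 1, 1],
    "Date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-04"]),
    "StateHoliday": [True, False, False],
})
test = pd.DataFrame({
    "Store": [1, 1],
    "Date": pd.to_datetime(["2015-08-01", "2015-08-02"]),
    "StateHoliday": [False, True],
})
columns = ["Store", "Date", "StateHoliday"]

# Separate names, so the test frame never overwrites the training frame.
df = get_elapsed(train[columns].copy(), "StateHoliday", "After")
df_test = get_elapsed(test[columns].copy(), "StateHoliday", "After")
```

With two separate names, both frames end up with their own AfterStateHoliday column, and nothing has to be re-run or re-added when switching between train and test.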


How blind of me. :expressionless:

What @arjunrajkumar suggested worked fine for me, without any errors.

I changed the second df to df_test and kinda duplicated nearly everything afterwards, until we merge back these dataframes into joined and joined_test later in the notebook. :slight_smile:


Unfortunately I’m getting another exception:

This is probably because you did not convert joined’s cat_vars to type 'category'.
Sending from mobile (so I can’t check the code), but Jeremy had written a for loop that goes through each one and converts it.
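For reference, a minimal sketch of what such a loop might look like. The small joined frame and the two-entry cat_vars list here are hypothetical stand-ins; the notebook's own list is longer, and its exact loop may differ.

```python
import pandas as pd

# Hypothetical miniature stand-in for the notebook's joined dataframe.
joined = pd.DataFrame({
    "StoreType": ["a", "b", "a"],
    "Assortment": ["basic", "extra", "basic"],
    "Sales": [5263, 6064, 8314],
})

# Assumed subset of categorical variables, for illustration only.
cat_vars = ["StoreType", "Assortment"]

# Convert each categorical column to an ordered pandas category dtype.
for v in cat_vars:
    joined[v] = joined[v].astype("category").cat.as_ordered()
```

Checking joined.dtypes afterwards is a quick way to confirm that every column in cat_vars now shows up as category before calling proc_df.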


It may help with debugging if I can see a few lines of code before this line.

Note that in general I don’t design the notebooks to be just run top to bottom (although the first lessons are OK to run that way). Be sure to think about what every line is doing, and when/whether to use it. Here are some more details: http://wiki.fast.ai/index.php/How_to_use_the_Provided_Notebooks
