Same issue here!
Removing Normalize from the procs_nn array seems to “fix” the issue.
The good thing is that it doesn’t look like user error on our part. The trouble is we need normalization for a neural network.
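To show why normalization depends on the column being numeric, here is a minimal pandas sketch of z-score normalization (subtract the mean, divide by the standard deviation) on continuous columns. It is only conceptually similar to what a Normalize step does; it is not fastai's implementation. The helper name `zscore` and the toy values are made up for illustration, and it uses pandas' default sample standard deviation (ddof=1).

```python
import pandas as pd


def zscore(df, cont_cols):
    """Hypothetical helper: return a copy of df with cont_cols scaled to mean 0, std 1."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in cont_cols:
        mean = out[col].mean()
        std = out[col].std()  # sample standard deviation (ddof=1)
        out[col] = (out[col] - mean) / std
    return out


# Toy data, not from the bulldozers dataset.
df = pd.DataFrame({"YearMade": [2000, 2004, 2008], "saleElapsed": [1.0e9, 1.1e9, 1.2e9]})
normed = zscore(df, ["YearMade", "saleElapsed"])
print(normed)
```

Both columns come out as roughly -1, 0, 1. The arithmetic only makes sense for numeric columns, which is why a column stored with a non-numeric dtype is a problem for this kind of step.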
I’ve been playing with tabular for a while and have never faced this issue before, ever. I can try running the fastbook version and see what’s up, but I wouldn’t rule out user error to some degree (or a book error).
Encountering the same error. It works if you leave out the cont_nn variable. Will continue digging into this…
I don’t know if this is the best answer, but I don’t think it is right to remove the Normalize processor from procs_nn or to remove cont_nn from the TabularPandas call. We need the ‘saleElapsed’ continuous variable, and we need to normalize it.
I did notice that saleElapsed shows up as object here:
df_nn_final.dtypes
YearMade int64
ProductSize category
Coupler_System object
fiProductClassDesc object
Hydraulics_Flow object
ModelID int64
saleElapsed object
fiSecondaryDesc object
fiModelDesc object
Enclosure object
Hydraulics object
ProductGroup object
fiModelDescriptor object
Drive_System object
Tire_Size object
SalePrice float64
dtype: object
After I changed ‘saleElapsed’ to int64, I was able to move past TabularPandas without the error:
df_nn_final.dtypes
YearMade int64
ProductSize category
Coupler_System object
fiProductClassDesc object
Hydraulics_Flow object
ModelID int64
saleElapsed int64
fiSecondaryDesc object
fiModelDesc object
Enclosure object
Hydraulics object
ProductGroup object
fiModelDescriptor object
Drive_System object
Tire_Size object
SalePrice float64
dtype: object
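A minimal sketch of the dtype change described above, using a small made-up DataFrame as a stand-in for df_nn_final (the values are invented; only the column names come from the listing). The column holds integers but is stored as object, and astype("int64") with the result assigned back turns it into a numeric column.

```python
import pandas as pd

# Toy stand-in for df_nn_final: saleElapsed holds integers but has dtype object.
df_nn_final = pd.DataFrame({
    "YearMade": [2004, 1996, 2001],
    "saleElapsed": pd.Series([1163635200, 1080259200, 1077753600], dtype=object),
    "SalePrice": [11.1, 10.9, 9.2],
})
print(df_nn_final.dtypes["saleElapsed"])  # object

# Convert the column and assign it back so the DataFrame itself changes.
df_nn_final["saleElapsed"] = df_nn_final["saleElapsed"].astype("int64")
print(df_nn_final.dtypes["saleElapsed"])  # int64
```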
The rest of the neural networks section ran to completion and gave an r_mse of 0.226128:
preds,targs = learn.get_preds()
r_mse(preds,targs)
0.226128
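For comparing the numbers in this thread: r_mse is a root mean squared error, so lower is better. The sketch below assumes it is defined along the lines of the book's chapter 9 helper (RMSE rounded to 6 decimal places); it uses NumPy arrays here instead of the tensors that learn.get_preds() returns.

```python
import math

import numpy as np


def r_mse(pred, y):
    # Root mean squared error, rounded to 6 decimal places.
    return round(math.sqrt(((pred - y) ** 2).mean()), 6)


print(r_mse(np.array([1.0, 2.0]), np.array([1.0, 4.0])))  # 1.414214
```

Under that definition, 0.226128 is a better (lower) error than the 0.270476 reported below.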
Not sure if this is the correct answer to this problem, but it gives a better result than removing cont_nn, which gives an r_mse of 0.270476:
preds,targs = learn.get_preds()
r_mse(preds,targs)
0.270476
Can someone more experienced weigh in on this? Perhaps @muellerzr?
Thanks,
Jeff
That does indeed make perfect sense. Great debugging @jeffchen72! Everything works by integrating well with pandas (hence TP), and if a column is not a numerical datatype, then it will break on Normalize (which we would expect).
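Since a non-numerical continuous column breaks on Normalize, one way to catch this before building TabularPandas is to check the dtypes of the continuous columns first. This is a sketch with a hypothetical helper name (`non_numeric_conts`) and toy data; `is_numeric_dtype` is pandas' own dtype check.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_conts(df, cont_names):
    """Hypothetical helper: list continuous columns whose dtype is not numeric."""
    return [c for c in cont_names if not is_numeric_dtype(df[c])]


# Toy data: saleElapsed is stored as object, YearMade as int64.
df = pd.DataFrame({
    "YearMade": [2004, 1996],
    "saleElapsed": pd.Series([1163635200, 1080259200], dtype=object),
})
bad = non_numeric_conts(df, ["YearMade", "saleElapsed"])
print(bad)  # ['saleElapsed']
```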
Thanks for confirming this, @muellerzr. Maybe I can propose my fix to 09_tabular as my first PR.
That would be a fantastic idea
Great job @jeffchen72! I was certainly not suggesting that we remove cont_nn, simply that it was related to the error.
As a side note, I got an r_mse of 0.224892 by dropping fiModelDesc (which has a cardinality of 5K+) instead of fiModelDescriptor.
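To see which categorical columns have the highest cardinality before deciding what to drop, counting unique values per column is enough. A minimal sketch on made-up data (the real columns have thousands of levels):

```python
import pandas as pd

# Toy data using column names from this thread; the values are invented.
df = pd.DataFrame({
    "fiModelDesc": ["A1", "A2", "A3", "A1"],
    "fiModelDescriptor": ["x", "x", "y", "y"],
    "ProductSize": ["Small", "Large", "Small", "Small"],
})

# Number of distinct values per column, highest cardinality first.
card = df[["fiModelDesc", "fiModelDescriptor", "ProductSize"]].nunique().sort_values(ascending=False)
print(card)
```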
This may seem like an obvious question, but how did you convert to int64? I have been trying to use .astype(int), both on the column and with a for loop over each value in the column, but it isn’t working. Is there another method I’m missing?
train['saleElapsed'] = train['saleElapsed'].astype('int')
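One common pitfall, which may or may not be what went wrong above: astype returns a new Series rather than changing the column in place, so the result has to be assigned back. A minimal sketch with a made-up `train` DataFrame:

```python
import pandas as pd

# Toy data: integers stored with dtype object.
train = pd.DataFrame({"saleElapsed": pd.Series([10, 20, 30], dtype=object)})

# astype returns a new Series; calling it without assigning changes nothing.
train["saleElapsed"].astype("int64")
print(train["saleElapsed"].dtype)  # object

# Assigning the result back replaces the column.
train["saleElapsed"] = train["saleElapsed"].astype("int64")
print(train["saleElapsed"].dtype)  # int64
```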
One remark here, in case anyone else has got this error message:
This is my solution: I had to add copy() when creating the df_nn_final DataFrame.
I have changed this line:
df_nn_final = df_nn[list(xs_final_time.columns) + [dep_var]]
to:
df_nn_final = (df_nn[list(xs_final_time.columns) + [dep_var]]).copy()
I found this solution at: https://stackoverflow.com/questions/49728421/pandas-dataframe-settingwithcopywarning-a-value-is-trying-to-be-set-on-a-copy
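A minimal sketch of the copy() approach with made-up data. `cols` is a hypothetical stand-in for list(xs_final_time.columns). Calling .copy() makes df_nn_final an independent DataFrame, so converting one of its columns does not touch df_nn and does not raise the SettingWithCopyWarning discussed in the linked answer.

```python
import pandas as pd

# Toy stand-in for df_nn; the values are invented.
df_nn = pd.DataFrame({
    "YearMade": [2004, 1996],
    "saleElapsed": pd.Series([1163635200, 1080259200], dtype=object),
    "SalePrice": [11.1, 10.9],
    "Unused": ["a", "b"],
})
cols = ["YearMade", "saleElapsed"]  # hypothetical stand-in for list(xs_final_time.columns)
dep_var = "SalePrice"

# .copy() gives df_nn_final its own data, independent of df_nn.
df_nn_final = df_nn[cols + [dep_var]].copy()
df_nn_final["saleElapsed"] = df_nn_final["saleElapsed"].astype("int64")

print(df_nn_final.dtypes["saleElapsed"])  # int64
print(df_nn.dtypes["saleElapsed"])        # object
```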
Changing to int, as below, did not work for me:
df_nn['saleElapsed'] = df_nn['saleElapsed'].astype(int)
but changing to float did!
df_nn_final = df_nn_final.astype({"saleElapsed": float})
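The dict form of astype used above converts only the listed columns and returns a new DataFrame. A small sketch with made-up data, showing that both float and int64 targets work on an object column holding integers, while other columns keep their dtype:

```python
import pandas as pd

# Toy data: saleElapsed holds integers but has dtype object.
df_nn_final = pd.DataFrame({
    "YearMade": [2004, 1996],
    "saleElapsed": pd.Series([1163635200, 1080259200], dtype=object),
})

# Only the columns named in the dict are converted; a new DataFrame is returned.
as_float = df_nn_final.astype({"saleElapsed": float})
as_int = df_nn_final.astype({"saleElapsed": "int64"})

print(as_float.dtypes["saleElapsed"])  # float64
print(as_int.dtypes["saleElapsed"])    # int64
print(as_float.dtypes["YearMade"])     # int64
```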
Thx! Worked for me as well.
- In your first one you are using df_nn instead of df_nn_final. But that still gives an error.
- In your second one you can replace float with int and it will work.
It was a mistake while commenting! It was supposed to be df_nn_final. Yes, that line still throws an error, but using copy() as per @ulat’s comment seems to work. I didn’t test it, though.
Yes, it works for both ‘int’ and ‘float’!
Did this change ever get merged in?
Thanks, I tried it and it works for me.