Fastai library raising an error for structured data, Course 1 (2018), Lesson 4


(nazim) #1

Edit: I am working on Kaggle Kernels, which I think use an outdated version of the fastai library; the solution proposed by @sam2 should work in the current version.

I have a feeling I am using the wrong function, since my target variable is binary, but I am not sure. Can anyone help?

The command: md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y, cat_flds=categorical_columns, bs=128, test_df=df_test)

Where: y=array([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., …, 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.], dtype=float32)

This gives the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
----> 1 md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y, cat_flds=categorical_columns, bs=128, is_reg=True, test_df=df_test)

/opt/conda/lib/python3.6/site-packages/fastai-0.6-py3.6.egg/fastai/column_data.py in from_data_frame(cls, path, val_idxs, df, y, cat_flds, bs, is_reg, test_df)
     68     def from_data_frame(cls, path, val_idxs, df, y, cat_flds, bs, is_reg=True, test_df=None):
     69         ((val_df, trn_df), (val_y, trn_y)) = split_by_idx(val_idxs, df, y)
---> 70         return cls.from_data_frames(path, trn_df, val_df, trn_y, val_y, cat_flds, bs, is_reg, test_df=test_df)
     71 
     72     def get_learner(self, emb_szs, n_cont, emb_drop, out_sz, szs, drops,

/opt/conda/lib/python3.6/site-packages/fastai-0.6-py3.6.egg/fastai/column_data.py in from_data_frames(cls, path, trn_df, val_df, trn_y, val_y, cat_flds, bs, is_reg, test_df)
     61     @classmethod
     62     def from_data_frames(cls, path, trn_df, val_df, trn_y, val_y, cat_flds, bs, is_reg, test_df=None):
---> 63         test_ds = ColumnarDataset.from_data_frame(test_df, cat_flds, is_reg) if test_df is not None else None
     64         return cls(path, ColumnarDataset.from_data_frame(trn_df, cat_flds, trn_y, is_reg),
     65                     ColumnarDataset.from_data_frame(val_df, cat_flds, val_y, is_reg), bs, test_ds=test_ds)

/opt/conda/lib/python3.6/site-packages/fastai-0.6-py3.6.egg/fastai/column_data.py in from_data_frame(cls, df, cat_flds, y, is_reg)
     43     @classmethod
     44     def from_data_frame(cls, df, cat_flds, y=None, is_reg=True):
---> 45         return cls.from_data_frames(df[cat_flds], df.drop(cat_flds, axis=1), y, is_reg)
     46 
     47 

/opt/conda/lib/python3.6/site-packages/fastai-0.6-py3.6.egg/fastai/column_data.py in from_data_frames(cls, df_cat, df_cont, y, is_reg)
     39         cat_cols = [c.values for n,c in df_cat.items()]
     40         cont_cols = [c.values for n,c in df_cont.items()]
---> 41         return cls(cat_cols, cont_cols, y, is_reg)
     42 
     43     @classmethod

/opt/conda/lib/python3.6/site-packages/fastai-0.6-py3.6.egg/fastai/column_data.py in __init__(self, cats, conts, y, is_reg)
     27         self.y = np.zeros((n,1)) if y is None else y
     28         if is_reg:
---> 29             self.y =  self.y[:,None]
     30         self.is_reg = is_reg
     31 

TypeError: 'bool' object is not subscriptable
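Reading the fastai 0.6 frames in the traceback above, one likely explanation (an inference from the quoted source lines, not something confirmed in this thread) is that line 63 calls `ColumnarDataset.from_data_frame(test_df, cat_flds, is_reg)` positionally, while line 44 declares the signature as `(df, cat_flds, y=None, is_reg=True)`. The boolean `is_reg` therefore lands in `y`, and line 29 then tries to index a bool. That would also explain why `is_reg=False` produces the same error. The sketch below uses a hypothetical `ToyColumnarDataset` that mirrors only those quoted lines (it is not the real fastai class) to reproduce the mix-up and show the keyword-argument version that avoids it.

```python
import numpy as np


class ToyColumnarDataset:
    """Hypothetical stand-in mirroring only the fastai 0.6 lines quoted
    in the traceback; it is not the real ColumnarDataset."""

    def __init__(self, cats, conts, y, is_reg):
        n = len(cats[0]) if cats else len(conts[0])
        # As on line 27: use the given labels, or zeros if y is None.
        self.y = np.zeros((n, 1)) if y is None else y
        if is_reg:
            # As on line 29: fails when y is a bool instead of an array.
            self.y = self.y[:, None]
        self.is_reg = is_reg

    @classmethod
    def from_data_frame(cls, cats, conts, y=None, is_reg=True):
        # Same parameter order as line 44 (cats/conts stand in for the DataFrame).
        return cls(cats, conts, y, is_reg)


def buggy_test_ds(cats, conts, is_reg):
    # Mirrors line 63: is_reg is passed positionally, so it lands in `y`.
    return ToyColumnarDataset.from_data_frame(cats, conts, is_reg)


def fixed_test_ds(cats, conts, is_reg):
    # Passing is_reg by keyword leaves y=None for the unlabeled test set.
    return ToyColumnarDataset.from_data_frame(cats, conts, is_reg=is_reg)


cats = [np.array([0, 1, 2])]
conts = [np.array([0.5, 1.5, 2.5])]

for flag in (True, False):
    try:
        buggy_test_ds(cats, conts, flag)
    except TypeError as e:
        print(flag, "->", e)
# True -> 'bool' object is not subscriptable
# False -> 'bool' object is not subscriptable

print(fixed_test_ds(cats, conts, is_reg=False).y.shape)
# (3, 1)
```

If this reading is right, the failure comes from building the test set, so dropping `test_df=df_test` from the call (or upgrading to a version where line 63 passes `is_reg` by keyword) would likely get past this particular error.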

(Sam) #2

@uleoimlg,
If your y is 0 or 1, the model should be a binary classification model, so explicitly specify is_reg=False and is_multi=False in this line:

md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y, cat_flds=categorical_columns, bs=128, is_reg=False, is_multi=False, test_df=df_test)

Since your code did not specify is_reg, the default value is_reg=True was used.

Try it out.


(nazim) #3

@sam2
There is no ‘is_multi’ parameter, as you can see below. I tried is_reg=False but got the same error.
I am going through the library to see if there is any way to tell the function that the target variable is binary, and will post here as soon as I find it, but do let me know if you find it first :wink:


TypeError                                 Traceback (most recent call last)
<ipython-input-103-0bbbfcbf6246> in <module>()
----> 1 md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y, cat_flds=categorical_columns, bs=128, is_reg=False,is_multi=False, test_df=df_test)

TypeError: from_data_frame() got an unexpected keyword argument 'is_multi'

(Sam) #4

@uleoimlg,
is_multi is relevant once you determine that the problem is classification and not regression.
You indicate that it is classification by setting is_reg=False.
Then you must indicate that it is not multi-class classification (but binary) by setting is_multi=False.

Check the code; maybe is_multi=False is the default.
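One quick way to check which keyword arguments an installed version accepts, and what their defaults are, is Python's standard `inspect.signature`. The sketch below runs it on a hypothetical stand-in function that copies the fastai 0.6 `from_data_frame` signature shown in the traceback above; with the library installed you would pass `ColumnarModelData.from_data_frame` itself instead (assuming it imports in your environment).

```python
import inspect


# Hypothetical stand-in copying the fastai 0.6 signature from the traceback.
def from_data_frame(path, val_idxs, df, y, cat_flds, bs, is_reg=True, test_df=None):
    ...


def keyword_defaults(fn):
    """Return {parameter name: default} for every parameter of fn that has a default."""
    return {
        name: p.default
        for name, p in inspect.signature(fn).parameters.items()
        if p.default is not inspect.Parameter.empty
    }


print(keyword_defaults(from_data_frame))
# {'is_reg': True, 'test_df': None}
print("is_multi" in inspect.signature(from_data_frame).parameters)
# False
```

Running the same check on the installed library shows directly whether `is_multi` exists and whether `is_reg` defaults to True, without reading through the source.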


(nazim) #5

@sam2 Thank you for your response. It appears that Kaggle as well as pip are using an older version of the fastai library that does have ‘is_reg’ but no ‘is_multi’. So I will probably try running it on my local machine and see if I can get it to work. I will let you know later today if I succeed.