09_tabular: ValueError: Unable to coerce to Series, length must be 1: given 0

jeffchen72 · October 17, 2020, 7:14am

Hello,

I am running 09_tabular with the latest version of the code and fastai 2.0.15, and I hit the following error on the TabularPandas call.

In [100]:

procs_nn = [Categorify, FillMissing, Normalize]
to_nn = TabularPandas(df_nn_final, procs_nn, cat_nn, cont_nn,
                      splits=splits, y_names=dep_var)

Here’s the trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
1 procs_nn = [Categorify, FillMissing, Normalize]
2 to_nn = TabularPandas(df_nn_final, procs_nn, cat_nn, cont_nn,
----> 3 splits=splits, y_names=dep_var)

~/miniconda3/lib/python3.7/site-packages/fastai/tabular/core.py in __init__(self, df, procs, cat_names, cont_names, y_names, y_block, splits, do_setup, device, inplace, reduce_memory)
    164         self.cat_names,self.cont_names,self.procs = L(cat_names),L(cont_names),Pipeline(procs)
    165         self.split = len(df) if splits is None else len(splits[0])
--> 166         if do_setup: self.setup()
    167 
    168     def new(self, df):

~/miniconda3/lib/python3.7/site-packages/fastai/tabular/core.py in setup(self)
    175     def decode_row(self, row): return self.new(pd.DataFrame(row).T).decode().items.iloc[0]
    176     def show(self, max_n=10, **kwargs): display_df(self.new(self.all_cols[:max_n]).decode().items)
--> 177     def setup(self): self.procs.setup(self)
    178     def process(self): self.procs(self)
    179     def loc(self): return self.items.loc

~/miniconda3/lib/python3.7/site-packages/fastcore/transform.py in setup(self, items, train_setup)
    190         tfms = self.fs[:]
    191         self.fs.clear()
--> 192         for t in tfms: self.add(t,items, train_setup)
    193 
    194     def add(self,t, items=None, train_setup=False):

~/miniconda3/lib/python3.7/site-packages/fastcore/transform.py in add(self, t, items, train_setup)
    193 
    194     def add(self,t, items=None, train_setup=False):
--> 195         t.setup(items, train_setup)
    196         self.fs.append(t)
    197 

~/miniconda3/lib/python3.7/site-packages/fastcore/transform.py in setup(self, items, train_setup)
     77     def setup(self, items=None, train_setup=False):
     78         train_setup = train_setup if self.train_setup is None else self.train_setup
---> 79         return self.setups(getattr(items, 'train', items) if train_setup else items)
     80 
     81     def _call(self, fn, x, split_idx=None, **kwargs):

~/miniconda3/lib/python3.7/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
    110         if not f: return args[0]
    111         if self.inst is not None: f = MethodType(f, self.inst)
--> 112         return f(*args, **kwargs)
    113 
    114     def __get__(self, inst, owner):

~/miniconda3/lib/python3.7/site-packages/fastai/tabular/core.py in setups(self, to)
    271     store_attr(means=dict(getattr(to, 'train', to).conts.mean()),
    272                stds=dict(getattr(to, 'train', to).conts.std(ddof=0)+1e-7))
--> 273     return self(to)
    274 
    275 @Normalize

~/miniconda3/lib/python3.7/site-packages/fastcore/transform.py in __call__(self, x, **kwargs)
     71     @property
     72     def name(self): return getattr(self, '_name', _get_name(self))
---> 73     def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
     74     def decode  (self, x, **kwargs): return self._call('decodes', x, **kwargs)
     75     def __repr__(self): return f'{self.name}:\nencodes: {self.encodes}decodes: {self.decodes}'

~/miniconda3/lib/python3.7/site-packages/fastcore/transform.py in _call(self, fn, x, split_idx, **kwargs)
     81     def _call(self, fn, x, split_idx=None, **kwargs):
     82         if split_idx!=self.split_idx and self.split_idx is not None: return x
---> 83         return self._do_call(getattr(self, fn), x, **kwargs)
     84 
     85     def _do_call(self, f, x, **kwargs):

~/miniconda3/lib/python3.7/site-packages/fastcore/transform.py in _do_call(self, f, x, **kwargs)
     87             if f is None: return x
     88             ret = f.returns_none(x) if hasattr(f,'returns_none') else None
---> 89             return retain_type(f(x, **kwargs), x, ret)
     90         res = tuple(self._do_call(f, x_, **kwargs) for x_ in x)
     91         return retain_type(res, x)

~/miniconda3/lib/python3.7/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
    110         if not f: return args[0]
    111         if self.inst is not None: f = MethodType(f, self.inst)
--> 112         return f(*args, **kwargs)
    113 
    114     def __get__(self, inst, owner):

~/miniconda3/lib/python3.7/site-packages/fastai/tabular/core.py in encodes(self, to)
    275 @Normalize
    276 def encodes(self, to:Tabular):
--> 277     to.conts = (to.conts-self.means) / self.stds
    278     return to
    279 

~/miniconda3/lib/python3.7/site-packages/pandas/core/ops/__init__.py in f(self, other, axis, level, fill_value)
    645         # TODO: why are we passing flex=True instead of flex=not special?
    646         #  15 tests fail if we pass flex=not special instead
--> 647         self, other = _align_method_FRAME(self, other, axis, flex=True, level=level)
    648 
    649         if isinstance(other, ABCDataFrame):

~/miniconda3/lib/python3.7/site-packages/pandas/core/ops/__init__.py in _align_method_FRAME(left, right, axis, flex, level)
    501     elif is_list_like(right) and not isinstance(right, (ABCSeries, ABCDataFrame)):
    502         # GH17901
--> 503         right = to_series(right)
    504 
    505     if flex is not None and isinstance(right, ABCDataFrame):

~/miniconda3/lib/python3.7/site-packages/pandas/core/ops/__init__.py in to_series(right)
    464             if len(left.columns) != len(right):
    465                 raise ValueError(
--> 466                     msg.format(req_len=len(left.columns), given_len=len(right))
    467                 )
    468             right = left._constructor_sliced(right, index=left.columns)

ValueError: Unable to coerce to Series, length must be 1: given 0

is this a bug?

Thanks,
Jeff

kozer · October 17, 2020, 8:14am

Same issue here!
Removing Normalize from the procs_nn list seems to “fix” the issue.

jeffchen72 · October 17, 2020, 7:39pm

The good thing is that it doesn’t look like user error on our part. The trouble is we need normalization for a neural network.

muellerzr · October 17, 2020, 7:43pm

I’ve been playing with tabular a while and never faced this issue before, ever. I can try and run the fastbook version and see what’s up but I wouldn’t rule out user error to some degree (or book error)

porich · October 17, 2020, 8:53pm

Encountering the same error. It works if you leave out the cont_nn variable. Will continue digging into this…

jeffchen72 · October 18, 2020, 6:25am

I don’t know if this is the best answer, but I don’t think it is right to remove the Normalize processor from procs_nn or to remove cont_nn from the TabularPandas call. We need the ‘saleElapsed’ continuous variable and we need to normalize it.

I did notice that

df_nn_final.dtypes

YearMade                 int64
ProductSize           category
Coupler_System          object
fiProductClassDesc      object
Hydraulics_Flow         object
ModelID                  int64
saleElapsed             object
fiSecondaryDesc         object
fiModelDesc             object
Enclosure               object
Hydraulics              object
ProductGroup            object
fiModelDescriptor       object
Drive_System            object
Tire_Size               object
SalePrice              float64
dtype: object

After I changed ‘saleElapsed’ to int64, I was able to move past TabularPandas without the error.

df_nn_final.dtypes

YearMade                 int64
ProductSize           category
Coupler_System          object
fiProductClassDesc      object
Hydraulics_Flow         object
ModelID                  int64
saleElapsed              int64
fiSecondaryDesc         object
fiModelDesc             object
Enclosure               object
Hydraulics              object
ProductGroup            object
fiModelDescriptor       object
Drive_System            object
Tire_Size               object
SalePrice              float64
dtype: object

The rest of the neural networks section ran to completion and gave an r_mse of 0.226128:

preds,targs = learn.get_preds()
r_mse(preds,targs)
0.226128

Not sure if this is the correct answer to this problem, but it gives a better result than removing cont_nn, which gives an r_mse of 0.270476:

preds,targs = learn.get_preds()
r_mse(preds,targs)
0.270476

Can someone more experienced weigh in on this? Perhaps @muellerzr?

Thanks,
Jeff

muellerzr · October 18, 2020, 9:54am

That does indeed make perfect sense. Great debugging @jeffchen72! Everything works by integrating well with pandas (hence TP), and if a column isn’t a numerical datatype then it will break on Normalize (which we could expect).
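
To illustrate the point about dtypes, here is a minimal pandas-only sketch (no fastai involved). The tiny DataFrame and its values are made up for illustration; only the column name saleElapsed comes from the thread. It shows that a column stored with dtype object is not counted as numeric by pandas, even when every value in it is an integer, and that casting it changes that.

```python
import pandas as pd

# Hypothetical stand-in for df_nn_final: integer values stored as dtype object.
df = pd.DataFrame(
    {"saleElapsed": pd.Series([1139246400, 1080259200, 1077753600], dtype=object)}
)

# pandas does not treat an object column as numeric, whatever it contains.
numeric_before = list(df.select_dtypes(include="number").columns)
print(numeric_before)

# After an explicit cast the column is recognised as numeric.
df["saleElapsed"] = df["saleElapsed"].astype("int64")
numeric_after = list(df.select_dtypes(include="number").columns)
print(numeric_after)
```

Running a check like this on the continuous columns before calling TabularPandas is one way to spot the problem early.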

jeffchen72 · October 18, 2020, 3:52pm

Thanks for confirming this, @muellerzr. Maybe I can propose my fix to 09_tabular as my first PR.

muellerzr · October 18, 2020, 3:53pm

That would be a fantastic idea

porich · October 19, 2020, 10:58am

Great job @jeffchen72! I was certainly not suggesting that we remove cont_nn, simply that it was related to the error.

As a side note, I got an r_mse of 0.224892 by dropping fiModelDesc (which has 5K+ cardinality) instead of fiModelDescriptor.

jeffchen72 · October 20, 2020, 4:26am

Thanks @porich for sharing your results. This is a great notebook. I learned a lot from it.

Jeff

jimmiemunyi · November 23, 2020, 8:40am

This actually works, thanks @jeffchen72

pauls97 · November 26, 2020, 4:16am

This may seem like an obvious question, but how did you convert to int64? I have been trying to use .astype(int), both on the whole column and in a for loop over each value in the column, but it isn’t working. Is there another method I’m missing?

riteshpaul · November 29, 2020, 2:45pm

train['saleElapsed'] = train['saleElapsed'].astype('int')
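
If a plain astype call fails, one possible cause is that the column holds values that cannot be parsed as numbers. This is an assumption, since the thread does not show the failing data. A hedged sketch using pd.to_numeric with errors="coerce" makes such values visible; the sample Series below is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real column: a string, an int, and one bad value.
sale_elapsed = pd.Series(["1139246400", 1080259200, "not a number"], dtype=object)

# errors="coerce" turns anything unparseable into NaN instead of raising.
converted = pd.to_numeric(sale_elapsed, errors="coerce")
n_bad = int(converted.isna().sum())
print(f"{n_bad} value(s) could not be converted")

# Once the bad rows are dropped (or filled), the cast to int64 is safe.
clean = converted.dropna().astype("int64")
print(clean.dtype)
```

Whether to drop or fill the unparseable rows depends on the dataset; dropping is used here only to keep the sketch short.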

ulat · December 1, 2020, 1:51pm

One remark here, in case anyone else gets this error message.

This is my solution:
I had to add copy() when creating the df_nn_final DataFrame.

I have changed this line:

df_nn_final = df_nn[list(xs_final_time.columns) + [dep_var]]

to:

df_nn_final = (df_nn[list(xs_final_time.columns) + [dep_var]]).copy()

I found this solution at: https://stackoverflow.com/questions/49728421/pandas-dataframe-settingwithcopywarning-a-value-is-trying-to-be-set-on-a-copy
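
Here is a small self-contained sketch of the copy-then-cast pattern. The toy DataFrame, its values, and the cols list are placeholders (cols stands in for list(xs_final_time.columns)); only the names df_nn, df_nn_final, dep_var and saleElapsed come from the notebook. Taking an explicit copy gives an independent DataFrame, so the dtype assignment afterwards is applied to it directly and the parent frame is left unchanged.

```python
import pandas as pd

# Toy stand-in for df_nn; the values are invented for illustration.
df_nn = pd.DataFrame(
    {
        "saleElapsed": pd.Series([1139246400, 1080259200, 1077753600], dtype=object),
        "YearMade": [2004, 1996, 2001],
        "SalePrice": [11.1, 10.9, 9.2],
        "unused": [0, 0, 0],
    }
)
cols = ["saleElapsed", "YearMade"]  # placeholder for list(xs_final_time.columns)
dep_var = "SalePrice"

# Select the columns and take an explicit copy, then cast on the copy.
df_nn_final = df_nn[cols + [dep_var]].copy()
df_nn_final["saleElapsed"] = df_nn_final["saleElapsed"].astype("int64")

print(df_nn_final.dtypes)
print(df_nn["saleElapsed"].dtype)  # the original frame keeps its object dtype
```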

manju-dev · December 4, 2020, 10:13pm

Changing to int, as below, did not work for me:
df_nn['saleElapsed'] = df_nn['saleElapsed'].astype(int)

but changing to float did!
df_nn_final = df_nn_final.astype({"saleElapsed": float})

rlampron · December 7, 2020, 10:55pm

Thx! Worked for me as well.

robocup · December 28, 2020, 10:57pm

In your first one you are using df_nn instead of df_nn_final, but even with df_nn_final that line still gives an error.
In your second one you can replace float with int and it will work.

manju-dev · December 30, 2020, 7:29am

That was a mistake when I wrote the comment! It was supposed to be df_nn_final. Yes, that line still throws an error, but using copy() as per @ulat’s comment seems to work. I didn’t test it though.
Yes, the second one works for both ‘int’ and ‘float’!

mattmoehr · December 30, 2020, 6:07pm

Did this change ever get merged in?