Error while using tabular in v2

I am trying fastai2 with tabular data, closely following the examples from the notebooks but with my own data. This worked in v1, but with the exact same data I am getting errors in v2.

Here is my code:

group = df.groupby(['StoreID', 'QuestionID'], observed=True)
# Take the last question from the group for groups larger than 1
valid_indexes = group.tail(1)[(group.size() > 1).values].index

splits = IndexSplitter(valid_indexes.tolist())(range_of(df))
to = TabularPandas(df, procs, cat_names, cont_names, y_names=dep_var, splits=splits)
dls = to.dataloaders(bs=64)

But then I get the following error on the last line:

Could not do one pass in your dataloader, there is something wrong in it

Any easy way to investigate what is wrong with the dataloader?

Thanks!
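
A side note on the split in the question: `group.tail(1)` keeps rows in their original order, while `group.size()` is ordered by group key, so the boolean mask built from `.values` may not line up row for row. Below is a minimal sketch of one way to select "the last row of every group with more than one row" using masks aligned to the rows of `df`. The toy data and the `Answer` column are hypothetical, and the result is a list of positional indices, which is the form the splitter is assumed to expect here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real StoreID/QuestionID frame.
df = pd.DataFrame({
    "StoreID":    [1, 1, 2, 1, 2, 3],
    "QuestionID": [10, 10, 20, 10, 20, 30],
    "Answer":     [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})
keys = ["StoreID", "QuestionID"]

# Size of the group each row belongs to, aligned row by row with df.
group_size = df.groupby(keys, observed=True)["Answer"].transform("size")

# True for the last occurrence of each (StoreID, QuestionID) pair.
is_last = ~df.duplicated(subset=keys, keep="last")

# Positional indices (0..len(df)-1) of the rows to hold out for validation.
valid_idx = np.flatnonzero((is_last & (group_size > 1)).to_numpy()).tolist()
print(valid_idx)  # [3, 4]
```

The resulting list could then be passed where `valid_indexes.tolist()` is used in the original snippet.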

Call dls.one_batch() (or dls.show_batch()).

That message is only a warning: the DataLoader is still built, and calling one of these gives us the full exception, which helps narrow the problem down.

Thank you for the quick response. I get the following exception when calling show_batch()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-90634fcc3c9e> in <module>
----> 1 dls.show_batch()

c:\work\ml\fastai2\fastai2\data\core.py in show_batch(self, b, max_n, ctxs, show, **kwargs)
     88 
     89     def show_batch(self, b=None, max_n=9, ctxs=None, show=True, **kwargs):
---> 90         if b is None: b = self.one_batch()
     91         if not show: return self._pre_show_batch(b, max_n=max_n)
     92         show_batch(*self._pre_show_batch(b, max_n=max_n), ctxs=ctxs, max_n=max_n, **kwargs)

c:\work\ml\fastai2\fastai2\data\load.py in one_batch(self)
    128     def one_batch(self):
    129         if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')
--> 130         with self.fake_l.no_multiproc(): res = first(self)
    131         if hasattr(self, 'it'): delattr(self, 'it')
    132         return res

~\Anaconda3\envs\fastai2\lib\site-packages\fastcore\utils.py in first(x)
    174 def first(x):
    175     "First element of `x`, or None if missing"
--> 176     try: return next(iter(x))
    177     except StopIteration: return None
    178 

c:\work\ml\fastai2\fastai2\data\load.py in __iter__(self)
     95         self.randomize()
     96         self.before_iter()
---> 97         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
     98             if self.device is not None: b = to_device(b, self.device)
     99             yield self.after_batch(b)

~\Anaconda3\envs\fastai2\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
    343 
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1
    347         if self._dataset_kind == _DatasetKind.Iterable and \

~\Anaconda3\envs\fastai2\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
    383     def _next_data(self):
    384         index = self._next_index()  # may raise StopIteration
--> 385         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    386         if self._pin_memory:
    387             data = _utils.pin_memory.pin_memory(data)

~\Anaconda3\envs\fastai2\lib\site-packages\torch\utils\data\_utils\fetch.py in fetch(self, possibly_batched_index)
     32                 raise StopIteration
     33         else:
---> 34             data = next(self.dataset_iter)
     35         return self.collate_fn(data)
     36 

c:\work\ml\fastai2\fastai2\data\load.py in create_batches(self, samps)
    104         self.it = iter(self.dataset) if self.dataset is not None else None
    105         res = filter(lambda o:o is not None, map(self.do_item, samps))
--> 106         yield from map(self.do_batch, self.chunkify(res))
    107 
    108     def new(self, dataset=None, cls=None, **kwargs):

c:\work\ml\fastai2\fastai2\data\load.py in do_batch(self, b)
    125     def create_item(self, s):  return next(self.it) if s is None else self.dataset[s]
    126     def create_batch(self, b): return (fa_collate,fa_convert)[self.prebatched](b)
--> 127     def do_batch(self, b): return self.retain(self.create_batch(self.before_batch(b)), b)
    128     def one_batch(self):
    129         if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')

c:\work\ml\fastai2\fastai2\tabular\core.py in create_batch(self, b)
    303         super().__init__(dataset, bs=bs, shuffle=shuffle, after_batch=after_batch, num_workers=num_workers, **kwargs)
    304 
--> 305     def create_batch(self, b): return self.dataset.iloc[b]
    306 
    307 TabularPandas._dl_type = TabDataLoader

c:\work\ml\fastai2\fastai2\tabular\core.py in __getitem__(self, idxs)
     94         df = self.to.items
     95         if isinstance(idxs,tuple):
---> 96             rows,cols = idxs
     97             cols = df.columns.isin(cols) if is_listy(cols) else df.columns.get_loc(cols)
     98         else: rows,cols = idxs,slice(None)

ValueError: too many values to unpack (expected 2)

Are you using the released (pip) version of fastai2? Or the dev version? I ask as I had this issue earlier and it was due to a dependency mismatch

Yes, I am using the latest code from the repo, installed with:

pip install -e ".[dev]"

I just did a git pull.

That would do it (see the FAQ, where I bring this up).

You need the dev version of fastcore along with it.
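
A quick way to see which versions are actually installed in the active environment is to query the package metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration, and the package names are assumed to be the PyPI names for fastai2, fastai and fastcore.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report what the current environment sees for each package of interest.
for name in ("fastai2", "fastai", "fastcore"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

For an editable (`pip install -e`) checkout, the reported number is whatever the checkout declared when it was installed, so matching numbers may not prove that the two repos are in sync.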

This did the trick! I will make sure to read the FAQ carefully!

Thanks!

I recently (7-21-21) retired last year’s fastai environment. In my win10 environment I updated Anaconda and used conda to git fastai and create the new environment. It seems to be a clean install.

While running tabular I got the error:
‘Could not do one pass in your dataloader, there is something wrong in it.’

so I’m here. I checked and I have fastcore 1.3.20.

fastcore and nbdev were mentioned in issues/3159 but I am not sure they are connected. Should I be looking for nbdev? If not, any ideas for troubleshooting?

I appreciate your help!

When you get issues like that, immediately call dls.one_batch() (or whatever you’ve named your DataLoader object), and we can see the full stack trace of what’s up.

OK here it is:

TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 dls.show_batch()

~\fastai\NACC NBs\fastai\data\core.py in show_batch(self, b, max_n, ctxs, show, unique, **kwargs)
     98             old_get_idxs = self.get_idxs
     99             self.get_idxs = lambda: Inf.zeros
--> 100         if b is None: b = self.one_batch()
    101         if not show: return self._pre_show_batch(b, max_n=max_n)
    102         show_batch(*self._pre_show_batch(b, max_n=max_n), ctxs=ctxs, max_n=max_n, **kwargs)

~\fastai\NACC NBs\fastai\data\load.py in one_batch(self)
    146     def one_batch(self):
    147         if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')
--> 148         with self.fake_l.no_multiproc(): res = first(self)
    149         if hasattr(self, 'it'): delattr(self, 'it')
    150         return res

~\anaconda3\envs\fastai\lib\site-packages\fastcore\basics.py in first(x, f, negate, **kwargs)
    545     x = iter(x)
    546     if f: x = filter_ex(x, f=f, negate=negate, gen=True, **kwargs)
--> 547     return next(x, None)
    548 
    549 # Cell

~\fastai\NACC NBs\fastai\data\load.py in __iter__(self)
    109         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
    110             if self.device is not None: b = to_device(b, self.device)
--> 111             yield self.after_batch(b)
    112         self.after_iter()
    113         if hasattr(self, 'it'): del(self.it)

~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in __call__(self, o)
    198         self.fs = self.fs.sorted(key='order')
    199 
--> 200     def __call__(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
    201     def __repr__(self): return f"Pipeline: {' -> '.join([f.name for f in self.fs if f.name != 'noop'])}"
    202     def __getitem__(self,i): return self.fs[i]

~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in compose_tfms(x, tfms, is_enc, reverse, **kwargs)
    148     for f in tfms:
    149         if not is_enc: f = f.decode
--> 150         x = f(x, **kwargs)
    151     return x
    152 

~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in __call__(self, x, **kwargs)
    111     "A transform that always take tuples as items"
    112     _retain = True
--> 113     def __call__(self, x, **kwargs): return self._call1(x, '__call__', **kwargs)
    114     def decode(self, x, **kwargs): return self._call1(x, 'decode', **kwargs)
    115     def _call1(self, x, name, **kwargs):

~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in _call1(self, x, name, **kwargs)
    114     def decode(self, x, **kwargs): return self._call1(x, 'decode', **kwargs)
    115     def _call1(self, x, name, **kwargs):
--> 116         if not _is_tuple(x): return getattr(super(), name)(x, **kwargs)
    117         y = getattr(super(), name)(list(x), **kwargs)
    118         if not self._retain: return y

~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in __call__(self, x, **kwargs)
     71     @property
     72     def name(self): return getattr(self, '_name', _get_name(self))
---> 73     def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
     74     def decode (self, x, **kwargs): return self._call('decodes', x, **kwargs)
     75     def __repr__(self): return f'{self.name}:\nencodes: {self.encodes}decodes: {self.decodes}'

~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in _call(self, fn, x, split_idx, **kwargs)
     81     def _call(self, fn, x, split_idx=None, **kwargs):
     82         if split_idx!=self.split_idx and self.split_idx is not None: return x
---> 83         return self._do_call(getattr(self, fn), x, **kwargs)
     84 
     85     def _do_call(self, f, x, **kwargs):

~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in _do_call(self, f, x, **kwargs)
     87             if f is None: return x
     88             ret = f.returns(x) if hasattr(f,'returns') else None
---> 89             return retain_type(f(x, **kwargs), x, ret)
     90         res = tuple(self._do_call(f, x_, **kwargs) for x_ in x)
     91         return retain_type(res, x)

~\anaconda3\envs\fastai\lib\site-packages\fastcore\dispatch.py in __call__(self, *args, **kwargs)
    116         elif self.inst is not None: f = MethodType(f, self.inst)
    117         elif self.owner is not None: f = MethodType(f, self.owner)
--> 118         return f(*args, **kwargs)
    119 
    120     def __get__(self, inst, owner):

~\fastai\NACC NBs\fastai\tabular\core.py in encodes(self, to)
    325         else: res = (tensor(to.cats).long(),tensor(to.conts).float())
    326         ys = [n for n in to.y_names if n in to.items.columns]
--> 327         if len(ys) == len(to.y_names): res = res + (tensor(to.targ),)
    328         if to.device is not None: res = to_device(res, to.device)
    329         return res

~\fastai\NACC NBs\fastai\torch_core.py in tensor(x, *rest, **kwargs)
    131            else torch.tensor(x, **kwargs) if isinstance(x, (tuple,list))
    132            else _array2tensor(x) if isinstance(x, ndarray)
--> 133            else as_tensor(x.values, **kwargs) if isinstance(x, (pd.Series, pd.DataFrame))
    134            else as_tensor(x, **kwargs) if hasattr(x, '__array__') or is_iter(x)
    135            else _array2tensor(array(x), **kwargs))

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

Thank you again for your help.

Looks to be an issue with your targets, how are you setting up your TabularDataLoaders (or TabularPandas)?

Also, if you wrap the code in three backticks (`), it will format nicely (like you would with markdown):

this is some python code

I used the Rossmann example notebook, starting at the section named "Preparing full data set", but my data was like last year’s Adult example: pretty much ready to use.
I separated the data into train, test and validation sets.
I separated the continuous and categorical variables, and picked my dependent variable.

I hope I don’t screw this up again, but here is the code.

path = Config().data/'NACC'
train_df = pd.read_csv(path/'21V1_training')
test_df = pd.read_csv(path/'21V1_test')
len(train_df),len(test_df)
###Output: (117338, 40681)

procs=[FillMissing, Categorify, Normalize]
cat_names = ['NACCREAS', 'NACCREFR', ==abbreviated== 'ARTH', 'ARTYPE',]

cont_names = ['VISITYR', 'BIRTHMO', 'BIRTHYR', ==abbreviated== 'MINTPCNC']

dep_var = 'DECAGE'
df = train_df[cat_names + cont_names + [dep_var,'DECAGE']].copy()

test_df['DECAGE'].min(), test_df['DECAGE'].max()
###Output: (15.0, 999.0)

cut = train_df['DECAGE'][(train_df['DECAGE'] == train_df['DECAGE'][len(test_df)])].index.max()
cut
###Output: 78785

splits = (list(range(cut, len(train_df))),list(range(cut)))

train_df[dep_var].head()

train_df[dep_var] = np.log(train_df[dep_var])

##This gives a TypeError 
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
AttributeError: 'str' object has no attribute 'log'

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-49-cbf844be5123> in <module>
----> 1 train_df[dep_var] = np.log(train_df[dep_var])
      2 #train_df = train_df.iloc[:100000]

~\anaconda3\envs\fastai\lib\site-packages\pandas\core\generic.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
   1934         self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any
   1935     ):
-> 1936         return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
   1937 
   1938     # ideally we would define this to avoid the getattr checks, but

~\anaconda3\envs\fastai\lib\site-packages\pandas\core\arraylike.py in array_ufunc(self, ufunc, method, *inputs, **kwargs)
    356         # ufunc(series, ...)
    357         inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
--> 358         result = getattr(ufunc, method)(*inputs, **kwargs)
    359     else:
    360         # ufunc(dataframe)

TypeError: loop of ufunc does not support argument 0 of type str which has no callable log method

###The next runs ok
splits = (list(range(cut, len(train_df))),list(range(cut)))
%time to = TabularPandas(train_df, procs, cat_names, cont_names, dep_var, y_block=TransformBlock(), splits=splits)

%time to = TabularPandas(train_df, procs, cat_names, cont_names, dep_var, y_block=TransformBlock(), splits=splits)
###Wall time: 40 s
dls = to.dataloaders(bs=512, path=path)
dls.show_batch()
###This is when I got the obscure error I started with.
###This is also where I got the trace in my last note. 

Thanks again.

What does train_df[dep_var].dtype give you

0        888.0
1        888.0
2        888.0
3        888.0
4        888.0
         ...
40676    888.0
40677     71.0
40678    888.0
40679    888.0
40680     77.0
Name: DECAGE, Length: 40681, dtype: float64

Having looked at the data, again, about 42% of the DECAGE fields are 888.0, which means info is not-yet available. The other 58% range between 15 and 105. Thx!

When I run train_df[dep_var].dtype it gives me dtype: float64.
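
It may be worth double-checking the column on the frame that is actually passed to TabularPandas. The np.log failure earlier in the thread ('str' object has no attribute 'log') suggests that at least some DECAGE values in train_df are strings, and the output quoted above has 40681 rows, which is the length of test_df rather than train_df. A minimal sketch of how such a column could be inspected and cleaned with pandas follows; the toy values are made up, and treating 888 and 999 as "not available" codes is an assumption based on the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: DECAGE stored as text (object dtype).
train_df = pd.DataFrame(
    {"DECAGE": pd.Series(["71", "888", "77", "999", "n/a"], dtype=object)}
)
dep_var = "DECAGE"
MISSING_CODES = [888, 999]  # assumed sentinel values for "not available"

print(train_df[dep_var].dtype)  # object

# Convert to numbers; anything unparseable becomes NaN instead of raising.
target = pd.to_numeric(train_df[dep_var], errors="coerce")

# Turn the sentinel codes into NaN as well, then keep only rows with a real age.
target = target.mask(target.isin(MISSING_CODES))
clean_df = train_df.assign(**{dep_var: target}).dropna(subset=[dep_var]).copy()

# The log transform from the Rossmann-style code now runs on a float column.
clean_df[dep_var] = np.log(clean_df[dep_var])
print(clean_df[dep_var].dtype, len(clean_df))
```

Whether dropping the 888/999 rows is the right choice depends on the modelling goal; the sketch only shows one way to get a purely numeric target.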

Sorry to grind on about this, but I recreated the problem using last year’s Adult notebook. It’s 11 lines. This time, my data type is int64 instead of float64, but it appears either should work.

  1. from fastai.tabular.all import *
  2. import pandas as pd
  3. path = %pwd
  4. df = pd.read_csv('v1.csv')
  5. procs=[FillMissing, Categorify, Normalize]
  6. dep_var = 'DECAGE'
  7. cat_names = ['NACCREAS', 'NACCREFR', ==abbreviated== 'ARTH', 'ARTYPE',]
  8. cont_names = ['VISITYR', 'BIRTHMO', 'BIRTHYR', ==abbreviated== 'MINTPCNC']
  9. tdl = TabularDataLoaders.from_df(df.iloc[1000:1800].copy(), path=path)
    ==>error: Could not do one pass in your dataloader, there is something wrong in it
  10. tdl.one_batch()
    ==>error: TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
  11. df[dep_var].dtype
    ==> dtype('int64')

I think this is a problem with multi-label classification that is NOT one-hot encoded in fastai Tabular. It just doesn’t work. I think it’s a small bug somewhere, but I can’t figure it out. It doesn’t work the way the docs (Tabular core | fastai) claim it should.

Guatam-e, thanks. I agree that there is a flaw/bug somewhere. I had no problem whatsoever setting up and running the tabular model with a previous version (spring 2019) of fastai. I have to suspect Jeremy’s new fastai rewrite. I need to see what the variables are at line 325 in tabular\core.py.

The only way forward I can see is to activate fastai in Anaconda and use Visual Studio to run and debug.

Cataract treatment will keep me from being able to report any progress for a month or two.

I think you are supposed to drop/delete the target from the data you are predicting on:

test_df = df_nn_final.copy()
test_df.drop(['Radiation'], axis=1, inplace=True)

row, clas, probs = learn.predict(test_df.iloc[0])