I am trying fastai2 with tabular data, following the notebook examples closely but using my own data. It worked with v1, but with the exact same data I am getting errors in v2.
Here is my code:
group = df.groupby(['StoreID', 'QuestionID'], observed=True)
# Take the last question from the group for groups larger than 1
valid_indexes = group.tail(1)[(group.size() > 1).values].index
splits = IndexSplitter(valid_indexes.tolist())(range_of(df))
to = TabularPandas(df, procs, cat_names, cont_names, y_names=dep_var, splits=splits)
dls = to.dataloaders(bs=64)
But then I get the following error on the last line:
Could not do one pass in your dataloader, there is something wrong in it
Any easy way to investigate what is wrong with the dataloader?
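One thing worth checking up front (a guess on my part, not a confirmed cause): IndexSplitter treats the indices you give it as integer positions into range_of(df), while group.tail(1).index returns DataFrame index labels, so the two only line up when df has a default RangeIndex. A minimal sanity check, reusing the names from the snippet above:
from fastai.tabular.all import *   # or fastai2.tabular.all, depending on the install

# Assumption: the original index carries no information and can be discarded
df = df.reset_index(drop=True)

group = df.groupby(['StoreID', 'QuestionID'], observed=True)
valid_indexes = group.tail(1)[(group.size() > 1).values].index

# Positions must be in range, and the validation set should be neither empty nor everything
assert valid_indexes.isin(range(len(df))).all()
assert 0 < len(valid_indexes) < len(df)

splits = IndexSplitter(valid_indexes.tolist())(range_of(df))
print(len(splits[0]), len(splits[1]))  # train / valid sizes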
Thank you for the quick response. I get the following exception when calling show_batch()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-22-90634fcc3c9e> in <module>
----> 1 dls.show_batch()
c:\work\ml\fastai2\fastai2\data\core.py in show_batch(self, b, max_n, ctxs, show, **kwargs)
88
89 def show_batch(self, b=None, max_n=9, ctxs=None, show=True, **kwargs):
---> 90 if b is None: b = self.one_batch()
91 if not show: return self._pre_show_batch(b, max_n=max_n)
92 show_batch(*self._pre_show_batch(b, max_n=max_n), ctxs=ctxs, max_n=max_n, **kwargs)
c:\work\ml\fastai2\fastai2\data\load.py in one_batch(self)
128 def one_batch(self):
129 if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')
--> 130 with self.fake_l.no_multiproc(): res = first(self)
131 if hasattr(self, 'it'): delattr(self, 'it')
132 return res
~\Anaconda3\envs\fastai2\lib\site-packages\fastcore\utils.py in first(x)
174 def first(x):
175 "First element of `x`, or None if missing"
--> 176 try: return next(iter(x))
177 except StopIteration: return None
178
c:\work\ml\fastai2\fastai2\data\load.py in __iter__(self)
95 self.randomize()
96 self.before_iter()
---> 97 for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
98 if self.device is not None: b = to_device(b, self.device)
99 yield self.after_batch(b)
~\Anaconda3\envs\fastai2\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
343
344 def __next__(self):
--> 345 data = self._next_data()
346 self._num_yielded += 1
347 if self._dataset_kind == _DatasetKind.Iterable and \
~\Anaconda3\envs\fastai2\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
383 def _next_data(self):
384 index = self._next_index() # may raise StopIteration
--> 385 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
386 if self._pin_memory:
387 data = _utils.pin_memory.pin_memory(data)
~\Anaconda3\envs\fastai2\lib\site-packages\torch\utils\data\_utils\fetch.py in fetch(self, possibly_batched_index)
32 raise StopIteration
33 else:
---> 34 data = next(self.dataset_iter)
35 return self.collate_fn(data)
36
c:\work\ml\fastai2\fastai2\data\load.py in create_batches(self, samps)
104 self.it = iter(self.dataset) if self.dataset is not None else None
105 res = filter(lambda o:o is not None, map(self.do_item, samps))
--> 106 yield from map(self.do_batch, self.chunkify(res))
107
108 def new(self, dataset=None, cls=None, **kwargs):
c:\work\ml\fastai2\fastai2\data\load.py in do_batch(self, b)
125 def create_item(self, s): return next(self.it) if s is None else self.dataset[s]
126 def create_batch(self, b): return (fa_collate,fa_convert)[self.prebatched](b)
--> 127 def do_batch(self, b): return self.retain(self.create_batch(self.before_batch(b)), b)
128 def one_batch(self):
129 if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')
c:\work\ml\fastai2\fastai2\tabular\core.py in create_batch(self, b)
303 super().__init__(dataset, bs=bs, shuffle=shuffle, after_batch=after_batch, num_workers=num_workers, **kwargs)
304
--> 305 def create_batch(self, b): return self.dataset.iloc[b]
306
307 TabularPandas._dl_type = TabDataLoader
c:\work\ml\fastai2\fastai2\tabular\core.py in __getitem__(self, idxs)
94 df = self.to.items
95 if isinstance(idxs,tuple):
---> 96 rows,cols = idxs
97 cols = df.columns.isin(cols) if is_listy(cols) else df.columns.get_loc(cols)
98 else: rows,cols = idxs,slice(None)
ValueError: too many values to unpack (expected 2)
I recently (7-21-21) retired last year’s fastai environment. In my win10 environment I updated Anaconda and used conda to get fastai and create the new environment. It seems to be a clean install.
While running tabular I got the error:
‘Could not do one pass in your dataloader, there is something wrong in it.’
so I’m here. I checked and I have fastcore 1.3.20.
fastcore and nbdev were mentioned in issues/3159 but I am not sure they are connected. Should I be looking for nbdev? If not, any ideas for troubleshooting?
When you get issues like that, immediately call dls.one_batch() (or whatever you’ve named your DataLoader object), and we can see the full stack trace of what’s up
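For example, something like this right after building the DataLoaders (a minimal sketch; dls is whatever your to.dataloaders(...) call returned):
# one_batch() raises the underlying exception directly instead of the vague
# "Could not do one pass in your dataloader" message, so the real culprit shows up
xb = dls.one_batch()            # dls delegates to dls.train; dls.valid.one_batch() also works
print([type(o) for o in xb])    # inspect what the batch actually contains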
TypeError Traceback (most recent call last)
&lt;ipython-input-...&gt; in &lt;module&gt;
----> 1 dls.show_batch()
~\fastai\NACC NBs\fastai\data\core.py in show_batch(self, b, max_n, ctxs, show, unique, **kwargs)
98 old_get_idxs = self.get_idxs
99 self.get_idxs = lambda: Inf.zeros
--> 100 if b is None: b = self.one_batch()
101 if not show: return self._pre_show_batch(b, max_n=max_n)
102 show_batch(*self._pre_show_batch(b, max_n=max_n), ctxs=ctxs, max_n=max_n, **kwargs)
~\fastai\NACC NBs\fastai\data\load.py in one_batch(self)
146 def one_batch(self):
147 if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')
--> 148 with self.fake_l.no_multiproc(): res = first(self)
149 if hasattr(self, 'it'): delattr(self, 'it')
150 return res
~\anaconda3\envs\fastai\lib\site-packages\fastcore\basics.py in first(x, f, negate, **kwargs)
545 x = iter(x)
546 if f: x = filter_ex(x, f=f, negate=negate, gen=True, **kwargs)
--> 547 return next(x, None)
548
549 # Cell
~\fastai\NACC NBs\fastai\data\load.py in __iter__(self)
109 for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
110 if self.device is not None: b = to_device(b, self.device)
--> 111 yield self.after_batch(b)
112 self.after_iter()
113 if hasattr(self, 'it'): del(self.it)
~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in __call__(self, o)
198 self.fs = self.fs.sorted(key='order')
199
--> 200 def __call__(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
201 def __repr__(self): return f"Pipeline: {' -> '.join([f.name for f in self.fs if f.name != 'noop'])}"
202 def __getitem__(self,i): return self.fs[i]
~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in compose_tfms(x, tfms, is_enc, reverse, **kwargs)
148 for f in tfms:
149 if not is_enc: f = f.decode
--> 150 x = f(x, **kwargs)
151 return x
152
~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in _call(self, fn, x, split_idx, **kwargs)
81 def _call(self, fn, x, split_idx=None, **kwargs):
82 if split_idx!=self.split_idx and self.split_idx is not None: return x
---> 83 return self._do_call(getattr(self, fn), x, **kwargs)
84
85 def _do_call(self, f, x, **kwargs):
~\anaconda3\envs\fastai\lib\site-packages\fastcore\transform.py in _do_call(self, f, x, **kwargs)
87 if f is None: return x
88 ret = f.returns(x) if hasattr(f,'returns') else None
---> 89 return retain_type(f(x, **kwargs), x, ret)
90 res = tuple(self._do_call(f, x_, **kwargs) for x_ in x)
91 return retain_type(res, x)
~\anaconda3\envs\fastai\lib\site-packages\fastcore\dispatch.py in __call__(self, *args, **kwargs)
116 elif self.inst is not None: f = MethodType(f, self.inst)
117 elif self.owner is not None: f = MethodType(f, self.owner)
--> 118 return f(*args, **kwargs)
119
120 def __get__(self, inst, owner):
~\fastai\NACC NBs\fastai\tabular\core.py in encodes(self, to)
325 else: res = (tensor(to.cats).long(),tensor(to.conts).float())
326 ys = [n for n in to.y_names if n in to.items.columns]
--> 327 if len(ys) == len(to.y_names): res = res + (tensor(to.targ),)
328 if to.device is not None: res = to_device(res, to.device)
329 return res
~\fastai\NACC NBs\fastai\torch_core.py in tensor(x, *rest, **kwargs)
131 else torch.tensor(x, **kwargs) if isinstance(x, (tuple,list))
132 else _array2tensor(x) if isinstance(x, ndarray)
--> 133 else as_tensor(x.values, **kwargs) if isinstance(x, (pd.Series, pd.DataFrame))
134 else as_tensor(x, **kwargs) if hasattr(x, '__array__') or is_iter(x)
135 else _array2tensor(array(x), **kwargs))
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
I used the Rossmann example notebook, starting at the section named “Preparing full data set”, but my data was like last year’s Adult example: pretty much ready to use.
I separated the data into train, test, and validation sets.
I separated the continuous and categorical variables, and picked my dependent variable.
I hope I don’t screw this up again, but here is the code.
path = Config().data/'NACC'
train_df = pd.read_csv(path/'21V1_training')
test_df = pd.read_csv(path/'21V1_test')
len(train_df),len(test_df)
###Output: (117338, 40681)
procs=[FillMissing, Categorify, Normalize]
cat_names = ['NACCREAS', 'NACCREFR', ==abbreviated== 'ARTH', 'ARTYPE',]
cont_names = ['VISITYR', 'BIRTHMO', 'BIRTHYR', ==abbreviated== 'MINTPCNC']
dep_var = 'DECAGE'
df = train_df[cat_names + cont_names + [dep_var]].copy()
test_df['DECAGE'].min(), test_df['DECAGE'].max()
###Output: (15.0, 999.0)
cut = train_df['DECAGE'][(train_df['DECAGE'] == train_df['DECAGE'][len(test_df)])].index.max()
cut
###Output: 78785
splits = (list(range(cut, len(train_df))),list(range(cut)))
train_df[dep_var].head()
train_df[dep_var] = np.log(train_df[dep_var])
##This gives a TypeError
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
AttributeError: 'str' object has no attribute 'log'
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-49-cbf844be5123> in <module>
----> 1 train_df[dep_var] = np.log(train_df[dep_var])
2 #train_df = train_df.iloc[:100000]
~\anaconda3\envs\fastai\lib\site-packages\pandas\core\generic.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
1934 self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any
1935 ):
-> 1936 return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
1937
1938 # ideally we would define this to avoid the getattr checks, but
~\anaconda3\envs\fastai\lib\site-packages\pandas\core\arraylike.py in array_ufunc(self, ufunc, method, *inputs, **kwargs)
356 # ufunc(series, ...)
357 inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
--> 358 result = getattr(ufunc, method)(*inputs, **kwargs)
359 else:
360 # ufunc(dataframe)
TypeError: loop of ufunc does not support argument 0 of type str which has no callable log method
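The 'str' object has no attribute 'log' part of that trace suggests DECAGE is stored as strings (object dtype) rather than numbers. A hedged check and fix before taking the log (pd.to_numeric with errors='coerce' is my suggestion, not something from the notebook):
import numpy as np
import pandas as pd

print(train_df[dep_var].dtype)                      # likely 'object' if strings slipped in
print(train_df[dep_var].map(type).value_counts())   # which Python types are in the column

# Coerce to numeric; unparseable values become NaN so they can be inspected or dropped
train_df[dep_var] = pd.to_numeric(train_df[dep_var], errors='coerce')
print(train_df[dep_var].isna().sum(), 'rows could not be parsed')

train_df[dep_var] = np.log(train_df[dep_var])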
###The next runs ok
splits = (list(range(cut, len(train_df))),list(range(cut)))
%time to = TabularPandas(train_df, procs, cat_names, cont_names, dep_var, y_block=TransformBlock(), splits=splits)
###Wall time: 40 s
dls = to.dataloaders(bs=512, path=path)
dls.show_batch()
###This is when I got the obscure error I started with.
###This is also where I got the trace in my last note.
Having looked at the data again: about 42% of the DECAGE fields are 888.0, which means the info is not yet available. The other 58% range between 15 and 105. Thanks!
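If 888.0 is really a “not yet available” code, it might be worth turning it into a proper missing value (or dropping those rows) before the log transform and the TabularPandas call; a rough sketch, with the 888.0 code taken from the description above:
import numpy as np

# Treat the 888.0 sentinel as missing rather than as a real age
train_df.loc[train_df[dep_var] == 888.0, dep_var] = np.nan

# For a regression target the simplest option is to drop those rows;
# FillMissing only fills missing values in the continuous inputs, not in y
train_df = train_df.dropna(subset=[dep_var]).reset_index(drop=True)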
Sorry to grind on about this, but I recreated the problem using last year’s Adult notebook. It’s 11 lines. This time, my data type is int64 instead of float64, but it appears either should work.
tdl = TabularDataLoaders.from_df(df.iloc[1000:1800].copy(), path=path)
==>error: Could not do one pass in your dataloader, there is something wrong in it
tdl.one_batch()
==>error: TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
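For what it’s worth, calling from_df with just a DataFrame and a path leaves fastai to guess the role of every column. A more explicit call might at least move the error somewhere more informative; this is only a sketch, reusing cat_names / cont_names / dep_var from the earlier post, and RegressionBlock is my assumption for a continuous DECAGE target:
tdl = TabularDataLoaders.from_df(df.iloc[1000:1800].copy(), path=path,
                                 procs=[Categorify, FillMissing, Normalize],
                                 cat_names=cat_names, cont_names=cont_names,
                                 y_names=dep_var,
                                 y_block=RegressionBlock(),  # assumption: regression target
                                 bs=64)
tdl.one_batch()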
I think this is a problem with multi-label classification that is NOT one-hot-encoded and Tabular in fastai. It just doesn’t work. I think it’s a small bug somewhere, but I can’t figure it out. It doesn’t work the way the docs claim it should: Tabular core | fastai
Guatam-e, thanks. I agree that there is a flaw/bug somewhere. I had no problem whatsoever setting up and running the tabular model with a previous version (spring 2019) of fastai, so I have to suspect Jeremy’s new fastai rewrite. I need to see what the variables are at line 325 in tabular\core.py.
The only way forward I can see is to activate fastai in Anaconda and use Visual Studio to run and debug.
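Before wiring up a full Visual Studio debug session, the failing frame can usually be inspected straight from the notebook with IPython’s post-mortem debugger; a sketch (the attributes to look at are the ones used in the encodes() frame shown in the traceback above):
# Run this in the cell right after the failing dls.show_batch() / one_batch() call;
# it drops into the frame where the exception was raised.
%debug
# At the ipdb prompt, 'u' moves up the stack to the encodes() frame in tabular/core.py,
# then inspect the objects the failing line uses, e.g.:
#   p to.y_names
#   p to.targ
#   p type(to.targ)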
Cataract treatment will keep me from being able to report any progress for a month or two.