Fastai v2 tabular

Adults works fine, but MSELossFlat raises RuntimeError: bool value of Tensor with more than one value is ambiguous. I did a quick %debug, but it’s not immediately obvious to me why. It fails on the line "if size_average and reduce:" in torch/nn/_reduction.py. I’m not sure what ‘size_average’ refers to, but at that point it’s a tensor with many different values, some of them negative (min value -0.1936, max value 0.4885).

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/loss.py in __init__(self, size_average, reduce, reduction)
    426 
    427     def __init__(self, size_average=None, reduce=None, reduction='mean'):
--> 428         super(MSELoss, self).__init__(size_average, reduce, reduction)
    429 
    430     def forward(self, input, target):

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/loss.py in __init__(self, size_average, reduce, reduction)
     10         super(_Loss, self).__init__()
     11         if size_average is not None or reduce is not None:
---> 12             self.reduction = _Reduction.legacy_get_string(size_average, reduce)
     13         else:
     14             self.reduction = reduction

~/anaconda3/lib/python3.7/site-packages/torch/nn/_reduction.py in legacy_get_string(size_average, reduce, emit_warning)
     34         reduce = True
     35 
---> 36     if size_average and reduce:
     37         ret = 'mean'
     38     elif reduce:

RuntimeError: bool value of Tensor with more than one value is ambiguous

First, are we sure that it’s MSELossFlat() you’re passing in? If so, the next step would be to calculate it manually with two of your y’s (one from the model and one from your ground truth).
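
As a rough illustration of that manual check, here is a minimal sketch in plain Python. The helper name manual_mse and the sample numbers are hypothetical, not taken from the notebook; in practice you would plug in a couple of real predictions and targets.

```python
def manual_mse(preds, targets):
    """Mean squared error over two equal-length sequences of numbers."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

# Hypothetical values: two model outputs and their ground-truth targets.
preds = [0.25, 0.75]
targets = [0.0, 1.0]
print(manual_mse(preds, targets))
```

If the hand-computed value looks sensible but the loss function still raises, the problem is more likely in how the loss is constructed or passed than in the data.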

Yeah, it’s definitely MSELossFlat.

~/git/fastai2/nbs/mine/fastai2/layers.py in MSELossFlat(axis, floatify, *args, **kwargs)
    313 def MSELossFlat(*args, axis=-1, floatify=True, **kwargs):
    314     "Same as `nn.MSELoss`, but flattens input and target."
--> 315     return BaseLoss(nn.MSELoss, *args, axis=axis, floatify=floatify, is_2d=False, **kwargs)
    316 
    317 # Cell

@travis a few things I’m noticing in your DataBunch creation. First, you’re not specifying a regression problem, so most likely it’s defaulting to a classification problem. To make it a regression problem, add block_y = TransformBlock() (or RegressionBlock) to your call to TabularPandas. Second, you’re also attempting to use accuracy, which is meant for classification problems. Try that and see if it solves your issue :slight_smile:

The notebook I’m looking at is the Rossmann notebook here:

I’m actually treating it as a classification problem, as either a win or a loss. The targets are all 0’s & 1’s. And by the way, this same technique worked fine last year in Fastai v1. I’m just trying to convert it over to v2.

In that case you should be using CrossEntropyLoss instead, as MSELossFlat is meant for regression problems (hence our issue, I think): the model is actually outputting 2 values (probability of 0 and 1), whereas MSE expects just one (our one value). Will it run with CrossEntropy? (And you were using MSE in v1?)
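
To make the shape difference concrete, here is a minimal sketch in plain Python of what a cross-entropy loss does with two outputs per row. The function names and the sample logits are hypothetical, and this is a simplified stand-in rather than fastai's or PyTorch's implementation.

```python
import math

def softmax(logits):
    """Turn raw model outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[target_class])

# Hypothetical row: two raw outputs (class 0, class 1), true label is 1.
logits = [0.5, 1.5]
print(softmax(logits))
print(cross_entropy(logits, 1))
```

The target here is a single class index while the model output has two entries, which is the mismatch an MSE-style loss would trip over.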

Unfortunately, cross entropy is not working either. That’s what was throwing the error originally. I tried sklearn log_loss, BCELossFlat, and torch.nn.functional.binary_cross_entropy. Each one throws TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

No, sklearn log_loss is what I used. I should have known MSE wouldn’t work, but didn’t think it through. I only know enough about all this to be dangerous. :grinning:

Well, actually, I just looked back at my notebook from last year. I don’t see where I defined a loss function. I guess tabular learner inferred it. It wasn’t able to do that this time, so I supplied log_loss, which threw the error.

Anyway, thanks @muellerzr for your help.

Said fix was pushed the other day btw :slight_smile:

Thanks a lot @muellerzr & @nestorDemeure,

SHAP for FASTAI Tabular Regression is working.
Others can find the plots in this GitHub Gist of the Colab notebook.

I just installed the latest version of fastai2 and am trying to run the notebooks for tabular data, but they fail.

One of the places is:

dls = to.dataloaders()
dls.valid.show_batch()

It seems that one_batch() is called and makes a sub-call with a variable “b” of type “tuple”.
The following lines are then called to extract the rows from the dataframe:

class _TabIloc:
    "Get/set rows by iloc and cols by name"
    def __init__(self,to): self.to = to
    def __getitem__(self, idxs):
        df = self.to.items
        if isinstance(idxs,tuple):
            rows,cols = idxs

So the problem seems to be that idxs, a tuple with one item (row 0), is being unpacked into rows and cols.
Does anyone else have this problem, or did I fail to install fastai2 properly?
If this is a bug, what’s the correct way to fix it? Should the “b” variable be another type, or should the code check whether the tuple has length 1 or 2 and unpack rows and cols accordingly?
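
The unpacking behaviour in question can be reproduced without fastai2 at all. The sketch below is a simplified stand-in for the index handling quoted above (the function name split_index is hypothetical and this is not fastai2 code); it only shows how a tuple of row indices ends up in the (rows, cols) branch.

```python
def split_index(idxs):
    """Simplified stand-in for the quoted index handling (not fastai2 code)."""
    if isinstance(idxs, tuple):
        rows, cols = idxs  # assumes the tuple is exactly (rows, cols)
    else:
        rows, cols = idxs, slice(None)  # anything else is treated as rows only
    return rows, cols

print(split_index((slice(None), "salary")))  # a (rows, cols) pair works
print(split_index([3, 1, 2]))                # a list is treated as rows only

try:
    split_index((3, 1, 2))                   # a tuple of row indices
except ValueError as err:
    print("ValueError:", err)
```

Note that a length check alone would not fully resolve the ambiguity, since a tuple of exactly two row indices would still look like a (rows, cols) pair.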

@dangraf can you tell us the error code and what you did to set up your TabularPandas?

I installed fastai v2 by creating a new environment, installing PyTorch (conda install -c pytorch pytorch) to get version 1.4, cloning the git repo, and then running pip install -e ."[dev]" to install fastai2.

After that, I opened up notebook 40 and started to run the cells from top down.

The error code I got is the following:

Could not do one pass in your dataloader, there is something wrong in it


ValueError                                Traceback (most recent call last)
<ipython-input-...> in <module>
      1 dls = to.dataloaders()
----> 2 dls.valid.show_batch()

c:\gitrepo\fastai2\fastai2\data\core.py in show_batch(self, b, max_n, ctxs, show, **kwargs)
     88
     89     def show_batch(self, b=None, max_n=9, ctxs=None, show=True, **kwargs):
---> 90         if b is None: b = self.one_batch()
     91         if not show: return self._pre_show_batch(b, max_n=max_n)
     92         show_batch(*self._pre_show_batch(b, max_n=max_n), ctxs=ctxs, max_n=max_n, **kwargs)

c:\gitrepo\fastai2\fastai2\data\load.py in one_batch(self)
    128     def one_batch(self):
    129         if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')
--> 130         with self.fake_l.no_multiproc(): res = first(self)
    131         if hasattr(self, 'it'): delattr(self, 'it')
    132         return res

C:\ProgramData\Anaconda3\envs\cryptopred\lib\site-packages\fastcore\utils.py in first(x)
    174 def first(x):
    175     "First element of `x`, or None if missing"
--> 176     try: return next(iter(x))
    177     except StopIteration: return None
    178

c:\gitrepo\fastai2\fastai2\data\load.py in __iter__(self)
     95         self.randomize()
     96         self.before_iter()
---> 97         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
     98             if self.device is not None: b = to_device(b, self.device)
     99             yield self.after_batch(b)

C:\ProgramData\Anaconda3\envs\cryptopred\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
    343
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1
    347         if self._dataset_kind == _DatasetKind.Iterable and \

C:\ProgramData\Anaconda3\envs\cryptopred\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
    383     def _next_data(self):
    384         index = self._next_index()  # may raise StopIteration
--> 385         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    386         if self._pin_memory:
    387             data = _utils.pin_memory.pin_memory(data)

C:\ProgramData\Anaconda3\envs\cryptopred\lib\site-packages\torch\utils\data\_utils\fetch.py in fetch(self, possibly_batched_index)
     32                 raise StopIteration
     33         else:
---> 34             data = next(self.dataset_iter)
     35         return self.collate_fn(data)
     36

c:\gitrepo\fastai2\fastai2\data\load.py in create_batches(self, samps)
    104         self.it = iter(self.dataset) if self.dataset is not None else None
    105         res = filter(lambda o:o is not None, map(self.do_item, samps))
--> 106         yield from map(self.do_batch, self.chunkify(res))
    107
    108     def new(self, dataset=None, cls=None, **kwargs):

c:\gitrepo\fastai2\fastai2\data\load.py in do_batch(self, b)
    125     def create_item(self, s):  return next(self.it) if s is None else self.dataset[s]
    126     def create_batch(self, b): return (fa_collate,fa_convert)[self.prebatched](b)
--> 127     def do_batch(self, b): return self.retain(self.create_batch(self.before_batch(b)), b)
    128     def one_batch(self):
    129         if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')

<ipython-input-...> in create_batch(self, b)
      7         super().__init__(dataset, bs=bs, shuffle=shuffle, after_batch=after_batch, num_workers=num_workers, **kwargs)
      8
----> 9     def create_batch(self, b): return self.dataset.iloc[b]
     10
     11 TabularPandas._dl_type = TabDataLoader

<ipython-input-...> in __getitem__(self, idxs)
      6         df = self.to.items
      7         if isinstance(idxs,tuple):
----> 8             rows,cols = idxs
      9             cols = df.columns.isin(cols) if is_listy(cols) else df.columns.get_loc(cols)
     10         else: rows,cols = idxs,slice(None)

ValueError: too many values to unpack (expected 2)

Is there any way to use Tabular as a TransformBlock in DataBlock API? Like using it with other types of data (image,mask,etc.)

No, it’s independent of the other blocks. Tabular is there to preprocess dataframes and create batches from them. It only supports a y_block for targets. There will be a more modular block, but since multimodal settings were not a priority in development, it’s not ready yet.

Does anyone have input on the error above? I just tried installing fastai2 on a Linux machine (previously it was Windows) using the environment.yml file, and I get the same error.

@dangraf I can’t recreate the error in Colab on the regular install (I haven’t tried dev yet).

Edit: Okay, now I can. It’s a bug inside of the dev version.

@sgugger it seems to come from the fact that idxs is a very long tuple of row indices when fetching a batch, whereas simply constructing a TabularPandas passes something different. The root of the bug is in _TabIloc; I put a print statement on idxs like so:

#export
class _TabIloc:
    "Get/set rows by iloc and cols by name"
    def __init__(self,to): self.to = to
    def __getitem__(self, idxs):
        df = self.to.items
        print(idxs)
        if isinstance(idxs,tuple):
            rows,cols = idxs
            cols = df.columns.isin(cols) if is_listy(cols) else df.columns.get_loc(cols)
        else: rows,cols = idxs,slice(None)
        return self.to.new(df.iloc[rows, cols])

What is expected when just doing a TabularPandas:

to = TabularPandas(df_main, procs, cat_names, cont_names, y_names="salary", splits=splits)
(slice(None, None, None), 'workclass')
(slice(None, None, None), 'education')
(slice(None, None, None), 'marital-status')
(slice(None, None, None), 'occupation')
(slice(None, None, None), 'relationship')
(slice(None, None, None), 'race')
(slice(None, None, None), 'age_na')
(slice(None, None, None), 'fnlwgt_na')
(slice(None, None, None), 'education-num_na')
(slice(None, None, None), 'salary')

Behavior on dls.one_batch():

(1829, 6000, 4754, 3678, 823, 4682, 3525, 3136, 4430, 6376, 3077, 5487, 4382, 1594, 3501, 4306, 258, 7924, 6271, 7174, 5970, 1363, 7407, 4908, 2201, 7369, 3305, 7116, 499, 4439, 5406, 4046, 3743, 6204, 639, 1232, 3675, 256, 5134, 4411, 7563, 6902, 5661, 3314, 1243, 5573, 3327, 750, 6232, 3363, 2840, 5906, 4775, 7995, 4008, 3089, 7674, 4214, 5414, 5955, 7726, 3045, 7570, 3432)

ValueError                                Traceback (most recent call last)
<ipython-input-90-ccb93b9fbe07> in <module>()
----> 1 dls.one_batch()

9 frames
/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in one_batch(self)
    128     def one_batch(self):
    129         if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')
--> 130         with self.fake_l.no_multiproc(): res = first(self)
    131         if hasattr(self, 'it'): delattr(self, 'it')
    132         return res

/usr/local/lib/python3.6/dist-packages/fastcore/utils.py in first(x)
    174 def first(x):
    175     "First element of `x`, or None if missing"
--> 176     try: return next(iter(x))
    177     except StopIteration: return None
    178 

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in __iter__(self)
     95         self.randomize()
     96         self.before_iter()
---> 97         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
     98             if self.device is not None: b = to_device(b, self.device)
     99             yield self.after_batch(b)

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    343 
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1
    347         if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
    383     def _next_data(self):
    384         index = self._next_index()  # may raise StopIteration
--> 385         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    386         if self._pin_memory:
    387             data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     32                 raise StopIteration
     33         else:
---> 34             data = next(self.dataset_iter)
     35         return self.collate_fn(data)
     36 

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in create_batches(self, samps)
    104         self.it = iter(self.dataset) if self.dataset is not None else None
    105         res = filter(lambda o:o is not None, map(self.do_item, samps))
--> 106         yield from map(self.do_batch, self.chunkify(res))
    107 
    108     def new(self, dataset=None, cls=None, **kwargs):

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in do_batch(self, b)
    125     def create_item(self, s):  return next(self.it) if s is None else self.dataset[s]
    126     def create_batch(self, b): return (fa_collate,fa_convert)[self.prebatched](b)
--> 127     def do_batch(self, b): return self.retain(self.create_batch(self.before_batch(b)), b)
    128     def one_batch(self):
    129         if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')

<ipython-input-46-cad3c12e3ff5> in create_batch(self, b)
      6         super().__init__(dataset, bs=bs, shuffle=shuffle, after_batch=after_batch, num_workers=num_workers, **kwargs)
      7 
----> 8     def create_batch(self, b): return self.dataset.iloc[b]
      9 
     10 TabularPandas._dl_type = TabDataLoader

<ipython-input-52-8847abb6a04f> in __getitem__(self, idxs)
      6         print(idxs)
      7         if isinstance(idxs,tuple):
----> 8             rows,cols = idxs
      9             cols = df.columns.isin(cols) if is_listy(cols) else df.columns.get_loc(cols)
     10         else: rows,cols = idxs,slice(None)

ValueError: too many values to unpack (expected 2)

Hope that helps with debugging :slight_smile: (as I’m unsure what to do here)

It’s all running fine for me, so I think the problem is not having the dev install of fastcore to go with fastai2.
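
One quick way to see which versions are actually installed in the active environment is to query the package metadata. This is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is hypothetical, and a version string alone may not tell you whether a package is an editable dev install or a regular release.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Report both packages side by side so a mismatch is easy to spot.
for name in ("fastcore", "fastai2"):
    print(name, installed_version(name))
```

If fastai2 was installed from a git clone but fastcore came from a regular release, that mismatch is worth ruling out before digging further into the traceback.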

Shoot… that was totally what was going on… my bad!

@dangraf there you go :slight_smile:

Considering this is a common issue, I added it to the FAQ as well
