Prediction of a scalar with a CNN

redturtle · January 7, 2019, 6:43pm

Would you be willing to provide an example of predicting scalars from images using the standard ImageDataBunch.from_df()?

I have a very similar problem. I have a data frame with two columns, the first “filename” and the second “age”, saved as a string object and int64 respectively.

When I run the following code:

data = ImageDataBunch.from_df(path=path, df=df, size=224, bs=64)
arch = models.resnet34
learn = create_cnn(data, arch, metrics=MSELossFlat)
lr = 5e-2
learn.fit_one_cycle(1, slice(lr))

I get the same error message as jamesp:
RuntimeError: bool value of Tensor with more than one value is ambiguous

EDIT: I think I figured it out. This blog post/code are very helpful:

The key is to use ‘label_cls=FloatList’ when constructing the labels. This tells fastai to expect a regression problem, e.g.:

data = (ImageItemList.from_csv(path, 'csv_file', folder='', suffix='')
        .random_split_by_pct(0.2)
        .label_from_df(cols=1, label_cls=FloatList)
        .transform(tfms, size=224)
        .databunch())
data.normalize(imagenet_stats)

dthiagarajan · January 9, 2019, 3:40am

Could anyone provide some insight on how this might be extended to predict multiple scalar outputs? I’m having trouble figuring out a way to do this.

There is another post with the same question, and it is also unresolved.

sgugger · January 9, 2019, 2:18pm

If you pass a list instead of 1 to cols, it should work.
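
To make the suggestion concrete, here is a minimal, library-free sketch of what selecting several label columns amounts to: each row yields a vector of floats instead of a single number. The CSV contents and the helper name `float_targets` are made up for illustration; this is not how fastai implements it internally.

```python
import csv
import io

# Hypothetical CSV: the first column is the item, the next three are float targets.
CSV_TEXT = """name,t1,t2,t3
a.jpg,0.1,0.5,0.9
b.jpg,0.3,0.2,0.7
"""

def float_targets(csv_text, cols):
    """Return one list of floats per row, taken from the column indices in `cols`."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return [[float(row[c]) for c in cols] for row in reader]

single = float_targets(CSV_TEXT, [1])        # one scalar per item
multi = float_targets(CSV_TEXT, [1, 2, 3])   # three scalars per item
print(single)
print(multi)
```

With three target columns, the model's final layer would presumably need three outputs, one per column.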

Lankinen · January 9, 2019, 2:37pm

It is not working for my problem. I needed to change a lot of code, and I think it now no longer works for other problems. I will post the code here once everything works.

sgugger · January 9, 2019, 4:10pm

Should be fixed in master now.

Lankinen · January 9, 2019, 4:58pm

I’m actually using this for text data, so I’m not sure whether this is related, but I got this error when I tried to train the model.

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'other'

targs: tensor([[0., 0., 1.],
        [0., 1., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [0., 1., 0.]], device='cuda:0')
input: tensor([[2],
        [2],
        [0],
        [0],
        [0],
        [0]], device='cuda:0')

Targets can be anything from 0 to 1 and I need three different values.

data = (TextList.from_csv('.', 'text_data_with_three_targets.csv', vocab=data.vocab)
             .random_split_by_pct()
             .label_from_df(cols=[1,2,3],label_cls=FloatList)
             .databunch(bs=32,no_check=True))

dthiagarajan · January 9, 2019, 5:28pm

When I try that, I run into the following error when I try to create my DataBunch:

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: double, float, float16, int64, int32, and uint8.

This occurs when the summary (repr) of the DataBunch is about to be printed, specifically in the tensor() method, where there still seems to be an issue, as noted in this line:
# XXX: Pytorch bug in dataloader using num_workers>0; TODO: create repro and report

How can I get around this?

sgugger · January 9, 2019, 7:53pm

It’s hard to help without the rest of your code and the full error message.

Lankinen · January 9, 2019, 8:18pm

I will provide those tomorrow and I will also try to debug it by myself.

dthiagarajan · January 9, 2019, 11:28pm

I’ve put my example here, with the error message shown in the 4th cell.

sgugger · January 9, 2019, 11:36pm

You don’t have the latest code, so it’s normal it doesn’t work.

Lankinen · January 10, 2019, 6:41am

path = Path('/path/to/texts')
data = (TextList.from_folder(path,extensions='.txt')
            .random_split_by_pct(0.1)
            .label_for_lm()
            .databunch(bs=32))
learn = language_model_learner(data, pretrained_model=URLs.WT103_1, drop_mult=0.3)
learn.load(Path('/path/to/fine_tuned/language_model'))

data2 = (TextList.from_csv('.', 'example.csv', vocab=data.vocab)
             .random_split_by_pct()
             .label_from_df(cols=[1,2,3],label_cls=FloatList)
             .databunch(bs=32))

learn = text_classifier_learner(data2,drop_mult=0.5)
learn.load_encoder('/path/to/fine_tuned_enc')
learn.freeze()
learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))

Error message:
RuntimeError Traceback (most recent call last)
in ()
----> 1 learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))

~/Downloads/fastai-master (1)/fastai-master/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     21                                         pct_start=pct_start, **kwargs))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23 
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

~/Downloads/fastai-master (1)/fastai-master/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    171         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    172         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 173             callbacks=self.callbacks+callbacks)
    174 
    175     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/Downloads/fastai-master (1)/fastai-master/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     93     except Exception as e:
     94         exception = e
---> 95         raise e
     96     finally: cb_handler.on_train_end(exception)
     97 

~/Downloads/fastai-master (1)/fastai-master/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     88             if not data.empty_val:
     89                 val_loss = validate(model, data.valid_dl, loss_func=loss_func,
---> 90                                        cb_handler=cb_handler, pbar=pbar)
     91             else: val_loss=None
     92             if cb_handler.on_epoch_end(val_loss): break

~/Downloads/fastai-master (1)/fastai-master/fastai/basic_train.py in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
     53             if not is_listy(yb): yb = [yb]
     54             nums.append(yb[0].shape[0])
---> 55             if cb_handler and cb_handler.on_batch_end(val_losses[-1]): break
     56             if n_batch and (len(nums)>=n_batch): break
     57         nums = np.array(nums, dtype=np.float32)

~/Downloads/fastai-master (1)/fastai-master/fastai/callback.py in on_batch_end(self, loss)
    248         "Handle end of processing one batch with `loss`."
    249         self.state_dict['last_loss'] = loss
--> 250         stop = np.any(self('batch_end', not self.state_dict['train']))
    251         if self.state_dict['train']:
    252             self.state_dict['iteration'] += 1

~/Downloads/fastai-master (1)/fastai-master/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
    196     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    197         "Call through to all of the `CallbakHandler` functions."
--> 198         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    199         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    200 

~/Downloads/fastai-master (1)/fastai-master/fastai/callback.py in <listcomp>(.0)
    196     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    197         "Call through to all of the `CallbakHandler` functions."
--> 198         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    199         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    200 

~/Downloads/fastai-master (1)/fastai-master/fastai/callback.py in on_batch_end(self, last_output, last_target, **kwargs)
    283         if not is_listy(last_target): last_target=[last_target]
    284         self.count += last_target[0].size(0)
--> 285         self.val += last_target[0].size(0) * self.func(last_output, *last_target).detach().cpu()
    286 
    287     def on_epoch_end(self, **kwargs):

~/Downloads/fastai-master (1)/fastai-master/fastai/metrics.py in accuracy(input, targs)
     29     print('targs:',targs)
     30     print('input:',input)
---> 31     return (input==targs).float().mean()
     32 
     33 def accuracy_thresh(y_pred:Tensor, y_true:Tensor, thresh:float=0.5, sigmoid:bool=True)->Rank0Tensor:

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'other'

example.csv

sgugger · January 10, 2019, 2:25pm

Ah, this is because the metrics are always set to accuracy. You can remove it by passing metrics=[] when you create your learner. I’ll fix it this morning.
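
The tensors printed in the traceback show why accuracy does not fit here: `input` holds one class index per row, while `targs` holds three floats per row. The plain-Python sketch below copies those numbers into lists to show the mismatch, then computes a regression-style metric instead. The float predictions in `preds` are invented for illustration, and the helper is a generic mean squared error, not fastai's implementation.

```python
# Values copied from the printed tensors above, as plain Python lists.
targs = [[0., 0., 1.], [0., 1., 0.], [1., 0., 0.],
         [1., 0., 0.], [1., 0., 0.], [0., 1., 0.]]
pred_index = [2, 2, 0, 0, 0, 0]  # one class index per row

# Accuracy compares one integer label per row, but each target here is a
# vector of three floats, so the two sides cannot be compared meaningfully.
print(len(targs[0]), type(pred_index[0]).__name__)

def mean_squared_error(preds, targets):
    """Average squared difference over every output of every row."""
    diffs = [(p - t) ** 2
             for p_row, t_row in zip(preds, targets)
             for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)

# Hypothetical float predictions with the same shape as the targets.
preds = [[0.1, 0.1, 0.8], [0.2, 0.7, 0.1], [0.9, 0.0, 0.1],
         [0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
print(mean_squared_error(preds, targs))
```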

Lankinen · January 10, 2019, 7:17pm

Thank you @sgugger ! It is working now.

Do you think it might be a good idea to create a dedicated accuracy metric for this kind of multi-output problem? It could calculate the accuracy for every pair and then take the mean of those.
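
One possible reading of this suggestion, sketched in plain Python: count a prediction as correct when it lands within a tolerance of its target, then average over every prediction/target pair. The name `tolerance_accuracy` and the default tolerance of 0.1 are assumptions for illustration and are not part of fastai.

```python
def tolerance_accuracy(preds, targets, tol=0.1):
    """Fraction of (prediction, target) pairs that differ by at most `tol`."""
    hits = [abs(p - t) <= tol
            for p_row, t_row in zip(preds, targets)
            for p, t in zip(p_row, t_row)]
    return sum(hits) / len(hits)

# Made-up example with two items and three outputs each.
targets = [[0.0, 0.5, 1.0], [1.0, 0.0, 0.25]]
preds = [[0.05, 0.9, 1.0], [0.5, 0.0, 0.25]]
print(tolerance_accuracy(preds, targets))  # 4 of the 6 pairs are within 0.1
```

The tolerance would need tuning per problem, since "close enough" depends on the scale of the targets.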

jamesp · February 7, 2019, 12:05am

Coming back to this after the v1 API seems to have stabilized, I see ImageDataBunch and ImageItemList. It seems that ImageDataBunch's initializers are only for classifiers (since I don’t see a way to set label_cls), so if we want to build a regressor without writing a custom initializer, we need to use ImageItemList. Is that accurate?

sgugger · February 7, 2019, 2:25pm

ImageDataBunch and its factory methods are for beginners. You can do a regression problem with them if your targets are exactly what the library expects (floats), but you should really use the data block API (and ImageItemList) for maximum flexibility.
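
A rough sketch of why the int64 `age` column from the first post may have been treated as a classification target: as far as I recall, the library guesses the label type from the first label when `label_cls` is not given. The rule below is a simplified assumption for illustration (the function name `guess_label_kind` is made up) and is not fastai's actual code.

```python
from numbers import Integral

def guess_label_kind(first_label):
    """Simplified, assumed rule: floats mean regression, ints/strings mean classes."""
    if isinstance(first_label, float):
        return "regression"
    if isinstance(first_label, (str, Integral)):
        return "classification"
    return "unknown"

ages_as_int = [23, 41, 35]                       # made-up ages stored as integers
ages_as_float = [float(a) for a in ages_as_int]  # cast before labelling

print(guess_label_kind(ages_as_int[0]))    # classification
print(guess_label_kind(ages_as_float[0]))  # regression
```

Under that assumption, either casting the column to float or passing `label_cls=FloatList` explicitly would steer the labels toward regression.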

jamesp · February 7, 2019, 2:38pm

Thanks! I was able to whip up an ImageItemList to do what I wanted, and I must say that things are much easier to use now than they were in October. Thanks for all of your efforts on this library.

jwuphysics · February 7, 2019, 7:10pm

Hi @jamesp, I’m encountering a similar problem with the latest fastai v1, and I’m afraid I’m still a beginner and have been relying on ImageDataBunch and its factory methods.

I’ve set up a custom simple CNN:

class SimpleCNN(nn.Module):
    def __init__(self, pretrained=False):                  # `pretrained` kw seems to be needed for `create_cnn`
        super(SimpleCNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc = nn.Linear(32*32*32, 1)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out
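
The `nn.Linear(32*32*32, 1)` layer above only matches one input resolution. A quick plain-Python check of the shape arithmetic suggests the network expects 128×128 images. This assumes stride 1 for the convolutions (the `nn.Conv2d` default); the helper names `conv_out` and `flattened_features` are hypothetical.

```python
def conv_out(size, kernel, padding, stride=1):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(image_size):
    """Feature count reaching the final linear layer of SimpleCNN."""
    size = conv_out(image_size, kernel=5, padding=2)  # layer1 conv keeps the size
    size //= 2                                        # MaxPool2d(2) halves it
    size = conv_out(size, kernel=5, padding=2)        # layer2 conv keeps the size
    size //= 2                                        # MaxPool2d(2) halves it again
    return 32 * size * size                           # 32 output channels

print(flattened_features(128))  # 32768, i.e. 32*32*32
print(flattened_features(224))  # 100352, which would not fit nn.Linear(32*32*32, 1)
```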

and then run

learn = create_cnn(data, arch=SimpleCNN, pretrained=False, metrics=[RMSE])

lr = 1e-2
learn.fit_one_cycle(5, slice(lr), pct_start=0.9)

I end up with a long traceback:

TypeError                                 Traceback (most recent call last)
<ipython-input-87-d8be187b950e> in <module>
      1 lr = 1e-2
----> 2 learn.fit_one_cycle(5, lr)

~/anaconda3/lib/python3.6/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     21                                         pct_start=pct_start, **kwargs))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23 
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

~/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    176         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    177         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 178             callbacks=self.callbacks+callbacks)
    179 
    180     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/anaconda3/lib/python3.6/site-packages/fastai/utils/mem.py in wrapper(*args, **kwargs)
     83 
     84         try:
---> 85             return func(*args, **kwargs)
     86         except Exception as e:
     87             if "CUDA out of memory" in str(e) or tb_clear_frames=="1":

~/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     98     except Exception as e:
     99         exception = e
--> 100         raise e
    101     finally: cb_handler.on_train_end(exception)
    102 

~/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     93             if not data.empty_val:
     94                 val_loss = validate(model, data.valid_dl, loss_func=loss_func,
---> 95                                        cb_handler=cb_handler, pbar=pbar)
     96             else: val_loss=None
     97             if cb_handler.on_epoch_end(val_loss): break

~/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
     55             if not is_listy(yb): yb = [yb]
     56             nums.append(yb[0].shape[0])
---> 57             if cb_handler and cb_handler.on_batch_end(val_losses[-1]): break
     58             if n_batch and (len(nums)>=n_batch): break
     59         nums = np.array(nums, dtype=np.float32)

~/anaconda3/lib/python3.6/site-packages/fastai/callback.py in on_batch_end(self, loss)
    256         "Handle end of processing one batch with `loss`."
    257         self.state_dict['last_loss'] = loss
--> 258         stop = np.any(self('batch_end', not self.state_dict['train']))
    259         if self.state_dict['train']:
    260             self.state_dict['iteration'] += 1

~/anaconda3/lib/python3.6/site-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
    197     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    198         "Call through to all of the `CallbakHandler` functions."
--> 199         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    200         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    201 

~/anaconda3/lib/python3.6/site-packages/fastai/callback.py in <listcomp>(.0)
    197     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    198         "Call through to all of the `CallbakHandler` functions."
--> 199         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    200         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    201 

~/anaconda3/lib/python3.6/site-packages/fastai/callback.py in on_batch_end(self, last_output, last_target, **kwargs)
    291         if not is_listy(last_target): last_target=[last_target]
    292         self.count += last_target[0].size(0)
--> 293         self.val += last_target[0].size(0) * self.func(last_output, *last_target).detach().cpu()
    294 
    295     def on_epoch_end(self, **kwargs):

TypeError: object() takes no parameters

Is this at all related to what you’ve discussed above, and/or would you be able to supply some simple working code?

Thanks!

EDIT: I have all this in a Jupyter notebook that might be a little more self-explanatory. Ignore the README file since it is referring to fastai v0.7.

jamesp · February 7, 2019, 7:21pm

I’m doing bare-bones, simple stuff: using resnets pretrained on ImageNet. So, I doubt that my solution will be helpful unless your underlying problem was with the databunch, like mine was. But here is my ImageItemList code, in case it’s helpful:

np.random.seed(42)
bs = 40
tfms = get_transforms(do_flip=False)
data = (ImageItemList.from_df(path=p/'../all/', 
                              df=annotated, 
                              cols='file',
                              suffix='.jpg')
        .random_split_by_pct(0.3)
        .label_from_df(cols=trait, label_cls=FloatList)
        .transform(tfms, size=224)
        .databunch())

data.bs = bs

learn = create_cnn(data, models.resnet50, metrics=[mean_squared_error])

In this case, p is the pathlib.Path to the images, I have a dataframe called annotated with filenames and my trait of interest, and I think that’s all that’s missing from the definitions above. The key for regression was label_cls=FloatList.

jwuphysics · February 7, 2019, 7:27pm

Thanks! There was no issue with the DataBunch… I found out that the error was with the function fastai.metrics.RMSE, which for some reason gave the long chain of errors. Your use of mean_squared_error worked perfectly, and I found that using root_mean_squared_error works as well.

Now I’m just curious why fastai.metrics.RMSE causes problems…
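
A possible explanation, offered as a guess from the traceback and not a confirmed diagnosis: the failing frame calls `self.func(last_output, *last_target)`, and `object() takes no parameters` is what Python 3.6 reports when a class without a matching `__init__` is called with arguments. That would fit `RMSE` being a class in this fastai version while `root_mean_squared_error` is a plain function. The sketch below reproduces the pattern in plain Python; `ClassMetric`, `function_metric` and `call_metric` are hypothetical stand-ins, not fastai code.

```python
import math

class ClassMetric:
    """Stand-in for a metric defined as a class whose constructor takes no arguments."""
    def on_batch_end(self, preds, targets):
        pass

def function_metric(preds, targets):
    """Stand-in for a metric defined as a plain function (here: RMSE)."""
    squared = [(p - t) ** 2 for p, t in zip(preds, targets)]
    return math.sqrt(sum(squared) / len(squared))

def call_metric(func, preds, targets):
    """Mimics the failing frame: the metric is called like a function."""
    return func(preds, targets)

preds, targets = [2.0, 4.0], [1.0, 3.0]
print(call_metric(function_metric, preds, targets))  # works: 1.0

try:
    # Calling the class tries to construct ClassMetric(preds, targets).
    call_metric(ClassMetric, preds, targets)
except TypeError as err:
    print("TypeError:", err)  # exact wording depends on the Python version
```

If that guess is right, passing an instance instead of the class might behave differently, but I have not verified that against the library.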