Only 5 of 6 items are loaded into the training dataset

Danrohn · June 8, 2021, 1:16pm

This problem is quite tricky, since no actual error appears.

I’m trying to load 6 different items into the training dataset, but only 5 of them actually get loaded.

Of course, with more items, more items get lost. I don’t mind losing one or two items, but fit_one_cycle won’t let me train without those missing files.

Any idea?

The latest similar issue I found is this one, but it was never solved:

JackByte · June 8, 2021, 7:25pm

Hi @Danrohn,

What is the output of dls.valid.items?

If it’s not in the validation set, you might play around with the batch size and see whether you find a workaround.

Cheers

Danrohn · June 8, 2021, 8:02pm

You’re right. It was in dls.valid.items.
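Why exactly 5 of 6? fastai’s RandomSplitter and the ImageDataLoaders factory methods default to valid_pct=0.2, i.e. a random 20% of the items is held out for validation. The plain-Python sketch below mirrors that idea (shuffle the indices, put the first int(valid_pct * n) into validation) to show the arithmetic; it is not fastai’s code, and random_split is a hypothetical name.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle item indices, then send the first int(valid_pct * n) to validation."""
    rng = random.Random(seed)
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    cut = int(valid_pct * len(items))  # int() truncates: 0.2 * 6 = 1.2 -> 1
    train = [items[i] for i in idxs[cut:]]
    valid = [items[i] for i in idxs[:cut]]
    return train, valid

items = [f"img_{i}.jpg" for i in range(6)]  # six hypothetical file names
train, valid = random_split(items, seed=42)
print(len(train), len(valid))  # 5 1

# With valid_pct=0 every item stays in the training set
train0, valid0 = random_split(items, valid_pct=0, seed=42)
print(len(train0), len(valid0))  # 6 0
```

So no item is actually lost: with six items, one of them simply ends up in the validation set.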

But now I’m wondering: why does fit_one_cycle keep asking for that “missing” item?

Here’s the full stack trace:

/usr/local/lib/python3.7/dist-packages/fastai/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
    110     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    111               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
    113 
    114 # Cell

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    216             self.opt.set_hypers(lr=self.lr if lr is None else lr)
    217             self.n_epoch = n_epoch
--> 218             self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
    219 
    220     def _end_cleanup(self): self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    158 
    159     def _with_events(self, f, event_type, ex, final=noop):
--> 160         try: self(f'before_{event_type}');  f()
    161         except ex: self(f'after_cancel_{event_type}')
    162         self(f'after_{event_type}');  final()

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _do_fit(self)
    207         for epoch in range(self.n_epoch):
    208             self.epoch=epoch
--> 209             self._with_events(self._do_epoch, 'epoch', CancelEpochException)
    210 
    211     def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    158 
    159     def _with_events(self, f, event_type, ex, final=noop):
--> 160         try: self(f'before_{event_type}');  f()
    161         except ex: self(f'after_cancel_{event_type}')
    162         self(f'after_{event_type}');  final()

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _do_epoch(self)
    202     def _do_epoch(self):
    203         self._do_epoch_train()
--> 204         self._do_epoch_validate()
    205 
    206     def _do_fit(self):

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _do_epoch_validate(self, ds_idx, dl)
    198         if dl is None: dl = self.dls[ds_idx]
    199         self.dl = dl
--> 200         with torch.no_grad(): self._with_events(self.all_batches, 'validate', CancelValidException)
    201 
    202     def _do_epoch(self):

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    158 
    159     def _with_events(self, f, event_type, ex, final=noop):
--> 160         try: self(f'before_{event_type}');  f()
    161         except ex: self(f'after_cancel_{event_type}')
    162         self(f'after_{event_type}');  final()

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in all_batches(self)
    164     def all_batches(self):
    165         self.n_iter = len(self.dl)
--> 166         for o in enumerate(self.dl): self.one_batch(*o)
    167 
    168     def _do_one_batch(self):

/usr/local/lib/python3.7/dist-packages/fastai/data/load.py in __iter__(self)
    107         self.before_iter()
    108         self.__idxs=self.get_idxs() # called in context of main process (not workers/subprocesses)
--> 109         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
    110             if self.device is not None: b = to_device(b, self.device)
    111             yield self.after_batch(b)

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    515             if self._sampler_iter is None:
    516                 self._reset()
--> 517             data = self._next_data()
    518             self._num_yielded += 1
    519             if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
   1197             else:
   1198                 del self._task_info[idx]
-> 1199                 return self._process_data(data)
   1200 
   1201     def _try_put_index(self):

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _process_data(self, data)
   1223         self._try_put_index()
   1224         if isinstance(data, ExceptionWrapper):
-> 1225             data.reraise()
   1226         return data
   1227 

/usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
    427             # have message field
    428             raise self.exc_type(message=msg)
--> 429         raise self.exc_type(msg)
    430 
    431 

KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/fastai/data/transforms.py", line 246, in encodes
    return TensorCategory(self.vocab.o2i[o])
KeyError: 'long/10/8.jpg'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
    data = next(self.dataset_iter)
  File "/usr/local/lib/python3.7/dist-packages/fastai/data/load.py", line 118, in create_batches
    yield from map(self.do_batch, self.chunkify(res))
  File "/usr/local/lib/python3.7/dist-packages/fastcore/basics.py", line 216, in chunked
    res = list(itertools.islice(it, chunk_sz))
  File "/usr/local/lib/python3.7/dist-packages/fastai/data/load.py", line 133, in do_item
    try: return self.after_item(self.create_item(s))
  File "/usr/local/lib/python3.7/dist-packages/fastai/data/load.py", line 140, in create_item
    if self.indexed: return self.dataset[s or 0]
  File "/usr/local/lib/python3.7/dist-packages/fastai/data/core.py", line 333, in __getitem__
    res = tuple([tl[it] for tl in self.tls])
  File "/usr/local/lib/python3.7/dist-packages/fastai/data/core.py", line 333, in <listcomp>
    res = tuple([tl[it] for tl in self.tls])
  File "/usr/local/lib/python3.7/dist-packages/fastai/data/core.py", line 299, in __getitem__
    return self._after_item(res) if is_indexer(idx) else res.map(self._after_item)
  File "/usr/local/lib/python3.7/dist-packages/fastai/data/core.py", line 261, in _after_item
    def _after_item(self, o): return self.tfms(o)
  File "/usr/local/lib/python3.7/dist-packages/fastcore/transform.py", line 200, in __call__
    def __call__(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
  File "/usr/local/lib/python3.7/dist-packages/fastcore/transform.py", line 150, in compose_tfms
    x = f(x, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/fastcore/transform.py", line 73, in __call__
    def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/fastcore/transform.py", line 83, in _call
    return self._do_call(getattr(self, fn), x, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/fastcore/transform.py", line 89, in _do_call
    return retain_type(f(x, **kwargs), x, ret)
  File "/usr/local/lib/python3.7/dist-packages/fastcore/dispatch.py", line 118, in __call__
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/fastai/data/transforms.py", line 248, in encodes
    raise KeyError(f"Label '{o}' was not included in the training dataset") from e
KeyError: "Label 'long/10/8.jpg' was not included in the training dataset"
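The last line is the real clue: the targets are being encoded as category labels, and fastai’s Categorize transform builds its vocabulary from the training set’s labels only. Because every label here is a unique file path, the single validation item has a label that never appeared in training, so the lookup in vocab.o2i fails. A toy plain-Python model of that behaviour (ToyCategorize is a hypothetical class, not fastai’s implementation; all paths except long/10/8.jpg are made up):

```python
class ToyCategorize:
    """Toy label encoder whose vocab is built from the training labels only."""

    def __init__(self, train_labels):
        self.vocab = sorted(set(train_labels))
        self.o2i = {o: i for i, o in enumerate(self.vocab)}  # label -> integer index

    def encode(self, label):
        try:
            return self.o2i[label]
        except KeyError as e:
            raise KeyError(f"Label '{label}' was not included in the training dataset") from e

# Every target is a unique file path, so each "class" occurs exactly once
train_labels = [f"long/10/{i}.jpg" for i in range(1, 6)]
cat = ToyCategorize(train_labels)
print(cat.encode("long/10/1.jpg"))  # 0

# The validation item's label was never seen during setup
err_msg = None
try:
    cat.encode("long/10/8.jpg")
except KeyError as err:
    err_msg = str(err)
print(err_msg)
```

Setting valid_pct=0 hides the error by moving that item into training, but the underlying issue is that image targets are being treated as classes.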

JackByte · June 8, 2021, 8:53pm

Hi @Danrohn
I have a feeling that the labeling isn’t working correctly. Does dls.show_batch() show the labels above the images as expected?

Your df has a column with the labels, right? Then you should pass the index of that column as label_col (see the docs).

Cheers

Danrohn · June 8, 2021, 8:56pm

Alright, my workaround was to set valid_pct=0, so now the entire dataset goes into the training set:

But I think I need to configure it somehow for regression training.

Now I get this error:

AssertionError                            Traceback (most recent call last)
<ipython-input-8-b26b868f23d4> in <module>()
----> 1 learn = cnn_learner(dls, resnet18, pretrained=False,loss_func=F.l1_loss, metrics=accuracy)

/usr/local/lib/python3.7/dist-packages/fastai/vision/learner.py in cnn_learner(dls, arch, normalize, n_out, pretrained, config, loss_func, opt_func, lr, splitter, cbs, metrics, path, model_dir, wd, wd_bn_bias, train_bn, moms, **kwargs)
    175 
    176     if n_out is None: n_out = get_c(dls)
--> 177     assert n_out, "`n_out` is not defined, and could not be inferred from data, set `dls.c` or pass `n_out`"
    178     model = create_cnn_model(arch, n_out, pretrained=pretrained, **kwargs)
    179 

AssertionError: `n_out` is not defined, and could not be inferred from data, set `dls.c` or pass `n_out`
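The assertion comes from cnn_learner trying to infer the number of model outputs from the data. The simplified, hypothetical sketch below (infer_n_out is an illustrative name; fastai’s real get_c checks a few more places) shows why inference fails when there is neither a dls.c nor a category vocab, and why either fix named in the message works:

```python
from types import SimpleNamespace

def infer_n_out(dls, n_out=None):
    """Simplified illustration: use n_out if given, else dls.c, else the vocab size."""
    if n_out is None:
        n_out = getattr(dls, "c", None) or len(getattr(dls, "vocab", []))
    assert n_out, "`n_out` is not defined, and could not be inferred from data, set `dls.c` or pass `n_out`"
    return n_out

# Classification data: the vocab tells us how many outputs are needed
print(infer_n_out(SimpleNamespace(vocab=["cat", "dog"])))  # 2

# Data with no vocab and no c: inference fails
failed = False
try:
    infer_n_out(SimpleNamespace())
except AssertionError:
    failed = True
print(failed)  # True

# Either fix from the error message works
print(infer_n_out(SimpleNamespace(c=3)))        # 3
print(infer_n_out(SimpleNamespace(), n_out=3))  # 3
```

The fastai v1 snippet quoted later in this thread does the same thing with data.c = 3.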

Danrohn · June 8, 2021, 8:59pm

It doesn’t really show the labels, but then again, the labels aren’t classes.

Each input image has a corresponding target image that the model is trained to reproduce.

Should I specify somewhere that it’s a regression task?

JackByte · June 8, 2021, 9:18pm

Hi @Danrohn,

Okay, I might not be of much help anymore. I’ve reached the limit of my knowledge.

So far I’ve used cnn_learner mostly for classification, and I thought regression just means that the model predicts a number instead of a class (e.g., the age of the person in the image).

Once your model is trained, what output do you expect? I’m not sure ResNet architectures and cnn_learner are what you need if you expect something other than a class or a number.

Danrohn · June 8, 2021, 9:23pm

Hey man, thanks for your help anyway! I appreciate it!

I’ve learned that a ResNet could fit my problem, but so could a U-Net.

Once the model is trained, I expect it to enhance the image quality so that it looks like the ground truth. My assumption is that if the model is fed dark images as inputs and fully detailed images as targets (ground truth, gt), it will be able to reconstruct any other dark image as if it had been taken under good lighting conditions.
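One way to reconcile the two views: image enhancement is still regression, just on every pixel instead of on a single number. The loss in the cnn_learner call above, F.l1_loss, is the mean absolute error, so the same formula covers a single number (e.g. an age) and a whole image. A plain-Python sketch with made-up values (l1_loss here is a hypothetical helper, not PyTorch’s function):

```python
def l1_loss(pred, target):
    """Mean absolute error over all values; nested lists (images) are flattened first."""
    def flatten(x):
        if isinstance(x, (int, float)):
            return [x]
        return [v for item in x for v in flatten(item)]

    p, t = flatten(pred), flatten(target)
    assert len(p) == len(t), "prediction and target must have the same shape"
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)

# Classic regression: one number per image (e.g. predicted vs. true age)
print(l1_loss(31.0, 35.0))  # 4.0

# Image-to-image: one prediction per pixel of a tiny 2x2 grayscale "image"
pred_img = [[10, 20], [30, 40]]
target_img = [[0, 20], [50, 40]]
print(l1_loss(pred_img, target_img))  # 7.5
```

The real difference is the output shape: an enhancement model has to produce a full image the size of the target, which is what encoder-decoder architectures such as U-Net are designed for.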

JackByte · June 8, 2021, 9:36pm

Ah, I see. I want to dive into generative models soon, too.

I remember that the 2019 version of the course had a chapter about GANs, and the image-enhancement implementation can be found there, too. However, that code is based on fastai v1, and many things have changed in fastai v2.

Danrohn · June 8, 2021, 9:39pm

Thanks! I’ll dive into the links that you shared with me!

Danrohn · June 9, 2021, 2:07am

Alright, I read the guide about the GAN with the dogs.
There is something I don’t get.

He wrote:

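from fastai.vision import *  # fastai v1; path_lr, path_hr, bs and size are assumed to be defined earlier in the notebook
# Inputs are the low-quality images; hold out a random 10% for validation
src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.1, seed=42)
def get_data(bs,size):
    # The target for each input is the high-quality image with the same file name
    data = (src.label_from_func(lambda x: path_hr/x.name)
           .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)  # tfm_y=True: transform the target image too
           .databunch(bs=bs).normalize(imagenet_stats, do_y=True))          # normalize the targets as well

    data.c = 3  # 3 output channels (RGB), used by the learner to size the model's output
    return data
data_gen = get_data(bs,size)
data_gen.show_batch(4)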

And I’m trying to find the closest equivalent in fastai2.
Is ImageImageList equivalent to ImageDataLoaders?

I tried to write something like this:

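from fastai.vision.all import *  # path and df (with 'input' and 'target' columns) are defined elsewhere

def get_input(row):  return Path(path)/row['input']   # dark input image
def get_target(row): return Path(path)/row['target']  # well-lit target image

dblock = DataBlock(
    blocks=(ImageBlock, ImageBlock),  # the target is an image, not a single number
    get_x=get_input,
    get_y=get_target,
    splitter=RandomSplitter(),        # call it: RandomSplitter() returns the split function
    #batch_tfms=[*aug_transforms(size=(240,320)),
              #  Normalize.from_stats(*imagenet_stats)]
)
dsets = dblock.datasets(df)  # a Datasets object: individual items, not yet batched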

But I don’t know where to set the batch size.

If I can’t configure the batch size, then I can’t even call show_batch on the images like he did:

data_gen.show_batch(4)

JackByte · June 10, 2021, 8:21pm

Hi @Danrohn,

I would try ImageDataLoaders.from_name_func, since you could use a function for the labels there, too.

If you just call dls.show_batch(), it uses the batch size the DataLoaders were created with (64 by default) and displays at most 9 items (max_n=9).

If you want to choose the batch size yourself, pass bs=8 to ImageDataLoaders.from_name_func. Not all available parameters are listed in its documentation, because extra keyword arguments such as bs are passed through **kwargs. I guess this is for readability, and because the batch size can be set at other “levels”, too.
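What bs controls can be sketched in plain Python: the (optionally shuffled) items are cut into consecutive chunks of bs. This is a generic illustration, not fastai’s DataLoader; make_batches is a hypothetical helper, and its drop_last flag imitates the option of the same name that fastai and PyTorch data loaders accept.

```python
def make_batches(items, bs, drop_last=False):
    """Split items into consecutive batches of size bs."""
    batches = [items[i:i + bs] for i in range(0, len(items), bs)]
    if drop_last and batches and len(batches[-1]) < bs:
        batches = batches[:-1]  # discard the incomplete final batch
    return batches

items = ["a", "b", "c", "d", "e", "f"]
print(make_batches(items, bs=4))                  # [['a', 'b', 'c', 'd'], ['e', 'f']]
print(make_batches(items, bs=4, drop_last=True))  # [['a', 'b', 'c', 'd']]
print(make_batches(items, bs=8, drop_last=True))  # []
```

On a tiny dataset, a batch size larger than the number of items can therefore leave nothing to iterate over when incomplete batches are dropped, so it is worth keeping bs at or below the size of the training set.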

Danrohn · June 10, 2021, 8:53pm

Somebody pointed me to this guide (the one you told me about, but adapted to fastai2):

And after playing a little more with the code, I managed to show the batch!