I'm hitting failures in `lr_find` and `fit_one_cycle` when using an input class that inherits from `TensorImage`.
I made a class that inherits from `TensorImage` and adds a few class members, some factory functions and a `show` function:
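```python
from fastai.vision.all import *  # provides TensorImage, ToTensor and torch

class ImageArray(TensorImage):
    sample_seq_len = 10

    @classmethod
    def create(cls, seq_path, glob_pattern='*rgb.png', rotate=False, size=128):
        ...

    @classmethod
    def from_images(cls, imgs):
        # torch.stack expects a list/tuple of tensors, not a generator
        return cls(torch.stack([ToTensor()(img)/255.0 for img in imgs]))

    def show(self, ctx=None, **kwargs):
        ...
```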
`lr_find` and `fit_one_cycle` work (slowly) with `num_workers=0`, but if I leave `num_workers` unset or set it to anything greater than 0, I get a stream of error messages and the call fails. For example:
```
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
obj = _ForkingPickler.dumps(obj)
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
obj = _ForkingPickler.dumps(obj)
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/local/lib/python3.6/dist-packages/fastai/torch_core.py", line 313, in __reduce_ex__
args = (type(self), self.storage(), self.storage_offset(), tuple(self.size()), self.stride())
TypeError: 'int' object is not callable
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: 'int' object is not callable
```
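The failing frame is fastai's `__reduce_ex__`, which runs when a worker process pickles data onto its result queue; with `num_workers=0` there are no worker processes, so nothing is pickled. That line calls `self.storage()`, `self.storage_offset()`, `self.size()` and `self.stride()`, and one way to get exactly `'int' object is not callable` there is for an int attribute to shadow one of those methods. Here is a pure-Python sketch of that mechanism using the same stdlib `ForkingPickler` that appears in the traceback. `FakeTensor` and `_rebuild` are hypothetical stand-ins, not fastai code, and this is only a hypothesis to check against the elided `create`/`show` code, not a confirmed cause:

```python
from multiprocessing.reduction import ForkingPickler


def _rebuild(cls, data, size, stride):
    # Hypothetical rebuild hook used at unpickling time; size/stride unused here
    return cls(data)


class FakeTensor:
    """Hypothetical stand-in for a tensor subclass (not fastai's class)."""

    def __init__(self, data):
        self.data = list(data)

    def size(self):
        return (len(self.data),)

    def stride(self):
        return (1,)

    def __reduce_ex__(self, protocol):
        # Like the failing fastai line, this *calls* size() and stride(),
        # so both names must still refer to methods when pickling happens.
        return (_rebuild, (type(self), self.data, tuple(self.size()), self.stride()))


good = FakeTensor([1, 2, 3])
restored = ForkingPickler.loads(ForkingPickler.dumps(good))
print(restored.data)  # [1, 2, 3]

bad = FakeTensor([1, 2, 3])
bad.size = 128  # hypothetical: an int instance attribute shadows size()
try:
    ForkingPickler.dumps(bad)
except TypeError as err:
    print(err)  # 'int' object is not callable
```

If something like this is the cause, it would also explain why everything works in the main process: the shadowed method is only ever called during pickling.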
If I instead change my factory functions to return a plain `TensorImage` and patch the `show` function, everything works fine, so multiprocessing itself is not the problem.
Unfortunately, my code is fairly long and I don't have a clean reproducible example, so I can't rule out that the cause is something else I've done and haven't mentioned here.
Note that `dls.one_batch()` and `model.forward()` were not affected.
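Since the error only shows up when data crosses a process boundary, a smaller reproduction might be to pickle a single item in the main process the same way the worker queue does. This is a minimal sketch, assuming a stdlib `ForkingPickler` round trip is a close enough approximation (the workers actually send whole collated batches, not single items); `pickling_error` is a hypothetical helper name:

```python
from multiprocessing.reduction import ForkingPickler


def pickling_error(obj):
    """Hypothetical helper: return the exception raised when round-tripping
    `obj` through ForkingPickler, or None if it pickles and unpickles fine."""
    try:
        ForkingPickler.loads(ForkingPickler.dumps(obj))
    except Exception as e:  # report any failure instead of crashing
        return e
    return None


# Hypothetical usage with fastai objects (not run here), assuming the
# dataset yields (input, target) tuples:
# x, y = dls.train_ds[0]
# print(repr(pickling_error(x)))

print(pickling_error([1, 2, 3]))           # None
print(repr(pickling_error(lambda: None)))  # some pickling exception for the lambda
```

Comparing the result for an `ImageArray` item against a plain `TensorImage` built from the same images should show whether the subclass alone is enough to break pickling.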