Dataloader error

Hi!

I’ve been working through the lesson 1 notebook, and once I execute this line:

data.show_batch(rows=3, figsize=(7,6))

I get this error:

RuntimeError: DataLoader worker (pid 1579) is killed by signal: Bus error.

Here is the whole error message:

RuntimeError                              Traceback (most recent call last)
<ipython-input-11-66824b983385> in <module>
----> 1 data.show_batch(rows=3, figsize=(7,6))

/dds/miniconda/envs/fastai/lib/python3.7/site-packages/fastai/basic_data.py in show_batch(self, rows, ds_type, **kwargs)
    157     def show_batch(self, rows:int=5, ds_type:DatasetType=DatasetType.Train, **kwargs)->None:
    158         "Show a batch of data in `ds_type` on a few `rows`."
--> 159         x,y = self.one_batch(ds_type, True, True)
    160         n_items = rows **2 if self.train_ds.x._square_show else rows
    161         if self.dl(ds_type).batch_size < n_items: n_items = self.dl(ds_type).batch_size

/dds/miniconda/envs/fastai/lib/python3.7/site-packages/fastai/basic_data.py in one_batch(self, ds_type, detach, denorm, cpu)
    140         w = self.num_workers
    141         self.num_workers = 0
--> 142         try:     x,y = next(iter(dl))
    143         finally: self.num_workers = w
    144         if detach: x,y = to_detach(x,cpu=cpu),to_detach(y,cpu=cpu)

/dds/miniconda/envs/fastai/lib/python3.7/site-packages/fastai/basic_data.py in __iter__(self)
     69     def __iter__(self):
     70         "Process and returns items from `DataLoader`."
---> 71         for b in self.dl: yield self.proc_batch(b)
     72 
     73     @classmethod

/dds/miniconda/envs/fastai/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    629         while True:
    630             assert (not self.shutdown and self.batches_outstanding > 0)
--> 631             idx, batch = self._get_batch()
    632             self.batches_outstanding -= 1
    633             if idx != self.rcvd_idx:

/dds/miniconda/envs/fastai/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _get_batch(self)
    608             # need to call `.task_done()` because we don't use `.join()`.
    609         else:
--> 610             return self.data_queue.get()
    611 
    612     def __next__(self):

/dds/miniconda/envs/fastai/lib/python3.7/multiprocessing/queues.py in get(self, block, timeout)
     92         if block and timeout is None:
     93             with self._rlock:
---> 94                 res = self._recv_bytes()
     95             self._sem.release()
     96         else:

/dds/miniconda/envs/fastai/lib/python3.7/multiprocessing/connection.py in recv_bytes(self, maxlength)
    214         if maxlength is not None and maxlength < 0:
    215             raise ValueError("negative maxlength")
--> 216         buf = self._recv_bytes(maxlength)
    217         if buf is None:
    218             self._bad_message_length()

/dds/miniconda/envs/fastai/lib/python3.7/multiprocessing/connection.py in _recv_bytes(self, maxsize)
    405 
    406     def _recv_bytes(self, maxsize=None):
--> 407         buf = self._recv(4)
    408         size, = struct.unpack("!i", buf.getvalue())
    409         if maxsize is not None and size > maxsize:

/dds/miniconda/envs/fastai/lib/python3.7/multiprocessing/connection.py in _recv(self, size, read)
    377         remaining = size
    378         while remaining > 0:
--> 379             chunk = read(handle, remaining)
    380             n = len(chunk)
    381             if n == 0:

/dds/miniconda/envs/fastai/lib/python3.7/site-packages/torch/utils/data/dataloader.py in handler(signum, frame)
    272         # This following call uses `waitid` with WNOHANG from C side. Therefore,
    273         # Python can still get and update the process status successfully.
--> 274         _error_if_any_worker_fails()
    275         if previous_handler is not None:
    276             previous_handler(signum, frame)

RuntimeError: DataLoader worker (pid 1579) is killed by signal: Bus error.

I got the same error too. Are you on Python 3.7? My fastai version is 1.0.42.

Note: I didn’t get this error when I ran this on Colab, which uses Python 3.6. I am not sure if there are known problems with Python 3.7.

I saw the same problem, also with Python 3.7 and fastai 1.0.42.

Yeah, that is exactly my case, but when I tried it again later it worked (I changed nothing, I just restarted the kernel).

You can set num_workers=0; that should fix the issue.
I think the default num_workers is 4 or 8, which tells the framework to spawn that many worker processes to load the images. If you set it to 0, the images are loaded in the main process instead.
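
For reference, here is a minimal sketch of what that looks like with the lesson 1 pets setup (path_img, fnames, pat and bs are the variables defined earlier in that notebook and are just assumed here):

from fastai.vision import *

# Assumes the lesson 1 pets variables (path_img, fnames, pat, bs) already exist.
data = ImageDataBunch.from_name_re(
    path_img, fnames, pat,
    ds_tfms=get_transforms(), size=224, bs=bs,
    num_workers=0,   # 0 = load images in the main process, no worker subprocesses
).normalize(imagenet_stats)

data.show_batch(rows=3, figsize=(7,6))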

I have this issue when using Kaggle kernels; somehow it isn’t supported there. I don’t think GCP has this issue.

Did this:
data.show_batch(rows=3, figsize=(7,6), num_workers=0)

still got:

RuntimeError: DataLoader worker (pid 18959) is killed by signal: Unknown signal: 0.

You need to add it when you create the databunch, i.e. databunch(num_workers=0); it is a databunch argument.

Then call data.show_batch().
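
For example, with the data block API it would look roughly like this (just a sketch; path and the folder layout are assumptions, and the exact item-list and method names vary slightly between fastai v1 point releases, e.g. ImageItemList instead of ImageList in older ones):

data = (ImageList.from_folder(path)              # assumes an imagenet-style folder layout
        .split_by_rand_pct(0.2)                  # hold out 20% for validation
        .label_from_folder()
        .transform(get_transforms(), size=224)
        .databunch(bs=64, num_workers=0)         # num_workers=0 avoids the worker crash
        .normalize(imagenet_stats))

data.show_batch(rows=3, figsize=(7,6))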

I experienced the same thing when I was playing around on kaggle.
You can check out the following link as an example to see if it can solve your problem.

https://www.kaggle.com/heye0507/prepare-databunch-with-fastai-1-0

Ah ok, I put it in the databunch and it is working now. Thanks!

Hi,

I also hit this issue with the defaults in the notebook, with fastai 1.0.42.
My default kernel was the [Python 3] one shown in the notebook.
After I changed the kernel via [Kernel]->[Change kernel]->[Python [conda env: xxx]] and ran data.show_batch(rows=3, figsize=(7,6)), it worked without error.

I didn’t put num_workers=0 in the databunch in my run.

I met the same issue. As @ding404 pointed out, after I installed conda and ran the notebook under a conda environment, it worked.

I too was hit with this issue. I had previously used the notebook but ran into trouble with the new NLP course. Changing bs or num_workers had no effect. I restarted, reinstalled the fastai repo, and updated conda, but nothing helped. Ultimately, I just did a clean install of the conda fastai environment, and that did fix it. (I run fastai on my own server and use pillow-simd, which I also reinstalled. I have a hunch that it might be the source of this issue, but I wasn’t able to run it to ground.)

Works for me.