BigData, databunch and training loop

Then I don’t know what to say: there is no persistent array containing the data, it’s exactly like your second example with get_item.

Edit: ah, I get it, the tensor is the problem. But we need a tensor, otherwise we can’t use a language model. So this is not a fix…

Thanks. I will dig a bit more to try to understand what is going on.

The problem is much less severe if shuffle_train is set to False in the databunch. How much does turning shuffle off hurt the quality of training when fine-tuning a language model? With shuffling on, the workers end up eating all 32 GB of physical memory and 32 GB of swap, resulting in an OOM. With shuffle_train turned off, training starts an epoch at 12 GB of physical memory and ends it at ~19 GB. When the epoch ends, usage goes back to 12 GB.

dbunch_lm = dsrc.databunch(bs=bs, seq_len=sl, val_bs=bs, after_batch=Cuda, num_workers=2, pin_memory=True, shuffle_train=False)

Interesting. If your dataset is so huge, I doubt shuffling the training set will help, so you can definitely try without.
We’ll investigate this memory leak when we have some time, in any case.

Thank you, very much @sgugger for your superb work with fastai!

Best regards!

When I try to create the datasource object for text through:

tfms = [attrgetter("text"), Tokenizer.from_df("text"), Numericalize()]
dsrc = DataSource(df, [tfms], splits=splits, dl_type=LMDataLoader)

Unfortunately, it shows the following error, which I guess has to do with some kind of parallelization issue.

AttributeError: Can't pickle local object 'parallel_gen.<locals>.f'  

Can anybody help me figure out what is causing this error?

I searched on the web for a solution, but could not find anything :roll_eyes::roll_eyes::roll_eyes:

Perhaps try using dill instead of pickle? (It works exactly the same way but it handles more complex stuff.)

Hi Pablo,
Could you elaborate on your solution? I am facing the same error as Preka. As far as I know it’s related to multiprocessing, but I have not been able to resolve it. This is what I have tried so far:

dsets = DataSource(df_all, [tfms], splits=splits, dl_type=partial(LMDataLoader,num_workers=0))

If you are also getting

AttributeError: Can't pickle local object 'parallel_gen.<locals>.f'  

when trying to save data, then the first thing I would try would be to use dill instead of pickle to save and load data.

They work exactly the same way. In fact, many people go as far as importing dill as pickle so the code is the same:

import dill as pickle

Hi Pablo, thanks for your reply.

However, I tried what you suggested but it didn’t work. It also gives the same error as before.

I believe I was having a similar problem with version 2; I’ll update when I can confirm.

Update: After some update it’s working fine!

Seems like it is a Windows issue (I was using Anaconda PowerShell on Windows), as it works perfectly on Linux (Ubuntu 18.04).

Thanks a lot for your extremely helpful support! :slightly_smiling_face:
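A plausible explanation for the Windows/Linux difference is the start method that Python's multiprocessing uses on each platform. The sketch below only inspects the current interpreter; the variable names are illustrative and nothing here is fastai-specific. With `spawn` (the only option on Windows) or `forkserver`, the process target is pickled and sent to the child, whereas `fork` copies the parent's memory, so a nested function as target can work there.

```python
import multiprocessing as mp
import sys

# Start methods supported on this platform (Windows only offers 'spawn').
available = mp.get_all_start_methods()

# The method multiprocessing uses by default in this interpreter.
# Calling this fixes the default for the rest of the process.
default = mp.get_start_method()

print("platform:", sys.platform)
print("available start methods:", available)
print("default start method:", default)

# With 'spawn' and 'forkserver' the Process target and its arguments are
# pickled for the child, so they must be picklable.
needs_pickling = default in ("spawn", "forkserver")
print("process targets must be picklable:", needs_pickling)
```

If this prints `spawn` on your machine, any function handed to a worker process has to be picklable, which would be consistent with the error above.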


It was nothing, I’m just glad I could help!


I came across the same problem on Win10. I wonder whether the issue has been fixed. Ubuntu 19 was working with CUDA in my VBox (18.04 CUDA didn’t work), but it seems to have problems recently. Maybe I should wait for Ubuntu 20. Is there any fix for fastai2 on Windows?

I’ve been having the AttributeError: Can't pickle local object 'parallel_gen.<locals>.f' issue on macOS, but only when calling

dls_cl = TextDataLoaders.from_df(
    df, 
    text_col='headline', 
    label_col=topic,
    valid_pct=0.1, 
    text_vocab=learn.dls.vocab[0], 
    bs=128
)

using FastAPI. When I run the Python file normally it’s fine.

I was also having the same pickling problem on Win10, in my case when trying to create a DataBlock that contains a TextBlock. I tracked down the issue and found a solution, but it involves changes in torch_core. I will share it here for anyone who might be interested.

The root of the issue is that the function torch_core.parallel_gen contains two nested functions that cannot be pickled (at least on Windows, but AFAIK nested functions are not picklable in general). Pickling is needed here to run the work in parallel processes, which is why this problem can appear in different contexts. The workaround I found is to move the nested functions out to module level and use functools.partial inside parallel_gen to bind the information they need. So, if we change the function parallel_gen from this:

    def parallel_gen(cls, items, n_workers=defaults.cpus, as_gen=False, **kwargs):
        "Instantiate `cls` in `n_workers` procs & call each on a subset of `items` in parallel."
        batches = np.array_split(items, n_workers)
        idx = np.cumsum(0 + L(batches).map(len))
        queue = Queue()
        def f(batch, start_idx):
            for i,b in enumerate(cls(**kwargs)(batch)): queue.put((start_idx+i,b))
        def done(): return (queue.get() for _ in progress_bar(items, leave=False))
        yield from run_procs(f, done, L(batches,idx).zip())

to this:

    import functools

    # Module-level helpers: unlike nested functions, these can be pickled.
    def f_pg(clse, queue, batch, start_idx):
        for i,b in enumerate(clse(batch)): queue.put((start_idx+i,b))
    def done_pg(queue, items): return (queue.get() for _ in progress_bar(items, leave=False))

    def parallel_gen(cls, items, n_workers=defaults.cpus, as_gen=False, **kwargs):
        "Instantiate `cls` in `n_workers` procs & call each on a subset of `items` in parallel."
        batches = np.array_split(items, n_workers)
        idx = np.cumsum(0 + L(batches).map(len))
        queue = Queue()
        # Bind the extra arguments here instead of capturing them in closures.
        # Note: `cls(**kwargs)` is now instantiated in the parent process.
        f = functools.partial(f_pg, cls(**kwargs), queue)
        done = functools.partial(done_pg, queue, items)
        yield from run_procs(f, done, L(batches,idx).zip())

Then the problem is solved. It feels a bit slow though, so I am not sure whether this is a good solution or just a hack to make it work. I tested it with this simple example that tries to load data into a DataBlock:

    import numpy as np
    import pandas as pd
    from fastai2.text.all import Callback,DataBlock,TextBlock,CategoryBlock,ColReader

    t=[list('abcdefghijklmnopqrstuvwxyz'),list('0123456789')]
    t_l=np.random.randint(0,2,100)
    t_d=[''.join(np.random.choice(t[i],np.random.randint(1,10))) for i in t_l]
    df=pd.DataFrame.from_dict({'text':t_d,'label':t_l})

    db = DataBlock(blocks=(TextBlock.from_df('text'), CategoryBlock),
                          get_x=ColReader('text'),
                          get_y=ColReader('label'))
    dls = db.dataloaders(df)

I am new to fastai, so I am not sure if this is the right way of doing things, but I thought this might be helpful, so I decided to share.
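For anyone who wants to see the underlying behaviour without fastai, here is a minimal standard-library sketch of why the workaround above helps. The helper names (`is_picklable`, `make_nested_worker`, `top_level_worker`) are made up for this illustration and are not part of fastai or fastcore. It assumes the script is run as a normal Python file, so that module-level functions can be found by name.

```python
import functools
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_nested_worker(offset):
    # Nested function: pickle stores functions by qualified name, and a
    # name containing '<locals>' cannot be looked up again when loading.
    def worker(x):
        return x + offset
    return worker


def top_level_worker(offset, x):
    # Module-level function: pickle can refer to it by module and name.
    return x + offset


nested = make_nested_worker(3)
# functools.partial binds the value the closure used to capture.
bound = functools.partial(top_level_worker, 3)

print("nested function picklable:", is_picklable(nested))
print("partial of top-level function picklable:", is_picklable(bound))
print("both compute the same result:", nested(4) == bound(4))
```

The partial object carries its bound arguments with it when pickled, so those arguments must be picklable too. In the parallel_gen patch this applies to the `cls(**kwargs)` instance and the queue.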


grillard, thank you for this workaround. Great job and a perfect contribution!

seriousyhyena

Are you just using a forked version of fastcore without the functions in functions? I have a feeling it’s possible to patch new functionality into fastai2, but I’m not sure how.

I have a pull request open based on your proposed solution @grillard. It looks like it doesn’t work in Python 3.6 though which is a shame.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-151-da1277b2abe5> in <module>
      4 items = range(5)
      5 
----> 6 res = L(parallel_gen(_C, items, n_workers=3))
      7 idxs,dat1 = zip(*res.sorted(itemgetter(0)))
      8 test_eq(dat1, range(1,6))

~/Documents/fastcore/nbs/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
     45             return x
     46 
---> 47         res = super().__call__(*((x,) + args), **kwargs)
     48         res._newchk = 0
     49         return res

~/Documents/fastcore/nbs/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
    322         if items is None: items = []
    323         if (use_list is not None) or not _is_array(items):
--> 324             items = list(items) if use_list else _listify(items)
    325         if match is not None:
    326             if is_coll(match): match = len(match)

~/Documents/fastcore/nbs/fastcore/foundation.py in _listify(o)
    258     if isinstance(o, list): return o
    259     if isinstance(o, str) or _is_array(o): return [o]
--> 260     if is_iter(o): return list(o)
    261     return [o]
    262 

<ipython-input-150-9348352aeb75> in parallel_gen(cls, items, n_workers, **kwargs)
     10     f=partial(f_pg, cls(**kwargs), queue)
     11     done=partial(done_pg, queue, items)
---> 12     yield from run_procs(f, done, L(batches,idx).zip())

<ipython-input-147-1d6836dd4202> in run_procs(f, f_done, args)
      4     processes = L(args).map(Process, args=arg0, target=f)
      5     for o in processes: o.start()
----> 6     yield from f_done()
      7     processes.map(Self.join())

<ipython-input-149-d8e2b9ae3cb4> in done_pg(queue, items)
----> 5     return (queue.get() for _ in progress_bar(items, leave=False))

TypeError: 'NoneType' object is not callable

Jeremy sorted it! The changes are up on master