BigData, databunch and training loop

Thank you very much, @sgugger, for your superb work with fastai!

Best regards!

When I try to create the datasource object for text through:

tfms = [attrgetter("text"), Tokenizer.from_df("text"), Numericalize()]
dsrc = DataSource(df, [tfms], splits=splits, dl_type=LMDataLoader)

Unfortunately, it shows the following error, which I guess has to do with some kind of parallelization issue.

AttributeError: Can't pickle local object 'parallel_gen.<locals>.f'  

Can anybody help me figure out what is causing this error?

I searched on the web for a solution, but could not find anything :roll_eyes::roll_eyes::roll_eyes:

Perhaps try using dill instead of pickle? (It works exactly the same way but handles more complex objects.)

Hi Pablo,
Could you elaborate on your solution? I am facing the same error as Preka. As far as I know it's related to multiprocessing, but I'm not able to resolve it. This is what I have tried so far:

dsets = DataSource(df_all, [tfms], splits=splits, dl_type=partial(LMDataLoader,num_workers=0))

If you are also getting

AttributeError: Can't pickle local object 'parallel_gen.<locals>.f'  

when trying to save data, then the first thing I would try would be to use dill instead of pickle to save and load data.

They work exactly the same way. In fact, many people go as far as importing dill as pickle so the code is the same:

import dill as pickle
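
For example, here is a minimal sketch of saving and loading the datasource with dill (the file name and the dsrc variable are just placeholders from earlier in this thread):

# dill exposes the same dump/load interface as pickle, so the alias above works as-is
with open("dsrc.pkl", "wb") as f:
    pickle.dump(dsrc, f)

with open("dsrc.pkl", "rb") as f:
    dsrc = pickle.load(f)

Since dill mirrors pickle's dump and load signatures, nothing else in the saving code needs to change.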

Hi Pablo, thanks for your reply.

I tried what you suggested, but it didn't work; it gives the same error as before.

I believe I was having a similar problem with version 2; I'll update when I can confirm.

Update: after updating, it's working fine!

It seems to be a Windows issue (I was using Anaconda PowerShell on Windows), as it works perfectly on Linux (Ubuntu 18.04).
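
For anyone else hitting this on Windows: the "spawn" start method used there has to pickle whatever it hands to worker processes, and a local function such as parallel_gen.<locals>.f cannot be pickled, whereas Linux's "fork" never needs to pickle it. (That is also why swapping in dill does not help here: the pickling happens inside Python's own multiprocessing machinery, which uses the standard pickler.) A possible workaround, assuming Tokenizer.from_df forwards an n_workers argument down to the tokenization helpers (worth double-checking against your fastai version), is to disable parallel tokenization so everything runs in the main process:

# from fastai2.text.all import *   # import path at the time of this thread
from operator import attrgetter

# n_workers=0 is an assumption: it should skip multiprocessing entirely,
# so nothing needs to be pickled; drop or adjust it if your version differs
tfms = [attrgetter("text"), Tokenizer.from_df("text", n_workers=0), Numericalize()]
dsrc = DataSource(df, [tfms], splits=splits, dl_type=LMDataLoader)

This trades tokenization speed for avoiding the Windows pickling restriction; on Linux the default parallel tokenization keeps working.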

Thanks a lot for your extremely helpful support! :slightly_smiling_face:

It was nothing, I’m just glad I could help!
