TextClasDataBunch: save and load

What is the desired way to save a TextClasDataBunch to disk and then load it later?

I tried `obj.save`, which resolves up the inheritance chain to `DataBunch.save`, and then various loaders, but all of them failed for pretty trivial reasons.

A few questions:

  1. Should I be using the load_data function?

  2. I see the warning “Serializing the DataBunch only works when you created it using the data block API”, so maybe I need to create my TextClasDataBunch differently than with `TextClasDataBunch.from_df`?

  3. In general, how do I know if my bunch was created with the data block API?

Thanks!

  1. Yes, `load_data` is the function you should use.
  2. The factory methods use the data block API behind the scenes (they are just shortcuts), so you’re safe.
  3. If you used the data block API or one of fastai’s factory methods, you’re fine; the warning is aimed at people who create their DataBunch by passing PyTorch DataLoaders directly.
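
A rough way to answer question 3 in code is to look for what the save step depends on. Reading fastai v1’s source, `DataBunch.save` appears to check for a `label_list` attribute, which the data block API attaches, and emits the warning above only when it is missing; treat that as an implementation detail that may differ between versions. The sketch below models the check in plain Python; `SketchDataBunch` and `created_with_data_block_api` are hypothetical stand-ins, not fastai’s classes or functions.

```python
import pickle
import warnings
from pathlib import Path


class SketchDataBunch:
    """Hypothetical stand-in for a fastai DataBunch (not the real class)."""

    def __init__(self, path, label_list=None):
        self.path = Path(path)
        if label_list is not None:
            # In this model, only the data block API route attaches label_list;
            # a bunch built from raw PyTorch DataLoaders never gets one.
            self.label_list = label_list

    def save(self, file="data_save.pkl"):
        # Models the guard described above: without label_list there is
        # nothing self-describing to serialize, so warn and skip saving.
        if not getattr(self, "label_list", False):
            warnings.warn("Serializing the DataBunch only works when you "
                          "created it using the data block API.")
            return False
        with open(self.path / file, "wb") as f:
            pickle.dump(self.label_list, f)
        return True


def created_with_data_block_api(data):
    """Question 3 under this model: check for the attached label_list."""
    return bool(getattr(data, "label_list", False))
```

With the real library, the analogous quick check would be something like `hasattr(data_clas, 'label_list')`, again assuming the v1 internals described above.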

Awesome! Is there a reason load_data isn’t a factory method? Sort of asymmetric.

If it were, you would have to call it on the same class you used to create the data. Since the class is saved along with the rest of the data object, it’s simpler to use a regular function that loads everything.
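
The same idea can be shown with plain `pickle`, which stores a reference to an object’s class and restores an instance of that class on load. This is a minimal sketch with hypothetical names (`SketchBaseBunch`, `SketchTextClasBunch`, `load_sketch`, `roundtrip_demo`), not fastai’s code; fastai v1 saves an internal object rather than the DataBunch itself, so only the principle carries over.

```python
import pickle
import tempfile
from pathlib import Path


class SketchBaseBunch:
    """Hypothetical base class standing in for DataBunch."""

    def __init__(self, path, items):
        self.path = Path(path)
        self.items = items

    def save(self, file="data_save.pkl"):
        # pickle records a reference to type(self), so the subclass is kept.
        with open(self.path / file, "wb") as f:
            pickle.dump(self, f)


class SketchTextClasBunch(SketchBaseBunch):
    """Hypothetical subclass standing in for TextClasDataBunch."""


def load_sketch(path, file="data_save.pkl"):
    # A plain function: the caller never names the class; pickle restores it.
    with open(Path(path) / file, "rb") as f:
        return pickle.load(f)


def roundtrip_demo():
    """Save a subclass instance, then load it back without naming its class."""
    with tempfile.TemporaryDirectory() as tmp:
        SketchTextClasBunch(tmp, ["good movie", "bad movie"]).save("clas.pkl")
        return load_sketch(tmp, "clas.pkl")
```

A hypothetical classmethod loader would instead make the caller repeat the class name, even though the saved file already records it.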

For posterity, there is one more tiny trick needed: passing `'.'` to `load_data`:

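```python
from fastai.text import *  # fastai v1: TextClasDataBunch, load_data

data_clas = TextClasDataBunch.from_df(...)  # arguments elided
data_clas.save('text_clas_data_bunch')      # written under data_clas.path
data_clas = load_data('.', 'text_clas_data_bunch')
```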

Thanks for your help!

`'.'` doesn’t always work. Pass `load_data` the same value you used as `path` when you defined your TextLMDataBunch and your TextClasDataBunch, because `save` writes the file relative to that path.
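
The reason is where the file lands. Reading fastai v1’s source, `DataBunch.save(file)` writes to `data.path / file` and `load_data(path, file)` reads from `Path(path) / file`, so `'.'` only works when the DataBunch was created with `path='.'`. The plain `pathlib` sketch below models just that resolution step; `save_location`, `load_location` and `load_would_find_file` are hypothetical helpers, not fastai functions, and `data/imdb` is an example path.

```python
from pathlib import Path


def save_location(bunch_path, file):
    """Where save() would write: relative to the path the bunch was built with."""
    return Path(bunch_path) / file


def load_location(load_path, file):
    """Where load_data() would read: relative to the path you pass it."""
    return Path(load_path) / file


def load_would_find_file(bunch_path, load_path, file="text_clas_data_bunch"):
    # The two locations only agree when load_path matches the original path.
    return save_location(bunch_path, file) == load_location(load_path, file)
```

For a bunch built with `path='data/imdb'`, `load_would_find_file('data/imdb', '.')` is false, while passing `'data/imdb'` to the loader matches.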