Learner.load()

A lot of the lessons discuss how to save a learner. What is the process for loading one after a notebook is completely closed and then reopened? The documentation is a little fuzzy on how the learner needs to be instantiated prior to calling “load”. Should we also be saving the dataset associated with the learner, so that the training and validation sets from the save are guaranteed to be the same as the ones loaded? Is there a working example that someone can point me to?


So I think that you have two options to save a learner:

  • You only want to save the weights and load them up later. You can do that with learner.save and learner.load on an already instantiated learner instance.
  • You want to save and load the full learner with everything you had. Then you call learner.export() and, later, learner = load_learner(path).
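To make the difference between the two options concrete, here is a framework-free sketch using only pickle. ToyModel and the four helper functions are hypothetical stand-ins made up for illustration, not fastai's API: save_weights/load_weights mimic learn.save/learn.load (parameters only, so you must rebuild the same architecture first), while export_all/load_all mimic learn.export/load_learner (the file carries enough to rebuild everything on its own).

```python
import pickle
import tempfile
from pathlib import Path


class ToyModel:
    """Hypothetical stand-in for a network: `sizes` plays the role of the architecture."""

    def __init__(self, sizes):
        self.sizes = list(sizes)
        self.weights = [0.0] * sum(self.sizes)


def save_weights(model, fname):
    # Like learn.save: only the parameters go to disk, not the architecture.
    with open(fname, "wb") as f:
        pickle.dump(model.weights, f)


def load_weights(model, fname):
    # Like learn.load: you must already have a model built the same way.
    with open(fname, "rb") as f:
        weights = pickle.load(f)
    if len(weights) != len(model.weights):
        raise ValueError("architecture mismatch: rebuild the model the same way first")
    model.weights = weights
    return model


def export_all(model, fname):
    # Like learn.export: store everything needed to rebuild the model, not just weights.
    with open(fname, "wb") as f:
        pickle.dump({"sizes": model.sizes, "weights": model.weights}, f)


def load_all(fname):
    # Like load_learner: the architecture comes from the file, nothing to rebuild by hand.
    with open(fname, "rb") as f:
        state = pickle.load(f)
    model = ToyModel(state["sizes"])
    model.weights = state["weights"]
    return model


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    trained = ToyModel([2, 3])
    trained.weights = [0.1, 0.2, 0.3, 0.4, 0.5]

    save_weights(trained, tmp / "stage1.pkl")
    restored = load_weights(ToyModel([2, 3]), tmp / "stage1.pkl")  # same "architecture"

    export_all(trained, tmp / "export.pkl")
    full = load_all(tmp / "export.pkl")  # no ToyModel(...) needed beforehand
    print(restored.weights == trained.weights, full.sizes)
```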

Hi, I’m having a weird error. I train my model as usual (I’m using Colab) with learn = create_cnn(data, models.resnet50, metrics=[accuracy]), then fit it and save it. After that I call learn.export() to get the file export.pkl. When I try to do inference with learn = load_learner(path_to_pkl_file), the following error appears:

---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
<ipython-input-84-b9e031f97a6b> in <module>()
      1 pkl = Path('fer2013/')
----> 2 learner= load_learner(pkl)

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in load_learner(path, fname, test)
    467 def load_learner(path:PathOrStr, fname:PathOrStr='export.pkl', test:ItemList=None):
    468     "Load a `Learner` object saved with `export_state` in `path/fn` with empty data, optionally add `test` and load on `cpu`."
--> 469     state = torch.load(open(Path(path)/fname, 'rb'))
    470     model = state.pop('model')
    471     src = LabelLists.load_state(path, state.pop('data'))

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in load(f, map_location, pickle_module)
    365         f = open(f, 'rb')
    366     try:
--> 367         return _load(f, map_location, pickle_module)
    368     finally:
    369         if new_fd:

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _load(f, map_location, pickle_module)
    526             f.seek(0)
    527 
--> 528     magic_number = pickle_module.load(f)
    529     if magic_number != MAGIC_NUMBER:
    530         raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: unpickling stack underflow

Hi!

Which version of fastai are you on? Are you loading the learner on Colab as well?


I had a similar problem with my language model. I was following lesson 3, starting with these steps to create my initial TextLMDataBunch:

from fastai.text import *
path = datapath4file('/media/DataHD2/Notes/notes_dana_hp')
data_lm = TextLMDataBunch.from_csv(path=path, csv_name='notes_hp_half.txt', text_cols='note_text', 
             header=0)

This took an hour, so I saved it:

data_lm.save()

Then I created my learner from the pre-trained WikiText-103 model and trained it for 1 epoch:

learn = language_model_learner(data_lm, pretrained_model=URLs.WT103_1, drop_mult=0.3)
learn.fit_one_cycle(1, 3e-2, moms=(0.8,0.7))

Then I saved my model weights:

learn.save('fit_head')
learn.save_encoder('fit_head_enc')

At this point, I assumed I had everything I needed to re-create my state, so I didn’t do an “export”. I shut down my notebook and came back later, only to realize I didn’t know how to re-create it. This is what I tried, and it seems to work so far:

path = datapath4file('/media/DataHD2/Notes/notes_dana_hp')
data_lm = TextLMDataBunch.load(path, 'tmp', bs=48)   
learn = language_model_learner(data_lm, pretrained_model=None, drop_mult=0.3)
learn.load('fit_head')

This was a guess, especially “pretrained_model=None”, but it seems to have worked.

Any thoughts on how this could have been improved?


Am I training my LM above (data_lm) with a different vocabulary than Wiki103? I am not sure because the class example used the Data Block API and I just used TextLMDataBunch.from_csv(). Any suggestions appreciated!
Dana

Yes. You need to instantiate the learner before calling its load method, whether you want to continue training or test a saved model. So you should also save your DataBunch and load it before instantiating your learner.

This procedure is not necessary when loading models exported for inference. In that case you can simply call load_learner.

As to the questions about training the language model, your model has been initialized with pre-trained weights from Wiki103, but you’re training via transfer learning on your TextLMDataBunch.
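On the vocabulary question: as far as I can tell, TextLMDataBunch.from_csv builds its own vocabulary from your notes, and when the WT103 weights are loaded, fastai v1 (if I read its source correctly) maps them onto that new vocabulary. Embedding rows are reused for words that exist in both vocabularies, and words that only appear in your corpus start from the mean of the pretrained embeddings. A stdlib-only sketch of that idea with tiny made-up vectors; the function name remap_embeddings and all the data are hypothetical, not fastai's API:

```python
def remap_embeddings(old_vectors, old_itos, new_itos):
    """Build an embedding table for `new_itos` from one trained on `old_itos`.

    Words present in the old vocabulary keep their trained row; words the old
    vocabulary never saw get the element-wise mean of all old rows.
    """
    old_stoi = {word: i for i, word in enumerate(old_itos)}
    dim = len(old_vectors[0])
    mean = [sum(row[d] for row in old_vectors) / len(old_vectors) for d in range(dim)]
    return [
        list(old_vectors[old_stoi[w]]) if w in old_stoi else list(mean)
        for w in new_itos
    ]


if __name__ == "__main__":
    wt103_itos = ["the", "patient", "was"]        # made-up "pretrained" vocabulary
    wt103_vecs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    notes_itos = ["the", "patient", "hpi"]        # made-up domain vocabulary
    print(remap_embeddings(wt103_vecs, wt103_itos, notes_itos))
```

So the model starts from WT103 knowledge for shared words, while the vocabulary itself is the one built from your data.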


I did this and it worked for me.
I was doing image classification on 512 × 512 images, so, following Jeremy’s advice in one of the lectures, I first trained the model on 128 × 128 images using
learn = cnn_learner(data, models.resnet50, metrics=accuracy)
After fitting the model on 128 × 128 images, I planned to train it on 256 × 256 images. But I needed to restart the kernel so as not to exhaust the server’s 12 GB of RAM. So here’s what I did:

  1. path = Config.data_path()/'my_model'
  2. path.mkdir(parents=True, exist_ok=True)
  3. learn.save(path/'stage1')
  4. Restarted my kernel; as a result, all the variables were lost.
  5. Re-created the tfms and data variables, this time for 256 × 256 images: data = (src.transform(tfms, size=256).databunch().normalize(imagenet_stats))
  6. Re-created learn with learn = cnn_learner(data, models.resnet50, metrics=accuracy)
  7. Finally, learn.load(path/'stage1')
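Why the final learn.load(path/'stage1') works even though the image size changed: convolution weights depend only on kernel sizes and channel counts, and the adaptive pooling in front of the classification head collapses whatever spatial size comes out to 1 × 1. The sketch below applies the usual output-size formula floor((n + 2p - k) / s) + 1 to the downsampling layers of a standard torchvision ResNet-50; the layer list is my own hand-written assumption about that architecture, and the helper names are hypothetical.

```python
def out_size(n, kernel, stride, padding):
    """Spatial size after one conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the layers that downsample in a standard ResNet-50
# (assumed from the torchvision design; the other layers keep the spatial size).
RESNET_DOWNSAMPLING = [
    (7, 2, 3),  # stem convolution
    (3, 2, 1),  # max-pool
    (3, 2, 1),  # first block of layer2
    (3, 2, 1),  # first block of layer3
    (3, 2, 1),  # first block of layer4
]


def final_feature_map(n):
    """Side length of the last feature map for an n x n input."""
    for kernel, stride, padding in RESNET_DOWNSAMPLING:
        n = out_size(n, kernel, stride, padding)
    return n


if __name__ == "__main__":
    for size in (128, 256, 512):
        side = final_feature_map(size)
        # Adaptive pooling turns side x side into 1 x 1, so the head always sees
        # the same number of channels (2048 for ResNet-50) whatever the input size.
        print(f"{size}px -> {side}x{side} feature map -> pooled to 1x1")
```

Only the feature-map size changes between stages; none of the parameter shapes do, so the weights saved at 128 px fit the 256 px learner.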

This is weird, I did pretty much everything the same way. Could you try using a Path instead of a string for learner.load() and see what happens? For example Path('kaggle')/'input'/'etc', i.e. just turn it into a path using pathlib. I’m just taking a shot here, because if it couldn’t load the file it would show an error. See the pathlib part here.
Is it possible that the file got corrupted?
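One way to rule out a missing, empty, or wrong file before calling load_learner is to look at its first bytes. The function sniff_model_file below is a hypothetical helper with my own heuristic, not part of fastai or PyTorch: pickles written with protocol 2 or higher (which is what the older torch.save format produced) start with the byte 0x80, recent PyTorch versions write a zip archive starting with PK\x03\x04, and an empty file or one starting with < (for example an HTML page saved under the .pkl name) will not unpickle.

```python
import pickle
import tempfile
from pathlib import Path


def sniff_model_file(fname):
    """Rough, heuristic guess at what `fname` contains, based on its first bytes."""
    path = Path(fname)
    if not path.exists():
        return "missing"
    with path.open("rb") as f:
        head = f.read(8)
    if not head:
        return "empty"
    if head.startswith(b"PK\x03\x04"):
        return "zip (newer torch.save format)"
    if head[:1] == b"\x80":
        return "pickle (protocol 2+)"
    if head.lstrip()[:1] == b"<":
        return "looks like HTML/XML, not a model file"
    return "unknown"


if __name__ == "__main__":
    folder = Path(tempfile.mkdtemp())
    with (folder / "export.pkl").open("wb") as f:
        pickle.dump({"weights": [0.1, 0.2]}, f, protocol=2)
    (folder / "broken.pkl").write_bytes(b"")  # e.g. a copy that never finished
    for name in ("export.pkl", "broken.pkl", "missing.pkl"):
        print(name, "->", sniff_model_file(folder / name))
```

Anything other than "pickle" or "zip" suggests re-exporting or re-copying the file before debugging load_learner itself.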

Somehow the problem resolved itself.
I ran another session overnight and reached 34% accuracy. Maybe something was wrong with the kernel instance.
By the way, how long till the ULMFiT thread?
Cheers.

If you want to release RAM, here are some tricks that don’t require restarting your kernel/Jupyter/session:

https://docs.fast.ai/basic_train.html#Freeing-memory
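One reason a plain del learn may not release memory right away: objects like learners often sit in reference cycles (for example a callback that keeps a pointer back to the learner that owns it), and CPython only reclaims cycles when its cyclic garbage collector runs. A toy, stdlib-only illustration; ToyLearner and ToyCallback are hypothetical stand-ins, and GPU memory held by PyTorch is a separate matter this sketch does not cover:

```python
import gc
import weakref


class ToyLearner:
    """Hypothetical stand-in: a learner that owns a list of callbacks."""

    def __init__(self):
        self.callbacks = [ToyCallback(self)]


class ToyCallback:
    """Hypothetical callback that keeps a reference back to its learner."""

    def __init__(self, learn):
        self.learn = learn  # learner -> callbacks -> callback -> learner: a cycle


def del_vs_collect():
    """Return (alive after del, alive after gc.collect()) for a cyclic ToyLearner."""
    was_enabled = gc.isenabled()
    gc.disable()  # stop the automatic collector so the result is deterministic
    try:
        learn = ToyLearner()
        probe = weakref.ref(learn)  # checks liveness without keeping the object alive
        del learn
        alive_after_del = probe() is not None
        gc.collect()  # explicit collection still works while automatic gc is off
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(del_vs_collect())  # (True, False) on CPython
```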

As explained in the documentation:
https://docs.fast.ai/basic_train.html#Saving-and-loading-models

Saving and loading models

Simply call Learner.save and Learner.load to save and load models. Only the parameters are saved, not the actual architecture (so you’ll need to create your model in the same way before loading weights back in).

So you save with:

learn.save("trained_model")
or
learn.save("trained_model", return_path=True)

But then you load it by attaching it to a learner built with the same architecture:

learn = cnn_learner(data, models.resnet18).load("trained_model")

or
learn = cnn_learner(data, models.resnet18)
learn.load("trained_model")

Note that here we also need to attach the DataBunch (data) to the learner.
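The one-line form works because, as far as I can tell from fastai v1's source, Learner.load returns the learner itself; and loading fails when the architecture differs because the saved parameters no longer fit. A toy, stdlib-only model of both points; ToyLearner and its parameter counts are made up for illustration:

```python
class ToyLearner:
    """Hypothetical stand-in for a Learner; `arch` decides how many parameters exist."""

    PARAM_COUNTS = {"resnet18": 4, "resnet34": 6}  # made-up sizes for the demo

    def __init__(self, data, arch):
        self.data = data
        self.arch = arch
        self.params = [0.0] * self.PARAM_COUNTS[arch]

    def load(self, saved_params):
        # Weights only fit a model built the same way as the one that saved them.
        if len(saved_params) != len(self.params):
            raise RuntimeError(f"saved weights do not match a {self.arch} model")
        self.params = list(saved_params)
        return self  # returning self is what allows ToyLearner(...).load(...) in one line


if __name__ == "__main__":
    saved = [0.1, 0.2, 0.3, 0.4]  # pretend these came from a resnet18 learner
    learn = ToyLearner("data", "resnet18").load(saved)  # chained, like the docs example
    print(learn.arch, learn.params)
```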


https://docs.fast.ai/tutorial.inference.html

A very different thing is to export the learner with everything it needs, like:

learn.export()
or
learn.export('trained_model.pkl')

Once everything is ready for inference, we just have to call learn.export to save all the information of our Learner object for inference: the stuff we need in the DataBunch (transforms, classes, normalization…), the model with its weights and all the callbacks our Learner was using. Everything will be in a file named export.pkl in the folder learn.path. If you deploy your model on a different machine, this is the file you’ll need to copy.

And then import it with:

learn = load_learner(path)
or
learn = load_learner(path, 'trained_model.pkl')

To create the Learner for inference, you’ll need to use the load_learner function. Note that you don’t have to specify anything: it remembers the classes, the transforms you used or the normalization in the data, the model, its weights… The only argument needed is the folder where the ‘export.pkl’ file is.
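Where these files end up, as I understand fastai v1's defaults: learn.save(name) writes learn.path/learn.model_dir/(name + '.pth') with model_dir defaulting to 'models', learn.export(fname) writes learn.path/fname, and load_learner(path, fname) reads path/fname. The helpers below are hypothetical and only mimic that path arithmetic with pathlib. One consequence worth knowing: if name is already an absolute path, pathlib's / discards everything to its left, so the file lands exactly where that absolute path points.

```python
from pathlib import PurePosixPath


def save_target(learn_path, name, model_dir="models"):
    # Mimics the weights location: learn.path / model_dir / "<name>.pth"
    return PurePosixPath(learn_path) / model_dir / f"{name}.pth"


def export_target(learn_path, fname="export.pkl"):
    # Mimics learn.export(): the file goes straight into learn.path
    return PurePosixPath(learn_path) / fname


def load_learner_source(path, fname="export.pkl"):
    # Mimics load_learner(path, fname): it expects the folder plus a file name
    return PurePosixPath(path) / fname


if __name__ == "__main__":
    print(save_target("data/pets", "trained_model"))
    # An absolute name overrides learn.path/models entirely:
    print(save_target("data/pets", PurePosixPath("/content/my_model/stage1")))
    print(export_target("data/pets", "trained_model.pkl"))
    print(load_learner_source("fer2013"))
```

This is also why load_learner wants the folder (here 'fer2013') rather than the path of the .pkl file itself, unless you pass the file name separately.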


Here you can see a similar conversation
