Fastai v2 – export / load_learner issue

Hello all,

I have encountered this error: ZeroDivisionError: integer division or modulo by zero when using Google Colab.

Here is my code – Heavily inspired by one of Zach Mueller’s lessons:

from fastai2.vision.all import *            # provides cnn_learner, load_learner, etc.
from torchvision.models import resnet34
from fastai2.metrics import accuracy_multi

# CNN Learner (dls is the DataLoaders object built earlier)
learn = cnn_learner(dls, resnet34, pretrained=True, metrics=[accuracy_multi])

learn.lr_find()

lr = 1e-2
learn = learn.to_fp16()

learn.fit_one_cycle(5, slice(lr))

#Export
learn.export('stage-1.pkl')

#Import
learn = load_learner('stage-1.pkl')

#Unfreeze and find
learn.unfreeze()
learn.lr_find()

The learn.lr_find() line (the one at the bottom) seems to be where the error occurs, and I believe it’s because of using export / load_learner.

If I try this, instead:

learn.export('stage-1.pkl')
path = learn.path
learn = load_learner(path, 'stage-1.pkl')

I get IsADirectoryError: [Errno 21] Is a directory: '.'.

(Note: I’m using export / load_learner because when I tried learn.save and learn.load, the lr_find graph did not reflect the earlier state of the model even after I’d used learn.load, so I assumed something was going wrong and decided to explicitly export the model to a file and re-load it.)

Does anybody know why this might be happening?

I’m fully prepared for the answer to be: Because I’m not doing it properly / have made an obvious mistake :slight_smile:

When you load the model back in via load_learner, what is the output of learn.summary()?

(Also moved this to fastai2 as it seemed more relevant there :wink: )

I’m just re-running things and will report back!

(I did try to post in fastaiv2 but it didn’t seem to be an option – I guess I missed it).

So, if I run:

learn.export(base_dir + 'stage-1')
learn.summary()

I get this.

If I then run this:

learn = load_learner(base_dir + 'stage-1')
learn.summary()

I get this.

Ah yes, the bottom one makes sense, because we have no data! My bad :confused:

So the real issue here is that when you export a learner, it does not keep the data, so what you’re trying to do won’t ever really work :confused: (I wasn’t able to look it over again until now.)

Export is only designed for when you’re all done and ready to deploy (or at least at a point where you don’t need your training data).
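
For example, a rough sketch of that workflow (learn_inf, ‘export.pkl’ and img_path are just example names here):

# At the end of training: export the learner (written to learn.path/'export.pkl')
learn.export('export.pkl')

# In the deployment environment: load it back. It carries the transforms and
# model weights, but an empty set of DataLoaders, i.e. no training data.
learn_inf = load_learner(learn.path/'export.pkl')

# Predict on a single image (img_path is a hypothetical path to an image file)
pred, pred_idx, probs = learn_inf.predict(img_path)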


Oohhhhh, I see – I didn’t know this (and didn’t find reference to that in the documentation).

Is it therefore the case that every time I find something like learn.fit_one_cycle(5, slice(1e-3, lr/5)) needs tweaking, I have to refresh the page and re-run every cell again?

It’s the same as v1 :slight_smile:

I don’t think there should ever be a need to during training? You certainly don’t need to run code top-down in a Jupyter environment. If you want to go back to a previous state while you’re working on finding a good learning rate, simply save your learner, then fit, and if it’s not good enough load it back in via learn.load() and fit again :slight_smile: (I think that’s what you’re asking?)
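
For example (a rough sketch; ‘stage-1’ is just an example name):

# Save the current weights before experimenting
learn.save('stage-1')            # written to learn.path/learn.model_dir/'stage-1.pth'

# Try some settings
learn.fit_one_cycle(5, slice(lr))

# Not good enough? Load the saved weights back in and fit again
learn = learn.load('stage-1')
learn.fit_one_cycle(5, slice(lr))   # with whatever settings you want to try next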

This is what I thought I was supposed to do – But when I do that, and run lr_find(), the graph was different the second time (as if it was using the unfrozen model).

So that’s why I was trying to export it to a file and re-import it.

It looks like you were doing exactly that though? (You unfroze your learner and then found the learning rate.) Also, I’ve found that the longer you train, the more the graph looks like an unfrozen one.

Sorry I wasn’t at all clear there.

I mean that after unfreezing and running, and deciding that I wanted to try some different settings, I re-loaded the model from before unfreezing and ran lr_find() again, but the graph was different (implying that the model had changed and was not the same as it had been before unfreezing).

Ah that would do it. You never froze again, so you should call learn.freeze() and try again :slight_smile:
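
i.e. something like this (sketch):

learn = learn.load('stage-1')   # restores the saved weights
learn.freeze()                  # but not the frozen state, so freeze explicitly
learn.lr_find()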

Ohhh! So even though I’m re-loading the previous (frozen) version, the unfreeze attribute/flag persists unless I explicitly call freeze? That may well be the issue here! Thanks a lot! :slight_smile:

One other question:

Is there a way to avoid having to re-run everything after the notebook (in Colab) times out and the runtime disconnects?

If you save intermittently during training and store the weights in your Google Drive, you can load the previous weights back in; otherwise, no.
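
For example, in Colab (a sketch; the Drive path and the name ‘stage-1’ are just examples):

from google.colab import drive
drive.mount('/content/gdrive')

# Point the learner's model directory at Drive so saved weights survive a disconnect
learn.model_dir = '/content/gdrive/My Drive/models'
learn.save('stage-1')

# After a runtime reset: rebuild dls and the learner, then load the weights back in
learn = learn.load('stage-1')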

Got it, okay. I believe I’m doing that too, but I’ll review everything. Thanks again.

Hey, I just trained an image classification model in Colab and exported the model as a .pkl file to my Google Drive. I then downloaded the pickle file to my local system. Now when I use load_learner, I get an error: ‘PosixPath’ object has no attribute ‘tell’. Using pickle.load also gives an error.

This solved the issue.