Hey everyone! In chapter 2, the book covers exporting models for inference. When I export my model using learner.export() and upload the .pkl file to another colab notebook, I get the following error:
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
Does anyone have any ideas on what is causing this?
What code are you actually running? Can you share a public link?
p.s. Please read here the section “Be precise and informative about your problem”
Hey @bencoman, sorry about that. The code is basically the exact same code that’s used in Chapter 2 notebook. Here is the notebook where the model is created and here is the notebook where the model is used for inference.
I’ve seen something similar to this when a pkl file has been downloaded, but not fully. I would try recreating the pkl file in this case and see if that fixes things for you. Is it possible the pkl file was interrupted as it was being created?
Hey @KevinB, thanks for the response. I am pretty sure nothing was interrupted during the model creation.
I reproduced your error.
TL;DR: the key thing is to use checksums to ensure file transfers are successful.
In detail…
I ran your first notebook with two changes:
- since I didn’t have an Azure key, I replaced search_images_bing() with search_images_ddg() per the latest course notebook, fast ai Course 1 - Lesson 2 : Bear Classification | Kaggle
- I added a checksum of the model export…
learner.export()
!md5sum ./export.pkl
0a5bac47a4708f849fba379438e0e148 ./export.pkl
This checksum allows you to confirm the file is identical on the receiving end.
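If md5sum isn't available on one side, the same digest can be computed from Python. This is only a minimal sketch using the standard hashlib module; the helper name md5_of_file and the chunk size are my own choices, not anything from fastai or the course notebook.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so a large export.pkl is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Example use in either notebook (assumes export.pkl is in the working directory):
# print(md5_of_file("export.pkl"))
```

Run it in both notebooks; the two hex strings should be identical if the transfer completed.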
In the second notebook, while the upload circle remains part way round (and btw, my upload is extremely slow, like 5 minutes)…
then I get the following, which differs from the original…
!md5sum "export.pkl"
b532b9e9990546dd48fa500b01bc7dce export.pkl
so load_learner() fails with…
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
Once the yellow upload circle disappears, the receiving checksum matches the origin, and load_learner() is successful.
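If you'd rather not watch the upload circle, one option is to poll until the checksum matches before loading. This is a rough sketch under my own assumptions: wait_for_upload and its timeout and poll_interval parameters are hypothetical names, and the expected digest is whatever md5sum printed in the exporting notebook.

```python
import hashlib
import time

def file_md5(path):
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def wait_for_upload(path, expected_md5, timeout=600.0, poll_interval=5.0):
    """Poll until the file's MD5 equals expected_md5; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if file_md5(path) == expected_md5:
                return True
        except FileNotFoundError:
            pass  # the upload may not have created the file yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)

# Example use (the digest is the one from the exporting notebook above):
# if wait_for_upload("export.pkl", "0a5bac47a4708f849fba379438e0e148"):
#     learn = load_learner("export.pkl")
```

The point of the design is that load_learner() is only called once the file is known to be complete, so a partial upload shows up as a timeout rather than the zip archive error.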
@bencoman Thank you very much for your help! I’ll try this out and close the issue.