Serialize learner to string

bachsh · March 13, 2019, 7:39pm

Hey there,

Is there a way to dump a model/learner to a string instead of a file, similar to how pickle.dumps works? I think it might be possible to save to an io.BytesIO object, but I'm not sure how to do it.

Avi

sgugger · March 13, 2019, 7:48pm

We use torch.save behind the scenes, so it would have to be possible on that level. They use pickle, I think, so maybe it’s doable. If you tell me how to torch.save to a string, I can add the option.

bachsh · March 13, 2019, 7:58pm

Taken from torch serialization (https://pytorch.org/docs/stable/_modules/torch/serialization.html)

import io
import torch

x = torch.tensor([0, 1, 2, 3, 4])
# Save to io.BytesIO buffer
buffer = io.BytesIO()
torch.save(x, buffer)

It seems that torch.save accepts a “maybe file” parameter. The thing is that the Learner.save method accepts a PathOrStr and relies on it inside.
I can make the change myself and PR it if you give me the guideline on how you’d like to see the interface of the Learner.save and Learner.export methods (or alternatively if you think it should be a separate method).
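
For reference, a minimal sketch of the full round trip, i.e. the pickle.dumps-style behaviour asked about above. It only uses torch.save, torch.load and io.BytesIO; the variable names payload and restored are made up for illustration.

```python
import io
import torch

x = torch.tensor([0, 1, 2, 3, 4])

# Serialize into an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
torch.save(x, buffer)

# getvalue() returns everything written to the buffer as a bytes object.
payload = buffer.getvalue()

# torch.load reads from any file-like object, so wrap the bytes again.
restored = torch.load(io.BytesIO(payload))
```

The bytes in payload can then be stored or sent anywhere a pickle.dumps result could go.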

sgugger · March 13, 2019, 8:50pm

It can be the same method. Just add a new argument buffer that defaults to None, and then have it point to path/fname if it’s None. That should do the trick.

bachsh · March 13, 2019, 9:19pm

You mean you want the signature to be like this?

def save(self, name:PathOrStr, return_path:bool=False, with_opt:bool=True, buffer:io.BytesIO=None)

If that is the case, then you will have to call learn.save(name=None, buffer=buffer) to save to a buffer.
How about putting it under the same parameter, just like Torch does? That way we could just use learn.save(name=buffer) (perhaps changing the name of the variable to make it more readable). Inside the implementation we can check the type and act accordingly (and also raise an error if return_path=True is used with a buffer).
WDYT?
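
To make the single-parameter idea concrete, here is a rough sketch. It is not the fastai implementation: save_state and target are hypothetical names, and pickle.dump stands in for torch.save so the snippet runs with the standard library only.

```python
import io
import pickle
from pathlib import Path
from typing import BinaryIO, Optional, Union


def save_state(state: dict,
               target: Union[str, Path, BinaryIO],
               return_path: bool = False) -> Optional[Path]:
    """Hypothetical helper: write `state` to a path or to a file-like object."""
    if hasattr(target, "write"):
        # A buffer has no path to return, so reject that combination.
        if return_path:
            raise ValueError("return_path=True cannot be used with a buffer")
        pickle.dump(state, target)  # stand-in for torch.save(state, target)
        return None
    path = Path(target)
    with open(path, "wb") as f:
        pickle.dump(state, f)  # stand-in for torch.save(state, path)
    return path if return_path else None


# Saving to memory uses the same argument as saving to a file.
buffer = io.BytesIO()
save_state({"epoch": 3}, buffer)
```

The type check keeps the call site short, at the cost of one parameter meaning two different things.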

sgugger · March 13, 2019, 9:22pm

I’m not too fond of replacing fname with a potential buffer and renaming it, since it’s what most people use.
Maybe we can make fname default to None so that you can just type learn.save(buffer=buffer)? Then an assert will throw an error if name and buffer are None.
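
Roughly something like the sketch below. This is an illustration only, not the actual Learner.save: it is a free function, base_path and the ".pth" suffix are assumptions, pickle.dump stands in for torch.save, and letting the buffer win when both arguments are given is also an assumption.

```python
import io
import pickle
from pathlib import Path
from typing import BinaryIO, Optional, Union


def save(state: dict,
         name: Optional[Union[str, Path]] = None,
         buffer: Optional[BinaryIO] = None,
         base_path: Path = Path(".")) -> None:
    """Hypothetical sketch: save to `buffer` if given, else to base_path/name."""
    assert name is not None or buffer is not None, "Pass either name or buffer"
    if buffer is not None:
        # Assumption: the buffer takes priority when both are supplied.
        pickle.dump(state, buffer)  # stand-in for torch.save(state, buffer)
        return
    with open(Path(base_path) / f"{name}.pth", "wb") as f:
        pickle.dump(state, f)  # stand-in for torch.save(state, path)


# Existing callers keep passing a name; new callers pass only a buffer.
buffer = io.BytesIO()
save({"epoch": 3}, buffer=buffer)
```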

bachsh · March 13, 2019, 9:27pm

Sounds good, thanks. Sometimes I wish Python supported function overloading