Changing the path of the SentencePiece files in an exported learner


I’d like to ask: is it possible to change the path of the SentencePiece files after exporting the learner?
When I export my learner with learn.export() and load it on another machine with load_learner(), learn.predict() fails because it can’t find the SentencePiece files. I used an absolute path to them when I loaded them into the databunch during training.

Best regards

@kontrabas we had the same problem when we tried to use them in production. If you switch to a relative path when training, it will work as long as you put spm.model and spm.vocab at the same relative path on the other machine. If you run into issues, let me know; I can try to walk you through all the steps I took to get it to work.
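
A minimal sketch of the relative-path idea, using only the standard library. The directory layout and the helper name to_relative are hypothetical, chosen for illustration; the point is only that the stored path no longer mentions a machine-specific prefix.

```python
import os
from pathlib import Path


def to_relative(abs_path, base_dir):
    """Express abs_path relative to base_dir (neither has to exist on disk)."""
    return Path(os.path.relpath(abs_path, start=base_dir))


# Hypothetical layout: the SentencePiece files live in <project>/tmp/.
project_root = Path("/home/user/project")
sp_model_abs = project_root / "tmp" / "spm.model"

# This is the form you would pass when building the data bunch.
sp_model_rel = to_relative(sp_model_abs, project_root)
print(sp_model_rel)
```

Keep in mind that a relative path is resolved against the working directory of the running process, so the prediction script has to be started from a directory where tmp/spm.model (in this example) actually exists.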

Thank you for your answer. I’ve loaded the SentencePiece model into the data bunch using a relative path and then exported the learner again. It works now, but I still think it should be possible to change that path after loading the model.

I don’t know enough to tell you exactly why, but it seems the path is hard-coded in the model object. My reasoning is that any SentencePiece model needs to know where to find its tokenization files (spm.model and spm.vocab), unlike spaCy-tokenized models. As far as I know, learn.load doesn’t take a dest variable the way creating the learner does. Now that I think about it, you might be able to modify load_learner and use a regex to change the path. I remember that Jeremy talked about how to look into model objects, but I don’t know which lesson that was in.
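
For what it’s worth, the "hard-coded path" behaviour can be reproduced without fastai. The sketch below assumes the export goes through pickle (torch.save does) and uses a SimpleNamespace as a hypothetical stand-in for the real processor object; the attribute name sp_model mirrors the one discussed in this thread.

```python
import pickle
from pathlib import Path
from types import SimpleNamespace

# "Training machine": a stand-in processor remembers an absolute path.
processor = SimpleNamespace(sp_model=Path("/home/alice/data/tmp/spm.model"))
blob = pickle.dumps(processor)

# "Production machine": unpickling restores the attribute exactly as saved,
# whether or not that location exists here.
restored = pickle.loads(blob)
loaded_path = restored.sp_model
print(loaded_path)

# The path is ordinary attribute data, so it can be reassigned after loading.
restored.sp_model = Path("models") / "spm.model"
print(restored.sp_model)
```

So in principle no regex is needed: if you can reach the object that holds the path, you can overwrite the attribute directly.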

When I ran into this issue and got it to work by using a relative path, my senior dev told me that there are lots of issues with using absolute paths and that he never uses them.

Have you looked into how it is done in fastai V2? Perhaps there has been a change.

@kontrabas You can update the paths after loading the learner:

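from pathlib import Path
from fastai.text import Learner, SPProcessor  # fastai v1

def _fix_sp_processor(learner: Learner, sp_path: Path, sp_model: str, sp_vocab: str) -> None:
    """
    Fixes SentencePiece paths serialized into the model.

    learner: Learner object
    sp_path: path to the directory containing the SentencePiece model and vocabulary files.
    sp_model: SentencePiece model filename.
    sp_vocab: SentencePiece vocabulary filename.
    """
    # Assumption: the loaded learner exposes its processors as learner.data.processor;
    # adjust this if yours are stored elsewhere.
    for processor in learner.data.processor:
        if isinstance(processor, SPProcessor):
            processor.sp_model = sp_path / sp_model
            processor.sp_vocab = sp_path / sp_vocab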
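
Before repointing the processor, it may be worth checking that both files are really present on the new machine, so a wrong directory fails immediately rather than at the first learn.predict(). A small sketch using only pathlib; resolve_sp_files is a hypothetical helper name, not part of fastai.

```python
from pathlib import Path
from typing import Tuple


def resolve_sp_files(sp_path: Path, sp_model: str, sp_vocab: str) -> Tuple[Path, Path]:
    """Return (model_file, vocab_file), raising early if either is missing."""
    model_file = Path(sp_path) / sp_model
    vocab_file = Path(sp_path) / sp_vocab
    missing = [str(p) for p in (model_file, vocab_file) if not p.is_file()]
    if missing:
        raise FileNotFoundError("SentencePiece files not found: " + ", ".join(missing))
    return model_file, vocab_file


# Hypothetical usage on the deployment machine (needs fastai v1 installed):
# learn = load_learner("models")
# resolve_sp_files(Path("models/tmp"), "spm.model", "spm.vocab")
# _fix_sp_processor(learn, Path("models/tmp"), "spm.model", "spm.vocab")
```

The directory and file names in the commented usage are placeholders; substitute wherever you copied spm.model and spm.vocab.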

That’s exactly what I was looking for! Thank you for your answer.

Best regards