Changing the path of the sentencepiece files in an exported learner

kontrabas · May 28, 2020, 3:18pm

Hello,

I’d like to ask whether it is possible to change the path of the sentencepiece files after exporting the learner.
When I export my learner with learn.export() and load it on another machine with load_learner(), learn.predict() fails because it can’t find the sentencepiece files: I used an absolute path to them when I loaded them into the databunch during training.

Best regards

Daniel.R.Armstrong · June 2, 2020, 11:05am

@kontrabas We had the same problem when we tried to use them in production. If you convert it to a relative path when training, it will work as long as you put spm.model and spm.vocab in the same relative path. If you run into issues, let me know; I can try to walk you through all the steps I took to get it to work.
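
A minimal sketch of the relative-path idea, using only the standard library. The directory names and the helper `to_relative` are hypothetical; they come neither from this thread nor from fastai. Keep in mind that a relative path is resolved against the current working directory of whatever process loads the learner, so the production process has to run from a directory where the same relative path is valid.

```python
from pathlib import Path


def to_relative(sp_file, base_dir):
    """Return sp_file expressed relative to base_dir.

    Raises ValueError if sp_file is not located under base_dir.
    """
    return Path(sp_file).relative_to(base_dir)


# Hypothetical layout on the training machine.
base_dir = Path("/home/trainer/project")
abs_model = base_dir / "tmp" / "spm.model"

rel_model = to_relative(abs_model, base_dir)
print(rel_model.as_posix())  # tmp/spm.model

# On another machine, the same relative path is joined to that machine's
# own base directory (also hypothetical).
prod_model = Path("/srv/app") / rel_model
print(prod_model.as_posix())
```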

kontrabas · June 5, 2020, 2:04pm

Thank you for your answer. I loaded the model into the data bunch with a relative path and then exported the learner again. It works now, but I think it should be possible to change that path after loading the model.

Daniel.R.Armstrong · June 6, 2020, 11:13am

I don’t know enough to tell you exactly why, but it seems like the path is hard-coded in the model object. My reasoning is that any sentencepiece model needs to know where to find its tokenization files (spm.model and spm.vocab), unlike spaCy-tokenized models. When you use learn.load (as far as I know), it doesn’t use a dest variable like it does when you create the learner. Now that I think about it, you might be able to modify load_learner and use a regex to change it. I remember that Jeremy talked about how to look into model objects, but I don’t know which lesson that was in.
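
One hedged way to see where such paths live is to list the attributes of an object that hold path values. The sketch below runs against a stand-in class, not a real fastai object; `DummyProcessor` and `find_path_attrs` are hypothetical names, and a real processor may store its paths differently (for example as plain strings, which this check would miss).

```python
from pathlib import Path, PurePath


def find_path_attrs(obj):
    """Return {attribute name: value} for every instance attribute that is a path object."""
    return {
        name: value
        for name, value in vars(obj).items()
        if isinstance(value, PurePath)
    }


class DummyProcessor:
    """Stand-in object with two path attributes and one unrelated attribute."""

    def __init__(self):
        self.sp_model = Path("/home/trainer/tmp/spm.model")
        self.sp_vocab = Path("/home/trainer/tmp/spm.vocab")
        self.max_vocab_sz = 30000


found = find_path_attrs(DummyProcessor())
for name, value in found.items():
    print(name, value.as_posix())
```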

When I ran into this issue and got it to work by using a relative path, my senior dev told me that there are lots of issues with using absolute paths and that he never uses them.

Have you looked into how it is done in fastai V2? Perhaps there has been a change.

mkardas · June 6, 2020, 5:14pm

@kontrabas You can update the paths after loading the learner:

# Imports assume fastai v1, where these module paths exist.
from pathlib import Path

from fastai.basic_train import Learner
from fastai.text.data import SPProcessor


def _fix_sp_processor(learner: Learner, sp_path: Path, sp_model: str, sp_vocab: str) -> None:
    """
    Fixes SentencePiece paths serialized into the model.

    Parameters
    ----------
    learner
        Learner object
    sp_path
        path to the directory containing the SentencePiece model and vocabulary files.
    sp_model
        SentencePiece model filename.
    sp_vocab
        SentencePiece vocabulary filename.
    """
    # Only SentencePiece processors carry these paths; leave the others untouched.
    for processor in learner.data.processor:
        if isinstance(processor, SPProcessor):
            processor.sp_model = sp_path / sp_model
            processor.sp_vocab = sp_path / sp_vocab
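
To show how the function above behaves without needing fastai or real SentencePiece files, here is a self-contained sketch of the same pattern. `StubSPProcessor` and `repoint_sp_files` are hypothetical stand-ins; the only assumption about the real processor is that it exposes `sp_model` and `sp_vocab` attributes, as the function above implies. Unlike the original, this variant reuses the existing file names, so only the new directory has to be supplied.

```python
from pathlib import Path


class StubSPProcessor:
    """Stand-in for a SentencePiece processor; models only the two path attributes."""

    def __init__(self, sp_model, sp_vocab):
        self.sp_model = Path(sp_model)
        self.sp_vocab = Path(sp_vocab)


def repoint_sp_files(processors, sp_dir):
    """Point every StubSPProcessor at sp_dir, keeping the original file names.

    Returns the number of processors that were patched, so the caller can
    notice when nothing matched.
    """
    sp_dir = Path(sp_dir)
    patched = 0
    for proc in processors:
        if isinstance(proc, StubSPProcessor):
            proc.sp_model = sp_dir / proc.sp_model.name
            proc.sp_vocab = sp_dir / proc.sp_vocab.name
            patched += 1
    return patched


# Paths as they might have been serialized on the training machine (hypothetical).
processors = [
    object(),  # some other processor that must be left alone
    StubSPProcessor("/home/trainer/tmp/spm.model", "/home/trainer/tmp/spm.vocab"),
]

count = repoint_sp_files(processors, "models/sp")
print(count, processors[1].sp_model.as_posix())
```

With the real learner, the equivalent call would come right after load_learner() and before the first learn.predict().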

kontrabas · June 8, 2020, 7:43am

That’s exactly what I was looking for! Thank you for your answer.

Best regards