I’d like to ask: is it possible to change the path of the SentencePiece files after exporting the learner?
When I export my learner with learn.export() and load it on another machine with load_learner(), learn.predict() fails because it can’t find the SentencePiece files. I used an absolute path to them when I loaded them into the databunch during training.
@kontrabas We had the same problem when we tried to use them in production. If you convert it to a relative path when training, it will work as long as you put spm.model and spm.vocab at the same relative path. If you run into issues, let me know and I can try to walk you through all the steps I took to get it to work.
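To illustrate the relative-path idea, here is a minimal sketch using only pathlib. The directory names and the to_relative helper are hypothetical and not part of fastai. It only shows that a path stored relative to a base directory can be re-anchored under a different base on another machine. In practice a relative path is resolved against the working directory the process runs from.

```python
from pathlib import PurePosixPath


def to_relative(absolute_file: PurePosixPath, base_dir: PurePosixPath) -> PurePosixPath:
    """Express absolute_file relative to base_dir (hypothetical helper)."""
    # relative_to raises ValueError if absolute_file is not located under base_dir
    return absolute_file.relative_to(base_dir)


# Hypothetical layout on the training machine
train_base = PurePosixPath("/home/alice/project")
rel_model = to_relative(train_base / "tmp" / "spm.model", train_base)

# Hypothetical layout on the production machine: same layout below a different base
prod_base = PurePosixPath("/srv/app")
prod_model = prod_base / rel_model
```

So if the relative path is what gets stored at training time, the files only need to sit at the same place relative to wherever the code runs in production.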
Thank you for your answer. I loaded the model into the databunch with a relative path and then exported the learner again. It works now, but I still think it could be possible to change that path after loading the model.
I don’t know enough to tell you exactly why, but it seems like the path is hard-coded in the model object. My reasoning is that any SentencePiece model needs to know where to find its tokenization files (spm.model and spm.vocab), unlike spaCy-tokenized models. When you use learn.load, as far as I know, it doesn’t use a dest variable like it does when you create the learner. Now that I think about it, you might be able to modify load_learner and use a regex to change the path. I remember that Jeremy talked about how to look into model objects, but I don’t know which lesson that was in.
When I ran into this issue and got it to work by using a relative path, my senior dev told me that there are lots of issues with using absolute paths and that he never uses them.
Have you looked into how it is done in fastai V2? Perhaps there has been a change.
@kontrabas You can update the paths after loading the learner:
from pathlib import Path
from fastai.basic_train import Learner    # fastai v1 module path
from fastai.text.data import SPProcessor  # fastai v1 module path

def _fix_sp_processor(learner: Learner, sp_path: Path, sp_model: str, sp_vocab: str) -> None:
    """Fixes SentencePiece paths serialized into the model.

    :param sp_path: path to the directory containing the SentencePiece model and vocabulary files.
    :param sp_model: SentencePiece model filename.
    :param sp_vocab: SentencePiece vocabulary filename.
    """
    for processor in learner.data.processor:
        if isinstance(processor, SPProcessor):
            processor.sp_model = sp_path / sp_model
            processor.sp_vocab = sp_path / sp_vocab
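If you would rather not pass the filenames in, a variant could keep whatever filenames were serialized and only swap the directory. This is just a sketch: rebase_sp_paths is a hypothetical name, it uses a duck-typed attribute check instead of isinstance(processor, SPProcessor), and it assumes both attributes hold path-like values. The SimpleNamespace objects are stand-ins so the snippet runs without fastai or the actual files.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Iterable


def rebase_sp_paths(processors: Iterable, new_dir: Path) -> int:
    """Point every SentencePiece-like processor at new_dir, keeping the filenames."""
    fixed = 0
    for proc in processors:
        # Duck-typed check: anything carrying both attributes is treated as a match
        if hasattr(proc, "sp_model") and hasattr(proc, "sp_vocab"):
            proc.sp_model = new_dir / Path(proc.sp_model).name
            proc.sp_vocab = new_dir / Path(proc.sp_vocab).name
            fixed += 1
    return fixed


# Stand-in objects (not real fastai processors) with hypothetical absolute paths
sp_like = SimpleNamespace(
    sp_model="/home/alice/project/tmp/spm.model",
    sp_vocab="/home/alice/project/tmp/spm.vocab",
)
other = SimpleNamespace(name="not a SentencePiece processor")

count = rebase_sp_paths([sp_like, other], Path("models"))
```

With a real learner you would presumably pass learner.data.processor as the first argument, right after load_learner() and before calling learn.predict().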
That’s exactly what I was looking for! Thank you for your answer.