MultiFiT Inference and SentencePiece Hardcoded tmp path

After creating an Arabic MultiFiT model (based on work by @pierreguillou), I tried to use it for inference. I can load the learner from the export.pkl file, but when I try to predict, SentencePiece is called to encode the input with a model loaded from a hardcoded path, “/root/.fastai/data/[wiki-path]/tmp/spm.model”. I can, of course, create this path locally and copy spm.model into it, but that makes deployment awkward. I also tried passing the text pre-encoded with SentencePiece, but that did not work.
How can I get around this? Here’s the error:

/usr/local/lib/python3.6/dist-packages/sentencepiece.py in Load(self, filename)
    116 
    117     def Load(self, filename):
--> 118         return _sentencepiece.SentencePieceProcessor_Load(self, filename)
    119 
    120     def LoadOrDie(self, filename):

OSError: Not found: "/root/.fastai/data/.../tmp/spm.model": No such file or directory Error #2

When loading the learner, does learn.data.path give you anything?

Yes, that’s the current folder, where the exported model is loaded from.

learn = load_learner('/content/','ar_classifier_hard_sp15_multifit.pkl')
learn.data.path

Result (from a Colab notebook): PosixPath('/content')
I guess at least part of the hardcoded path is stored in learn somewhere.

The trained SentencePiece model is saved in cache_dir (an argument of SentencePieceTokenizer you can set to whatever you like). The absolute path is very likely saved when exporting the learner; I can check whether we can save it as a relative path instead, which would probably make deployment easier.
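
For reference, a minimal sketch of what that could look like; SentencePieceTokenizer and its cache_dir argument are as described above, but the import path and exact signature depend on your fastai version, so treat the details as assumptions:

from pathlib import Path
from fastai.text import *   # import path may differ across fastai versions

# Sketch: pass cache_dir explicitly so the SentencePiece files live in a
# folder you control, relative to the project, rather than under
# /root/.fastai/data/...
tok = SentencePieceTokenizer(lang='ar', cache_dir=Path('spm_cache'))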

Thanks Sylvain, that would be great. I recall switching fastai versions (to 1.0.57) while building the databunch because of SentencePiece. For now, I got it to work through Docker on Heroku and it seems to run fine. Not a great coder, but here’s what I did:

import shutil
from fastai.basics import *   # Config comes in via fastai v1's star imports

# recreate the absolute path the exported learner expects under ~/.fastai/data
data_path = Config.data_path()
name = 'arwiki/corpus2_100/tmp/'
path_t = data_path/name
path_t.mkdir(exist_ok=True, parents=True)
# copy the SentencePiece model bundled with the app into that path
shutil.copy('./app/models/spm.model', path_t)
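
In other words, the directory the pickle expects is recreated at container start-up and the spm.model bundled with the app is copied into it before the learner serves any predictions; clunky, but the exported learner itself stays untouched.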

Here’s the ‘polished’ app: Arabic Sentiment Analyzer


First, AbuFadl, congrats on this neat project.

I am having the same problem, but I want to use several inference learners at the same time, and moving different spm.model files around for each predict call does not seem optimal.

Is there a way to set this path on the inference learner instance?
or
Is there a way to set this path after training but before exporting, so that the hardcoded path does not end up in the export?
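
Until there is a proper setter, one stopgap in the spirit of AbuFadl’s workaround would be to recreate every expected path once at app start-up instead of per predict call; the model list and file locations below are hypothetical:

import shutil
from pathlib import Path

# One-time setup before any learner is loaded: for each exported learner,
# recreate the absolute path its pickle expects and copy the matching
# spm.model there. The entries below are made up for illustration.
EXPECTED = {
    'arwiki/corpus2_100/tmp': 'app/models/ar/spm.model',
    'dewiki/corpus_100/tmp':  'app/models/de/spm.model',
}
data_path = Path.home()/'.fastai'/'data'
for rel_tmp, spm in EXPECTED.items():
    target = data_path/rel_tmp
    target.mkdir(parents=True, exist_ok=True)
    shutil.copy(spm, target)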

I had some success by changing the class SPProcessor2 to this:

class SPProcessor2(SPProcessor):
    def process(self, ds):
        # per the follow-up below: resolve the SentencePiece files relative
        # to the dataset's own path instead of the path baked in at export
        self.cache_dir = ds.path
        super().process(ds)

I can create a working inference learner if I recreate the folder structure for spm.model and export.pkl.

I could set the correct path in process with ds.path (cache_dir = ds.path).

However, in the method _encode_batch(), I then get the error “unk is not defined”.

The module versions used for export and load are identical.

This should not depend on the folder structure.