Hello everyone,
I’m trying to deploy ULMFiT models for several languages (English, Spanish and French).
I used exactly the same workflow for all languages, based on sgugger's DeepFrench notebook, with the corresponding pre-trained weights.
The English model is 138 MB, the Spanish one 170 MB, and the French one 93 MB.
However, at inference time I see a big difference on the same machine under the same conditions:
- English: around 0.32 s
- Spanish: around 0.17 s
- French: around 10 s !!!
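For context, here is roughly how I measure these timings: I average over several runs after a couple of warmup calls, since the very first prediction can be slower. `predict_fn` is a placeholder for the actual prediction call (in my case the fastai `learn.predict`):

```python
import time

def time_inference(predict_fn, text, n_runs=10, warmup=2):
    """Average wall-clock time of predict_fn(text) over n_runs, after warmup calls."""
    for _ in range(warmup):
        predict_fn(text)
    start = time.perf_counter()
    for _ in range(n_runs):
        predict_fn(text)
    return (time.perf_counter() - start) / n_runs

# e.g. with a fastai learner: time_inference(learn.predict, "some sample sentence")
```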
What could explain this difference? How can I improve it?
Bonus question: has anyone managed to export the model to the ONNX format?
Thank you for your help!