SentencePiece

@rahuluppari, I think you need to make sure that the wiki downloads are in that location. I'm sure there are lots of ways to do it, but I use a config file to make sure everything is where I expect it to be.
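One way to apply the config-file advice above is to keep every data location in one small file and check that each path exists before running the notebook. The sketch below assumes a hypothetical JSON file (for example `paths.json`) that maps names such as `wiki` to directories. The file name, its keys, and both helper functions are illustrative only. They are not part of fastai's own config handling.

```python
import json
from pathlib import Path


def load_paths(config_file):
    """Read a hypothetical JSON config of {name: path} and return absolute Paths."""
    with open(config_file) as f:
        raw = json.load(f)
    # expanduser() turns "~/..." into the home directory; resolve() makes it absolute.
    return {name: Path(value).expanduser().resolve() for name, value in raw.items()}


def missing_paths(paths):
    """Return the names whose configured path does not exist on disk."""
    return [name for name, p in paths.items() if not p.exists()]
```

Calling `missing_paths(load_paths("paths.json"))` just before the download or tokenization step reports which entry points to the wrong place. That is usually clearer than a path error raised deep inside a library call.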


@Daniel.R.Armstrong, thanks for your reply.

Even though I'm using a config file for the wiki downloads, it is still giving me the error.
Below is a snapshot of the downloads.

Is there any other way to solve this error?

@Daniel.R.Armstrong

I also tried doing the same thing in Google Colab,
and it gives me the same error.

@rahuluppari, if I were you, I would clone the fastai NLP repo and run the Vietnamese notebook while you follow along with lesson 10, to make sure it works. After that, do the same with the Turkish notebook. Then you can make changes to fit your needs.

Thanks @Daniel.R.Armstrong,

I have one more small question; it might sound silly.

I have no idea where I can get the fastai NLP repo.

Many thanks @Daniel.R.Armstrong

Could you help me understand one more thing? Where should I clone the repository? Should I clone it with a command, or should I download the ZIP folder from GitHub and store it?

But in which location?
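For the clone-versus-ZIP question, either approach should give you the same files. The location can be any folder you will remember, provided the notebook's data paths point to the right place afterwards. The sketch below assumes the repository is `https://github.com/fastai/course-nlp`, since the link itself is not given in this thread. It also assumes `git` is installed and on your `PATH`. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed URL of the fastai NLP course repository; not stated in the thread.
REPO_URL = "https://github.com/fastai/course-nlp"


def clone_command(parent_dir, repo_url=REPO_URL):
    """Build the `git clone` command that puts the repo in parent_dir/<repo name>."""
    repo_name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    target = Path(parent_dir).expanduser() / repo_name
    return ["git", "clone", repo_url, str(target)]


def clone(parent_dir, repo_url=REPO_URL):
    """Run the clone; check=True raises CalledProcessError if git fails."""
    cmd = clone_command(parent_dir, repo_url)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

For example, `clone("~/projects")` would create a `course-nlp` folder inside `~/projects`. Starting Jupyter from that folder keeps the notebooks and any relative paths they use together.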

I didn't follow these instructions, but they look pretty good.

@rahuluppari if you want a different perspective on the fastai nlp class, this was a study group. https://www.youtube.com/playlist?list=PLILZm3MRkvH_Yf4Ah9pkxgyzo9oi3mEly

Many thanks @Daniel.R.Armstrong.

Even after trying these steps it is still giving the path error.

If I use the same setup on Linux, will it make any difference and help me execute my code?

Regards,
Rahul Ramchandra Uppari.

@rahuluppari, I had issues with config files and path defaults in fast.ai when I started using it. You need to inspect the path variables and make sure you look at each step. Sometimes I break a function apart and look at what is happening step by step. I would also delete the wiki folder and then try again.

It is complete junk, and I didn't know what I was doing when I did it, but this is the notebook that I used to do it for French on GCP. Keep in mind that I didn't run the code from top to bottom.
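The debugging advice above can be sketched with the standard library. It involves two steps: inspecting each path variable, and deleting the wiki folder before retrying. `trace_path` lists every directory from the filesystem root down to a given path and reports whether each one exists. The first `False` entry shows where the path goes wrong. `reset_folder` deletes a partially downloaded folder and recreates it empty. Both helper names are hypothetical. Pass in whatever folder your notebook actually uses for the wiki data.

```python
import shutil
from pathlib import Path


def trace_path(path):
    """List (directory, exists) pairs from the filesystem root down to `path`."""
    path = Path(path).expanduser().resolve()
    # path.parents runs from the immediate parent up to the root; reverse it to go top-down.
    chain = list(path.parents)[::-1] + [path]
    return [(str(part), part.exists()) for part in chain]


def reset_folder(folder):
    """Delete `folder` if present and recreate it empty, forcing a fresh download."""
    folder = Path(folder)
    if folder.exists():
        shutil.rmtree(folder)
    folder.mkdir(parents=True)
    return folder
```

For example, `for p, ok in trace_path(wiki_path): print(ok, p)` prints one line per directory, where `wiki_path` stands for your own path variable. Only call `reset_folder` on a folder you are happy to lose, because `shutil.rmtree` deletes it permanently.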

Hi Daniel

I have trained a SentencePiece tokenizer and have the .vocab and .model files in a tmp folder. If I want to use them again, how would I pass them into fastai2's DataBlock?

Thanks!

I haven't had time to use SentencePiece in v2, sorry!
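For readers with the same question, fastai v2 provides a `SentencePieceTokenizer`. Its `sp_model` argument loads an already-trained `.model` file instead of training a new one. `TextBlock.from_df` accepts a custom tokenizer through `tok`. The sketch below has not been tested against a real fastai install. It assumes the trained files are in a `tmp` folder and that your DataFrame has a `text` column. `find_sp_model` and `make_text_block` are hypothetical helpers. Only the `.model` file is passed, because it already contains the SentencePiece vocabulary. The `.vocab` file is a plain-text listing of the same pieces.

```python
from pathlib import Path


def find_sp_model(tmp_dir="tmp"):
    """Return the single SentencePiece .model file in tmp_dir."""
    models = sorted(Path(tmp_dir).glob("*.model"))
    if not models:
        raise FileNotFoundError(f"no .model file found in {tmp_dir}")
    if len(models) > 1:
        raise ValueError(f"expected one .model file in {tmp_dir}, found {len(models)}")
    return models[0]


def make_text_block(tmp_dir="tmp", text_col="text", is_lm=True):
    """Build a fastai v2 TextBlock that reuses a trained SentencePiece model."""
    # Imported here so find_sp_model can be used without fastai installed.
    from fastai.text.all import SentencePieceTokenizer, TextBlock

    # Passing sp_model loads the existing model instead of training a new one.
    tok = SentencePieceTokenizer(sp_model=find_sp_model(tmp_dir))
    return TextBlock.from_df(text_col, is_lm=is_lm, tok=tok)
```

A language-model `DataBlock` could then be built as `DataBlock(blocks=make_text_block(), get_x=ColReader("text"), splitter=RandomSplitter(0.1)).dataloaders(df)`, where `df` is your DataFrame. fastai still builds its own numericalization vocabulary from the resulting sub-word tokens.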