Creating a language model on new data

realdiganta · August 2, 2019, 7:02pm

I have a book that I converted into a .txt file. I want to build a language model with transfer learning that completes sentences the way the book does, but I am having trouble turning the data into a fastai DataBunch. In the course example, the IMDB reviews were in a CSV file where each row of a column held one review. I only have a single .txt file of the whole book. How do I turn this into a fastai DataBunch that I can then pass to a learner? Can anyone help, please?

celiberate · September 10, 2019, 3:16am

You can use the spaCy tokenizer, or your own preprocessing, to read each sentence and create a separate row for each sentence (or paragraph).
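
A rough sketch of that idea, in case it helps. It uses only the Python standard library, so the regex-based sentence splitter is a naive stand-in for spaCy's sentence segmentation, not a replacement for it. The function names, the `book.txt` / `book.csv` file names, and the `text` column name are all made up for illustration.

```python
import csv
import re


def split_paragraphs(text):
    """Split raw text into paragraphs on blank lines and collapse inner whitespace."""
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace.

    This is an assumption-laden shortcut; it will mis-split abbreviations
    such as "Mr. Smith". A real sentence segmenter handles those better.
    """
    parts = re.split(r"(?<=[.!?])\s+", paragraph)
    return [part for part in parts if part]


def book_to_rows(text, unit="paragraph"):
    """Return one string per row, either per paragraph or per sentence."""
    paragraphs = split_paragraphs(text)
    if unit == "paragraph":
        return paragraphs
    return [s for p in paragraphs for s in split_sentences(p)]


def write_csv(rows, csv_path):
    """Write the rows to a single-column CSV with a 'text' header."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        writer.writerows([row] for row in rows)


# Example usage (hypothetical file names):
# with open("book.txt", encoding="utf-8") as f:
#     rows = book_to_rows(f.read(), unit="paragraph")
# write_csv(rows, "book.csv")
```

The resulting CSV has one passage per row, which is the same layout as the IMDB CSV from the course, so it should be loadable the same way. Paragraph-sized rows are probably a better default than single sentences for a language model, since they keep more context together, but that is a judgment call.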