Bug in predict for language_model_learner?

giacomov · May 6, 2020, 2:33am

I trained a simple language model for musical chords (that's why the sequences look weird).

This is the code I'm running (df is a pandas DataFrame containing the input documents, and my_vocabulary is a vocabulary that I pre-calculated):

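from fastai.text.all import *

# seq_length is defined elsewhere; df and my_vocabulary are described above
dls = TextDataLoaders.from_df(df,
                              text_col='text',
                              seq_len=seq_length,
                              is_lm=True,
                              bs=1024,
                              text_vocab=my_vocabulary)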

learn = language_model_learner(dls, AWD_LSTM, metrics=[accuracy, Perplexity(), top_k_accuracy])

learn.fine_tune(50, freeze_epochs=2)

Then when I try to run it like this:

learn.predict('a|m e|m a|m e|m a|m g| em| a|m', n_words=1, no_unk=False)

I get this output:

'c| a|m e|m a|m e|m a|m g| c| a|m e|m'

which adds a token at the beginning as well as at the end, even though I only asked for one new word (n_words=1). Why? Is this a bug, or am I missing something?
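
To make the difference concrete, here is a minimal sketch in plain Python (no fastai involved) that compares the prompt and the returned string token by token. The variable names are illustrative, and the alignment assumes that exactly one token was prepended and one was appended.

```python
# Prompt passed to learn.predict and the string it returned (copied from above).
prompt = 'a|m e|m a|m e|m a|m g| em| a|m'
output = 'c| a|m e|m a|m e|m a|m g| c| a|m e|m'

# The chords are whitespace-separated, so a plain split gives the tokens.
prompt_tokens = prompt.split()
output_tokens = output.split()

print(len(prompt_tokens), len(output_tokens))

# Assumption: one token was prepended and one (n_words=1) was appended,
# so output_tokens[1:-1] should line up with the prompt.
aligned = output_tokens[1:-1]

# Collect (position, prompt token, output token) wherever they disagree.
changed = [
    (i, p, o)
    for i, (p, o) in enumerate(zip(prompt_tokens, aligned))
    if p != o
]

print('prepended:', output_tokens[0])
print('appended: ', output_tokens[-1])
print('changed:  ', changed)
```

Under that assumption, the prompt has 8 tokens and the output has 10. The only mismatch inside the aligned span is em| in the prompt, which comes back as c|, the same token that shows up at the start of the output.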