I trained a simple language model on musical chord sequences (that’s why the sequences look weird).
This is the simple code I’m running (df is a pandas DataFrame containing the input documents, and my_vocabulary is a vocabulary I pre-calculated):
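from fastai.text.all import *

# Language-model DataLoaders built from the 'text' column, using the pre-computed vocabulary
dls = TextDataLoaders.from_df(df, text_col='text', seq_len=seq_length, is_lm=True, bs=1024, text_vocab=my_vocabulary)
# AWD-LSTM language model: 2 epochs with the body frozen, then 50 epochs unfrozen
learn = language_model_learner(dls, AWD_LSTM, metrics=[accuracy, Perplexity(), top_k_accuracy])
learn.fine_tune(50, freeze_epochs=2)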
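Because my_vocabulary is passed straight to text_vocab, one thing that may be worth checking is whether it contains fastai's special tokens (xxbos, xxunk, ...), which the default text pipeline is assumed to rely on. The sketch below is plain Python and only a check under stated assumptions: the token list is assumed to match fastai v2's defaults and should be verified against the installed version, and `chords`, `missing_special_tokens` and `with_special_tokens` are hypothetical names. The special tokens are prepended because fastai's own vocabulary builder is understood to place them first.

```python
# Assumed to mirror fastai v2's default special tokens (defaults.text_spec_tok
# in fastai.text.core); check them against the installed fastai version.
SPECIAL_TOKENS = ["xxunk", "xxpad", "xxbos", "xxeos", "xxfld",
                  "xxrep", "xxwrep", "xxup", "xxmaj"]


def missing_special_tokens(vocab, special=SPECIAL_TOKENS):
    """Return the special tokens that do not appear in a custom vocabulary."""
    present = set(vocab)
    return [tok for tok in special if tok not in present]


def with_special_tokens(vocab, special=SPECIAL_TOKENS):
    """Put any missing special tokens in front of the custom vocabulary."""
    return missing_special_tokens(vocab, special) + list(vocab)


# Hypothetical chord vocabulary containing no special tokens at all
chords = ["c|", "a|m", "e|m", "g|"]
print(missing_special_tokens(chords))   # all nine special tokens are reported missing
print(with_special_tokens(chords)[:3])  # ['xxunk', 'xxpad', 'xxbos']
```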
Then, when I run a prediction like this:
learn.predict('a|m e|m a|m e|m a|m g| em| a|m', n_words=1, no_unk=False)
I get this output:
'c| a|m e|m a|m e|m a|m g| c| a|m e|m'
The output has an extra chord at the beginning as well as one at the end. Why? Is this a bug, or am I missing something?
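A minimal pure-Python simulation of the encode/decode round trip can help separate tokenization effects from model behavior. It rests on assumptions that the post itself does not confirm: that fastai's default text rules prepend an xxbos marker, that numericalization maps any token missing from the vocabulary to index 0, that predict drops BOS/PAD tokens by their string value when decoding, and that the pre-calculated vocabulary (made up here) has c| at index 0 and contains no special tokens. Under those assumptions the sketch reproduces the reported string: the leading c| is the unrecognized xxbos, the em| chord (absent from this made-up vocabulary) also comes back as c|, and the trailing e|m is just the single word requested with n_words=1.

```python
from collections import defaultdict

# Hypothetical vocabulary: assume the pre-calculated vocab has 'c|' at index 0
# and none of fastai's special tokens ('xxbos', 'xxunk', 'xxpad', ...).
my_vocabulary = ["c|", "a|m", "e|m", "g|"]

BOS = "xxbos"  # beginning-of-stream marker assumed to be prepended by the default rules


def encode(text, vocab):
    """Split on whitespace, prepend BOS, and map unknown tokens to index 0."""
    o2i = defaultdict(int, {tok: i for i, tok in enumerate(vocab)})
    tokens = [BOS] + text.split()
    return [o2i[tok] for tok in tokens]


def decode(idxs, vocab, drop=(BOS, "xxpad")):
    """Map indices back to strings, dropping BOS/PAD only by their string value."""
    return " ".join(vocab[i] for i in idxs if vocab[i] not in drop)


prompt = "a|m e|m a|m e|m a|m g| em| a|m"
idxs = encode(prompt, my_vocabulary)
# Stand-in for the model's single predicted word (n_words=1)
predicted = my_vocabulary.index("e|m")
print(decode(idxs + [predicted], my_vocabulary))
# c| a|m e|m a|m e|m a|m g| c| a|m e|m
```

If the real pipeline behaves like this, the extra chords would come from the vocabulary lacking fastai's special tokens rather than from the model; that remains a hypothesis to check against the actual dls.vocab.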