1 word skipped? language model learner show_results()

In fastai-master/docs/tutorial.data.html, learn.show_results() returns a table with 3 columns (text, target, pred). There seems to be a word consistently skipped between the end of text and the beginning of target?
row #1:
original text reads "...his friends or that language..." but "that" seems to be missing?
row #2:
original text reads "...greedy jerks, and the women merely..." but "and" seems to be missing?
row #4:
original text reads "...in an epic in the most..." but "epic" seems to be missing?
row #5:
original text reads "...save the fact you just wasted your life..." but "wasted" seems to be missing?


Just found one more thing I can't understand. When the text is at the end of a sample, learn.show_results() seems to show text from the next sample as the target. Is this by design? I mean, aren't samples isolated/independent from each other?

Both questions (this and the earlier post) are reproducible in this notebook:

Looks like you found a bug.

I did some digging and it looks like this is because show_results pulls the x and y from one_batch and shows elements [:max_len] for x and [max_len:2*max_len] for y and z. But this isn't quite right, because one_batch pulls from LanguageModelPreLoader.__getitem__, where y is offset from x by one word.
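To illustrate the offset, here's a minimal sketch with made-up token IDs (not the actual LanguageModelPreLoader code): a language-model loader yields y as x shifted one token to the right, so y[j] is the word that follows x[j].

```python
import numpy as np

# Minimal sketch of how a language-model loader pairs x and y
# (made-up token IDs, not the actual fastai source).
tokens = np.arange(10, 20)   # one long stream of token IDs
seq_len = 5

def get_item(i):
    # y is x shifted one position to the right, so y[j] is the
    # token that comes immediately after x[j] in the stream.
    x = tokens[i : i + seq_len]
    y = tokens[i + 1 : i + 1 + seq_len]
    return x, y

x, y = get_item(0)
print(x)  # [10 11 12 13 14]
print(y)  # [11 12 13 14 15]
```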

So this should really be [max_len-1:2*max_len-1] for y and z, which, if you update it in the fastai source, fixes the issue with show_results being offset by one:
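Here's a small self-contained sketch of why the off-by-one shows up in the displayed text/target columns and why shifting the slice fixes it (again with made-up token IDs, not the real show_results code):

```python
import numpy as np

# Made-up token stream and the x/y pair a language-model loader would yield
# (y shifted one token to the right of x). Not the actual fastai source.
tokens = np.arange(10, 20)
x = tokens[0:8]          # [10 11 12 13 14 15 16 17]
y = tokens[1:9]          # [11 12 13 14 15 16 17 18]

max_len = 4
text = x[:max_len]       # [10 11 12 13] shown in the "text" column

# Original slicing: y[max_len:2*max_len] starts one token past the word that
# follows `text`, so token 14 never appears in either column.
buggy_target = y[max_len : 2 * max_len]          # [15 16 17 18]

# Fixed slicing: shift the window back by one so the target picks up
# exactly where the displayed text leaves off.
fixed_target = y[max_len - 1 : 2 * max_len - 1]  # [14 15 16 17]

print(text, buggy_target, fixed_target)
```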

I submitted a pull request that should fix show_results.


Thanks, Brad. That fixes the problem with the missing word.

My second issue still stands: when the text happens to be at the end of a sample, the start of the next sample gets returned as the target. Does that still need to be checked?

I think that's as intended. It's split by an xxbos token.
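For context, a minimal sketch (made-up sample text, not the actual fastai pipeline) of why a window can run into the next sample: the texts are concatenated into one long stream, with xxbos marking where each sample begins.

```python
# Samples are joined into a single token stream for language modeling,
# with xxbos marking the start of each original sample. (Made-up tokens.)
samples = [["xxbos", "the", "movie", "was", "great"],
           ["xxbos", "i", "hated", "it"]]
stream = [tok for s in samples for tok in s]
print(stream)
# ['xxbos', 'the', 'movie', 'was', 'great', 'xxbos', 'i', 'hated', 'it']
# A text/target window ending near the end of sample 1 will show
# 'xxbos i ...' from sample 2 as the target continuation.
```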

Ok, thanks!