In fastai-master/docs/tutorial.data.html, `learn.show_results()` returns a table with 3 columns (`text`, `target`, `pred`). There seems to be a word consistently skipped between the end of `text` and the beginning of `target`:
row #1: original text reads "...his friends or that language...", but "that" seems to be missing.
row #2: original text reads "...greedy jerks, and the women merely...", but "and" seems to be missing.
row #4: original text reads "...in an epic in the most...", but "epic" seems to be missing.
row #5: original text reads "...save the fact you just wasted your life...", but "wasted" seems to be missing.
Just found one more thing I can’t understand. When `text` is at the end of a sample, `learn.show_results()` seems to show text from the next sample as `target`. Is this by design? I mean, are samples not isolated/independent from each other?
Both questions (this and the earlier post) are reproducible in this notebook:
Looks like you found a bug.
I did some digging, and it looks like this is because `show_results` pulls the x and y from `one_batch` and shows elements `[:max_len]` for x and `[max_len:2*max_len]` for y and z. But this isn’t quite right, because `one_batch` pulls from `LanguageModelPreLoader.__getitem__`, which is offset by one word between x and y. So this should really be `[max_len-1:2*max_len-1]` for y and z. If you update that in the fastai source, it fixes the issue with `show_results` being offset by one.
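To make the indexing concrete, here is a toy sketch of the off-by-one with made-up tokens (just an illustration, not the actual fastai source):

```python
# A LanguageModelPreLoader-style item pairs x with y shifted one token ahead,
# so slicing y starting at max_len skips exactly one word.

tokens = [f"w{i}" for i in range(12)]      # stand-in for a tokenized text stream

x, y = tokens[:-1], tokens[1:]             # y is x shifted by one word

max_len = 5

text_shown   = x[:max_len]                      # what goes in the text column: w0..w4
target_buggy = y[max_len:2 * max_len]           # old slice: starts at w6, so w5 is skipped
target_fixed = y[max_len - 1:2 * max_len - 1]   # fixed slice: starts at w5, right after text

print("text:        ", " ".join(text_shown))    # w0 w1 w2 w3 w4
print("buggy target:", " ".join(target_buggy))  # w6 w7 w8 w9 w10
print("fixed target:", " ".join(target_fixed))  # w5 w6 w7 w8 w9
```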
I submitted a pull request that should fix `show_results`.
Thanks, Brad. That fixes the problem with the missing word.
My second issue, where `text` happens to be at the end of a sample and the start of the next sample gets returned as `target`, still needs to be checked?
I think that’s as intended. It’s split by an `xxbos` token.
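To illustrate with a toy sketch (made-up documents, not the fastai source): the language-model data is one long concatenated token stream, with each sample prefixed by `xxbos`, so a window near the end of one sample simply continues into the next.

```python
docs = [["a", "great", "movie"], ["terrible", "acting"]]

# Concatenate all samples into one stream; xxbos marks where each sample begins.
stream = []
for doc in docs:
    stream += ["xxbos"] + doc

print(stream)
# ['xxbos', 'a', 'great', 'movie', 'xxbos', 'terrible', 'acting']

# A text/target window near the end of the first sample spills into the second;
# the xxbos token is the boundary marker.
print(stream[2:6])                         # ['great', 'movie', 'xxbos', 'terrible']
```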
Ok, thanks!