The purpose of xxrep

Hi everyone.

From fastai’s docs:

TK_REP (xxrep) is used to indicate the next character is repeated n times in the original text (usage xxrep n {char})
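To make the quoted rule concrete, here is a minimal sketch of what such a replacement could look like. It is not fastai's actual code: the function name `mark_char_repeats` and the `min_run=3` threshold are assumptions made for illustration, and only the output format `xxrep n {char}` comes from the docs quoted above.

```python
import re

def mark_char_repeats(text, min_run=3):
    """Replace each run of one repeated non-space character with ' xxrep n char '.

    min_run is an assumed threshold, not something stated in the quoted docs.
    """
    # (\S) captures one character; \1{k,} requires at least k more copies of it.
    pattern = re.compile(r"(\S)\1{%d,}" % (min_run - 1))

    def _sub(match):
        run_length = len(match.group(0))
        return f" xxrep {run_length} {match.group(1)} "

    return pattern.sub(_sub, text)

print(mark_char_repeats("so cooool!!!!").split())
# ['so', 'c', 'xxrep', '4', 'o', 'l', 'xxrep', '4', '!']
```

Shorter runs such as "aa" are left untouched under this assumed threshold.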

Could someone explain the purpose of this special token? I haven’t found any explanation of why it is used or why using xxrep is better than leaving the text as is. Say we have a text generation problem. My guess is that xxrep restricts the model so that it won’t generate long repetitions such as “aaaaaa”. In other words, it sees fewer raw character repetitions during training.

Any answer would be really helpful. Thank you.

Hi Rob

It is to consolidate people who write ??? with people who write ?.

Regards Conwyn

I don’t think so. In that case, it would be applied to every such pair: “zzz” vs “z”, “999” vs “9”, “shhh” vs “sh”, etc.
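Both points can be checked with a toy experiment. The sketch below is only an illustration under stated assumptions: `RUN`, `TOKEN`, and `mark` are hypothetical helpers (not fastai's tokenizer), and the threshold of 3 identical characters is assumed. It shows that the rule is character-agnostic and that runs of different lengths end up sharing the same few tokens.

```python
import re

RUN = re.compile(r"(\S)\1{2,}")      # assumed threshold: 3+ identical characters
TOKEN = re.compile(r"\w+|[^\w\s]+")  # toy tokenizer, not fastai's

def mark(text):
    """Rewrite each character run as ' xxrep n char '."""
    return RUN.sub(lambda m: f" xxrep {len(m.group(0))} {m.group(1)} ", text)

texts = ["what?", "what???", "what?????", "wow!", "wow!!!", "wow!!!!!"]

# Vocabulary without marking: every run length is its own token.
raw_vocab = {tok for t in texts for tok in TOKEN.findall(t)}
# Vocabulary with marking: run lengths become shared number tokens.
marked_vocab = {tok for t in texts for tok in TOKEN.findall(mark(t))}

print(len(raw_vocab), sorted(raw_vocab))
print(len(marked_vocab), sorted(marked_vocab))

# The rule does not care which character repeats.
for s in ["zzz", "999", "shhh"]:
    print(s, "->", TOKEN.findall(mark(s)))
```

In this small sample the raw vocabulary holds separate entries for "???", "?????", "!!!" and "!!!!!", while the marked one reuses "?", "!", "xxrep" and the numbers.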

Hi Rob
Please see the final paragraph on page 334 of the book.
Regards Conwyn


Thank you, it’s clearer now.

However, I have one more question: how does this reasoning apply to xxwrep? Repeated words are the same token anyway, so they don’t add anything to the embedding matrix. Am I wrong?

Hi Rob.

Jim had had, had had. Had had, had had the master’s approval.
So would you really want that to be Jim (1), had (xxwrep 2), had (xxwrep 2). Had (xxwrep 2), had (xxwrep 2), assuming punctuation is a valid token?

Regards Conwyn

I didn’t get this.

First of all, text is converted to xxwrep only if there are 3 or more identical words in a row. But that doesn’t really matter for now; let’s say it does the same when there are 2 words.
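As a rough illustration of that threshold, here is a small sketch of a word-level version. The name `mark_word_repeats` and its regex are hypothetical and are not taken from fastai's source. Only the "3 or more words" rule and the `xxwrep n word` output format come from this thread.

```python
import re

def mark_word_repeats(text, min_run=3):
    """Replace a run of min_run or more identical words with 'xxwrep n word'."""
    # (\w+) captures a word; each further repeat is whitespace plus the same word.
    pattern = re.compile(r"\b(\w+)(?:\s+\1\b){%d,}" % (min_run - 1))

    def _sub(match):
        count = len(match.group(0).split())
        return f"xxwrep {count} {match.group(1)}"

    return pattern.sub(_sub, text)

print(mark_word_repeats("very very very good"))    # xxwrep 3 very good
print(mark_word_repeats("jim had had"))            # jim had had (run too short)
print(mark_word_repeats("jim had had", min_run=2)) # jim xxwrep 2 had
```

Passing `min_run=2` reproduces the "let's say it works for 2 words" assumption used in the example that follows.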

Your example “Jim had had, had had. Had had, had had the master’s approval” will be (xxmaj is omitted):

  1. ['xxbos', 'jim', 'had', 'had', ',', 'had', 'had', '.', 'had', 'had', ',', 'had', 'had', 'the', 'master', "'s", 'approval'] when xxwrep is not allowed
  2. ['xxbos', 'jim', 'xxwrep', '2', 'had', ',', 'xxwrep', '2', 'had', '.', 'xxwrep', '2', 'had', ',', 'xxwrep', '2', 'had', 'the', 'master', "'s", 'approval'] when xxwrep is allowed
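The two lists can be compared directly. The sketch below just copies them and counts tokens; the variable names are made up for illustration. A run of n identical words costs n tokens when left as is and 3 tokens (xxwrep, n, word) once marked, so with runs of 2 the marked version comes out longer, not shorter.

```python
# Token lists copied from the two cases above.
plain = ["xxbos", "jim", "had", "had", ",", "had", "had", ".",
         "had", "had", ",", "had", "had", "the", "master", "'s", "approval"]
marked = ["xxbos", "jim", "xxwrep", "2", "had", ",", "xxwrep", "2", "had", ".",
          "xxwrep", "2", "had", ",", "xxwrep", "2", "had",
          "the", "master", "'s", "approval"]

# Sequence length and number of distinct tokens for each version.
print(len(plain), len(set(plain)))    # 17 9
print(len(marked), len(set(marked)))  # 21 11

# Token cost of a single run of n identical words, plain vs marked.
for n in range(2, 6):
    print(f"run of {n}: plain={n} tokens, marked=3 tokens")
```

By this count, marking only shortens the sequence for runs longer than 3 words. That may be one reason for the threshold, though nothing in this thread confirms it.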

I don’t see how it could really help here.

The xxwrep question is still unresolved. Can anyone help, please?