Feature Request: Formatted text generation in LanguageLearner predict

Hello all. When using the LanguageLearner.predict method to generate text from a trained language model, I’d like to be able to generate formatted text instead of sequences of raw tokens (e.g. to eventually display that text to an end user). That is, I’d like to de-tokenize the generated tokens. Here’s an example:

Current: xxmaj the quick brown fox jumps over the xxup lazy dog xxrep 3 .
Want: The quick brown fox jumps over the LAZY dog…

While I can certainly implement a wrapper over the predict method to do what I want (which is what I’m currently doing), I think others and I might find it useful for the predict method itself to be able to do this. Perhaps this could be a flag to pass in (e.g. formatted=True)? I can fairly easily implement this myself, I think, if nobody wants to take it up. Thoughts?
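For reference, here is a minimal sketch of what such a wrapper might look like. This is not fastai’s own implementation; the function name `detokenize` and the spacing cleanup are my own illustration of undoing the special tokens (`xxmaj` = capitalize next token, `xxup` = uppercase next token, `xxrep n t` = token `t` repeated `n` times):

```python
import re

def detokenize(raw: str) -> str:
    """Hypothetical helper: undo fastai-style special tokens in generated text."""
    tokens = raw.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "xxmaj" and i + 1 < len(tokens):
            # Capitalize the following token
            out.append(tokens[i + 1].capitalize())
            i += 2
        elif tok == "xxup" and i + 1 < len(tokens):
            # Uppercase the following token
            out.append(tokens[i + 1].upper())
            i += 2
        elif tok == "xxrep" and i + 2 < len(tokens):
            # Repeat the token after the count n times
            out.append(tokens[i + 2] * int(tokens[i + 1]))
            i += 3
        else:
            out.append(tok)
            i += 1
    text = " ".join(out)
    # Remove spaces left before punctuation by token-joining
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)
    return text

print(detokenize("xxmaj the quick brown fox jumps over the xxup lazy dog xxrep 3 ."))
# The quick brown fox jumps over the LAZY dog...
```

A real version would also need to handle the other special tokens (e.g. `xxbos`, `xxunk`) and language-specific spacing rules.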

You must not have the latest fastai, as this is already implemented :wink:
It just undoes the special tokens by default, but you can specify a different function via the decode_func argument.


Hmm. I was doing this just last week with what should’ve been the latest version. I’ll take a look. Thanks.