Text Transform | Tests

I’ve noticed:

  1. Tests for text transform.py are missing, and I would like to contribute some.
  2. replace_all_caps and deal_caps do not work as expected: they return a list, not a string.

Please advise if I understand the desired behavior correctly:
replace_all_caps('All CAPS WORDS to Replace') == 'All xxup caps xxup words to Replace'
and
deal_caps('Replace Capitals in Begining of WORDS Only') == 'xxmaj replace xxmaj capitals in xxmaj begining of WORDS xxmaj only'

If so, I will implement.

That’s great!

There is a how-to thread here: Improving/Expanding Functional Tests

@stas

@sgugger,
Since you added deal_caps in this commit and replace_all_caps in this commit, could you clarify the expected functionality? Namely:

  • Should these functions accept and return a Collection[str] (which I doubt), or rather a str?
  • Should replace_all_caps replace only WHOLLY CAPPED words, leaving These Unchanged?
  • Should deal_caps replace capitals at the Beginning of Words Only, leaving THESE UNCHANGED?

Both must insert corresponding tags (‘xxmaj’,‘xxup’) – this is clear.

I am willing to give you a hand with this.
Thanks.

They do take a Collection[str]: they are applied after tokenization is done (post rules, as opposed to pre rules), so they get a list of tokens.
You have the behavior correct. Additionally, one of them lower-cases everything (the second one applied, but I can’t remember which one it is).
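
To make the token-list behavior concrete, here is a minimal sketch of the two post rules as described in this thread. It is not the library's actual implementation: the function names with the `_sketch` suffix are hypothetical, and the sketch assumes that only the tagged tokens are lower-cased (the "lower-cases everything" behavior mentioned above is not modeled, since it is unclear which rule does it).

```python
from typing import Collection, List

# Tag strings quoted earlier in this thread.
TK_UP, TK_MAJ = "xxup", "xxmaj"


def replace_all_caps_sketch(tokens: Collection[str]) -> List[str]:
    """Hypothetical sketch: tag and lower-case tokens written wholly in capitals."""
    res: List[str] = []
    for t in tokens:
        if len(t) > 1 and t.isupper():
            res.extend([TK_UP, t.lower()])
        else:
            res.append(t)
    return res


def deal_caps_sketch(tokens: Collection[str]) -> List[str]:
    """Hypothetical sketch: tag and lower-case tokens that only start with a capital."""
    res: List[str] = []
    for t in tokens:
        # Assumption: a leading capital followed by lower-case letters only.
        if len(t) > 1 and t[0].isupper() and t[1:].islower():
            res.extend([TK_MAJ, t.lower()])
        else:
            res.append(t)
    return res


# Post rules receive tokens, so split first and join only for display.
up = replace_all_caps_sketch("All CAPS WORDS to Replace".split())
maj = deal_caps_sketch("Replace Capitals in Begining of WORDS Only".split())
print(" ".join(up))
print(" ".join(maj))
```

Joining the returned token lists with spaces reproduces the two strings from the first post, which suggests a test can compare lists of tokens directly instead of strings.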

First patch merged.