NLP - Best approach(es) for anonymization / pseudonymization of personal data?

Alexandre_DIEUL · February 24, 2020, 8:28pm

My first naive hypothesis would be: why not strip the vocab from the TextClasDataBunch after tokenization and numericalization? But then, would I still be able to fine-tune the language model without the vocab?
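To make the idea concrete, here is a minimal sketch in plain Python of what "stripping the vocab" would leave behind. It does not use the fastai API; `build_vocab`, `numericalize`, the `xxunk` placeholder and the toy sentences are all hypothetical names and data chosen for illustration only.

```python
from collections import Counter

def build_vocab(tokenized_docs):
    """Build an id -> token list, most frequent tokens first (index 0 = unknown)."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    return ["xxunk"] + [tok for tok, _ in counts.most_common()]

def numericalize(tokenized_docs, itos):
    """Replace every token by its integer id; unseen tokens map to 0."""
    stoi = {tok: i for i, tok in enumerate(itos)}
    return [[stoi.get(tok, 0) for tok in doc] for doc in tokenized_docs]

# Toy corpus (made up), tokenized naively on whitespace.
docs = ["alice lives in paris", "bob lives in lyon"]
tokens = [d.split() for d in docs]

itos = build_vocab(tokens)        # the mapping I would like to throw away
ids = numericalize(tokens, itos)  # what would be kept and shared
vocab_size = len(itos)            # still needed to size the embedding layer

# "Stripping the vocab": keep only the ids and the vocabulary size.
shared = {"ids": ids, "vocab_size": vocab_size}
print(shared)
```

In this sketch the same word still gets the same id in every document, so what remains is a one-to-one substitution of tokens rather than a removal of them.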

What’s your opinion on that?