Best way to use Interpretation/ClassificationInterpretation and refer to original data

I’ve come across a particular problem that keeps me from really using the pre-built implementation of interpretations:

I can use the classes on train and valid, but only with the data as it appears in the datasets / dataloaders, i.e. the processed items and labels. However, there are several occasions where this isn’t really enough for me. For example:

  1. When items are excerpts of larger entities, e.g. paragraphs taken from a larger document. To really judge the quality of labels for dataset cleaning, I may need context from the whole document. I acknowledge that it is always questionable practice to train a model on data without all the context that even a human would need, but it would still be good to have the possibility, if only to compare such models to ones that work with all the necessary context.
  2. For readability of texts. Especially if there are lots of out-of-vocabulary items that get turned into xxunk, but also in general with all the injected tokens (xxmaj et al.). If texts are not easy to understand to begin with, the extra tokens make them much harder to read. If I could refer to the original item, I could even use HTML formatting to improve readability.
  3. When working with models that do not use all features. E.g. I started with a dataframe with multiple fields and only used one (or a subset, for example with the built-in method that constructs texts with xxfld). During dataset cleaning, I would love to have access to the information I dropped earlier.

Is there some functionality to achieve this that I have missed so far?

If not, what would be a good approach? So far my solution has been not to use the Interpretation classes at all and instead run the model on some input manually and compute top losses myself, for example. This works, but it seems a bit sad given how nice the example from the lecture was.
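The manual alternative can be sketched roughly like this: run the model yourself and keep the original row index next to each loss, so the top losses can be mapped straight back to the unprocessed dataframe. This is a hedged sketch, not fastai code; `compute_loss` is a stand-in for the real model plus loss function.

```python
# Sketch: compute top losses manually while keeping an index into the
# original, unprocessed data. `compute_loss` is an assumed stand-in for
# running the model and the loss function on one item.

def top_losses_with_index(items, compute_loss, k=5):
    """Return (loss, original_index) pairs for the k highest losses."""
    losses = [(compute_loss(item), idx) for idx, item in enumerate(items)]
    losses.sort(reverse=True)  # highest loss first
    return losses[:k]

# Toy usage: pretend the loss is simply the length of the text.
items = ["short", "a somewhat longer paragraph", "mid text"]
worst = top_losses_with_index(items, compute_loss=len, k=2)
# Each entry is (loss, index into the original data), so the raw
# dataframe row can be looked up via df.iloc[index].
```

Because the index refers to the original data rather than the processed items, all the dropped fields and surrounding context remain reachable.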

An alternative could be to inject text that identifies the original item (making sure it doesn’t get turned into xxunk) and then write a processor so that these tokens end up masked. Imho this seems a bit error-prone, but if the Interpretation classes gain more features in the future (e.g. something else in the direction of the intrinsic attention, maybe like LIME), it could be worth it, because eventually re-implementing interpretation will become tedious as well.
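The injection idea could look something like the following sketch. The marker token name and the regex are assumptions; in a real fastai setup the marker would also need to be added to the vocab so it never gets mapped to xxunk, and a processor would mask it before the model sees it.

```python
import re

# Sketch: prepend a unique marker token to each text before tokenization
# so a processed item can always be traced back to its source row.
# "xxid<N>" is a hypothetical token name, not an existing fastai token.

ID_PATTERN = re.compile(r"xxid(\d+)")

def inject_id(text, idx):
    """Prefix the text with a marker encoding its original row index."""
    return f"xxid{idx} {text}"

def recover_id(processed_text):
    """Extract the original row index and return it with the cleaned text."""
    match = ID_PATTERN.search(processed_text)
    idx = int(match.group(1)) if match else None
    cleaned = ID_PATTERN.sub("", processed_text).strip()
    return idx, cleaned
```

With this, any text shown by an Interpretation class could be stripped of the marker for display while the index points back at the untokenized original.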

I posted basically the same question a while ago and received no replies (Can I somehow keep a linked copy of the original untokenized input data for the text_classifier_learner).

What I do for now is that every time I train, I also run the interpretation right away and match its results to each row of the data. Then I save the input texts and interpretations to a database and later query it based on the interpretation to retrieve the original.

It’s a bit of a hack, but it works reasonably well for me as long as there are not too many xxunk tokens in short inputs, in which case it might match the wrong input.
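The save-and-query workflow described above could be sketched with an in-memory SQLite database like this. The table layout and the choice of matching key (the processed text itself) are assumptions, and, as noted, matching on processed text can hit the wrong row when short inputs contain many xxunk tokens.

```python
import sqlite3

# Sketch: store original and processed texts side by side after training,
# then query by the processed text seen in an Interpretation object.

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interp (row_id INTEGER PRIMARY KEY, "
    "original TEXT, processed TEXT, loss REAL)"
)

def save_row(row_id, original, processed, loss):
    conn.execute("INSERT INTO interp VALUES (?, ?, ?, ?)",
                 (row_id, original, processed, loss))

def original_for(processed):
    """Look up the untokenized input for a given processed text."""
    row = conn.execute(
        "SELECT original FROM interp WHERE processed = ?", (processed,)
    ).fetchone()
    return row[0] if row else None

save_row(0, "Hello, World!", "xxbos xxmaj hello , xxmaj world !", 0.42)
```

Keying on a stable `row_id` instead of the processed text would avoid the wrong-match problem entirely, but requires carrying that id through the interpretation step.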