Interpreting text models

dipam7 · May 28, 2020, 8:36pm

Suppose I build a text classification model for sentiment analysis. Is there a way to interpret it and see which words or combinations of words it relies on when deciding the class?

Taking IMDB as an example: if the model predicts that a review is positive, is there a way to see that it is looking at words like “amazing” or “one of the best movies”?

Thanks,
Dipam

msivanes · May 28, 2020, 11:13pm

Captum (captum.ai) provides this type of model interpretation: https://captum.ai/tutorials/
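
For a rough sense of the idea before digging into the tutorials, here is a minimal sketch of occlusion (leave-one-word-out) attribution, one of the simpler techniques in this family. It is not Captum's API: `TOY_WEIGHTS`, `toy_positive_score`, and `occlusion_attributions` are hypothetical names made up for illustration, and the toy lexicon scorer only stands in for a real trained model.

```python
from typing import Callable, List, Tuple

# Hypothetical toy lexicon standing in for a trained sentiment model.
TOY_WEIGHTS = {"amazing": 2.0, "best": 1.5, "boring": -2.0, "worst": -2.5}


def toy_positive_score(tokens: List[str]) -> float:
    """Stand-in for a model: sum of word weights (higher means more positive)."""
    return sum(TOY_WEIGHTS.get(t.lower(), 0.0) for t in tokens)


def occlusion_attributions(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
    mask_token: str = "<unk>",
) -> List[Tuple[str, float]]:
    """Score each token by how much the output drops when it is masked out."""
    base = score_fn(tokens)
    attributions = []
    for i, tok in enumerate(tokens):
        # Replace only position i with the mask token and re-score.
        occluded = tokens[:i] + [mask_token] + tokens[i + 1:]
        attributions.append((tok, base - score_fn(occluded)))
    return attributions


review = "one of the best movies with an amazing cast".split()
for tok, attr in occlusion_attributions(review, toy_positive_score):
    print(f"{tok:>8s} {attr:+.2f}")
```

With a real classifier, `score_fn` would return the predicted probability (or logit) of the positive class. Masking one word at a time will not capture interactions between words, so phrases like “one of the best movies” would need multi-word masks or a gradient-based method.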

dipam7 · May 29, 2020, 12:56am

Thanks, this looks nice. However, I want to understand the theory behind it and try to implement it myself. Is it based on a paper?