Interpreting text models

Suppose I build a sentiment-analysis model for text classification. Is there a way to interpret it, i.e. to see which words or combinations of words it looks at when deciding the class?

Taking IMDB as an example: if the model predicts that a review is positive, is there a way to see that it is looking at words like “amazing” or phrases like “one of the best movies”?

Thanks,
Dipam

Captum (captum.ai), a model-interpretability library for PyTorch, provides this kind of attribution; see the tutorials at https://captum.ai/tutorials/
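One of the attribution methods Captum offers is Integrated Gradients. It scores each input feature by averaging the model's gradients along a straight path from a "baseline" input (here, an empty review) to the real input, then multiplying by the feature's change. Below is a rough sketch of that idea, not Captum's code. The bag-of-words logistic model, its vocabulary and its weights (`WEIGHTS`, `BIAS`) are hypothetical and made up purely for illustration. The gradient is computed analytically so the sketch needs only the standard library.

```python
import math

# Hypothetical toy sentiment model: logistic regression over a bag of words.
# WEIGHTS and BIAS are made-up values for illustration, not a trained model.
WEIGHTS = {"amazing": 2.0, "best": 1.5, "boring": -2.0, "movie": 0.1, "the": 0.0}
BIAS = -0.5


def featurize(tokens):
    """Bag-of-words counts for tokens that appear in the toy vocabulary."""
    counts = {w: 0.0 for w in WEIGHTS}
    for t in tokens:
        if t in counts:
            counts[t] += 1.0
    return counts


def prob_positive(x):
    """Sigmoid of the linear score, i.e. P(positive | x)."""
    z = BIAS + sum(WEIGHTS[w] * x[w] for w in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def grad_prob(x):
    """Analytic gradient of prob_positive with respect to each feature."""
    p = prob_positive(x)
    return {w: p * (1.0 - p) * WEIGHTS[w] for w in WEIGHTS}


def integrated_gradients(x, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients,
    using an all-zero (empty review) baseline."""
    baseline = {w: 0.0 for w in x}
    total = {w: 0.0 for w in x}
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th interval on [0, 1]
        point = {w: baseline[w] + alpha * (x[w] - baseline[w]) for w in x}
        g = grad_prob(point)
        for w in x:
            total[w] += g[w]
    # Average gradient along the path, scaled by how far each feature moved.
    return {w: (x[w] - baseline[w]) * total[w] / steps for w in x}


review = "the best movie amazing".split()
x = featurize(review)
attributions = integrated_gradients(x)
for word, score in sorted(attributions.items(), key=lambda kv: -kv[1]):
    print(f"{word:>8s} {score:+.4f}")
```

With these made-up weights, "amazing" receives the largest positive score, and the scores add up (approximately) to the difference between the predicted probability for the review and for the empty baseline. In a real neural text model, the same computation is usually run over the token embeddings, and the per-dimension scores are summed to get one score per token.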


Thanks, this looks nice. However, I'd like to understand the theory behind it and try implementing it myself. Is it based on a paper?