Interpreting text models

Suppose I build a text classification model for sentiment analysis. Is there a way to interpret it and see which words or combinations of words it relies on when deciding the class?

Taking IMDB as an example: if the model predicts that a review is positive, is there a way to see that it is looking at words like “amazing” or phrases like “one of the best movies”?

Dipam provides this type of model interpretation


Thanks, this looks nice. However, I would like to understand the theory behind it and try to implement it myself. Is it based on a paper?
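
For anyone who wants to try this by hand, here is a minimal sketch of one generic technique, occlusion (leave-one-word-out): remove each word in turn and measure how much the predicted probability changes. This is not necessarily what the tool mentioned above does. The `predict_positive` function and its `TOY_WEIGHTS` are hypothetical stand-ins for a real trained classifier, and `occlusion_importance` is a name made up for this example.

```python
import math

# Hypothetical toy "model": a bag-of-words logistic scorer standing in
# for a real trained sentiment classifier. The weights are invented.
TOY_WEIGHTS = {"amazing": 2.0, "best": 1.5, "boring": -2.0, "worst": -2.5}


def predict_positive(tokens):
    """Return P(positive) for a list of tokens (toy stand-in for a model)."""
    score = sum(TOY_WEIGHTS.get(t.lower(), 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def occlusion_importance(tokens, predict=predict_positive):
    """Score each token by the drop in P(positive) when it is removed."""
    base = predict(tokens)
    importances = []
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + tokens[i + 1:]  # the review without token i
        importances.append((tok, base - predict(occluded)))
    return importances


if __name__ == "__main__":
    review = "one of the best movies , truly amazing".split()
    ranked = sorted(occlusion_importance(review), key=lambda p: -abs(p[1]))
    for tok, delta in ranked:
        print(f"{tok:>10s}  {delta:+.3f}")
```

To use it with a real model, pass your own `predict` callable that maps a token list to the probability of the positive class. Removing a sliding window of several tokens instead of a single one is one way to surface phrases such as “one of the best movies” rather than isolated words.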