Posting this here so others can find it if they run into this issue:
If you pull the latest dev library and want to run `ClassificationInterpretation.from_learner()` on another dataset, you must first replace your original DataBunch's `valid_dl` or `test_dl` (whichever you want, a labeled or unlabeled dataset) with your new one. Otherwise you will get an error mentioning `fix_ds`.
Though actually I am having issues with this. I have one dataset where the error rate through `learn.predict()` is 1.58%, but through the above method it is 91.7%. This happens specifically when there are more than two classes; mine has 15. @sgugger do you have a suspicion as to what could be going on here? When I do it on the Titanic or the ADULT dataset it works perfectly fine.
If anyone has a dataset with known outcomes and more than 2 classes and could verify they see this too, I would appreciate it.
Would this show up in the order of `.unique()` in the train and test sets?
Such as:
`train[var].unique()` vs `test[var].unique()` being in different orders?
Edit: ah, I see what you mean now, I believe. There were only 13 different outcome classes in my test set vs 15 in my training set. Is there a way to override that mapping?
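To illustrate the mismatch with plain pandas (this is a conceptual sketch, not fastai code; the column name and values are made up): when categories are inferred independently on train and test, the same label can end up with a different integer code, which scrambles predictions.

```python
import pandas as pd

train = pd.DataFrame({"var": ["cat", "dog", "fish"]})
test = pd.DataFrame({"var": ["fish", "dog"]})  # one class missing from test

# Encoding each set independently gives inconsistent codes:
# train: cat -> 0, dog -> 1, fish -> 2
# test:  dog -> 0, fish -> 1  (fish's code no longer matches train's!)
train_codes = pd.Categorical(train["var"]).codes
test_codes = pd.Categorical(test["var"]).codes

# Overriding the mapping with the categories learned from train
# makes the codes line up again: fish -> 2, dog -> 1
fixed_codes = pd.Categorical(
    test["var"], categories=pd.Categorical(train["var"]).categories
).codes
```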
Oh, you're not applying the same preprocessing. You should pass `processor=data.processor` in the second call to make sure the datasets are treated the same way (otherwise you don't map categorical variables to the same indices, and don't normalize continuous variables the same way).
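Conceptually, sharing the processor means "fit on train, apply to test" for both the categorical mapping and the normalization statistics. A minimal pandas sketch of that idea (column names and values are invented for illustration; this is not the fastai API itself):

```python
import pandas as pd

train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 3.0, 5.0]})
test = pd.DataFrame({"color": ["blue", "red"], "size": [2.0, 4.0]})

# Categorical mapping fit on train, then applied to test,
# so "red" gets the same integer code in both sets:
cats = pd.Categorical(train["color"]).categories
test_codes = pd.Categorical(test["color"], categories=cats).codes

# Normalization statistics fit on train, then applied to test,
# rather than re-fitting mean/std on the test set:
mean, std = train["size"].mean(), train["size"].std()
test_norm = (test["size"] - mean) / std
```

Passing `processor=data.processor` has the library do exactly this kind of reuse for you across all the tabular procs.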