Correct. v1 did, v2 doesn’t.
Hi again!
Is there an equivalent of plot_top_losses
for a tabular learner?
Thanks!
Should work out of the box on any Interpretation object (ClassificationInterpretation or otherwise)
Strange, I managed to get plot_confusion_matrix
working, but not plot_top_losses(k=10)
. It doesn’t raise any error; it just doesn’t show anything.
Hmmm. I’ll take a look at it on my end shortly
In the meantime I am using the show
method from TabularPandas
, but instead of using the whole validation dataset, I manually subset it with the indices given by Interpretation.top_losses
:
def show_top_losses(tab_learner, k=None, largest=True):
    # Build an Interpretation object and grab the k largest (or smallest) losses
    interp = Interpretation.from_learner(tab_learner)
    top_losses = interp.top_losses(k, largest)
    # Subset the validation DataFrame by the top-loss row indices
    to_top_losses = tab_learner.dls.valid.dataset.iloc[top_losses.indices]
    to_top_losses.show()
    return to_top_losses
EDIT: This workaround only shows the rows associated with the top losses, not the losses or the predictions themselves.
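To close that gap, the same idea can be sketched in plain pandas/NumPy, without any fastai objects: sort rows by their per-row loss and append the loss and prediction as extra columns. The function name, column names, and toy data below are all hypothetical; with fastai you would obtain the losses and predictions from the learner (e.g. via `get_preds`) rather than as raw arrays.

```python
import numpy as np
import pandas as pd

def show_top_losses_df(valid_df, losses, preds, k=5, largest=True):
    """Return the k rows with the highest (or lowest) loss,
    with loss and prediction columns appended."""
    order = np.argsort(losses)          # indices sorted by loss, ascending
    if largest:
        order = order[::-1]             # descending: biggest losses first
    idx = order[:k]
    out = valid_df.iloc[idx].copy()
    out["loss"] = losses[idx]           # attach the per-row loss
    out["pred"] = preds[idx]            # attach the model's prediction
    return out

# Toy validation set standing in for tab_learner.dls.valid.dataset
valid_df = pd.DataFrame({"age": [22, 35, 58, 41], "hours": [40, 50, 30, 45]})
losses = np.array([0.1, 2.3, 0.7, 1.5])
preds = np.array([0, 1, 1, 0])

top = show_top_losses_df(valid_df, losses, preds, k=2)
print(top)  # the two worst rows, with their losses and predictions
```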
@vrodriguezf the issue comes with _pre_show_batch
. It’s not returning the y
's or the outs as it should be. I’ll file a bug report (As @sgugger may not be able to get to this for a bit). In the meantime nice workaround!
@vrodriguezf Re: GitHub issue: currently not implemented, Sylvain will look at it when he has time (and isn’t doing the edits)
Good to know! Thank you for raising the issue.
Basic question about interpretability: can fastshap
be used for a regression learner? Should I use the permutation importance class instead?
You should be able to use it for regression IIRC (it was a collaboration project and I believe we tackled it). Otherwise, yes, permutation importance would be good to use too. I can’t recall if I had it set up specifically for classification, but it should be straightforward to adjust for regression: just change what metric/loss function it uses to generate its differences, and possibly how it uses them (e.g. with MSE a smaller number is better, whereas with a metric like accuracy a larger one is).
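The regression adjustment described above can be sketched without anything fastai-specific: treat the model as a plain callable and measure how much MSE rises when each column is shuffled. Everything here (the function, the toy model, the data) is an illustrative assumption, not the library's actual permutation importance class.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each column is shuffled.
    Larger increase = the feature mattered more."""
    rng = np.random.default_rng(seed)
    baseline = mse(y, predict(X))
    importances = {}
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])       # break the column's relation to y
            deltas.append(mse(y, predict(Xp)) - baseline)
        importances[j] = float(np.mean(deltas))
    return importances

# Toy example: y depends on column 0 only, so shuffling it should hurt most.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0]
predict = lambda X: 3 * X[:, 0]         # a "perfect" model for this data

imp = permutation_importance(predict, X, y)
print(imp)  # column 0's importance should dominate column 1's
```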
Out of the box, calling ShapInterpretation
with a regression learner raises an error because the vocab
attribute is not there :S
I’ll look at it when I have time, thanks!
Thank you, really appreciate your work!!
I just checked: setting learn.dl.vocab = learn.dl.y_names
solves the issue. I have now successfully created some SHAP plots for regression.
Awesome! Glad we got it figured out! Great work!
Does anybody know how to choose the encoding of the categories in a CategoryBlock
? I want to ensure that the code 1
is used for the positive class of my problem.
Thanks!
Thanks for the prompt reply! Should it be a named list or something like that? If I just change the default order of the classes of the Adult dataset:
dls = TabularDataLoaders.from_df(df=df,
                                 procs=[FillMissing(add_col=False), Normalize],
                                 cont_names=cont_names,
                                 y_names=dep_var,
                                 y_block=CategoryBlock(vocab=['>=50k', '<50k']),
                                 valid_idx=splits[1],
                                 bs=1024)
what I eventually get in dls.vocab
is again ['<50k','>=50k']
. The encoding does not change.
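Setting aside the fastai internals, the encoding behaviour being asked for can be sketched in plain pandas: with `pd.Categorical`, the order of the `categories` argument fixes the integer codes, so listing the positive class second gives it code 1. The labels below are made up for illustration.

```python
import pandas as pd

labels = ["pos", "neg", "neg", "pos"]

# 'neg' listed first -> neg encodes as 0, pos as 1
cat = pd.Categorical(labels, categories=["neg", "pos"])
print(list(cat.codes))      # [1, 0, 0, 1]

# Flip the order -> pos encodes as 0, neg as 1
flipped = pd.Categorical(labels, categories=["pos", "neg"])
print(list(flipped.codes))  # [0, 1, 1, 0]
```

This is the behaviour one would hope `CategoryBlock(vocab=...)` mirrors; the thread above suggests the vocab ordering was not being honoured at the time.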