A walk with fastai2 - Tabular - Study Group and Online Lectures Megathread

Correct. V1 did, v2 doesn’t

Hi again!

Is there an equivalent of plot_top_losses for a tabular learner?

Thanks!

Should work out of the box on any Interpretation object :slight_smile: (ClassificationInterpretation or otherwise)

Strange, I managed to get plot_confusion_matrix working, but not plot_top_losses(k=10). It does not raise any error; it just doesn’t show anything.

Hmmm. I’ll take a look at it on my end shortly :slight_smile:

In the meantime I am using the show method from TabularPandas, but instead of showing the whole validation dataset, I manually subset it with the indices given by Interpretation.top_losses:

from fastai2.interpret import Interpretation

def show_top_losses(tab_learner, k=None, largest=True):
    # Build an Interpretation object and fetch the k largest losses
    interp = Interpretation.from_learner(tab_learner)
    top_losses = interp.top_losses(k, largest)
    # Subset the validation set by those indices and show the rows
    to_top_losses = tab_learner.dls.valid.dataset.iloc[top_losses.indices]
    to_top_losses.show()
    return to_top_losses

EDIT: This workaround only shows the rows associated with the top losses, not the losses themselves nor the predictions.
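One way to make the table self-describing (a sketch, not the eventual fastai fix) is to attach the per-sample loss and predicted class as extra columns on the subset of rows. The helper below uses plain pandas/numpy; in the thread's setting, `rows` would come from `tab_learner.dls.valid.dataset.iloc[top_losses.indices].items` and the loss/prediction arrays from the `Interpretation` object (exact attribute names may differ by fastai version):

```python
import numpy as np
import pandas as pd

def attach_losses(rows, losses, preds):
    """Return a copy of `rows` with per-sample loss and predicted
    class attached, so the top-losses table carries both."""
    out = rows.copy()
    out["loss"] = np.asarray(losses)
    out["pred"] = np.asarray(preds)
    return out

# Toy illustration with fake data standing in for the real learner output
rows = pd.DataFrame({"age": [42, 57], "hours_per_week": [40, 60]})
result = attach_losses(rows, [2.31, 1.87], [0, 1])
print(result)
```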

@vrodriguezf the issue comes with _pre_show_batch. It’s not returning the y's or the outs as it should be. I’ll file a bug report :slight_smile: (As @sgugger may not be able to get to this for a bit). In the meantime nice workaround!

Thanks a lot @muellerzr!!! I look forward to seeing the fix.


@vrodriguezf Re: GitHub issue: currently not implemented, Sylvain will look at it when he has time (and isn’t doing the edits :wink: )

Good to know! thank you for raising the issue :wink:

Basic question about interpretability: can fastshap be used for a regression learner? Should I use the permutation importance class instead?

You should be able to use it for regression IIRC (it was a collaboration project and I believe we tackled it). Otherwise, yes, permutation importance would be a good option too :slight_smile: I can’t recall if I set it up specifically for classification, but it should be straightforward to adjust for regression: just change which metric/loss function it uses to generate its differences, and possibly how it interprets them (e.g. with MSE a smaller value is better, whereas with accuracy a larger one is).
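For reference, the idea behind permutation importance is metric-agnostic, which is why it adapts to regression so easily. A minimal generic sketch (not fastai's implementation; `model_fn` and `metric` are placeholders for whatever predict function and error measure you use):

```python
import numpy as np

def permutation_importance(model_fn, X, y, metric, n_repeats=5, seed=0):
    """Shuffle one column at a time and measure how much the metric
    degrades. Works for regression or classification -- with an error
    metric like MSE, a *rise* after shuffling means the feature mattered."""
    rng = np.random.default_rng(seed)
    base = metric(y, model_fn(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy the column's information
            scores.append(metric(y, model_fn(Xp)))
        importances[j] = np.mean(scores) - base
    return importances

# Toy check: y depends only on column 0, so shuffling it should hurt most
X = np.random.default_rng(0).normal(size=(200, 3))
y = 3 * X[:, 0]
mse = lambda t, p: float(np.mean((t - p) ** 2))
model = lambda X: 3 * X[:, 0]  # a "perfect" model, for illustration only
imp = permutation_importance(model, X, y, mse)
```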

Out of the box, calling ShapInterpretation with a regression learner raises an error because the attribute vocab is not there :S


I’ll look at it when I have time, thanks! :slight_smile:

Thank you, really appreciate your work!!

I just checked that setting learn.dl.vocab = learn.dl.y_names solves the issue. I have now created some SHAP plots for regression successfully.


Awesome! Glad we got it figured out! Great work!

Does anybody know how to choose the encoding of the categories in a CategoryBlock? I want to ensure that code 1 is used for the positive class of my problem.

Thanks!

@vrodriguezf You should be able to pass in a list of classes to use, aka a vocab:

Thanks for the prompt reply! Should it be a named list or something like that? If I just change the default order of the classes of the Adult dataset:

dls = TabularDataLoaders.from_df(df=df,
                                 procs=[FillMissing(add_col=False), Normalize],
                                 cont_names=cont_names,
                                 y_names=dep_var,
                                 y_block=CategoryBlock(vocab=['>=50k', '<50k']),
                                 valid_idx=splits[1],
                                 bs=1024)

what I eventually get in dls.vocab is again ['<50k', '>=50k']; the encoding does not change.
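A likely explanation: fastai's CategoryMap sorts a user-supplied vocab by default, which would discard the custom ordering. Versions of fastai that expose a `sort` argument on CategoryBlock should preserve the given order when it is disabled; this is an untested sketch against the thread's fastai version:

```python
# Assumes a fastai2 version where CategoryBlock forwards a `sort` flag
# to Categorize/CategoryMap; with sort=False the supplied order is kept,
# so '>=50k' would be encoded as 0 and '<50k' as 1 here.
y_block = CategoryBlock(vocab=['>=50k', '<50k'], sort=False)
```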