A walk with fastai2 - Tabular - Study Group and Online Lectures Megathread

Correct. V1 did, v2 doesn’t

Hi again!

Is there an equivalent of plot_top_losses for a tabular learner?

Thanks!

Should work out of the box on any Interpretation object :slight_smile: (ClassificationInterpretation or otherwise)

Strange, I managed to get plot_confusion_matrix working, but not plot_top_losses(k=10). It does not raise any error; it just doesn’t show anything.

Hmmm. I’ll take a look at it on my end shortly :slight_smile:

In the meantime I am using the show method from TabularPandas, but instead of showing the whole validation dataset, I manually subset it with the indices given by Interpretation.top_losses:

from fastai2.interpret import Interpretation

def show_top_losses(tab_learner, k=None, largest=True):
    # Build an Interpretation object and fetch the k largest losses
    interp = Interpretation.from_learner(tab_learner)
    top_losses = interp.top_losses(k, largest)
    # Subset the validation set by those indices and show the rows
    to_top_losses = tab_learner.dls.valid.dataset.iloc[top_losses.indices]
    to_top_losses.show()
    return to_top_losses

EDIT: This workaround only shows the rows associated with the top losses, not the losses themselves nor the predictions.
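One way to make the table self-describing (a sketch, not the eventual fastai fix) is to attach the per-sample loss and predicted class as extra columns on the subset of rows. The helper below uses plain pandas/numpy; in the thread's setting, `rows` would come from `tab_learner.dls.valid.dataset.iloc[top_losses.indices].items` and the loss/prediction arrays from the `Interpretation` object (exact attribute names may differ by fastai version):

```python
import numpy as np
import pandas as pd

def attach_losses(rows, losses, preds):
    """Return a copy of `rows` with per-sample loss and predicted
    class attached, so the top-losses table carries both."""
    out = rows.copy()
    out["loss"] = np.asarray(losses)
    out["pred"] = np.asarray(preds)
    return out

# Toy illustration with fake data standing in for the real learner output
rows = pd.DataFrame({"age": [42, 57], "hours_per_week": [40, 60]})
result = attach_losses(rows, [2.31, 1.87], [0, 1])
print(result)
```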

@vrodriguezf the issue comes with _pre_show_batch. It’s not returning the y's or the outs as it should be. I’ll file a bug report :slight_smile: (As @sgugger may not be able to get to this for a bit). In the meantime nice workaround!

Thanks a lot @muellerzr!!! I look forward to seeing the fix.


@vrodriguezf Re: GitHub issue: currently not implemented, Sylvain will look at it when he has time (and isn’t doing the edits :wink: )

Good to know! thank you for raising the issue :wink:

Basic question about interpretability: can fastshap be used for a regression learner? Should I use the permutation importance class instead?

You should be able to use it for regression IIRC (it was a collaboration project and I believe we tackled it). Otherwise, yes, permutation importance would be a good option too :slight_smile: I can’t recall if I set it up specifically for classification, but it should be straightforward to adjust for regression: just change which metric/loss function it uses to generate its differences, and possibly how it interprets them (e.g. with MSE a smaller value is better, whereas with accuracy a larger one is).
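For reference, the idea behind permutation importance is metric-agnostic, which is why it adapts to regression so easily. A minimal generic sketch (not fastai's implementation; `model_fn` and `metric` are placeholders for whatever predict function and error measure you use):

```python
import numpy as np

def permutation_importance(model_fn, X, y, metric, n_repeats=5, seed=0):
    """Shuffle one column at a time and measure how much the metric
    degrades. Works for regression or classification -- with an error
    metric like MSE, a *rise* after shuffling means the feature mattered."""
    rng = np.random.default_rng(seed)
    base = metric(y, model_fn(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy the column's information
            scores.append(metric(y, model_fn(Xp)))
        importances[j] = np.mean(scores) - base
    return importances

# Toy check: y depends only on column 0, so shuffling it should hurt most
X = np.random.default_rng(0).normal(size=(200, 3))
y = 3 * X[:, 0]
mse = lambda t, p: float(np.mean((t - p) ** 2))
model = lambda X: 3 * X[:, 0]  # a "perfect" model, for illustration only
imp = permutation_importance(model, X, y, mse)
```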

Out of the box, calling ShapInterpretation with a regression learner raises an error because the attribute vocab is not there :S


I’ll look at it when I have time, thanks! :slight_smile:

Thank you, really appreciate your work!!

I just checked that setting learn.dl.vocab = learn.dl.y_names solves the issue. I have now created some SHAP plots for regression successfully.


Awesome! Glad we got it figured out! Great work!

Does anybody know how to choose the encoding of the categories in a CategoryBlock? I want to ensure that code 1 is used for the positive class of my problem.

Thanks!

@vrodriguezf You should be able to pass in a list of classes to use, aka a vocab:

Thanks for the prompt reply! Should it be a named list or something like that? If I just change the default order of the classes of the Adult dataset:

dls = TabularDataLoaders.from_df(df=df,
                                 procs=[FillMissing(add_col=False), Normalize],
                                 cont_names=cont_names,
                                 y_names=dep_var,
                                 y_block=CategoryBlock(vocab=['>=50k', '<50k']),
                                 valid_idx=splits[1],
                                 bs=1024)

what I eventually get in dls.vocab is again ['<50k', '>=50k']; the encoding does not change.
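A likely explanation: fastai's CategoryMap sorts a user-supplied vocab by default, which would discard the custom ordering. Versions of fastai that expose a `sort` argument on CategoryBlock should preserve the given order when it is disabled; this is an untested sketch against the thread's fastai version:

```python
# Assumes a fastai2 version where CategoryBlock forwards a `sort` flag
# to Categorize/CategoryMap; with sort=False the supplied order is kept,
# so '>=50k' would be encoded as 0 and '<50k' as 1 here.
y_block = CategoryBlock(vocab=['>=50k', '<50k'], sort=False)
```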