TabularLine back to pandas dataframe

Hi all,

I’m trying to get top_losses to make sense on tabular data. I can get the indices of the highest-loss rows in the validation dataset, but I then want to build another dataframe from those rows. Is there a way to convert a TabularLine back to a dataframe? Or to just look those rows up in the original dataframe itself? I hate to tag @sgugger here, but I feel like you would know?

Specifically, when I do interp.top_losses(5, True), the output looks like this:

torch.return_types.topk(values=tensor([3.3187, 2.9598, 2.9355, 2.6213, 2.2764]), indices=tensor([166, 195, 26, 42, 24])) with the indices corresponding to positions in valid_ds.

Thanks
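
For the "find those rows in the original dataframe" route, here is a minimal sketch. The positions returned by top_losses are relative to the validation set, so they have to be mapped through whatever row positions were used for the split. The valid_idx list below is an assumption (the last 200 rows of a 1000-row dataframe), and top_idx is copied from the output above. Keep in mind that the original dataframe holds the raw values, not the processed ones the model saw.

```python
# Assumed: validation rows were picked by position from the original dataframe.
valid_idx = list(range(800, 1000))

# Indices reported by interp.top_losses(5, True), relative to valid_ds.
top_idx = [166, 195, 26, 42, 24]

# Translate validation-relative positions into original row positions.
original_rows = [valid_idx[i] for i in top_idx]
print(original_rows)

# With pandas, the matching rows could then be selected with df.iloc[original_rows].
```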

Managed to get this working. If anyone wants the code, let me know.

Hi Zach, I’d be interested to see it. I worked on something similar in v0.7 and it was quite a nightmare to do.

Sure! I was mostly wanting it for my own custom implementation of plot_top_losses() for tabular data, and here’s what I came up with:

import pandas as pd
from IPython.display import display

def plot_top_losses(self, k, largest=True, return_table:bool=False):
    "Shows the respective rows in top_losses along with their prediction, actual, loss, and probability"
    # Pass `largest` through so the argument actually takes effect
    tl_val, tl_idx = self.top_losses(k, largest)
    classes = self.data.classes
    cat_names = self.data.x.cat_names
    cont_names = self.data.x.cont_names
    # Use one flat list of column names
    df = pd.DataFrame(columns=['Prediction', 'Actual', 'Loss', 'Prob'] + cat_names + cont_names)
    for i, idx in enumerate(tl_idx):
        da, cl = self.data.dl().dataset[idx]  # da is the TabularLine, cl its label
        cl = int(cl)
        t1 = str(da)                          # <- string form of the row
        t1 = t1.split(';')                    # <- one "name value" chunk per column
        arr = []
        arr.append(classes[self.pred_class[idx]])
        arr.append(classes[cl])
        arr.append(f'{self.losses[idx]:.2f}')
        arr.append(f'{self.probs[idx][cl]:.2f}')
        for x in range(len(t1)-1):            # <- skip the chunk after the last ';'
            _, value = t1[x].rsplit(' ', 1)   # <- keep only the value
            arr.append(value)                 # <-
        df.loc[i] = arr                       # <-
    display(df)
    if return_table: return df

Focus specifically on the lines marked with <-, and on where t1 and cl are being used. Let me know if you have questions! :slight_smile:
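
The string-parsing step can be tried on its own. This is only a sketch: the sample string and the helper name parse_tabular_line are made up for illustration, and the "name value; name value; " layout is an assumption inferred from the split(';') and rsplit(' ', 1) calls in the function.

```python
def parse_tabular_line(line_str):
    """Split a 'name value; name value; ' string into (name, value) pairs."""
    pairs = []
    for field in line_str.split(';'):
        field = field.strip()
        if not field:
            # Skip the empty chunk left after the trailing ';'
            continue
        # Split on the last space: everything before it is the column name
        name, value = field.rsplit(' ', 1)
        pairs.append((name, value))
    return pairs

# Hypothetical row text, not taken from a real dataset
sample = "workclass Private; education HS-grad; age 0.4321; "
parsed = parse_tabular_line(sample)
print(parsed)
```

One limitation of splitting on the last space: a category value that itself contains a space would lose everything before its final word.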

@jeremyeast sorry, there was a slight bug. I had to fix the delimiter used to get the value. It’s fixed now.

You could make a PR with this function monkey-patched onto ClassificationInterpretation in one of the tabular modules (as is done in vision).
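
For anyone unfamiliar with the pattern, monkey-patching here just means defining the function at module level and assigning it as an attribute of the class. A rough sketch of the idea, using a hypothetical stand-in class (Interpretation) and method name (top_losses_table) rather than the real fastai ones:

```python
class Interpretation:
    """Hypothetical stand-in for an interpretation class."""
    def __init__(self, losses):
        self.losses = losses

def _top_losses_table(self, k, largest=True):
    "Return (index, loss) pairs for the k largest (or smallest) losses."
    order = sorted(range(len(self.losses)),
                   key=lambda i: self.losses[i],
                   reverse=largest)
    return [(i, self.losses[i]) for i in order[:k]]

# Attach the module-level function to the class as a method
Interpretation.top_losses_table = _top_losses_table

interp = Interpretation([0.2, 3.3, 1.1, 2.9])
top2 = interp.top_losses_table(2)
print(top2)
```

Every instance of the class picks up the method after the assignment, including instances created before the patch ran.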

Thanks @sgugger! I’ll look into this now and let you know if I run into issues :slight_smile:

Posted the PR :slight_smile: