Error Getting Processed Dataframe back from TabularList

whamp · December 1, 2018, 12:14am

Oh ok, so I see that happens later on, in TabularProcessor.process, here:

if len(ds.cat_names) != 0:
    ds.codes = np.stack([c.cat.codes.values for n,c in ds.xtra[ds.cat_names].items()], 1).astype(np.int64) + 1

I guess that answers the original question, then: data.train_ds.x.xtra isn’t finished processing yet.
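
For anyone following along, here is a minimal sketch of what that np.stack line does, run on a toy pandas DataFrame instead of a real TabularList. The frame `df`, its column names, and the values are made up for illustration; only the stacking expression is taken from the snippet above. It assumes the categorical columns have already been converted to pandas category dtype before this step runs.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for ds.xtra (hypothetical columns and values).
df = pd.DataFrame({
    "color": ["red", "blue", None, "red"],
    "size": ["S", "M", "M", "L"],
})
cat_names = ["color", "size"]

# Assumption: the categorical columns are already pandas categoricals
# by the time the codes are extracted.
for n in cat_names:
    df[n] = df[n].astype("category")

# Same expression as in the quoted snippet, applied to the toy frame.
# pandas uses code -1 for missing values, so the +1 shifts missing to 0
# and the real categories to 1..n.
codes = np.stack(
    [c.cat.codes.values for n, c in df[cat_names].items()], 1
).astype(np.int64) + 1

print(codes.shape)  # one row per sample, one column per categorical variable
print(codes)
```

So the integer codes only exist after this step; before it, the frame still holds the category values themselves, which would explain why xtra looks unfinished.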

My whole goal is to be able to access the same data that the NN is using, so that I can perform model-driven EDA using algorithms other than a NN. To facilitate that on larger datasets, it’d be nice not to have to duplicate the data just to make the final processing changes. I figured there must be a way to access the final data, since this was a feature highlighted in this thread.