Multicategory list to one_hot array? [solved]

I’m working on a multi-class labeling task and am running batch prediction on held-out test data for which I have labels, along these lines:

inflearn = load_learner(test=testdata.valid_ds.x)
preds,y = inflearn.get_preds(ds_type=DatasetType.Test)

Unfortunately, y is not an array of one-hot encoded categories of the same shape as preds, which would be easier to work with.

The corresponding multi-category labels in testdata.valid_ds.y are a MultiCategoryList, which doesn’t seem to have a method for converting its items into an array that can serve as ground truth to compare against the preds.

I don’t see this common use case addressed in the docs. I’m sure I can cobble something together, but wonder if there’s a better, preferably built-in approach, or if anyone has advice on how to proceed.


There’s undoubtedly cleaner numpy code for this, but here’s a solution, where mcl_y is a MultiCategoryList target variable y:

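import numpy as np

def convert_mcl_y_to_onehot(mcl_y):
    # One row per item, one column per class (mcl_y.c is the class count)
    res = np.zeros((len(mcl_y.items), mcl_y.c))
    # Each entry of mcl_y.items holds the class indices present for that item
    for rowi, idx_list in enumerate(mcl_y.items):
        for classidx in idx_list: res[rowi, classidx] = 1
    return res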
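
For anyone after the "cleaner numpy" version, here is a rough sketch of the same idea without the inner Python loop. It is written against plain lists of class indices so it runs without fastai. The name indices_to_onehot, the toy items and preds arrays, and the 0.5 threshold are all made up for illustration and are not part of the library:

```python
import numpy as np

def indices_to_onehot(index_lists, n_classes):
    """Build an (n_items, n_classes) multi-hot array from per-item class-index lists."""
    res = np.zeros((len(index_lists), n_classes), dtype=np.float32)
    if len(index_lists) == 0:
        return res
    # Row number repeated once for each label that row carries
    rows = np.repeat(np.arange(len(index_lists)), [len(ix) for ix in index_lists])
    # All class indices flattened in the same order as `rows`
    cols = np.concatenate([np.asarray(ix, dtype=np.int64) for ix in index_lists])
    res[rows, cols] = 1
    return res

# Toy stand-in for mcl_y.items: three items, the second has no labels
items = [[0, 2], [], [1]]
onehot = indices_to_onehot(items, 3)

# Toy stand-in for preds; 0.5 is an arbitrary illustrative threshold
preds = np.array([[0.9, 0.2, 0.7],
                  [0.1, 0.4, 0.3],
                  [0.2, 0.8, 0.6]])
pred_labels = (preds > 0.5).astype(np.float32)

# Fraction of (item, class) cells where prediction and truth agree
cell_accuracy = float((pred_labels == onehot).mean())
```

With a MultiCategoryList this would presumably be called as indices_to_onehot(mcl_y.items, mcl_y.c), using the same two attributes as the loop version above. I have only checked it on toy data like this.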

I too am working on a multiclass problem. My labels are of the form
array([0., 0., 0., 0., 0., 0.], dtype=float32). When I create an ImageList, the y value is truncated to the first two values of the label array.
How can I prevent this from happening?