I can’t seem to get the datablock api to work with this setup using .label_from_df(cols=tar_cols). Is it currently possible or do I have to use another approach?
Since this is multilabel (with always three labels from what you show), you should just change your data frame to have all your targets in one column, with the different labels separated by a space (or any delimiter you like). This way, you can use the data block API with label_delim='...' in your call.
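A minimal sketch of that reshaping step, using plain Python rows in place of a data frame so it runs without any dependencies. The column names are borrowed from the code posted later in the thread, the values are made up, and the "labels" column name is a hypothetical choice.

```python
# Hypothetical rows; column names follow the thread, values are made up.
rows = [
    {"ID_tri_box_0": 0, "ID_tri_box_1": 1, "ID_tri_box_2": 3},
    {"ID_tri_box_0": 1, "ID_tri_box_1": 4, "ID_tri_box_2": 6},
]
tar_cols = ["ID_tri_box_0", "ID_tri_box_1", "ID_tri_box_2"]
delim = " "  # any delimiter works, as long as label_delim uses the same one


def join_targets(row, cols, delim=" "):
    """Collapse several target columns into one delimited label string."""
    return delim.join(str(row[c]) for c in cols)


# "labels" is an assumed name for the new single target column.
for row in rows:
    row["labels"] = join_targets(row, tar_cols, delim)

print([row["labels"] for row in rows])
```

One thing to keep in mind: a multilabel target built this way records which labels are present in a row, not which column each label came from.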
But I don’t think this is really a multi-label problem since he isn’t trying to predict just 1 or 0 for each column … it looks more like he’s trying to solve multiple multi-classification problems at the same time. I’m not sure if fastai handles that scenario out-of-the-box.
One idea is to turn this into a regression problem, which I think would work.
I’m trying to model the top 3 positions of a jai alai match. The classes are for the player/team that gets in 1st/2nd/3rd position. In jai alai, the positions that place (top 3) have some interdependencies based on how the games play out since they play in an order.
I’m testing a few different approaches with the targets:
the combination of top 3 positions as a single class (order doesn’t matter), output of bs*56 (# of class combinations)
binary indicators for each position placing in top 3, output of bs*8 (# of teams) with BCEWithLogitsLoss
a class for each of the top 3 positions, output of bs*3*8 (# of top spots * # of positions)
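For reference, the per-sample output widths of the three approaches can be checked with a few lines of Python. This assumes 8 teams and 3 placing positions, as described above; the variable names are only illustrative.

```python
from math import comb

n_teams = 8   # teams in a match, per the thread
n_places = 3  # top 3 positions

# 1) one class per unordered top-3 combination
n_combo_classes = comb(n_teams, n_places)
# 2) one binary indicator per team
n_indicator_outputs = n_teams
# 3) one n_teams-way classification for each placing position
n_per_position_outputs = n_places * n_teams

print(n_combo_classes, n_indicator_outputs, n_per_position_outputs)
```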
I got the first two so far, but haven’t been able to do the 3rd with the data block yet. I’m not familiar enough with the api to figure it out at the moment, but I can do it with a manual dataset class/data bunch.
I don’t think doing a regression problem makes sense since it’s the team numbers 1-8. I’ll try it though since the network might just be able to figure it out if I give it enough layers.
There is no ItemList type in fastai that matches your need so you will need to write your own. Note that you’ll also need to adapt your model to return 3 by 8 probs and write a custom loss function.
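To make the "3 by 8 probs plus custom loss" idea concrete, here is a framework-free sketch of the arithmetic for a single sample. It is not fastai or PyTorch code, and the function names are made up for illustration. It assumes the model emits 24 logits laid out as three consecutive groups of 8, and that the loss is simply the sum of one cross-entropy per position; other weightings are possible.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multi_position_loss(flat_logits, targets, n_places=3, n_teams=8):
    """Sum of per-position cross-entropies for one sample.

    flat_logits: n_places * n_teams raw scores (assumed layout: position-major).
    targets: one zero-based team index per position.
    """
    assert len(flat_logits) == n_places * n_teams
    assert len(targets) == n_places
    total = 0.0
    for p, t in enumerate(targets):
        head = flat_logits[p * n_teams:(p + 1) * n_teams]
        total += -log_softmax(head)[t]
    return total


# Uniform logits: every position contributes log(8).
uniform_loss = multi_position_loss([0.0] * 24, targets=[0, 1, 3])

# Logits that strongly favour the correct team in each position.
confident = [0.0] * 24
for p, t in enumerate([0, 1, 3]):
    confident[p * 8 + t] = 10.0
confident_loss = multi_position_loss(confident, targets=[0, 1, 3])

print(uniform_loss, confident_loss)
```

In PyTorch the same idea would typically be a reshape of the output to (bs, 3, 8) followed by a cross-entropy per position, summed or averaged.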
It’s been a while since I looked at this code. I tried a few ways to model the target variable, but never like the 3rd bullet I commented on above. Here’s what I had:
# individual tri box cat
# tar_cols = ['ID_exa_box']
tar_cols = ['ID_tri_box']
out_sz = len(win_cats['tri'])
# each position in box
# tar_cols = ['tri_box']
# tar_cols = ['ID_exa_box_0','ID_exa_box_1']
# tar_cols = ['ID_tri_box_0','ID_tri_box_1','ID_tri_box_2']
# out_sz = 3 * len(win_cats['pos'])
# each position indicator
# tar_cols = [f'exa_box_{i}_ind' for i in range(1,9)]
# tar_cols = [f'tri_box_{i}_ind' for i in range(1,9)]
# out_sz = len(win_cats['pos'])
I was testing 3 target variable approaches, on exacta box (top 2 spots any order) and trifecta box (top 3 spots any order). I created the columns for all approaches in the dataframe, then was running the code, testing each block of code.
The first block of code was an ID variable for the combination (e.g. 123=0, 124=1, etc.).
The second block was setting the ID of the lowest numbered team in the box for each position (e.g. 124=0 1 3.)
The last chunk, I set a binary flag for each team that is 1 if they placed in the box.
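The three encodings can be sketched for the trifecta box as follows. This is a reconstruction from the examples given above (123=0, 124=1, and 124=0 1 3), not the original preprocessing code; it assumes teams are numbered 1-8, combination IDs follow lexicographic order, and per-position IDs are zero-based.

```python
from itertools import combinations

N_TEAMS = 8
# Lexicographic list of unordered top-3 combos: (1,2,3) -> 0, (1,2,4) -> 1, ...
COMBOS = list(combinations(range(1, N_TEAMS + 1), 3))
COMBO_ID = {c: i for i, c in enumerate(COMBOS)}


def encode_box(top3):
    """Return (combination ID, per-position IDs, per-team indicators)."""
    box = tuple(sorted(top3))  # order of finish is ignored for a box
    combo_id = COMBO_ID[box]
    per_position = [t - 1 for t in box]  # zero-based team IDs, lowest first
    indicators = [1 if t in box else 0 for t in range(1, N_TEAMS + 1)]
    return combo_id, per_position, indicators


combo_id, per_position, indicators = encode_box([4, 1, 2])
print(combo_id, per_position, indicators)
```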
With this approach I was able to use the datablock api:
I have a tabular dataset. Training inputs are mostly continuous, but I have some categorical and integer inputs too. Outputs are multiple classes, so each row could have zero, one or many classes.
I’ve managed to convert the outputs to a single delimited column, but I can’t figure out how to use label_delim for tabular data.
How can I use the data block API for a tabular dataset with multiclass outputs? Also, is there a way to make sure that a test dataset is normalized and categorified consistently with the training set?
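On the consistency question, the underlying principle is to fit the preprocessing state (normalization statistics, category-to-code mapping) on the training set only and then reuse it on the test set. The sketch below shows that principle in plain Python; it is not fastai code, and the column names, helper names, and the choice of code 0 for unseen categories are assumptions for illustration.

```python
from statistics import mean, pstdev


def fit_preprocessing(train_rows, cont_col, cat_col):
    """Learn normalization stats and a category mapping from training data."""
    values = [r[cont_col] for r in train_rows]
    categories = sorted({r[cat_col] for r in train_rows})
    # Code 0 is reserved for categories never seen in training.
    cat_to_code = {c: i + 1 for i, c in enumerate(categories)}
    return {"mean": mean(values), "std": pstdev(values), "cat_to_code": cat_to_code}


def apply_preprocessing(rows, state, cont_col, cat_col):
    """Apply the training-set state to any split (train, valid, or test)."""
    return [
        {
            cont_col: (r[cont_col] - state["mean"]) / (state["std"] + 1e-7),
            cat_col: state["cat_to_code"].get(r[cat_col], 0),
        }
        for r in rows
    ]


# Hypothetical data: one continuous and one categorical input.
train = [{"speed": 1.0, "court": "a"}, {"speed": 3.0, "court": "b"}]
test = [{"speed": 2.0, "court": "c"}]

state = fit_preprocessing(train, "speed", "court")
train_proc = apply_preprocessing(train, state, "speed", "court")
test_proc = apply_preprocessing(test, state, "speed", "court")
print(train_proc, test_proc)
```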