Multiple Target Regression with Fast AI v2 Tabular?

bilalUWE · May 15, 2020, 9:57am

Hello there,

Does Fast AI v2 support multiple targets in the problem specification?

Has anyone ever tried it? Any examples or guidance to share?

I will appreciate your response.

Cheers

poppingtonic · May 15, 2020, 12:06pm

One way to do this is multi-task learning, which fastai can support with a custom dataloader and custom multi-head architecture. Here’s a face recognition example I learned from: https://gist.github.com/yang-zhang/ec071ae4775c2125595fd80f40efb0d6#file-multi-face-ipynb

bilalUWE · May 15, 2020, 12:30pm

Hi Brian,

Thanks for the response. I really appreciate your input.

The example you shared combines classification (gender and race) with regression (age). What I am looking for is regression with multiple targets, which in my use case are pollutants (such as CO, CO2, NO, NO2, PM25, PM10). The Fast AI v2 tabular learner lets you specify the target through y_names, whose name suggests it can take an array of target variables. I tried passing it an array of target attributes, but it didn’t work.

muellerzr · May 15, 2020, 12:32pm

If they’re all regression, it should work just fine out of the box. The issue comes into play when merging regression with classification. Make sure you pass in RegressionBlock() for your y_block parameter.

bilalUWE · May 15, 2020, 1:00pm

Hi Zachary,

They are all regression tasks. But what should I do with y_range? Do I have to pass an array containing a range for each target, or use a single y_range that is wide enough to cover all targets?

Besides, when I run it with multiple targets, lr_find() and fit_one_cycle() return the following error:

This is how I specified multiple targets:

dep_var = ['NO2', 'OZONE']

And then used in data loader creation:

dls = TabularDataLoaders.from_df(df, path,
    procs=[Categorify, FillMissing, Normalize],
    cat_names=cat_names,
    cont_names=cont_names,
    y_names=dep_var,
    y_block=RegressionBlock(),
    valid_idx=list(range(15000, 20208)), bs=64)

Any idea what might be going wrong?

muellerzr · May 15, 2020, 1:07pm

Try specifying c=2 inside your call to RegressionBlock. (I’m working from memory; if that still doesn’t work, I’ll look into it later today.)

bilalUWE · May 15, 2020, 1:59pm

Thanks for the suggestion.

The library doesn’t recognize the c=2 argument. I guessed another argument name, but that didn’t work either. See the code below:

y_block=RegressionBlock(c_out=2),
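Since the keyword name is being guessed here, one option is to read it from the installed version instead. The following is only a sketch: accepted_kwargs and example_block are hypothetical helpers made up for illustration, and the commented-out import path is an assumption about where RegressionBlock lives in your install.

```python
import inspect

def accepted_kwargs(fn):
    """Return the parameter names a callable accepts, in declaration order."""
    return list(inspect.signature(fn).parameters)

# Hypothetical stand-in so the sketch runs without fastai installed.
def example_block(n_targets=None):
    return n_targets

print(accepted_kwargs(example_block))

# With fastai installed, the same helper could be pointed at the real factory
# (import path is an assumption; adjust it to your installed version):
# from fastai.tabular.all import RegressionBlock
# print(accepted_kwargs(RegressionBlock))
```

Whatever name this prints for RegressionBlock is the one to pass, which avoids trial and error across library versions.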

vaibhavgupta · March 31, 2021, 9:28am

Hello @bilalUWE,
Did you find an answer for y_range parameter? Does it work for multiple columns?

bilalUWE · March 31, 2021, 11:11am

@vaibhavgupta, no, I couldn’t resolve the issue.

akashpalrecha · November 11, 2021, 7:49am

I realise this is pretty late in the discussion, but I faced the same issue yesterday and found a simple solution. Here’s the source code for the sigmoid_range function that does the required scaling:

def sigmoid_range(x, low, high):
    "Sigmoid function with range `(low, high)`"
    return torch.sigmoid(x) * (high - low) + low

Looking at the source code above gave me a direct hint on what the solution would be like in this case:

Let’s say you have 3 required outputs for the regression problem, and you have the minimums and maximums of those variables like so:

mins = [min_1, min_2, min_3]
maxs = [max_1, max_2, max_3]

You can then pass these to the y_range parameter like so and it’ll work:

# Make sure the ranges are on the correct device.
# Unlike the usual scalar case, the ranges must be tensors rather than plain Python scalars for this to work.
y_range = (torch.tensor(mins).cuda(), torch.tensor(maxs).cuda())

In my case, I’ve verified that this works correctly by making sure my metrics / losses / etc. make sense.
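As a rough illustration of why tensor ranges work, here is a minimal CPU-only sketch. The bounds and the random "raw outputs" are made-up values, not from a real model. It assumes the model's final activations have one column per target, so the (3,)-shaped bounds broadcast across the batch dimension.

```python
import torch

def sigmoid_range(x, low, high):
    "Sigmoid function with range `(low, high)`"
    return torch.sigmoid(x) * (high - low) + low

# Hypothetical per-target bounds for three regression targets.
mins = torch.tensor([0.0, 10.0, -5.0])
maxs = torch.tensor([1.0, 50.0, 5.0])

# Stand-in for raw model outputs: a batch of 4 rows, one column per target.
torch.manual_seed(0)
raw = torch.randn(4, 3) * 10

# Move the bounds to wherever the activations live instead of hard-coding .cuda().
scaled = sigmoid_range(raw, mins.to(raw.device), maxs.to(raw.device))

# Broadcasting squashes each column into its own [min, max] interval.
in_range = ((scaled >= mins) & (scaled <= maxs)).all().item()
print(scaled.shape, in_range)
```

Using .to(raw.device) rather than .cuda() is just one way to keep the same code working on machines without a GPU.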

Hope this helps!