Generic Multitask implementation with Fastai 1.0

(Denis) #1

I’ve been working on making it as easy as possible to set up multitask models with Fastai. For reference, in multitask learning, you use a single NN to solve several problems at once (e.g. several classifications, regressions, etc.). The way to do it is to have the NN output a vector containing the concatenated predictions for each sub-task.
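To make the idea concrete, here is a minimal sketch (plain PyTorch, not the notebook’s actual code) of a shared body feeding two heads, with the sub-task predictions concatenated into one output vector. All names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class MultitaskHead(nn.Module):
    """Illustrative multitask model: one shared encoder, two sub-task heads."""
    def __init__(self, n_features=16):
        super().__init__()
        self.encoder = nn.Linear(8, n_features)   # stand-in for a shared body
        self.clf_head = nn.Linear(n_features, 3)  # 3-class classification sub-task
        self.reg_head = nn.Linear(n_features, 1)  # scalar regression sub-task

    def forward(self, x):
        feats = torch.relu(self.encoder(x))
        # Concatenate the sub-task outputs into one prediction vector (3 + 1 = 4)
        return torch.cat([self.clf_head(feats), self.reg_head(feats)], dim=1)

model = MultitaskHead()
out = model(torch.randn(5, 8))
print(out.shape)  # torch.Size([5, 4])
```

The loss function then splits this vector back into its sub-task slices before computing each sub-loss.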

Here is a work-in-progress notebook:

The goal is to support it natively in Fastai and make it compatible with all tooling (metrics, top losses interpretation, import/export, etc.). Could you guys have a look and give your feedback on the implementation details?

To maintainers: Do you think this functionality could be brought to the core library? If so I will work on a proper PR, otherwise, I’m thinking about publishing it in on its own repo as a sort of plugin.

Besides, I have a few questions regarding the implementation:

  1. Ideally, we would like to support null values in Y vectors. As Andrew Ng explains in his DL course, we can just ignore them when computing the loss. This turns out to be tricky to implement:
    1. fastai throws an error when it sees null values, so we need to disable that check
    2. PyTorch’s long() converts NaN to large negative values
    3. complex Tensor dimensionality handling across all functions (training, get_preds, etc.)
      Do you have any advice or comment regarding this?
  2. To avoid having to manually adjust the weights of sub-losses, we would need to normalize float values in regression sub-tasks, so that the RMSE takes values in similar ranges as CrossEntropy, i.e. between 0 and 1. As far as I can tell there’s nothing in Fastai to do it (the existing Normalize processor is specific to Tabular data and only handles x values). Do you have any suggestions?
    1. Generalize the existing Normalize module
    2. Write a specific module
    3. ask users to pre-normalize their data (which I assume is the current advice for folks running regression NNs today?), but that requires saving the normalization stats separately, which is a bit annoying.
    4. Simply run the RMSE sub-losses through a sigmoid (not sure if this would be ok?)
  3. I ended up subclassing and overriding a lot of classes/methods/functions, often just to work around Fastai’s code structure: for example, the use of global functions, or classes creating instances of other classes (e.g. “LabelLists” creates instances of “LabelList”, but we can’t easily tell it to use a custom “LabelList” class). This raises the broader question of Fastai’s extensibility: in some cases I’m not sure if there are reasons for the existing code structure, or if we can suggest some refactoring and improvements to make the lib easier to override and extend.
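For point 1, the “ignore nulls in the loss” idea can be sketched like this (a hypothetical loss for the 3-class + 1-regression layout above, not the notebook’s implementation): split the concatenated output, skip missing labels on the classification side via CrossEntropy’s ignore_index, and mask NaN targets on the regression side.

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred, targ_class, targ_float):
    """Illustrative multitask loss that ignores null targets per sub-task."""
    clf_pred, reg_pred = pred[:, :3], pred[:, 3]
    # Classification: encode missing labels as -1 and let CrossEntropy skip them
    clf_loss = F.cross_entropy(clf_pred, targ_class, ignore_index=-1)
    # Regression: drop NaN targets before computing the MSE
    mask = ~torch.isnan(targ_float)
    reg_loss = F.mse_loss(reg_pred[mask], targ_float[mask]) if mask.any() else pred.new_zeros(())
    return clf_loss + reg_loss

pred = torch.randn(4, 4)
targ_class = torch.tensor([0, 2, -1, 1])                     # -1 marks a null label
targ_float = torch.tensor([0.5, float('nan'), 1.0, float('nan')])
loss = multitask_loss(pred, targ_class, targ_float)
```

The tricky parts listed above (fastai’s null check, long() on NaN, tensor shapes in get_preds) are precisely about getting such targets through the pipeline intact.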

I would really appreciate hearing your thoughts on this!



Null values aren’t really supported yet, but you could make your targets something neutral (like all zeros) or impossible that the loss function would recognize and return 0 for.

For the normalization of your labels, I’d use a custom Processor to do it (if that’s what you meant). If you just want them to range from 0 to 1, I’d use a sigmoid layer.
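The stats-storing part of such a processor boils down to something like this (plain numpy to show the idea only, not the actual fastai PreProcessor API; the class and method names are made up): record mean/std when first processing the labels, so the same transform can be undone at prediction time.

```python
import numpy as np

class LabelNormalizer:
    """Illustrative label normalizer that remembers its stats for un-processing."""
    def process(self, ys):
        self.mean, self.std = float(np.mean(ys)), float(np.std(ys))
        return (np.asarray(ys) - self.mean) / self.std

    def unprocess(self, ys):
        return np.asarray(ys) * self.std + self.mean

norm = LabelNormalizer()
scaled = norm.process([10., 20., 30.])
restored = norm.unprocess(scaled)
print(restored)  # ~ [10. 20. 30.]
```

Keeping the stats on the processor is what makes export/import work, since the processor is serialized with the data.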

The way the data block API works, it’s normal you have to subclass ItemBase or ItemList for your purpose, but you shouldn’t subclass LabelList: instead you should have another custom ItemList for your labels and pass label_cls=... when you label your items.

The yb[0] problem you got in loss_batch will disappear in v1.1 normally.


(Denis) #3

Thanks for your answer Sylvain,

I updated the notebook with a new version, with null-value handling and normalization of float targets through a Processor.

Regarding null values: I managed to directly handle NaN values for regression, but for classification, I had to set up a “NA” class. It’s not ideal since the network creates an extra dimension in the output layer for this class, and we may actually end up predicting “NA”. However, it indeed looks like too many changes would be required.

Regarding normalization: I had to implement an unprocess_one method in my preprocessor, called during reconstruct; I’m not sure if there’s a best practice around this.

Regarding subclassing: I didn’t find how to avoid subclassing LabelList, it seems required at least for state handling during import/export in order to add custom state attributes.



You can have state in an ItemList; you just have to add the attributes you’re adding to the list self.copy_new during initialization (so that they’re passed along when splitting).

Now if you want that state to be kept for export/import, you should put it in a PreProcessor. For instance, there is a SegmentationProcessor whose only goal is to store the classes. We’ll think more about serialization in v1.1/v1.2, but for now this is the workaround.