FMix in fastai2

I’ve been working on an implementation of FMix (https://arxiv.org/pdf/2002.12047.pdf) in fastai2. I created a dev notebook (in my local fastai2 it’s 75_callback.fmix.ipynb) that seems to be working – here’s the gist.
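For anyone curious what the callback does under the hood: FMix builds a binary mask by thresholding low-frequency noise sampled in Fourier space, then combines two images through that mask (with the targets mixed by the mask's area λ, as in MixUp/CutMix). Here's a rough NumPy sketch of just the mask generation, as I understand it — this is my own illustration, not the code from the gist or the paper's reference implementation, and the fmix_mask helper and its defaults are made up for the example:

```python
import numpy as np

def fmix_mask(shape, decay=3.0, lam=0.5):
    "Hypothetical helper: FMix-style binary mask from thresholded low-frequency noise."
    h, w = shape
    # frequency magnitude of each rFFT bin
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    freq = np.sqrt(fy**2 + fx**2)
    # power-law filter so low frequencies dominate (avoid divide-by-zero at DC)
    scale = 1.0 / np.maximum(freq, 1.0 / max(h, w)) ** decay
    spectrum = scale * (np.random.randn(h, w//2 + 1) + 1j*np.random.randn(h, w//2 + 1))
    grey = np.fft.irfft2(spectrum, s=shape)   # low-frequency grey-scale noise
    thresh = np.quantile(grey, 1.0 - lam)     # keep the top-lam fraction of pixels
    return (grey > thresh).astype(np.float32)

# mixing two images x1, x2 of that shape (targets are mixed with weight lam, as in MixUp):
# mask = fmix_mask(x1.shape[-2:], lam=lam); x = mask*x1 + (1 - mask)*x2
```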

I was thinking of submitting a PR to the library, but am not yet confident in the correctness, usefulness, and style, so if anyone has any feedback it would be much appreciated. :sweat_smile:

I’d also like to add: this was my first time using the nbdev system, and it was AMAZING.

4 Likes

There are a lot of callbacks being created (cutmix recently, bs_finder as a standing PR, Captum support, etc.), which is amazing! But Jeremy and I are starting to find that it's going to be hard for us to maintain all that code (especially with the course/book going on).

I think we are going to limit the v2 repo to the core architecture and callbacks that are used all the time, but have a different way of packaging all the other ones.

Which is where I'm wondering how, exactly. Either each contributor hosts their own callback(s) in a repo that acts as a fastai2 extension (I'll build a tutorial to show how to do that soon), or someone in the community volunteers to host (and maintain, since that's the part we can't really do) a fastai2.callbacks repo where we redirect every user wanting to add their own.
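For the first option, an extension would basically just be a small pip-installable package that imports fastai2 and exposes its callback(s). A very rough sketch of what such a package might contain — the package name and callback below are invented purely for illustration:

```python
# hypothetical layout: a tiny pip-installable repo, e.g. fastai2_ext/core.py
# (the package name and callback are made up for this example)
from fastai2.callback.core import Callback

class EpochLoggerCallback(Callback):
    "Toy callback: report each epoch number as it finishes."
    def after_epoch(self):
        print(f"finished epoch {self.epoch}")
```

A user would then pip install the extension and pass the callback to their Learner with cbs=[EpochLoggerCallback()] as usual.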

What do you all think?

6 Likes

This is a tricky subject. Here is how I think about it.

I would say that having only one person hosting all callbacks has a few risks:

  • if all PRs are accepted easily, code quality may go down, or a callback may even break relative to the original creator's version, and it will be hard to allow only the original author to update their callback
  • if everything needs to be checked, it will quickly be a lot to keep up with (similar to your situation)
  • updates will depend on how responsive the repo owner is
  • eventually, callbacks will have to be filtered out, which is a difficult task since useful ones may not be accepted; several competing repos will then be created, making it even harder to navigate through the callbacks

Having each creator handle their own callbacks comes with the risk of some of them being “abandoned” while others are still actively using them, with multiple forks/versions being created. Only those tied to other projects (such as WandbCallback) would most likely remain well maintained through their main repos.

Here is a third suggestion:

  • callbacks would live in a fastai repo, fastai2.community (since there are callbacks, interpreters, visualization tools…), maybe even split further by area (such as .text, .vision…)
  • the only requirement to merge a PR is that all tests pass
  • once a callback clearly becomes popular or a main element of your workflow, you can decide to move it out of experimental and include it in your main repo
  • I see the shortcoming of this option as still giving you additional work, as everybody may want to push their own very specific callback

Overall I would favor your first suggestion (each contributor manages their own callback) or this third suggestion, but either way it would be useful to have a list for exploring existing community callbacks (by keyword and popularity), maybe alongside a personally curated list.

8 Likes

I really like @boris's third suggestion; I think it's the most natural way for functionality to be incrementally added to the library.

This is also how I've seen other libraries do it; TensorFlow had (has?) the .contrib module.

1 Like

I like @boris's third suggestion; however, I would add:

  • The official community repo should only contain high-quality code, so users can expect to use it without worrying whether it will work or not. I don't see any value in having 100 community callbacks that are poorly documented / don't work well / etc. So, for me, there should be three requirements to merge a PR:
    1. Pass all tests.
    2. Be documented.
    3. Have a tutorial showing how to use it.
  • You should be able to easily install the fastai2 community code alongside fastai2, for example as a fastai2-community package.

On the other hand, I would include the community callbacks, etc. in the official fastai2 documentation (maybe in fastai2.contrib or in separate docs?) so users can discover them easily. As they would be well documented, they would be useful.

Finally, fastai2 community docs could have an entry linking to user repositories / forum posts / etc that have a list of fastai2 extensions / useful code snippets / etc that don’t meet the criteria. For example, it could have a link to https://github.com/nestordemeure/fastai-extensions-repository.

What do you think?

1 Like

I agree with the need for tests, documentation, and a tutorial to keep quality high. For code review, should a community repo:

a) have only a small number of people who can approve PRs (5, 10, 15, 20?), based on forum contributions (or some other measure), with the reviewer list revised periodically
OR
b) have a looser threshold for approved reviewers but require multiple reviewers to approve a PR (functionality here: https://github.blog/2018-03-23-require-multiple-reviewers/)

I don't think I've ever seen a contrib repo work in the long term, and I can't see any upside to them. They don't solve code quality or project abandonment issues at all AFAICT, whereas there are ways that attempt to do that more directly and with less process complexity (e.g. a user-driven testing database; a project review site; …)

I agree the main issues are preserving code quality and avoiding project abandonment.

Do you have more details or examples of what you were thinking of (a user-driven testing database, a project review site)? Does it mean users adding more tests to proposed callbacks through the fastai repository, or is it something completely different?
I'm not familiar with this type of process, so I'm curious about it.

1 Like

I like the spirit of the contrib repo idea, but I have no experience to weigh in on the problems that @jeremy mentioned.

I think what’s really important is to have some centralized place to at least have links to everyone’s contributions. Super naively I am thinking of a simple website with a basic navigation system (types of callbacks) and some kind of community feedback, so that it is easy to tell the difference between well-established contributions and others that may be more… risky in some way.

2 Likes