I wanted to write about my experience using fast.ai in the recent TGS Salt competition - a semantic segmentation challenge with a binary mask target. I won a silver medal, finishing 76th of 3,234, and found the competition challenging but accessible for someone tackling their first image and pure deep learning competition.
I also wanted to give an honest appraisal of using fast.ai for a live kaggle competition, as I hope others will find the feedback useful. I should say upfront that this is the first time I’ve properly used fast.ai for anything other than the part one lessons, so I accept full responsibility for any shortcomings that were my own (I’m sure it will be evident to those more knowledgeable than me that at least some were). Some of the issues I faced may also now be fixed in fast.ai v1. I was using an older version of fast.ai with PyTorch <0.4 because, once I got things running, I was too paranoid to change mid-competition in case it broke anything.
Summary of fast.ai usage
In the end I went with minimal usage of fast.ai using it for the following 3 things:
learner object (not conv learner)
augmentations: lr flips, zoom, lighting
This means that in the end I didn’t use fast.ai for upsampling images or masks, any learning rate schedule policy, discriminative learning rates or TTA, and I didn’t use any of fast.ai’s pretrained models except SEResNext50. I am pretty sure my usage of fast.ai was not optimal. However, as I had a single GPU, it wasn’t possible to test ideas in parallel or always get to the bottom of issues, since all time spent working interactively was time the models weren’t training (and they typically needed 12+ hours!).
Side note on the nature of kaggle competitions
Almost by their very nature kaggle competitions are like financial markets in their self-adjusting behaviour - advanced techniques that work out of the box from libraries or are posted as kernels quickly become the baseline for most of the stronger participants and everything advances. This means we almost necessarily get pushed down somewhat complicated rabbit holes of quirky things that happen to work in this competition but might not be best practice if you were deploying a deep learning system in the wild. I think this point is more fundamental than it seems. If fast.ai wants to be something used for winning kaggle competitions then I think that’s a (not quite orthogonal, but pretty) different aim to using it for most mainstream deep learning projects. This means either extending fast.ai to handle the peculiarities of a kaggle competition or making it easily extensible in all manner of weird non-mainstream ways. A tricky situation (which I almost stumbled into a few times) is to invest time heavily in a competition with one library only to find that, to score highly, everyone is doing technique X, and technique X is either not supported by your library or requires a lot of workarounds.
I will provide more detail on my observations below.
Reiteration: it may turn out that most of these are down to me being an idiot (which happens far too often) or my lack of full familiarity with fast.ai, or are fixed in fast.ai v1 - all mistakes are my own and I apologise in advance. I offer this feedback in the best possible spirit - to help improve the fast.ai project. Furthermore, it might (quite rightly in some cases) be that fast.ai has no desire to support any of the things I raise below - that in itself is also fine and very useful to know for the future. It’s almost certainly not a good design philosophy for fast.ai to worry overly about the individual quirks of a kaggle competition, but knowing where those boundaries are is helpful.
Support of k-fold: a reasonably minor issue with a simple enough workaround but does raise other issues which crop up in kaggle competitions such as dropping (training) data per fold. For example, I was dropping images from training data whose masks were less than X pixels - obviously I kept them in the validation set. It would be nice to have this handled naturally and does come up in other non-image competitions quite a bit with poor data.
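For illustration, the per-fold workaround can be sketched in a few lines of plain Python. This is a minimal sketch under assumptions, not the code used in the competition: the function names, the `min_pixels` threshold (standing in for "X") and the example pixel counts are all hypothetical.

```python
import random

def kfold_indices(n_items, n_folds, seed=0):
    """Shuffle item indices once and deal them into n_folds disjoint folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def filtered_folds(mask_pixel_counts, n_folds=5, min_pixels=10, seed=0):
    """Yield (train_idx, valid_idx) per fold.

    Images whose mask has fewer than `min_pixels` positive pixels are
    dropped from the training part only; validation folds stay complete.
    """
    folds = kfold_indices(len(mask_pixel_counts), n_folds, seed)
    for k, valid_idx in enumerate(folds):
        train_idx = [
            i
            for j, fold in enumerate(folds) if j != k
            for i in fold
            if mask_pixel_counts[i] >= min_pixels
        ]
        yield sorted(train_idx), sorted(valid_idx)

# Hypothetical positive-pixel counts for ten masks.
counts = [0, 3, 50, 120, 7, 300, 15, 1, 80, 9]
splits = list(filtered_folds(counts, n_folds=5, min_pixels=10))
```

Each `(train_idx, valid_idx)` pair can then be used to build that fold's datasets, so the filtering rule lives in one place rather than being repeated per fold.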
Resizing binary masks: I’m pretty sure using the default OpenCV setting causes upsampled masks to not be just 0 or 1. @wdhorton kindly suggested cv.INTER_NEAREST as a solution to this, which I applied outside of fast.ai (I don’t actually think this was a big issue for performance).
Modifying pretrained models: it turned out removing the first maxpool layer in the resnet architecture was a good idea for this competition. I found it non-trivial to figure out how to do this in fast.ai but eventually used the default torchvision models (i.e. dropped the sfs.features stuff) and wrote my own layer group wrapper. This is something I probably found harder due to my lack of fast.ai knowledge but it certainly seemed like an extra hurdle. That said, I learned a lot doing it.
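To show why nearest-neighbour resizing keeps a mask binary, here is a small NumPy-only sketch. It is not the OpenCV call itself (with OpenCV, the same effect comes from passing `interpolation=cv2.INTER_NEAREST` to `cv2.resize`); the function name `resize_nearest` and the toy mask are assumptions for illustration.

```python
import numpy as np

def resize_nearest(mask, out_h, out_w):
    """Nearest-neighbour resize of a 2-D mask.

    Every output pixel copies exactly one input pixel, so a mask that
    contains only 0s and 1s still contains only 0s and 1s afterwards.
    """
    in_h, in_w = mask.shape
    rows = (np.arange(out_h) * in_h) // out_h  # source row for each output row
    cols = (np.arange(out_w) * in_w) // out_w  # source column for each output column
    return mask[rows[:, None], cols[None, :]]

mask = np.array([[0, 1],
                 [1, 0]], dtype=np.uint8)
upsampled = resize_nearest(mask, 4, 4)
```

Interpolating methods such as bilinear blend neighbouring pixels instead, which is where the in-between values along mask edges come from.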
Cosine annealing: a minor point that I’m sure could be accommodated if people think it’s a good idea, but why does cosine annealing drop the learning rate to 0? I would prefer to specify the min and max lr and use the schedule in that manner (note: clr allows a min lr).
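The schedule being asked for is the standard cosine curve rescaled to end at a chosen floor rather than at 0. A minimal sketch, with a hypothetical function name and illustrative learning rates (this is not fast.ai's implementation):

```python
import math

def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate that decays from lr_max to lr_min.

    With lr_min=0.0 this reduces to the usual anneal-to-zero schedule.
    """
    progress = step / max(1, total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Illustrative values only.
schedule = [cosine_lr(s, 100, lr_max=1e-2, lr_min=1e-3) for s in range(100)]
```

Restarting the schedule at the end of each cycle only requires resetting `step` to 0.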
Callbacks: once I got my head around how these work I ended up just writing my own for everything I wanted to do, but I would say that useful functionality includes things like saving your n best models according to some specified metric (i.e. not necessarily val loss) in order to average the predictions from each of these within the fold. In this competition it was important to monitor the actual competition metric whilst training and use this as your guide (training was very noisy). Also, as I raised in another thread, at the moment it’s not clear on what basis models are being saved (and sometimes there’s an implicit assumption that the first metric passed is accuracy and is being maximized, and models are saved on that basis).
Models with multiple outputs: I got this to work on my fast.ai version by returning a list of outputs from the forward pass, which I then parsed in a custom loss function. However, when I hit learn.predict() I’m pretty sure it only returns the first list item and not all of them. I might be wrong on this as it’s something I didn’t pursue, but returning multiple things might be needed in other competitions when jointly training models for multiple purposes and then using the outputs upstream.
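The "keep the n best models on a chosen metric" bookkeeping is independent of any library and can be sketched with a small heap. This is an illustrative sketch, not a fast.ai callback: `TopNTracker` is a hypothetical name, and the actual saving and deleting of weights is left to the caller.

```python
import heapq

class TopNTracker:
    """Track the n best epochs by an explicitly chosen metric."""

    def __init__(self, n, higher_is_better=True):
        self.n = n
        self.sign = 1.0 if higher_is_better else -1.0
        self._heap = []  # min-heap of (signed_score, epoch); root is the worst kept

    def update(self, epoch, score):
        """Return (keep, evicted_epoch) for this epoch's score."""
        entry = (self.sign * score, epoch)
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, entry)
            return True, None
        if entry > self._heap[0]:
            evicted = heapq.heapreplace(self._heap, entry)
            return True, evicted[1]
        return False, None

    def best(self):
        """Kept (epoch, score) pairs, best first."""
        return [(epoch, self.sign * s) for s, epoch in sorted(self._heap, reverse=True)]

# Hypothetical per-epoch values of the competition metric.
tracker = TopNTracker(n=2, higher_is_better=True)
decisions = [tracker.update(epoch, score)
             for epoch, score in enumerate([0.70, 0.75, 0.72, 0.80, 0.60])]
```

When `keep` is true the caller would save the current weights, and when `evicted_epoch` is not `None` it would delete that epoch's file. Making `higher_is_better` an explicit argument avoids the implicit "first metric is accuracy" assumption.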
Having a mask target with 2 channels: this is something I couldn’t figure out in the end (I tried passing masks with 3 channels) without a real hack, i.e. creating the second mask channel on the fly inside the forward pass (interestingly, it wasn’t that slow). I only tried this at the last minute and ultimately didn’t use it. It was suggested in this competition (at the last minute) that training on the boundary of the mask as an additional task was helping performance.
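One way to build such a two-channel target ahead of time, rather than inside the forward pass, is sketched below with NumPy. The boundary definition used here (mask pixels with at least one background 4-neighbour) is an assumption, as is the function name; the competition discussion may have used a different definition.

```python
import numpy as np

def mask_with_boundary(mask):
    """Return a (2, H, W) float32 target: channel 0 is the mask,
    channel 1 is its one-pixel-wide inner boundary.

    A pixel is on the boundary if it is in the mask and at least one of
    its 4-neighbours is background. Edge padding means the image border
    itself is not treated as background.
    """
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, mode="edge")
    eroded = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1] & padded[2:, 1:-1]   # neighbours above and below
        & padded[1:-1, :-2] & padded[1:-1, 2:]   # neighbours left and right
    )
    boundary = mask & ~eroded
    return np.stack([mask, boundary]).astype(np.float32)

# Toy example: a 3x3 square inside a 5x5 image.
toy = np.zeros((5, 5), dtype=np.uint8)
toy[1:4, 1:4] = 1
target = mask_with_boundary(toy)
```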
Reading images and masks as numpy arrays: almost certainly my own limitation here but I couldn’t figure out how to read data from numpy arrays for both the images and the masks. I wanted to do this for stacking the models at the end where my images were now probability predictions per pixel from multiple models and the masks were the original masks. Note in this case all the data fit in memory but that won’t always be the case.
TTA: as far as I am aware it isn’t/wasn’t possible to use TTA straight out of the box with fast.ai in this competition and so a workaround was needed.
The notion of an experiment: this is something that I think fast.ai as a community does well but there isn’t a formal way to handle it. In the end I used my own logging and wrappers around the training loop to keep track of the basis of each experiment. I used attrdict for the first time - very neat.
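For segmentation, the extra step in TTA is that predictions on a transformed image must be transformed back before averaging. A minimal sketch for left-right flips, assuming a hypothetical `predict_fn` that maps a batch of images to per-pixel probabilities with the same spatial layout (this is not the workaround actually used, which isn't described here):

```python
import numpy as np

def predict_with_flip_tta(predict_fn, images):
    """Average predictions on the original and the left-right flipped batch.

    `images` has width as its last axis. The flipped prediction is
    flipped back so both predictions are pixel-aligned before averaging.
    """
    plain = predict_fn(images)
    flipped_back = predict_fn(images[..., ::-1])[..., ::-1]
    return (plain + flipped_back) / 2.0

# Stand-in "model" whose output depends on horizontal position,
# so the flip has a visible effect. Purely illustrative.
ramp = np.linspace(0.0, 1.0, 4)
toy_predict = lambda batch: batch * ramp

batch = np.arange(8, dtype=np.float64).reshape(1, 2, 4)
averaged = predict_with_flip_tta(toy_predict, batch)
```

Other invertible augmentations can be added the same way, as long as each has its inverse applied to the prediction.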
Just to be clear, most of the above are minor in the context of a brilliant deep learning library. However, some of the issues felt like they could have become acute in the context of a kaggle competition.
Something positive now! Overall I did enjoy interacting with the fast.ai library and certainly have learned a great deal from trying to delve under the hood - a lot more than if I’d just run it with no changes. I should also say that I fully buy into the fast.ai philosophy and am really looking forward to taking the fast.ai Live course starting next week from London. I also believe the world needs more people teaching and working in the manner the fast.ai community and @jeremy do. There were others from fast.ai taking part and they almost certainly utilised the library in a more optimal way than I did so I’d be very grateful to hear what they think. Notably @VishnuSubramanian, @wdhorton and @radek.
I hope the above observations don’t sound overly critical, but given how awesome I think fast.ai currently is and has the potential to become, I think it’s crucial for it to get more than just the good (but often echo chamber) feedback in order to become stronger.
Update: Oh, I should also say that I recognise fast.ai doesn’t develop itself, so if I were able to help contribute to the fast.ai library in any way I’d be delighted to (though I’m not a developer by trade, I am focusing a fair bit on trying to up my Python skills).
Happy to take feedback myself!