Learning from 50M noisy images (Kaggle Quickdraw competition)

Hi friends,

I just tried the library in the Kaggle Quickdraw competition, basically @radek 's version of lesson 1, achieving a remarkable top 17%. I say remarkable because this was with a single, non-postprocessed model (the top solutions postprocessed and heavily ensembled many models).

So, I’d like to share what I have learnt. These are just my intuitions, so feel free to add to or discuss any of them:

-1) “Vanilla” fast.ai handles the huge amount of images without problems, converging nicely in a few epochs.

-2) If you have zillions of varied images, don’t use augmentation; it will not beat real images. (What “zillions” means depends on the number of classes, but 50 million images is enough to forget about augmentation.)

-3) Be careful with momentum if you have noisy data (5% of the images had “swap” noise). I benefited a lot from reducing momentum substantially. My guess is that momentum hurts the normal “noise balancing” observed in neural nets when there is swap noise.

-4) Related to 3), I’m quite curious about noise handling. Specifically, why, with so much data, the noise could not be completely filtered out by my single model. It made me wonder how well adaptive learning rates + cycles filter noise, considering that optimal filtering should be random.
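The momentum effect in point 3) can be sketched numerically. This is just an illustrative toy (not the competition code): with plain SGD-with-momentum, a single bad gradient from a swapped label keeps influencing the weights for many steps when momentum is high, and dies out almost immediately when it is low.

```python
def momentum_steps(mu, grads):
    """Plain SGD-with-momentum velocity update: v <- mu * v + g."""
    v, history = 0.0, []
    for g in grads:
        v = mu * v + g
        history.append(v)
    return history

# One "swap noise" outlier gradient (+10) followed by clean (zero) gradients:
grads = [10.0] + [0.0] * 5
high = momentum_steps(0.9, grads)  # outlier lingers: 10, 9, 8.1, ...
low  = momentum_steps(0.2, grads)  # outlier decays fast: 10, 2, 0.4, ...
print(high[-1], low[-1])
```

After five clean steps the high-momentum velocity still carries more than half the outlier, while the low-momentum one is essentially gone, which is one way to read the benefit of low momentum under swap noise.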

So that’s it; maybe obvious hints, but I found them very useful in this case. Any additional insight, especially about points 3) and 4), will be welcome. I think noise was the key to this classification task, and no single model that I know of was able to completely filter it without being ensembled with others, not even the top ones.


Hi Miguel, thank you for sharing this and especially your points on momentum.
I would be very interested in understanding how and by what means you used the data with fastai, and what your hardware setup is like, because my experience with 1) has been the exact opposite. And I am not alone, as you can see from this thread and elsewhere on this forum…

And if I may ask, now that the competition is over: what was your exact model architecture, and what image size, batch size and learning schedule/rates did you use? Did you manually set the mentioned momentum in fit_one_cycle? Did you use the full dataset, and if so, (how) did you split the data into “sub-epochs”?


Hi @marcmuc, I’ll try to answer all of that. About my setup: I used a dedicated server with 64GB RAM and a GTX 1080, so really nothing fancy or super-powerful.

About code, I used the first version of Radek’s code to prepare 30% of the available data, that is, about 15M images. Then I refined the solution interactively in cycles (yes, one-cycle; I also tried a simple fit with a fixed lr, but one-cycle worked better; most of the runs were between 2 and 5 epochs long).
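Taking a reproducible 30% subset can be sketched like this. This is not Radek’s actual preparation code, just a minimal stand-in; the drawing ids and the per-class structure of the Quickdraw CSVs are assumptions here.

```python
import random

def sample_fraction(items, frac, seed=0):
    """Return a reproducible random fraction of a list (e.g. drawing ids)."""
    rng = random.Random(seed)
    k = int(len(items) * frac)
    return rng.sample(items, k)

# Hypothetical: 1000 drawing ids for one class, keep ~30% as in the post.
ids = list(range(1000))
subset = sample_fraction(ids, 0.30)
print(len(subset))  # → 300
```

Sampling per class (rather than over the pooled 50M rows) would keep the class balance of the subset close to the original.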

About momentum, I went as low as 0.2 for the small-lr part of the cycle and 0.01 (almost zero) for the big-lr part. This seemed to work well, but I think it is specific to this competition, because we knew there was 5% swap noise, which is a lot.
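In fastai v1 this is the `moms` argument of `fit_one_cycle`, e.g. `learn.fit_one_cycle(3, 1e-2, moms=(0.2, 0.01))` instead of the default `(0.95, 0.85)`. The schedule below is a plain-Python sketch of how one-cycle anneals momentum (cosine interpolation is assumed here): it starts at the high value, drops to the low value at peak lr, and climbs back.

```python
import math

def one_cycle_moms(step, total, mom_max=0.2, mom_min=0.01):
    """Cosine-annealed momentum over one cycle: high at the low-lr ends,
    low at the high-lr middle (mirroring moms=(mom_max, mom_min))."""
    half = total / 2
    if step <= half:                       # lr rising -> momentum falling
        pct = step / half
        return mom_min + (mom_max - mom_min) * (1 + math.cos(math.pi * pct)) / 2
    pct = (step - half) / half             # lr falling -> momentum rising
    return mom_max - (mom_max - mom_min) * (1 + math.cos(math.pi * pct)) / 2

print(one_cycle_moms(0, 100))    # start of cycle: 0.2
print(one_cycle_moms(50, 100))   # peak lr: 0.01
print(one_cycle_moms(100, 100))  # end of cycle: 0.2
```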

Batch size: I tried the default 64, but bigger was better, so I used 256 and 384 with mixed precision.
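In fastai v1 mixed precision is enabled with `learn.to_fp16()`. A rough back-of-the-envelope for why it allows larger batches (the tensor shape below is an arbitrary illustration, not a measurement of this model): activations stored in fp16 take 2 bytes per value instead of 4, halving their memory footprint.

```python
def activation_mb(batch, channels, h, w, bytes_per_val):
    """Rough size of one activation tensor in MB."""
    return batch * channels * h * w * bytes_per_val / 2**20

# E.g. a hypothetical 64-channel 128x128 feature map at batch size 256:
fp32 = activation_mb(256, 64, 128, 128, 4)  # 1024.0 MB
fp16 = activation_mb(256, 64, 128, 128, 2)  # 512.0 MB
print(fp32, fp16)
```

Half the activation memory means, all else equal, roughly twice the batch size fits on the same GPU, consistent with going from 64 to 256/384 here.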

Architecture: ResNet34; I didn’t try others. Size 128, but I also tried smaller sizes down to 64, and progressive resizing, though just one step to 30% bigger (intuition: more variety if the pooling division is not exact, so not doubling but increasing).

And well, most of my tinkering with parameters was by intuition, about how “it felt”, with noise on my mind all the time. My main objective was to gain proficiency with the library, and I didn’t dedicate that much time to the competition, so my general feeling was that it was easy to achieve a good result. With only a small amount of postprocessing the result could have been even better, but for me it was mostly a learning experience with the library, and I didn’t even touch the competition in the last week, as I didn’t have the time…

So that was my experience: a super result from a few days of intuitive tinkering with fastai v1, easy and positive… you left me thinking with your comment; I hope my explanation gave you some hints on how to make it work better for this one :slight_smile: