We’ve got a project going with the SF study group which I figured I’d post about here, so that others can get involved, or just follow along if they’re interested.
The goal of the project is to train a model on ImageNet to 93% accuracy as quickly as possible. If you don’t have a bunch of fast GPUs or AWS credits lying around, you can still participate by training on 128x128 images to 84% accuracy as quickly as possible - insights there are likely to transfer.
Here are some ideas we’re working on:
- Use half-precision floats (only helps on the Volta architecture - e.g. AWS P3); see the first sketch after this list
- Multi-GPU training with Nvidia’s NCCL library or with PyTorch’s built-in multi-GPU support (sketch below)
- Use Leslie Smith’s new 1cycle policy and cyclical momentum (now in fastai); sketch below
- Better data augmentation (see the separate GoogleNet augmentation project thread)
- Test-time augmentation (TTA) every n epochs (sketch below)
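A few rough PyTorch sketches of the above - consider them starting points, not tested recipes. First, half precision: convert the model to fp16 but keep BatchNorm in fp32, and scale the loss so small fp16 gradients don’t underflow. `network_to_half` and the loss-scale value are illustrative names/numbers, and a full implementation would also keep fp32 “master” copies of the weights:

```python
import torch
import torch.nn as nn
from torchvision import models

def network_to_half(model):
    # fp16 weights, but keep BatchNorm in fp32 for numerical stability
    model = model.half()
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.float()
    return model

model = network_to_half(models.resnet50()).cuda()
criterion = nn.CrossEntropyLoss()
loss_scale = 512.0  # illustrative; keeps tiny fp16 gradients from underflowing

x = torch.randn(8, 3, 224, 224).cuda().half()
y = torch.randint(0, 1000, (8,)).cuda()
loss = criterion(model(x).float(), y)  # compute the loss in fp32
(loss * loss_scale).backward()
# before optimizer.step(), divide the gradients by loss_scale
# (ideally accumulating into fp32 master weights)
```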
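For multi-GPU, the quickest thing to try is `nn.DataParallel`; `DistributedDataParallel` over the NCCL backend scales better but needs one process per GPU:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50().cuda()

# Simplest: replicate the model on every visible GPU and split each batch
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

# Better scaling: one process per GPU over NCCL (needs a distributed
# launcher; init args omitted here)
# torch.distributed.init_process_group(backend='nccl')
# model = nn.parallel.DistributedDataParallel(model)
```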
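And a simplified version of the 1cycle schedule: the LR ramps up linearly then back down, while momentum does the opposite. Smith’s full policy also ends with a phase that drops the LR well below its starting value; all the numbers here are illustrative:

```python
def one_cycle(step, total_steps, lr_max=1.0, div=10, moms=(0.95, 0.85)):
    # First half: LR climbs lr_max/div -> lr_max while momentum falls
    # 0.95 -> 0.85; the second half mirrors it back.
    half = total_steps / 2
    pct = step / half if step < half else (total_steps - step) / half
    lr = lr_max / div + pct * (lr_max - lr_max / div)
    mom = moms[0] + pct * (moms[1] - moms[0])
    return lr, mom

# inside the training loop:
# lr, mom = one_cycle(step, total_steps)
# for g in optimizer.param_groups:
#     g['lr'], g['momentum'] = lr, mom
```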
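TTA in its simplest form just averages predictions over augmented copies of each image. A flip-only sketch (real TTA usually averages over several crops as well):

```python
import torch

def tta_predict(model, x):
    # Average softmax predictions over the image and its horizontal flip
    model.eval()
    with torch.no_grad():
        preds = model(x).softmax(dim=1)
        preds += model(torch.flip(x, dims=[3])).softmax(dim=1)
    return preds / 2
```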
And some experiments we plan to run:
- Concat pooling (sketch below)
- Larger batch size / learning rate (sketch below)
- Other architectures: Dual Path Networks, Xception, Inception-v4, Inception-ResNet, YOLOv3 backbone
- Progressive resizing: sz 128 → 224 → 288 (sketch below)
- Stochastic weight averaging (sketch below)
- Adam (momentum 0.9/0.99) / use_wd_sched (sketch below)
- Snapshot ensembling (sketch below)
- Turn off weight decay / augmentation for the last few epochs
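Sketches for the experiments, same caveats as above. Concat pooling is what fastai’s `AdaptiveConcatPool2d` does - concatenate average- and max-pooled features so the head sees both, at the cost of doubling the feature count:

```python
import torch
import torch.nn as nn

class AdaptiveConcatPool2d(nn.Module):
    # Concatenate adaptive max- and average-pooling; the classifier head
    # then receives twice as many features.
    def __init__(self, size=1):
        super().__init__()
        self.avg = nn.AdaptiveAvgPool2d(size)
        self.max = nn.AdaptiveMaxPool2d(size)

    def forward(self, x):
        return torch.cat([self.max(x), self.avg(x)], dim=1)
```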
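For the larger batch size / LR experiment, the usual starting point is the linear scaling rule from Goyal et al.’s large-minibatch SGD paper - scale the LR in proportion to the batch size, plus a warmup:

```python
base_lr, base_bs = 0.1, 256          # reference values from the paper
batch_size = 1024                    # illustrative
lr = base_lr * batch_size / base_bs  # -> 0.4
```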
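The progressive resizing schedule would look something like this - `make_loader` and `fit` are hypothetical helpers, and the epoch counts are made up:

```python
# Train small first, then grow the images; the same model carries over.
for size, epochs in [(128, 15), (224, 10), (288, 5)]:
    loader = make_loader(size)  # hypothetical: ImageNet DataLoader at this res
    fit(model, loader, epochs)  # hypothetical: runs `epochs` epochs of training
```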
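A minimal, hand-rolled version of stochastic weight averaging - keep a running average of the weights over the tail of training, then fix up BatchNorm statistics before evaluating:

```python
import copy

class SWA:
    """Running average of model weights (minimal stochastic weight averaging)."""
    def __init__(self):
        self.model, self.n = None, 0

    def update(self, model):
        # Equal-weight running average: avg = avg * n/(n+1) + w/(n+1)
        if self.model is None:
            self.model = copy.deepcopy(model)
        else:
            for p_avg, p in zip(self.model.parameters(), model.parameters()):
                p_avg.data.mul_(self.n / (self.n + 1)).add_(p.data / (self.n + 1))
        self.n += 1

# usage: call swa.update(model) every epoch over the last ~25% of training,
# then run one pass over the training data with swa.model in train() mode
# to recompute BatchNorm running statistics before evaluating.
```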
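The use_wd_sched idea relates to the decoupled weight decay from the AdamW paper - apply the decay directly to the weights rather than folding it into the gradients, so it interacts sanely with Adam. A sketch with illustrative values and a stand-in model:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lr, wd = 1e-3, 1e-2        # illustrative values

# after loss.backward(): decay the weights directly (not via the gradients),
# then take the Adam step
with torch.no_grad():
    for group in optimizer.param_groups:
        for p in group['params']:
            p.mul_(1 - lr * wd)
optimizer.step()
```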
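And snapshot ensembling: cosine-anneal the LR within each cycle, snapshot the weights at each LR minimum, and average the snapshots’ predictions at test time. Cycle length, step counts, and lr_max are all illustrative:

```python
import copy, math
import torch.nn as nn

def cosine_lr(step, steps_per_cycle, lr_max=0.1):
    # Cosine annealing within a cycle; LR jumps back to lr_max at each restart
    t = (step % steps_per_cycle) / steps_per_cycle
    return lr_max * 0.5 * (1 + math.cos(math.pi * t))

model = nn.Linear(10, 10)  # stand-in; use your real network
steps_per_cycle, total_steps = 1000, 5000  # illustrative
snapshots = []
for step in range(total_steps):
    lr = cosine_lr(step, steps_per_cycle)
    # ... set the optimizer LR to `lr` and run one training step ...
    if (step + 1) % steps_per_cycle == 0:
        snapshots.append(copy.deepcopy(model.state_dict()))  # cycle minimum
# at test time, average the softmax outputs of models restored from `snapshots`
```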
Let me know if you’d like more info about any of these, or if you have ideas or want to try something yourself.