We’ve got a project going with the SF study group which I figured I’d post about here, so anyone else who’s interested can get involved and follow along.
The goal of the project is to train ImageNet to 93% accuracy as quickly as possible. If you don’t have a bunch of fast GPUs or AWS credits lying around, you can still participate by trying to train ImageNet with 128x128 images to 84% accuracy as quickly as possible - insights there are likely to be transferable.
Here are some ideas we’re working on:
Use half precision floats (only helps on the Volta architecture - e.g. AWS P3)
Multi-GPU training with Nvidia’s NCCL library or PyTorch’s nn.DataParallel
Use Smith’s new 1cycle and cyclical momentum (in fastai now as use_clr_beta)
Better data augmentation (see separate GoogleNet augmentation project thread)
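For the 1cycle item above, here is a minimal pure-Python sketch of the schedule shape (a hypothetical simplification for illustration, not fastai’s use_clr_beta implementation): the learning rate ramps linearly up and then back down over one cycle, while momentum does the inverse.

```python
def one_cycle(step, total_steps, lr_max=1.0, lr_min=0.1,
              mom_max=0.95, mom_min=0.85, pct_up=0.5):
    """Minimal linear 1cycle schedule: lr goes lr_min -> lr_max -> lr_min,
    momentum goes mom_max -> mom_min -> mom_max (the inverse cycle)."""
    up_steps = int(total_steps * pct_up)
    if step < up_steps:
        # ramp-up phase: t goes 0 -> 1
        t = step / up_steps
    else:
        # ramp-down phase: t goes 1 -> 0
        t = 1 - (step - up_steps) / (total_steps - up_steps)
    lr = lr_min + (lr_max - lr_min) * t
    mom = mom_max - (mom_max - mom_min) * t
    return lr, mom

# at the midpoint of the cycle the lr is highest and momentum lowest
lr, mom = one_cycle(50, 100)
```

The real 1cycle paper also adds a final annihilation phase where the lr drops well below lr_min; this sketch leaves that out for brevity.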
There seem to be multiple releases of the ImageNet dataset.
Is “ImageNet Fall 2011 release” the right dataset for the object classification benchmark?
Does anyone know a ballpark training time for ImageNet on a single 1080Ti?
And I have an idea I want to try.
During the Kaggle CDiscount challenge (also a large dataset - 15 million images at 180x180 resolution), I tried the following method to speed up training:
1. Train normally for a few epochs.
2. Freeze the first one (or two) layer groups and train.
- In my case, I could double the batch size and the training time for each epoch was halved.
3. Unfreeze all layers from time to time.
(Or it might be better if gradients from the unfrozen layers are accumulated and backpropped through the frozen layers from time to time?)
4. Repeat 2-3.
The idea is that the first/second layer groups take up quite a lot of memory/computation, but their weights change very little after some epochs (especially with pretrained weights), so they don’t need to be updated on every batch.
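A minimal PyTorch sketch of steps 2-3 above, assuming a generic nn.Sequential model where each child is treated as a layer group (fastai’s freeze_to works similarly on its own layer groups):

```python
import torch
import torch.nn as nn

def set_frozen(model, n_groups, frozen=True):
    """Freeze (or unfreeze) the first n_groups children of a model.
    Frozen parameters get no gradients, which saves memory and compute
    in the backward pass."""
    for group in list(model.children())[:n_groups]:
        for p in group.parameters():
            p.requires_grad = not frozen

# toy model standing in for a real backbone
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

set_frozen(model, 1)                 # step 2: freeze the first layer group
# only hand trainable params to the optimizer
opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=0.1)

set_frozen(model, 1, frozen=False)   # step 3: unfreeze from time to time
```

Note that after unfreezing you would need to rebuild the optimizer (or add a parameter group) so the newly trainable parameters actually get updated.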
Since I kept changing training parameters during the competition, I don’t know whether this actually helped in terms of training time or accuracy.
I would like to confirm this approach while applying all fancy fast.ai features as time allows.
Actually I don’t have much background in ML/DL, so let me know if this approach doesn’t make much sense!
(Edit: I used Xception in the competition since it took almost half the time to train compared to Inception-v4 and Inception-ResNet, though it had slightly lower accuracy. It may just be because I used a bad implementation, but DPN took forever to train.)
In terms of architecture experiments, what are your thoughts on incorporating an object segmentation feature in fastai? It would involve using a Mask R-CNN model. But that would mean training on COCO rather than ImageNet (which isn’t the theme of this thread, sorry to deviate).
I guess fastai only includes localization using bounding boxes for now, right? Btw, it was Kerem’s idea as we were discussing this yesterday.
Was that with PyTorch? I’m finding DPN slow. Inception-ResNet-v2 is looking hopeful. Planning to try Xception next. I’m also planning to create a preact-resnet and a yolov3 backbone for PyTorch to try - I’ll create a new thread soon for that in case someone else wants to have a go at it.
I’m getting about 12 hours for rn50 on 8 V100 GPUs with half precision. I believe Pascal is about 3x slower, so that works out to maybe 12 days on a single 1080Ti. Possibly with 1cycle training you can make it 5x faster.
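The back-of-envelope arithmetic above can be written out explicitly (assuming throughput scales linearly with GPU count, which is optimistic):

```python
hours_8x_v100 = 12
gpu_hours_volta = hours_8x_v100 * 8       # total V100 GPU-hours: 96
pascal_slowdown = 3                       # rough V100-vs-1080Ti factor
hours_1080ti = gpu_hours_volta * pascal_slowdown   # 288 hours
days_1080ti = hours_1080ti / 24

print(days_1080ti)      # 12.0 days on one 1080Ti
print(days_1080ti / 5)  # 2.4 days if 1cycle really gives a 5x speedup
```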
No, that’s the full set. You want the competition subset of 1000 categories, which is called ILSVRC2017 (for classification it’s been the same since 2012 AFAIK, so you can also look for ILSVRC2012).
ImageList.from_folder seems very slow for such a big dataset. ImageList.from_df offers more flexibility than ImageList.from_csv when dealing with csv files that aren’t exactly what you need.
It’s possible to use different methods for labeling the train and validation data, e.g.:
# have to create and label valid and train separately because the methods are different
valid = ImageList.from_df(val_df, path=path+'/val', suffix='.JPEG').split_none().label_from_df()
train = ImageList.from_folder(path+'/train').split_none().label_from_folder()
# combine valid and train into a single labeled list
train.valid = valid.train
Although I wouldn’t use the above for ImageNet, since from_folder seems too slow for it.
This is what I have so far for get_data(), using the folder structure and csv files as provided by Kaggle:
def get_data(path, size, bs, workers):
    # train/valid transforms; a random horizontal flip for training is a minimal choice
    tfms = ([flip_lr(p=0.5)], [])
    # only keep the first word of the second column for labels
    val_df = pd.read_csv('LOC_val_solution.csv', header='infer')
    val_df.PredictionString = val_df.PredictionString.str.split().str.get(0)
    train_df = pd.read_csv('LOC_train_solution.csv', header='infer')
    train_df.PredictionString = train_df.PredictionString.str.split().str.get(0)
    # add label as folder name in training data
    train_df.ImageId = train_df.PredictionString + '/' + train_df.ImageId
    valid = ImageList.from_df(val_df, path=path+'/val', suffix='.JPEG')
    train = ImageList.from_df(train_df, path=path+'/train', suffix='.JPEG')
    lls = ItemLists(path, train, valid).label_from_df().transform(
        tfms, size=size).presize(size, scale=(0.25, 1.0))
    return lls.databunch(bs=bs, num_workers=workers).normalize(imagenet_stats)
Question: I first ran a single epoch and it took 27 minutes on 8 V100s.
Then the epochs started taking 90 seconds. Is that drastic change likely due to CUDNN benchmarking or was there something else going on with my salamander instance?
Edit: that was with smaller 128px images; 224px images take about 3 minutes per epoch.
Edit 2: My guess is this wasn’t benchmarking. When I run multiple epochs, the first one is only a few seconds slower; I think that’s the benchmarking.
One weird tip: when running fastai distributed through fastai.launch, I sometimes stop whatever training I’m running in the middle of it (CTRL+Z). This then causes a “RuntimeError: Address already in use” when I try to run things again.
What works is closing my console (I use Cmder), relogging in through SSH, and trying again… If there’s a better way, let me know!
Edit: if you press CTRL+C instead, you can interrupt a training run without hitting the problem above.