Re: training VGG on ImageNet with batch norm

At about 1h:40m in the lesson 3 video, Jeremy talks about how VGG didn’t use batch norm because it wasn’t around yet. At about 1h:46m he mentions that he “…grabbed the entirety of imagenet and trained this model…”

@jeremy – I’m wondering if you can expand on that a little more? What kind of system did you run that on – AWS instances, or do you have access to fancier hardware? How long did it take? I see that ImageNet currently has 14,197,122 images. What kind of total file size are we talking about here? Did you just let Keras scale the images to the 224x224 input size, etc.?

Just trying to get a sense of what is involved in something that sounds simple to say but probably had a lot more behind it. Thanks for anything you can share about that.

It’s there in the video :wink: Two runs through the dataset took under 1 hr on @jeremy’s rig, and that was with horizontal flipping. The weights are still available on the site under models - I just downloaded them today.
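For a concrete picture of what a run like that looks like, here is a minimal Keras 1.x sketch of training with random horizontal flips - the paths, batch size, and the `model` variable are my assumptions, not Jeremy’s actual code:

```python
from keras.preprocessing.image import ImageDataGenerator

# Augment with random horizontal flips, as mentioned above
gen = ImageDataGenerator(horizontal_flip=True)
batches = gen.flow_from_directory('data/imagenet/train',   # hypothetical path
                                  target_size=(224, 224),  # VGG's input size
                                  batch_size=64)

# `model` stands in for a compiled VGG16-with-batchnorm Keras model;
# two passes through the data, matching the "two runs" mentioned above
model.fit_generator(batches, samples_per_epoch=batches.nb_sample, nb_epoch=2)
```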

@radek can you share the time code for that? I’m not seeing it.

I did download the saved weights in the notebook, but I’m also trying to get an idea of what would be involved if I wanted to try something like that on my own. It would still be interesting to know more details about what led up to those numbers…

here you go :slight_smile:

2.5 mln images in 2512 seconds

(that is just fine-tuning, I believe - not training the entire model from scratch)

Ah. And that is from Lesson 5? I’ve only gone as far as Lesson 3. And of course this raises more questions…like how big the trn_features.dat file must be to contain 2,522,348 decoded JPEGs…

Note that they’re only 224x224 sized arrays, and they’re compressed thanks to bcolz.
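If you’re curious how that works, the course’s utils.py wraps bcolz in a pair of helpers roughly like this (a sketch from memory - check the repo for the exact version):

```python
import bcolz

def save_array(fname, arr):
    # bcolz chunk-compresses the array on disk, so a big stack of
    # 224x224x3 arrays takes far less space than a raw dump
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()

def load_array(fname):
    # reads the compressed chunks back into a plain numpy array
    return bcolz.open(fname)[:]

# e.g. save_array('trn_features.dat', trn_features)
```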

Which notebook is the screenshot from?

I have downloaded the 2 files with the batchnorm weights (vgg16_bn_conv.h5 and vgg16_bn.h5). I assume vgg16_bn.h5 contains the VGG weights only for the dense layers when using batchnorm - is that correct?

If so, can we just fine-tune the network on cats and dogs using these weights, just as we did with the original pretrained VGG model? Thanks a lot for the clarification!

Not sure if Jeremy shared that notebook with us - this is just a screenshot from the video. But there is more information on the batchnorm model in this notebook.

Yes, I think your idea is spot on - we can use the VGG16 model with batchnorm similarly to how we fine-tuned the model without batchnorm. BTW, if the model you are trying to load the weights into has a different architecture than the weights expect, I think you would get an error, so you should be safe to experiment and see how it goes.
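With plain Keras that fine-tuning step could look something like the sketch below - `model` is assumed to be a Sequential VGG16 architecture that includes the batchnorm layers, and the point is just that a mismatched architecture fails loudly on load:

```python
from keras.layers import Dense

# Keras raises an exception if the layer shapes stored in the .h5 file
# don't match the model's layers, rather than silently loading garbage
model.load_weights('vgg16_bn.h5')

# From here, fine-tuning looks the same as with the original VGG16:
# drop the 1000-way ImageNet output layer, freeze the rest, and add
# a new output layer for our 2 classes (cats and dogs)
model.pop()
for layer in model.layers:
    layer.trainable = False
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```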

I think much of that has already been scripted for us in vgg16bn.py, but I didn’t get around to using it - I was already too far down the road optimizing my model by the time I found out about Jeremy’s batchnorm experiment, and I wanted to try a couple of other things before the competition ends :slight_smile: So yeah, looking at the vgg16bn.py file from the repo should be a good starting point.
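If it helps, the Vgg16BN class in that file is meant to mirror the Vgg16 class from the lessons, so using it should look something like this (paths and batch size are placeholders, and I haven’t run this myself):

```python
from vgg16bn import Vgg16BN  # from the course repo

vgg = Vgg16BN()  # builds the model and loads the batchnorm weights
batches = vgg.get_batches('data/dogscats/train', batch_size=64)
val_batches = vgg.get_batches('data/dogscats/valid', batch_size=64)
vgg.finetune(batches)  # swap the output layer for our 2 classes
vgg.fit(batches, val_batches, nb_epoch=1)
```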

Thanks @radek, I just found the notebook and am going through it. Good luck with your submission!
