Project: implement yolo v3 backbone and preact resnet

jeremy · April 8, 2018, 12:02am

If anyone is interested, here’s a little project to test out your skills implementing and testing an architecture. What I’m looking for is pytorch implementations suitable for imagenet of preact resnet and the classifier backbone part of yolo v3 (which the paper calls darknet-53).

Note that there are some pytorch implementations of preact resnet out there already, but:

They’re incomplete implementations of the paper (they don’t have the proper starting/ending blocks, for instance)
They’re for cifar-10, not imagenet (I haven’t looked into it, but there seem to be some little differences).

In particular, I’m interested in preact-resnet50 - although an implementation presumably would easily allow any of the resnets to be built.

There’s also an implementation of darknet-53 already for pytorch, but it is messy and ugly and uses a config file rather than a normal pytorch approach to defining the network.

I’m looking for implementations that are:

Concise and readable (e.g. refactor repeated code into modules)
Tested against the paper to confirm you get the same results (I can help with this if you don’t have access to a suitable machine; but you can do some initial testing by at least running a hundred batches or so and comparing to the reference implementations in darknet / lua. You could also simply compare to regular rn50 for a hundred batches or so and confirm it’s faster.)
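
A rough sketch of the "run a hundred batches and compare" check, in plain Python so it works with any training loop. The names `run_batches` and `train_step` are hypothetical: `train_step` stands for whatever function runs one forward/backward pass and returns the loss.

```python
import time
from itertools import islice


def run_batches(train_step, batches, n_batches=100):
    """Run `train_step` on the first `n_batches` batches.

    Returns (losses, seconds_per_batch) so two models can be compared
    on both their loss curve and their speed.
    """
    losses = []
    start = time.perf_counter()
    for batch in islice(batches, n_batches):
        # float() also works on a one-element tensor; on a GPU this forces
        # a sync, so the timing includes the actual compute.
        losses.append(float(train_step(batch)))
    elapsed = time.perf_counter() - start
    return losses, elapsed / max(len(losses), 1)
```

Calling this once per model on the same batches (same order, same seed) gives two loss lists and two per-batch times to put side by side.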

For bonus points, do an senet version of preact resnet

Let us know here if you start on this so that we can coordinate.

sgugger · April 8, 2018, 2:06am

I’ve done a bit of research on preact resnet without realizing it when I was looking for resnet 56 (and actually using preact resnet 56) and I was also looking at Yolo v3 to get better results on the pascal notebook, so would love to help on this.
Though my little GPU or my Paperspace machines won’t be able to do experiments on ImageNet so I’d definitely need some technical help.

And if someone else wants to join it’d be great!

jeremy · April 8, 2018, 2:10am

OK let’s do it. I can give you access to a P3 to test with when you’re ready.

jeremy · April 8, 2018, 3:24am

Oops! The pre-act resnet for imagenet idea was a dumb one - sorry. As explained here this actually makes imagenet worse, except for ridiculously large networks. So let’s stick with darknet-53 for this project.

Sorry!

Combalgorythm · April 8, 2018, 3:47am

I would like to join the project

jeremy · April 8, 2018, 4:15am

Just dive in and start implementing! If we end up with multiple implementations we can grab the best of each…

jeremy · April 8, 2018, 4:16am

If darknet-53 looks good, we may want to consider adding the ‘squeeze-and-excitation’ ideas to a later version too. Just an idle thought…

jeremy · April 8, 2018, 6:52am

The notes at the end here are interesting:

The same method is also used in the paper that trains rn50 in an hour:

I’ll try making the change to the stride-2 layer they suggest. I believe @binga is working on the photometric distortions. And we already have a PR for the aspect ratio augmentations. So we’re getting close I hope!

ranakj · April 8, 2018, 7:12am

Interested as well! What’s the timeline?

ranakj · April 8, 2018, 7:14am

Would be awesome to share access

piotr.czapla · April 8, 2018, 11:13am

@jeremy are you interested in a paper implementation of RetinaNet as well, so that we can compare it with darknet-53 and with your tweaks to it presented in lesson 10? To have a baseline.

jeremy · April 8, 2018, 2:53pm

I’m interested in retinanet, but not as part of this project; this project is simply to create an imagenet classifier. darknet-53 is only a classifier, but retinanet does localization. darknet-53 is interesting since it gives high accuracy classifications but is faster than resnet.
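
For anyone wondering where the "53" comes from, here is a small back-of-the-envelope count. The per-stage block counts are my reading of the architecture table in the YOLOv3 paper, not something stated in this thread, and `count_weight_layers` is just an illustrative helper name.

```python
# Residual blocks per stage, as I read them from the paper's Darknet-53 table
# (an assumption for this sketch).
BLOCKS_PER_STAGE = (1, 2, 8, 8, 4)


def count_weight_layers(blocks_per_stage):
    """Count conv layers plus the final fully connected classifier layer."""
    convs = 1  # initial 3x3 conv
    for n in blocks_per_stage:
        # one stride-2 downsampling conv, then n blocks of (1x1 conv, 3x3 conv)
        convs += 1 + 2 * n
    return convs + 1  # the fully connected layer on top


print(count_weight_layers(BLOCKS_PER_STAGE))  # 53
```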

jeremy · April 8, 2018, 2:54pm

Hoping to have something in the next day or two. There’s nothing really to share - just read the papers and try to implement them!

sgugger · April 8, 2018, 4:01pm

Dumb question: is there a small subset of ImageNet accessible somewhere for preliminary tests?

jeremy · April 8, 2018, 4:03pm

I got you covered http://files.fast.ai/data/imagenet-sample-train.tar.gz

sgugger · April 8, 2018, 4:20pm

Thanks!

emilmelnikov · April 8, 2018, 8:29pm

I’ve started an implementation of darknet-53. The model itself is finished, I’ll try to test it against the original darknet implementation and resnet-101/152 (at least for ~1 hour in P4000) and share everything tomorrow.

sgugger · April 8, 2018, 9:30pm

My implementation is here.

I’m not entirely sure how they connect their blocks, since they’re very vague in the Yolov3 paper and it’s very confusing in the config files we can see elsewhere, but I’ve done it in the same spirit as Preact Resnet.

I’ve tried to fit for a few epochs on the sample set you shared, Jeremy, and I find this with Resnet50:

For my version of Darknet, it gives this:

Also, I’ve just discovered that they apparently use Leaky ReLUs (though it’s said nowhere in the paper) so I’m trying to see if it gives better results.

Any feedback on the notebook is welcome.
Edit: There was a typo in my initial notebook so it changed the results a bit.

jeremy · April 8, 2018, 11:52pm

This is looking good! Will be interesting to compare to @emilmelnikov’s results - would be good to use the same dataset to test.

One approach to checking might be to try loading the config file in the existing pytorch version and then print out the network? I do suspect leaky relu will work a bit better BTW.

sgugger · April 9, 2018, 12:43am

What do you mean by that? Just to be sure, are we talking about this one?
Reading further into his code, specifically this part in darknet.py

elif block['type'] == 'shortcut':
    from_layer = int(block['from'])
    activation = block['activation']
    from_layer = from_layer if from_layer > 0 else from_layer + ind
    x1 = outputs[from_layer]
    x2 = outputs[ind-1]
    x  = x1 + x2
    if activation == 'leaky':
        x = F.leaky_relu(x, 0.1, inplace=True)
    elif activation == 'relu':
        x = F.relu(x, inplace=True)
    outputs[ind] = x

I think he does BN directly after each conv layer, like in Resnet.
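
Putting the pieces from this thread together (BN directly after each conv, leaky ReLU with slope 0.1, shortcut by addition), a residual block might look like the sketch below. The class names `ConvBnLeaky` and `ResBlock` are made up for illustration. The 1x1-then-3x3 layout with halved channels is my reading of the paper's table, and leaving the sum without an activation is an assumption, not something confirmed here.

```python
import torch
import torch.nn as nn


class ConvBnLeaky(nn.Sequential):
    """Conv -> BatchNorm -> LeakyReLU(0.1). No conv bias, since BN follows."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1),
        )


class ResBlock(nn.Module):
    """1x1 conv that halves the channels, 3x3 conv that restores them, then add."""

    def __init__(self, ch):
        super().__init__()
        self.conv1 = ConvBnLeaky(ch, ch // 2, 1)
        self.conv2 = ConvBnLeaky(ch // 2, ch, 3)

    def forward(self, x):
        # Shortcut: add the block input to the block output.
        # Assumption: no activation is applied after the sum.
        return x + self.conv2(self.conv1(x))
```

A quick shape check such as `ResBlock(64)(torch.randn(2, 64, 8, 8))` should return a tensor of the same shape as its input; a stride-2 `ConvBnLeaky` between stages halves the spatial size.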