NTS-Net in Fast.AI

Hey everyone, for the past few months I’ve been working on implementing NTS-Net into fastai. The paper is here: https://arxiv.org/abs/1809.00287


Essentially, it is a method that uses bounding boxes for image classification without needing any box labels, which helps with fine-grained subject matter. In this (very) narrow demonstration of what it can do, I compared a standard resnet50 against the NTS-Net. Both were pretrained, but I did not do any freezing/unfreezing, as I am still working on that. The resnet got 78% accuracy in 4 epochs, while the NTS-Net got 82%. On a snake classification problem I have been working on, it performed even better, showing a 10% improvement overall. The next step is to enable split() (it should be done in the next day or two); otherwise, I invite everyone who wants to experiment to try a few ideas!

In the paper, the only ablation study they performed was on the number of boxes formed. There was nothing about the size, using Jeremy’s size-up technique, etc. It’s something I plan to look into myself, but I would like to hear other ideas as well for fine-tuning this powerful model (as of the paper, it held the highest accuracy on the Caltech-UCSD Birds dataset). I already saw an improvement when I included LabelSmoothingCrossEntropy, but I am curious about a few other things that I would like opinions on. I feel the loss function is too harsh when it comes to determining how everything is laid out. Anyone have ideas? I’d like to weight a poor choice of focus less than a wrong prediction, as the two should be fairly well correlated. In any case, let me know if you have any questions! The GitHub repo will be updated constantly with improvements towards implementing this in fast.ai.
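One way to read the weighting idea above is as a weighted sum of two terms: a classification term (here label-smoothed cross-entropy) and a "focus" term that penalises the region ranking when it disagrees with how confidently each region is classified. The sketch below is plain Python so the arithmetic is visible. The function names and the `focus_weight` parameter are hypothetical, the pairwise hinge is only my reading of the ranking loss described in the paper, and none of this is the code in the repo.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def label_smoothing_ce(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target: weight (1 - eps) on the
    true class plus eps spread uniformly over all classes."""
    logp = log_softmax(logits)
    nll = -logp[target]
    uniform = -sum(logp) / len(logp)
    return (1 - eps) * nll + eps * uniform

def ranking_loss(region_scores, region_confidences):
    """Pairwise hinge: if region j is classified more confidently than
    region i, its score should exceed region i's score by a margin of 1."""
    loss = 0.0
    for i, conf_i in enumerate(region_confidences):
        for j, conf_j in enumerate(region_confidences):
            if conf_j > conf_i:
                loss += max(0.0, 1.0 - (region_scores[j] - region_scores[i]))
    return loss

def combined_loss(logits, target, region_scores, region_confidences,
                  focus_weight=0.5, eps=0.1):
    """Hypothetical weighting: focus_weight < 1 makes a poor choice of
    focus cost less than a wrong prediction."""
    cls = label_smoothing_ce(logits, target, eps)
    focus = ranking_loss(region_scores, region_confidences)
    return cls + focus_weight * focus
```

Setting `focus_weight=1.0` gives an unweighted sum of the two terms, so values below 1 are the knob to experiment with; whether that actually helps accuracy is an open question.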

Here is the PETS notebook for an example:

The source code I have been using is from two different places, one the original implementation, and the other from the pytorchcv library.

Please let me know of any issues. Again, the split() and gradual unfreezing should be solved in the next day or two. Thank you very much, guys, and thank you to those who’ve helped me get to this point. This was my first pure pytorch implementation in fastai.


Thanks for the repo. I tried your notebook PETS.ipynb, but I get the following error:

NameError                                 Traceback (most recent call last)
<ipython-input-26-46abe5e3d7ff> in <module>
----> 1 net = attention_net(6, data.c, 4, pretrained=True)

NameError: name 'attention_net' is not defined

attention_net is not defined anywhere anymore in your code. Maybe your latest code is not pushed to github yet?

Sorry! I need to update the example notebook :slight_smile: it should be NTSNet now. Give me a moment to double check that was the only change

Yes, I thought you had changed attention_net to NTSNet. I am looking forward to your update, thanks for your prompt response :slight_smile:

@cahya notebook is updated :slight_smile: I’m working on a better way to build the NTSNet so that we can explore different backbones, but this is what I have for now. You can see the basic functionality in model.py. The backbone right now is just a pretrained resnet50. I’ll update with a working freeze(), etc. It’s almost done, I just need to verify something real quick.

Great, I will try it. Thanks.

@cahya, I have updated the notebook again with split() and freeze() so we can use differential learning rates :slight_smile: Last challenge I am facing is dealing with cnn_learner… I get a size mismatch that I will investigate later this week

I will say one thing about this model: it does take longer to train, which I believe is due in part to the unsupervised learning aspect. Even with pretrained weights, my accuracy always starts at 3% in the first epoch. I can usually hit the ‘top’ accuracy within 11-15 epochs, which is much faster than their paper managed (500 epochs), but it is certainly more than our standard model needs.

I tried your notebook. The maximum accuracy I can get using a standard resnet50 (Learner, without NTS-Net) is 90.12%. The maximum accuracy with NTS-Net is 93.91%, which is a respectable improvement over the pure resnet50 Learner. But if I use the same resnet50 with cnn_learner, I can easily get 95.26%. People can also get 96.4% using resnet50 with cnn_learner, according to the Lesson-1-pets Benchmarks.
Hopefully, if we can use cnn_learner with NTS-Net, we can improve the current “possible maximal accuracy” again.
If I am not mistaken, the difference between a model created by cnn_learner and one created by Learner is just the custom head, isn’t it?
Here is the comparison once again in a table:

| Learner only | Learner + NTS-Net | cnn_learner (mine) | cnn_learner (pets-benchmarks) |
|---|---|---|---|
| 90.12% | 93.91% | 95.26% | 96.40% |

Yes and no. It’s the custom head plus a few other issues I’m working on. It works, but I get a size mismatch, which normally shouldn’t happen. There’s also a custom forward() for the entire model. I have yet to try replicating their CUB-2011 score, so perhaps I have missed something there. Thanks for trying it out!!!

I’m also an extreme beginner when it comes to custom models, so I’m learning as I go. I’ll try it on CUB as well in the next few days. If I can’t match their score, I’ll work through the possible reasons why.

Just want to let you know that I am able to add the custom head used in fastai without the size mismatch error you mentioned. I had to change some nts-net parameters to avoid the mismatch. The training is running now, and I am curious about the result. The accuracy after the first 5 epochs is similar to cnn_learner (91.x%).

Do you run fit_one_cycle with 15 epochs as you mentioned, or do you space it out into a couple of fit_one_cycles?

I am working with the Stanford Cars dataset and things are going pretty slowly, perhaps I’ll post an update later.

I tried jacking up the learning rate but things get a little wild, so I suppose you were right when you said that this takes a little longer than a pure resnet model.

And I realised one other thing, I believe that this
does the exact same job as this function

I run it all at once. We’re working on implementing the slices properly, and we have it working. In the most recent notebook you can pass in a differential learning rate, which should help it train a lot faster. And on CRE, it may; I’ll look into that now. They used CRE almost everywhere for their loss function, so I did my best to keep to their original code base.

After learn.split() and learn.freeze(), you can pass a sliced learning rate to the model. I was able to get 82.6% on PETS in about four epochs and, as Cahya mentioned, 91.x% after 5 with his work.
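For anyone unsure what passing a slice actually does: to my understanding, fastai v1 expands the slice into one learning rate per layer group, spaced by a constant multiplicative step from the earliest group to the last. The sketch below re-implements that expansion in plain Python so it can be run without fastai. The function names mirror fastai's, but this is my own approximation (including the single-group guard), and the three layer groups are an assumption, not necessarily what the repo's split() produces.

```python
def even_mults(start, stop, n):
    """n values from start to stop, spaced by a constant multiplicative step."""
    if n == 1:
        return [stop]  # guard added for this sketch
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

def lr_range(lr, n_groups):
    """Expand a learning rate (float or slice) into one value per layer group."""
    if not isinstance(lr, slice):
        return [lr] * n_groups
    if lr.start:
        return even_mults(lr.start, lr.stop, n_groups)
    # slice(stop) only: last group gets stop, earlier groups get stop / 10
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]

# Assumed example: three layer groups, e.g. two backbone groups plus the head
rates = lr_range(slice(1e-5, 1e-3), 3)
print(rates)
```

With a split model, something like `learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))` would then train the earliest group with the smallest rate and the last group with the largest.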

I’m starting to think that the ‘slower training’ problem gets magnified exponentially when there are significantly more output classes, and I am guessing that’s why progress on the Stanford Cars dataset, with 196 classes, is a little slow.

Perhaps. I haven’t tried playing with a large dataset yet. I was hoping to have this done before the iNaturalist competition ended so I could try it there. But let me know your results and feel free to put in any PRs :slight_smile:

My best accuracy on the pets dataset after the first 5 epochs (frozen body) is 93.7%, which is not bad but still lower than a standard resnet50 using fastai (which could get 94.x% easily). Unfortunately, further training with an unfrozen body doesn’t improve the accuracy at all, so I stopped the training after 10 epochs.

I got 93.7% after I changed the concat_net from just a single Linear layer to the same layers as the fastai classifier head.
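A minimal sketch of what such a change might look like in plain PyTorch, assuming the concat_net receives a flat feature vector. The helper name `fastai_style_head`, the feature size (2048 features for the whole image plus 4 regions) and the hidden width are illustrative assumptions, not values taken from the repo. The layer order follows what I understand fastai v1 to place after pooling: BatchNorm, Dropout, Linear, ReLU, then BatchNorm, Dropout, Linear.

```python
import torch
from torch import nn

def fastai_style_head(n_in, n_classes, hidden=512, ps=0.5):
    """BatchNorm -> Dropout -> Linear blocks, in the style of a fastai v1 head."""
    return nn.Sequential(
        nn.BatchNorm1d(n_in),
        nn.Dropout(ps / 2),          # lighter dropout before the first Linear
        nn.Linear(n_in, hidden),
        nn.ReLU(inplace=True),
        nn.BatchNorm1d(hidden),
        nn.Dropout(ps),
        nn.Linear(hidden, n_classes),
    )

# Assumed sizes for illustration only; 37 is the number of classes in PETS
n_features = 2048 * (4 + 1)
single_linear = nn.Linear(n_features, 37)   # the kind of layer being replaced
head = fastai_style_head(n_features, 37)

# Quick shape check on random input (eval mode so BatchNorm uses running stats)
head.eval()
with torch.no_grad():
    out = head(torch.randn(2, n_features))
print(out.shape)
```

The extra BatchNorm and Dropout layers add regularisation compared with the single Linear layer, which may be part of why the accuracy moved, though that is a guess.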

Did you try training past the first 5 epochs without unfreezing?

Yes, I tried that as well, but it didn’t help.