My current best result uses a modified XSE-ResNeXt 50, based on the Imagenette XSE-ResNeXt, with a custom (24,32,64) stem and GeM pooling. The stem was inspired by Ross Wightman and the GeM pooling by DrHB.
I also included the (24,32,64) stem on its own as Submission 1, which had an accuracy of 75.29% ± 1.09%. Adding GeM required lowering the batch size from 64 to 56 on a P100.
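For readers unfamiliar with GeM, here is a minimal PyTorch sketch of generalized-mean pooling as it is commonly implemented, with a learnable exponent `p` that typically starts at 3. The class name and defaults are assumptions, not DrHB's exact code. It computes (mean(x^p))^(1/p) over each feature map, so `p=1` recovers average pooling and a large `p` approaches max pooling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GeM(nn.Module):
    """Generalized-mean pooling over each channel's spatial map (sketch)."""

    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1) * p)  # learnable pooling exponent
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clamp so x**p stays defined and differentiable for zero/negative activations.
        x = x.clamp(min=self.eps).pow(self.p)
        # Average over H and W, then take the p-th root: output shape (N, C, 1, 1).
        return F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p)


pool = GeM()
features = torch.rand(2, 64, 7, 7)  # e.g. the final feature maps of a backbone
print(pool(features).shape)  # torch.Size([2, 64, 1, 1])
```

Unlike plain average pooling, the `clamp` and `pow` steps create extra activation-sized tensors that are kept for the backward pass, which may be part of why the batch size had to drop; that is a guess, not a measurement.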
I also tested a (24,48,64) stem, less augmentation, more augmentation, and bs=56 with AvgPool, all of which scored worse.
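A rough PyTorch sketch of what a configurable three-layer stem such as (24,32,64) or (24,48,64) could look like, modelled loosely on the XResNet-style stem: 3×3 conv-BN-activation layers, stride 2 in the first one, followed by a max pool. `make_stem` and `conv_bn_act` are hypothetical helper names, and ReLU is an assumption (the Imagenette XSE-ResNeXt may use a different activation). In fastai's `XResNet`, these channel sizes are passed as `stem_szs`.

```python
import torch
import torch.nn as nn


def conv_bn_act(c_in: int, c_out: int, stride: int = 1) -> nn.Sequential:
    """3x3 convolution -> BatchNorm -> ReLU (hypothetical helper)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


def make_stem(sizes=(24, 32, 64), c_in: int = 3) -> nn.Sequential:
    """Three conv layers with the given channel sizes, then a stride-2 max pool.

    Only the first conv downsamples, so the stem reduces H and W by 4 overall.
    """
    layers, prev = [], c_in
    for i, c_out in enumerate(sizes):
        layers.append(conv_bn_act(prev, c_out, stride=2 if i == 0 else 1))
        prev = c_out
    layers.append(nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    return nn.Sequential(*layers)


x = torch.randn(2, 3, 128, 128)
print(make_stem((24, 32, 64))(x).shape)  # torch.Size([2, 64, 32, 32])
print(make_stem((24, 48, 64))(x).shape)  # torch.Size([2, 64, 32, 32])
```

Because the last size stays at 64, both variants feed the same number of channels into the first residual stage; only the intermediate widths (and therefore compute) change.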
Boom! That's great! Maybe you could post your second-best solution on the leaderboard as well? I think it might help to see other high-performing methods. Maybe we could allow up to 2 entries per person on the leaderboard; what do you think, @muellerzr?
The idea is that training CNNs at a lower resolution than your test resolution can give better results. When training and testing at the same resolution, objects in the training set will appear larger than in the test set, at least if you are using a transform like RandomResizedCrop, because each training crop covers only part of the image but is still resized to the full input resolution. The FixRes authors address this by:
Training at a lower resolution, e.g. training at 160 and testing at 224
Fine-tuning the classifier layer at the test image resolution
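A minimal PyTorch sketch of the second step, freezing everything except the classifier and then training on test-resolution images. The toy model and the helper name `freeze_all_but_classifier` are hypothetical; the point is that global average pooling lets the same weights accept both the lower training resolution and the higher test resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def freeze_all_but_classifier(model: nn.Module, classifier: nn.Module) -> list:
    """Freeze every parameter in `model`, unfreeze `classifier`, and return the trainable params."""
    for p in model.parameters():
        p.requires_grad_(False)
    for p in classifier.parameters():
        p.requires_grad_(True)
    return [p for p in model.parameters() if p.requires_grad]


# Toy network (hypothetical). Global average pooling makes the classifier's input size
# independent of image resolution, so one set of weights works at 160 and at 224.
head = nn.Linear(16, 10)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), head,
)

# Step 1 (not shown): train the whole model on low-resolution crops, e.g. 160 px.
# Step 2: fine-tune only the classifier on images at the test resolution, e.g. 224 px.
params = freeze_all_but_classifier(model, head)
opt = torch.optim.SGD(params, lr=1e-2)
model.train()  # BatchNorm running stats still update in train mode, even with frozen weights
images = torch.randn(8, 3, 224, 224)  # stand-in for a batch of test-resolution images
labels = torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(images), labels)
opt.zero_grad()
loss.backward()
opt.step()
```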
It is probably a technique for squeezing out the last few percent of performance, and there are too few epochs in fastgarden to see its benefit. Out of curiosity, I did two 20-epoch FixRes runs, training for 16 epochs at 192 and then fine-tuning (classifier layer only) for 4 epochs at 224; they still performed worse than a baseline run of 20 epochs at 224.
Sure. If you have another technique that could be interesting, I'm thinking you could tag it onto your highest-placed entry as an "interesting idea" or something similar.
There's a wide range of training times in the notebooks in this thread, and I was curious what setups everyone is using. My training times are definitely on the higher (slower) end compared to others.
Wow! Now that is fast! Thanks for sharing. I was using a smaller batch size in my original post because of memory issues, which have since been resolved. A larger batch size brought my training times down, but still nowhere near this.
What can be helpful is seeing where the time is going: whether you are GPU bound (your GPU is fully loaded (nearly) all of the time and using all of its memory) or CPU bound. There are probably more advanced ways of going about this, but this is how I do it.
I use tmux, so I open two new panes. In one, I run `nvidia-smi -l`, which regularly polls the GPU for stats (other people swear by `nvidia-smi dmon`, but I often find `nvidia-smi -l` easier to read). In the other pane I start `htop`, which shows my RAM/swap and CPU utilization. These stats generally suffice for figuring out how I am using my machine and whether I am using it to its full potential. Sometimes, when I do something IO-heavy, I also use `iotop`.
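If you would rather capture the same numbers from Python (for example, to log them alongside training), `nvidia-smi` can also print selected fields as CSV via `--query-gpu=... --format=csv,noheader,nounits`. Below is a minimal sketch; the helper names and the 90% threshold are hypothetical, and `sample_gpu_stats` assumes `nvidia-smi` is on the PATH.

```python
import subprocess

FIELDS = ("utilization.gpu", "memory.used", "memory.total")


def parse_gpu_stats(csv_text: str) -> list:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output (one line per GPU)."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(v) for v in line.split(","))
        stats.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return stats


def sample_gpu_stats() -> list:
    """Take one snapshot of every GPU's utilization and memory."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)


def looks_gpu_bound(stats: list, threshold: float = 90.0) -> bool:
    """Crude heuristic: there is at least one GPU and every GPU is at or above `threshold` percent."""
    return bool(stats) and all(s["util_pct"] >= threshold for s in stats)


# Parsing a captured line (values are illustrative, not measurements):
print(parse_gpu_stats("97, 15012, 16280\n"))
# [{'util_pct': 97.0, 'mem_used_mib': 15012.0, 'mem_total_mib': 16280.0}]
```

Calling `sample_gpu_stats()` every few seconds during training and seeing utilization well below 100% hints that the input pipeline (CPU side) rather than the GPU is the bottleneck.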
As for Colab, even Pro comes with only two vCPUs. I am not sure if this is what is happening here, but in many scenarios this can be a very limiting factor. File access times from Google Drive have also been slow for me. It's best to copy the data locally before training, and when moving files around, it's much faster to move one single file than many small ones.
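A minimal sketch of the copy-one-archive approach using only the Python standard library: copy a single compressed archive from the slow mount to local disk, then extract it there. `stage_dataset` is a hypothetical helper, and the paths in the commented usage line are illustrative.

```python
import shutil
from pathlib import Path


def stage_dataset(archive_path, local_dir) -> Path:
    """Copy one archive (e.g. from a mounted Google Drive) to local disk and extract it there."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    local_archive = local_dir / Path(archive_path).name
    shutil.copy(archive_path, local_archive)                    # one large sequential copy
    shutil.unpack_archive(str(local_archive), str(local_dir))  # many small files, but on local disk
    local_archive.unlink()                                      # free space once extracted
    return local_dir


# Hypothetical usage on Colab:
# stage_dataset("/content/drive/MyDrive/my_dataset.tgz", "/content/data")
```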
@radek, how would you rate `nvtop` for monitoring your GPU compared to `nvidia-smi -l`? It's the main tool I use to monitor GPU usage, but I haven't tried `nvidia-smi -l` or `htop`.
MK-ResNeXt (Mixed Depthwise Convolutional Kernel or Multi-Kernel ResNeXt) modifies X-ResNeXt by replacing some convolutions with mixed depthwise convolutions (MDConv), which were proposed by Mingxing Tan & Quoc V. Le for MixNet.
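To make MDConv concrete, here is a minimal PyTorch sketch: the channels are split into groups, each group gets a depthwise convolution with its own kernel size, and the outputs are concatenated. `MDConv` is a hypothetical class name, not the MK-ResNeXt code, and the equal channel split (leftovers to the first group) is an assumption; odd kernel sizes are assumed so every branch keeps the same spatial size.

```python
import torch
import torch.nn as nn


class MDConv(nn.Module):
    """Mixed depthwise convolution: one depthwise conv per channel group, each with its own kernel size."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7), stride: int = 1):
        super().__init__()
        n = len(kernel_sizes)
        # Equal split; any remainder goes to the first group so sizes sum to `channels`.
        self.splits = [channels // n] * n
        self.splits[0] += channels - sum(self.splits)
        self.convs = nn.ModuleList(
            # groups=c makes each conv depthwise; padding=k//2 keeps all outputs the same size.
            nn.Conv2d(c, c, k, stride=stride, padding=k // 2, groups=c, bias=False)
            for c, k in zip(self.splits, kernel_sizes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(chunk) for conv, chunk in zip(self.convs, chunks)], dim=1)


mdconv = MDConv(64, kernel_sizes=(3, 5, 7))
print(mdconv.splits)                             # [22, 21, 21]
print(mdconv(torch.randn(2, 64, 16, 16)).shape)  # torch.Size([2, 64, 16, 16])
```

Larger kernels in some groups widen the receptive field cheaply, since each depthwise kernel only mixes spatial positions within a single channel.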
The notebook contains two early variants, D & E, each with multiple configurations which usually score over 78%.
I also did some unoptimized tests at 20 epochs with MK-ResNeXt and my prior tweaked X-ResNeXt submission. MK-ResNeXt performed better, but its lead dropped to 0.5-1%, depending on the configuration.
Based on these current experiments, I believe there is more tweaking to be done to find the best performing configuration. If anyone knows of good resources that cover handcrafting neural network architectures please send them my way.
Novel to the best of my knowledge. If anyone knows of an existing paper or implementation, please let me know. It would save me a bunch of work.
I've been looking in a somewhat orthogonal direction, using network deconvolution, which I've described briefly in another forum post. In some early tests, I've found that deconv layers (FastDeconv, from the official repo by Ye et al. 2020) can be substituted into the (x)resnet stem as drop-in replacements for the Conv2d layers, completely obviating the BatchNorm layers!
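For intuition, here is a heavily simplified PyTorch sketch of what a deconv layer does: it whitens (centers and decorrelates) the im2col patches of its input using the inverse square root of their covariance, then applies an ordinary linear map. That built-in normalization is presumably why a separate BatchNorm becomes unnecessary. This is not FastDeconv: the official layer adds refinements such as channel grouping, subsampling when estimating the covariance, an iterative inverse square root, and running statistics for inference. `SimpleDeconv2d` is a hypothetical name, and this version always uses the current batch's statistics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDeconv2d(nn.Module):
    """Conv-like layer that whitens input patches before a linear map (simplified sketch)."""

    def __init__(self, c_in: int, c_out: int, kernel_size: int = 3, stride: int = 1, eps: float = 1e-5):
        super().__init__()
        self.k, self.stride, self.eps = kernel_size, stride, eps
        d = c_in * kernel_size * kernel_size  # length of one flattened patch
        self.weight = nn.Parameter(torch.randn(c_out, d) / d ** 0.5)
        self.bias = nn.Parameter(torch.zeros(c_out))

    def whiten(self, cols: torch.Tensor) -> torch.Tensor:
        """Map rows of `cols` (M x d) to zero mean and (approximately) identity covariance."""
        centered = cols - cols.mean(dim=0, keepdim=True)
        cov = centered.T @ centered / centered.shape[0]
        cov = cov + self.eps * torch.eye(cov.shape[0], dtype=cov.dtype, device=cov.device)
        evals, evecs = torch.linalg.eigh(cov)  # cov is symmetric positive definite
        inv_sqrt = evecs @ torch.diag(evals.rsqrt()) @ evecs.T  # cov^(-1/2)
        return centered @ inv_sqrt

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        pad = self.k // 2
        cols = F.unfold(x, self.k, padding=pad, stride=self.stride)  # (N, d, L)
        d, num_pos = cols.shape[1], cols.shape[2]
        flat = cols.transpose(1, 2).reshape(-1, d)            # one row per output position
        out = self.whiten(flat) @ self.weight.T + self.bias  # (N * L, c_out)
        h_out = (h + 2 * pad - self.k) // self.stride + 1
        w_out = (w + 2 * pad - self.k) // self.stride + 1
        return out.reshape(n, num_pos, -1).transpose(1, 2).reshape(n, -1, h_out, w_out)


layer = SimpleDeconv2d(3, 32, kernel_size=3, stride=2)
print(layer(torch.randn(8, 3, 64, 64)).shape)  # torch.Size([8, 32, 32, 32])
```

The covariance estimate needs many more patch rows than the patch length `d`, which is easy in a stem (`d = 27` for RGB with 3×3 kernels) but becomes expensive for wide layers; that cost is what the official implementation's refinements are designed to reduce.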
The results are quite nice, too. Using an xse_resnext34 model with FastDeconv, I've gotten 79.92% ± 0.72%. This can be found near the bottom of this notebook. I also think this model could be combined with your mixed depthwise kernels and might achieve some really nice results.
I saw the paper and your post on it the other day and was wondering how well network deconvolution would work in practice. Judging from this competition, the answer seems to be: pretty well.
At least for ResNet, the deconv networks in their repo appear to replace all the conv layers with deconv layers. Have you tried that substitution with X-ResNeXt? If not, that is probably another avenue to explore.
You’re right, deconvs look like something worth experimenting with in MK-ResNeXt. I will take another look at the paper and code. Already have multiple ideas to test.
Also, isn't your best result 79.95% ± 0.23% from the deconv xse-resnet50?
I can't remember whether I tried using their models out of the box, but I swear there was a good reason for not doing so! I think when I tried a small version (xse_resnext18) with all deconv layers, I ended up getting ~65% accuracy. But that was again my own implementation, so it's worth revisiting. I'll probably do that next.
Sounds awesome! I look forward to seeing you crush the high score as usual.
As for the 79.95% ± 0.23% deconv xse-resnet50 result: yes, I suppose that is my best, but I decided not to include it because its accuracy is within the error margin of the xse_resnext34+deconv score, and I had to use mixed precision to train it (I can't recall whether that is permitted by the rules).
This version, MK-ResNeXt-50F, modifies MK-ResNeXt-50E by adding the deconv stem and moving the squeeze-and-excitation layer back to its original location.
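For reference, a minimal PyTorch sketch of a squeeze-and-excitation module placed as in the standard SE-ResNet/SE-ResNeXt design, where it rescales the residual branch's channels just before the branch is added back to the identity path. Whether this is exactly the "original location" meant above is an assumption, and `SqueezeExcite`, `SEResidual`, and `reduction=16` are illustrative choices, not MK-ResNeXt's code.

```python
import torch
import torch.nn as nn


class SqueezeExcite(nn.Module):
    """Squeeze (global average pool) then excite (bottleneck MLP + sigmoid) to reweight channels."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(1, channels // reduction)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(self.pool(x))  # per-channel scale in (0, 1)


class SEResidual(nn.Module):
    """Residual block with SE applied to the branch output, before the identity add."""

    def __init__(self, branch: nn.Module, channels: int, reduction: int = 16):
        super().__init__()
        self.branch = branch
        self.se = SqueezeExcite(channels, reduction)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.se(self.branch(x)) + x)


branch = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1, bias=False), nn.BatchNorm2d(64))
block = SEResidual(branch, 64)
print(block(torch.randn(2, 64, 8, 8)).shape)  # torch.Size([2, 64, 8, 8])
```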
I did have one higher result of 83.22% ± 0.14% (Experiment 5.1 here) from adding a deconv stem to MK-ResNeXt-50E, but I was unable to reproduce anything similar, so I am not officially reporting it. For example, a rerun on a different day (Experiment 5.1.2 in the same notebook) resulted in 82.09% ± 0.26%.
Hey guys, I have been trying to follow this setup for converting TFRecords to images, but I have hit a snag. This produces an error for me: `from tfrecord.tfrecord import`
The error:
ModuleNotFoundError: No module named 'tfrecord.tfrecord'
I'm using Colab; I cloned the repo and installed TensorFlow.