Jeez they got some nice results! Thanks for sharing
This is my personal reading list; it has some very theoretical and math-intensive papers which provide much better insight into why things work the way they do - https://github.com/digantamisra98/Library
Funnel Activation for Visual Recognition (@Diganta might be interesting for you)
What Makes Training Multi-Modal Classification Networks Hard?
The code was also released.
If anyone is interested in trying to tackle porting this over, I’d be interested in helping/leading (when I have time to lead the push). The code is in Caffe2, so it requires a bit more effort, but this could be extremely valuable combined with the MixedDL project: GitHub - facebookresearch/VMZ: VMZ: Model Zoo for Video Modeling
Interesting read, but it seems to have high FLOPs because of the way they calculate the spatial conditioning. Anyway, a really nice approach.
DeLighT: Very Deep and Light-weight Transformer
Overall, DeLighT networks are 2.5 to 4 times deeper than standard transformer models and yet have fewer parameters and operations. Experiments on machine translation and language modeling tasks show that DeLighT matches the performance of baseline Transformers with significantly fewer parameters.
On the WMT’14 En-Fr high resource dataset, DeLighT requires 1.8 times fewer parameters and 2 times fewer operations and achieves better performance (+0.4 BLEU score) than baseline transformers. On the WMT’16 En-Ro low resource dataset, DeLighT delivers similar performance with 2.8 times fewer parameters than baseline transformers.
Thank you for sharing!
It seems like Transformers are taking over the world, or at least making a new start in computer vision:
Great survey paper here covering a selection of the efficient transformers that have come out in the past few years
Efficient Transformers: A Survey
Has anybody tried using SwAV? Any experience to be shared?
That’s a great overview, thanks for sharing! It’s easy to lose track with all the different transformer architectures getting published recently
Just adding the arxiv link too (without going through twitter): https://arxiv.org/abs/2009.06732
ShapeAssembly: Learning to Generate Programs for 3D Shape Structure Synthesis
The paper presents a deep generative model which learns to write novel programs in ShapeAssembly, a domain-specific language for modeling 3D shape structures. Executing a ShapeAssembly program produces a shape composed of a hierarchical, connected assembly of cuboid part proxies. Our method develops a well-formed latent space that supports interpolations between programs. Above, we show one such interpolation, and also visualize the geometry these programs produce when executed. In the last column, we manually edit the continuous parameters of a generated program, in order to produce a variant geometric structure with new topology.
code and paper https://rkjones4.github.io/shapeAssembly.html
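To make the "hierarchical assembly of cuboid part proxies" idea concrete, here is a tiny toy sketch in Python. This is purely illustrative and is not the actual ShapeAssembly DSL; the class and method names are my own invention.

```python
# Hypothetical sketch (NOT the real ShapeAssembly DSL): a shape as a
# hierarchy of cuboid part proxies, each attached to its parent part.

class Cuboid:
    def __init__(self, name, w, h, d):
        self.name = name
        self.dims = (w, h, d)      # width, height, depth of the part proxy
        self.children = []         # sub-parts attached to this cuboid

    def attach(self, child):
        """Attach a sub-part, forming the hierarchical assembly."""
        self.children.append(child)
        return child

    def count_parts(self):
        """Total number of cuboids in this assembly (recursive)."""
        return 1 + sum(c.count_parts() for c in self.children)

# Build a toy chair: a seat with four legs and a back.
seat = Cuboid("seat", 1.0, 0.1, 1.0)
for i in range(4):
    seat.attach(Cuboid(f"leg_{i}", 0.1, 0.5, 0.1))
seat.attach(Cuboid("back", 1.0, 0.8, 0.1))

print(seat.count_parts())  # 6 cuboids in total
```

The real DSL additionally constrains *where* parts attach to each other, which is what makes executed programs produce connected, well-formed geometry.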
MeLIME: Meaningful Local Explanation for Machine Learning Models
Most state-of-the-art machine learning algorithms induce black-box models, preventing their application in many sensitive domains. Hence, many methodologies for explaining machine learning models have been proposed to address this problem. In this work, we introduce strategies to improve local explanations taking into account the distribution of the data used to train the black-box models. We show that our approach, MeLIME, produces more meaningful explanations compared to other techniques over different ML models, operating on various types of data. MeLIME generalizes the LIME method, allowing more flexible perturbation sampling and the use of different local interpretable models. Additionally, we introduce modifications to standard training algorithms of local interpretable models fostering more robust explanations, even allowing the production of counterfactual examples. To show the strengths of the proposed approach, we include experiments on tabular data, images, and text; all showing improved explanations. In particular, MeLIME generated more meaningful explanations on the MNIST dataset than methods such as GuidedBackprop, SmoothGrad, and Layer-wise Relevance Propagation.
https://arxiv.org/abs/2009.05818
@muellerzr might want to see this: code also exists.
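For anyone new to this line of work, here is a minimal sketch of the LIME idea that MeLIME generalizes (my own toy version, not the MeLIME code): explain a black-box model around one instance by sampling perturbations, weighting them by proximity, and fitting a weighted linear surrogate whose coefficients act as feature attributions.

```python
# Minimal LIME-style local explanation sketch (hypothetical, not MeLIME).
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Stand-in model: a nonlinear function we want to explain locally.
    return X[:, 0] ** 2 + 3.0 * X[:, 1]

def local_explanation(x, n_samples=500, sigma=0.1):
    # 1. Sample perturbations around the instance x.
    Z = x + sigma * rng.normal(size=(n_samples, x.shape[0]))
    y = black_box(Z)
    # 2. Weight samples by proximity to x (exponential kernel).
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * sigma ** 2))
    # 3. Fit a weighted linear surrogate; its slopes are the explanation.
    A = np.hstack([Z, np.ones((n_samples, 1))])  # add intercept column
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return coef[:-1]  # feature attributions (drop intercept)

attr = local_explanation(np.array([1.0, 0.0]))
# Near x = (1, 0), the true local slopes are ~2 for x0 and 3 for x1.
print(attr)
```

MeLIME's contribution, per the abstract, is to make steps 1 and 3 more flexible: perturbations that respect the training-data distribution, and richer local interpretable models than this plain linear fit.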
Self-supervised Single-view 3D Reconstruction via Semantic Consistency
code https://github.com/NVlabs/UMR
paper https://arxiv.org/pdf/2003.06473.pdf
Great posts about GPT-3 that are very interesting and thought provoking for AI development in general:
Best to start with this section:
Then this:
And finally the entire long article if you want to go for more details:
SCOUTER: An explainable image classifier using a modified version of Slot Attention
Interesting report:
One-sentence Summary: Transformers applied directly to image patches and pre-trained on large datasets work really well on image classification
The code for An Image is Worth 16x16 Words: Transformers for Image Recognition…