I add every paper I’m interested in to my Twitter favorites list so that anyone interested can simply follow that. There are 2700 tweets there now! https://twitter.com/jeremyphoward/likes
Also, if anyone is interested in trying to implement one of these papers in pytorch (especially with fastai integration), please do post on the forum and at-mention me after you’ve started, since I’d love to see what you’re up to - and help if possible.
This looks like a good thread! Here is my first paper add for now. Would be awesome to implement this!
Group Normalization
(published March 22, 2018 by FAIR - Facebook AI Research - featuring the famous Kaiming He) https://arxiv.org/abs/1803.08494
The general idea is that BN’s performance can suffer with small batch sizes, which GN is designed to handle well (excerpt below):
On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2. GN can outperform or compete with its BN-based counterparts for object detection and segmentation in COCO.
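For anyone wanting to try implementing this, the core of GN fits in a few lines. Here is a minimal from-scratch sketch in PyTorch (the learnable per-channel scale and shift from the paper are omitted for brevity) - note the statistics are computed per sample, so batch size doesn’t matter:

```python
import torch

def group_norm(x, num_groups, eps=1e-5):
    # x: (N, C, H, W). Channels are split into groups, and mean/var
    # are computed per (sample, group) - independent of batch size.
    n, c, h, w = x.shape
    x = x.view(n, num_groups, c // num_groups, h, w)
    mean = x.mean(dim=(2, 3, 4), keepdim=True)
    var = x.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
    x = (x - mean) / torch.sqrt(var + eps)
    return x.view(n, c, h, w)

x = torch.randn(2, 8, 4, 4)  # works fine even with batch size 2
out = group_norm(x, num_groups=4)
```

PyTorch now also ships this as `torch.nn.GroupNorm`, so the sketch above can be checked against the built-in version.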
Here is a recent paper on YOLO v3. The way it’s written is a joke, but it makes for quite an entertaining read; if you have read a lot of papers before, it will resonate even more.
I just started reading a paper about RetinaNet because it was better than YOLO2. But YOLO3 looks really interesting.
BTW, is anybody working on building YOLO or RetinaNet or any other modern object detection models with PyTorch?
I want to try. Looks like the most challenging part will be defining the loss function.
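For RetinaNet specifically, the classification loss is the focal loss from the paper (Lin et al., 2017), which down-weights easy examples so training focuses on hard ones. A minimal PyTorch sketch of just that term (the box-regression loss and anchor matching are the harder plumbing, omitted here):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    # logits and targets have the same shape; targets are 0/1.
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)            # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(4, 10)
targets = torch.randint(0, 2, (4, 10)).float()
loss = focal_loss(logits, targets)
```

With gamma=0 and alpha=0.5 this reduces to (half of) plain binary cross-entropy, which is a handy sanity check.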
Thanks James for this link! Kaiming He is one of my heroes in computer vision! R. Girshick and C. Szegedy are the others.
This is a very promising normalization technique for high-res images (e.g. medical imaging).
Last year, the RSNA bone age challenge was won by a team (http://16bit.ai) that used a Tesla M40 24 GB GPU to train on 500x500 images with BN and larger batches than are possible on a more standard 12 GB GPU. Multi-GPU is not a good workaround for this intrinsic BN problem because it is very hard to update BN layers dynamically across different GPUs. I really look forward to trying GN.
Lastly, I want to extend the approach to classification based on multiple text corpora, multiple input features, and e-commerce-type applications and features (product descriptions, review texts, prices, review stars, etc.).
I am familiar with Jeremy’s paper on this also.
Likely this will mean moving away from a CNN approach to an RNN approach, or an ensemble approach.
Encoder-Decoder structures fascinate me, especially as they are applied to image segmentation. The way they work (and sometimes don’t) is like magic to me. Right now, I’m interested in how to disambiguate the various factors that go into a particular encoding. Still looking for research papers that focus on this.
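As a toy illustration of the encoder-decoder idea for segmentation (purely a sketch with made-up sizes - real models like U-Net also add skip connections between encoder and decoder stages):

```python
import torch
import torch.nn as nn

# Encoder compresses the image to a compact spatial code; the decoder
# upsamples it back to a per-pixel score map, one channel per class.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64x64 -> 32x32
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # 16x16 -> 32x32
    nn.ConvTranspose2d(16, 4, 2, stride=2),                # 32x32 -> 64x64, 4 classes
)

x = torch.randn(1, 3, 64, 64)
mask_logits = decoder(encoder(x))  # (1, 4, 64, 64): per-pixel class scores
```

Everything the decoder produces has to be squeezed through that small 16x16 code, which is exactly where the interesting question of what factors end up in the encoding lives.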
They tackle the problem of learning very long sequences, even up to 16,000 tokens, with traditional RNNs including LSTMs. The trick seems to be a very simple but super effective one. From the abstract:
We present a simple method to improve learning long-term dependencies in recurrent neural networks (RNNs) by introducing unsupervised auxiliary losses. These auxiliary losses force RNNs to either remember distant past or predict future, enabling truncated backpropagation through time (BPTT) to work on very long sequences. We experimented on sequences up to 16 000 tokens long and report faster training, more resource efficiency and better test performance than full BPTT baselines such as Long Short Term Memory (LSTM) or Transformer.
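The core idea can be sketched in a few lines of PyTorch. This is purely illustrative, not the paper’s exact setup (they sample anchor points and reconstruct past subsequences with a separate decoder; here I use a simpler next-token auxiliary head, and the 0.5 weight is a guess):

```python
import torch
import torch.nn as nn

# Main task: sequence classification. Auxiliary task: predict the next
# token from each hidden state, so useful gradients reach every timestep
# even when the main loss only touches the final state.
vocab, dim = 100, 32
embed = nn.Embedding(vocab, dim)
rnn = nn.LSTM(dim, dim, batch_first=True)
clf_head = nn.Linear(dim, 2)        # main head
aux_head = nn.Linear(dim, vocab)    # unsupervised auxiliary head

tokens = torch.randint(0, vocab, (4, 50))
labels = torch.randint(0, 2, (4,))

h, _ = rnn(embed(tokens))                                   # (4, 50, dim)
main_loss = nn.functional.cross_entropy(clf_head(h[:, -1]), labels)
aux_logits = aux_head(h[:, :-1])                            # state t -> token t+1
aux_loss = nn.functional.cross_entropy(
    aux_logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
loss = main_loss + 0.5 * aux_loss   # auxiliary weight is a hyperparameter
loss.backward()
```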
Very Interesting!
I think they could have come up with a generalised model for deep neural networks. For example, replacing a single letter or a word in the corpus can lead to drastic changes in the output. Basically, a ‘word’ or a ‘pixel’ acts like a weight relative to the input-output function.
Am I missing some details? Do you think one can build a DNN model which can find these adversaries?
Actually, that’s not what adversarial examples mean.
Adversarial examples are very small perturbations of the input such that its actual label doesn’t change (it still appears the same to humans) but the model gets fooled.
Changing a character in a word may change its actual meaning to humans as well - for example (Fall-Tall) - and certainly it can change the meaning of the sentence.
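To make that concrete, here is a minimal sketch of the classic FGSM attack (fast gradient sign method, Goodfellow et al., 2014) against a toy linear model - the model and sizes are placeholders, just to show the shape of the idea:

```python
import torch
import torch.nn as nn

# Toy stand-in classifier for 28x28 inputs.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

x = torch.rand(1, 1, 28, 28, requires_grad=True)
y = torch.tensor([3])

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# Nudge each pixel by at most eps in the direction that increases the
# loss: the image looks unchanged to a human, but the loss goes up.
eps = 0.05
x_adv = (x + eps * x.grad.sign()).detach()
```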
At a high level of intuition… CNNs try to recognise patterns across ‘space’, while RNNs try to recognise patterns across ‘time’. Most of the papers that I came across try to tackle adversaries in ‘space’. It makes sense to say that there can be adversaries that occur in ‘time’ - that is what I meant by suggesting changing a letter. A better intuition would be changing the sentence structure or arrangement, for example a sarcastic sentence. Would that classify as an adversary?
or try this…
I notice a discrepancy in my own thinking when I come across this topic.
This paper does a controlled experimental analysis of some of the recent object detectors and compares the performance of the Faster R-CNN, R-FCN, and SSD architectures.
Might help someone choose an appropriate object detection method depending on speed, accuracy or memory footprint requirements.