Kaggle: NIPS 2017 Research Competition Adversarial Attack

Hello everyone,

There is a research competition on Kaggle about adversarial attacks.
I think the format is a great way to learn something new (Ian Goodfellow is active on the forum).


I’m quite interested in the 3rd Kaggle NIPS 2017 sub-competition on adversarial examples: building attack-resistant models.

One of the things I keep thinking about is how knowledge is hierarchically represented in images: edges -> simple shapes -> complex shapes -> objects -> scene.
A cat looks like a cat because it has cat-like eyes/nose/whiskers in the previous layer, and each of those is made up of further cat-like parts in the lower layers.
Even humans use this conceptual knowledge of object hierarchy to identify objects.

But in these adversarial attacks, we tweak just pixels and try to fool the model into predicting an object of the wrong class. This works only because the model does not ‘conceptually’ store the knowledge of object hierarchies. While CNNs do learn object hierarchies at the pixel level, they seem to ignore the conceptual meaning of the features found in the lower layers; only the neurons of the last prediction layer are mapped to labels.

What if we gave labels to the objects found in the earlier layers as well? For example, using Matt Zeiler’s deconvnet idea or any other top-down visualization technique, we could take ‘cat eyes’ and ‘cat nose’ and give them labels. We could separately maintain a label hierarchy saying that ‘cat eyes’ and ‘cat nose’ are strong indicators that the next layer actually contains a ‘cat’ rather than a ‘toaster’.
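To make the idea concrete, here is a minimal sketch of such a consistency check. All the part and object names below are made up by me for illustration; a real system would get the part labels from a visualization technique like the deconvnet:

```python
# Hypothetical part -> object hierarchy: which intermediate "part" labels
# support each final object label. All names are illustrative only.
HIERARCHY = {
    "cat": {"cat_eye", "cat_nose", "whiskers"},
    "toaster": {"slot", "lever", "dial"},
}

def consistency_score(part_labels, object_label):
    """Fraction of detected part labels that support the predicted object.

    A low score flags a prediction whose intermediate-layer 'concepts'
    disagree with the final class, which is exactly the kind of mismatch
    a pixel-level adversarial perturbation tends to create.
    """
    if not part_labels:
        return 0.0
    supporting = HIERARCHY[object_label]
    return sum(p in supporting for p in part_labels) / len(part_labels)

# A clean image: the detected parts agree with the prediction.
print(consistency_score({"cat_eye", "cat_nose"}, "cat"))      # -> 1.0
# An attacked image: pixels flipped the top label to "toaster",
# but the intermediate parts still look cat-like.
print(consistency_score({"cat_eye", "cat_nose"}, "toaster"))  # -> 0.0
```

The point is just that the final label has to answer to the evidence below it, instead of being the only thing mapped to a name.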

I’m talking about moving from pixel-level manipulation to better conceptual knowledge representation.

Conceptual knowledge representation is being explored in other areas as well, for example in RL for game play.

If a model conceptually understood the image, an adversarial attack done purely through pixel manipulation should be far harder to pull off.

Heck, we could even take this a step further and map the learned label hierarchy to word2vec embeddings. We should be able to get synonyms and hypernyms, and build a far more robust, attack-resistant system.
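A toy sketch of what the word2vec step could look like. The embedding vectors below are hand-made stand-ins, not real word2vec output; the check is simply that a predicted label should sit near the label suggested by the lower-layer evidence in embedding space:

```python
import math

# Hand-made 3-d vectors standing in for word2vec embeddings,
# purely illustrative, not trained vectors.
EMB = {
    "cat":     [0.90, 0.10, 0.00],
    "kitten":  [0.85, 0.15, 0.05],
    "toaster": [0.00, 0.10, 0.95],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def semantically_plausible(predicted, evidence_label, threshold=0.8):
    """Accept a prediction only if it lies near the label suggested by
    the intermediate-layer evidence in embedding space."""
    return cosine(EMB[predicted], EMB[evidence_label]) >= threshold

print(semantically_plausible("kitten", "cat"))   # True: near-synonyms
print(semantically_plausible("toaster", "cat"))  # False: far apart
```

With real embeddings, synonyms and hypernyms of the evidence label would pass the threshold for free, which is the appeal of the idea.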

Let me know what you guys think…thanks for your time!


Since writing the post above, I’ve been following research in this area and have come across some fascinating work. Here are some links that others interested in this topic might find useful:

  1. Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning by Nicolas Papernot and Patrick McDaniel: They map the labels ‘y’ to the representations learned in each layer of the network, then set up a boundary using fast approximate nearest neighbors (see lecture 10 for more info) to check that the model doesn’t veer off too much at inference time.

  2. Retrieval-Augmented Convolutional Neural Networks for Improved Robustness against Adversarial Examples by Jake Zhao and Kyunghyun Cho: The authors use a similar nearest-neighbors approach but construct a prior that pressures the model to stay within semantic boundaries, which yields a model more robust than a vanilla CNN.
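Both papers share the same core move: compare a test input’s hidden representations against those of the training set, and distrust predictions whose neighbors disagree. Here is a tiny single-layer sketch of that check with synthetic features (nothing below comes from either paper’s actual code; the real Deep k-NN aggregates this signal over every layer and calibrates it on held-out data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for one hidden layer's representations of a small
# two-class training set. In the papers these come from the trained
# network; here they are just two well-separated Gaussian clusters.
n_per_class, dim, k = 20, 8, 5
layer_reps = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(n_per_class, dim)),  # class 0
    rng.normal(loc=2.0, scale=0.3, size=(n_per_class, dim)),  # class 1
])
train_labels = np.array([0] * n_per_class + [1] * n_per_class)

def knn_agreement(x, predicted_class):
    """Fraction of the k nearest training points (in this layer's
    representation space) that share the model's predicted label."""
    dists = np.linalg.norm(layer_reps - x, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]
    return float(np.mean(nearest == predicted_class))

clean = np.zeros(dim)           # sits inside the class-0 cluster
print(knn_agreement(clean, 0))  # -> 1.0: neighbors back the prediction
print(knn_agreement(clean, 1))  # -> 0.0: this prediction would be flagged
```

An adversarial example that flips the final softmax usually cannot drag every layer’s representation into the wrong cluster at once, which is what makes this kind of agreement score a useful robustness signal.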
