I was having some trouble understanding the interactions between backend variables and the BFGS optimizer, and reading the code wasn’t helping. I decided to code something from scratch to get my ideas straight. Unfortunately, I have not managed to get my code working, so I am asking for your help.
I tried to implement the ‘fast gradient sign method’ from the paper Explaining and Harnessing Adversarial Examples. The goal of this algorithm is to make changes to an image that are imperceptible to the naked eye but still fool the classifier. From what I understand, the algorithm is pretty simple:
- take an image I and classify it using the neural network
- compute the cross entropy loss with the wrong class
- take the gradient of this loss with respect to the pixels of I
- compute the sign of the gradient value for each pixel
- multiply this gradient sign matrix by a very small number epsilon; this is our perturbation P
- add P to I
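The steps above can be sketched in plain NumPy on a toy model. This is only an illustration under stated assumptions, not the notebook's code: the "network" is a hypothetical linear softmax classifier (`W`, `b` are made-up weights), so the gradient with respect to the input can be written by hand. One point worth checking against the paper: there the loss is computed with the true label and the perturbation is added, which increases that loss. If the loss is instead computed with a chosen wrong class, the sign of the step would normally be flipped (subtracting the perturbation) so that the loss toward that class decreases.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax for a 1-D vector of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss_grad_wrt_input(W, b, x, y_onehot):
    # Toy model (assumption): logits = W @ x + b.
    # For cross entropy on softmax outputs, dL/dlogits = p - y,
    # so dL/dx = W.T @ (p - y).
    p = softmax(W @ x + b)
    return W.T @ (p - y_onehot)

def fgsm(W, b, x, y_true_onehot, eps):
    # Untargeted FGSM: step in the direction that increases the
    # loss computed with the true label.
    grad = loss_grad_wrt_input(W, b, x, y_true_onehot)
    return x + eps * np.sign(grad)

# Hypothetical data: 3 classes, a 16-"pixel" input.
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(3, 16))
b = np.zeros(3)
x = rng.normal(size=16)

label = int(np.argmax(W @ x + b))   # class currently predicted
y = np.eye(3)[label]

x_adv = fgsm(W, b, x, y, eps=0.1)

loss_before = -np.log(softmax(W @ x + b)[label])
loss_after = -np.log(softmax(W @ x_adv + b)[label])
print(loss_before, loss_after)
```

With a real Keras model the gradient would come from the backend rather than a hand-written formula, but the update rule and the choice of label fed to the loss are the same.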
My notebook implementation is on my GitHub page: Adversarial-examples
I managed to do all the steps, but the model still does not make any mistakes. There are a few things that I didn’t manage to figure out:
- The documentation of keras.metrics.categorical_crossentropy states that the first argument should be the ground truth and the second one the predictions, but if I call it this way, all my gradients are 0.
- I have to take big values of epsilon, otherwise nothing changes. I think this might have to do with the preprocessing step.
- When I try to plt.imshow my modified picture, it appears as a negative, so I have to call plt.imshow(256 - modified_array).
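Two of the points above may come down to pixel scale. The following is a minimal sketch under assumptions, since the notebook's preprocessing is not shown here. It assumes the model input lives on a 0 to 255 scale, in which case an epsilon such as 0.007 (the value used in the paper's ImageNet example, defined for inputs in [0, 1]) has to be rescaled before it has a comparable effect. It also relies on the fact that plt.imshow interprets float RGB arrays as lying in [0, 1] and integer arrays as lying in [0, 255], so a float array holding values up to 255 will not display as intended. The helper names `eps_in_pixel_units` and `to_displayable` are hypothetical.

```python
import numpy as np

def eps_in_pixel_units(eps_unit_range, pixel_max=255.0):
    # Convert an epsilon defined for inputs in [0, 1] to the
    # equivalent step on a [0, pixel_max] scale.
    return eps_unit_range * pixel_max

def to_displayable(arr):
    # Round, clip to the valid 8-bit range and cast to uint8 so that
    # plt.imshow treats the values as 0..255 intensities.
    return np.clip(np.rint(arr), 0, 255).astype(np.uint8)

eps_255 = eps_in_pixel_units(0.007)

# Hypothetical perturbed pixel values, some outside the valid range.
image = np.array([[-3.2, 0.0, 127.6],
                  [255.0, 260.4, 12.3]])
shown = to_displayable(image)
print(eps_255)
print(shown)
```

If the preprocessing also subtracts a per-channel mean or reorders the channels (as some pretrained-model preprocessing functions do), that would have to be undone as well before displaying; this sketch does not attempt it.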
My final goal was to build a transformation network to create adversarial samples really quickly (even if the method described earlier is fast).
Does anyone have an idea of how to fix my code?