Using sum of wrong class probs as loss for adversarial generation

I’m basing this on this paper.

In equation 3 they define the loss as:

When given an input x to the DNN, G is trained to generate outputs that fool F (·) and are inconspicuous by minimizing,

\text{Loss}_G(Z, D) - \kappa \sum_{z \in Z} \text{Loss}_F(x + G(z))

For untargeted attacks, we use:

\text{Loss}_F(x + G(z)) = \sum_{c_i \neq c_x} F_{c_i}(x + G(z)) - F_{c_x}(x + G(z))

where F_c (·) is the DNN’s output for class c.
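In code, I read that untargeted loss as something like this (just a sketch; `F` is assumed to return softmax probabilities of shape `(batch, num_classes)`, and the function name is mine):

```python
import torch

def untargeted_loss_F(F, x, perturbation, target):
    """Loss_F for the untargeted attack: sum of the wrong-class
    probabilities minus the correct-class probability, averaged
    over the mini-batch.

    Assumes F returns class probabilities (e.g. softmax outputs)
    of shape (batch, num_classes); `target` holds the true labels c_x.
    """
    probs = F(x + perturbation)                                # (batch, num_classes)
    correct = probs.gather(1, target.unsqueeze(1)).squeeze(1)  # F_{c_x}
    wrong_sum = probs.sum(dim=1) - correct                     # sum over c_i != c_x
    return (wrong_sum - correct).mean()
```

One thing I noticed: if `F` really outputs softmax probabilities, the wrong-class sum is just `1 - correct`, so the whole expression reduces to `1 - 2 * F_{c_x}(x + G(z))` per example.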

So if I’m reading this right, this is simply the sum of the probabilities of the wrong classes minus the probability of the correct class.

But to implement this, do I simply do a forward pass of the example x through F(·), sum the wrong-class probabilities, and subtract the correct-class probability for each example in the mini-batch? Then call err.backward() on that?
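Concretely, the step I have in mind looks roughly like this (a sketch; `G`, `F`, and `opt_G` are placeholder names, `F` is assumed to output probabilities, and I've dropped the Loss_G(Z, D) term; note the sign, since in equation 3 the generator minimizes −κ · Loss_F, i.e. it should *maximize* Loss_F):

```python
import torch

def generator_attack_step(G, F, opt_G, x, target, z, kappa=1.0):
    """One optimization step for the generator's fooling term only.

    Minimizes -kappa * Loss_F (equivalently, maximizes Loss_F), where
    Loss_F = sum of wrong-class probs minus the correct-class prob.
    G, F, opt_G are placeholders; F is assumed to return probabilities
    of shape (batch, num_classes). The GAN term Loss_G(Z, D) is omitted.
    """
    probs = F(x + G(z))                                        # (batch, num_classes)
    correct = probs.gather(1, target.unsqueeze(1)).squeeze(1)  # F_{c_x}
    loss_F = (probs.sum(dim=1) - 2 * correct).mean()           # (wrong sum) - correct
    loss = -kappa * loss_F                                     # minimize -kappa * Loss_F
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()
```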

I tried this but could not get it to converge, and I’m not sure whether I’m thinking about it wrong or just have the wrong hyperparameters.