When you train a critic, it uses an AdaptiveLoss wrapping a BCEWithLogitsLoss.
This is because the critic model outputs a tensor whose size depends on the input size.
For example, an input tensor of shape [4, 3, 128, 128] produces an output of shape [4, 25].
Here is the code for the model:
```python
def gan_critic(n_channels=3, nf=128, n_blocks=3, p=0.15):
    "Critic to train a `GAN`."
    layers = [
        _conv(n_channels, nf, ks=4, stride=2),
        nn.Dropout2d(p/2),
        res_block(nf, dense=True, **_conv_args)]
    nf *= 2  # after dense block
    for i in range(n_blocks):
        layers += [
            nn.Dropout2d(p),
            _conv(nf, nf*2, ks=4, stride=2, self_attention=(i==0))]
        nf *= 2
    layers += [
        _conv(nf, 1, ks=4, bias=False, padding=0, use_activ=False),
        Flatten()]
    return nn.Sequential(*layers)
```
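To see where [4, 25] comes from: if the strided `_conv` layers pad with 1 (my assumption, since the snippet doesn't show `_conv`'s default padding), a 128×128 input is halved four times to 8×8 (the `res_block` preserves size), and the final unpadded ks=4 convolution reduces it to a 5×5 grid of per-patch logits, which `Flatten` turns into 25 values per image. A quick sanity check of that arithmetic:

```python
# Standard conv output size: floor((size + 2*pad - ks) / stride) + 1
def conv_out(size, ks, stride, pad):
    return (size + 2 * pad - ks) // stride + 1

size = 128
size = conv_out(size, ks=4, stride=2, pad=1)    # first _conv -> 64 (res_block keeps size)
for _ in range(3):                              # the n_blocks=3 strided convs
    size = conv_out(size, ks=4, stride=2, pad=1)  # 32, 16, 8
size = conv_out(size, ks=4, stride=1, pad=0)    # final conv with padding=0 -> 5
print(size, size * size)                        # 5 25
```

So each of the 25 outputs is a real/fake score for one spatial patch of the image, not a single score per image.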
Later, the AdaptiveLoss expands the target to match the output size. For example, a target of shape [4] such as [1, 0, 1, 0], corresponding to [real_img, gen_img, real_img, gen_img], is expanded to shape [4, 25], which looks like:

[[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
 [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
 [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]]

and the BCEWithLogitsLoss is then taken against that. Why not just have gan_critic output something of shape [4, 1] and take a non-adaptive loss of it? What benefit does the current approach have over the other?
Here is the AdaptiveLoss code:

```python
class AdaptiveLoss(Module):
    "Expand the `target` to match the `output` size before applying `crit`."
    def __init__(self, crit): self.crit = crit
    def forward(self, output, target):
        return self.crit(output, target[:, None].expand_as(output).float())
```