When you train a critic, it uses an AdaptiveLoss wrapping a BCEWithLogitsLoss.
This is because the critic model outputs a tensor whose size depends on the input size.
For example, an input tensor of shape [4, 3, 128, 128] produces an output of shape [4, 25].
Here is the code for the model:
```python
def gan_critic(n_channels=3, nf=128, n_blocks=3, p=0.15):
    "Critic to train a `GAN`."
    layers = [
        _conv(n_channels, nf, ks=4, stride=2),
        nn.Dropout2d(p/2),
        res_block(nf, dense=True, **_conv_args)]
    nf *= 2  # after dense block
    for i in range(n_blocks):
        layers += [
            nn.Dropout2d(p),
            _conv(nf, nf*2, ks=4, stride=2, self_attention=(i==0))]
        nf *= 2
    layers += [
        _conv(nf, 1, ks=4, bias=False, padding=0, use_activ=False),
        Flatten()]
    return nn.Sequential(*layers)
```
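To see where [4, 25] comes from: if the strided `_conv` layers pad with 1 (my assumption, since the snippet doesn't show `_conv`'s default padding), a 128×128 input is halved four times to 8×8 (the `res_block` preserves size), and the final unpadded ks=4 convolution reduces it to a 5×5 grid of per-patch logits, which `Flatten` turns into 25 values per image. A quick sanity check of that arithmetic:

```python
# Standard conv output size: floor((size + 2*pad - ks) / stride) + 1
def conv_out(size, ks, stride, pad):
    return (size + 2 * pad - ks) // stride + 1

size = 128
size = conv_out(size, ks=4, stride=2, pad=1)    # first _conv -> 64 (res_block keeps size)
for _ in range(3):                              # the n_blocks=3 strided convs
    size = conv_out(size, ks=4, stride=2, pad=1)  # 32, 16, 8
size = conv_out(size, ks=4, stride=1, pad=0)    # final conv with padding=0 -> 5
print(size, size * size)                        # 5 25
```

So each of the 25 outputs is a real/fake score for one spatial patch of the image, not a single score per image.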
Later, the AdaptiveLoss expands the target to match the output size. For example, a target of shape [4] such as [1, 0, 1, 0], corresponding to [real_img, gen_img, real_img, gen_img], is expanded to shape [4, 25], which looks like:

[[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
 [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
 [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]]

and the BCEWithLogitsLoss is then taken against that. Why not just have gan_critic output something of shape [4, 1] and take a non-adaptive loss of it? What benefit does the current approach have over the other?
Here is the AdaptiveLoss code:

```python
class AdaptiveLoss(Module):
    "Expand the `target` to match the `output` size before applying `crit`."
    def __init__(self, crit): self.crit = crit
    def forward(self, output, target):
        return self.crit(output, target[:, None].expand_as(output).float())
```