I am working on an image-to-image translation project with various style transfer GANs (CycleGAN, MUNIT, StarGAN, and DiscoGAN). I am using the handbags2shoes data from the DiscoGAN paper and I am not yet pleased with the results. I get good results for some individual shoes or handbags, but a large portion of the outputs are still semi-shoe/bag-shaped blobs.
Has anyone had any success tuning any training hyper-parameters to optimize the performance of these models?
So far I have tried the following:
Deeper generator architectures (any tips here would be appreciated),
Expanding my dataset by scraping shoes and bags that are oriented the same way as in the original dataset (around 150,000 images for each category),
Replacing my transposed convolutions with upsampling operations to remove the checkerboard patterns (https://distill.pub/2016/deconv-checkerboard/)
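For anyone wondering why the last change helps, here is a minimal sketch of the uneven-overlap argument from the linked Distill article, reduced to 1-D and written in plain Python. The function names are hypothetical and this is not taken from any of the repos mentioned; it only counts how many kernel taps contribute to each output position, which is a rough proxy for where checkerboard artifacts can show up.

```python
def transposed_conv_overlap(in_len, kernel, stride):
    """Count kernel taps landing on each output position of a 1-D
    transposed convolution without padding (illustrative helper)."""
    out_len = (in_len - 1) * stride + kernel
    counts = [0] * out_len
    for i in range(in_len):          # each input element...
        for k in range(kernel):      # ...paints `kernel` output positions
            counts[i * stride + k] += 1
    return counts


def resize_conv_overlap(in_len, kernel, scale):
    """Count in-bounds taps per output position for nearest-neighbour
    upsampling followed by a stride-1 convolution with zero 'same'
    padding (assumes an odd kernel size; illustrative helper)."""
    up_len = in_len * scale
    pad = kernel // 2
    counts = []
    for j in range(up_len):
        taps = [t for t in range(j - pad, j + pad + 1) if 0 <= t < up_len]
        counts.append(len(taps))
    return counts


if __name__ == "__main__":
    # Kernel size 3 is not divisible by stride 2: overlap alternates.
    print(transposed_conv_overlap(4, kernel=3, stride=2))
    # -> [1, 1, 2, 1, 2, 1, 2, 1, 1]

    # Kernel size 4 is divisible by stride 2: interior overlap is even.
    print(transposed_conv_overlap(4, kernel=4, stride=2))
    # -> [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]

    # Upsample x2, then a stride-1 conv: even everywhere except the borders.
    print(resize_conv_overlap(4, kernel=3, scale=2))
    # -> [2, 3, 3, 3, 3, 3, 3, 2]
```

In PyTorch terms this corresponds to swapping an nn.ConvTranspose2d layer for nn.Upsample followed by nn.Conv2d. If transposed convolutions are kept instead, choosing a kernel size divisible by the stride is the other mitigation the article suggests, although it does not guarantee artifact-free output.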
Since there are a lot of potential changes that could be made, and evaluating performance is less straightforward than checking the loss functions, any help would be greatly appreciated!
All of my current implementations start from the official GitHub repos and take their training params (as outlined in their respective publications).