The image counts in photo_jpg vs. monet_jpg are imbalanced: photo_jpg has more images.
The images are unpaired, and each has a unique name.
In the DataBlock, I want to maintain two pools of image files (monet and photo) without truncating either one. In each epoch, I want to randomly pair them up (with a different pairing in each epoch) and train the CycleGAN model. Ideally, I would like to do this at the DataBlock level without resorting to the mid-level data API.
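The pairing behavior described above can be sketched independently of any fastai API: rebuild the pairing once per epoch, use every photo exactly once, and shuffle and cycle the smaller Monet pool so that no Monet image repeats until all of them have been used. This is a minimal plain-Python sketch assuming the file lists are already loaded; `epoch_pairs` is a hypothetical helper name, not part of fastai, and wiring it into a DataBlock is left open.

```python
import random

def epoch_pairs(files_a, files_b, rng):
    """Pair every file in files_a with a file from files_b for one epoch.

    files_a is never truncated: each of its files appears exactly once.
    files_b is shuffled and cycled, so within an epoch no B file repeats
    until every B file has been used. If files_b is the larger pool, only
    len(files_a) of its files are used in a given epoch (a different
    random subset each time).
    """
    if not files_b:
        raise ValueError("files_b must not be empty")
    partners = []
    while len(partners) < len(files_a):
        pool = list(files_b)
        rng.shuffle(pool)             # fresh random order for each pass over B
        partners.extend(pool)
    return list(zip(files_a, partners))  # zip stops after len(files_a) pairs

# Usage: one seeded generator, re-paired at the start of every epoch
photos = [f"photo_{i}.jpg" for i in range(7)]
monets = [f"monet_{i}.jpg" for i in range(3)]
rng = random.Random(0)
for epoch in range(2):
    pairs = epoch_pairs(photos, monets, rng)
    print(epoch, pairs[:3])
```

Because the same generator is reused across epochs, each call produces a different pairing while the whole run stays reproducible from one seed.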
I started watching your CycleGAN video and really appreciate it. I’m hoping to get more into GANs and will continue watching the video once I have a better understanding of how GANs work, but since the fastai course doesn’t cover them, I was wondering if you have any advice. If you don’t mind my asking, did you get your initial understanding of GANs from fastai, or did you use other resources?
Yes, the current fastai lectures do not cover GANs.
I knew the overall concept of GANs before, probably from some online article. The basics aren’t very hard: the generator tries to fool the discriminator with fake images, while the discriminator tries to tell real images from fakes.
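The adversarial game described above is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014). Here D(x) is the discriminator's estimated probability that x is real, G(z) is the generator's output for a noise vector z, and p_data and p_z are the data and noise distributions:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to increase this value and G to decrease it. In practice, G is often trained to maximize log D(G(z)) instead of minimizing log(1 - D(G(z))), because that gives stronger gradients early in training, when D easily rejects the fakes.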
IIRC, the 2019 Part 1 course actually had a lecture on GANs, which was pretty good and gave me a better understanding. I think the 2018 version also has one, but I haven’t taken that version of the course.
Regarding CycleGANs specifically, reading the original paper really helped me, as did studying the authors’ very well-documented PyTorch implementation and fastai v1’s implementation. Whenever I had issues or questions, searching the GitHub repo of the PyTorch implementation often turned up answers, since many people ask similar questions and the original authors answer them.
I hope this helps and let me know if you have any other questions!
Thanks for the pointer! The get_dls in your repo is the reference I was looking for!
You use Datasets instead of DataBlock to build the data batches. RandPair(filesB) is used to randomly pair each image with an image from domain B.
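Whatever its exact implementation in the repo, a transform like RandPair can ignore its input (the domain-A file) and return a randomly chosen domain-B file, so the pairing is re-drawn on every access and therefore differs from epoch to epoch. The sketch below is a hypothetical plain-Python stand-in (`RandPairSketch`, not the repo's class) that shows only that logic.

```python
import random

class RandPairSketch:
    """Hypothetical stand-in for a RandPair-style transform.

    Called with a domain-A item, it ignores that item and returns a random
    domain-B item, so the A/B pairing is re-drawn on every access.
    """
    def __init__(self, items_b, rng=None):
        self.items_b = list(items_b)
        self.rng = rng or random.Random()

    def __call__(self, item_a):
        # item_a is deliberately unused: the partner does not depend on it
        return self.rng.choice(self.items_b)

pair = RandPairSketch(["monet_0.jpg", "monet_1.jpg"], rng=random.Random(1))
print(pair("photo_0.jpg"))  # one of the monet_*.jpg files
```

Sampling is with replacement, so within a single epoch some domain-B images can appear several times and others not at all.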
I also found this Siamese tutorial quite relevant. For now, I have tried to build my fastai data from a torch.utils.data.Dataset following this tutorial. However, that approach disables show_batch/show_results.
I would like to go a step further and build the same thing in a more fastai way (e.g. with the mid-level API), and I will take your get_dls as a good reference! It seems get_dls currently doesn’t have a train/valid split. I would like a different pairing scheme for the train and valid sets (e.g. in the valid set, each image in A is always paired with the same image in B).
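One way to get the valid-set behavior described above is to draw the valid pairing once with a fixed seed and look it up on every access, while training items keep drawing a random partner. This is a minimal plain-Python sketch; `split_and_pair`, `get_partner`, and the `valid_pct`/`seed` parameters are hypothetical names chosen for illustration, not fastai API.

```python
import random

def split_and_pair(files_a, files_b, valid_pct=0.2, seed=42):
    """Split domain A into train/valid and fix one B partner per valid item.

    Returns (train_a, valid_pairs). Training items stay unpaired so they can
    be re-paired randomly at access time; each valid item gets a partner
    chosen once with a seeded RNG, so it is identical in every epoch.
    """
    rng = random.Random(seed)
    shuffled = list(files_a)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    valid_a, train_a = shuffled[:n_valid], shuffled[n_valid:]
    valid_pairs = {a: rng.choice(files_b) for a in valid_a}
    return train_a, valid_pairs

def get_partner(item_a, files_b, valid_pairs, rng):
    """Fixed partner for valid items, fresh random partner for training items."""
    if item_a in valid_pairs:
        return valid_pairs[item_a]
    return rng.choice(files_b)

photos = [f"photo_{i}.jpg" for i in range(10)]
monets = [f"monet_{i}.jpg" for i in range(4)]
train_a, valid_pairs = split_and_pair(photos, monets)
print(len(train_a), valid_pairs)
```

Computing the valid mapping up front, rather than reseeding inside each lookup, keeps the valid pairs fixed no matter how often or in what order the valid set is iterated.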
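```python
import random
from fastai.vision.all import *

class RandPair(Transform):
    "Assumed minimal version (the repo ships its own): ignore the A item, return a random B file"
    def __init__(self, itemsB): self.itemsB = itemsB
    def encodes(self, i): return random.choice(self.itemsB)

def get_dls(pathA, pathB, num_A=None, num_B=None, load_size=512, crop_size=256, bs=4, num_workers=2):
    """Given image files from two domains (`pathA`, `pathB`), create a `DataLoaders` object.
    Loading and random-crop sizes are set by `load_size` and `crop_size` (defaults 512 and 256).
    Batch size is specified by `bs` (default=4)."""
    filesA = get_image_files(pathA)
    filesB = get_image_files(pathB)
    filesA = filesA[:min(ifnone(num_A, len(filesA)), len(filesA))]  # optionally keep only the first num_A files
    filesB = filesB[:min(ifnone(num_B, len(filesB)), len(filesB))]  # optionally keep only the first num_B files
    # Domain A loads the image itself; domain B swaps in a random file from filesB, then loads it.
    # splits=None: no train/valid split is made here.
    dsets = Datasets(filesA, tfms=[[PILImage.create, ToTensor, Resize(load_size), RandomCrop(crop_size)],
                                   [RandPair(filesB), PILImage.create, ToTensor, Resize(load_size), RandomCrop(crop_size)]],
                     splits=None)
    # Convert to float, normalize to [-1, 1], random horizontal flip
    batch_tfms = [IntToFloatTensor, Normalize.from_stats(mean=0.5, std=0.5), FlipItem(p=0.5)]
    dls = dsets.dataloaders(bs=bs, num_workers=num_workers, after_batch=batch_tfms)
    return dls
```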
I think fastai2 currently has few examples of GAN training. In terms of data structure, there are many variations that need to be covered (e.g. two paired real image sets, two unpaired real image sets, a real image set plus a noise set, etc.).
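To make the variations above concrete, here is a hedged plain-Python sketch of how a single item might be formed in each case. It assumes that paired sets are matched by file stem and that the noise input is a Gaussian latent vector of hypothetical size `nz`. The helper names are illustrative only.

```python
import random
from pathlib import Path

def paired_items(files_a, files_b):
    """Paired sets: match A and B files that share a stem (name without extension)."""
    by_stem = {Path(f).stem: f for f in files_b}
    return [(a, by_stem[Path(a).stem]) for a in files_a if Path(a).stem in by_stem]

def unpaired_item(item_a, files_b, rng):
    """Unpaired sets: pair an A file with a randomly drawn B file."""
    return (item_a, rng.choice(files_b))

def noise_item(item_real, rng, nz=100):
    """Real set + noise: pair a real image with a fresh Gaussian latent vector."""
    return (item_real, [rng.gauss(0.0, 1.0) for _ in range(nz)])

rng = random.Random(0)
print(paired_items(["a/x.jpg", "a/y.jpg"], ["b/y.png", "b/x.png"]))
print(unpaired_item("a/x.jpg", ["b/1.jpg", "b/2.jpg"], rng))
print(len(noise_item("a/x.jpg", rng, nz=8)[1]))  # 8
```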
Given the significant changes in fastai2, particularly to the data API, the v1 examples may not translate easily to v2, especially regarding how data batches should be built.