Hi everyone! I’ve been working my way through the fastai course and went off on a bit of a side track: experimenting with whether I could improve my results in the Paddy Rice Kaggle competition using synthetic training data generated with Stable Diffusion.
I trained a LoRA on the real Tungro grass images. To get some control over the generated output, I first clustered the real images, and when feeding each image into LoRA training, I appended its cluster name to that image’s caption.
For those who don’t know, when you train a LoRA for Stable Diffusion, you supply a caption for each training image, like “A picture of grass in a field, close up, muddy water, highly detailed veins, etc.” These captions give you good control over the images you later generate with the LoRA.
So, I decided to put the cluster number of each grouped real image into its caption. The hope was that, having trained on “cluster1, cluster2, etc.,” I could then generate images that blend clusters by using prompts like “cluster1 cluster18, tungro grass disease…”.
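Roughly, the clustering and captioning step looked like this (a simplified sketch, not my exact pipeline; the folder, the backbone, and the cluster count are all illustrative):

```python
from pathlib import Path
import torch
from torchvision import models, transforms
from PIL import Image
from sklearn.cluster import KMeans

path = Path('tungro_images')   # folder of real Tungro photos (illustrative)
files = sorted(path.glob('*.jpg'))

# Pretrained ResNet as a feature extractor: drop the classification head
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Identity()
model.eval()

tfm = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

with torch.no_grad():
    feats = torch.stack([model(tfm(Image.open(f).convert('RGB'))[None])[0] for f in files])

# Cluster the embeddings (20 clusters is illustrative)
labels = KMeans(n_clusters=20, random_state=42).fit_predict(feats.numpy())

# Most LoRA trainers (kohya_ss, for example) read a .txt caption file next to
# each image, so write the cluster token into that caption
for f, c in zip(files, labels):
    f.with_suffix('.txt').write_text(f'cluster{c}, tungro grass disease, photo of rice leaves')
```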
I found that “cluster1,” “cluster2,” etc. became very strong embeddings even with a small number of images per cluster, far stronger than I had anticipated. This gave me a good degree of control over the synthetic images I generated.
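Generation then just means loading the LoRA on top of a base SD 1.5 checkpoint and mixing cluster tokens in the prompt. With the diffusers library it looks something like this (model ID, LoRA path, and prompt are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

# Base SD 1.5 checkpoint plus the trained cluster LoRA (paths illustrative)
pipe = StableDiffusionPipeline.from_pretrained(
    'runwayml/stable-diffusion-v1-5', torch_dtype=torch.float16,
).to('cuda')
pipe.load_lora_weights('lora/tungro_clusters.safetensors')

# Blending two cluster tokens in one prompt interpolates between the groups
prompt = 'cluster1 cluster18, tungro grass disease, photo of rice leaves in a field'
images = pipe(prompt, num_inference_steps=30, num_images_per_prompt=4).images
for i, im in enumerate(images):
    im.save(f'synthetic_tungro_{i:04d}.png')
```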
I spent some time thinking about how to compare synthetic + real images vs. real images alone in training. What I ended up with was Jeremy’s “road to the top” notebook, but with his split replaced by a GrandparentSplitter so that both runs use exactly the same validation folder:
```python
def train(arch, size, path, train='train', valid='valid',
          item=Resize(480, method='squish'), finetune=True, epochs=10, accum=1):
    # from_folder builds a GrandparentSplitter(train_name=train, valid_name=valid)
    # internally, so passing the folder names gives a fixed train/valid split
    dls = ImageDataLoaders.from_folder(
        path, train=train, valid=valid,
        item_tfms=item,
        batch_tfms=aug_transforms(size=size, min_scale=0.75),
        bs=64 // accum,
    )
    # Accumulate gradients so the effective batch size stays at 64
    cbs = GradientAccumulation(64) if accum else []
    learn = vision_learner(dls, arch, metrics=error_rate, cbs=cbs).to_fp16()
    if finetune:
        learn.fine_tune(epochs, 1e-3)  # use a lower learning rate
        # tst_files (the competition test images) is defined earlier in the notebook
        tta_result = learn.tta(dl=dls.test_dl(tst_files))
        return learn, tta_result
    else:
        learn.unfreeze()
        learn.fit_one_cycle(epochs, 1e-3)  # use a lower maximum learning rate
        return learn, None
```
To make the test as fair as possible, I created two training sets that were identical except that one also contained my synthetic Tungro images, and used the same validation and test sets for both. Then I ran each model with the same architectures for 10 epochs.
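The comparison then comes down to two identical calls to train, where only the contents of the training folder differ (folder names are illustrative; the usual fastai notebook imports are assumed to be in scope):

```python
# Same architecture, epochs, and validation/test data for both runs;
# only the train folder contents differ
learn_real, tta_real = train('convnext_large_in22k', 224, Path('data/real_only'))
learn_syn,  tta_syn  = train('convnext_large_in22k', 224, Path('data/real_plus_synthetic'))
```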
My synthetic + real set beats the real-only set by about 0.3% every time I run it, across different architectures, which seems trivial at first glance. However, I only added synthetic data to 1 of the 10 possible categories, and I generated the images with a Stable Diffusion v1.5 model rather than an SDXL one, which should produce considerably better images.
For those interested, the results with convnext_large_in22k, 10 epochs, and TTA were:
With synthetic: 97.580%
Real only: 97.235%
I’m wondering if anyone has suggestions for what I could try next, wants to know more, or would like to get involved?
Cheers
John