I’m trying to implement a continuous training loop so that we can take in new images and retrain the model, getting better and better over time.
At the moment we are working with 3 folders:
/train
/val
/test
My question is: when the new images arrive, would it be better to move the old /test images into /train and /val and use the new set of images as the /test set? Or keep the /test images as /test and split the new images across /train, /val, and /test?
We have a similar workflow, where we keep getting new data for our speech recognition models.
I would suggest focusing on maintaining the same distribution across the train/val/test splits. So when you have new data, split it and add it to the existing corpus, maintaining the same ratio of splits.
From there, retrain the model and evaluate on the test set. You might want to analyze how the model does on the new images in the test set, etc. One thing to note here: never mix data between train and test, so don’t move old train data into the test set if you are going to compare the old models with the newly trained models.
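For what it’s worth, here is a rough sketch of what that splitting step could look like. The folder paths, the 80/10/10 ratio, and the `split_new_images` name are just placeholders for illustration:

```python
import random
import shutil
from pathlib import Path

def split_new_images(new_dir, dest_root, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle the newly arrived images and append them to the existing
    train/val/test folders, preserving the given split ratio."""
    files = sorted(Path(new_dir).glob("*.jpg"))
    random.Random(seed).shuffle(files)  # fixed seed so the split is reproducible
    n_train = int(len(files) * ratios[0])
    n_val = int(len(files) * ratios[1])
    splits = {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],
    }
    for split, split_files in splits.items():
        out = Path(dest_root) / split
        out.mkdir(parents=True, exist_ok=True)
        for f in split_files:
            shutil.copy2(f, out / f.name)

split_new_images("/incoming", "/data")  # hypothetical paths
```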
This is just a suggestion; like everything else in deep learning, I think there will be some testing and poking around before you end up with a good recipe.
As suggested by @diptanuc, maintain the same distribution, i.e. if you’re doing an 80/10/10 split, apply the same split to the new images before adding them to the whole set. The drawback is that it becomes slightly complicated to compare older models with new models. This is what I would recommend.
Is there a time component to your images and predictions that makes you want to put the newest images in test? If that’s the case, then I think it makes sense to do what you have described: add the newest images to test and slowly move the older ones over to the val/train sets. This will still make comparison between older and newer models difficult.
Create a fixed, manually curated test set with particular qualities (image style, size, time, etc.), and always hold it out as a test. Periodically update this test set as the characteristics of your model and your performance criteria change. This allows you to compare old and new models.
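If you go with the fixed test set, it can also help to guard against it leaking into training between cycles. A minimal sketch of one way to do that, assuming flat image folders (the `check_no_leakage` name and the hashing approach are just an illustration):

```python
import hashlib
from pathlib import Path

def file_hash(path):
    """Hash file contents so renamed duplicates are still caught."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def check_no_leakage(test_dir, train_dir):
    """Fail loudly if any frozen test image also appears in train."""
    test_hashes = {file_hash(p) for p in Path(test_dir).iterdir() if p.is_file()}
    leaked = [p.name for p in Path(train_dir).iterdir()
              if p.is_file() and file_hash(p) in test_hashes]
    if leaked:
        raise ValueError(f"{len(leaked)} test images leaked into train: {leaked[:5]}")

check_no_leakage("/data/test", "/data/train")  # hypothetical paths
```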
I usually keep metadata about the data (like the date of ingestion) in one of the columns, so even though newer data is split up and lands in the existing dataset, I can get test accuracy on newer data, or on other attributes, just by changing the query.
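Roughly what that query looks like with pandas, assuming a hypothetical results.csv holding one row per test image (the file name, column names, and cutoff date are made up for illustration):

```python
import pandas as pd

# One row per test image: filename, ingested_on, label, prediction
df = pd.read_csv("results.csv", parse_dates=["ingested_on"])

# Overall test accuracy
overall = (df["label"] == df["prediction"]).mean()

# Accuracy restricted to images ingested after the last retraining cycle
recent = df[df["ingested_on"] > "2021-01-01"]
recent_acc = (recent["label"] == recent["prediction"]).mean()

print(f"overall: {overall:.3f}, recent: {recent_acc:.3f}")
```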
We don’t really have a time component for the images. The only problem is that we have a lot of variation in the images. We are currently working with blister packs, and both the pill and the blister can come in a lot of colors.