Created my first image classifier & issue with Colab not adhering to "seeding"

I created a simple image classifier, and with limited data (~800 images) I got the error rate under 5%. All the attempts are in Colab notebooks; see my blog post about it: Fly a drone with: Image classification – dronelab.dev (and no, I didn’t use it to fly my drone, yet!).

Something I found very annoying is that sequential training runs, without changing anything, produced different results. The differences were substantial: sometimes there was overfitting, other times not.

I used all the tricks (I think) to seed every random generator:

splitter=RandomSplitter(valid_pct=0.2, seed=42)

and,

import os
import random

import numpy as np
import torch

def seed_everything(seed):
    random.seed(seed)                          # Python's built-in RNG
    os.environ['PYTHONHASHSEED'] = str(seed)   # hash randomization
    np.random.seed(seed)                       # NumPy RNG
    torch.manual_seed(seed)                    # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)           # all CUDA devices (covers manual_seed too)
    torch.backends.cudnn.deterministic = True  # deterministic cuDNN kernels

seed_everything(42)
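One thing the function above doesn't cover is DataLoader shuffling and worker processes, which have their own RNG state. Following the pattern from PyTorch's reproducibility notes, a sketch (the dataset here is a toy stand-in):

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Each DataLoader worker process gets its own derived seed,
    # so NumPy/random calls inside workers are reproducible too.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

# A fixed generator makes the shuffle order itself reproducible.
g = torch.Generator()
g.manual_seed(42)

dataset = TensorDataset(torch.arange(10.0))  # toy stand-in dataset
loader = DataLoader(dataset, batch_size=2, shuffle=True,
                    worker_init_fn=seed_worker, generator=g)
```

With the same generator seed, two fresh loaders iterate the data in the same shuffled order.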

Colab now has Gemini built in, which is great, so I asked it about this, and this was the answer:

You’ve done a good job providing a seed for reproducibility with seed_everything(42). However, even with careful seeding, variations in results can sometimes occur in deep learning training for a few reasons, especially within an environment like Colab:

  • GPU Non-Determinism: While you’ve set torch.backends.cudnn.deterministic = True, not all operations on the GPU are guaranteed to be completely deterministic across different runs or hardware configurations. Some highly optimized kernels might involve non-deterministic algorithms for performance reasons.

  • Background Processes in Colab: Colab is a shared environment. While unlikely to cause significant variations, very minor background processes or resource allocation differences could theoretically have a minuscule impact on timing and execution flow, though this is generally not the primary cause of noticeable differences.
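On the GPU non-determinism point: PyTorch can go further than the cuDNN flag and refuse non-deterministic kernels entirely. A sketch of the relevant settings (note that on CUDA some ops additionally require the `CUBLAS_WORKSPACE_CONFIG` environment variable to be set):

```python
import torch

# Use only deterministic kernels; ops without a deterministic
# implementation raise an error instead of silently producing
# run-to-run differences.
torch.use_deterministic_algorithms(True)

# Disable the cuDNN autotuner, which may pick different
# convolution algorithms on different runs, and force
# deterministic cuDNN kernels.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
```

The error-on-nondeterminism behavior is useful for debugging: it tells you exactly which op in your model is the source of the variation.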

How can we work with this? Between experiments I expect everything to remain the same except the one thing I want to experiment with.

I hope there is a solution for this?

I found the solution, and I blogged about it here: Getting reproducible training results with Fast.ai + PyTorch – dronelab.dev
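I won't spoil the whole post, but one pattern that helps regardless: re-seed immediately before each training run, not just once at the top of the notebook. Any cell executed in between (augmentation previews, show_batch, etc.) advances the RNG state, so two "identical" runs can start from different states. A minimal, runnable sketch (the lr loop is a hypothetical experiment variable, and torch.rand stands in for weight init / batch order):

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True

# Re-seeding inside the loop puts every experiment at the same RNG
# state, so only the variable under test differs between runs.
for lr in (1e-3, 1e-2):
    seed_everything(42)
    noise = torch.rand(3)  # same values on every iteration
```

In a real fastai notebook you would also rebuild the DataLoaders and Learner after seeding, since their construction consumes random numbers.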

Hello!
Thanks for the info, I will try to figure it out further.