I am following the great Road to the top tutorial for computer vision. (If you don’t know about it, it starts here, and there are four parts to it.)
Things are going relatively well (minor problems like not being able to use report_gpu(), I think due to a slightly old CUDA version).
Before setting up an ensemble we are testing different architectures, and I got the first few working, like:
arch = 'convnext_large_in22k'  # accum = 4 ok
learn = train(arch, size=320, accum=4, fine_tune=False, epochs=1, batch_size=64)

arch = 'vit_large_patch16_224'  # accum = 2 ok
learn = train(arch, size=224, accum=2, fine_tune=False, epochs=1, batch_size=64)
But when I try the Swin family of architectures I get an error:
arch = 'swinv2_large_window12_192_22k'  # accum = ?
learn = train(arch, size=192, accum=8, fine_tune=False, epochs=1, batch_size=64)
RuntimeError: running_mean should contain 12 elements not 3072

arch = 'swin_large_patch4_window7_224'  # accum = ?
learn = train(arch, size=224, accum=8, fine_tune=False, epochs=1, batch_size=64)
RuntimeError: running_mean should contain 14 elements not 3072
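For what it's worth, the numbers in the two errors may not be random. Below is a small sketch of the shape arithmetic, in plain Python with no fastai, timm, or torch calls. It rests on assumptions I have not verified in my setup: that the Swin body hands back its feature map as (batch, height, width, channels) rather than (batch, channels, height, width), and that the head concat-pools (average plus max) over whatever sits in dimension 1 before a batch norm layer sized for twice the channel count. The helper names are made up for illustration.

```python
# Sketch only: plain-Python shape arithmetic, no fastai/timm/torch calls.
# Assumed (not confirmed by the error itself): the Swin body returns features as
# (batch, height, width, channels), and the head concat-pools (avg + max) over
# whatever sits in dimension 1.

SWIN_LARGE_CHANNELS = 1536  # final-stage width of the large Swin variants
TOTAL_STRIDE = 32           # patch size 4, then three 2x patch-merging steps


def body_output_shape(size, batch_size=64, channels_last=True):
    """Hypothetical helper: shape of the final feature map for a square input."""
    side = size // TOTAL_STRIDE
    if channels_last:
        return (batch_size, side, side, SWIN_LARGE_CHANNELS)
    return (batch_size, SWIN_LARGE_CHANNELS, side, side)


def features_after_concat_pool(shape):
    """Hypothetical helper: avg-pool and max-pool outputs concatenated on dim 1."""
    return 2 * shape[1]


for size in (192, 224):
    seen = features_after_concat_pool(body_output_shape(size))
    expected = features_after_concat_pool(body_output_shape(size, channels_last=False))
    print(f"size={size}: head receives {seen} features, batch norm was built for {expected}")
```

Under those assumptions the sketch gives 12 and 14 features for sizes 192 and 224, against a batch norm built for 3072, which matches both error messages. If that reading is right, the mismatch would come from the feature layout rather than from the multi-label setup, but I have not confirmed it.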
The only difference from the tutorial is that I am doing multi-label classification, but since everything works fine with the other architectures I don’t think that’s the issue.
Library versions should be good as well, I think, since I set up the whole thing only last week. In any case:
For the time being I will continue without using swin models, but I would appreciate any hints.