Road to the top, Part 3: running_mean error with swin models

Pablo · June 13, 2023, 10:13am

I am following the great Road to the top tutorial for computer vision. (If you don’t know about it, it starts here, and there are four parts to it.)

Things are going relatively well, apart from minor problems such as not being able to use report_gpu(), which I think is due to a slightly old CUDA version.

Before setting up an ensemble we are testing different architectures, and I got the first few working, like:

arch = 'convnext_large_in22k'  # accum = 4 ok
learn = train(arch, size=320, accum=4, fine_tune=False, epochs=1, batch_size=64)

and

arch = 'vit_large_patch16_224'  # accum = 2 ok
learn = train(arch, size=224, accum=2, fine_tune=False, epochs=1, batch_size=64)

But when I try the Swin family of architectures I get an error:

arch = 'swinv2_large_window12_192_22k'  # accum = ?
learn = train(arch, size=192, accum=8, fine_tune=False, epochs=1, batch_size=64)

RuntimeError: running_mean should contain 12 elements not 3072

or

arch = 'swin_large_patch4_window7_224'  # accum = ?
learn = train(arch, size=224, accum=8, fine_tune=False, epochs=1, batch_size=64)

RuntimeError: running_mean should contain 14 elements not 3072

The only difference from the tutorial is that I am doing multi-label classification, but since everything works fine with the other architectures I don’t think that’s the issue.

Library versions should be good as well, I think, since I set up the whole thing only last week. In any case:

fastai==2.7.12
fastcore==1.5.29
timm==0.9.2

For the time being I will continue without using swin models, but I would appreciate any hints.

jealk · August 6, 2023, 5:51pm

Similar issue and setup

jvinarek · August 14, 2023, 6:46pm

Pinning the timm library to version 0.6.13 worked for me.
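Since the fix depends on which timm version the environment actually has, a quick check can confirm that a pin took effect. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of timm or fastai.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report what is currently installed for the libraries discussed in this thread.
for name in ("timm", "fastai", "fastcore"):
    print(f"{name}: {installed_version(name)}")
```

This reports what is installed on disk. In a notebook, a package that was imported before running pip install stays loaded at its old version until the kernel is restarted, so checking `timm.__version__` after a restart shows what the running session really uses.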

marsgrins · March 8, 2024, 10:59pm

Amazing, thank you!

fatdunky · April 3, 2024, 7:51am

For me, I had to pin timm to:

"timm==0.6.2.dev0"

shazeghi · May 9, 2025, 7:50pm

I am getting the same error, except that for my training I am using the following code:

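from fastai.vision.all import *

# `path` is defined earlier in the notebook (not shown here)
def get_dls(bs=64, itm_sz=224, batch_sz=128):
    # note: despite its name, batch_sz is the image size produced by the batch transforms
    dls = DataBlock(
        blocks = (ImageBlock, CategoryBlock),
        n_inp = 1,
        get_items = get_image_files,
        splitter = RandomSplitter(valid_pct=0.2),
        get_y = parent_label,
        item_tfms=Resize(itm_sz),
        batch_tfms=aug_transforms(size=batch_sz, min_scale=0.75)
    ).dataloaders(path/'Train', bs=bs)

    return dls

# let's make a quick function for training
def train(arch, n_epochs=3, lr=0.01, dl=None):
    # build default dataloaders lazily; a `dl=dls` default is evaluated once, at definition time
    if dl is None: dl = get_dls()
    learner = vision_learner(arch=arch, dls=dl, metrics=error_rate).to_fp16()

    learner.fit(n_epochs, lr=lr)
    return learner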


! pip install -Uq "timm==0.6.2.dev0" # need this for swin to work
swin = train('swin_tiny_patch4_window7_224', dl=get_dls(batch_sz=224))

As you can see, I tried pinning timm to 0.6.2.dev0 (as well as 0.6.13), but I still get the error:

   2811 
-> 2812     return torch.batch_norm(
   2813         input,
   2814         weight,

RuntimeError: running_mean should contain 14 elements not 1536
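The numbers in the three error messages in this thread fit one possible explanation, offered here as an assumption rather than a confirmed diagnosis. Suppose the Swin body returns features in channels-last layout (batch, height, width, channels), while the head's concat pooling (average plus max pooling, which doubles the channel count) treats dimension 1 as the channel dimension. With a total stride of 32, a 224 px input gives a 7x7 grid and a 192 px input a 6x6 grid, so the batch norm layer would receive 14 or 12 features instead of twice the channel count. The sketch below is plain arithmetic with no deep learning library involved; the function name, the stride of 32, and the channel counts (1536 for the large models, 768 for the tiny one) are assumptions used for illustration.

```python
def concat_pool_features(img_size, channels, total_stride=32, channels_last=False):
    """Feature count after avg+max concat pooling, given what sits in dimension 1.

    channels_last=False models a (batch, channels, H, W) body output.
    channels_last=True models a (batch, H, W, channels) output that is
    mistakenly read as (batch, channels, H, W), so dimension 1 holds H.
    """
    grid = img_size // total_stride          # spatial size of the final feature map
    dim1 = grid if channels_last else channels
    return 2 * dim1                          # concat pooling doubles dimension 1


# (architecture, input size, assumed final channel count)
cases = [
    ("swinv2_large_window12_192_22k", 192, 1536),
    ("swin_large_patch4_window7_224", 224, 1536),
    ("swin_tiny_patch4_window7_224", 224, 768),
]

for name, size, channels in cases:
    expected = concat_pool_features(size, channels)
    received = concat_pool_features(size, channels, channels_last=True)
    print(f"{name}: batch norm sized for {expected}, would receive {received}")
```

If this reading is right, the pairs 12/3072, 14/3072 and 14/1536 come from the tensor layout, not from the multi-label setup, the batch size, or the accumulation setting.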