Speeding up copying UNets between system memory and GPU memory

Has anyone done any work on compressing the UNet model in memory and/or quickly moving it from system memory to GPU memory?

I am finding it takes about 1.7s to move to system memory and 0.429s to move to GPU memory, using .to('cpu') and .to('cuda') respectively.
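
For reference, here is a minimal sketch of how timings like these might be measured. The small `nn.Sequential` model is a hypothetical stand-in, not the actual UNet, and `time_move` is a helper name made up for this example. It falls back to the CPU when no GPU is available, and it synchronizes around the move because CUDA work can run asynchronously.

```python
import time

import torch
import torch.nn as nn


def time_move(module: nn.Module, device: str) -> float:
    """Return the seconds taken to move `module` to `device`."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # finish pending GPU work before starting the clock
    start = time.perf_counter()
    module.to(device)  # moves parameters and buffers in place
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the copy to complete
    return time.perf_counter() - start


# Stand-in model (hypothetical); substitute the real UNet here.
model = nn.Sequential(nn.Conv2d(4, 64, 3), nn.Conv2d(64, 4, 3))

# Assumption: use the GPU if present, otherwise time a CPU-to-CPU no-op move.
target = "cuda" if torch.cuda.is_available() else "cpu"
to_gpu = time_move(model, target)
to_cpu = time_move(model, "cpu")
print(f"to {target}: {to_gpu:.3f}s, to cpu: {to_cpu:.3f}s")
```

The first CUDA call in a process also pays for context initialization, so it is probably worth doing one warm-up move before trusting the numbers.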

Don’t you just do it once? At init time?

I intend to swap different UNets in and out.
