I have written a post where I play around with training multiple models at the same time on a single GPU. Sort of like PyTorch DDP, but for one GPU.
Everyone talks mostly about the next 1 billion+ parameter model, but I have lots of small, even tiny, models that still take a while to train due to large data volumes and inefficient use of the GPU.
I would like to be able to simply define a model and a learner, and just start training:
from functools import partial
from fastai.basics import *  # Learner, SGD, MSELossFlat

# RegModel, DataParallelEnsembleModule, xpemloss and AvgWeightsCallback
# are defined in the post; `data` and `lr` are the usual DataLoaders
# and learning rate.
parallel_models = 10
drm = DataParallelEnsembleModule(n=parallel_models, modelfn=RegModel)
learn = Learner(
    data,
    drm,
    lr=lr,
    cbs=[AvgWeightsCallback],
    loss_func=partial(xpemloss, lossfn=MSELossFlat()),
    opt_func=partial(SGD, mom=0.9),
)
learn.fit_one_cycle(4)
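In spirit, DataParallelEnsembleModule wraps n independent copies of the model in an nn.ModuleList and stacks their predictions, and xpemloss then applies the loss per model so every copy gets its own gradients. A minimal sketch of that idea (illustrative only, not the exact code from the post):

import torch
from torch import nn

class DataParallelEnsembleModule(nn.Module):
    "Run `n` independent copies of a model in a single forward pass."
    def __init__(self, n, modelfn):
        super().__init__()
        self.models = nn.ModuleList([modelfn() for _ in range(n)])

    def forward(self, x):
        # One prediction per model, stacked along a new leading dim,
        # so the output has shape (n, batch_size, ...).
        return torch.stack([m(x) for m in self.models])

def xpemloss(preds, targs, lossfn):
    # Each model only influences its own slice of `preds`, so summing
    # the per-model losses keeps the gradients independent.
    return sum(lossfn(p, targs) for p in preds)

Since all n copies see the same batch in a single forward pass, the GPU processes the whole ensemble at once instead of running n training loops back to back.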
The full implementation is in the post, which, by the way, is written as a notebook and exported using nbdev_nb2md. Not quite fastpages, but I believe it still counts.