Really appreciate you posting this.
Minor correction: the argument is spelled kwargs_handlers.
from accelerate.utils import DistributedDataParallelKwargs
kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
with learn.distrib_ctx(kwargs_handlers=[kwargs]):
    ...