Edit: Fixed, found the solution. The pickled learner was performing preprocessing (scaling the inputs by 8x) when .predict was called, and I had to match that exactly when calling .trace() and when running .forward() on the new module. The nightly version of PyTorch does let this work on DynamicUnet.
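For anyone hitting the same thing, here is a minimal sketch of what "matching the preprocessing" means. The model and the preprocess function below are hypothetical stand-ins (a plain 8x multiplicative scale, not the actual transform the pickled learner applies); the point is only that the same transform must be applied both to the dummy input at trace time and to real inputs at inference time:

```python
import torch

def preprocess(x, scale=8.0):
    # Hypothetical stand-in for whatever learner.predict does to its inputs;
    # here, plain multiplicative scaling by 8x as described above.
    return x * scale

class Net(torch.nn.Module):
    # Trivial placeholder model standing in for learner.model.
    def forward(self, x):
        return x.sum(dim=1)

model = Net().eval()

# Trace with a *preprocessed* dummy input...
dummy = preprocess(torch.ones(1, 3, 4, 4))
with torch.no_grad():
    traced = torch.jit.trace(model, dummy)

# ...and apply the same preprocessing at inference time, so the traced
# module sees inputs in the same range that .predict fed the model.
out = traced(preprocess(torch.ones(1, 3, 4, 4)))
```

If the trace-time and inference-time transforms drift apart, the traced module silently produces wrong outputs, since tracing bakes in no input validation.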
I’m trying to convert a DynamicUnet (Dynamic UNet – fastai) for production. I’ve tried using TorchScript tracing, but it ends up using over 40 GB of memory (when tracing is done on the CPU; on the GPU it just crashes), while normally predicting with it uses next to none. Has anyone encountered any fixes for this?
I’m running it like this:
import os
import torch as pt
from fastai.vision import load_learner

# load_learner takes the directory and the pickle filename separately
learner = load_learner(os.path.dirname(model_fn), os.path.basename(model_fn))
dummy_img = pt.ones(1, 3, 3264, 2448).cuda()
jit_model = pt.jit.trace(learner.model, dummy_img)
pt.jit.save(jit_model, output_model_fn)
Does anyone have any ideas on why DynamicUnet in particular is blowing up? Calling learner.predict is working fine and uses very little memory.
I’ve been using the nightly of fastai, since the current 1.2.0 release has an issue with hooks that has since been fixed; alternatively, you can manually edit out one line to work around the hooks issue.
I’m wondering if DynamicUnet internally splits the input into tiles when running, and somehow torch.jit.trace isn’t doing the same.