GeForce RTX 2080 Ti unexpectedly poor memory usage

I had previously been running predictions on a system with dual GTX 1080 Ti GPUs, using a batch size of 3 and fp16. This configuration fit comfortably within the 11 GB of memory, typically peaking at around 9.5 GB.

I have now cloned the software environment of that system onto a new machine with dual RTX 2080 Ti GPUs. When running predictions with the same input data and model as on the older system, I get an out-of-memory error:

RuntimeError: CUDA out of memory. Tried to allocate 1.22 GiB (GPU 1; 11.00 GiB total capacity; 6.97 GiB already allocated; 102.34 MiB free; 1.41 GiB cached)
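Incidentally, the numbers in that message don't add up to the full 11 GiB. Summing what PyTorch reports (allocated + cached + free) leaves roughly 2.5 GiB that its allocator can't see, which I assume is the CUDA context plus anything else running on the card. A quick back-of-the-envelope check:

```python
# Figures taken from the OOM message above (GiB unless noted).
total = 11.00
allocated = 6.97
cached = 1.41
free = 102.34 / 1024  # MiB -> GiB

unaccounted = total - (allocated + cached + free)
print(f"{unaccounted:.2f} GiB not visible to PyTorch's allocator")  # -> 2.52 GiB
```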

So it seems there is some interplay between the driver and the new cards that is causing memory to be more fragmented, or at least less available, than on the older GPUs.
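To see how much of the gap is PyTorch's own cache versus true allocations, something like this can be run right before the failing prediction (a sketch; `memory_cached` is the torch 1.0.x name for the allocator's reserved-but-unused pool, renamed `memory_reserved` in later releases):

```python
import torch

def report_gpu_memory():
    # Print what PyTorch itself is holding on each device; anything beyond
    # these figures, up to what nvidia-smi shows, belongs to the CUDA
    # context or to other processes.
    for i in range(torch.cuda.device_count()):
        alloc = torch.cuda.memory_allocated(i) / 1024 ** 3
        cached = torch.cuda.memory_cached(i) / 1024 ** 3
        print(f"GPU {i}: {alloc:.2f} GiB allocated, {cached:.2f} GiB cached")

report_gpu_memory()
```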

I’m curious whether anyone else has run across this issue or might have some suggestions or workarounds.
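One obvious stopgap would be to catch the OOM, clear PyTorch's cache, and retry with a smaller batch. A rough sketch (the function and its arguments are hypothetical placeholders, not fastai API):

```python
import gc
import torch

def predict_with_fallback(model, batch, min_bs=1):
    """On a CUDA OOM, free cached allocator blocks and retry with
    half the batch size, down to min_bs."""
    bs = len(batch)
    while bs >= min_bs:
        try:
            with torch.no_grad():
                return model(batch[:bs])
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            gc.collect()
            torch.cuda.empty_cache()  # return cached blocks to the driver
            bs //= 2
    raise RuntimeError("could not fit even the minimum batch size")
```

This trades throughput for robustness, so it only papers over the problem rather than explaining why the 2080 Ti behaves differently.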

Current driver is:
| NVIDIA-SMI 418.96 Driver Version: 418.96 CUDA Version: 10.1 |

I’ve also tried installing the latest Nvidia driver (425.25), but got the same results.

Thank you,

Full configuration info below:

=== Software ===
python        : 3.7.2
fastai        : 1.0.50.post1
fastprogress  : 0.1.20
torch         : 1.0.1
torch cuda    : 10.0 / is available
torch cudnn   : 7401 / is enabled

=== Hardware ===
torch devices : 2
  - gpu0      : GeForce RTX 2080 Ti
  - gpu1      : GeForce RTX 2080 Ti

=== Environment ===
platform      : Windows-10-10.0.16299-SP0
conda env     : fastai_v1
python        : C:\Users\lproc\AppData\Local\Continuum\anaconda3\envs\fastai_v1\python.exe
sys.path      : C:\Users\lproc
no nvidia-smi is found