I found a way to speed up our test suite, using parallel testing.
$ pip install pytest-xdist
$ time pytest
real 0m51.069s
$ time pytest -n 6
real 0m26.940s
Roughly half the time - not bad!
We just need to fix the temp file creation to use a unique string (pid?). Otherwise some tests occasionally collide in a race condition over the same temp file path.
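One possible way to do the temp file fix, as a minimal sketch rather than a finished patch. The helper name `unique_tmp_path` is hypothetical and not something in our test suite. It assumes the `PYTEST_XDIST_WORKER` environment variable that pytest-xdist sets in each worker process, and falls back to "main" when run without xdist. The pid and a random suffix are added so that repeated calls inside one worker don't collide either.

```python
import os
import tempfile
import uuid


def unique_tmp_path(prefix="test_", suffix=".tmp", directory=None):
    """Build a temp file path that should not collide across parallel workers.

    Hypothetical helper for illustration only; it returns a path and does
    not create the file.
    """
    directory = directory or tempfile.gettempdir()
    # pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") in each worker;
    # fall back to "main" when the tests run in a single process.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    # worker id + pid + random hex: unique per worker, per process, per call
    token = f"{worker}_{os.getpid()}_{uuid.uuid4().hex}"
    return os.path.join(directory, f"{prefix}{token}{suffix}")
```

Where a test can take a fixture, pytest's built-in `tmp_path` fixture (a separate temporary directory per test) might be an even simpler option.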
If you have a powerful machine, you might be able to crank the worker count up even more. I run out of my 8GB of CUDA memory when using more than 8 workers.