Damn! It works now on the V100 GPU.
Some points to note:
- Learner object creation takes a very long time on the V100 (p3): in fact, 8-10 minutes or more. (On a p2 instance, it doesn’t take more than 5 seconds.)
- The fit function also takes a while the first time it’s run. After that, it’s faaaaasttt!
When I set precompute=True, the speed improvement for 1 epoch on V100 is around 13%.
When I set precompute=False, the speed improvement for 1 epoch on V100 is around 63%.
Attached screenshots for reference.
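
For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how the timings and the percentage could be computed. It's plain Python, not tied to any library; `time_call` and `percent_faster` are names I made up for illustration, and I'm assuming "speed improvement" means the reduction in wall-clock time relative to a baseline run (the post above doesn't say exactly how the percentages were worked out).

```python
import time


def time_call(fn, *args, **kwargs):
    """Return the wall-clock seconds taken by one call to fn (hypothetical helper)."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start


def percent_faster(baseline_seconds, new_seconds):
    """Percent reduction in time relative to the baseline.

    Assumption: "speed improvement" = (baseline - new) / baseline * 100.
    """
    return 100.0 * (baseline_seconds - new_seconds) / baseline_seconds


# Time the first call separately from later ones, since the first call
# may include one-off setup cost. `work` is a stand-in for the real call.
def work():
    sum(range(100_000))


first_run = time_call(work)
later_run = time_call(work)
print(f"first: {first_run:.4f}s, later: {later_run:.4f}s")

# Placeholder numbers, NOT real measurements: 100 s baseline vs 87 s.
print(percent_faster(100.0, 87.0))
```

Timing the first call on its own keeps one-off setup cost out of the per-epoch comparison.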