GPU Optimizations Central

Have there been any significant changes to the core code running the model recently? When I ran this language model code about 20 days ago, a single epoch completed in over 2 hours with roughly 96% GPU utilization most of the time. Running the same code now, I can barely get 70% GPU utilization, and the same code takes over 12 hours for one epoch.
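One way to narrow down whether the regression is in the training loop itself (versus data loading, drivers, or shared hardware) is to benchmark a single training step in isolation and compare steps/sec between the old and new code. A minimal sketch, where `dummy_step` is a hypothetical stand-in for your actual forward/backward step:

```python
import time

def benchmark(step_fn, n_iters=50, warmup=5):
    """Time step_fn and return throughput in steps/sec."""
    for _ in range(warmup):
        step_fn()  # discard warmup iterations (caches, lazy init)
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

# Hypothetical stand-in; replace with one real training step of your model.
def dummy_step():
    sum(i * i for i in range(10_000))

throughput = benchmark(dummy_step)
print(f"{throughput:.1f} steps/sec")
```

If the per-step throughput is unchanged but the epoch is slower, the bottleneck is likely outside the model code (data pipeline, I/O, or contention on the GPU), which would also explain the drop in GPU utilization.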