What does Clearing GPU Cache actually do?

I was training the language model from Lesson 4 of Part 1 of the course. After training the frozen model, I unfroze it to finetune, and training immediately failed with an insufficient-GPU-memory (CUDA out of memory) error.
At that point I ran gpu_mem_get_free_no_cache(), which freed up about 6GB of VRAM. I then loaded my saved model, and the finetuning stage ran perfectly. I assumed gpu_mem_get_free_no_cache() had removed my model from memory, and that reloading the saved model was what let training proceed.
After finetuning, I decided to predict on user-input text. I ran gpu_mem_get_free_no_cache() again, which freed about 5.5GB of VRAM. I expected prediction to fail, since I thought the model had been cleared by gpu_mem_get_free_no_cache(). However, learn_lm.predict(text, words_per_sentence, temperature=0.5) worked perfectly, even though VRAM usage stayed at 2.7GB (during finetuning it held steady at 7.4GB). So what does gpu_mem_get_free_no_cache() actually do/clear? Does it free the space used by gradients and optimizer state? The weights clearly aren't cleared, or the model couldn't have run inference.

If you look at the source, gpu_mem_get_free_no_cache() calls torch.cuda.empty_cache() before measuring free memory. empty_cache() only returns unused blocks held by PyTorch's caching allocator back to the CUDA driver; it never frees tensors that are still referenced, so your model weights stay on the GPU. That's why learn_lm.predict() kept working: the memory you saw freed was the allocator's cache (mostly blocks previously used for intermediate activations during training), not the model itself.
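For reference, here is a minimal sketch of roughly what that helper does (not fastai's real code, which I believe reads the free-memory number through NVML; torch.cuda.mem_get_info() is a stand-in here and needs a reasonably recent PyTorch):

```python
import torch

def gpu_mem_get_free_no_cache_sketch():
    """Rough sketch of fastai's gpu_mem_get_free_no_cache() (not the real code).

    empty_cache() hands *unused* blocks held by PyTorch's caching allocator
    back to the CUDA driver; tensors that are still referenced (model
    weights, anything bound to a Python variable) are left alone.
    """
    torch.cuda.empty_cache()
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return free_bytes // 2**20  # free VRAM in MiB
```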
This topic is discussed on the PyTorch forum: https://discuss.pytorch.org/t/about-torch-cuda-empty-cache/34232
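You can see the distinction yourself with a few lines (assumes a CUDA device; the tensor size is arbitrary):

```python
import torch

x = torch.randn(4096, 4096, device="cuda")  # ~64 MiB of fp32
print(torch.cuda.memory_allocated() // 2**20, "MiB in live tensors")
print(torch.cuda.memory_reserved() // 2**20, "MiB held by the allocator")

del x  # drop the last reference to the tensor
print(torch.cuda.memory_allocated() // 2**20, "MiB in live tensors")      # ~0
print(torch.cuda.memory_reserved() // 2**20, "MiB held by the allocator") # unchanged

torch.cuda.empty_cache()  # return cached, unused blocks to the driver
print(torch.cuda.memory_reserved() // 2**20, "MiB held by the allocator") # ~0
```

As long as your learner still references the weights (and any gradients or optimizer state), empty_cache() can't touch them; the 2.7GB you saw during inference is that live working set, mostly the weights.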