I have a GeForce GTX 1060 in my desktop computer. Sure, it isn’t among the top-end graphics cards, but it has 6GB of memory and is pretty decent. Jeremy ran an example in Lesson 1 that was timed at 52 seconds; my system needed 57. So I’m perfectly happy with that speed.
Now here’s what I don’t understand. In Lesson 3 with the satellite images, size=128 works fine, but when I try size=256, I get a CUDA error saying it’s running out of memory.
I’ve already found that it works if I create the DataBunch with bs=32 (I assume 64 is the default?), but I want to understand why…
One 256x256 image has 65,536 pixels. With 3 bytes for the color channels, that’s close to 200KB per image. Load 64 of those at the same time and we’re looking at something like 12MB.
I mean, sure, I understand that we’re not just loading these into memory but also performing costly matrix calculations and have to keep all the intermediate results etc. I also realize the OS and an open browser take their share, so I can’t use the full 6GB… But given that 12MB is such a tiny fraction of the available 6GB, can somebody help me understand WHY we actually run out of memory?
(Clean Ubuntu installation, nothing else running, using Jupyter Notebook)
edit: I think my size calculation is wrong. As images on disk, we indeed use 3 bytes per pixel for the color channels… But for any useful calculations we convert them to torch.float, i.e. 32 bits per value, which makes everything 4 times bigger. So as a FloatTensor, every image should be close to 800KB and a batch of 64 around 50MB. Still… I don’t see where the memory problem arises…
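To double-check that arithmetic, here’s a quick sketch (pure Python, assuming 256x256 RGB images stored as float32, i.e. 4 bytes per value, and a batch size of 64):

```python
# Memory taken by one batch of input images as float32 tensors.
height, width, channels = 256, 256, 3
bytes_per_float = 4  # torch.float is 32-bit
batch_size = 64

per_image = height * width * channels * bytes_per_float  # 786,432 bytes
per_batch = per_image * batch_size

print(f"per image: {per_image / 1024:.0f} KiB")    # 768 KiB, i.e. "close to 800KB"
print(f"per batch: {per_batch / 1024**2:.0f} MiB")  # 48 MiB, i.e. "around 50MB"
```

So the revised numbers in the edit above hold up: roughly 0.75MB per image and under 50MB per batch, still tiny compared to 6GB.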
edit2: It’s because of backpropagation, isn’t it? For a moment I kind of assumed we have 50MB of data, do some operation on it, forget what the input was, and just carry on with the result. But of course we’re going to calculate a lot of partial derivatives, and for those we need to remember the intermediate results… And that DOES stack up very quickly, to the point where we’re actually using gigabytes of memory… Can someone confirm this, or am I totally off here?
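To get a feel for how the stored activations stack up, here’s a back-of-the-envelope sketch. The stage shapes and layer counts below are made up (loosely inspired by a ResNet-style backbone at 256x256 input), not the actual model from the lesson, so treat the result as an order-of-magnitude illustration only:

```python
# Rough estimate of memory held by activations that must be kept
# for the backward pass. All shapes/counts are hypothetical.
batch_size = 64
bytes_per_float = 4

# (channels, height, width, approx. number of stored activations per stage)
stages = [
    (64, 128, 128, 7),
    (128, 64, 64, 8),
    (256, 32, 32, 12),
    (512, 16, 16, 6),
]

total = 0
for ch, h, w, n_acts in stages:
    total += ch * h * w * n_acts * batch_size * bytes_per_float

print(f"~{total / 1024**3:.1f} GiB of activations")  # ~3.7 GiB with these made-up numbers
```

Even with these invented numbers, the activations alone land in the gigabytes for a batch of 64, which is why halving the batch size (bs=32) or the image size brings it back under 6GB.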