The Tesla V100 PCIe is currently selling for $10,000 (approx, depending on 12GB or 16GB version, and local currencies).
Yesterday NVIDIA announced a new Titan GPU called “Titan V”, which is already available for purchase. It shares quite a lot of specs with the V100 PCIe version, but costs $3000.
My 2 cents:
I would get the 4 units of GTX 1080Ti. Why? If you’re already at the point of making such hefty investments, I think it’d be much more effective to learn how to write NNs for multi-GPU architectures, in addition to regular single-GPU systems.
As Jeremy might also tell you, having multiple GPUs allows one to test more than one model at the same time (e.g. using different hyper-parameters or architectures).
6 months down the road, Nvidia is probably going to release a newer, faster, more salacious GPU card. Again, I believe that if one’s work is important enough to warrant the latest and the greatest, programming for a GPU-clustered environment probably becomes inevitable. So, why not start learning that!!
Again my 2 cents and surely your situation could be different.
It’ll be interesting to see if the tensor performance is enabled on their next generation of GTX cards (GTX 1180?) when they drop (expected Q1).
I agree that having multiple GPUs is very desirable. But could the increased tensor processing performance make up that difference? I guess we could benchmark an AWS P3 against a Paperspace P6000 and/or a home 1080 Ti on the shape of problem we are working on to get an idea.
or torch.cuda.set_device(0) to use the specific GPU in PCIe slot “0” in a first notebook, torch.cuda.set_device(1) for slot “1” in a second notebook, and so on, to run several notebooks at the same time in a multi-GPU rig.
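For anyone who wants to try the one-notebook-per-GPU idea, here is a minimal sketch of a guarded device selection. The helper name select_gpu is made up for illustration; the only PyTorch calls used are torch.cuda.is_available, torch.cuda.device_count, torch.cuda.set_device and torch.cuda.current_device. Note that the number passed in is the CUDA device index, which I am assuming lines up with the slot numbering on your rig.

```python
def select_gpu(index):
    """Make `index` the default CUDA device for this process, if possible.

    Hypothetical helper: returns the active device index on success,
    or None when PyTorch or the requested GPU is not available.
    """
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    if index >= torch.cuda.device_count():
        return None  # no such GPU in this machine
    torch.cuda.set_device(index)
    return torch.cuda.current_device()


# First notebook: ask for device 0. A second notebook would ask for device 1.
active = select_gpu(0)
print("active CUDA device:", active)
```

Run the same cell with a different index in each notebook, before any model or tensor is moved to the GPU.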
What remains unclear is the impact of the “640 tensor cores” in the Titan V/Tesla V100, which don’t exist in the 1080Ti or the Titan Xp.
I would much rather have even two 1080 Tis. The value just really doesn’t seem to be there on these, at least for what I’m working on. For me it is about value, and the 1080 Ti is just a much better value than this beast.
Btw, there is something else I didn’t think about at first which could tilt your choice toward a Titan V even more (and correct me if I’m wrong).
Apart from the new “tensor cores” (which seem to mix fp16/fp32 operations), the interesting thing is the fp16 compute capability. As said here, the V100 architecture (Volta) seems to have twice as much compute power for fp16 operations as for fp32 operations. This means that if you manage to get your deep learning framework to work on fp16 matrices, you halve your VRAM usage, which translates to bigger batch sizes in our code.
So you can consider your current GTX 1080Ti as being stuck with fp32 operations on 11GB of VRAM (as it does not work well with fp16), while your Titan V can use its fp16 compute capability on 12GB of VRAM; as fp16 matrices take half the memory, you can think of your Titan V as having 24GB of VRAM.
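To put rough numbers on the “half the memory” argument, here is a back-of-the-envelope sketch. The activation shape and the 1 GiB budget are made-up values for illustration, and it only counts one tensor per sample; weights, optimizer state and any fp32 copies a framework keeps are ignored, so the real gain may well be smaller than 2x.

```python
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2}


def tensor_bytes(shape, dtype):
    """Memory needed to store one dense tensor of `shape` in `dtype`."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * BYTES_PER_ELEMENT[dtype]


def max_batch_size(budget_bytes, per_sample_shape, dtype):
    """How many samples of `per_sample_shape` fit in `budget_bytes`."""
    return budget_bytes // tensor_bytes(per_sample_shape, dtype)


# Hypothetical per-sample activation: 64 channels of 224x224.
sample_shape = (64, 224, 224)
budget = 1024 ** 3  # hypothetical 1 GiB set aside for these activations

for dtype in ("fp32", "fp16"):
    print(dtype,
          tensor_bytes(sample_shape, dtype), "bytes per sample,",
          max_batch_size(budget, sample_shape, dtype), "samples fit")
```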
I didn’t, as I’m not an expert, so maybe I’m not 100% correct (hence the “correct me if I’m wrong”). The part which looks suspicious to me is the idea that you can turn your float32 matrices into float16 “for free”. I mean, if you take a look at what the tensor cores of the Volta architecture are about, it is actually a mix of fp32 and fp16. So now the question is: why do they do this mix if we can just use fp16 compute capabilities directly? A lot of people (myself included) claim we can just turn our matrices to float16, but it may be more complicated than that.
In any case, what is certain is that the Volta architecture is tailored for fp16 and that there is a way to run DL models in that “configuration”, which in all cases translates to a smaller footprint in GPU VRAM.
The good thing with KN is that you can share questions/resources on hardware/software and maths/stats claims, and there will probably be an advanced user to pick it up and confirm/deny it (like Anokas/CPMP/KazAnova/Laurae & co).
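One plausible reason for the mix is precision: fp16 has far fewer bits than fp32, so small values vanish and long sums stall. The sketch below is plain Python and only illustrates that numeric effect, not how tensor cores are actually implemented. It uses the standard library struct module, whose 'e' format is IEEE half precision; the helper names to_fp16 and accumulate are made up for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def accumulate(n_terms, term, half_precision):
    """Add `term` to a running sum `n_terms` times.

    With half_precision=True the running sum is rounded to fp16 after
    every addition, mimicking an fp16 accumulator.
    """
    total = 0.0
    for _ in range(n_terms):
        total += term
        if half_precision:
            total = to_fp16(total)
    return total


# A small gradient-like value is flushed to zero in fp16.
print("1e-8 as fp16:", to_fp16(1e-8))

# fp16 cannot represent 2049, so it is rounded to a neighbour.
print("2049 as fp16:", to_fp16(2049.0))

# A running sum kept in fp16 stops growing once the steps become too small.
print("fp16 accumulator:  ", accumulate(4096, 1.0, half_precision=True))
print("wider accumulator: ", accumulate(4096, 1.0, half_precision=False))
```

If that reasoning is right, doing the multiplications on fp16 inputs while keeping the running sum in a wider format would give most of the memory savings without the stalled sum.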
I’d post it with a “does it make sense ?”
As I am using a Titan V (and Titan Xp) and am trying to benchmark their performance, I moved a previous post to this thread.
As background, I decided to subsidize/rationalize my deep learning GPU purchase of a Titan V and Titan Xp by using them for Ethereum crypto-mining. As a result, and as discussed below, I have come across some puzzling phenomena.
Puzzling phenomena
When I run the following code without any other jobs running, it is significantly slower than when the GPU is running other processes. (Specifically, when it is under heavy load running crypto-mining software.) I have repeated the trials numerous times to make sure that there were no differences in pre-computing or caching taking place. Moreover, I have tested this off and on over several weeks with the same result. I have used nvidia-smi to verify what jobs are running on the GPU. Here are the times:
In trying to figure it out, I was wondering if anyone is using either a Titan V or Titan X. If so, I was wondering if they could let me know how long the above code runs for them. This is right out of Lesson1. Note that in learn.fit(0.01, 5), I am running 5 epochs vs. 3.
That is a pre-trained model in those first few steps. It runs “fast” no matter what. I would be more interested in you running the entire notebook. Then, look at the widgets and processing time for the learn.fit operations that are more computationally intensive in the data augmentation and fine tuning sections of that lesson1 notebook. If you could do that, and let me know or post to this thread, it would be appreciated.
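If it helps to compare numbers across machines, here is a minimal timing sketch using only the standard library. The names time_trials, summarize and dummy_workload are made up for illustration; dummy_workload is just a CPU stand-in, and in the notebook you would pass something like lambda: learn.fit(0.01, 5) instead. If you time raw GPU operations rather than a full fit, you would probably also want a torch.cuda.synchronize() before reading the clock, since CUDA calls return before the work has finished.

```python
import statistics
import time


def time_trials(fn, repeats=5, warmup=1):
    """Call `fn` a few times and return the wall-clock duration of each run.

    Warm-up runs are executed first and not recorded, so one-off costs
    such as caching do not distort the measured trials.
    """
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Condense a list of durations (seconds) into min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def dummy_workload():
    # Hypothetical stand-in for the training call being benchmarked.
    return sum(i * i for i in range(100_000))


timings = time_trials(dummy_workload, repeats=3)
print(summarize(timings))
```

Posting the min and median for the idle GPU and for the GPU under mining load, from the same notebook cell, would make the comparison easier to reproduce.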