I’m reading a computer science book called “Computer Systems: A Programmer’s Perspective”. The first chapter describes three kinds of concurrency and parallelism: thread-level parallelism, instruction-level parallelism, and single-instruction, multiple-data (SIMD) parallelism. It defines the third as follows:
At the lowest level, many modern processors have special hardware that allows a single instruction to cause multiple operations to be performed in parallel, a mode known as single-instruction, multiple-data (SIMD) parallelism. For example, recent generations of Intel and AMD processors have instructions that can add 8 pairs of single-precision floating-point numbers (C data type float) in parallel.
These SIMD instructions are provided mostly to speed up applications that process image, sound, and video data. Although some compilers attempt to automatically extract SIMD parallelism from C programs, a more reliable method is to write programs using special vector data types supported in compilers such as gcc.[1]
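To make the quoted passage concrete for myself, here is a minimal sketch of what I understand "special vector data types" to mean, assuming gcc's (or clang's) `vector_size` attribute. The type name `v8sf` and the helper `add8` are my own hypothetical names, not from the book, and whether the addition really becomes a single hardware instruction depends on the target CPU and compiler flags (for example `-mavx`).

```c
#include <string.h>

/* GCC/Clang vector extension: a type that holds 8 floats (32 bytes).
   The name v8sf is an illustrative choice, not something from the book. */
typedef float v8sf __attribute__((vector_size(32)));

/* Hypothetical helper: computes out[i] = a[i] + b[i] for i = 0..7. */
void add8(const float *a, const float *b, float *out)
{
    v8sf va, vb, vc;

    /* Copy in through memcpy so the arrays need no special alignment. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    /* One element-wise add over all 8 lanes. With AVX enabled the compiler
       may emit a single vector instruction; otherwise it falls back to
       narrower or scalar instructions with the same result. */
    vc = va + vb;

    memcpy(out, &vc, sizeof vc);
}
```

Calling `add8` on two arrays of 8 floats fills the output array with the 8 sums, which is the "add 8 pairs of single-precision floating-point numbers" case the book mentions.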
I know we often have to pass special parameters such as device_type='CUDA' in order to get our FastAI code to compile for a GPU architecture, and after reading the above I’m guessing that this is because we want to take advantage of this SIMD parallelism, which GPUs are architected for in a way (or to a degree) that vanilla CPUs are not. Is this accurate, or am I off the mark?
[1] Randal E. Bryant and David R. O’Hallaron, “Computer Systems: A Programmer’s Perspective”, 3rd edition, p. 56