So-called "single-instruction, multiple-data (SIMD)" parallelism: is this why GPUs are faster on average than CPUs?

toomanyrichies · July 30, 2021, 9:53pm

I’m reading a computer science book called “Computer Systems: A Programmer’s Perspective”. In the first chapter, it describes three kinds of parallelism/concurrency: thread-level parallelism, instruction-level parallelism, and single-instruction, multiple-data parallelism. It defines the third as follows:

At the lowest level, many modern processors have special hardware that allows a single instruction to cause multiple operations to be performed in parallel, a mode known as single-instruction, multiple-data (SIMD) parallelism. For example, recent generations of Intel and AMD processors have instructions that can add 8 pairs of single-precision floating-point numbers (C data type float) in parallel.

These SIMD instructions are provided mostly to speed up applications that process image, sound, and video data. Although some compilers attempt to automatically extract SIMD parallelism from C programs, a more reliable method is to write programs using special vector data types supported in compilers such as gcc.[1]

I know we often have to pass special parameters such as device_type='CUDA' to get our FastAI code to compile for a GPU architecture. After reading the above, I’m guessing this is because we want to take advantage of SIMD parallelism, which GPUs are architected for in a way (or to a degree) that vanilla CPUs are not. Is this accurate, or am I off the mark?

[1] Randal E. Bryant and David R. O’Hallaron, “Computer Systems: A Programmer’s Perspective”, 3rd edition, p. 56

Conwyn · July 31, 2021, 3:54pm

Hi Richie

The CPU is a faster processor. The GPU is a collection of slower processors that share a common memory area. So basically, you prepare a C-language program on the CPU side, load your data into an array with one entry for each mini-CPU on the GPU, and tell the GPU to run the same program on all of its mini-CPUs at the same time. On my simple laptop, the GPU does about 10 times more processing than the CPU on a formula using floating-point operations.

Regards Conwyn

toomanyrichies · July 31, 2021, 8:40pm

@Conwyn thanks, I think I understand now.