3D CNNs are used, among other things, for volumetric data, video, and hyperspectral image classification. There is generally no advantage to using 3D convolutions on ImageNet or CIFAR-10, because those images have only 3 channels, and a 2D convolution already combines all channels at every spatial position. You can think of it this way: both approaches look for interesting features, but while 2D CNNs search along only two dimensions (the x and y axes), 3D filters also slide along a third axis (time, volume depth, or the spectral dimension).
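To make the difference concrete, here is a minimal sketch that only computes output shapes for an unpadded, stride-1 convolution. The helper name `sliding_output_shape` and the example sizes (32x32 images, 16-frame clips, 3x3 and 3x3x3 kernels) are assumptions chosen for illustration, not taken from any particular library or paper.

```python
def sliding_output_shape(input_shape, kernel_shape):
    """Output size along each axis the kernel slides over.

    Assumes stride 1 and no padding ("valid" convolution).
    """
    if len(input_shape) != len(kernel_shape):
        raise ValueError("kernel must have one size per sliding axis")
    return tuple(n - k + 1 for n, k in zip(input_shape, kernel_shape))


# 2D convolution on an RGB image: each filter spans all 3 channels at once
# and slides only over (height, width).
rgb_2d = sliding_output_shape((32, 32), (3, 3))

# 3D convolution on the same image, treating the 3 channels as a depth axis:
# a depth-3 kernel has exactly one position along that axis, so nothing is
# gained compared with the 2D case.
rgb_3d = sliding_output_shape((3, 32, 32), (3, 3, 3))

# 3D convolution on a 16-frame video clip: the kernel now has room to slide
# along the time axis, so temporal patterns can be picked up.
video_3d = sliding_output_shape((16, 32, 32), (3, 3, 3))

print(rgb_2d)    # (30, 30)
print(rgb_3d)    # (1, 30, 30)
print(video_3d)  # (14, 30, 30)
```

The depth-1 output in the RGB case is the point: with only 3 channels there is no third axis to slide along, whereas a video or a volume gives the 3D filter many positions along that axis.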
Some examples where 3D CNNs are used:
V-Net for volumetric image segmentation: https://arxiv.org/pdf/1606.04797.pdf
Smoke detection on video sequences: https://link.springer.com/article/10.1007/s10694-019-00832-w
Spectral-spatial classification of hyperspectral imagery: https://www.mdpi.com/2072-4292/9/1/67