Computer vision API services vs. our own VM/server

I finished lecture 1 and am now on lecture 2. With all the image classification offered by services such as the Azure Computer Vision API, I’m wondering what the benefit is of following this course, setting up our own VM with a GPU, etc., and training our own models. Wouldn’t it be easier to just use one of those services instead?