Deep convolutional neural networks are very good at learning features, but for most tasks they need to be many layers deep, which makes inference expensive. This is particularly a problem for embedded, offline systems, which usually don't have a GPU.
Has anybody tried to squash multiple convolutional layers into fewer ones, producing the same (or slightly lower) accuracy at a huge speed boost?
I was wondering whether it's possible to train a network a third (or less) of VGG16's depth so that its last layer produces the same output as a full-sized VGG16, which would effectively give the same result at a fraction of the cost.
My reasoning is that it may take many layers and millions of images to learn the features of a VGG16, but once the network 'knows' about those features, a much shallower network might be able to reproduce them. A minimal sketch of what I mean is below.
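To make the idea concrete, here's a rough PyTorch sketch of the setup I'm imagining: freeze a pretrained VGG16 as the "teacher" and train a small "student" network to regress onto its outputs. The student architecture, loss, and hyperparameters below are just placeholders I made up, not a tested design:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

# Pretrained VGG16 is the frozen "teacher"; we only read its outputs.
teacher = vgg16(pretrained=True).eval()
for p in teacher.parameters():
    p.requires_grad = False

# A much shallower "student" (hypothetical architecture, ends in the
# same 1000-way output as VGG16 so the two can be compared directly).
student = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(256, 1000),
)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # regress the student onto the teacher's outputs

def distill_step(images):
    """One training step: push student outputs toward teacher outputs."""
    with torch.no_grad():
        target = teacher(images)
    pred = student(images)
    loss = loss_fn(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example: one step on a random batch (real training would loop over
# a large set of unlabeled images at VGG's 224x224 input size).
batch = torch.randn(8, 3, 224, 224)
print(distill_step(batch))
```

The appeal is that this needs no labels at all: the teacher supplies the training signal, so any large pile of images would do. Is this a viable approach, or does the shallow network just lack the capacity to match the deep one?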