In terms of keeping it lightweight, take a look here:
For single-image inference on CPU, I was able to bring it down to 732 ms using the techniques mentioned there.