Fastai / PyTorch in Production


(Jeremy Howard) #24

Thanks for sharing. Keep us informed about how it goes :slight_smile:


(Alec Rubin) #26

No problem, will do!


(Marco) #27

I have a different approach to this: I am using fastai directly in production on my own server.

I load the model in one process and keep it loaded, using a dataloader that points at dummy files in the train and test directories. When I receive images, I delete the test directory, write the new images to disk, and refresh the model with a new dataloader. This way I get all the augmentation steps and can run TTA when predicting. I am sure this is a slow way to do it, although it runs on an NVMe disk; round-trip time is 700 ms (without TTA).

My method is based on this article: https://www.pyimagesearch.com/2018/02/05/deep-learning-production-keras-redis-flask-apache/
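The file-handling part of this pattern (swap out the test directory, then rebuild the data object) can be sketched with the standard library alone. This is a minimal sketch; `TEST_DIR` and the `(filename, bytes)` input shape are illustrative, and the fastai-specific steps are left as comments since they depend on your model setup:

```python
import shutil
from pathlib import Path

TEST_DIR = Path("data/test")  # illustrative path

def refresh_test_dir(images):
    """Replace the test directory with newly received images so a
    fresh dataloader picks them up. `images` is a list of
    (filename, bytes) pairs."""
    if TEST_DIR.exists():
        shutil.rmtree(TEST_DIR)      # delete the old test directory
    TEST_DIR.mkdir(parents=True)
    for name, data in images:
        (TEST_DIR / name).write_bytes(data)
    # At this point you would rebuild the fastai data object from the
    # directory and call TTA / predict on the already-loaded model.
```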


(Alec Rubin) #28

Interesting. Seems like a solid way to do TTA; I could look into adding support for TTA in the repo I set up. I’d recommend you try out the serverless approach though, it’s easy to maintain and costs about $1 to process ~100k requests. Depending on how many requests you serve a month, you may actually be paying more in electricity to power your home server than you would to keep a Lambda up.


(Marco) #29

Thanks, I will definitely check it out


(Alec Rubin) #31

@nextM I was able to implement TTA, but it seems to have barely any effect (on dogs/cats and 2 other datasets) and it more than doubled the time to return a result. Have you noticed enough of an accuracy increase when using TTA that it’s worth the extra processing time?
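For reference, TTA just averages the model’s predictions over the original input plus several augmented copies, which is why it multiplies inference time roughly by the number of views. A minimal numpy sketch, where the augmentation is a hypothetical horizontal flip and `model` is any callable returning class logits:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def tta_predict(model, img, n_aug=4):
    """Average class probabilities over the original image plus
    n_aug augmented copies (here: horizontal flips)."""
    views = [img] + [np.flip(img, axis=1) for _ in range(n_aug)]
    probs = [softmax(model(v)) for v in views]
    return np.mean(probs, axis=0)
```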


(Subash Gandyer) #32

Hi,
I’ve been following this thread, as my immediate task is to push my GPU-trained fastai model into production. @alecrubin, was your model trained in a GPU or CPU environment? If on a GPU, did you take any measures to bring it into the CPU-only AWS Lambda environment? I am having problems taking my GPU-trained model into a CPU environment for inference. Kindly help me out with any possible solutions.

Subash


(Alec Rubin) #33

@gandyer You’re not going to believe how easy this is… just call model.cpu() before you save your state dictionary and it should work. Then, if you want to bring the model back to the GPU, just call model.cuda().


(Subash Gandyer) #34

@alecrubin Thanks for the tip. Can you provide a simple snippet showing how to save a state_dict, load it, and predict on a single image? Maybe I am saving, loading, and predicting with the model in a different way. Thanks in advance for the solutions.

Subash


(Alec Rubin) #35

@gandyer to save the state dict from the learner, you’ll need to do this.

# Convert your model to CPU mode if you trained on GPU
learner.model.cpu()

# Save your state dictionary
learner.save('model.h5')

# If you converted your model to CPU, bring it back to GPU 
learner.model.cuda()

When you load your model in production, it won’t be wrapped in a learner, so that will look something like this.

# Use torch to load your state dictionary from path
state_dict = torch.load('path/to/model.h5', map_location=lambda storage, loc: storage)

# Then load the state dict into your model
model = YourModel()
model.load_state_dict(state_dict)

# Put the model in inference mode (disables dropout, freezes batchnorm stats)
model.eval()

To get a prediction from your model, you will need to preprocess your image into a tensor. After that, all you need to do is this.

# This is your image transformed and normalized into a tensor
img = IMAGE_TENSOR

# Your input needs to be a batch wrapped in a Variable
inp = VV_(torch.stack([img]))

# Then just pass your input to the model and it will return the predictions
out = model(inp)
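The “transformed and normalized” step above usually means scaling pixel values to [0, 1], normalizing with the statistics the model was trained with, and moving channels first. A minimal numpy sketch, assuming the image is already resized to the model’s input size and that the model uses ImageNet stats (an assumption; substitute your own training stats):

```python
import numpy as np

# Assumed ImageNet statistics; use the stats your model was trained with.
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def preprocess(img_hwc_uint8):
    """Turn an HxWx3 uint8 image into a normalized CHW float array.
    Wrap the result with torch.from_numpy(...) before batching."""
    x = img_hwc_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - MEAN) / STD                          # per-channel normalization
    return x.transpose(2, 0, 1)                   # HWC -> CHW
```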

(Alec Rubin) #36

Just an update for anyone who may have tried to implement this but had issues: unless you are running Linux on your local machine, you need to have Docker running when deploying your function. So setting dockerizePip: true in serverless.yml is required, not optional as the readme previously stated.
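Assuming the project uses the serverless-python-requirements plugin (which is where dockerizePip lives), the relevant fragment of serverless.yml looks roughly like this:

```yaml
plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    dockerizePip: true  # build Python deps inside Docker (required off-Linux)
```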


(Sam) #37

There’s been some talk of how this will transition to PyTorch 1.0, which is meant to use Caffe2 under the hood. I haven’t tried it, but I think the following should work:

SCAR works with Theano and Darknet, so it should be possible to get it working.

It would still not have access to any GPUs (yet - I’m sure this is on AWS’ radar)


(Sam) #38

Also, I should have said this right away, but fantastic work @alecrubin