I have a different approach to this: I am using fastai directly in production on my own server.
I load the model in one process and keep it loaded, using a dataloader that points at dummy files in the train and test directories. When I receive images, I delete the test directory, write the new images to disk, and refresh the model with a new dataloader. This way I get all the augmentation steps and can run TTA when predicting. I am sure this is a slow way to do it, although it is on an NVMe disk. Round-trip time is 700ms (not using TTA).
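In case it helps, here is a minimal sketch of that request flow. It is not my actual server code: the `load_model` function is a toy stand-in for `load_learner` plus the dummy-file dataloader (the exact fastai calls depend on your version), but the delete-directory / write-images / refresh-and-predict structure is the same.

```python
import os
import shutil
import tempfile

# Hypothetical stand-in for a fastai Learner loaded once at process start.
# In the real setup this would be load_learner(...) with a dataloader
# pointing at dummy files; here the toy "model" just returns file sizes.
def load_model():
    return lambda path: os.path.getsize(path)

MODEL = load_model()  # keep the model resident between requests
TEST_DIR = tempfile.mkdtemp(prefix="serve_test_")

def handle_request(images):
    """images: dict of filename -> bytes received from the client."""
    # Wipe the test directory and write the incoming images to disk,
    # mirroring the delete / write / refresh flow described above.
    shutil.rmtree(TEST_DIR)
    os.makedirs(TEST_DIR)
    for name, data in images.items():
        with open(os.path.join(TEST_DIR, name), "wb") as f:
            f.write(data)
    # "Refresh the dataloader": re-enumerate the directory, then predict.
    paths = sorted(os.path.join(TEST_DIR, n) for n in os.listdir(TEST_DIR))
    return [MODEL(p) for p in paths]
```

The point of the pattern is that `load_model()` runs once, so each request only pays for the disk writes and the forward pass, not for reloading weights.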
My method is based on this article: https://www.pyimagesearch.com/2018/02/05/deep-learning-production-keras-redis-flask-apache/