Understanding learn.predict() using CrossEntropyLossFlat() for binary segmentation

I have followed muellerzr’s tutorial for binary segmentation https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/07_Binary_Segmentation.ipynb and now have a model.

I want to deploy it on ONNX Runtime and have converted the model to ONNX for inference. However, this requires that I do all the pre- and post-processing myself, and I am a bit stuck. I seem to be missing a step or making an implementation mistake, because the result from my ONNX implementation does not match my fastai result from

_ , _ , x = learn.predict()

My understanding is that, when using CrossEntropyLossFlat(), the raw output from inference is passed through an activation function (softmax) to produce this result.

And if I were to return a result like this

_ , y, _ = learn.predict()

The result after activation is decoded using an argmax().
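
For reference, the activation and decode steps described above can be sketched in plain NumPy. This is only a minimal sketch, not fastai's actual code: it assumes the raw output is laid out as (classes, height, width), and the helper names `softmax` and `decode` are made up for illustration. Because softmax is monotonic, the argmax of the probabilities should be the same as the argmax of the raw output.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along `axis` (assumed to be the class axis)."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def decode(probs, axis=0):
    """Pick the most likely class index for every pixel."""
    return probs.argmax(axis=axis)

# Hypothetical raw output for 2 classes on a 4x4 image
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 4)).astype(np.float32)

probs = softmax(logits)   # per-pixel class probabilities
mask = decode(probs)      # per-pixel class index (0 or 1)
```

If the layout is (height, width, classes) instead, the same functions would be called with `axis=-1`.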

My process at the moment is to:

  1. Preprocess my input image: resize it to match the size specified in my fastai dataloader and normalise it with the ImageNet stats.
  2. Run inference, which returns a [224,224,2] array
  3. Softmax my result
    (Here my result already diverges from the fastai implementation)
  4. argmax

Maybe I am missing a denormalise somewhere, but I’m not sure where it would go or why it would be necessary.
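
As a point of comparison for step 1, here is a minimal NumPy sketch of normalising with broadcasting instead of a per-channel loop, together with its inverse. It assumes an RGB image that is already resized to the model's input size, and the names `preprocess` and `undo_preprocess` are hypothetical. The inverse is likely only relevant for viewing the input image: the model output holds one score per class rather than colour channels, so the three ImageNet stats would not line up with it.

```python
import numpy as np

# ImageNet mean/std per RGB channel, shaped (3, 1, 1) so they broadcast over CHW
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)

def preprocess(img_rgb_uint8):
    """HWC uint8 RGB image -> normalised float32 batch of shape (1, 3, H, W)."""
    chw = img_rgb_uint8.astype(np.float32).transpose(2, 0, 1) / 255.0
    return ((chw - MEAN) / STD)[None]

def undo_preprocess(batch):
    """Inverse of `preprocess`; only meaningful for the input image."""
    chw = batch[0] * STD + MEAN
    return np.clip(np.rint(chw * 255.0), 0, 255).astype(np.uint8).transpose(1, 2, 0)

# Stand-in for a resized 224x224 RGB image (random pixels, for illustration)
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

batch = preprocess(img)            # what would be fed to the ONNX session
restored = undo_preprocess(batch)  # round trip back to the original pixels
```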

This is my code for ONNX

import cv2
import numpy as np
import onnxruntime
import torch
import torch.nn.functional as F
from google.colab.patches import cv2_imshow

def normalize(x, mean, std):
  "Normalize `x` with `mean` and `std`."
  for i in range(x.shape[0]):
    x[i,:,:] -= mean[i]
    x[i,:,:] /= std[i]
  return x

def denormalize(x, mean, std):
  "Denormalize `x` with `mean` and `std`."
  for i in range(x.shape[0]):
    x[i,:,:] *= std[i]
    x[i,:,:] += mean[i]
  return x

img = cv2.imread('input.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # cv2 loads BGR; convert to RGB to match PIL
img = cv2.resize(img,(224,224), interpolation = cv2.INTER_AREA)

ort_session_lapa = onnxruntime.InferenceSession('.onnx')

img = np.array(img, dtype=np.float32)
img = img.transpose(2, 0, 1) # for the normalise function

img /= 255

imagenet_stats = [[0.485, 0.456, 0.406], [0.229, 0.224, 0.225]]

img = normalize(img, imagenet_stats[0], imagenet_stats[1])

tensorlike = img[None,:, :, :]
ort_inputs = {ort_session_lapa.get_inputs()[0].name: tensorlike}
ort_outs = ort_session_lapa.run(None, ort_inputs)

img_out = ort_outs[0]
img_out = np.squeeze(img_out)

# img_out = denormalize(img_out, imagenet_stats[0], imagenet_stats[1]) ?

# activation: softmax over the class axis (axis 0 after the squeeze)
img_out = F.softmax(torch.from_numpy(img_out).float(), dim=0)
img_out = img_out.numpy()

# print(img_out.shape)
# img_out = np.argmax(img_out,0)

gen_img = (img_out*255).astype('uint8')
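
To look at the result, one option is to take a single class channel of the probabilities, or the argmax mask, and scale it to 0-255. A minimal sketch, assuming the softmaxed output has shape (2, H, W) with class 1 as the foreground; both that layout and the helper name `to_gray` are assumptions:

```python
import numpy as np

def to_gray(values):
    """Scale an array of values in [0, 1] to a uint8 grayscale image."""
    return (np.clip(values, 0.0, 1.0) * 255).astype(np.uint8)

# Hypothetical softmax output: 2 classes, 4x4 pixels, probabilities sum to 1
fg = np.linspace(0.0, 1.0, 16, dtype=np.float32).reshape(4, 4)
probs = np.stack([1.0 - fg, fg])

prob_img = to_gray(probs[1])               # foreground probability map
mask_img = to_gray(probs.argmax(axis=0))   # hard mask: 0 or 255
```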
