Can't quite duplicate results in ONNX

I work at a company that uses computer vision to monitor fall-risk patients in hospitals. I trained a model with fastai on one of the tasks we do, and it beat our existing algorithm on accuracy. I want to port it over to our system to run integration tests, but our system is in C#. Following a suggestion I found on these forums, I exported the model as a .onnx file, and did inference in our system using ONNX Runtime from Microsoft. I started by trying to replicate - in ONNX Runtime - the results I got on the validation set in fastai. Initially the results were way off. Long story short, there are many details in the journey from our data to input tensor, and I had to reimplement everything in C# and also reverse engineer the fastai pipeline. In the end, a model that was getting 84% accuracy in fastai was getting 82% accuracy in C#. I am hoping to get some insight into where the difference is coming from.
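For context, here is roughly the sanity check I use to separate the export from the preprocessing (a minimal Python sketch; `model.onnx` and `dls` are placeholders for my exported file and fastai DataLoaders). If the .onnx fed with fastai's own preprocessed validation batches reproduces the 84%, the remaining gap has to be in my C# preprocessing rather than in the model itself.

```python
import numpy as np
import onnxruntime as ort

# Run the exported .onnx over the fastai-preprocessed validation batches.
sess = ort.InferenceSession("model.onnx")
name = sess.get_inputs()[0].name

correct = total = 0
for xb, yb in dls.valid:                      # fastai DataLoaders (placeholder)
    out = sess.run(None, {name: xb.cpu().numpy()})[0]
    correct += (out.argmax(axis=1) == yb.cpu().numpy()).sum()
    total += len(yb)
print(correct / total)                        # should match the fastai accuracy
```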

I have some major clues that point to the issue being image resizing. Notably, I trained a new model that does not resize, and I got exactly the same accuracy on both sides. Furthermore, I dug through the fastai codebase to find the image resizing code, and everything seemed to be using cv2.resize with cv2.INTER_AREA. Indeed, when I replaced my homespun resize function with a call to cv2.resize with cv2.INTER_AREA (via EmguCV on the C# side), the accuracy went up by about seven-tenths of a percent - but not quite all the way to 84%.
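For concreteness, this is the call I mean (Python side; `frame.png` is a placeholder, and I believe the EmguCV equivalent is `CvInvoke.Resize` with `Inter.Area`):

```python
import cv2

# Placeholder input; in our system this comes from the 16-to-8-bit step.
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Area interpolation, matching what I found in the fastai resizing code.
# Note: cv2.resize takes (width, height), not (height, width).
resized = cv2.resize(img, (224, 224), interpolation=cv2.INTER_AREA)
```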

Maybe I have the order of the transformations wrong? Right now I am trying this pipeline: our data (16-bit) -> 8-bit image -> divide by 255 -> resize -> normalize.
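In code, the order above looks like this (a simplified Python sketch of the C# logic; the 16-to-8-bit step, target size, and normalization stats are placeholders for ours). One subtlety I noticed while writing it down: resizing the float image after dividing by 255 is not numerically identical to resizing the uint8 image first, because the uint8 result gets rounded.

```python
import cv2
import numpy as np

# Placeholder stats (ImageNet's); ours come from the Normalize transform
# used in training. Assumes a 3-channel image.
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img16: np.ndarray) -> np.ndarray:
    img8 = (img16 // 256).astype(np.uint8)       # 16-bit -> 8-bit (placeholder)
    x = img8.astype(np.float32) / 255.0          # divide by 255
    x = cv2.resize(x, (224, 224), interpolation=cv2.INTER_AREA)  # resize
    x = (x - mean) / std                         # normalize
    return x.transpose(2, 0, 1)[None]            # NCHW batch for the model
```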

One note: I had initially been cropping in fastai but squishing in C#. I am now squishing in both.
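For anyone following along, the difference between the two modes is roughly this (a Python sketch; fastai's actual crop logic has more options than shown here):

```python
import cv2

def squish(img, size=224):
    # Squish: scale both axes independently; aspect ratio is not preserved.
    return cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA)

def crop(img, size=224):
    # Crop: scale the short side to `size`, then center-crop the long side.
    h, w = img.shape[:2]
    scale = size / min(h, w)
    img = cv2.resize(img, (round(w * scale), round(h * scale)),
                     interpolation=cv2.INTER_AREA)
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]
```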

Any suggestions?

I found, too, that the resampling method does make a difference. If my reading of the source code is correct, resampling seems to use bilinear interpolation. I would suggest you try bilinear resizing in your inference pipeline and see what happens. I’m not familiar with OpenCV, but a cursory search seems to imply that INTER_LINEAR would be appropriate. You can use any other image processing library, of course - that’s what I did to test my CoreML model and the results were very similar to the ones I saw in my training notebook.
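Something like this is what I mean (a sketch; `frame.png` is a placeholder, and I haven't verified that OpenCV's INTER_LINEAR and PIL's BILINEAR produce identical pixels, so it's worth diffing both against your training pipeline's output):

```python
import numpy as np
import cv2
from PIL import Image

img = np.asarray(Image.open("frame.png").convert("RGB"))  # placeholder file

# Two common bilinear resizes. They are not guaranteed to match each other
# bit-for-bit (PIL antialiases when downscaling), so compare both to fastai.
resized_cv = cv2.resize(img, (224, 224), interpolation=cv2.INTER_LINEAR)
resized_pil = np.asarray(Image.fromarray(img).resize((224, 224), Image.BILINEAR))
print(np.abs(resized_cv.astype(int) - resized_pil.astype(int)).max())
```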


I’m interested in what you had to do to re-implement in C#. I may have to follow the same route.
I assume you had to replicate some pre-processing on the image?

Is there a public repo of your efforts that you can share?

@AllenK,

I don’t have a public repo, I’m sorry. The things you will likely have to watch out for are resizing the same way (there are two options in fastai - cropping or squishing - plus a variety of ways of handling pixel interpolation) and normalizing the same way. There are C# wrappers for OpenCV; I have used EmguCV. There are also wrappers for numpy; I’ve used SciSharp.
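One trick that helped me localize mismatches: dump the model input each pipeline produces (e.g. to .npy, or raw floats) and diff them directly. A sketch of the Python side, with placeholder file names:

```python
import numpy as np

# Compare the final model inputs produced by the two pipelines.
# Both files are placeholders, written out by each pipeline beforehand.
a = np.load("input_fastai.npy")    # from the Python/fastai side
b = np.load("input_csharp.npy")    # from the C# side
diff = np.abs(a.astype(np.float32) - b.astype(np.float32))
print("max:", diff.max(), "mean:", diff.mean())
print("worst element:", np.unravel_index(diff.argmax(), diff.shape))
```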

There can also be differences in how certain layers are implemented. For example, TensorFlow uses different padding rules than PyTorch.
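A toy illustration of the kind of mismatch I mean (PyTorch only, emulating TF-style "same" padding with F.pad; the output shapes match but the values shift):

```python
import torch
import torch.nn.functional as F

# PyTorch's Conv2d(padding=1) pads symmetrically (1 px each side); TF-style
# "same" with stride 2 on an even input pads asymmetrically (0 before, 1 after).
x = torch.randn(1, 3, 224, 224)
w = torch.randn(8, 3, 3, 3)

pytorch_style = F.conv2d(F.pad(x, (1, 1, 1, 1)), w, stride=2)
tf_same_style = F.conv2d(F.pad(x, (0, 1, 0, 1)), w, stride=2)

print(pytorch_style.shape, tf_same_style.shape)      # both (1, 8, 112, 112)
print(torch.allclose(pytorch_style, tf_same_style))  # False: values differ
```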

(I’ve also seen bugs in the PyTorch -> ONNX conversion step.)
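If you suspect that step, two cheap checks: validate the exported graph, and diff PyTorch against onnxruntime on the same random tensor. A sketch, with `model` as a placeholder for your trained network:

```python
import numpy as np
import onnx
import onnxruntime as ort
import torch

model = model.cpu().eval()            # placeholder: your trained PyTorch model
x = torch.randn(1, 3, 224, 224)

torch.onnx.export(model, x, "check.onnx", opset_version=13)
onnx.checker.check_model(onnx.load("check.onnx"))

with torch.no_grad():
    ref = model(x).numpy()
sess = ort.InferenceSession("check.onnx")
out = sess.run(None, {sess.get_inputs()[0].name: x.numpy()})[0]
print(np.abs(ref - out).max())        # ~1e-5 or smaller is healthy
```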

Thanks for the tips

Hello @jbrownkramer, I am exporting a cnn_learner to ONNX as well and I'm running into similar problems. Did you figure out how to achieve the same results in ONNX as in fastai? I'm not using cv2 in the ONNX preprocessing, only numpy and PIL.
I'm getting approximately 80% accuracy in ONNX compared to fastai.
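In case it helps, this is roughly my pipeline (a simplified sketch; the file name, size, and stats stand in for mine):

```python
from PIL import Image
import numpy as np

# Placeholder stats (ImageNet's); substitute whatever Normalize used in training.
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

img = Image.open("example.jpg").convert("RGB").resize((224, 224), Image.BILINEAR)
x = np.asarray(img, dtype=np.float32) / 255.0
x = (x - mean) / std
x = x.transpose(2, 0, 1)[None]         # NCHW batch for onnxruntime
```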
If anybody has suggestions, that would be great.
