Post-processing RLE masks after load_learner and get_preds

Hi Team,

I am currently tackling the Severstal Kaggle competition after completing lecture 3 of Part 1.

At this stage, I have trained a UNet/Resnet18 architecture on images of size 64 x 400. I’ve exported the trained model, then used load_learner to load it back up again, and lastly added the test set of images from the competition.

I am currently stuck on the post-processing part for the predictions generated by get_preds on the test images. Given the test images are transformed the same way as my training data (I assume), my predicted masks are of size 64 x 400. However, I need the prediction masks to be post-processed back to the full resolution of 256 x 1600.

Here I take the first image prediction from get_preds and attempt to post-process the tensor back to 256 x 1600:

for pred in preds:
    for i, j in enumerate(pred):
        print("Class: " + str(i))
        print("Max prob: " + str(j.max()))
        pred_mask, test_num = post_process(j, 0.2, min_size=min_size)
        print("Max value: " + str(pred_mask.max()))
        rle = mask2rle(pred_mask)
        print(rle)
        mask_test = (open_mask_rle(rle, shape=(64, 400)).px.permute(0, 2, 1).resize((1, 256, 1600)))
        ImageSegment(mask_test).show()
        print("\n")
    break

I get the following error:

RuntimeError: requested resize to (1, 256, 1600) (409600 elements in total), but the given tensor has a size of 1x64x400 (25600 elements). autograd’s resize can only change the shape of a given tensor, while preserving the number of elements.

Has anyone dealt with post-processing RLEs in this manner? Any help is much appreciated, thank you.

You can find my code here:
Notebook

As the error shows, resize only changes the shape of a tensor; it can't change the number of elements. You want something like torch.nn.functional.interpolate (with the default mode='nearest', since for segmentation you don’t want values in between the mask values). Or you could try the approach in fastai’s PixelShuffle_ICNR, which is what the Unet decoder uses for upsampling. It’s a trainable layer with a convolution (to also change the number of features) and a ReLU activation, but you should be able to pull out just the pixel-shuffling upsample, based on torch.nn.PixelShuffle.
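A minimal sketch of the interpolate approach, using a dummy 64x400 binary mask in place of your predicted mask (the variable names here are illustrative, not from your notebook):

```python
import torch
import torch.nn.functional as F

# Hypothetical predicted mask at the training resolution (64 x 400)
pred_mask = (torch.rand(64, 400) > 0.5).float()

# interpolate expects a 4-D (batch, channel, H, W) input, so add two dims
upscaled = F.interpolate(pred_mask[None, None], size=(256, 1600), mode='nearest')
upscaled = upscaled[0, 0]  # drop the batch/channel dims -> (256, 1600)

print(upscaled.shape)  # torch.Size([256, 1600])
```

With mode='nearest' the output stays binary, since each output pixel just copies its nearest input pixel rather than blending neighbours.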


Thank you once again TomB.

The post_process function I am using handles a numpy array as the mask before it is cast to a tensor/ImageSegment. So instead of working with the tensor functions, I used the following:

pred_mask = pred_mask.repeat(4, axis=0).repeat(4, axis=1)

This scales the mask to the desired dimensions. The functions you mentioned will no doubt be handy.
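For anyone following along, here is a small self-contained sketch of that np.repeat trick (the mask contents here are made up for illustration). Repeating each pixel 4x along both axes takes a 64x400 mask to 256x1600, which is equivalent to nearest-neighbour upsampling by an integer factor:

```python
import numpy as np

# Hypothetical 64x400 binary mask with one defect region
pred_mask = np.zeros((64, 400), dtype=np.uint8)
pred_mask[10:20, 100:200] = 1

# Duplicate every pixel 4x vertically, then 4x horizontally
upscaled = pred_mask.repeat(4, axis=0).repeat(4, axis=1)

print(upscaled.shape)  # (256, 1600)
```

Note this only works because 256/64 and 1600/400 are both whole numbers; for non-integer scale factors you would need interpolate instead.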