Depth Segmentation using U-nets

After following the CamVid tutorial in Lesson 3, I wondered if it was possible to perform segmentation on images based on their depth. I decided to start with the NYU Depth V2 data.



I realised the depth information in the images was being completely lost when they were saved. So I attempted to load the images in as numpy arrays, but I didn't understand how to set the labels of the inputs to another tensor and couldn't follow the suggestions made in this thread.

I did make progress, however: after loading the images using pil2tensor and then saving them, I was able to create a DataBunch.

The depth readings for each pixel are in meters, but when the images are saved they are pushed into a 0-255 range. This seems to create some funky segmentation images, but you can see that the classes it has created correspond to depth.
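A minimal numpy sketch of what that lossy squeeze into 0-255 looks like (the values and the 10 m maximum are my assumptions, roughly the range of NYU Depth V2):

```python
import numpy as np

# Hypothetical depth map in meters (NYU Depth V2 depths are roughly 0-10 m)
depth_m = np.array([[0.5, 2.0],
                    [5.0, 10.0]], dtype=np.float32)

max_depth = 10.0  # assumed maximum depth for the dataset
depth_u8 = np.clip(depth_m / max_depth * 255, 0, 255).astype(np.uint8)
# Each pixel can now hold at most 256 distinct depth "classes",
# which is where the banded, funky segmentation images come from.
```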

I've begun experimenting with hyperparameters, and I believe a ResNet will be useful as the encoder because it has been pretrained on data that is relevant to this problem.

My first guess is that Mean Squared Error will be the best loss function here but I’ll soon find out.
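For reference, a tiny numpy sketch of what Mean Squared Error looks like on per-pixel depth maps (the arrays here are made-up values, not real predictions):

```python
import numpy as np

pred = np.array([[2.0, 4.5], [1.0, 3.0]])   # predicted depths in meters
true = np.array([[2.5, 4.0], [1.0, 2.0]])   # ground-truth depths in meters

# MSE averages the squared per-pixel error, so its units are meters squared
mse = np.mean((pred - true) ** 2)
```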


So I realised that, as the task is more of a continuous one, Lesson 7 is more appropriate. Instead of using a SegmentationList, which forces us to create classes, we can just use an ImageImageList. However, the problem remains that the labels (y) should really be a tensor of the distances in meters, so I needed to do some tweaking for that.
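An untested sketch of what that data block might look like in fastai v1 (the paths and the `get_depth_fn` helper are hypothetical; `tfm_y=True` keeps the depth targets aligned with the transformed inputs):

```python
# Hypothetical fastai v1 pipeline -- path_img and get_depth_fn are made up.
src = (ImageImageList.from_folder(path_img)
       .split_by_folder()                    # train/valid folders
       .label_from_func(get_depth_fn))       # y = corresponding depth image
data = (src.transform(get_transforms(), tfm_y=True, size=(240, 320))
        .databunch(bs=8)
        .normalize(imagenet_stats, do_y=False))  # don't normalize the meters
```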

Getting some not-too-shabby results now, though:

| epoch | train_loss | valid_loss | time |
|-------|------------|------------|------|
| 0 | 1.614756 | 1.064392 | 03:19 |
| 1 | 1.325972 | 0.807041 | 03:18 |
| 2 | 1.107102 | 0.952836 | 03:18 |
| 3 | 0.795507 | 0.663783 | 03:16 |
| 4 | 0.799025 | 0.618904 | 03:19 |

Meaning we're off by under a meter on average.

I needed to make sure that the transformations weren’t messing around with the values of the distances. Cropping, rotating, etc are all still applicable.

I also realised that using a VGG net as the encoder might be better because it can detect more than one thing in an image? I may be wrong about this.

The unet_learner complains about this, however, saying the dimensions don't match up.

Is it possible to use vgg as an encoder for a unet learner?

I've found this paper, which reports getting a loss of less than 0.25; I've currently got 0.4. They also input the camera features into the network after the encoder.
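A hedged PyTorch sketch of what "inputting camera features after the encoder" could mean: broadcast the camera parameters spatially and concatenate them onto the bottleneck features. The module name, channel counts, and the 1x1 conv are my assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class CameraConditionedBottleneck(nn.Module):
    """Hypothetical sketch: inject camera parameters (e.g. focal length,
    principal point) into the encoder's bottleneck features."""
    def __init__(self, feat_ch=512, n_cam=4):
        super().__init__()
        # 1x1 conv to mix the camera channels back into feat_ch channels
        self.proj = nn.Conv2d(feat_ch + n_cam, feat_ch, kernel_size=1)

    def forward(self, feats, cam):
        # feats: (B, C, H, W) encoder output; cam: (B, n_cam) camera features
        b, _, h, w = feats.shape
        cam_map = cam[:, :, None, None].expand(b, cam.shape[1], h, w)
        return self.proj(torch.cat([feats, cam_map], dim=1))
```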

Managed to get accuracy very close to state of the art. Looking at the results, I think they're good enough to use in another project that I was hoping to apply them to. I'm about 8 cm of error away from a similar paper, but I haven't split the data in the same way.



Nice work! What camera are you using?
When you say 8 cm of accuracy, is it for all the targets?
I mean, maybe your accuracy depends on how far away your target is, no?

I’m using the camera on the drone.

The square root of the mean squared error (0.36) is 0.6, meaning 60 cm. So on average it's 60 cm away from the ground truth.
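That back-of-the-envelope conversion from MSE to meters is just the square root (using the value from the post):

```python
import math

mse = 0.36             # validation mean squared error, in meters squared
rmse = math.sqrt(mse)  # root mean squared error, back in meters
# rmse is 0.6, i.e. on average ~60 cm from the ground-truth depth
```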

The paper I was looking at was getting close to 50 cm, so I am 8-10 cm away from the paper's results.

Btw, a very relevant paper here from March 2019 is FastDepth. I'm having problems, though. I converted the NYU Depth V2 dataset into a train/valid split (as that's important here) and, under each, an image/depth folder. For the depth I made a 480x640 TIFF file of type float32. This all works fine, reading in via…
iil = ImageImageList.from_folder(dataFP, extensions=['.png']).split_by_folder()
Except… when open_image is called on the tiff file it divides all pixels by 255.

Any thoughts? Do I need to go back to the torch way of building a dataloader to wrap into a DataBunch? I was hoping to leverage the framework…


Hey Peter,

Except… when open_image is called on the tiff file it divides all pixels by 255.

I mean, you can always multiply the tensor back by 255 :slight_smile:
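One way to avoid the multiply-back dance in fastai v1 is to override the item list's `open` so it calls `open_image` with `div=False` (an untested sketch; `convert_mode='F'` assumes the TIFF is a single-channel float image):

```python
from fastai.vision import ImageImageList, open_image

class DepthImageList(ImageImageList):
    def open(self, fn):
        # div=False keeps the raw float values instead of dividing by 255
        return open_image(fn, div=False, convert_mode='F')
```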

You're welcome to steal what I've done in the notebook to get you started.

Also the paper @pgaston was talking about:

Interesting paper on 3d depth


I moved the notebook for depth perception:

Interesting project! Do you mind sharing the dataset so that I can also train it? Maybe I can give you some ideas for it.

The 4th cell in the notebook will download the dataset. I’ve also linked the dataset at the top of the file.
