After following the CamVid tutorial in Lesson 3, I wondered if it was possible to perform segmentation on images based on their depth. I decided to start with the NYU Depth V2 dataset.
I realised the depth of the images was being completely lost when saved. So I attempted to load the images in as numpy arrays, but I didn't understand how to set the label of the inputs to be another tensor, and I couldn't follow the suggestions made in this thread.
I did make progress, however: after loading the images using pil2tensor and then saving them, I was able to create a DataBunch.
The depth readings for each pixel are in meters, but when the images are saved they get squashed into a 0-255 range. This creates some funky segmentation images, but you can see that the classes it's created correspond to depth.
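To get a feel for how much precision that 0-255 squashing throws away, here's a quick numpy sketch (the 10m maximum range is my assumption for illustration, roughly the indoor range of NYU Depth V2):

```python
import numpy as np

# Depth readings in meters (10m max range is an assumption for this sketch)
depth = np.array([1.234, 2.5, 7.891], dtype=np.float32)

# Saving as an 8-bit image squashes the range into 0-255
as_uint8 = np.round(depth / 10.0 * 255).astype(np.uint8)

# Recovering meters from the saved image loses precision:
# each pixel can be off by up to half a quantisation step (~2cm at a 10m range)
recovered = as_uint8.astype(np.float32) / 255 * 10.0
step = 10.0 / 255
print(np.abs(recovered - depth).max() <= step / 2)  # True
```

So 8-bit images cap the depth precision at about 2cm per quantisation step, on top of losing the absolute scale.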
I've begun experimenting with hyperparameters, and I believe a ResNet will be useful as the encoder because it has been pretrained on data relevant to this problem.
My first guess is that mean squared error will be the best loss function here, but I'll soon find out.
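For reference, a minimal sketch of what MSE measures on a pair of toy depth maps (the values and shapes are made up):

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between predicted and ground-truth depth maps (meters)."""
    return float(np.mean((pred - target) ** 2))

pred   = np.array([[1.0, 2.0], [3.0, 4.0]])  # predicted depths in meters
target = np.array([[1.5, 2.0], [2.0, 4.0]])  # ground-truth depths in meters

print(mse(pred, target))  # 0.3125
```

Note the units are meters squared, which is why the square root of the loss is the more interpretable number.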
So I realised that, as the task is more of a continuous one, Lesson 7 is more appropriate. Instead of using a SegmentationList, which forces us to create classes, we can just use an ImageImageList. However, the problem remains that the labels (y) should really be a tensor of the distances in meters, so I needed to do some tweaking for that.
Getting some not too shabby results now, though:
| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 1.614756 | 1.064392 | 03:19 |
| 1 | 1.325972 | 0.807041 | 03:18 |
| 2 | 1.107102 | 0.952836 | 03:18 |
| 3 | 0.795507 | 0.663783 | 03:16 |
| 4 | 0.799025 | 0.618904 | 03:19 |
Meaning we're off by under a meter on average.
I needed to make sure that the transformations weren't messing with the distance values. Cropping, rotating, etc. are all still applicable.
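A quick sanity check of why geometric transforms are safe but value-changing ones aren't (plain numpy here; in fastai v1 the depth map also needs the same geometric transforms applied as the input, via tfm_y=True):

```python
import numpy as np

# Fake depth map in meters, just for illustration
depth = np.arange(12, dtype=np.float32).reshape(3, 4) / 10.0

# Geometric transforms (flips, rotations, crops) only move pixels around,
# so the multiset of depth readings is unchanged
flipped = np.flip(depth, axis=1)
rotated = np.rot90(depth)
assert sorted(flipped.ravel()) == sorted(depth.ravel())
assert sorted(rotated.ravel()) == sorted(depth.ravel())

# A brightness-style transform rescales the values themselves,
# silently corrupting the meter readings
brightened = depth * 1.2
assert not np.allclose(brightened, depth)
```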
I also realised that using VGG as the encoder might be better because it can detect more than one thing in an image? I may be wrong about this.
The unet_learner complains about this, however, saying the dimensions don't match up.
Is it possible to use VGG as an encoder for a unet_learner?
I've found this paper, which reports getting less than .25 loss; I've currently got .4. They also feed the camera features into the network after the encoder.
Managed to get accuracy very close to state of the art. Looking at the results, I think they're good enough to use in another project I had in mind for them. I'm about 8cm of error away from a similar paper, but I haven't split the data in the same way.
Nice work, what camera are you using?
When you say 8cm of accuracy, is it for all the targets?
I mean, maybe your accuracy depends on how far away your target is, no?
I’m using the camera on the drone.
The square root of the mean squared error (.36) is .6, meaning 60cm. So it's on average 60cm away from the ground truth.
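That arithmetic as a one-liner:

```python
import math

mse = 0.36             # validation MSE, in meters squared
rmse = math.sqrt(mse)  # root mean squared error, back in meters
print(rmse)            # ≈ 0.6, i.e. 60cm
```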
The paper I was looking at was getting close to 50cm, so I'm 8-10cm away from the paper's results.
Btw, a very relevant paper here from March 2019 is FastDepth. I'm having problems, though. I converted the NYU Depth V2 dataset into a fast.ai-ish layout: a train/valid split (as that's important here), and under each an image/depth folder pair. For depth I made a 480x640 TIFF file of type float32. This all works fine, reading it in via…
```python
iil = (ImageImageList.from_folder(dataFP, extensions=['.png'])
                     .split_by_folder()
                     .label_from_func(get_y_fn, convert_mode='F'))
```
Except… when open_image is called on the tiff file it divides all pixels by 255.
Any thoughts? Do I need to go back to the torch way of building a dataloader to wrap into a DataBunch? I was hoping to leverage the fast.ai framework…
Hey Peter,
> Except… when open_image is called on the tiff file it divides all pixels by 255.
I mean, you can always multiply the tensor back by 255.
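To illustrate (a sketch with PIL and a made-up depth map): a float32 TIFF round-trips exactly, so the division by 255 is purely the loader's normalisation and can be undone without losing anything:

```python
import os, tempfile
import numpy as np
from PIL import Image

# Fake float32 depth map in meters, saved as a 32-bit float ('F' mode) TIFF
depth = np.array([[0.5, 1.75], [3.2, 9.99]], dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), 'depth.tiff')
Image.fromarray(depth, mode='F').save(path)

# A float TIFF round-trips exactly -- the meter values are all still there
loaded = np.asarray(Image.open(path), dtype=np.float32)
assert np.array_equal(loaded, depth)

# If a loader normalises by 255, multiplying back restores the readings
divided = loaded / 255
restored = divided * 255
assert np.allclose(restored, depth)
```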
You're welcome to steal what I've done in the notebook to get started.
Interesting paper on 3D depth
I moved the notebook for depth perception:
Interesting project! Do you mind sharing the dataset so that I can also train it? Maybe I can give you some ideas for it.
The 4th cell in the notebook will download the dataset. I’ve also linked the dataset at the top of the file.