After following the CamVid tutorial in Lesson 3, I wondered if it was possible to perform segmentation on images based on their depth. I decided to start with the NYU Depth V2 dataset.
I realised the depth of the images was being completely lost when saved. So I attempted to load the images in as numpy arrays, but I didn't understand how to set the label of the inputs to be another tensor, and I couldn't follow the suggestions made in this thread.
I did make progress, however: after loading the images using pil2tensor and then saving them, I was able to create a DataBunch.
The depth readings for each pixel are in meters, but when the images are saved they get squashed into a 0-255 range. This creates some funky segmentation images, but you can see that the classes it's created correspond to depth.
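To get a feel for how much precision that 0-255 squashing throws away, here's a quick numpy sketch (the 10m maximum range is my assumption for illustration, roughly the indoor range of NYU Depth V2):

```python
import numpy as np

# Depth readings in meters (10m max range is an assumption for this sketch)
depth = np.array([1.234, 2.5, 7.891], dtype=np.float32)

# Saving as an 8-bit image squashes the range into 0-255
as_uint8 = np.round(depth / 10.0 * 255).astype(np.uint8)

# Recovering meters from the saved image loses precision:
# each pixel can be off by up to half a quantisation step (~2cm at a 10m range)
recovered = as_uint8.astype(np.float32) / 255 * 10.0
step = 10.0 / 255
print(np.abs(recovered - depth).max() <= step / 2)  # True
```

So 8-bit images cap the depth precision at about 2cm per quantisation step, on top of losing the absolute scale.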
I've begun experimenting with hyperparameters, and I believe a ResNet will be useful as the encoder because it has been pretrained on data relevant to this problem.
My first guess is that mean squared error will be the best loss function here, but I'll soon find out.
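For reference, a minimal sketch of what MSE measures on a pair of toy depth maps (the values and shapes are made up):

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between predicted and ground-truth depth maps (meters)."""
    return float(np.mean((pred - target) ** 2))

pred   = np.array([[1.0, 2.0], [3.0, 4.0]])  # predicted depths in meters
target = np.array([[1.5, 2.0], [2.0, 4.0]])  # ground-truth depths in meters

print(mse(pred, target))  # 0.3125
```

Note the units are meters squared, which is why the square root of the loss is the more interpretable number.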
So I realised that, as the task is more of a continuous one, Lesson 7 is more appropriate. Instead of using a SegmentationList, which forces us to create classes, we can just use an ImageImageList. However, the problem remains that the labels (y) should really be a tensor of the distances in meters, so I needed to do some tweaking for that.
Getting some not too shabby results now, though:
| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 1.614756 | 1.064392 | 03:19 |
| 1 | 1.325972 | 0.807041 | 03:18 |
| 2 | 1.107102 | 0.952836 | 03:18 |
| 3 | 0.795507 | 0.663783 | 03:16 |
| 4 | 0.799025 | 0.618904 | 03:19 |
Meaning we're off by under a meter on average.
I needed to make sure that the transformations weren't messing with the distance values. Cropping, rotating, etc. are all still applicable.
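A quick sanity check of why geometric transforms are safe but value-changing ones aren't (plain numpy here; in fastai v1 the depth map also needs the same geometric transforms applied as the input, via tfm_y=True):

```python
import numpy as np

# Fake depth map in meters, just for illustration
depth = np.arange(12, dtype=np.float32).reshape(3, 4) / 10.0

# Geometric transforms (flips, rotations, crops) only move pixels around,
# so the multiset of depth readings is unchanged
flipped = np.flip(depth, axis=1)
rotated = np.rot90(depth)
assert sorted(flipped.ravel()) == sorted(depth.ravel())
assert sorted(rotated.ravel()) == sorted(depth.ravel())

# A brightness-style transform rescales the values themselves,
# silently corrupting the meter readings
brightened = depth * 1.2
assert not np.allclose(brightened, depth)
```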
I also realised that using VGG as the encoder might be better because it can detect more than one thing in an image? I may be wrong about this.
The unet_learner complains about this, however, saying the dimensions don't match up.
Is it possible to use VGG as an encoder for a unet_learner?
I've found this paper, which reports getting less than .25 loss; I've currently got .4. They also feed the camera features into the network after the encoder.
Managed to get accuracy very close to state of the art. Looking at the results, I think they're good enough to use in another project I had in mind for them. I'm about 8cm of error away from a similar paper, but I haven't split the data in the same way.
Nice work, what camera are you using?
When you say 8cm of accuracy, is it for all the targets?
I mean, maybe your accuracy depends on how far away your target is, no?
I’m using the camera on the drone.
The square root of the mean squared error (.36) is .6, meaning 60cm. So it's on average 60cm away from the ground truth.
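That arithmetic as a one-liner:

```python
import math

mse = 0.36             # validation MSE, in meters squared
rmse = math.sqrt(mse)  # root mean squared error, back in meters
print(rmse)            # ≈ 0.6, i.e. 60cm
```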
The paper I was looking at was getting close to 50cm, so I'm 8-10cm away from the paper's results.
Btw, a very relevant paper here from March 2019 is FastDepth. I'm having problems, though. I converted the NYU Depth V2 dataset into a fast.ai-ish layout: a train/valid split (as that's important here), and under each an image/depth folder pair. For depth I made a 480x640 TIFF file of type float32. This all works fine, reading it in via…
```python
iil = (ImageImageList.from_folder(dataFP, extensions=['.png'])
                     .split_by_folder()
                     .label_from_func(get_y_fn, convert_mode='F'))
```
Except… when open_image is called on the tiff file it divides all pixels by 255.
Any thoughts? Do I need to go back to the torch way of building a dataloader to wrap into a DataBunch? I was hoping to leverage the fast.ai framework…
Hey Peter,
> Except… when open_image is called on the tiff file it divides all pixels by 255.
I mean, you can always multiply the tensor back by 255.
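To illustrate (a sketch with PIL and a made-up depth map): a float32 TIFF round-trips exactly, so the division by 255 is purely the loader's normalisation and can be undone without losing anything:

```python
import os, tempfile
import numpy as np
from PIL import Image

# Fake float32 depth map in meters, saved as a 32-bit float ('F' mode) TIFF
depth = np.array([[0.5, 1.75], [3.2, 9.99]], dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), 'depth.tiff')
Image.fromarray(depth, mode='F').save(path)

# A float TIFF round-trips exactly -- the meter values are all still there
loaded = np.asarray(Image.open(path), dtype=np.float32)
assert np.array_equal(loaded, depth)

# If a loader normalises by 255, multiplying back restores the readings
divided = loaded / 255
restored = divided * 255
assert np.allclose(restored, depth)
```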
You're welcome to steal what I've done in the notebook to get started.
Interesting paper on 3D depth
I moved the notebook for depth perception:
Interesting project! Do you mind sharing the dataset so that I can also train it? Maybe I can give you some ideas for it.
The 4th cell in the notebook will download the dataset. I’ve also linked the dataset at the top of the file.