Localization in 3D images

What architecture should I use for localization in 3D images?
So far I have come across the following:
Fast Image-Based Localization using Direct 2D-to-3D Matching

3D Object Localisation from Multi-view Image Detections

The stuff we learn in part 2 should work fine here; just use 3D convolutions instead of 2D ones.
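
To make the "3D instead of 2D" swap concrete, here is a minimal sketch in PyTorch. It is only an illustration under my own assumptions: the class name `Tiny3DLocalizer`, the layer sizes, the 32x32x32 input, and the box encoding (center z, y, x plus depth, height, width, all normalized to [0, 1]) are hypothetical choices, not something from the course or the papers above.

```python
import torch
import torch.nn as nn


class Tiny3DLocalizer(nn.Module):
    """Hypothetical example: regress one 3D box per input volume.

    Output is 6 numbers per volume (z, y, x, depth, height, width),
    each squashed to [0, 1] as a fraction of the volume size.
    """

    def __init__(self, in_channels: int = 1):
        super().__init__()
        # Same conv -> batchnorm -> ReLU pattern as a 2D CNN,
        # with every 2D layer replaced by its 3D counterpart.
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 8, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(8),
            nn.ReLU(inplace=True),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(16),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # pool to (N, 16, 1, 1, 1)
        )
        self.head = nn.Linear(16, 6)  # 6 box coordinates (assumed encoding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x).flatten(1)  # (N, 16)
        return torch.sigmoid(self.head(x))  # (N, 6), values in [0, 1]


model = Tiny3DLocalizer()
# Input layout for Conv3d is (batch, channels, depth, height, width).
volumes = torch.randn(2, 1, 32, 32, 32)
boxes = model(volumes)
print(boxes.shape)
```

Training this would look like the 2D bounding-box regression case, for example an L1 loss between `boxes` and the normalized ground-truth boxes. Keep in mind that 3D volumes use far more memory than 2D images, so batch sizes and channel counts usually have to be smaller.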