Neural Radiance Fields

Ha… I just stumbled on that myself just 15 minutes ago :slight_smile: pretty awesome…

I’d still like to re-implement it myself to go a bit deeper… but I’ll probably spend today looking over that repo and seeing if I can get it to work. Will share things here as I make progress.

Would also love to hear about experiments you’ve run and things you’ve tried etc…



Awesome @johnrobinsn :slight_smile:
As for myself, so far I've been exploring the Instant NGP repo in depth,

and I did some experiments with friends; this is one of them:

That's made from a video, all of it done in an agile way; it could be done much better with photos rather than video, etc., but it works really well.

In a few weeks I plan to shoot a short film related to AI, and I want to try to use NeRF in the movie, so yeah, I'm exploring all I can to see what could be incorporated.

It's funny because a friend of mine, Joan, will work on the film as well. He's an expert at shooting with drones: cameras on drones, mini drones, large drones. And it's funny that now, with NeRF, we can do drone-like effects.

Now, the thing is that with NeRFs the subject has to be static. But I've been seeing research where people use NeRFs to produce models that they can then animate and move. I would looove to do this, and my focus right now is on finding something I can test in that direction, but I am open to all things NeRF!

Regarding that NeRF+SD repo, I thought about testing it, but the examples are still so limited, and there are warnings about the instability of that code… so that's why I decided to look in other directions. But if you try it, I would love to see how it goes, and I could potentially give a hand as well.

So much interest in NeRF! @jeremy, something to keep in mind if you find yourself searching for future course ideas that the masses would be interested in learning more about, haha


Let's make this clear: I'm totally crazy about NeRF :heart_eyes:


haha… there are too many interesting things and not enough time :slight_smile: … did someone say exponential…


Absolutely, and in the case of NeRF things get triply interesting, as similar tech can be the answer to so many challenges, including parts of the utopian "that-4-letter-word"+Verse.

BTW, in the last few days I got in touch with Dan Casas, co-author of the CVPR 2022 (Oral) paper "SNUG: Self-Supervised Neural Dynamic Garments".

They are optimizing garments and clothing on top of 3D models; basically, the 3D-humans side of things could take off pretty fast once the underlying human models get more optimized.

Regarding this paper I mentioned previously:

"At the core of EVA3D is a compositional human NeRF representation, which divides the human body into local parts. Each part is represented by an individual volume. This compositional representation enables 1) inherent human priors, 2) adaptive allocation of network parameters, 3) efficient training and rendering"

and they are planning to release their code soon here:

The concept of creating a composition of multiple NeRF representations to then allow posing of the human model is interesting; specifically, they divide it into 16 parts: "We divide the human body into 16 parts and assign each part an individual network, which models the corresponding local volume"
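To make the compositional idea concrete, here is a hypothetical numpy sketch of that kind of part routing: each query point is sent to the network responsible for the local volume that contains it. The function names, the axis-aligned boxes, and the "pick the containing box" rule are my own simplification (the real system derives part volumes from a body model and blends overlapping parts):

```python
import numpy as np

def composite_query(points, part_boxes, part_fns):
    # points:     (N, 3) 3D query locations
    # part_boxes: list of (lo, hi) axis-aligned bounding boxes, one per body part
    # part_fns:   one density function per part, modeling that local volume
    sigma = np.zeros(len(points))
    for (lo, hi), fn in zip(part_boxes, part_fns):
        # route each point to the part whose box contains it
        inside = np.all((points >= lo) & (points <= hi), axis=1)
        if inside.any():
            sigma[inside] = fn(points[inside])
    return sigma
```

In the real EVA3D setting there would be 16 such boxes and the per-part functions would be small learned MLPs; posing the model then amounts to moving the boxes (and their local coordinate frames) rather than retraining a monolithic network.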


Another good and interesting article:

"What makes EVA3D outstanding is that the researchers behind it, almost uniquely in the sector of full-body image synthesis, have realized that a single network (GAN, NeRF or otherwise) is not going to be able to handle editable and flexible human full-body generation for some years – partly because of the pace of research, and partly because of hardware and other logistical limitations.

Therefore, the Nanyang team have subdivided the task across 16 networks and multiple technologies – an approach already adopted for neural rendering of urban environments in Block-NeRF and CityNeRF, and which seems likely to become an increasingly interesting and potentially fruitful half-way measure to achieve full-body deepfakes in the next five years, pending new conceptual or hardware developments.

The EVA3D workflow segments the human body into 16 distinct parts, each of which is generated through its own NeRF network. Obviously, this creates enough ‘unfrozen’ sections to be able to galvanize the figure through motion capture or other types of motion data. Besides this advantage, however, it also allows the system to assign maximum resources to the parts of the body that ‘sell’ the overall impression."

Folks, and what about MDM!
“The new model, called Motion Diffusion Model (MDM), leverages Transformers to interpret priors (extracted features from snippets of movement data, collected into voluminous datasets) and then reproduce them according to several possible input methods;”

“The new architecture, Tevet agrees, is also amenable to third-party technologies that could allow for direct (i.e. manipulated) interaction via interfaces based around SDF and/or 3DMM interfaces, amongst other possible ways that researchers have been applying recently to obtain a finer-grained access to content that’s resident in the latent space of a Generative Adversarial Network (GAN), or of Neural Radiance Fields (NeRF) – or of a latent diffusion network such as Stable Diffusion”


So… can you folks explain to me why I should care about NeRF? What are some of the applications? What could I do with it which would be fun and useful?

The applications are basically around being able to create a 3D (volumetric) representation of a scene from 2D images… The original mechanism for doing this mapping from 2D to 3D and generalizing to unseen viewpoints was an MLP (mostly simple linear layers)… I sort of see it like an autoencoder that learns to represent a 3D scene, given a loss function between the input images and a render of the 3D representation back to 2D.
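To make that "render back to 2D" step concrete, here is a minimal numpy sketch of the volume rendering (alpha compositing) that the original NeRF paper applies along each camera ray; the density/color samples would come from the MLP, and the function name and toy inputs are my own:

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    # sigmas: (S,)   densities predicted at S samples along one ray
    # colors: (S, 3) RGB predicted at those samples
    # deltas: (S,)   distances between consecutive samples
    alphas = 1.0 - np.exp(-sigmas * deltas)          # opacity of each sample
    # transmittance: probability the ray reaches sample i unoccluded
    T = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = T * alphas                             # per-sample contribution
    return (weights[:, None] * colors).sum(axis=0)   # final pixel color
```

Because every step is differentiable, the photometric loss between this pixel and the ground-truth image pixel can be backpropagated into the MLP, which is exactly the autoencoder-like picture above.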

I think the main reason I’m interested in the context of this course is the DreamFusion paper which uses a generative 2D diffusion model to learn how to create 3D object models from text prompts.

I just put out a tweet about an experiment I ran with a recently released implementation of the paper that uses Stable Diffusion as the 2D diffusion model… and generated a 3D pumpkin from a text prompt…

NeRFs provide a bridge from these large text/2D image models to the 3D realm… That said, it might be a bit niche…

@jeremy, going from images to vector graphics (via DL) will open a totally new application domain. It may go as deep as replacing Computer-Aided Design (CAD) tools, and it adds new means of producing 3D vector graphics. There are applications in the metaverse, restoration of buildings, and gaming, and even in mechanical, civil, architectural, and geomatics research.


One killer application is the creation of digital twins - virtual 3D models of real cities created from just a bunch of photos of the city, perhaps taken from a drone. DL is fast becoming a new way to create such 3D models, that have traditionally required photogrammetry. This is still an evolving area and the current textured meshes produced aren’t as good as what photogrammetry produces, but this field is changing so rapidly that very soon we might expect that breakthrough … perhaps in a course :wink: I’m quite interested in this application as well and looking to collaborate.

Another related research area is applying diffusion to create 3D objects… imagine creating imaginary buildings as 3D textured meshes, and putting them in virtual cities in the metaverse and in Hollywood. That is a good reason why you should be interested in NeRFs and 3D diffusion @jeremy !


Also, 2D and 2.5D MRI to 3D could be highly applicable to the health and medical sciences.


Agreed, this would be awesome. I started working on a few things to bridge 3D tensors into fastai and apply recurrent networks (LSTMs) to image sequences, and made some progress on it… but I do need to get back to it. It would be great to cover this in a future class when it makes sense…


I think there are a large number of use cases for reality capture - translating 2D images and video into 3D scenes (digital twins). Some examples: creating 3D assets for use in movies, games, and virtual/augmented reality; virtual visits to landmarks across the world; Google Street View, except without being constrained to the position of the camera; documenting construction progress; virtual tours; robotics navigation and simulation; understanding of surroundings; and simulation environment generation.

This is primarily done today using photogrammetry or fused lidar+photo. It seems very likely that learning-based approaches will outperform previous physics/optics + heuristics-based methods, and NeRFs seem to be the most impressive learning-based approach at the moment.

Currently both NeRFs and photogrammetry face similar issues: for example, neither method works well when some photos are taken in the day and some at night; blurry photos and photos with shallow or even varying depths of field do not work well together; moving objects within the scene (people, or a flag blowing in the wind) cause issues; photos from different camera models sometimes don't work well together; etc. Many of these types of issues are now starting to be addressed with NeRFs. This video does a great job demonstrating some of these issues as well as techniques to address them.

Another problem with some of the early work on NeRFs was that they were extremely slow to train; many of the early examples took several days. Then instant-ngp came along and offered a multi-order-of-magnitude speedup, allowing you to train a NeRF in seconds to minutes vs. hours to days. These results totally blew me away - training a NeRF using instant-ngp is about as fast as training the dogs vs. cats model. instant-ngp is also dramatically faster than most photogrammetry techniques that I've seen. Unfortunately, at least some of these performance improvements come from the majority of instant-ngp being implemented directly in CUDA rather than PyTorch, which seems to significantly raise the barrier to entry for building on these results.
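For a taste of the idea behind instant-ngp's speed, here is a small numpy sketch of its multiresolution hash encoding. The spatial hash (XOR of coordinate-prime products) follows the paper, but the nearest-corner lookup is my own simplification; the real implementation trilinearly interpolates learned features over the 8 corners of each grid cell, at every resolution level:

```python
import numpy as np

# the three primes used by the instant-ngp spatial hash
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_grid(coords, table_size):
    # coords: (N, 3) integer grid corners -> (N,) indices into a hash table
    c = coords.astype(np.uint64) * PRIMES
    return (c[:, 0] ^ c[:, 1] ^ c[:, 2]) % np.uint64(table_size)

def encode(x, tables, base_res=16, growth=1.5):
    # x: point in [0, 1]^3; tables: one (table_size, F) learned feature table
    # per resolution level.  Simplified: nearest-corner lookup, no interpolation.
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)        # geometric resolution growth
        corner = np.floor(x * res).astype(np.int64)[None, :]
        idx = hash_grid(corner, len(table))[0]
        feats.append(table[idx])
    return np.concatenate(feats)                     # input to a tiny MLP
```

The speedup comes from replacing most of the big MLP with these cheap table lookups: the tables hold the scene detail, and only a tiny MLP sits on top, which is much less work per sample (on top of the CUDA engineering).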

Here’s a list of some of the things that I believe need to be incorporated into NeRFs. Many of these are being worked on currently and will need to be brought together once solved:

  1. Camera localization done via learning.
  2. ‘Floater’ removal or elimination.
  3. Camera/lens pixel mappings (intrinsics) - per image preferred
  4. Rolling shutter correction
  5. Ability to export high quality meshes and textures. Mesh results are currently quite poor, at least in instant-ngp.
  6. Work well in more unconstrained environments - currently NeRFs work best when photos circle an object of interest.
  7. Ability to separate geometry from textures - the video I referenced addresses this.
  8. Ability to ignore/filter out moving objects
  9. Improve inference speed - for instant-ngp inference at higher resolutions seems slower than training

Other ideas - may be quite out there/bad ideas:

  1. As NeRFs store a compressed representation of a 3D scene, maybe they would somehow be useful as a mechanism for storing/retrieving temporal residuals from previous video frames for video prediction
  2. Ability to segment and temporally encode moving objects. This would serve both to reduce artifacts from current NeRF implementations of static scenes and to enable moving objects to be encoded and referenced.
  3. Unlimited, more realistic, and more diverse data augmentation for image classifiers

As for ideas for the class - I think that NeRFs are generally fun and accessible. All you need is a camera, and you can create a 3D scene of whatever you want, whether it's a toy, your home, your workplace, a local street or landmark, etc.

  1. Can a PyTorch implementation get close to the speed of instant-ngp, allowing for fast iteration while being more accessible/hackable than the current CUDA implementation?
  2. Are there pieces of instant-ngp that could be incorporated into a PyTorch implementation, whether for NeRFs specifically or more generally? Are there useful things to be learned from what makes instant-ngp so much faster than previous NeRF implementations?
  3. Do you have any intuition-based ideas for how NeRFs could be improved, similar to your intuitions about optimizers for stable diffusion?

EDIT: Oops, I forgot to make this a reply to Jeremy…


Fun with NERF: Why THIS is the Future of Imagery (and Nobody Knows it Yet).

I've spent most of the last couple of years in NLP, and I'm very new to NeRFs. Can you guide me to some resources to go from 0 to 1 with NeRFs?