### Instant-NGP NeRF Backbone

```bash
# + faster rendering speed
# + less GPU memory (~16G)
# - need to build CUDA extensions (a CUDA-free Taichi backend is available)
# - worse surface quality

## train with text prompt (with the default settings)
# `-O` equals `--cuda_ray --fp16 --dir_text`
# `--dir_text` enables view-dependent prompting.
# `--cuda_ray` enables instant-ngp-like occupancy-grid-based acceleration, which
# makes it possible to train with a larger rendering resolution and leads to better quality.
python main.py --text "a hamburger" --workspace trial -O

# reduce stable-diffusion memory usage with `--vram_O` (enables various VRAM savings)
python main.py --text "a hamburger" --workspace trial -O --vram_O
```

### Notes

- Different from Imagen, Stable-Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. Therefore, the loss needs to propagate back through the VAE's encoder as well, which introduces extra time cost in training.
- The multi-face Janus problem is likely caused by the limited capability of the text-to-2D model, as discussed in Magic3D (Figure 4) and in the discussion "Can single-stage optimization work with LDM prior?".
- The vanilla NeRF backbone is also supported now, but the Mip-NeRF backbone used in the paper is still not implemented.
- The surface normals are predicted with an MLP, as in Magic3D.
- We use a multi-resolution grid encoder to implement the NeRF backbone (implementation from torch-ngp), which enables much faster rendering (~10 FPS at 800x800). Currently, 10000 training steps take about 3 hours on a V100.
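The view-dependent prompting enabled by `--dir_text` can be sketched as a function that appends a view suffix to the text prompt based on the camera pose. This is a minimal illustrative sketch of the idea only — the angle thresholds and suffix wording below are assumptions, not the exact values used by stable-dreamfusion:

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg,
                          overhead_thresh=60.0, front_span=60.0):
    """Append a view suffix ("front/side/back/overhead view") to `prompt`
    based on camera azimuth and elevation. Thresholds are illustrative."""
    if elevation_deg > overhead_thresh:
        return f"{prompt}, overhead view"
    a = azimuth_deg % 360.0
    half = front_span / 2.0
    if a < half or a >= 360.0 - half:
        return f"{prompt}, front view"      # camera roughly in front
    if 180.0 - half <= a < 180.0 + half:
        return f"{prompt}, back view"       # camera roughly behind
    return f"{prompt}, side view"           # everything else

print(view_dependent_prompt("a hamburger", 0.0, 0.0))
print(view_dependent_prompt("a hamburger", 90.0, 0.0))
```

Conditioning the prompt on the sampled camera direction gives the 2D prior a hint about which face of the object it is scoring, which is exactly what helps mitigate the Janus problem discussed in the notes.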
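The occupancy-grid acceleration enabled by `--cuda_ray` works by skipping ray-marching samples in grid cells known to be empty, so compute is spent only where the scene has density. The following is a toy 1-D sketch of that idea (the function name and parameters are hypothetical, not the repo's API; the real implementation is a 3-D CUDA kernel):

```python
def march_ray(occupancy, t_near, t_far, cell_size, step):
    """Collect sample positions along a ray, skipping cells that the
    occupancy grid marks as empty. Toy 1-D sketch of occupancy-grid
    accelerated ray marching."""
    samples = []
    t = t_near
    while t < t_far:
        cell = int(t // cell_size)
        if cell < len(occupancy) and occupancy[cell]:
            samples.append(t)                # occupied cell: take a fine sample
            t += step
        else:
            t = (cell + 1) * cell_size       # empty cell: jump to its far boundary
    return samples

# Only cells 1 and 3 are occupied, so no samples land in [0,1) or [2,3).
print(march_ray([False, True, False, True], 0.0, 4.0, 1.0, 0.25))
```

Skipping empty space reduces the number of network queries per ray, which is why `--cuda_ray` makes it feasible to train at a larger rendering resolution.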