Kai Zhang
Email: kaiz [AT] adobe [DOT] com or kz298 [AT] cornell [DOT] edu
Google Scholar / Github
I am a Research Scientist at Adobe Research, working on
3D reconstruction and generation, and inverse graphics. I received my
PhD
from Cornell University in 2022, where I worked with Prof. Noah Snavely.
Before my PhD, I received my bachelor's degree from Tsinghua University in 2017.
My current research focus is generative 3D reconstructors that can 1) reconstruct from sparse
posed/unposed images; 2) hallucinate unseen regions; 3) generalize across objects and scenes; 4) be robust to
imperfect inputs, including lighting variations, motion blur, etc.; and 5) work at both the object and scene level.
Adobe internship / university collaboration: feel free to email me if you share similar interests
in automatic 3D content creation from images, videos, and text!
Research
[11]  PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape
Prediction
Peng Wang,
Hao Tan,
Sai Bi,
Yinghao Xu,
Fujun Luan,
Kalyan Sunkavalli,
Wenping Wang,
Zexiang Xu,
Kai Zhang
ICLR, 2024 (Spotlight)  
arxiv /
project page
Given sparse unposed images, we propose a pose-free LRM that jointly reconstructs the full 3D shape
(even though the input images may only partially cover an object) while estimating the relative
camera poses of the input images.
[10]  DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
Yinghao Xu,
Hao Tan,
Fujun Luan,
Sai Bi,
Peng Wang,
Jiahao Li,
Zifan Shi,
Kalyan Sunkavalli,
Gordon Wetzstein,
Zexiang Xu*,
Kai Zhang* (*Equal advising)
ICLR, 2024 (Spotlight)  
arxiv /
project page
To model the uncertainty in the single-image 3D reconstruction problem, we propose to denoise
multi-view images using a 3D large reconstruction model (LRM) trained on a large-scale multi-view
dataset.
[9]  Instant3D: Fast Text-to-3D with Sparse-View Generation and Large
Reconstruction Model
Jiahao Li,
Hao Tan,
Kai Zhang,
Zexiang Xu,
Fujun Luan,
Yinghao Xu,
Yicong Hong,
Kalyan Sunkavalli,
Greg Shakhnarovich,
Sai Bi
ICLR, 2024 
arxiv /
project page
To achieve fast text-to-3D, we finetune SD-XL to generate multi-view images of an object given a text
prompt, then reconstruct the 3D shape from the generated images using a multi-view version of the LRM
model.
[8]  LRM: Large Reconstruction Model for Single Image to 3D
Yicong Hong,
Kai Zhang,
Jiuxiang Gu,
Sai Bi,
Yang Zhou,
Difan Liu,
Feng Liu,
Kalyan Sunkavalli,
Trung Bui,
Hao Tan
ICLR, 2024 (Oral) 
arxiv /
project page
We show that the data + transformer paradigm can also work naturally for 3D reconstruction, with the help
of large-scale multi-view datasets.
[7]  Ray Conditioning: Trading Photo-Consistency for Photo-realism in Multi-view
Image Generation
Eric Ming Chen,
Sidhanth Holalkere,
Ruyu Yan,
Kai Zhang,
Abe Davis
ICCV, 2023 
arxiv /
project page /
code
We propose to add ray conditioning to the StyleGAN2 generator to enable 3D-aware view synthesis with high
photo-realism (at the cost of reduced consistency). This particularly suits the task of editing
viewpoints of static images.
[6]  ARF: Artistic Radiance Fields
Kai Zhang,
Nick Kolkin,
Sai Bi,
Fujun Luan,
Zexiang Xu,
Eli Shechtman,
Noah Snavely
ECCV, 2022 
arxiv /
project page /
code
We propose to stylize the appearance of NeRF with a Nearest Neighbor Feature Matching (NNFM) style loss
to create high-quality artistic 3D content.
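As a rough illustration (a minimal sketch, not the paper's implementation; function and variable names are my own), an NNFM-style loss matches each rendered feature to its nearest style feature under cosine distance:

```python
import numpy as np

def nnfm_loss(render_feats, style_feats):
    """Nearest Neighbor Feature Matching (NNFM) style loss, sketched.

    render_feats: (N, D) deep features (e.g. VGG) of the rendered view
    style_feats:  (M, D) deep features of the style image
    For each rendered feature, find its nearest style feature under
    cosine distance, then average those nearest-neighbor distances.
    """
    r = render_feats / np.linalg.norm(render_feats, axis=1, keepdims=True)
    s = style_feats / np.linalg.norm(style_feats, axis=1, keepdims=True)
    cos_dist = 1.0 - r @ s.T            # (N, M) pairwise cosine distances
    return cos_dist.min(axis=1).mean()  # nearest style feature per rendered feature
```

Minimizing this loss pulls every local rendered feature toward some feature of the style image, without requiring a global one-to-one matching.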
[5]  IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from
Photometric Images
Kai Zhang,
Fujun Luan,
Zhengqi Li,
Noah Snavely
CVPR, 2022 (Oral) 
arxiv /
project page /
code
We propose a neural inverse rendering pipeline called IRON that operates on photometric images and
outputs high-quality 3D content in the format of triangle meshes and material textures readily
deployable in existing graphics pipelines.
[4]  PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material
Editing and Relighting
Kai Zhang*,
Fujun Luan*,
Qianqian Wang,
Kavita Bala,
Noah Snavely (*Equal contribution)
CVPR, 2021  
arxiv /
project page /
code
We propose an end-to-end differentiable rendering pipeline that jointly estimates geometry, material
and lighting from multi-view images from scratch. It enables not just novel view synthesis, but also
relighting and material editing.
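For illustration only (a minimal sketch under the standard spherical Gaussian definition, not the paper's code), a single spherical Gaussian lobe of the kind used to represent environment lighting can be evaluated as:

```python
import numpy as np

def spherical_gaussian(v, xi, lam, mu):
    """Evaluate a spherical Gaussian G(v) = mu * exp(lam * (dot(v, xi) - 1)).

    v:   (3,) or (..., 3) unit query direction(s)
    xi:  (3,) unit lobe axis
    lam: scalar sharpness (larger = narrower lobe)
    mu:  scalar (or RGB) amplitude; G peaks at mu when v == xi
    """
    return mu * np.exp(lam * (v @ xi - 1.0))
```

An environment map is then a sum of a few such lobes, which keeps the rendering integrals closed-form and differentiable.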
[3]  NeRF++: Analyzing and Improving Neural Radiance Fields
Kai Zhang,
Gernot Riegler,
Noah Snavely,
Vladlen Koltun
arXiv preprint, 2020  
arxiv /
code
We analyze the shape-radiance ambiguity in NeRF, and extend NeRF to work with 360° unbounded scenes. At
the core of the method is the Inverted Sphere Parametrization (ISP) contracting an unbounded space to
a bounded one.
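A minimal sketch of the idea (illustrative, not the released code): a point outside the unit sphere is re-expressed as a unit direction plus an inverse radius, so the unbounded exterior maps into a bounded volume:

```python
import numpy as np

def inverted_sphere_param(p):
    """Inverted sphere parametrization, sketched.

    A point p with r = ||p|| > 1 is represented by the 4-vector
    (x/r, y/r, z/r, 1/r): a unit direction plus an inverse radius
    in (0, 1), so arbitrarily far points stay in a bounded domain.
    """
    r = np.linalg.norm(p)
    assert r > 1.0, "ISP applies to points outside the unit sphere"
    return np.concatenate([p / r, [1.0 / r]])
```

For example, the point (2, 0, 0) maps to direction (1, 0, 0) with inverse radius 0.5; as points recede to infinity the last coordinate approaches 0.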
[2]  Depth Sensing Beyond LiDAR Range
Kai Zhang,
Jiaxin Xie,
Noah Snavely,
Qifeng Chen
CVPR, 2020  
arxiv /
project page /
code
We propose a novel cost-effective camera-based solution for sensing the depth of distant objects that are
out of range for typical LiDARs. This can be particularly helpful for heavily loaded autonomous
trucks.
[1]  Leveraging Vision Reconstruction Pipelines for Satellite Imagery
Kai Zhang,
Jin Sun,
Noah Snavely
ICCV 3DRW, 2019  
arxiv /
project page /
code
We approximate satellite-specific RPC cameras with perspective cameras, and adapt vision
reconstruction pipelines (SfM+MVS) such that they can also process satellite images with competitive
accuracy and increased scalability.
Services
Internships: Adobe (2022 Summer), Intel (2021 Summer, 2020 Summer), HKUST (2019 Summer), ICL (2016
Summer)
Paper reviewer: ICCV, ECCV, CVPR, TVCG, TPAMI, SIGGRAPH, SIGGRAPH Asia, NeurIPS, ICLR
Seminar co-organizer: Cornell Graphics and Vision Seminar, 2021 Fall
Talk speaker: Graphics And Mixed Environment Seminar (GAMES), 2021
Teaching assistant: Deep Learning, 2022 Spring; Applied Machine Learning, 2021 Fall, 2020 Fall;
Introduction to Computer Vision, 2020 Spring, 2019 Spring