
Kai Zhang

Email: kaiz [AT] adobe [DOT] com or kz298 [AT] cornell [DOT] edu

Google Scholar  /  GitHub

I am a Research Scientist at Adobe Research, working on 3D reconstruction and generation, and inverse graphics problems. I received my PhD from Cornell University in 2022, where I worked with Prof. Noah Snavely. Before my PhD, I received my bachelor's degree from Tsinghua University in 2017.

My latest active research area is generative 3D reconstructors that can 1) reconstruct from sparse posed/unposed images; 2) hallucinate the unseen regions; 3) generalize to new inputs; 4) be robust to imperfect inputs, including lighting variations, motion blur, etc.; 5) work at both the object level and the scene level.

Adobe internship/University collaboration: feel free to email me if you share similar interests in automatic 3D content creation from images, videos and text!

Research
[11]  PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction
Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, Kai Zhang
ICLR, 2024 (Spotlight)  
arxiv / project page
Given unposed sparse images, we propose a pose-free LRM that jointly reconstructs the full 3D shape (even though the input images might only partially cover an object) while estimating the relative camera poses of the input images.
[10]  DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu*, Kai Zhang* (Equal advisory)
ICLR, 2024 (Spotlight)  
arxiv / project page
To model the uncertainty in the single-image 3D reconstruction problem, we propose to denoise multi-view images using a 3D large reconstruction model (LRM) trained on a large-scale multi-view dataset.
[9]  Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi
ICLR, 2024 
arxiv / project page
To achieve fast text-to-3D, we finetune SD-XL to generate multi-view images of an object given a text prompt, then reconstruct the 3D shape from the generated images using a multi-view version of the LRM model.
[8]  LRM: Large Reconstruction Model for Single Image to 3D
Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan
ICLR, 2024 (Oral) 
arxiv / project page
We show that the data + transformer paradigm also works naturally for 3D reconstruction, with the help of large-scale multi-view datasets.
[7]  Ray Conditioning: Trading Photo-Consistency for Photo-realism in Multi-view Image Generation
Eric Ming Chen, Sidhanth Holalkere, Ruyu Yan, Kai Zhang, Abe Davis
ICCV, 2023 
arxiv / project page / code
We propose to add ray conditioning to the StyleGAN2 generator to enable 3D-aware view synthesis with high photo-realism (at the cost of reduced consistency). This particularly suits the task of editing the viewpoints of static images.
[6]  ARF: Artistic Radiance Fields
Kai Zhang, Nick Kolkin, Sai Bi, Fujun Luan, Zexiang Xu, Eli Shechtman, Noah Snavely
ECCV, 2022 
arxiv / project page / code
We propose to stylize the appearance of a NeRF using a Nearest Neighbor Feature Matching (NNFM) style loss to create high-quality artistic 3D content.
[5]  IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images
Kai Zhang, Fujun Luan, Zhengqi Li, Noah Snavely
CVPR, 2022 (Oral) 
arxiv / project page / code
We propose a neural inverse rendering pipeline called IRON that operates on photometric images and outputs high-quality 3D content in the format of triangle meshes and material textures readily deployable in existing graphics pipelines.
[4]  PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting
Kai Zhang*, Fujun Luan*, Qianqian Wang, Kavita Bala, Noah Snavely (*Equal contribution)
CVPR, 2021  
arxiv / project page / code
We propose an end-to-end differentiable rendering pipeline that jointly estimates geometry, material and lighting from multi-view images from scratch. It enables not just novel view synthesis, but also relighting and material editing.
[3]  NeRF++: Analyzing and Improving Neural Radiance Fields
Kai Zhang, Gernot Riegler, Noah Snavely, Vladlen Koltun
arXiv preprint, 2020  
arxiv / code
We analyze the shape-radiance ambiguity in NeRF, and extend NeRF to work with 360° unbounded scenes. At the core of the method is the Inverted Sphere Parametrization (ISP), which contracts an unbounded space to a bounded one.
[2]  Depth Sensing Beyond LiDAR Range
Kai Zhang, Jiaxin Xie, Noah Snavely, Qifeng Chen
CVPR, 2020  
arxiv / project page / code
We propose a novel, cost-effective camera-based solution to sense the depth of distant objects that are beyond the reach of typical LiDARs. This can be particularly helpful for heavy autonomous trucks.
[1]  Leveraging Vision Reconstruction Pipelines for Satellite Imagery
Kai Zhang, Jin Sun, Noah Snavely
ICCV 3DRW, 2019  
arxiv / project page / code
We approximate satellite-specific RPC cameras with perspective cameras, and adapt vision reconstruction pipelines (SfM+MVS) such that they can also process satellite images with competitive accuracy and increased scalability.
Services

Internships: Adobe (2022 Summer), Intel (2021 Summer, 2020 Summer), HKUST (2019 Summer), ICL (2016 Summer)
Paper reviewer: ICCV, ECCV, CVPR, TVCG, TPAMI, SIGGRAPH, SIGGRAPH Asia, NeurIPS, ICLR
Seminar co-organizer: Cornell Graphics and Vision Seminar, 2021 Fall
Talk speaker: Graphics And Mixed Environment Seminar (GAMES), 2021
Teaching assistant: Deep Learning, 2022 Spring; Applied Machine Learning, 2021 Fall, 2020 Fall; Introduction to Computer Vision, 2020 Spring, 2019 Spring


Kudos to Dr. Jon Barron for sharing his website template.