ICT is running a series of articles to highlight the work of our Graduate Research Assistants. In this essay we hear from Hanyuan (Cornelius) Xiao, PhD candidate, Computer Science, Viterbi School of Engineering, who works as a Research Assistant in the Vision and Graphics Lab, under the supervision of Dr. Yajie Zhao.
BYLINE: Hanyuan (Cornelius) Xiao, PhD candidate, Computer Science, Viterbi School of Engineering; Research Assistant, Vision and Graphics Lab, ICT.
I have always been captivated by the intersection of computer vision, computer graphics, and stunning visual effects (VFX).
During my undergraduate studies in Computer Science and Electrical Engineering at Rensselaer Polytechnic Institute (RPI) in Troy, New York, I started to see what was possible within simulations, games and films. This deep-seated interest, combined with the rapid advancements in generative models—particularly those unveiled at SIGGRAPH 2023—solidified my current research focus on 3D Artificial Intelligence Generated Content (AIGC) and interactive neural rendering.
Currently, I am a PhD candidate in the Vision and Graphics Lab (VGL) at the USC Institute for Creative Technologies (ICT). Under the guidance of Dr. Yajie Zhao, my research delves into the challenges of 3D scene generation and editing by leveraging large vision models.
While modern methods have enabled scene generation through dense view synthesis, a persistent challenge remains: ensuring multiview consistency in video and multiview generative models. Unlike 3D generative models, which guarantee multiview consistency by construction, 2D outputs generated without proper constraints in 3D space often suffer from severe inconsistencies, such as a gymnast’s arms morphing into legs over the course of a performance. Addressing these hallucinations is a core objective of my work, as I explore novel methods to enhance the reliability of AI-generated 3D representations.
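To make that distinction concrete, the small sketch below (purely illustrative, not code from my research, and with every name and number in it hypothetical) shows why rendering from a shared 3D representation is multiview consistent by construction: each view is just a projection of the same underlying geometry, whereas a 2D model that synthesizes each view independently has no shared 3D state to keep the views in agreement.

```python
import numpy as np

def project(points_world, K, R, t):
    """Project Nx3 world-space points into an image with intrinsics K and pose [R | t]."""
    cam = points_world @ R.T + t      # world -> camera coordinates
    uv = cam @ K.T                    # apply pinhole intrinsics
    return uv[:, :2] / uv[:, 2:3]     # perspective divide -> pixel coordinates

# A tiny "scene": the same 3D points are the single source of truth for every view.
points = np.array([[0.0, 0.0, 5.0],
                   [0.5, 0.2, 4.5],
                   [-0.3, 0.4, 6.0]])

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Two camera poses: the reference view and one shifted slightly to the right.
R1, t1 = np.eye(3), np.zeros(3)
R2, t2 = np.eye(3), np.array([-0.2, 0.0, 0.0])

# Both images are projections of the *same* geometry, so they agree by construction.
# A 2D generator that draws each view independently can instead hallucinate
# contradictory content across views.
print(project(points, K, R1, t1))
print(project(points, K, R2, t2))
```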
My research throughout has been driven by three fundamental factors: my early exposure to 3D vision and rendering, my enduring fascination with compelling visual effects, and the unprecedented opportunities emerging from large generative models.
Why ICT?
Choosing to conduct my research at ICT was a pivotal decision. During my PhD orientation, I explored several computer vision and graphics labs, and VGL stood out as the optimal environment. It has a formidable research team, state-of-the-art hardware, and strong ties to the film industry—an ideal setting for pushing the boundaries of AI-driven visual content creation.
Beyond research, my time at ICT has been marked by memorable experiences, including encountering the actor Ryan Reynolds during a light stage scan (though I only realized it was him right after our handshake!).
I also got to participate in the AIxVR 2024 conference, which we hosted at ICT, engaging with some of the brightest minds at the forefront of artificial intelligence and immersive technologies.
Research Focus Areas
My research at ICT spans multiple facets of 3D scene generation, reconstruction, and rendering. Currently, I am developing 3D-aware video diffusion models that aim to achieve multiview-consistent video generation and reconstruction.
Additionally, I have worked on text-guided scene editing, implementing a one-shot editing pipeline that adapts to environmental illumination using a depth-inpainting technique to distill guidance from large vision models.
Beyond scene generation, my contributions to physically based neural rendering have centered on authoring materials with physically grounded properties for rendering. I played a role in designing a diffusion model that generates Bidirectional Reflectance Distribution Function (BRDF) materials from text prompts, along with curating a material database captured via the ICT light stage. My research also led to the development of a deep learning framework that disentangles lighting, material, and light transport in a neural volume, enabling highly realistic renderings of complex effects including subsurface scattering on human faces.
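For readers unfamiliar with BRDFs, the short sketch below evaluates a textbook Lambert plus Cook-Torrance (GGX) reflectance model. It is only meant to illustrate the kind of physically grounded parameters (albedo, roughness, Fresnel reflectance) that such material work targets; it is not the diffusion model or the light stage pipeline itself, and all values in it are made up.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def ggx_brdf(n, v, l, albedo, roughness, f0=0.04):
    """Evaluate a Lambert + Cook-Torrance (GGX) BRDF for one light/view direction.

    n, v, l : unit surface normal, view, and light direction vectors
    albedo  : diffuse base color (RGB)
    roughness, f0 : scalar material parameters
    Returns the BRDF value f(l, v) as an RGB triple.
    """
    h = normalize(v + l)                       # half vector
    n_dot_l = max(np.dot(n, l), 1e-4)
    n_dot_v = max(np.dot(n, v), 1e-4)
    n_dot_h = max(np.dot(n, h), 0.0)
    h_dot_v = max(np.dot(h, v), 0.0)

    a2 = roughness ** 4                        # alpha = roughness^2 (Disney convention)
    d = a2 / (np.pi * ((n_dot_h ** 2) * (a2 - 1.0) + 1.0) ** 2)   # GGX normal distribution
    f = f0 + (1.0 - f0) * (1.0 - h_dot_v) ** 5                    # Schlick Fresnel
    k = (roughness + 1.0) ** 2 / 8.0                              # Schlick-GGX geometry term
    g = (n_dot_l / (n_dot_l * (1.0 - k) + k)) * (n_dot_v / (n_dot_v * (1.0 - k) + k))

    specular = d * f * g / (4.0 * n_dot_l * n_dot_v)
    diffuse = np.asarray(albedo) / np.pi
    return diffuse + specular

# Example: a reddish, fairly rough material lit and viewed near head-on.
n = np.array([0.0, 0.0, 1.0])
v = normalize(np.array([0.0, 0.3, 1.0]))
l = normalize(np.array([0.3, 0.0, 1.0]))
print(ggx_brdf(n, v, l, albedo=[0.8, 0.2, 0.2], roughness=0.5))
```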
Additionally, my work on MVS-PERF introduced a novel pipeline that learns deformable clothing representations by mapping parametric human body models to clothed neural volumes—an effort that was awarded a U.S. patent.
My research has been disseminated through numerous academic publications, including presentations at prestigious conferences such as WACV 2025 and ICLR 2023.
As I look ahead, I aspire to present my work at SIGGRAPH, a conference that has continually inspired my research trajectory.
Future Goals
The pinnacle of my academic career would be developing a method that elegantly resolves a fundamental bottleneck in the field—a breakthrough that advances not only my work but also the broader scientific community. In the near future, I aim to complete my current project with a groundbreaking solution that enhances multiview consistency in 3D generation.
Ultimately, I hope to see my research integrated into an open-source 3D generation tool, making high-fidelity 3D content creation more accessible to researchers and artists alike.
My journey in computer vision and graphics continues to be one of discovery and innovation. With every research endeavor, I strive to push the boundaries of what is possible in AI-driven 3D content generation, bridging the gap between virtual and real-world experiences.
//