The Vision & Graphics Lab (VGL) at the USC Institute for Creative Technologies (ICT) is presenting its latest advances at the 2025 International Conference on Computer Vision (ICCV) in Honolulu this week. Recognized as the premier global forum for computer vision research, ICCV convenes leading scientists, practitioners, and students to examine the latest methodologies, algorithms, and applications shaping the field. This year, VGL has secured two paper acceptances, with three of its student researchers, Wenbin Teng, Gonglin Chen, and Emily Jia, presenting their work to the international community under the guidance of Dr. Yajie Zhao, Director of the Vision and Graphics Lab.
ICCV is widely regarded for fostering cross-disciplinary dialogue, uniting diverse strands of computer vision from geometry, physics-based simulation, and machine learning to perception and generative modeling. Within this rich intellectual environment, VGL’s contributions explore both foundational challenges and practical innovations, demonstrating the lab’s commitment to bridging theoretical insight and computational implementation.
VGL: Background and Mission
Founded with the goal of advancing human digitalization, VGL specializes in the creation of production-quality virtual characters—digitally generated humans capable of realistic speech, movement, and interaction. These avatars play a crucial role in entertainment, training, and educational systems, where visual fidelity and behavioral realism are paramount. Central to VGL’s work is the Academy Award-winning Light Stage, an in-house technology that captures and processes production-quality assets, enabling the creation of highly convincing virtual humans. This technology has underpinned contributions to 49 films and earned two Scientific and Technical Awards from the Academy of Motion Picture Arts and Sciences.
VGL collaborates with industry leaders including Sony Pictures Imageworks, WETA Digital, Nvidia, Meta, and Digital Domain, integrating academic research with practical production pipelines. The lab maintains an extensive database of human facial data and has contributed the ICT-face morphable model to the broader research community. These resources facilitate AI-driven workflows for rapid data capture, personalized avatar creation, and physically-based rendering. Beyond static avatars, VGL is developing dynamic human capture systems capable of real-time performance tracking for VR and AR applications, laying the groundwork for immersive, interactive digital humans.
In addition to these achievements, VGL has published over 160 top-tier academic papers, pioneering methods in human digitalization and influencing the broader fields of computer vision and graphics. Looking ahead, the lab is expanding its research scope to encompass scene and terrain understanding, reconstruction, and physical interaction, aiming toward a future where digital humans can not only move convincingly but also perceive, interact with, and respond to the world around them.
Learning an Implicit Physical Model for Image-Based Fluid Simulation
Emily Jia, alongside collaborators Jiageng Mao, Zhiyuan Gao, Yajie Zhao, and Yue Wang, will present a poster titled Learning an Implicit Physical Model for Image-Based Fluid Simulation. Scheduled for October 21, from 3:15 pm to 5:15 pm (Hawaii Time) at Exhibit Hall I, Board 651, this work addresses a fundamental challenge at the intersection of perception, physical reasoning, and generative modeling: animating fluid motion from a single static image.
Humans effortlessly anticipate fluid dynamics in real-world scenes, instinctively predicting how water ripples around obstacles or how currents will respond to terrain. Translating this capability to machines has remained an open problem, requiring the integration of visual learning, three-dimensional reconstruction, and physics-based reasoning—domains traditionally treated separately. Previous approaches in video generation from still images have made significant strides, yet when applied to fluids such as water, smoke, or fire, these methods often fail to maintain physical fidelity. Boundaries are ignored, matter flows unrealistically through obstacles, and the resulting dynamics frequently violate the physical laws implicit in human perception.
Jia’s approach introduces a physics-informed neural dynamics framework that synthesizes the strengths of data-driven models with the rigor of physical simulation. The framework comprises two components: a 3D Gaussian representation of the input scene, which captures geometry while enabling novel-view synthesis, and a neural dynamics module that predicts velocity fields directly from the static image. Importantly, this module is supervised both by priors learned from real-world video and by constraints derived from the Navier–Stokes equations, ensuring that fluid motion remains both plausible and physically coherent.
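To make the idea of physics-informed supervision concrete, the sketch below shows one minimal way such a setup can be wired together; it is an illustrative assumption, not the authors' code. A small network predicts velocity from space-time coordinates, and its loss combines a data term against video-derived priors with residuals of the incompressible Navier–Stokes equations computed via automatic differentiation (pressure and viscosity terms are omitted for brevity; names such as VelocityMLP are hypothetical).

```python
# Minimal sketch of physics-informed supervision for a neural velocity field.
# Not the paper's implementation; all names and weights are illustrative.
import torch
import torch.nn as nn

class VelocityMLP(nn.Module):
    """Maps (x, y, z, t) to a 3D velocity; stands in for the dynamics module."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, xyzt):
        return self.net(xyzt)

def physics_informed_loss(model, xyzt, v_prior, phys_weight=0.1):
    xyzt = xyzt.requires_grad_(True)
    v = model(xyzt)                                  # predicted velocity field
    data_loss = (v - v_prior).pow(2).mean()          # match video-derived priors

    # Derivatives of each velocity component w.r.t. (x, y, z, t) via autograd.
    grads = [torch.autograd.grad(v[:, i].sum(), xyzt, create_graph=True)[0]
             for i in range(3)]                      # each: (N, 4)
    div = sum(grads[i][:, i] for i in range(3))      # incompressibility: div(v) = 0

    # Momentum residual dv/dt + (v . grad) v ~ 0 (pressure/viscosity omitted here).
    momentum = []
    for i in range(3):
        dv_dt = grads[i][:, 3]
        advection = sum(v[:, j] * grads[i][:, j] for j in range(3))
        momentum.append(dv_dt + advection)
    phys_loss = div.pow(2).mean() + torch.stack(momentum, dim=-1).pow(2).mean()
    return data_loss + phys_weight * phys_loss

# Toy usage on random coordinates and priors.
model = VelocityMLP()
loss = physics_informed_loss(model, torch.rand(1024, 4), torch.rand(1024, 3))
loss.backward()
```

The key design choice this illustrates is that the physics term acts as a regularizer: the data-driven prior keeps the motion visually plausible, while the equation residuals discourage velocity fields that would, for example, push fluid through obstacles.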
Evaluations of the framework included both synthetic benchmarks, where ground-truth velocity fields were known, and perceptual studies with human observers. Across synthetic datasets, the method reduced prediction errors by more than twenty percent relative to previous baselines. Perceptual studies revealed a roughly forty percent preference for Jia’s method over prior approaches, highlighting that integrating physics into neural networks not only improves quantitative accuracy but also aligns with human intuition regarding fluid behavior.
This work reflects a broader trend in computer vision toward integrating domain knowledge with flexible, data-driven models. By embedding physics into neural systems, researchers can produce models that generalize more effectively, remain interpretable, and extend beyond fluids to dynamic phenomena such as deformable objects, smoke, crowds, and other complex environments by injecting corresponding physics laws. Applications range from robotics and autonomous systems to visual effects in creative media, where anticipatory modeling of scene dynamics is crucial.
FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation
VGL’s second contribution, FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation, represents a significant advance in the synthesis of 3D scenes from sparse 2D images. Wenbin Teng, Gonglin Chen, and collaborators Haiwei Chen and Yajie Zhao will present this work in a poster session on October 23, from 2:30 pm to 4:30 pm (Hawaii Time) at Exhibit Hall I, Board 131.
Three-dimensional reconstruction from 2D images is central to numerous fields, including augmented reality, autonomous navigation, and cultural heritage preservation. While recent neural rendering techniques have enabled high-quality reconstructions from dense image datasets, sparse-view settings—where only a few images are available—remain challenging. Prior approaches using Video Diffusion Models (VDMs) have proven effective in synthesizing novel views along camera trajectories, addressing inconsistencies and gaps in multi-view reconstructions. However, these methods are computationally intensive, requiring iterative sampling that significantly slows the generation process.
FVGen addresses this bottleneck with a novel framework that distills the capabilities of multi-step VDMs into a few-step student model, achieving comparable visual quality in just four sampling steps. The core idea is to optimize a distribution matching distillation loss that aligns the student's output distribution with the teacher's. However, experiments show that this naive optimization destabilizes training and can lead to mode collapse. The authors therefore combine adversarial training with a softened reverse KL-divergence objective, allowing the student model to learn the teacher model's generative capabilities efficiently. Extensive experiments demonstrate that FVGen reduces inference time by over ninety percent while preserving spatial coherence and visual fidelity, making it highly suitable for real-time reconstruction tasks and large-scale deployments.
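As a rough illustration of how such a combined objective can be structured, the sketch below pairs a distribution-matching term, whose gradient follows a reverse-KL-style direction estimated from two teacher score estimates, with an adversarial term from a discriminator. It is a conceptual assumption rather than the FVGen implementation, and every callable (student, score networks, discriminator, add_noise) is a stand-in.

```python
# Conceptual sketch of few-step distillation with a distribution-matching term
# plus an adversarial term. Assumed names and shapes throughout; not FVGen code.
import torch

def add_noise(x, t):
    # Illustrative variance-preserving forward noising at diffusion time t in [0, 1].
    t = t.view(-1, *([1] * (x.dim() - 1)))
    return (1.0 - t).sqrt() * x + t.sqrt() * torch.randn_like(x)

def student_loss(student, score_real, score_fake, discriminator, noise, cond,
                 lambda_adv=0.5):
    fake = student(noise, cond)                      # few-step video sample

    # Distribution matching: approximate the reverse-KL gradient w.r.t. the
    # sample by the difference between the two score estimates.
    t = torch.rand(fake.shape[0], device=fake.device)
    noised = add_noise(fake, t)
    with torch.no_grad():
        grad_dir = score_fake(noised, t, cond) - score_real(noised, t, cond)
    dmd_loss = (fake * grad_dir).mean()              # surrogate whose gradient follows grad_dir

    # Adversarial term to stabilize training and sharpen the generated views.
    adv_loss = -discriminator(fake, cond).mean()     # non-saturating generator loss
    return dmd_loss + lambda_adv * adv_loss

# Toy usage with placeholder callables, just to show the shapes involved.
B, T, C, H, W = 2, 8, 3, 32, 32
student = lambda z, c: torch.tanh(z)                 # placeholder few-step generator
score = lambda x, t, c: torch.zeros_like(x)          # placeholder score networks
disc = lambda x, c: x.mean(dim=(1, 2, 3, 4))         # placeholder discriminator
noise = torch.randn(B, T, C, H, W, requires_grad=True)
loss = student_loss(student, score, score, disc, noise, cond=None)
loss.backward()
```

In this reading, the adversarial signal compensates for the instability and mode collapse that a pure distribution-matching objective can exhibit, which mirrors the motivation described above.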
The method leverages large-scale, scene-level video datasets to create paired point cloud and video sequences for training. By incorporating generative adversarial objectives, FVGen overcomes stability issues seen in prior distillation techniques, producing sharper and more accurate novel views. Its performance represents a step forward in practical 3D reconstruction, enabling both research and application in domains where dense imagery is infeasible.
New Methodologies
ICCV provides a unique forum for cross-pollination of ideas, and VGL’s participation underscores its commitment to advancing both theoretical understanding and practical methodologies in computer vision. By presenting work on fluid simulation from single images and accelerated novel-view synthesis, the lab highlights two complementary facets of modern vision research: combining physical principles with neural networks and enabling high-quality 3D reconstruction from limited input data.
VGL anticipates that discussions at ICCV will not only refine these methodologies but also inspire new directions, from integrating physics-informed models into dynamic scene prediction to expanding real-time applications of diffusion-based generative methods. As the field continues to evolve, VGL remains focused on bridging the gap between computational innovation and human intuition, demonstrating that computer vision can be both scientifically rigorous and broadly applicable.
ICCV & Beyond
The Vision & Graphics Lab’s presence at ICCV 2025 in Honolulu marks a significant moment for ICT and its student researchers. Emily Jia, Wenbin Teng, and Gonglin Chen exemplify the lab’s dual commitment to foundational research and practical innovation, advancing the frontiers of image-based simulation, 3D reconstruction, and generative modeling. Their work reflects a philosophy that marries rigor and imagination: embedding physical principles into learning systems, accelerating generative models without sacrificing quality, and ultimately contributing to the lab’s larger mission of creating accessible, movie-quality digital humans.
By situating these technical innovations within the broader context of VGL’s research infrastructure, industry partnerships, and academic impact, the lab demonstrates that the future of computer vision and digital human creation lies in integration—across methods, modalities, and disciplines. ICCV provides the ideal venue for such exchange, and VGL’s contributions will spark conversation, collaboration, and further progress in the quest to build intelligent, interactive, and visually compelling digital humans.
