Project Leader: Paul Debevec
Creating a Real-Time Photoreal Digital Actor
Activision, Inc. and USC Institute for Creative Technologies
In 2008, the “Digital Emily” project showed how a series of high-resolution facial expressions scanned in a light stage could be rigged into a real-time photoreal digital character and driven with video-based facial animation techniques. However, Emily was rendered offline, showed only the front of the face, and was never seen in a tight closeup.
In this collaboration between Activision and USC ICT, we set out to create a real-time, photoreal digital human character that could be seen from any viewpoint and in any lighting, and that could perform realistically from video performance capture even in a tight closeup. The character also had to run in a game-ready production pipeline. To achieve this, we scanned the actor in thirty high-resolution expressions using USC ICT’s new Light Stage X system [Ghosh et al. SIGGRAPH Asia 2011] and chose eight expressions for the real-time performance rendering. To record the performance, we shot multi-view 30 fps video of the actor performing improvised lines using the same multi-camera rig. We used a new tool called Vuvuzela to interactively and precisely correspond the (u,v) coordinates of every expression to those of the neutral expression, which was retopologized to an artist mesh. Our new offline animation solver works by creating a performance graph representing dense GPU optical flow between the video frames and the eight expressions. The graph is pruned by analyzing the correlation between the video frames and the expression scans over twelve facial regions. The algorithm then computes dense optical flow and 3D triangulation, yielding per-frame, spatially varying blendshape weights that approximate the performance.
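The idea of spatially varying blendshape weights can be illustrated with a minimal sketch. It is not the project's solver: it assumes the triangulated per-frame vertex positions are already available, fits one weight per expression independently in each facial region by ordinary least squares, and clips the result to [0, 1]. The function names, the `region_ids` labelling, and the clipping step are all hypothetical choices made for illustration.

```python
import numpy as np

def solve_region_weights(neutral, expressions, target, region_ids, num_regions):
    """Fit per-region blendshape weights by least squares (illustrative only).

    neutral:     (V, 3) neutral-pose vertex positions
    expressions: (K, V, 3) vertex positions of the K expression scans
    target:      (V, 3) vertex positions for one video frame (assumed given)
    region_ids:  (V,) integer facial-region label per vertex (assumed given)
    Returns a (num_regions, K) array of weights clipped to [0, 1].
    """
    num_expr = expressions.shape[0]
    deltas = expressions - neutral        # (K, V, 3) offsets from neutral
    residual = target - neutral           # (V, 3) motion to explain
    weights = np.zeros((num_regions, num_expr))
    for r in range(num_regions):
        mask = region_ids == r
        if not mask.any():
            continue                      # empty region keeps zero weights
        # One column per expression, one row per vertex coordinate in the region.
        A = deltas[:, mask, :].reshape(num_expr, -1).T
        b = residual[mask].reshape(-1)
        w, *_ = np.linalg.lstsq(A, b, rcond=None)
        weights[r] = np.clip(w, 0.0, 1.0)  # assumption: weights stay in [0, 1]
    return weights

def apply_region_weights(neutral, expressions, weights, region_ids):
    """Rebuild a mesh from per-region weights (each vertex uses its region's row)."""
    deltas = expressions - neutral
    per_vertex_w = weights[region_ids]    # (V, K)
    return neutral + np.einsum("vk,kvc->vc", per_vertex_w, deltas)
```

With hard region labels as above, the reconstructed mesh can show seams at region boundaries; a practical system would presumably blend weights smoothly across regions, which this sketch does not attempt.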
To create the game-ready facial rig, we transferred the mesh animation to standard bone animation on a 4K-polygon mesh using a bone weight and transform solver. The solver optimizes the smooth skinning weights and the bone-animated transforms to maximize the correspondence between the game mesh and the reference animated mesh. The rendering technique uses surface stress values to blend diffuse texture, specular, normal, and displacement maps from the high-resolution scans per-vertex at run time. The DirectX 11 rendering includes screen-space subsurface scattering, translucency, eye refraction and caustics, real-time ambient shadows, a physically-based two-lobe specular reflection with microstructure, depth of field, antialiasing, and film grain. The project is continuing; ongoing work includes simulating eyelid bulge, displacement shading, ambient transmittance, and several other dynamic effects.
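One half of a bone weight and transform solver can be sketched as follows, under strong simplifying assumptions. The sketch holds the per-frame bone transforms fixed and solves only for each vertex's skinning weights by least squares against the reference animation; a full solver would presumably alternate this with a transform update, which is not shown. All names are hypothetical, and the clip-and-renormalize step is a simple heuristic standing in for a properly constrained solve.

```python
import numpy as np

def skin(rest, transforms, weights):
    """Linear blend skinning for one frame.

    rest:       (V, 3) rest-pose vertex positions
    transforms: (B, 3, 4) affine transform per bone
    weights:    (V, B) skinning weights, rows summing to 1
    """
    rest_h = np.hstack([rest, np.ones((rest.shape[0], 1))])   # (V, 4)
    per_bone = np.einsum("bij,vj->vbi", transforms, rest_h)   # (V, B, 3)
    return np.einsum("vb,vbi->vi", weights, per_bone)

def solve_skin_weights(rest, transforms_per_frame, targets):
    """Fit skinning weights with bone transforms held fixed (illustrative only).

    rest:                 (V, 3) rest-pose vertex positions
    transforms_per_frame: (F, B, 3, 4) bone transforms for each frame
    targets:              (F, V, 3) reference animated mesh positions
    Returns (V, B) non-negative weights normalized to sum to 1 per vertex.
    """
    F, B = transforms_per_frame.shape[:2]
    V = rest.shape[0]
    rest_h = np.hstack([rest, np.ones((V, 1))])
    # moved[f, v, b] = where vertex v would be in frame f if it followed bone b alone.
    moved = np.einsum("fbij,vj->fvbi", transforms_per_frame, rest_h)  # (F, V, B, 3)
    weights = np.zeros((V, B))
    for v in range(V):
        A = moved[:, v].transpose(0, 2, 1).reshape(F * 3, B)  # rows: frame x coordinate
        b = targets[:, v].reshape(F * 3)
        w, *_ = np.linalg.lstsq(A, b, rcond=None)
        w = np.clip(w, 0.0, None)          # heuristic: drop negative weights
        total = w.sum()
        weights[v] = w / total if total > 0 else np.full(B, 1.0 / B)
    return weights
```

Game engines typically also cap the number of bones influencing each vertex; that sparsity constraint is omitted here to keep the example short.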