
Realistic Modeling of Animatable Faces in MPEG-4

Marco Fratarcangeli and Marco Schaerf
Department of Computer and Systems Science

University of Rome "La Sapienza"

Abstract

The diffusion of the MPEG-4 Facial and Body Animation (FBA) specification in products and applications underscores the importance of producing realistic animations of synthetic faces based on the compact parameter set provided by the standard. In this paper we propose a method to build an anatomically based face model suitable for producing general facial animation previously encoded in an MPEG-4 FBA stream. Our effort focuses on combining MPEG-4 facial animation with some well-known techniques that reproduce the face anatomy on general static 3D meshes. The anatomy of the face is modeled as a multiple-layer 3D mesh deformed by muscles that are placed in appropriate positions guided by the MPEG-4 facial definition parameters (FDPs). Such a model is used to synthesize the main part of the MPEG-4 facial animation parameters (FAPs) through a proper muscle-to-FAP mapping. For each FAP, the corresponding deformed face, called a morph target (MT), is built by contracting the muscles corresponding to the FAP. The set of MTs is stored in a compact and portable format called the Animatable Face Model (AFM), which is used as an input for MPEG-4 FBA commercial players. These players use the AFM to perform animation encoded in an MPEG-4 data stream.

Keywords: MPEG-4, Facial and Body Animation, Animatable Face Model, Morph Targets, Anatomical-based modeling

1 Introduction

One of the most challenging tasks in computer graphics today is modeling and animating synthetic human faces in a realistic manner. Accurate reconstruction of talking faces is required in a wide range of fields, such as virtual social spaces (where people communicate face-to-face), entertainment, distance learning, and advanced human-machine interfaces. In other words, the need for animation of generic virtual faces arises wherever information must be transmitted to a final user in a straightforward and natural way.

The main difficulty in implementing realistic models lies in the sensitivity of the human visual system to the nuances of facial expression. A great deal of meaningful information is conveyed by facial skin deformation, particularly around the forehead, eyes and mouth. In order to achieve overall realism of the final model, a good approach to face modeling and animation is to synthesize the anatomy of the human head, especially the arrangements and actions of the primary facial muscles.

In this paper, we propose a method to produce an MPEG-4 compliant, anatomically based face model from an initially static 3D mesh. We used modeling techniques long known in the literature to build the anatomical parts of the model, such as the skin, the muscles and the skull. Our contribution is in making this face model compliant with the MPEG-4 multimedia standard. By doing so, we make it possible to apply algorithms for the automatic construction of the muscles and the jaw and for their mapping onto the skin surface, speeding up the virtual face building process. Furthermore, the use of MPEG-4 allows the produced face models to be easily used with other facial animation tools relying on the same standard.

1.1 Anatomically-based Facial Modeling

Several different methods have been proposed through the years to achieve computer facial animation. The key-frame interpolation technique was the first to be proposed and is one of the most intuitive and still widely used: basic expressions in 3D are defined at different moments of the animated sequence, and intermediate frames are simply interpolated between two successive basic expressions. This method was introduced by Parke in his pioneering work [1]. Physics-based approaches attempt to animate faces by simulating the influence of muscle contraction on the skin surface. Lee, Terzopoulos and Waters [2, 3] automatically construct a three-dimensional model of a general human face by adapting a predetermined triangle mesh to data obtained through a 3D laser scanner. Their model consists of three layers representing the muscle layer, dermis and epidermis. The elastic properties of the skin are simulated using a mass-spring system. Due to volume preservation and skull rejection constraints, this approach produces realistic effects at interactive frame rates. To advance the simulation, their method relies on explicit numerical integration (Euler steps); this may lead to a slow evolution of the numerical simulation, since very small time steps are required to ensure stability. An alternative method for the integration of the stiff mass-spring system is proposed by Baraff and Witkin [4, 5], who provide the theory for a stable implicit integration using very large time steps. Wu et al. [6] focus on the generation of expressive wrinkles and skin aging effects. Waters [7] presented a parametric muscle model which simulates the behaviour of linear and sphincter facial muscles. Another complete anatomically based face model is provided by Kähler et al. [8, 9], who propose a muscle structure composed of quadric segments; their method allows animatable facial models to be easily created from a given face geometry through their editing tool.

1.2 MPEG-4 Multimedia Standard

The recently released MPEG-4 international standard mainly focuses on networking capabilities and therefore offers interesting possibilities for teleconferencing, since the requirements on network bandwidth are quite low. This specification defines a framework that allows model-based coding of human faces to be employed in a systematic manner.

A significant part of the standard is Face and Body Animation (FBA): the specification for efficient coding of the shape and animation of human faces and bodies [10, 11]. Instead of regarding animation as a sequence of frames with fixed shape and size, the animation of a virtual face is achieved by transmitting only the coded facial movements; the animation can then be re-synthesized on the client side through the proper deformation of a virtual face.

MPEG-4 FBA specifies a face model in its neutral state together with two main data sets: 84 facial definition parameters (FDPs), also called feature points, and 68 facial animation parameters (FAPs). FDPs are control points that are used to define the shape of a proprietary face model and to provide a spatial reference for defining FAPs (Figure 1). FDPs are arranged in groups such as cheeks, eyes, and mouth. The locations of the FDPs have to be known for any MPEG-4 compliant face model.

FAP values specify precisely how much an FDP of a face has to be moved, deforming the face model from its neutral state and thus animating it. The FAP set is subdivided into high- and low-level parameters. Low-level FAPs are used to express basic actions that the face can perform, such as closing an eyelid or stretching a lip corner. High-level FAPs express, in a compact way, more complex movements such as expressions and visemes (a viseme is the visual counterpart of a phoneme). Together, high- and low-level FAPs represent a complete set of facial actions including head motion and tongue, eye and mouth control. FAPs allow the representation of natural facial movements and form an ideal set of animation parameters for defining muscle actions.

The FBA parameters, both FDPs and FAPs, can be extracted from visual, audio and motion capture systems [12, 13, 14, 15], properly compressed and used to achieve, for example, teleconferencing at low bandwidth (< 3 kbps) [16, 17]. Facial animation (FA) based applications may run on a wide variety of hardware and software platforms, such as rendering systems (3D Studio MAX, Softimage, OpenGL) or embedded systems (mobile phones, PDAs). The MPEG-4 standard can be used to connect the sources to the target platforms, allowing for great flexibility.

Figure 1: MPEG-4 Facial Definition Parameters (or Feature Points) [10, 11].

1.3 Our Approach

Our final purpose is to animate a generic 3D face mesh in a realistic manner. The facial animation must be driven by an encoded FAP stream.

In order to do this, we used some already developed techniques to build a face model that conforms to the anatomical structure of the human head. To be precise, we used the skin model and the formulation of the skull penetration constraints proposed by Lee et al. [2, 3]. The muscle models were devised by Waters [7]. The fitting of the jaw bone was introduced by Kähler et al. [8]. The numerical integration technique that we use to deform the face model is from Baraff and Witkin [4, 5].

Applying these methods to a general 3D face mesh that comes with the corresponding MPEG-4 FDP data set, we have been able to design a method to build an anatomical face model automatically and in a short time, without any interaction from the final user. Using the FDPs associated with the face mesh, we automatically estimate the jaw mesh and the muscle map. Every facial muscle is then properly mapped onto the skin surface in order to define its influence area. Animation of the face is achieved by physics-based simulation of the skin model under the influence of the force field generated by the muscle models.

Once the face model is built, we map each MPEG-4 FAP to the contractions of a muscle group. Through this mapping, for each given FAP amplitude, we know precisely which vertices of the face move and by how much. Using this information, we build an Animatable Face Model (AFM) of the initial face mesh that can be interpreted by an MPEG-4 FBA player to perform facial animation encoded in a FAP stream.

2 Multi-layer Skin Tissue Model

We used techniques already developed by Lee et al. to model the human skin. We briefly describe the skin model here in order to show how we structured the face model; the reader can refer to [2, 3] for a more complete description.

To simulate the visco-elastic behaviour of human skin, the skin tissue is modeled as a network of masses linked together by springs. These springs simulate the nonlinear behavior of human facial tissue through a proper modulation of their stiffness. The geometry of the input face mesh forms the basis for the generation of the multi-layer tissue model and defines the epidermal layer, the outermost layer. For each triangle of the input mesh, one basic prismatic tissue element, as shown in Figure 2, is created by connecting the mass nodes to each other using nonlinear springs. Three kinds of springs connect the mass nodes: layer springs connect nodes on the same layer; connecting springs link nodes placed on different layers; crossed springs simulate the skin behavior under shearing or twisting stresses [2, 3].

Figure 2: Multi-layer skin tissue element.

The topmost surface of the lattice represents the epidermis. It is a rather stiff layer of keratin and collagen, and the spring parameters are set to make it moderately resistant to deformation. The springs in the second layer are highly deformable, reflecting the nature of dermal fatty tissue. Nodes on the third surface of the lattice represent the hypodermis, to which the facial muscle fibers are attached, while the bottommost surface represents the underlying impenetrable bony structure. A skull offset surface is estimated by scaling the initial face mesh around its center of gravity by a proper factor. Such a surface permits the simulation of the impenetrability of the bony structure of the face, making the skin slide over it.
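
As an illustration of the structure just described, the following sketch (not the authors' code) builds one prismatic tissue element from a surface triangle: inner-layer nodes are obtained by stepping from each epidermal vertex toward the mesh center of gravity, and the three spring kinds are attached. Layer depths and stiffness values are placeholder assumptions.

```python
import numpy as np

LAYER_DEPTH = [0.0, 0.04, 0.08]   # epidermis, dermis, hypodermis depths (assumed units)
STIFFNESS = {"layer": 60.0, "connecting": 30.0, "crossed": 20.0}  # placeholder stiffness

def build_prism_element(tri_vertices, center_of_gravity):
    """tri_vertices: (3, 3) array holding the epidermal triangle vertices."""
    nodes = []                                   # 9 mass nodes, 3 per layer
    for depth in LAYER_DEPTH:
        for v in tri_vertices:
            direction = center_of_gravity - v    # step toward the center of gravity
            direction = direction / np.linalg.norm(direction)
            nodes.append(v + depth * direction)

    springs = []                                 # (i, j, kind, stiffness, rest_length)
    def add(i, j, kind):
        springs.append((i, j, kind, STIFFNESS[kind],
                        float(np.linalg.norm(nodes[i] - nodes[j]))))

    for layer in range(3):                       # layer springs: triangle edges per layer
        a, b, c = 3 * layer, 3 * layer + 1, 3 * layer + 2
        add(a, b, "layer"); add(b, c, "layer"); add(c, a, "layer")
    for layer in range(2):
        for k in range(3):
            add(3 * layer + k, 3 * (layer + 1) + k, "connecting")        # vertical links
            add(3 * layer + k, 3 * (layer + 1) + (k + 1) % 3, "crossed") # shear resistance
    return np.array(nodes), springs
```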

To a large extent, a face mesh is formed by different parts such as teeth, hair, eyes, etc. The epidermal surface influenced by the muscles is detected by considering the part of the face model surface where the FDP 2.1 is placed. The positions of the mass nodes on the inner layers are computed by tracing a line from each epidermal node in the direction of the center of gravity of the skin mesh.

In order to be suitable for the skin model construction, the initial face mesh is required to be compliant with the MPEG-4 neutral state definition [10].

Figure 3 shows a general input mesh with the corresponding skin model.

Figure 3: After the adjustment of the input face mesh, the skin model is built. Note that face parts like eyes, hair and shoulders are not affected.

3 Modeling of Facial Muscles

In the human head, the facial muscles are superficial; they are mostly attached to both the skull and the skin tissue. There is a wide range of muscle types: rectangular, triangular, sheet, linear and sphincter. To reproduce the action of these muscles on the skin tissue, we use the parameterized and flexible muscle model for linear and sphincter muscles developed by Waters [7]. We describe it briefly, without discussing mathematical details, and then explain our method to map this muscle model automatically onto the skin model.

There are two types of modeled muscles, linear and sphincter. A linear muscle consists of a bundle of fibers that share a common emergence point in the bone and pull in an angular direction; one example is the zygomaticus major, which attaches to and raises the corner of the mouth. The end of a facial muscle attached to the skull is generally considered the origin (the attachment point), while the other end is the insertion point. Normally, the attachment is the fixed point and the insertion is where the facial muscle performs its action. A sphincter muscle consists of fibers that loop around facial orifices and has an elliptical shape; an example is the orbicularis oris, which circles the mouth and can pout the lips.
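
To make the structure of the linear muscle model concrete, here is a simplified sketch in the spirit of Waters' model [7]: a skin point inside an angular and radial zone of influence is pulled toward the attachment point with a cosine falloff. The exact falloff functions and zone parameters of the original model differ in detail; the half angle and influence radius below are assumptions.

```python
import numpy as np

def linear_muscle_displacement(p, attachment, insertion, contraction,
                               half_angle_deg=35.0, falloff_radius=None):
    """Displacement of point p under one linear muscle (simplified variant)."""
    fiber = insertion - attachment
    fiber_len = np.linalg.norm(fiber)
    if falloff_radius is None:
        falloff_radius = 1.2 * fiber_len             # assumed influence radius
    to_p = p - attachment
    d = np.linalg.norm(to_p)
    if d < 1e-9 or d > falloff_radius:
        return np.zeros(3)                           # outside the radial zone
    cos_angle = float(np.dot(to_p, fiber) / (d * fiber_len))
    if cos_angle < np.cos(np.radians(half_angle_deg)):
        return np.zeros(3)                           # outside the angular sector
    angular = cos_angle                              # angular falloff factor
    radial = np.cos(d / falloff_radius * np.pi / 2)  # fades toward the zone border
    # pull toward the attachment point, scaled by the contraction in [0, 1]
    return contraction * angular * radial * (attachment - p) / d
```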

In our face model there are 25 muscles. 23 muscles have been selected by taking the major functional facial muscle groups according to the muscle map model presented in the Facial Action Coding System (FACS), developed by Ekman and Friesen [18]. The remaining pair of muscles, named upper lip, is located between the upper lip and the nose and does not exist in a real human face; these muscles have been added to our model to increase the range of movements allowed to the skin tissue around the mouth. Three sphincter muscles are used to represent the orbicularis oris and the orbicularis oculi; the other muscles are modelled as linear ones. The complete facial muscle structure is shown in Figure 4.

Figure 4: Complete facial muscle structure used in our animation system.

Furthermore, we estimate the shape of the movable part of the jaw. Using the approximated jaw mesh, we are able to know which skin nodes and which muscles are attached to it; when the jaw rotates during the animation, these parts will follow it.

3.1 Automatic Muscle Construction

The simulation of facial movements requires an anatomically correct muscle construction in the face model in order to achieve the desired animation by activating the proper muscle group.

Once the skin tissue model has been built, the automatic construction of the muscles can be performed. This process is achieved by considering the MPEG-4 FDP data set available with the face mesh. As said in the Introduction, such data can come from a number of different MPEG-4 compliant sources, mostly visual tracking systems. This information is used to find the correspondence between the FDPs and the muscle key points according to Tables 1 and 2. In these tables, additions and subtractions between FDPs are computed on their respective position vectors. For a linear muscle, the positions of the attachment and insertion points of its central muscle fibre completely define the location of the muscle. A sphincter muscle is characterized by its epicenter and both semi-axes. After the position of a muscle key point is computed, it is mapped to a mass node of the skin tissue model. Generally, there is not a mass node at the exact position of the key point. To find a suitable candidate node for the key points of each muscle, all the nodes in the proper skin layer are scanned and the key point is mapped to the nearest mass node. This is done for all the muscle key points except for the epicenters of the sphincter muscles: this particular kind of key point is not mapped to a mass node; its position is defined in the initialization phase and afterwards it is changed only by the rotations and translations that the head performs during the animation.

Linear Muscle            Side    Attachment Point            Insertion Point
Frontalis Inner          left    11.1 + 1/6 (11.3 - 11.1)    4.1
                         right   11.1 + 1/6 (11.3 - 11.1)    4.2
Frontalis Major          left    11.1 + 1/3 (11.3 - 11.1)    4.3
                         right   11.1 + 1/3 (11.2 - 11.1)    4.4
Frontalis Outer          left    11.1 + 2/3 (11.3 - 11.1)    4.5
                         right   11.1 + 2/3 (11.2 - 11.1)    4.6
Corrugator Superciliary  left    4.3                         4.3 + (4.3 - 4.1)
                         right   4.4                         4.4 + (4.4 - 4.2)
Nasalis                  left    9.7                         9.1
                         right   9.6                         9.2
Upper Lip                left    9.5 + 1/2 (9.15 - 9.5)      8.10
                         right   9.4 + 1/2 (9.15 - 9.4)      8.9
Zygomaticus Minor        left    3.9                         8.5 + 1/2 (2.6 - 8.5)
                         right   3.10                        8.6 + 1/2 (2.7 - 8.6)
Zygomaticus Major        left    5.3                         8.3
                         right   5.4                         8.4
Risorius                 left    5.1                         8.3
                         right   5.4                         8.4
Depressor Anguli         left    n = 8.3 + (8.3 - 9.1)       8.3
                         right   m = 8.4 + (8.4 - 9.2)       8.4
Mentalis                 left    2.1 + (2.1 - m)             2.8
                         right   2.1 + (2.1 - n)             2.9

Table 1: Correspondence between MPEG-4 FDPs and linear muscle key points.

Sphincter Muscle     Side    Semi-minor axis    Semi-major axis    Epicenter
Orbicularis Oculi    left    3.9                3.7                3.5
                     right   3.10               3.12               3.6
Orbicularis Oris     -       n = 8.2            m = 8.3            (n + m)/2

Table 2: Correspondence between MPEG-4 FDPs and sphincter muscle key points.

For the linear muscles, the attachment points are mapped to nodes belonging to the skull layer and the insertion points are mapped to nodes of the dermal layer. Sphincter muscle key points are mapped to the hypodermal layer.

We associated all the muscle key points (except the sphincter epicenters) with mass nodes of the skin model because, in this way, the shape of the muscle changes according to the movements of the skin. Waters' muscle model is parametric, so it adapts to the dynamic shape that the muscle assumes during the animation.
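
The following sketch (not the authors' code) illustrates the key-point placement just described: a key point is computed as a vector combination of FDP positions, as in Tables 1 and 2, and then snapped to the nearest mass node of the requested skin layer. The FDP coordinates and layer arrays below are made-up stand-ins.

```python
import numpy as np

def snap_to_layer(point, layer_nodes):
    """Map a computed key point to the nearest mass node of a skin layer."""
    return int(np.argmin(np.linalg.norm(layer_nodes - point, axis=1)))

# Example: left inner frontalis, attachment = 11.1 + 1/6 (11.3 - 11.1), insertion = 4.1
fdp = {"11.1": np.array([0.0, 9.0, 1.0]),
       "11.3": np.array([-3.0, 9.5, 0.5]),
       "4.1":  np.array([-1.0, 6.5, 2.0])}           # placeholder FDP positions
attachment = fdp["11.1"] + (1.0 / 6.0) * (fdp["11.3"] - fdp["11.1"])
insertion = fdp["4.1"]

skull_layer_nodes = np.random.rand(500, 3) * 10.0    # stand-in for the skull layer
dermal_layer_nodes = np.random.rand(500, 3) * 10.0   # stand-in for the dermal layer
attachment_node = snap_to_layer(attachment, skull_layer_nodes)
insertion_node = snap_to_layer(insertion, dermal_layer_nodes)
```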

Figure 5 shows the outcome of the proposed method applied to different test face meshes. Red key points are muscle insertion points, while the green ones are muscle attachment points. For each linear muscle, its attachment and insertion points are connected by a white line; for a sphincter muscle, a white ellipse is drawn.

Figure 5: The resulting muscle map applied to different face meshes.

In a human face, muscle pairs do not move in a perfectly symmetrical manner and, sometimes, the muscles forming a pair move completely independently of each other (see the left and right zygomaticus major in the disgust expression, Figure 8d). For this reason, each muscle in the virtual face framework is built so that it can move independently of the other muscles, allowing the simulation of the widest possible range of muscle movement configurations.

3.2 Muscle Mapping on the Skin

From the analysis of various expressions generated by the contraction of the muscles, it can be noticed that the most significant skin deformations occur, on average, in the direct vicinity of the applied muscle forces, while mass nodes farther from this region undergo smaller displacements. This is due to two main reasons. The first is that facial muscle action is local and does not involve large portions of the face. The second comes from the visco-elastic properties of human skin: being composed of roughly 90% water, the skin tissue absorbs, through friction, the elastic wave produced by the movement of the muscles. In the bio-mechanical skin model, the damping parameters are set to relatively high values to reflect this property. This is the basis for the assumption that the velocity of the nodes at a sufficient distance from the muscle influence areas is small enough to be approximated by 0.

Based on the above facts, our strategy is to use the numerical integration scheme described by Baraff and Witkin [4, 5] to solve the governing motion equation [2, 3] only where needed, that is, only in the influence area of the moving muscles and where the elastic wave significantly displaces the mass nodes. In order to do this, for each facial muscle i, the skin node set V is divided into two subsets: V_d^i and V_s^i. The dynamic node set V_d^i corresponds to the portions of the skin that are displaced when the facial muscle i moves. The large number of nodes that are not moved during the movement of muscle i is included in the static node set V_s^i. By excluding these nodes from the simulation, a large amount of computation is saved.

In the initialization phase, V_d^i is computed automatically by considering the anatomical shape of the muscle and the skin area influenced by the muscle action. The influence area of each muscle i is obtained according to the muscle type.

3.2.1 Linear Muscle Mapping

For each linear muscle i, the central fiber length is computed as the distance between the attachment point and the insertion point. From this, the width of the muscle is calculated by multiplying the fiber length by the muscular width coefficient ω. The muscular width coefficient is a precalculated constant obtained by observing, in a real human head, the ratio between the width of each muscle and the length of its central fiber. Each mass node in the facial skin structure is then scanned: if the distance of the node n from the central fiber is less than the computed muscular width, then n is included in V_d^i.
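
A minimal sketch of this influence-area test follows (illustrative, not the authors' code): a node enters V_d^i when its distance from the attachment-insertion segment is below ω times the fiber length. The value of ω here is a placeholder.

```python
import numpy as np

def distance_to_segment(p, a, b):
    """Distance of point p from the segment a-b (the central muscle fiber)."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def linear_muscle_dynamic_set(nodes, attachment, insertion, omega=0.35):
    """Indices of skin nodes inside the muscle influence area."""
    width = omega * float(np.linalg.norm(insertion - attachment))
    return {i for i, p in enumerate(nodes)
            if distance_to_segment(p, attachment, insertion) <= width}
```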


Owing to the wide variety of possible geometrical shapes of the input mesh, such a mapping is not always perfect, particularly for the mentalis muscle pair and the upper lip muscle pair, both located in the mouth area. These muscles affect, respectively, the lower and the upper part of the lip. Applying the above mapping, some of the nodes of the upper part of the lip could be assigned to the mentalis pair and, vice versa, the upper lip muscle pair could be mapped onto some of the nodes of the lower part of the lip. This is due to the thin border between the upper and lower lip, which often has an irregular shape. Such a problem would cause wrong behavior during the animation because, for example, when one of the mentalis muscles moves, the corresponding nodes in the upper lip would move too.

To fix the problem, we apply a breadth-first search algorithm that explores the skin topology in the mouth area, removing the erroneously mapped nodes (see the sketch below). First, a "mouth window" is defined in the preprocessing phase. Then, starting from a proper FDP, the skin node graph is explored; if the exploration goes outside the mouth window, that path is closed. The mouth window is defined as:

• top = 9.15.y

• bottom = 2.11.y − |(2.11.y − 2.2.y)/2|

• left = 2.4.x − (MW_0/20)

• right = 2.5.x + (MW_0/20)

where MW_0 is the mouth width. For example, assume we want to delete the nodes in the upper part of the lip that belong to the mentalis left muscle because of the imprecise muscle mapping. The algorithm starts from the node where the FDP 2.2 (Figure 1) is mapped. If one of the explored nodes belongs to V_d^{mentalis left}, it is removed from this set. For the upper lip muscle pair, the search starts from the node where the FDP 2.3 is mapped.
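
The sketch below illustrates this correction step (assumed data structures: `adjacency` maps a skin node to its neighbours, `inside_window` applies the bounds listed above). Starting from the node of the reference FDP, nodes visited inside the mouth window are removed from the muscle's dynamic set; paths leaving the window are closed.

```python
from collections import deque

def prune_dynamic_set(start_node, adjacency, dynamic_set, inside_window):
    """Breadth-first removal of erroneously mapped nodes inside the mouth window."""
    visited = {start_node}
    queue = deque([start_node])
    while queue:
        n = queue.popleft()
        if not inside_window(n):
            continue                      # the path leaves the mouth window: close it
        dynamic_set.discard(n)            # drop nodes wrongly assigned to the muscle
        for m in adjacency[n]:
            if m not in visited:
                visited.add(m)
                queue.append(m)
```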

3.2.2 Sphincter Muscle Mapping

Suppose (n_x, n_y, n_z) and (epi_x, epi_y, epi_z) are the 3D vectors identifying the positions of the mass node n and of the sphincter muscle epicenter epi, respectively. For each sphincter muscle i, the node n belongs to the muscle influence area if it satisfies

    ((n_x − epi_x) / a)^2 + ((n_y − epi_y) / b)^2 ≤ 1        (1)

where a and b are the lengths of the semi-major axis and the semi-minor axis, respectively. Note that the gaze of the face model is in the direction of the z-axis according to the MPEG-4 neutral state definition, so n_x and n_y are the only coordinates of interest here. Furthermore, to be included in V_d^i, n must lie on the front side of the face. In other words, it must satisfy the condition

    v_n · v_epi ≥ 0        (2)

where v_n is the nodal normal of n and v_epi is the nodal normal of the muscle epicenter.
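
A direct translation of the two conditions above into code might look as follows (illustrative sketch; names are not from the original implementation):

```python
import numpy as np

def in_sphincter_area(node_pos, node_normal, epicenter, epicenter_normal, a, b):
    """Test Equations (1) and (2) for one skin node of a sphincter muscle."""
    ellipse = ((node_pos[0] - epicenter[0]) / a) ** 2 + \
              ((node_pos[1] - epicenter[1]) / b) ** 2
    frontal = float(np.dot(node_normal, epicenter_normal)) >= 0.0
    return ellipse <= 1.0 and frontal
```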

3.2.3 Lower Jaw Mapping

The human skull is composed of two parts: the upper skull and the jaw. The jaw is the movable part.

In order to automatically identify the jaw region on the skin, we use some of the FDPs as a reference to estimate the shape of the jaw bone. Some of these reference points are linked together to form an approximated jaw mesh. Figure 6 shows the involved FDPs together with the way they are linked to each other to form such a mesh.

Figure 6: Cylindrical projection of the jaw mesh estimated through the use of some of the MPEG-4 FDPs.

Once the jaw mesh has been estimated, we map it to the skin using a method stated by Kähler et al. [8]. A segment is traced from each skin vertex, normal to the facial surface and pointing towards the interior of the face. As the segment is traced, all the faces of the generalized jaw mesh are scanned: if one of these triangular faces is intersected by the segment, then the originating skin vertex belongs to the jaw influence area and will be moved when the jaw is activated.
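
The intersection test itself can be done with a standard ray-triangle routine; the sketch below (an assumption, not the authors' code) casts a ray from each skin vertex along its inward normal and collects the vertices whose ray hits a jaw triangle.

```python
import numpy as np

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-7):
    """Moller-Trumbore test: does the ray hit the triangle in front of its origin?"""
    e1, e2 = v1 - v0, v2 - v0
    h = np.cross(direction, e2)
    det = np.dot(e1, h)
    if abs(det) < eps:
        return False                        # ray parallel to the triangle plane
    inv = 1.0 / det
    s = origin - v0
    u = inv * np.dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(s, e1)
    v = inv * np.dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return False
    return inv * np.dot(e2, q) > eps        # intersection lies in front of the vertex

def jaw_influence(skin_vertices, inward_normals, jaw_triangles):
    """Indices of skin vertices attached to the jaw influence area."""
    return {i for i, (p, n) in enumerate(zip(skin_vertices, inward_normals))
            if any(ray_hits_triangle(p, n, *tri) for tri in jaw_triangles)}
```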

As for the linear muscles, the mapping of the jaw can be imperfect. For example, nodes in the upper lip could be mapped as belonging to the jaw, while nodes in the chin could be left out. This is solved using the same breadth-first search algorithm applied to the skin topology, in order to add the chin nodes to the area of influence of the jaw and to remove from it the nodes located in the upper lip.

3.2.4 Skin Nodes Displaced by the Elastic Wave

Some of the remaining nodes on the face do not receive any force directly from the single muscle i but are still displaced to new positions by the propagation of the elastic wave through the skin tissue. Such nodes are also inserted into the dynamic node set V_d^i. They are detected in the preprocessing phase by measuring the deformation of the skin mesh caused by the action of the muscle i. After contracting i, we apply the numerical integration to all the skin nodes; if the position of a node changes by more than a small pre-specified threshold, it is considered a node of V_d^i, otherwise our system stops propagating the deformation further.

Figure 7 shows the dynamic node sets automatically detected for two muscles of the beta model, the zygomaticus major (linear) and the orbicularis oris (sphincter). The red dots are the mass nodes in the influence area of the muscle; the green dots are the nodes displaced by the elastic wave caused by the muscle movement.

Not all regions of the face are linked with the bones underneath. To simulate this feature, the skin nodes belonging to the area of influence of the muscles in the lips and in the cheeks are not connected to the underlying bony structure. This is achieved by setting to 0 the stiffness coefficients of the corresponding springs that connect the hypodermal layer nodes to the skull surface. Furthermore, the skull rejection force [2, 3] is not applied to these nodes.

Figure 7: Dynamic node sets in the beta face model for two muscles. Left: zygomaticus major right (linear); right: orbicularis oris (sphincter).

In order to perform the numerical integration efficiently, at each iteration the numerical simulation is applied only to the nodes belonging to the influence areas of the currently moving muscles. If a muscle i stops, the algorithm keeps applying the numerical simulation to the corresponding V_d^i set until all its nodes have velocity equal to 0. In this way, the number of nodes treated in a dynamic fashion adapts automatically and the computational cost decreases.

3.3 Auto-Calibration of the Muscle Contractions

The initial face meshes can differ greatly from each other in polygon count, spatial dimensions, and so on, so the same force field, corresponding to a given muscle contraction value, can produce very different results on the skin tissue model depending on its geometry. Since we need the best possible control over the face mesh in order to predict the facial movements caused by the muscle contractions, we apply an algorithm to calibrate the minimal and maximal contraction values of each muscle. In the initialization phase, this algorithm measures the displacement caused by increasing contractions applied to the muscle. The displacement is compared with the precalculated minimal and maximal elongations of the muscle; when the muscle reaches one of these fixed values, the corresponding contraction is stored. Thus we can express the muscle contraction as a percentage and we know how much the muscle will deform the face.
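
One way to realize this calibration loop is sketched below (an assumption about the procedure, not the original code): the contraction is increased in small steps and the resulting displacement is compared with the precomputed maximal elongation; `simulate_displacement` stands for a run of the physics solver on the given mesh.

```python
def calibrate_max_contraction(simulate_displacement, max_elongation,
                              step=0.05, limit=10.0):
    """Return the contraction value that produces the maximal muscle elongation."""
    contraction = 0.0
    while contraction < limit:
        contraction += step
        if simulate_displacement(contraction) >= max_elongation:
            return contraction            # this value corresponds to 100% contraction
    return limit                          # safety cap if the bound is never reached

def contraction_from_percentage(percent, max_contraction):
    """Convert a percentage (as used in Table 3) into an absolute contraction."""
    return percent / 100.0 * max_contraction
```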


4 Facial Animation Parameters Synthesis

In this section we explain how the developed face model can be used to synthesize the main part of the MPEG-4 FAPs. For each FAP, a morph target is created; a morph target is the face performing a particular FAP. The whole set of morph targets can be used to perform general facial animation.

4.1 Morph Targets

A morph target is a variation of an object that has the same mesh topology but different vertex positions; essentially, it is a deformed object. Smooth morphing between a neutral object and the morph target is achieved by simple linear interpolation of vertex positions. Interpolation is used extensively for facial animation, as mentioned in [10]. Interpolation between morph targets is preferred because its implementation simplicity ensures fairly easy portability to various platforms, and its low CPU requirements ensure good performance even on modest platforms (mobile phones, PDAs).

In the system developed here, each FAP is defined as a morph target; facial animation is then achieved through interpolation between the produced morph targets. This has been successfully accomplished by using the set of produced morph targets as an input for the MPEG-4 players provided by Visage Technologies AB [19]. In particular, we utilized two of them: a very light and fast MPEG-4 FBA player based on the OpenGL graphics library, useful for achieving facial animation at interactive rates, and a plug-in for 3D Studio MAX, very slow in performing the animation but very accurate in the rendering.

The single morph targets are stored in a compact way in a semi-proprietary format containing the final model, namely the Animatable Face Model (AFM). A valid AFM may be produced with very few morph targets, e.g. only with visemes if the model is animated using only high-level speech animation.

For each morph target corresponding to a low-level FAP, a reference value must be defined, that is, the value of the parameter that the morph target represents. This is necessary in order to have a reference for interpolation. For example, if the reference value for the morph target open_jaw is 1024, the morph target contains the face with the jaw opened exactly as much as prescribed by the open_jaw FAP set to 1024. If during animation the open_jaw parameter is set to 512, the weight of the morph target will be set to 0.5 to achieve the correct interpolation.
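
The sketch below illustrates this weighting scheme (an illustrative sketch of morph-target blending, not the Visage players' implementation): an incoming FAP amplitude is divided by the reference value of its morph target (e.g. 512 / 1024 = 0.5 in the example above) and the vertex displacements from the neutral face are blended accordingly.

```python
import numpy as np

def apply_faps(neutral_vertices, morph_targets, reference_values, fap_frame):
    """Blend morph targets for one animation frame.

    morph_targets, reference_values and fap_frame are dictionaries keyed by FAP
    name; vertex arrays share the same shape and ordering as neutral_vertices.
    """
    out = neutral_vertices.copy()
    for name, value in fap_frame.items():
        weight = value / reference_values[name]            # e.g. 512 / 1024 = 0.5
        out += weight * (morph_targets[name] - neutral_vertices)
    return out
```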

4.2 Producing the Animatable Face Model

Because the Facial Animation Players provided by Visage Technologies AB are based on morph targets (MTs), building a new Animatable Face Model (AFM) consists of producing a deformed face corresponding to each of the MPEG-4 high- and low-level parameters. The AFM is a standard VRML file containing the whole set of MTs stored in a compact way that can be interpreted by the players, independently of the platform on which the players are implemented. This means that the model will be visible in any VRML player, albeit static; in order to view the animations, the Facial Animation Players have to be used. The AFM construction and the FAP decoding are thus two separate processes. This keeps the performance of the animation simple (through linear interpolation) and, at the same time, makes it possible to use increasingly sophisticated techniques to build the anatomical face model.

In order to produce the AFM from a static face model, the morph targets must be created. This is achieved by modifying the muscle configuration so as to properly deform the face mesh.

We give an example of the correspondence between the low-level FAPs and the muscle contractions: Table 3 lists the first seven low-level FAPs. Each morph target is defined for the corresponding reference value of the FAP. We synthesized the morph targets for the set of FAPs needed to perform a human facial animation; we have not synthesized the FAPs needed for cartoon-style animation, such as ear movements or eyeball thrust. The set of tables for the mapping of all the FAPs is available at the web site http://legion.gibbering.net/marfr960.


No.  FAP name             Muscles                    Contr. (%)   Ref. Value
3    open_jaw             jaw                        45           2048
4    lower_t_midlip       Upper lip left             150          1024
                          Upper lip right            150
                          Zygomaticus minor left     1
                          Zygomaticus minor right    1
5    raise_b_midlip       Mentalis left              150          2048
                          Mentalis right             150          512
6    stretch_l_cornerlip  Risorius left              -350         512
7    stretch_r_cornerlip  Risorius right             -350         1024
8    lower_t_lip_lm       Upper lip left             150          1024
                          Zygomaticus minor left     1            2048
9    lower_t_lip_rm       Upper lip right            150          2048
                          Zygomaticus minor right    1

Table 3: Correspondence example between the first seven low-level FAPs and muscle contractions.
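
As a sketch of how one such table entry could drive the construction of a morph target, the snippet below contracts the listed muscles by their percentages, relaxes the physics-based skin model, and stores the resulting vertex positions. The `face_model` object and its methods are assumptions made for the example, not the actual implementation.

```python
# Hypothetical mapping mirroring the first two rows of Table 3
FAP_TO_MUSCLES = {
    "open_jaw": [("jaw", 45.0)],
    "lower_t_midlip": [("Upper lip left", 150.0), ("Upper lip right", 150.0),
                       ("Zygomaticus minor left", 1.0), ("Zygomaticus minor right", 1.0)],
}

def build_morph_target(face_model, fap_name):
    """Produce the deformed vertex set stored in the AFM for one low-level FAP."""
    face_model.reset_to_neutral()
    for muscle, percent in FAP_TO_MUSCLES[fap_name]:
        face_model.contract(muscle, percent)      # percentages from auto-calibration
    face_model.relax_to_equilibrium()             # run the mass-spring simulation
    return face_model.vertex_positions()          # morph target at the reference value
```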

5 Results and Conclusions

We have developed a method to produce an Animatable Face Model (AFM) starting from a generic 3D face mesh with the corresponding FDP data set available. We used some well-known techniques to reproduce an anatomical model of the face. The facial muscles are mapped at anatomically correct positions using the MPEG-4 FDPs as a reference. The face model is then used to build the AFM, which stores all the information needed to deform the initial face mesh. Such an AFM is used to perform general facial animation encoded in a FAP data stream. The animation decoding process takes place in a proper MPEG-4 FBA player; such a player can be implemented on a home PC as well as on embedded systems like cell phones or PDAs, applying the same concepts with relatively low effort. The AFM produced with our method can be inserted in an MPEG-4 compliant framework for the production and delivery of facial animation [19, 20]. Our method has been implemented and tested in a win32/OpenGL application. Figure 8 shows some of the AFMs that we have built using our method. Pictures a. to d. show some test AFMs performing high-level expression FAPs rendered with an OpenGL-based player (a. smile, b. fear, c. anger, d. disgust). Pictures e. and f. show two frames from an animation rendered with 3D Studio MAX.

Figure 8: AFMs in action.

Another possible approach to FAP-driven facial animation is to decode the FAP values for each animation frame and move the muscles of the anatomical face model directly, without producing any morph target. This possibility should be explored, but an easily foreseeable drawback is the computational effort required by the numerical simulation, which cannot be sustained by low-capability hardware.

Acknowledgements

The authors wish to thank Igor Pandzic, Robert Forchheimer, Jorgen Ahlberg and Ingemar Ragnelmalm for useful discussions and support in making this work possible.

References

[1] F. Parke. "Computer Generated Animation of Faces". In ACM National Conference, pages 451-457. ACM, 1972.

[2] Y. Lee, D. Terzopoulos and K. Waters. "Constructing Physics-Based Facial Models of Individuals". In Proceedings of the Graphics Interface '93 Conference, pages 1-8, Toronto, ON, Canada, May 1993.

[3] Y. Lee, D. Terzopoulos and K. Waters. "Realistic Modeling for Facial Animation". In Computer Graphics Proceedings, Annual Conference Series, pages 55-62, Los Angeles, CA, August 1995. SIGGRAPH 95, ACM SIGGRAPH.

[4] D. Baraff and A. Witkin. "An Introduction to Physically Based Modeling". In SIGGRAPH Course Notes, 1993.

[5] D. Baraff and A. Witkin. "Large Steps in Cloth Simulation". In Computer Graphics Proceedings, Annual Conference Series, Orlando, July 1998. SIGGRAPH 98, ACM SIGGRAPH.

[6] Y. Wu, P. Kalra, L. Moccozet and N. Magnenat-Thalmann. "Simulating Wrinkles and Skin Aging". The Visual Computer, 15(4):183-198, 1999.

[7] K. Waters. "A Muscle Model for Animating Three-Dimensional Facial Expressions". In Proceedings of SIGGRAPH, volume 21, pages 17-24, Toronto, ON, Canada, July 1987.

[8] K. Kähler, J. Haber and H.-P. Seidel. "Geometry-based Muscle Modeling for Facial Animation". In Proceedings of Graphics Interface 2001 (GI-2001), pages 37-46, Ottawa, Ontario, June 2001. Morgan Kaufmann.

[9] K. Kähler, J. Haber, H. Yamauchi and H.-P. Seidel. "Head shop: Generating animated head models with anatomical structure". In Proceedings of the 2002 ACM SIGGRAPH Symposium on Computer Animation, pages 55-64, San Antonio, USA, October 2002. ACM SIGGRAPH.

[10] I.S. Pandzic and R. Forchheimer, editors. "MPEG-4 Facial Animation – The Standard, Implementation and Applications". John Wiley & Sons, Ltd, Linköping, Sweden, 1st edition, 2002.

[11] Moving Pictures Expert Group. MPEG-4 International Standard. ISO/IEC 14496. URL: www.cselt.it/mpeg, May 2002.

[12] J. Ahlberg and F. Dornaika. "Efficient Active Appearance Model for Real-Time Head and Facial Feature Tracking". In IEEE International Workshop on Analysis and Modelling of Faces and Gestures (AMFG), Nice, October 2003.

[13] J. Ahlberg. Model-based Coding – Extraction, Coding and Evaluation of Face Model Parameters. PhD thesis no. 761, Department of Electrical Engineering, Linköping University, Linköping, Sweden, September 2002.

[14] T. Goto, M. Escher, C. Zanardi and N. Magnenat-Thalmann. "MPEG-4 Based Animation with Face Feature Tracking". In Computer Animation and Simulation '99, pages 89-98.

[15] W.-S. Lee, M. Escher, G. Sannier and N. Magnenat-Thalmann. "MPEG-4 Compatible Faces from Orthogonal Photos". In CA, pages 186-194, 1999.

[16] J. Ostermann. "Animation of Synthetic Faces in MPEG-4". In Computer Animation, pages 49-51, Philadelphia, Pennsylvania, June 1998.

[17] T.K. Capin, E. Petajan and J. Ostermann. "Very Low Bitrate Coding of Virtual Human Animation in MPEG-4". In IEEE International Conference on Multimedia and Expo (II), pages 1107-1110, 2000.

[18] P. Ekman and W. Friesen. "Manual for the Facial Action Coding System". Consulting Psychologists Press, Palo Alto, California, 1977.

[19] Visage Technologies AB. URL: www.visagetechnologies.com, March 2004.

[20] I.S. Pandzic. "Facial Animation Framework for the Web and Mobile Platforms". In Web3D Symposium, Tempe, AZ, USA, 2002.
