Workshop 2: Functional Markup Language
The Seventh International
Conference on Autonomous
Agents and Multiagent Systems
Estoril, Portugal, May 12-16, 2008
Workshop 2: Functional Markup Language
Dirk Heylen
Stefan Kopp
Stacy Marsella
Catherine Pelachaud
Hannes Vilhjálmsson
(Editors)
Why Conversational Agents do what they do
Functional Representations for Generating Conversational Agent Behavior
The First Functional Markup Language Workshop
Dirk Heylen (University of Twente)
Stefan Kopp (University of Bielefeld)
Stacy Marsella (University of Southern California)
Catherine Pelachaud (Paris VIII University and INRIA)
Hannes Vilhjálmsson (Reykjavík University)
April 9, 2008
Framing and Interpersonal Stance in Relational Agents

Timothy Bickmore
Northeastern University College of Computer and Information Science
360 Huntington Ave, WVH202, Boston, MA 02115
bickmore@ccs.neu.edu
ABSTRACT
In this paper, I describe the concepts of interpersonal stance and
conversational frame, and why they are important functions to
represent for embodied conversational agents in general, but
especially those designed for social and relational interactions
with users.
Categories and Subject Descriptors
H5.2 [Information Interfaces and Presentation]: User Interfaces—
Evaluation/methodology, Graphical user interfaces
General Terms
Algorithms, Design, Human Factors, Standardization, Theory.
Keywords
Relational agent, embodied conversational agent, virtual agent,
conversational frame, contextualization cue.
1. INTRODUCTION

The vast majority of dialogue systems developed to date
(embodied and otherwise) have been designed to engage people in
a strictly collaborative, task-oriented form of conversation in
which the communication of propositional information is the
primary, if not only, concern. As our agents expand their
interactional repertoires to include social interaction, play, role-
playing, rapport-building, comforting, chastising, encouraging
and other forms of interaction they will need both explicit ways of
representing these kinds of conversation internally and ways of
signaling to users that a new kind of interaction has begun.
Various social scientists have coined the term conversational
frame to describe these different forms of interaction.
Another emerging trend in embodied agent research is the desire
to model relational interactions, in which one of the objectives is
the establishment of rapport, trust, liking, therapeutic alliance, and
other forms of social relationship between a user and an agent [6].

Among the most important behavioral cues in these interactions are
those that display social deixis, or interpersonal
stance, in which one person exhibits their presumed social
relationship with their interlocutor by means of behaviors such as
facial displays of emotion, proxemics, and overall gaze and hand
gesture frequency.
While these two phenomena—framing and interpersonal stance—
have very different conversational functions, their effect on verbal
and nonverbal conversational behavior is very similar. As with
the effects of affective state or attitude, they have a global impact
on verbal and nonverbal behavior, affecting both the choice of
whether a given behavior is exhibited or not (e.g., a particular
hand gesture) as well as the quality of behaviors selected.
In this paper, I discuss each of these conversational functions,
reviewing work from linguistics, sociolinguistics and the social
psychology of personal relationships, and motivate their use in
virtual agents in a health counseling domain. I then discuss the
current implementation of these conversational functions in the
health counseling agents my students and I have been developing
over the last several years, and desiderata for including these in
the emerging Functional Markup Language (FML) specification.
2. FRAMING

Gregory Bateson introduced the notion of frame in 1955, and
showed that no communication could be interpreted without a
meta-message about what was going on, i.e., what the frame of
interaction was [4]. He showed that even monkeys exchange
signals that allow them to specify when the "play" frame is active
so that hostile moves are interpreted in a non-standard way.
Charles Fillmore (1975) defined frame as any system of linguistic
choices associated with a scene (where a scene is any kind of
coherent segment of human actions) [11]. Gumperz (1982)
described this phenomenon (which he called contextualization) as
exchanges representative of socio-culturally familiar activities,
and coined "contextualization cue" as any aspect of the surface
form of utterances which can be shown to be functional in the
signaling of interpretative frames [14]. Tannen went on to define
conversational frames as repositories for sociocultural norms of
how to do different types of conversation, such as storytelling,
teasing, small talk, or collaborative problem-solving talk [24].
The sociocultural norms take the form of assumptions, scripts
(prototypical cases of what one should do), and constraints (what
one ought not to do). These parallel the description of topics that
can be taken for granted, reportable (talked about; relevant) or
excluded based on sociocultural situations, as described in [22].

Scripts can dictate parts of interactions explicitly (as in ritual
greetings), describe initiative or turn-taking strategies (e.g., the
entry and exit transitions in storytelling and the imperative for the
storyteller to hold the floor [15]) or describe the obligations one
has in a given situation (as done in [26] for the collaborative
task-talk frame).
Padgham, Parkes, Mueller and Parsons (eds.): Proc. of AAMAS 2008,
Estoril, May, 12-16, 2008, Portugal, pp. XXX-XXX
Copyright © 2008, International Foundation for Autonomous Agents
and Multiagent Systems (www.ifaamas.org). All rights reserved.
The contextualization cues must be used by an agent to indicate
an intention to switch frames. While many contextualization cues
are nonverbal (see [14] and most of [3]), there are many
examples of subtle linguistic cues as well (people rarely say “let’s
do social chat now”). Often these can be ritualized or
stereotypical opening moves or topics, for example a question
about the weather or immediate context for small talk [21], or a
story-initial cue phrase ("Oh, that reminds me of when…") [10].
In many ways, frames act like recipes in the SharedPlans theory
[12]. They are instantiated in response to a shared goal of the
interlocutors to work towards satisfaction of that goal. They
specify sub-goals that must be satisfied, and they are placed in the
intentional structure (plan tree) in the same manner as recipes to
indicate embedding relationships among discourse segments.
However, there are many significant differences between frames
and recipes. First, while the discourse segment purposes [13] associated with recipes are pushed onto the focus stack, and (in
most situations) only the top of this stack is inspected during
generation or interpretation, the same is not true for frames.
Frames can also be nested (for example, interlocutors within a
storytelling frame embedded in a small talk frame embedded in a
doing conversation frame). But, whereas the intentional state
reflects the why of a dialogue and the attentional state reflects the
what, the set of nested frames reflects the how; that is the set of
sociocultural norms and conventions in use. Further, the top-level
frame is not the only one used; conventions and assumptions from
all enclosing frames are still in effect unless overridden by those
higher up on the stack.
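This stack discipline can be sketched in code. The following is a minimal illustration under our own assumptions (the class and function names are invented; the paper does not describe an implementation of frame stacks):

```python
# Minimal sketch of a nested frame stack: conventions from every enclosing
# frame stay in effect unless overridden by a frame higher on the stack.
# All names here are illustrative, not from any published implementation.

class Frame:
    def __init__(self, name, norms):
        self.name = name
        self.norms = norms  # dict: convention name -> value

def effective_norms(frame_stack):
    """Merge norms bottom-up so frames pushed later override earlier ones."""
    merged = {}
    for frame in frame_stack:  # bottom of the stack first
        merged.update(frame.norms)
    return merged

# A storytelling frame embedded in a small-talk frame embedded in a
# "doing conversation" frame, as in the example above.
stack = [
    Frame("conversation", {"turn_taking": "free", "register": "neutral"}),
    Frame("small_talk", {"register": "casual", "topics": "safe"}),
    Frame("storytelling", {"turn_taking": "teller_holds_floor"}),
]

norms = effective_norms(stack)
# The storytelling frame overrides turn-taking, while the casual register
# and safe-topic constraints of the enclosing frames remain in effect.
```

The merge order realizes the rule stated above: the top-level frame is not the only one consulted; every enclosing frame contributes unless overridden.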
3. INTERPERSONAL STANCE

One way in which language can be used to set relational
expectations is through social deixis, or what Svennevig calls
“relational contextualization cues” [23], which are “those aspects
of language structure that encode the social identities of
participants…or the social relationship between them, or between
one of them and persons and entities referred to” [17]. Politeness
strategies fall under this general category (facework strategies are
partly a function of relationship [7]), but there are many other
language phenomena which also fit, including honorifics and
forms of address. Various types of relationship can be
grammaticalized differently in different languages, including
whether the relationship is between the speaker and hearer as
referent, between the speaker and hearer when referring to
another person or entity, between the speaker and bystanders, or
based on type of kinship relation, clan membership, or relative
rank [17]. One of the most cited examples of this is the tu/vous
distinction in French and other languages. For example, Laver
encoded the rules for forms of address and greeting and parting in
English as a (partial) function of the social relationship between
the interlocutors, with titles ranging from professional forms (“Dr.
Smith”) to first names (“Joe”) and greetings ranging from a
simple “Hello” to the more formal “Good Morning”, etc [16].
Forms of language may not only reflect existent relational status,
but may be used to negotiate changes in the relationship, by
simply using language forms that are congruent with the desired
relationship. Lim observed that partners may change their
facework strategies in order to effect changes in the relationship
[18]. And, according to Svennevig:
The language forms used are seen as reflecting a
certain type of relationship between the interlocutors.
Cues may be used strategically so that they do not
merely reflect, but actively define or redefine the
relationship. The positive politeness strategies may
thus … contribute to strengthening or developing the
solidarity, familiarity and affective bonds between the
interactants. The focus is here shifted from
maintaining the relational equilibrium toward setting
and changing the values on the distance parameter
(Svennevig, 1999, pg. 46-47).
In terms of nonverbal behavior, the most consistent finding in this
area is that the use of nonverbal "immediacy behaviors"—close
conversational distance, direct body and facial orientation,
forward lean, increased and direct gaze, smiling, pleasant facial
expressions and facial animation in general, nodding, frequent
gesturing and postural openness—projects liking for the other and
engagement in the interaction, and is correlated with increased
solidarity (perception of “like-mindedness”) [2,19]. Other
nonverbal aspects of "warmth" include kinesic behaviors such as
head tilts, bodily relaxation, lack of random movement, open
body positions, and postural mirroring and vocalic behaviors such
as more variation in pitch, amplitude, duration and tempo,
reinforcing interjections such as "uh-huh" and "mm-hmmm",
greater fluency, warmth, pleasantness, expressiveness, and clarity
and smoother turn-taking [1]. The verbal and nonverbal cues
associated with conversational “rapport” have also been
investigated [8,25].
4. CURRENT IMPLEMENTATION

The health counseling agents we are currently developing (e.g.,
[5]) use the BEAT text-to-embodied-speech translator [9].
However, the concepts and features we have implemented within
BEAT would map equally well to FML.
4.1 Interpersonal Stance Functions

As discussed above, one of the most consistent findings in the
area of interpersonal attitude is that immediacy behaviors—close
conversational distance, direct body and facial orientation,
forward lean, increased and direct gaze, smiling, pleasant facial
expressions and facial animation in general, nodding, and
frequent gesturing—demonstrate warmth and liking for one’s
interlocutor and engagement in the conversation. BEAT was
extended so that these cues would be generated based on whether
the agent’s attitude towards the user was relatively neutral or
relatively warm.
Since BEAT is designed to over-generate, and produce nonverbal
behaviors at every point in an utterance that is sanctioned by
theory, attitudes are realized primarily by reducing the number of
suggested nonverbal behaviors, as appropriate. For example, in a
warm stance (high immediacy), fewer gaze away suggestions are
generated, resulting in increased gaze at the interlocutor, whereas,
in the neutral stance (low immediacy), fewer facial animation
(eyebrow raises and headnods) and hand gesture
Table 1. Effects of Stance and Frame on Nonverbal Behavior.
Frequencies are relative to baseline BEAT behavior. Proximity of 0.0
is a full body shot (most distant); 1.0 is a close-up shot on the face.

                         Relational Stance
Frame       High Immediacy (Warm)        Low Immediacy (Neutral)
TASK        Proximity=0.2                Proximity=0.0
            Neutral facial expression    Neutral facial expression
            Less frequent gaze aways     Less frequent gestures
                                         Less frequent headnods
                                         Less frequent brow flashes
SOCIAL      Proximity=0.2                Proximity=0.0
            Smiling facial expression    Smiling facial expression
            Less frequent gaze aways     Less frequent gestures
                                         Less frequent headnods
                                         Less frequent brow flashes
EMPATHY     Proximity=1.0                Proximity=0.5
            Concerned facial expression  Concerned facial expression
            Slower speech rate           Slower speech rate
            Less frequent gaze aways     Less frequent gestures
                                         Less frequent headnods
                                         Less frequent brow flashes
ENCOURAGE   Proximity=0.5                Proximity=0.1
            Smiling facial expression    Smiling facial expression
            Less frequent gaze aways     Less frequent gestures
                                         Less frequent headnods
                                         Less frequent brow flashes
[Figure 1. Example Effects of Stance and Frame on Proximity and Facial
Expression for the “Laura” Health Counseling Agent. Panels: Low
Immediacy Task Frame; High Immediacy Empathy Frame; Low Immediacy
Encourage Frame; High Immediacy Task Frame; High Immediacy Encourage
Frame; High Immediacy Social Frame.]
suggestions are generated. Such cues that are encoded through
relative frequency of behavior are currently implemented by
means of a StanceManager module which tracks the relational
stance for the current utterance being processed, and is consulted
by the relevant behavior generators at the time they consider
suggesting a new behavior. Centralizing this function in a new
module was important for coordination—since attitude (and
emotion in general) affects all behaviors systemically.
Modifications to baseline BEAT behavior were made at the
generation stage rather than the filtering stage, since at least some
of the behaviors of interest (e.g., eyebrow raises) are generated in
pairs and it makes no sense to filter out a gaze away suggestion
without also filtering out its accompanying gaze towards
suggestion.
Relational stance affects not only whether certain nonverbal
behaviors occur (i.e. their frequency), but the manner in which
they occur. To handle this, the behavior generation module
consults the StanceManager at animation compilation time to get
a list of modifications that should be applied to the animation to
encode manner (the “adverbs” of behavior). Currently, only
proximity cues are implemented in this way, by simply mapping
the current relational stance to a baseline proximity (camera shot)
for the agent, however, in general these modifications should be
applied across the board to all aspects of nonverbal behavior and
intonation (ultimately using some kind of animation blending, as
in [20]).
Currently, interpersonal stance is indicated functionally via an
attribute in the root-level UTTERANCE tag that simply specifies
what the relational stance is for the current utterance being
generated. For example:
<UTTERANCE STANCE="WARM">Hi there.</UTTERANCE>
The generators for gaze, gesture, headnods, and eyebrow
movement consult the StanceManager at the time they are about
to suggest their respective behaviors, and the StanceManager tells
them whether they can proceed with generation or not.
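The consultation protocol might look like the following sketch. All names and the keep-rate figures are our own assumptions for illustration; the paper does not publish the StanceManager interface or its parameters:

```python
import random

# Illustrative sketch of a StanceManager that modulates behavior frequency.
# Generators call may_generate() before suggesting a behavior; the manager
# thins suggestions according to the current relational stance.

class StanceManager:
    # Fraction of baseline suggestions allowed per behavior, by stance.
    # These rates are invented for illustration.
    KEEP_RATE = {
        "WARM":    {"gaze_away": 0.5, "headnod": 1.0, "gesture": 1.0},
        "NEUTRAL": {"gaze_away": 1.0, "headnod": 0.5, "gesture": 0.5},
    }

    def __init__(self, stance, rng=None):
        self.stance = stance
        self.rng = rng or random.Random(0)

    def may_generate(self, behavior):
        """Consulted by a behavior generator before it suggests a behavior."""
        rate = self.KEEP_RATE[self.stance].get(behavior, 1.0)
        return self.rng.random() < rate

mgr = StanceManager("WARM")
# A gaze generator would only emit a gaze-away suggestion when
# mgr.may_generate("gaze_away") returns True, yielding fewer gaze-aways
# in the warm stance than at baseline.
```

Centralizing the decision in one module, as the text notes, keeps the systemic effect of attitude coordinated across all generators.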
4.2 Framing Functions

As mentioned above, people clearly act differently when
they are gossiping than when they are conducting a job interview,
not only in the content of their speech but in their entire manner,
with many of these “contextualization cues” encoded in
intonation, facial expression and other nonverbal and paraverbal
behavior.
Contextualization cues are currently implemented in the
StanceManager. Conversational frames are marked in the input
text using XML tags, such as the following:
<UTTERANCE><EMPATHY>Sorry to hear that you're stressed out.</EMPATHY></UTTERANCE>
During translation of the utterance into “embodied speech”, the
behavior generation module keeps track of the current frame and
when it detects a change in frame it consults the StanceManager
for the animation instructions which encode the requisite
contextualization cues. We have implemented four conversational
frames for our health counseling agents, based on empirical
studies of human counselor-patient interactions: TASK (for
information exchange), SOCIAL (for social chat and small talk
interactions), EMPATHY (for comforting interactions), and
ENCOURAGE (for coaching, motivating and cheering up
interactions).
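The frame-change detection described here can be sketched as follows. The cue values loosely follow Table 1, but the function and instruction format are our own illustrative assumptions:

```python
# Sketch of frame-change detection during utterance translation.
# The behavior generation module tracks the current frame and, on a
# change, looks up animation instructions encoding the requisite
# contextualization cues (instruction format invented for illustration).

FRAME_CUES = {
    "TASK":      {"proximity": 0.2, "face": "neutral"},
    "SOCIAL":    {"proximity": 0.2, "face": "smiling"},
    "EMPATHY":   {"proximity": 1.0, "face": "concerned", "speech_rate": "slow"},
    "ENCOURAGE": {"proximity": 0.5, "face": "smiling"},
}

def cues_for_frame_changes(frames):
    """Emit animation instructions only where the active frame changes."""
    instructions = []
    current = None
    for frame in frames:
        if frame != current:
            instructions.append((frame, FRAME_CUES[frame]))
            current = frame
    return instructions

# Two TASK segments followed by an EMPATHY segment trigger two cue sets.
out = cues_for_frame_changes(["TASK", "TASK", "EMPATHY"])
```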
4.3 Combined Influence

The interpersonal stance and conversational frame specifications
are combined within the StanceManager to yield a final set of
modifications to behavior generation and animation modulation,
as shown in Table 1. Figure 1 shows several examples of the
effects of stance and frame on proximity and facial expression.
For example, in the high immediacy, ENCOURAGE frame
condition (lower left cell of Table 1) the agent is displayed in a
medium shot (half way between a wide, full body shot and a close
up shot), has a smiling facial expression, and does 50% fewer
gaze aways than the default BEAT behavior (thereby spending
more time looking at the user). Most of the parameters specified
in Table 1 are design-based (i.e. ad hoc) and ultimately need to be
grounded in human behavior from relevant empirical studies. In
addition, more general and principled methods for combining the
influence of such functional specifications need to be developed.
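The combination is, at present, a plain table lookup; the sketch below encodes the proximity values of Table 1 and the 50%-fewer-gaze-aways figure quoted above (the function and rate names are our own):

```python
# Sketch of the combined stance-and-frame lookup implied by Table 1.
# Proximity values come from Table 1; the 0.5 rates reflect the "50%
# fewer gaze aways" example in the text and the "less frequent" entries.

PROXIMITY = {  # (frame, stance) -> camera proximity
    ("TASK", "WARM"): 0.2,      ("TASK", "NEUTRAL"): 0.0,
    ("SOCIAL", "WARM"): 0.2,    ("SOCIAL", "NEUTRAL"): 0.0,
    ("EMPATHY", "WARM"): 1.0,   ("EMPATHY", "NEUTRAL"): 0.5,
    ("ENCOURAGE", "WARM"): 0.5, ("ENCOURAGE", "NEUTRAL"): 0.1,
}

def modifications(frame, stance):
    mods = {"proximity": PROXIMITY[(frame, stance)]}
    if stance == "WARM":
        mods["gaze_away_rate"] = 0.5  # 50% fewer gaze-aways than baseline
    else:
        mods["gesture_rate"] = 0.5    # fewer gestures/headnods/brow flashes
    return mods

# The high-immediacy ENCOURAGE cell: medium shot, fewer gaze-aways.
m = modifications("ENCOURAGE", "WARM")
```

As the text notes, these parameters are design-based, and a principled combination rule would replace this lookup.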
5. SUMMARY

As our agents leave the confined world of information-deliverers,
they will need the ability to signal the kinds of interactions they
are initiating with users and the level of relationship they are
expecting to participate in. Representations of conversational
frame and interpersonal stance are thus important elements of a
Functional Markup Language for current and future
conversational agents.
6. REFERENCES

[1] Andersen, P. and Guerrero, L. 1998. The Bright Side of
Relational Communication: Interpersonal Warmth as a
Social Emotion. In P. Andersen and L. Guerrero, Eds.
Handbook of Communication and Emotion. Academic Press,
New York, pp. 303-329.
[2] Argyle, M. 1988 Bodily Communication. Methuen & Co.
Ltd, New York.
[3] Auer, P. and Luzio, A. d. 1992 The Contextualization of
Language. John Benjamins Publishing, Philadelphia.
[4] Bateson, G. 1955. A theory of play and fantasy. In Steps to an
Ecology of Mind. Ballantine, New York.
[5] Bickmore, T. and Pfeiffer, L. 2008. Relational Agents for
Antipsychotic Medication Adherence CHI'08 Workshop on
Technology in Mental Health.
[6] Bickmore, T. and Picard, R. 2005. Establishing and
Maintaining Long-Term Human-Computer Relationships.
ACM Transactions on Computer Human Interaction. 12, 2,
293-327.
[7] Brown, P. and Levinson, S. C. 1987 Politeness: Some
universals in language usage. Cambridge University Press,
Cambridge.
[8] Cassell, J., Gill, A., and Tepper, P. 2007. Coordination in
Conversation and Rapport. ACL Workshop on Embodied
Natural Language, pp. 40-50.
[9] Cassell, J., Vilhjálmsson, H., and Bickmore, T. 2001. BEAT:
The Behavior Expression Animation Toolkit. SIGGRAPH
'01, pp. 477-486.
[10] Ervin-Tripp, S. and Kuntay, A. 1997. The Occasioning and
Structure of Conversational Stories. In T. Givon, Ed.,
Conversation: Cognitive, communicative and social
perspectives. John Benjamins, Philadelphia, pp. 133-166.
[11] Fillmore, C. 1975. Pragmatics and the description of
discourse. In P. Cole, Ed., Radical pragmatics. Academic
Press, New York, pp. 143-166.
[12] Grosz, B. and Kraus, S. The Evolution of SharedPlans. In A.
Rao and M. Wooldridge, Eds. Foundations and Theories of
Rational Agency.
[13] Grosz, B. and Sidner, C. 1986. Attention, Intentions, and the
Structure of Discourse. Computational Linguistics. 12, 3,
175-204.
[14] Gumperz, J. 1977. Sociocultural Knowledge in
Conversational Inference. In M. Saville-Troike, Ed.,
Linguistics and Anthropology. Georgetown University Press,
Washington DC, pp. 191-211.
[15] Jefferson, G. 1978. Sequential aspects of storytelling in
conversation. In J. Schenkein, Ed., Studies in the
organization of conversational interaction. Academic Press,
New York, pp. 219-248.
[16] Laver, J. 1981. Linguistic routines and politeness in greeting
and parting. In F. Coulmas, Ed., Conversational routine.
Mouton, The Hague, pp. 289-304.
[17] Levinson, S. C. 1983 Pragmatics. Cambridge University
Press, Cambridge.
[18] Lim, T. 1994. Facework and Interpersonal Relationships. In
S. Ting-Toomey, Ed., The challenge of facework: Cross-
cultural and interpersonal issues. State University of New
York Press, Albany, NY, pp. 209-229.
[19] Richmond, V. and McCroskey, J. 1995. Immediacy. In
Nonverbal Behavior in Interpersonal Relations. Allyn &
Bacon, Boston, pp. 195-217.
[20] Rose, C., Bodenheimer, B., and Cohen, M. 1998. Verbs and
Adverbs: Multidimensional motion interpolation using radial
basis functions. IEEE Computer Graphics and Applications.
[21] Schneider, K. P. 1988 Small Talk: Analysing Phatic
Discourse. Hitzeroth, Marburg.
[22] Sigman, S. J. 1983. Some Multiple Constraints Placed on
Conversational Topics. In R. T. Craig and K. Tracy, Eds.
Conversational Coherence: Form, Structure and Strategy.
Sage Publications, Beverly Hills, pp. 174-195.
[23] Svennevig, J. 1999 Getting Acquainted in Conversation.
John Benjamins, Philadelphia.
[24] Tannen, D. 1993. What's in a Frame? Surface Evidence for
Underlying Expectations. In D. Tannen, Ed., Framing in
Discourse. Oxford University Press, New York, pp. 14-56.
[25] Tickle-Degnen, L. and Rosenthal, R. 1990. The Nature of
Rapport and Its Nonverbal Correlates. Psychological Inquiry.
1, 4, 285-293.
[26] Traum, D. and Allen, J. 1994. Discourse Obligations in
Dialogue Processing. ACL '94.
Modular definition of multimodal ECA communication acts to improve dialogue robustness and depth of intention
Alvaro Hernández, Beatriz López, David Pardo, Raúl Santos,
Luis Hernández
Signals, Systems and Radiocommunications Department, Universidad Politécnica de Madrid
(UPM), Madrid 28040, Spain [email protected]
José Relaño, Mª Carmen Rodríguez
Telefónica I+D, Spain [email protected]
Abstract
In this paper we propose a modular structure to define communication acts with verbal and nonverbal elements, inspired by the SAIBA model. Our modular structure is a conceptual interpretation of the functional features of a multimodal interaction platform we have developed, with an embodied conversational agent (ECA) that implements verbal and gestural communication strategies aimed at minimising the robustness and fluency problems typically encountered in spoken language dialogue systems (SLDSs). We conclude that it is useful to add a pre-verbal level on top of the FML-BML scheme in the SAIBA framework, and we propose a category extension for FML to account for communication elements that have to do with the speaker’s non-declared intentions.
Keywords: Embodied conversational agent, communication act, intentions, literality, multimodality, dialogue robustness, functional markup language.
1 INTRODUCTION

The increasing presence of spoken language dialogue systems and embodied conversational agents on the interfaces of new “in-home” and videotelephony digital services is bringing to the fore a number of typical problems with dialogue robustness and fluency [1] as well as new ones related to the increasingly multimodal character of these systems. In our research efforts we are paying particular attention to the effects of using visual communication channels, in particular attaching a human-like animated figure to an SLDS (thus upgrading it to an ECA) [2], with a view not only to enrich the overall communication act, but also to improve dialogue flow. A few potential benefits ECAs offer the dialogue, such as increased efficiency in turn management ([3], [4]) and better error recovery ([5], [6], [7]), have already been identified by various leading authors in the field ([8], [9], [10], [11]).
According to Poggi [12], an important thing to bear in mind when designing a system that features a conversational agent with expressive communication abilities is to define how the ECA’s acts of communication are constructed as coordinated verbal and nonverbal messages. This is currently a hot area of research, and a most noteworthy effort is that behind the SAIBA framework [13] to define and standardize ECA verbal and gestural communication.
In this paper we propose a modular structure to define communication acts with verbal and nonverbal elements. Our proposal emerges from efforts to conceptually adapt the definition of an ECA engine we have developed to be used in a variety of application domains to a SAIBA-like modular “model.” As a result we suggest adding a communication definition level above that defined in FML, and we propose a category expansion in FML to express a certain kind of communication intentions, as we shall see in Section 2.
The paper is structured as follows: In Section 2 we describe our modular approach to forming ECA communication acts. In Section 3 we propose an adaptation of our communication act generation scheme to FML, expanding the latter as we have seen appropriate. Finally, in Section 4 we sum up the main points of what our approach offers.
2 SYSTEM STRUCTURE
Overall, a multimodal interaction system can be thought of as a black box that receives input information from the user through a variety of modes of interaction and produces an information output, also choosing a combination of interaction modes, as a reaction to the input. How the system actually reacts, precisely what information is provided as output and how it is provided will depend on a variety of contextual parameters, most importantly, of course, on the application that motivates the interaction and the associated communication goals (by which, here, we mean the major goals directly related to the overall object of the interaction, not the message-bound communication goals that may exist at any particular moment during the interaction). Indeed, we believe that contextual parameters, and not merely the user’s input, affect the communication goals themselves, and therefore should be taken into account when designing the structure of the multimodal output generation system.
2.1 Interaction scenarios
We have designed our conceptual multimodal interaction platform to be flexible enough to be used for a variety of purposes. In particular, we have implemented three distinct (though combinable) functions. The first is to handle a spoken dialogue-driven application to control household devices with user speech as the sole input and a human-like avatar producing speech and gestures (with body and face) as output ([2], [14]). The second is to provide a virtual “companion” (which is being developed within the activities of the COMPANIONS project [15]) with multimodal interaction capability. The idea of “companion” is for it to be a sort of virtual agent that “knows” the user enough to make suggestions in a variety of areas such as what film to watch or what to cook for dinner. Interaction here is more flexible: as input we may have any combination of speech, text, and taps and strokes on a touch-screen, with the user’s physical position in the room as a context parameter. The third scenario is a spoken dialogue-driven biometrics system for secure access to a variety of services and applications such as the two previously mentioned. Here the input is the user’s speech and the output is provided by the ECA’s speech and gestures, just as in the household device control scenario.
2.2 Main interaction modules
The multimodal interaction platform we have conceived in order to handle multimodal expression of communication goals conceptually has three main internal modules, as shown in Figure 1: the interaction manager, the phrase generator and the behaviour generator.
2.2.1 The interaction manager
First the user’s input, which may be simultaneously or sequentially multimodal (users may use speech, text, and haptic modes of interaction), is captured and fed to the interaction manager module, together with the contextual parameters that are to be taken into account. These include the following:
• Knowledge of the particular interaction scenario and application in which the current interaction is framed. • Knowledge of the user’s current interaction capabilities. For instance, the user’s voice may be different than
usual, or he/she might be having difficulty in speaking, in which case it may be wise to adopt an interaction strategy that requires the user to speak as little as possible.
• Knowledge of the user’s emotional state. For instance, it is useful to know whether the user is getting frustrated (by analysing the physical characteristics of the user’s voice, his/her choice of words or the flow of the dialogue) since such knowledge may be used to put together specific interaction strategies (for instance to try to reduce user frustration), which will generally have a gestural component.
Taking this information the interaction manager analyses the specific situation the interaction is in at each particular moment and produces a communication intention base (CIB). The CIB defines the ECA’s response on a pre-verbal level. It is composed of three elements: interaction control, open discourse and non-declared intentions.
• The interaction control element defines turn management (i.e., turn offering, turn giving, turn requesting and turn taking) and theme structure (i.e., how the various content elements will be put together following a discourse strategy).
• The open discourse (or open communication) element deals with the literal meaning the system wishes to communicate. In other words, it defines what is to be communicated in words.
• The non-declared communication element describes an intention (with regard to the system’s interaction goals) behind the literal meaning of the message. Note that hidden intentions may be quite removed from the literal meaning communicated.
Both the open and the non-declared parts may have verbal and gestural components, but in the actual system we have designed the interaction manager module produces a simplified CIB in which the open communication level is associated solely with a verbal intention (an abstract representation of a meaning that the next module, the phrase generator, will put into words) and the non-declared level sets the ECA’s general attitude and emotional response, as well as certain intentions or goals that the gestures ultimately performed should aim to achieve.
Here is a brief example to clarify these ideas. Suppose we are using the biometric access application and trying to verify our identity through voice recognition, and suppose the system doesn’t positively recognize us at the first attempt, after providing the system with a sample of our voice (by answering a question the system asked us, for example). If we are told that there’s been a verification failure we are likely to become somewhat frustrated and perhaps anxious, which besides being undesirable states of mind in themselves may also affect our voice in further attempts making it decreasingly likely that the system will manage to recognise us. In such a situation the best thing the system can do is to have us provide new voice samples while trying to ensure good sample quality by getting us to speak calmly. A possible strategy the interaction manager might adopt to achieve this is, firstly, to hide from us the fact that a recognition failure has occurred and make us speak again as if it were all part of the normal verification process (of course, experienced users might realise that something odd is going on!), and secondly, to make the verbal request with a certain attitude (e.g., calmness) and complement it with an ECA gesture sequence designed with getting the speaker to focus and stay calm in mind. These general attitudinal and general gestural indications would constitute the non-declared part of the CIB provided by the interaction manager when adopting this specific interaction strategy, while the system’s explicit verbal intention would constitute the open part.
Note that the actual gesture sequence corresponding to the general gestural indications given by the interaction manager will be determined later. This provides flexibility, very much in the FML-BML vein, to allow for pursuing the same goal (gestural intention) with different gesture sequences as the context or culture may require.
2.2.2 The phrase generator
The communication intention base generated by the interaction manager is passed on to the phrase generator. Admittedly, this is something of a misnomer because the function of this module is not only to generate phrases but also to tag the text with specific gestural indications (as opposed to the general gestural indications of the non-declared part of the CIB). We call the collection of tags pointing to their corresponding places in the text the behaviour descriptor. As for the verbal intention conveyed in the CIB, it is now converted into text in the form of a set of successive linguistic units.
The output of the phrase generator is, thus, functionally similar to FML, although our implementation doesn’t follow any FML specifications. We may stress two peculiarities of our approach, however:
1. The flow of the interaction is entirely determined by the interaction manager. Hence, although all tags are introduced at this stage (since no text exists before), the composition of interaction-level tags, such as those related to turn management for instance, is already established in the CIB.
2. For our scheme to be in greater harmony with FML, the latter should include tag categories corresponding to our non-declared communication level, so that this information may be carried down with all the rest to the gesture implementation stage. In Section 3 we explain these tag categories in greater detail.
2.2.3 The behaviour generator
Finally, the behaviour generator concatenates the linguistic units produced by the phrase generator and translates the gesture tag structure into specific ECA body gestures and facial expressions to be performed in synchrony with the verbal rendition of the text. Thus the ECA’s behaviour is assembled.
2.3 Example: acknowledgement of misunderstanding
In order to better illustrate the functional flow of the proposed modular structure for generating ECA verbal + nonverbal communication acts, we present an example of a dialogue strategy for use when the system realises it has previously misunderstood the user. We describe the overall interaction scenario, and briefly sketch the situation, context of interaction, response strategy and output ultimately offered to the user.
Motivating situation: A critical situation arises when the system fails to correctly understand something the user has said, more precisely, when the system believes it has understood the user’s utterance, but in fact the user has said something else. If the user tries to tell the system that it has misunderstood, or if he/she tries to correct the misunderstanding by repeating or rephrasing, the system will (hopefully!) realise what has happened. This is crucial since in such situations (especially if they occur, as they usually do, fairly often) there is a risk of the user losing confidence in the system’s capabilities and becoming irritated, thus making it more difficult for the system to understand his/her subsequent utterances. This is one reason why error cycles often ensue, and it is important for the system to try to cut them short by adopting an adequate strategy.
Scenario: John wants to cook a Spanish omelette. He doesn’t remember what the measures for the ingredients are, so he asks the cooking assistant to give him the recipe. The fume extractor is on, and in this noisy environment the speech engine recognizes ‘spaghetti.’ The system replies following an explicit confirmation strategy and asks John whether indeed he wants to prepare spaghetti. To this John simply answers “No.”
Response strategy and generation of the corresponding ECA communication act: When the Interaction Manager realizes there has been a speech recognition error, the immediate objective is to try to keep the user in a positive attitude while moving on with the interaction. Taking into account the relevant context variables, it decides to implement a communication act in two parts: the first is to show “remorse” for having misunderstood, the second to encourage the user to repeat the utterance while trying not to annoy him/her. (The system needs the user to repeat the utterance since in this case he/she has only indicated that there was a misunderstanding, without correcting the information at the same time, which would call for a different response strategy, such as implicit or explicit confirmation.) These guidelines are passed on to the Phrase Generator, which then selects the suitable linguistic units and the associated ECA behaviour descriptors. In this case, the phrases generated might be “I am sorry” and “Could you repeat again?”, with Remorse and Interest as their respective behaviour descriptors (tags). Finally, the Behaviour Generator translates these descriptors into specific gesture instructions (movements) for the ECA to perform. A detailed description of the ECA’s two-part verbal + gestural communication act is shown in Figure 2.
Figure 2: Text and ECA behaviour assembly for an acknowledgement of misunderstanding situation
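The two-part communication act described above could be sketched, in an FML-style notation with illustrative element names, as:

```xml
<!-- Illustrative sketch: the two linguistic units selected by the
     phrase generator, each tagged with its behaviour descriptor. -->
<communication-act>
  <unit descriptor="Remorse">I am sorry.</unit>
  <unit descriptor="Interest">Could you repeat again?</unit>
</communication-act>
```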
If we were to adapt this example to an implementation scheme based on FML, we would find that no further tags are needed beyond regular FML to implement the non-declared part of the CIB. It seems reasonable that both “remorse” and “showing interest” (which stem directly from the non-declared intention of keeping the user in a calm and positive frame of mind) can and should be implemented using literal, text-based tags. Section 3 will make this statement clear. In it we hope to identify the need for new sets of tags that extend the regular text-based FML.
3 ADAPTING TO (AN EXTENDED) FML
In the previous section we described how we define the message at the pre-textual level using a structure we call the communication intention base (which is the output of the interaction manager module). Now we consider how the information contained in the CIB could be carried down into an FML structure. In other words, we describe what the FML structure would look like if we want it to carry through the information determined in the CIB. Figure 1 (a) succinctly illustrates how our ECA’s verbal and gestural behaviour is put together (FML is not used). Figure 1 (b), in contrast, shows the links we propose between the CIB and the FML level.
Figure 1. Description of the modular verbal and gestural communication act generation: (a) Implemented system; (b) Correspondence between the CIB and FML.
The open discourse part of the CIB is the primary basis on which the textual message is formed together with all
the ECA gesture tags directly associated with the text (for instance, marks to emphasise particular words). In this sense the tags are literal: they express the text through gesture. Non-declared intentions could also modulate the gestures attached to the verbal message, the literal tags, partly determining how the ECA says a message (mainly in order to achieve a certain effect such as influencing the user’s response). For instance, continuing with the example introduced in the previous section, if we want the user to stay calm and unaware of an error we might choose to make the ECA smile while emphasising a particular part of its utterance. Both the smile and the emphasis can be expressed with regular “literal” FML tags like “emphasise” and “affect,” but either their presence or the nature of the smile are influenced by the non-declared intentions.
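For instance, the smile-plus-emphasis combination just described might be written with literal tags along these lines (the attribute names are illustrative assumptions, not part of any FML specification):

```xml
<!-- Illustrative: literal tags whose presence and realisation are
     modulated by the non-declared intention of keeping the user calm -->
<affect type="smile" intensity="mild">
  Please tell me your name <emphasise>once more</emphasise>.
</affect>
```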
The most important function of the non-declared intentions, however, would be to determine a non-literal communication level that defines behaviour that is not directly related to a verbal message. This allows two things: a) defining “text-free” ECA behaviour (i.e., gesturing without saying anything); and b) while speaking, displaying a behaviour that is overlapped with the expression of the verbal message, but is semantically independent of it (or at least not directly related to it). In this sense it “overarches” the text-based message.
Allowing text-free behaviour could be useful to specify ECA behaviour during the user’s turn. This could be a waiting gesture if the user remains silent, for instance. By text-free we mean “lacking a verbal basis from the ECA.” However, in especially advanced systems ECA-text-free behaviour could be dependent on the verbal message from the user, as a reaction to it (thus, user-text-dependent). This allows introducing gestural reactions to what the user is saying, while he or she is still speaking.
The interaction control element in the CIB can be carried through in three different ways. Taking turn-giving cues, for instance, they could be defined a) via a literal interactional tag associated with some verbal indication that the ECA wants to give the turn to the user; b) via a non-literal tag overarching a verbal message that has nothing to do with turn management (the ECA gives visual cues to invite the user to speak, while finishing (overarching) an utterance to give the user several response options); or c) via text-free tags (the ECA performs a turn-giving gesture without saying anything).
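The three variants might be sketched as follows (tag and attribute names are illustrative only):

```xml
<!-- a) literal: the cue is tied to a verbal turn-giving indication -->
<turn-giving>Now it is your turn.</turn-giving>

<!-- b) non-literal: the cue overarches an unrelated utterance -->
<turn-giving overarching="true">
  You can ask for a recipe or for the shopping list.
</turn-giving>

<!-- c) text-free: a turn-giving gesture with no accompanying speech -->
<turn-giving text-free="true"/>
```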
In the following subsection we will see a few more examples to illustrate these ideas. We now very briefly propose a small set of tags belonging to the non-literal category that expands the regular “literal” FML set of tags.
3.1 Non-literal tags
As mentioned above, we introduce non-literal tags to define behaviour that is not directly related to the verbal message, but may be superimposed on it. Such behaviour may be useful in pursuing intentions that are hidden or not openly declared to the interlocutor. We propose three non-literal tags: empathy, knowledge and persuasion.
Empathy: This tag defines the ECA’s attitude toward the user (kind and understanding, or aggressive, for example). Its main attributes are valence and level:
• Valence values: positive, neutral or negative.
• Level: strong, medium, weak.
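A possible rendering of the empathy tag, under the assumption that valence and level appear as attributes, is:

```xml
<!-- Illustrative: a kind, strongly understanding attitude -->
<empathy valence="positive" level="strong">
  Don't worry, we can simply try again.
</empathy>
```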
Knowledge: Deals both with information related to dialogue management that the ECA wants to suggest but not put in words (because this can improve the fluency of the dialogue), and with deceiving or hiding information from the user. We distinguish the following types of non-literal knowledge tags:
• Dialogue stage: defines behavioural cues that may clarify for the user the stage the dialog is in. Examples: If the system is waiting for the user to say or do something, this tag could be used to indicate the generation of a waiting gesture. Behavioural sequences could be defined for dialogue initiation and termination. The ECA could indicate through gestures that it wants to give the turn to the user (as we discussed earlier) or that it wants to take the turn from the user.
• Recognition confidence: defines what sort of visible reactions the ECA should perform to show the user how confident the system is that it is correctly understanding a user’s utterance. This behavioural stance would be superimposed on whatever gestures are performed to express the verbal message the ECA is giving at the time. Another option would be to introduce it in the user’s turn, while he or she is speaking (an instance of user-text-dependent reactive behaviour). Type: high, intermediate, low. (For instance, an indication to perform low confidence cues at the FML level, could be translated at the BML level into a leaning of the head and squinting.)
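An illustrative rendering, assuming the confidence type appears as an attribute on a knowledge tag:

```xml
<!-- Illustrative: low-confidence cues (e.g. a leaning of the head and
     squinting at the BML level) superimposed on the verbal message -->
<recognition-confidence type="low">
  So you would like the spaghetti recipe?
</recognition-confidence>
```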
• Manipulation: to manipulate the information offered to the user. We may distinguish four subtypes of manipulation tags:
o Conceal: to hide information from the user. For instance, hiding the fact that recognition has failed in order to maintain the user’s trust (and then trying to obtain the correct information further along in the dialogue).
o Focus: to draw the user’s attention to, or away from, certain facts the system has to tell the user.
o Deceive: to try to make the user believe something that is false. (The difference with outright lying is that deceit is done through gestural behaviour, not by saying something that isn’t true; the latter could be implemented with ordinary literal tags.)
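The manipulation subtypes might be expressed with a type attribute, for example (names illustrative):

```xml
<!-- Illustrative: conceal the recognition failure while asking again -->
<manipulation type="conceal">
  Now, please tell me your name once more.
</manipulation>

<!-- Illustrative: gesturally steer the user's attention away from an item -->
<manipulation type="focus" direction="away">carrot cake</manipulation>
```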
Persuasion: Marks behaviour to influence the user in order to persuade him or her to do something, or to do it in a certain manner. There can be many types of persuasion marks. Here we propose two:
• Negotiation: defining behaviour to influence the user in a negotiation. For instance, if in a cooking recipe application we are trying to persuade the user not to take a dessert, we might want to perform gestures to put the user off (perhaps involving an expression of disgust constructed in the BML stage) while the ECA says something on the literal level as innocent as “Are you sure you want to eat that?” The corresponding extended FML section could read:

<persuasion type="discourage">
  <performative type="enquiry">
    Are you sure you want to eat <emphasise>that?</emphasise>
  </performative>
</persuasion>
• Speech: influencing the way the user says something. A relatively simple attribute could be “rhythm”, to try to get the user to follow a certain rhythmic pace when speaking.
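Analogously to the negotiation example, a speech-persuasion mark with the suggested rhythm attribute might read (illustrative syntax):

```xml
<!-- Illustrative: nudge the user towards a slow, calm speaking pace -->
<persuasion type="speech" rhythm="slow">
  Please repeat your name, calmly and clearly.
</persuasion>
```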
3.2 Playing with other interface modes
Beyond the literal and non-literal tags we could think of yet another category to take into account other nonverbal information that may affect the interaction. An example of such information would be the presentation of pictures onscreen alongside the ECA. It would be interesting to be able to conveniently define how the ECA reacts to (or uses) these other information sources.
We can think of interaction examples that involve all three kinds of “FML” tags. Take, for instance, the system’s response to the user’s request for a list of recipes containing carrots. A list (perhaps with pictures) might then appear onscreen, and the ECA could start reading it. This would involve gesturing on the literal level (adding expression to the reading of the recipes). Performing deictic gestures to point successively at the different elements of the list the ECA is mentioning would involve tags that link the ECA’s verbal and gestural behaviour to other interaction elements (in this case in visual/textual modes). Finally, non-literal knowledge manipulation tags might be introduced to implement behaviour to draw the user’s attention away from the item “carrot cake” (perhaps by turning the head and looking uninterested) which the system knows is too calorie-laden but which it must mention because it’s on the list requested.
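Combining the three kinds of tags in the recipe-list scenario might look roughly like this (the deictic tag linking the ECA’s utterance to on-screen elements is our illustrative invention):

```xml
<!-- Illustrative: literal reading with deictic links to on-screen list
     items, wrapped in a non-literal focus-away manipulation so the ECA
     plays down "carrot cake" while still mentioning it -->
<manipulation type="focus" direction="away" target="carrot-cake">
  <deictic item="list-item-1">Carrot soup,</deictic>
  <deictic item="list-item-2">carrot salad,</deictic>
  <deictic item="list-item-3">and carrot cake.</deictic>
</manipulation>
```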
It is, of course, debatable whether such extra-conversational mode driven tags are really needed, or indeed whether any non-literal tags are needed for that matter, or if, on the contrary, everything in the CIB could be translated into gestures tightly synchronised with the verbal message, as specified in ordinary FML. We believe, however, that at the very least the new tag categories we propose add a considerable degree of conceptual clarity.
4 DISCUSSION
The ECA communication act generation process described in this paper is part of an effort we are undertaking to design natural, smart and multimodal human-computer interfaces. We have identified a number of correspondences between our scheme and that of the SAIBA framework. We believe the latter provides a clear conceptual demarcation of communication act parts and generation phases, and one that allows considerable implementation flexibility. We are working to adapt our system so that it implements the framework more closely. Conversely, there are certain aspects which we identified when defining our system that we believe could represent a useful supplement to the SAIBA concept, especially as regards the FML specification (as proposed in [16], for instance).
Firstly, in our work on dialogue robustness and fluency it has become clear that interaction-level information, such as that relating to turn management and to how the different available modalities for expression are going to be used (ECA voice and gestures, text and other on-screen elements, etc.), should preferably be determined at an early stage that is close to when the next course of action is decided, and which comes before the ECA’s specific verbal intention is formed. Conceptually, this sort of information would belong in a supra-FML level of communication act formation. We suggest this supra-FML level should also include general verbal and gestural indications corresponding both to explicit and non-declared communication intentions, but which are text-independent (indeed, this is a pre-textual stage). Together, all these information elements constitute what we have called the communication intention base (CIB). When forming the CIB a number of contextual parameters should be taken into account (scenario, user capability, user preferences and emotional state, modalities in use, interaction history, etc.).
Secondly, we believe FML may be usefully enhanced by including categories to account for what we have called the non-declared communication level, which is the part of the communication act devised to achieve an interaction goal that is not explicitly declared to the interlocutor (for instance, a gesture strategy to produce an effect on, or to induce a certain reaction from, the interlocutor). The interaction goal in question may not only not be declared, but actually intended to remain hidden from the user, a possibility that would allow specifying that a gesture sequence should be implemented, for instance, in such a way as to deceive the interlocutor.
5 ACKNOWLEDGEMENTS
This work was carried out with the support of the European Union IST FP6 program through the COMPANION project, IST-34434, and the support of the Spanish Ministry of Science and Technology under project TEC2006-13170-C02-01.
REFERENCES
[1] Boyce, S. J., Spoken natural language dialogue systems: user interface issues for the future. In Human Factors and Voice Interactive Systems, D. Gardner-Bonneau, Ed. Norwell, Massachusetts, Kluwer Academic Publishers: 37-62, (1999).
[2] B. López, Á. Hernández, D. Díaz, R. Fernández, L. Hernández, and D. Torre, Design and validation of ECA gestures to improve dialogue system robustness, Workshop on Embodied Language Processing, in the 45th Annual Meeting of the Association for Computational Linguistics, ACL, pp. 67-74, Prague 2007.
[3] T. Bickmore, J. Cassell, J. Van Kuppevelt, L. Dybkjaer, and N. Bernsen, (eds.), Natural, Intelligent and Effective Interaction with Multimodal Dialogue Systems, chapter Social Dialogue with Embodied Conversational Agents. Kluwer Academic, 2004.
[4] M. White, M. E. Foster, J. Oberlander, and A. Brown, Using facial feedback to enhance turn-taking in a multimodal dialogue system, Proceedings of HCI International 2005, Las Vegas, July 2005.
[5] S. Oviatt and R. VanGent, Error resolution during multimodal human-computer interaction, Proc. International Conference on Spoken Language Processing, 1, 204-207, (1996).
[6] K. Hone, Animated Agents to reduce user frustration, in The 19th British HCI Group Annual Conference, Edinburgh, UK, 2005.
[7] S. Oviatt, M. MacEachern, and G. Levow, Predicting hyperarticulate speech during human-computer error resolution, Speech Communication, vol.24, 2, 1-23, (1998).
[8] Cassell J., Thorisson K.R., The power of a nod and a glance: envelope vs. emotional feedback in animated conversational agents. Applied Artificial Intelligence, vol.13, pp.519-538, (1999).
[9] Cassell, J. and Stone, M., Living Hand to Mouth: Psychological Theories about Speech and Gesture in Interactive Dialogue Systems. Proceedings of the AAAI 1999 Fall Symposium on Psychological Models of Communication in Collaborative Systems, pp. 34-42. November 5-7, North Falmouth, MA, 1999.
[10] Massaro, D. W., Cohen, M. M., Beskow, J., and Cole, R. A., Developing and evaluating conversational agents. In Embodied Conversational Agents MIT Press, Cambridge, MA, 287-318, (2000).
[11] S. Oviatt, Interface techniques for minimizing disfluent input to spoken language systems, in Proc. CHI'94, pp. 205-210, Boston, ACM Press, 1994.
[12] Poggi I., Pelachaud C., De Rosis F., “Eye communication in a conversational 3D synthetic agent”, AI Communications 13, 3 (2000), 169-182.
[13] SAIBA: http://wiki.mindmakers.org/projects:saiba:main/
[14] B. López Mencía, A. Hernández Trapote, D. Díaz Pardo de Vera, D. Torre Toledano, L. Hernández Gómez, and E. López Gonzalo, "A Good Gesture: Exploring nonverbal communication for robust SLDSs," IV Jornadas en Tecnología del Habla, Zaragoza, Spain, 2006.
[15] COMPANION, European Commission Sixth Framework Programme Information Society Technologies Integrated Project IST-34434, http://www.companions-project.org/.
[16] Van Oijen J. “A Framework to support the influence of culture on nonverbal behaviour generation in embodied conversational agents” Master’s Thesis in Computer Science. HMI - University of Twente; ISI - University of Southern California. August 2007.
A Linguistic View on Functional Markup Languages
Dirk Heylen
Human Media Interaction
University of Twente
The Netherlands

Mark ter Maat
Human Media Interaction
University of Twente
The Netherlands
ABSTRACT
We make a start with an inventory of the functionality that a Functional Markup Language needs to cover by looking at the literature on some forms of nonverbal communication and discussing the functions these behaviours serve. Also, an analysis of conversations as they are found in the linguistic and computational linguistic literature provides pointers to the elements that need to be incorporated in FML.
1. INTRODUCTION
The functional markup language (FML) as envisioned by the SAIBA framework¹ forms the interface between two levels in the planning of multimodal communicative behaviours of embodied conversational agents, viz. the Communicative Intent Planning and the Behavior Planning. Descriptions in FML represent communicative and expressive intent without any reference to physical behavior. “It is meant to provide a semantic description that accounts for the aspects that are relevant and influential in the planning of verbal and nonverbal behavior.”
In order to arrive at a list of required features of FML we need to consider the following question: what determines what people do in conversations? This may be a more general question than asking what functions are served by the behaviours that are displayed in a conversation. The notion of function suggests a purpose and an intent. What factors influence which behaviour is carried out (whether it is carried out, and the specific form of the action)? What role do intentions play? Rather than talking about functions one could use the more general term determinant, as the factors that determine an action are not just conscious intentions. Many elements of communicative actions are not consciously intended but result from some automatic mechanism which may have arisen from habit (personal) or convention (social/cultural).
If we look at functional aspects of behaviours, or determinants more generally, we need to look at at least two things. One is the kind of functions that behaviours serve, and the second the way in which the behaviours serve a function. We first illustrate this idea and make a start with an inventory of functions by looking at the functions that have been ascribed to two kinds of nonverbal behaviours, various kinds of head movements and gaze, which we take from our earlier discussion on this topic [4]. Secondly, we look at some theoretical perspectives on the association of form and function of communicative behaviours. Finally, we look at the architecture of current conversational systems to get another view on the same subject.

¹ http://wiki.mindmakers.org/projects:saiba:main/
2. THE FUNCTIONS OF NONVERBAL BEHAVIOURS
In this section we look at two related behaviours, head movements and gaze, and analyse the functions and determinants that have been assigned to particular patterns in conversation by different researchers. Amongst others, head movements may be used to:
(1) signal yes or no, (2) signal interest or impatience, (3) enhance communicative attention, (4) anticipate an attempt to capture the floor, (5) signal the intention to continue, (6) express inclusivity and intensification, (7) control and organize the interaction, (8) mark the listing or presenting of alternatives, (9) mark the contrast with the immediately preceding utterances. Furthermore, synchrony of movements may (10) communicate the degree of understanding, agreement, or support that a listener is experiencing. Greater activity by the interviewer (e.g., head nodding) (11) indicates that the interviewer is more interested in, or more empathic toward, the interviewee, or that he otherwise values the interviewee more. Head movements serve as (12) accompaniments of the rhythmic aspects of speech. Typical head movement patterns can be observed marking (13) uncertain statements and (14) lexical repairs. Postural head shifts (15) mark switches between direct and indirect discourse.
From this list of communicative processes that head movements are involved in one can extract a first list of the kinds of ways in which head movements operate in communicative settings. They signal, enhance, anticipate, express, control, organize, mark, communicate, indicate, and accompany. These terms are often used in different senses. Some may be used as synonyms in certain contexts and as antonyms in others.
1. Signal, express, communicate are verbs that are mostly used to express the fact that behaviours can carry meaning in various ways. The precise meaning of the terms may depend on who is using them, as there are various technical definitions of these terms. For instance, turning to the nominalised equivalents of the terms, a signal may be used as a synonym of a sign (a form/meaning unit) by some people, or only as the physical realisation (the form).
2. Mark, reflect, indicate are verbs that are similar to the previous ones in many respects, but whereas the previous ones can be paraphrased, in some of their uses, as “means”, this is not appropriate for these kinds.
3. Accompany is a still looser notion than mark, reflect and indicate. An indication is a phenomenon that accompanies another in such a way that the indication can be taken as a clue that the other phenomenon occurs, which need not be the case for an accompaniment.
The distinction in verbs expresses a distinction in the way the behaviour signifies something or relates to some other phenomenon. It is a difference in semiotics. Notions that figure in this are (1) a notion of intentionality and (2) the nature of the relation between the signifier and what is signified. In the case of “communication”, intentionality on the part of the producer is typically involved: a) the intention to produce the behaviour, and b) the intention for it to mean something for someone else. In these cases the sign is a behaviour whose meaning is shared between the partners that communicate with each other, by convention, for instance. Another typical relation, which holds for many so-called “cues”, is one of causality, comprising the Gricean natural signs.
For the specification of FML it will be useful to delimit more precisely what one wants the language to cover. If one wants FML to account for all the determinants of behaviours, this will also have to include non-intentional factors. On the other hand, if one fixes FML as covering the “intended” determinants, one will need to find another way to integrate the non-intentional factors in a complete model.
Besides the question of how behaviours provide information (intended or unintended), one also needs to look at the kinds of information the behaviours provide. For the head movement functions above this amounts to the following.
1. Yes/No: these are equivalent to propositions or full utterances (speech acts) or sentences in the linguistic sense (depending on whether one considers the semantic, pragmatic or syntactic equivalent).
2. Interest, impatience, attention, understanding, empathy, uncertainty: can all be qualified as mental states.
3. Floor-grabs deal with the way the interaction proceeds; the role in which one participates in the conversation.
4. Listing, alternatives, contrast, switches between direct and indirect discourse are notions that can be grouped under the heading of discourse structure: the way in which information parts connect.
5. Rhythmic accompaniments of speech are of a different order. It is not clear whether these accompaniments are meant to mark the rhythm, and thus the information structure of what is being said, or whether they facilitate speaking, as some have claimed.
6. Movements during repairs are a reflection of the cognitive processes involved in language generation.
The main categories along which the content of the head movement expressions can be categorised are the following.
1. Equivalents of linguistic expressions.
2. Expressions of mental states.
3. Regulators of interaction/participation.
4. Rhythmic accompaniments.
To further complement this list we look at gaze behaviours. Gaze behaviour has been observed to play a role in (16) indicating addresseehood, (17) effecting turn transitions, and (18) the display of attentiveness. A typical gaze pattern occurs (19) when doing a word search. Gaze (20) may reflect the social status. Looking away (21) is used to avoid distraction, to concentrate, or (22) to indicate one doesn’t want to be interrupted. One looks at the other in order (23) to get cues about the mood and disposition of the other, or (24) to establish or maintain social contact. Gazing away (25) may reflect hesitation, embarrassment or shyness. Furthermore, gaze is used (26) to locate referents in abstract space, and (27) to request listeners to provide backchannels.
The list of verbs indicating the “function” of gaze is:
1. effecting
2. displaying
3. occurs when
4. accompany
5. to get cues
6. establish or maintain (contact)
7. locate referents
8. request actions
This list of verbs adds interesting new perspectives to the list of functions. Several verbs refer to a function of the behaviour that involves the interlocutor in some way or another. Gaze can both be an expressive signal and function as a request for action by the interlocutor. Verbs such as to get cues, establish contact and request actions show the intention behind actions that are directed towards engaging the action of others. These can be called eliciting actions.
The following lists the domains in which gaze behaviours are involved:
1. Participation (interaction): addressing, floor (conversation/interaction management)

2. Cognitive expression (mental states): attentiveness, word search, distraction avoidance, concentration, hesitation

3. Social (interpersonal) relations: expression of social status

4. Elicitation/Monitoring: to evoke and get cues about mood and disposition

5. Contact regulation (part of conversation/interaction management)
To look at specific behaviours in conversation and list what is involved in their production is one way to come to an understanding of what might need to be specified in a functional markup language. Another way is by reviewing the theoretical literature on conversation. We make a start in the next section.
3. CONVERSATIONAL FUNCTIONS
The linguistic, sociological and psychological literature on conversation – or interaction more generally – is very diverse. There is an abundance of theoretical positions within the various fields, which results in many different analyses of the same phenomena. For the specification of a functional markup language it will be important to have clear definitions of terms, possibly referring to the linguistic tradition that provides the correct context for a term.
When one tries to determine what motivates a communicative act one should take into account that in most cases conversation is not an end in itself but part of other joint activities that people engage in. It is a means to an end: buying coffee, proving your argument, comforting someone, etc., and it is often intrinsically connected to non-linguistic activities that accompany the speech act: handing over a cup of coffee and saying “here you are”. Saying “Thank you” in reply is a ritual act that acknowledges the previous act, and in particular the fact that the other has done something for your benefit which you are grateful for. Actions in conversation can be said to accomplish three different, interrelated goals (often at the same time): taking care that business is executed (task), that conversation runs smoothly (system), and that the proper interpersonal relationship is established or maintained (social/ritual).
• Task dimension: actions that accomplish the business at hand
• System dimension: conversation acts performed to make conversations work properly as conversations
• Ritual dimension: actions handling social commitments and obligations
Conversation is an activity that the participants engage in together. Actions by one participant are intrinsically dependent on actions by the other. A linguistic utterance (speech act) is intended to be heard, understood and acted upon by another. An utterance by one often leads to another by the other participant.
Since Austin [1], it is common to assume that a communicative action can be viewed from different levels (locution, illocution, perlocution). In Clark’s version [2], a speaker acts on four levels. (1) A speaker executes a behaviour for the addressee to attend to. This could be uttering a sentence but also holding up your empty glass in a bar (to signal to the waiter you want a refill). (2) The behaviour is presented as a signal that the addressee should identify as such. It should be clear to the waiter that you are holding up the glass to signal to him and not just for some other reason. (3) The speaker signals something which the addressee should recognize. (4) The speaker proposes a project for the addressee to consider (believe what is being said, accept the offer, execute the command, for instance). In this formulation of levels, every action by the speaker is matched by an action that the addressee is supposed to execute: attend to the behaviour, identify it as a signal, interpret it correctly and consider the request that is made.
So the producer of a communicative act acts on different levels. For each of these different levels the producer expects a matching act of the intended recipient of the act. Besides producing the act, the producer will therefore also monitor the recipient, who will indicate (through orientation, gaze, facial expressions, back-channels, and other actions) that he has or has not been able to hear or see the action, to understand what was meant by it, and whether he will follow it up as intended or not. In short, there are processes of production and monitoring going on in parallel that are complemented by processes of reception and feedback. All of these actions that go on in parallel are associated with different but related intentions.
In the next section we zoom in on some of the system constraints to refine the inventory of functions.
4. SYSTEM CONSTRAINTS
Goffman ([3]) lists several kinds of normative principles that are helpful to ensure effective transmission in conversations, such as the principle that constrains interruptions, or against simultaneous talk, against withholding answers, norms that oblige the use of “back-channel” cues, that encourage the use of “hold” and “all-clear” cues if the hearer is not able to attend temporarily, and norms to show whether or not the message has been heard and understood immediately following the utterance. The latter are back-channel cues which consist of nods, facial gestures and nonverbal vocalizations from hearers during the talk of a speaker that inform him “among other things, that he was succeeding or failing to get across [...] while attempting to get across.” (page 12). Requirements such as these Goffman groups under the heading of “system requirements” or “system constraints”. He provides the following preliminary list of requirements.
1. There has to be a two-way capability for transceiving readily interpretable messages. Interlocutors may engage in certain actions to establish this capability.
2. Back-channel feedback capabilities “for informing on reception while it is occurring”.
3. Contact signals involve actions such as signalling the search for an open channel, the availability of a channel, the closing of a channel, etcetera.
4. Turnover signals regulate turn-taking.
5. Preemption signals are “means of inducing a rerun, holding off channel requests, interrupting a talker in progress”.
6. Framing capabilities indicate that a particular utterance is ironic, meant jokingly, “quoted”, etcetera.
7. Constraints regarding nonparticipants, who should not eavesdrop or make competing noise.
The latter constraints follow from social norms. Besides these, Goffman also mentions social norms that oblige respondents to reply honestly, in the manner of Grice’s conversational maxims. Besides these system constraints that tell how individuals ought to handle themselves to ensure smooth interaction, an additional set of constraints can be identified “regarding how each individual ought to handle himself with respect to each of the others, so that he not discredit his own tacit claim to good character or the tacit claim of the others that they are persons of social worth whose various forms of territoriality are to be respected.” These, Goffman calls “ritual constraints”.
Several of the system constraints refer to elements of interaction that were encountered above already; others are new. For instance, the notions of pre-emption signals, framing and norms were not mentioned above as such, though they are related to some phenomena discussed above. For instance, quoting (the head movements that occur when a person quotes someone else) can be taken as a kind of framing.
A third way to approach the subject of FML is by looking at the computational literature on dialogue systems or conversational agents. This has often borrowed freely from various theoretical models but has given them its own twist. So, what aspects of conversation do current versions of dialogue systems take into account?
5. MODELING DIALOGUES
Conversational agents as instantiations of dialogue systems may deal with more or fewer of the aspects mentioned in the previous sections, depending on their complexity. We give a very short introduction to dialogue systems based in part on [6]. A conversation manager (dialogue system) is concerned with the following tasks.
• Interpret contributions to the dialogue as they are observed.
• Update the dialogue context.
• Select what to do: when to speak and what to say, when to stop speaking, when to give feedback and how.
In the Trindi conception of conversation management ([5]), the parameters that are relevant for processing are kept in a so-called “information state”. Next, there are all kinds of modules that update the information state (effects of dialogue moves or dialogue inferencing) and modules that select actions given a particular configuration of the information state. These actions may involve updates to the information state as well.
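Such an update cycle can be sketched in code. The following is a minimal, illustrative sketch; the state fields, move types, and rule bodies are hypothetical stand-ins and not taken from any Trindi implementation:

```python
# Minimal sketch of an information-state update cycle: an information
# state, update rules applied per observed dialogue move, and action
# selection. All field names, move types, and rules are hypothetical.

class InformationState:
    def __init__(self):
        self.common_ground = []   # propositions taken to be grounded
        self.agenda = []          # actions the system still intends to perform
        self.turn = "user"        # crude indicator of whose turn it is

def update(state, move):
    """Update rule: apply the effects of an observed dialogue move."""
    if move["type"] == "inform":
        state.common_ground.append(move["content"])
    elif move["type"] == "ask":
        state.agenda.append(("answer", move["content"]))
    state.turn = "system"
    return state

def select(state):
    """Selection rule: choose the next action from the current state."""
    if state.agenda:
        return state.agenda.pop(0)
    return ("listen", None)

s = update(InformationState(), {"type": "ask", "content": "departure-city?"})
action = select(s)   # the pending obligation to answer
```

Note that, as in the text, selection itself alters the information state (here, the agenda shrinks).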
The various parameters of the information state can be classified in a number of ways. Traum partitions the information state and dialogue moves into a set of layers “each dealing with a coherent aspect of dialogue that is somewhat distinct from other aspects”. The following are some of the layers that are distinguished.
• Contact (whether and how individuals are accessible for communication)
• Attention (who is attending to what)
• Conversation (which comprises the following points)
1. Participants (who is there, in what role)
2. Turn (who has the turn, who is claiming it...)
3. Grounding (is the information added to the common ground; an important function of feedback)
4. Topic
5. Rhetorical (the meaning relations that hold between different utterances or between clauses in one utterance)
• Social commitments (obligations)
• Task
One can immediately see that the terms used here overlap to a great extent with the terms used in the previous section. Grounding is a new term, but actions such as back-channels mentioned in the previous sections typically serve the function to acknowledge that a message has been received and understood. Topic management is also a new feature, related to the structure of information on a discourse level. The rhetorical structure refers to an element of discourse structure as well.
6. SUMMARY
The specification of a functional markup language can build upon the existing analysis of behaviours, theoretical notions introduced in several disciplines from the humanities, and the architecture of current conversational systems. A thorough analysis is required, leading to a common understanding of the central notions.
This kind of analysis will have repercussions on the definition of the SAIBA framework. The notion of function needs to be defined more clearly, as there is quite some variation in the way in which a behaviour can function in conversations.
As the above suggests, a participant in a conversation may produce a communicative behaviour or a particular part of the behaviour with the intention:
• to establish that there is a channel open for communication
• to have the interlocutor pay attention to the communicative behaviour
• to have the interlocutor understand that one is performing an informative communicative behaviour
• to inform the interlocutor of something, i.e. to mean something, to send a message (more or less by definition of communicative behaviour, it seems)
• to make clear in what way the message should be interpreted (framing)
• to engage the interlocutor in a project (speech act, perlocutionary intent, task)
• to change the participant status of self or other (turn-taking/floor)
• to have the message directed to someone in particular (addressing)
• to show the structure of the message (for reasons of clarity, emphasis - information structure and discourse structure)
• to show reception and understanding of, and attitude towards, what the other interlocutor is trying to communicate
• to establish a particular interpersonal relation with the interlocutor
• to express social status
• to convey a particular impression (impression management)
• to display a particular mental state
• to hide a subjective state such as an emotion
Etcetera.
Acknowledgement
The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 211486 (SEMAINE).
7. REFERENCES
[1] Austin, J.L.: How to Do Things with Words. Oxford University Press, London (1962)
[2] Clark, H.H.: Using Language. Cambridge University Press, Cambridge (1996)
[3] Goffman, E.: Replies and responses. Language in Society 5(3), 257–313 (1976)
[4] Heylen, D.: Head gestures, gaze and the principles of conversational structure. International Journal of Humanoid Robotics 3(3), 241–267 (2006)
[5] Traum, D., Larsson, S.: The information state approach to dialogue management. In: J. van Kuppevelt, R. Smith (eds.) Current and New Directions in Discourse and Dialogue, pp. 325–335. Kluwer (2003)
[6] Traum, D., Swartout, W., Gratch, J., Marsella, S.: A virtual human dialogue model for non-team interaction (to appear)
Functions of Speaking and Acting -
An Interaction Model for Collaborative Construction Tasks
Stefan Kopp & Nadine Pfeiffer-Leßmann
Artificial Intelligence Group
Faculty of Technology, Bielefeld University
{skopp, nlessman}@techfak.uni-bielefeld.de
Abstract. This paper describes how a virtual agent assists a human interlocutor in collaborative
construction tasks by combining manipulative capabilities for assembly actions with conversational
capabilities for mixed-initiative dialogue. We present an interaction model representing the evolving
information states of the participants in this interaction. It includes multiple dimensions along which
an interaction move in general can be functional, independent of the concrete communicative or
manipulative behaviors comprised. These functions are used in our model to interpret and plan the
contributions that both interactants make.
1 Introduction
Virtual humanoid agents offer an exciting potential for interactive Virtual Reality (VR). Cohabiting a virtual
environment with their human interlocutors, virtual agents may ultimately appear as equal partners that share the
very situation with their partner and can collaborate on or assist in any task to be carried out. To investigate such a
scenario, the embodied agent Max (Kopp, Jung, Leßmann, Wachsmuth, 2003) is visualized at human size in a
CAVE-like VR environment where he joins a human in assembling complex aggregates out of virtual Baufix
parts, a toy construction kit (see Fig. 1). The two interactants meet face-to-face over a table with a number of parts
on it. The human interlocutor—who is equipped with stereo glasses, data gloves, optical position trackers, and a
microphone—can issue natural language commands along with coverbal gestures or can directly grasp and
manipulate the 3D Baufix models to carry out assembly actions. Further, the human can address Max in natural
language and gesture. The agent is, likewise, able to initiate assembly actions or to engage in multimodal dialogue
using prosodic speech, gesture, eye gaze, and emotional facial expressions.
In this setting, the two interactants can become collaboration partners in a situated interaction task as follows; see
(Leßmann, Kopp & Wachsmuth 2006) for a detailed analysis. The partner wants to (or is to) construct a certain
Baufix aggregate (e.g. a propeller) and is free to directly carry out assembly steps, either if they are known to her
or if she wants to give certain assembly procedures a try. At any time, she can ask Max, who shares the situation
with her and attends her actions, for assistance. Max‘s task is then to collaborate with the human user in jointly
walking step-by-step through the construction procedure and to provide support whenever his partner does not
know how to go on constructing the aggregate.
Figure 1: In his CAVE-like Virtual Reality environment, Max guides the human partner through interactive construction procedures.
While the overall interaction is guided by the human‘s wish to build a certain assembly, once the human and
Max have engaged in the collaborative construction activity the scenario is symmetric in that roles may flexibly
switch between the interaction partners according to their competences. That is, either the human or Max may
carry out an action or may instruct the other to perform an action. This demands that the agent be able both to
collaborate by taking actions in the world and to converse about them, possibly in an intertwined manner. The
scenario is hence characterized by a high degree of interactivity, with frequent switches of initiative and roles of
the interactants. The participants hence need to reconcile their contributions to the interaction with the need for
regulating it by performing multiple behaviors simultaneously, asynchronously, and in multiple modalities. That
is, their multimodal contributions are multi-functional, either communicative or manipulative, and the effects of
the latter can only be taken in from the situational context. We subsume each of those contributions that Max or
the human can perform under the term interaction move (as an extension of the common term dialogue move).
Enabling Max to engage in such an interaction requires embedding him tightly in the situational context and,
based on the perception of the human‘s interaction moves, to reason about what the proper next interaction move
may be right now. The result of this deliberation process is passed on to modules that generate the required
behaviors (Leßmann, Kopp & Wachsmuth 2006). This general layout resembles the SAIBA pipeline architecture
(Kopp et al. 2006), with dedicated representations to interface (1) between the planning of an interaction move
and its behavioral realization, and (2) between behavior planning and realization. In the remainder of this paper,
we describe the first representation as part of an information state-based interaction model that attempts to
explicate all the functional aspects of an interaction move which must be taken into account in order to keep track
of the interaction and adequately specify an interaction move independent of its behavioral realization.
2 Interaction Model
Adopting the information state approach to dialogue (Traum & Larsson, 2003), we frame an interaction model
that defines the aspects along which the collaborative interaction evolves from the agent’s point of view. This
model stipulates what facets of interaction Max has to take into account, without making any provisions as to how
these aspects can be fully represented or reasoned about.
2.1 Information state-based interaction model
The information state approach to dialogue modeling provides a framework for formalizing both state-based/
frame-based and agent-based dialogue models in a unified manner (Traum & Larsson, 2003). It assumes
that each dialogue participant maintains information states (IS) that are employed in deciding on next actions and
are updated in effect of dialogue acts performed by either interactant. A particular dialogue manager, then, consists
in a formal representation of the contents of the ISs plus update processes that map from IS to IS given certain
dialogue moves. Several systems have been based on this framework, concentrating on different aspects of
dialogue, e.g., the GODIS system (Larsson et al. 2000) or the dialogue system of the WITAS project (Lemon et al. 2001). Traum &
Rickel (2002) have proposed a model of multimodal dialogues in immersive virtual worlds that comprises layers
for contact, attention, conversation, obligations, and negotiation. The conversation layer defines separate dialogue
episodes in terms of participants, turn, initiative, grounding, topic, and rhetorical connections to other episodes.
Rules state how communicative signals can be recognized and selected to cause well-defined changes to a layer.
However, most existing models have focused either on dialogues where the agents are only planning—with
the plan to be executed at a later time—or on dialogues where the agents are only executing some previously
created plan. As Blaylock et al. (2003) point out, this does not allow for modeling dialogues where mixed-
initiative collaboration and interleaved acting and planning take place, as in our setting. In addition, we posit that
not only spoken communicative acts but also manipulative actions must be characterized as interaction moves. We
therefore introduce a model that includes a layer accounting for the formation, maintenance, overlap, and rejection
of the goals of interactants. Goals cover the rationale behind any kind of intentional action and abstract away from their situational realization. The interaction model consists of the following layers:
• Initiative: Who has brought up the goal that the interactants are pursuing in the current discourse segment.
• Turn: Who has the turn. We distinguish between four states: my-turn, others-turn (or a unique name for a specific interaction partner, respectively), gap, overlap.
• Goals: The goals that have been pursued so far as well as the, possibly nested, goals that are still on the agent’s agenda for the remainder of the interaction. Each goal may either have arisen from the agent’s own desires, or was induced due to obligations following social norms or a power relation Max is committed to (Traum & Allen 1994).
• Content: The propositional content that has been or will be subject of the discourse, defined in a logic-based notation.
• Grounding: The discourse state of content facts, denoting whether the agent assumes a fact to be new to the conversant or part of the common ground (Clark & Brennan, 1991).
• Discourse structure: The organization of the discourse in segments that can relate to goals and refer to or group the entries in the content and goal layers. Each discourse segment has a purpose (DSP; Grosz & Sidner 1986) and they are related based on the relationships among their DSPs, all of which are part of one intentional structure.
• Partner model: What is known about the dialogue partner(s), also covering aspects of negotiation and obligations. It is the basis of retrospective analysis and thus plays a central role for the agent being situated in an evolving interaction.
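As an illustration, the layers of such an interaction model can be collected into a single record. This is only a sketch; the field types and default values are our assumptions, not part of the cited model:

```python
from dataclasses import dataclass, field

# Sketch of a layered interaction state; layer names follow the text,
# value encodings are illustrative assumptions.

@dataclass
class InteractionState:
    initiative: str = "user"                  # who brought up the current goal
    turn: str = "gap"                         # my-turn | others-turn | gap | overlap
    goals: list = field(default_factory=list)       # pending, possibly nested goals
    content: list = field(default_factory=list)     # logic-based propositions
    grounding: dict = field(default_factory=dict)   # fact -> "new" | "common-ground"
    discourse_segments: list = field(default_factory=list)  # segments with their DSPs
    partner_model: dict = field(default_factory=dict)       # beliefs about the partner

state = InteractionState()
state.turn = "my-turn"
state.goals.append(("achieve", "(Exists prop)"))
```

Using `default_factory` keeps each state's mutable layers independent, so several such records (e.g. one per participant) do not share lists.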
2.2 Interaction moves
Any action by an interactant may alter the agent’s internal representation of the state of interaction. We focus here
on intentional actions, either communicative or manipulative, which account for most of the progress of the
interaction. They are selected and performed in order to achieve a desired state of the world. In terms of mental
attitudes, we think of them as being produced following specific kinds of nested intentions, which in turn
result from some goal-directed planning process. For example, when touching a bolt or saying “move the bolt
closer to the bar”, one might have the intention not just of manipulating the object but of mating it with a bar,
building a propeller, and assisting the partner.
Modeling such behaviors requires having an account of their functional aspects, in addition to their mere overt
actions. For manipulative acts, these aspects can easily be defined as the direct consequences (or post-conditions)
of the manipulation in the world, which in turn need to be related to the current intention structure. For
communicative acts, the functional aspects are not so easy to pinpoint, despite a long tradition of viewing
speaking as acting. Austin (1962) pointed out multiple acts (or “forces”) of making a single utterance: the
locutionary act of uttering words, and the perlocutionary act of having some effects on the listeners, possibly even
effecting some change in the world. To its perlocutionary end, an utterance always performs a certain kind of
action, the illocutionary act. Adopting this notion, Searle (1969) coined the term speech act which is supposed to
include an attitude and a propositional content. Other researchers have used terms like communicative acts
(Allwood, 1976; Poggi & Pelachaud, 2000), conversational moves (Carletta et al., 1997), or dialogue moves
(Cooper et al., 1999) for related ideas.
The notion of functional aspects of communicative actions is particularly beneficial to multimodal systems,
for it allows abstracting away from a signal’s overt form to core aspects that only get realized in certain ways
(Cassell et al. 2000; Traum & Rickel 2002). In Cassell et al.‘s model, the function a behavior fulfils is either
propositional (meaning-bearing) or interactional (regulative), and several behaviors are frequently employed at
once in order to pursue both facets of discourse in parallel. Poggi & Pelachaud (2000) define a communicative act,
the minimal unit of communication, as a pair of a signal and a meaning. The meaning includes the propositional
content conveyed, along with a performative that represents the action the act performs (e.g. request, inform, etc.).
Based on the interaction model laid out above, we define interaction moves as the basic units of interaction in terms of the following slots:
• Action: The illocutionary act the move performs. The act can either be purely manipulative (connect, disconnect, take, or rotate) or communicative. In the latter case, a performative encodes the kind of action as described below.
• Goal: The perlocutionary force of the move, i.e., what the move is meant to accomplish. This can be either a desired world state (achieve something), or it can be the mere performance of an action (perform something).
• Content: Information conveyed by the move, needed to further specify the action and the goal. This can accommodate either propositional specifications (e.g. for language or symbolic gestures) or an analog, quantitative representation of imagistic content (e.g. for iconic gestures).
• Surface form: The entirety of the move’s overt verbal and nonverbal behaviors, employed to convey all of the aspects represented here.
• Turn-taking: The function of the move with respect to turn-taking, either take-turn, want-turn, yield-turn, give-turn, or keep-turn.
• Discourse function: The function of the move with respect to the segmental discourse structure, either start-segment, contribute, or close-segment (cf. Lochbaum, Grosz, Sidner, 1999).
• Agent: The agent that performs the move.
• Addressee: The addressee(s) of the move.
These slots together capture the informational aspects that are relevant about an action in our model of interaction. They are not independent of each other, nor are they self-sufficient. Instead, the slots are supposed to mark the specific signification of particular informational aspects. In general, they provide a frame structure that can be incrementally filled when generating an action through subsequent content planning and behavior planning. Some of the slots may thereby remain empty, e.g., for smallish moves like raising a hand which may solely serve a turn-taking function.
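A frame structure of this kind, with optionally filled slots, can be sketched as follows; the slot names follow the model described in the text, while the value encodings are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of the interaction-move frame; every slot is optional so a move
# can stay partially filled during incremental generation.

@dataclass
class InteractionMove:
    action: Optional[str] = None              # e.g. "connect" or a performative
    goal: Optional[str] = None                # achieve/perform specification
    content: Optional[str] = None             # propositional or imagistic content
    surface_form: Optional[str] = None        # overt verbal/nonverbal behaviors
    turn_taking: Optional[str] = None         # take-turn, want-turn, yield-turn, ...
    discourse_function: Optional[str] = None  # start-segment, contribute, close-segment
    agent: Optional[str] = None
    addressee: Optional[str] = None

# A "smallish" move: raising a hand solely to claim the turn.
hand_raise = InteractionMove(surface_form="<raise-hand>",
                             turn_taking="want-turn",
                             agent="User", addressee="Max")
```

The unfilled slots (action, goal, content) model exactly the case described above, where a behavior carries only a turn-taking function.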
2.3 Semantics – Expectations – Obligations
The meaning of an interaction move is defined by what it realizes, i.e. by the effects it brings about in terms of the
aspects captured by the interaction model. For a manipulative move, the effects are directly implied by the action
and can be extracted via perception of the environment. Given powerful enough perceptive capabilities, the
current scene thus serves as an external part of the representation of the interaction state for the agent.
The effects of communicative moves often cannot be defined in a clear-cut way, as they depend on multiple
aspects of the interaction move and the context in which it is performed. For example, even a simple move that
fulfils the turn-taking function want-turn will result in a new state my-turn only when executed in the state gap (no
one has the turn). For this reason, information state-based systems typically employ a large number of update rules
to model the context-sensitive effects of particular moves. Furthermore, the effects of a communicative move
depend on social attitudes like the expectations a sender connects to a move or the obligations it imposes on the
recipient. Traum (1996) argues that obligations guide the agent’s behavior, without the need for recognizing a goal
behind an incoming move, and enable the use of shared plans at the discourse level. As Max is supposed to be
cooperative, obligations are therefore modeled to directly lead to the instantiation of a perform-goal in response to
an interaction move. If this move was a query or request, Max will thus be conducting the action asked for in a
reactive manner. In case of a proposal, he is only obliged to address the request, and his further deliberation
decides upon how to react. We hence explicitly encode the Action of each move by distinguishing between four
types of performative (e.g. cf. Poggi & Pelachaud 2000):
1. inform-performatives: provide informational facts, characterized by the desire to change the addressee’s beliefs
2. query-performatives: procure informational content to establish a new belief or to verify an existing one
3. request-performatives: request a manipulative action
4. propose-performatives: propose propositional content or an action
Derived from one of these general types, a performative can often be narrowed down through subsequent
specification during interpretation or generation (Poggi & Pelachaud 2000). The final performative will be tailored
to the content on which it operates (e.g., whether it asserts or requests a propositional fact, an action, or an
opinion) as well as to contextual factors like the actual situation, the addressee, the type of encounter, or the
degree of certainty. This can even happen retrospectively, when the performative of a previous move is fully
disclosed at a later point in the conversation. We represent these refinements using a compositional notation, e.g.
inform.agree or propose.action.
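The compositional notation can be mimicked with a small helper; the function names below are hypothetical and only illustrate how a performative derived from one of the four general types is narrowed step by step:

```python
# Sketch of the compositional performative notation: a performative starts
# as one of the four general types and is narrowed by appending dot-
# separated refinements, e.g. propose -> propose.action. Helper names
# are hypothetical.

GENERAL_TYPES = {"inform", "query", "request", "propose"}

def refine(performative, *refinements):
    """Narrow a performative by appending refinement steps."""
    base = performative.split(".")[0]
    assert base in GENERAL_TYPES, "must derive from a general type"
    return ".".join([performative, *refinements])

def general_type(performative):
    """Recover the general type a refined performative derives from."""
    return performative.split(".")[0]

p = refine("propose", "action")
q = refine("inform", "agree")
```

Refinement can be applied again later, which matches the retrospective disclosure described above: a stored performative string is simply extended once more context is available.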
See (Leßmann, Kopp & Wachsmuth 2006) for a description of how the different performatives are endowed
with semantics, obligations and expectations, and how they are used along with the discourse function of a move
to state rules for identifying the holder of the initiative. To illustrate our model here, we analyse in Table 1 some
interaction moves of an example dialogue. To represent propositional facts in the goal and content slots, we use a
formal, logic-like notation in which predicates/relations are indicated by a capital letter, and unbound variables are
prefixed with a $. For example, “(Inform.ref $s)” represents that a reference will be communicated, with the
unbound variable $s being the referent. An unbound variable indicates underspecification inherent to the content
of the move (in this example, there is no particular entity being referred to, yielding an indefinite article). Note
further that the content and the goal slot together specify the informational aspects of the move, with some of
these aspects marked as being the goal.
3 Modeling situated interaction management
In this section we give a brief overview of how the functional aspects explicated in the move representation
inform interaction management. Interaction moves are the main data structure for interfacing in Max‘s
architecture between modules for perception, deliberation, and behavior planning. On the input side, interaction
moves are used to specify and structure the incoming information, possibly relating the information to external
objects or previous interaction moves; on the output side, they serve as a container which gets filled during the
generation process of an agent’s response. How the agent behaves in the interaction is determined by a Belief-
Desire-Intention control architecture (Bratman, 1987; Rao & Georgeff, 1991), for which we extend JAM/UM-PRS
(Huber 1999, Lee et al. 1994). It draws upon specific plans for analysing input moves as well as generating
responses in a context-dependent manner.
Plans can either directly trigger specific behaviors, or invoke dynamic, self-contained planners that construct context-dependent actions or, again, plans. A judicious use of plans allows the agent to reduce the complexity of controlling dynamic behavior and to constrain itself to work towards goals. Plans are therefore kept as general as possible, using a constraint-based representation, and are not refined to the situational context until necessary. That is, we too regard plans as mental attitudes, as opposed to recipes that just consist of the
knowledge about which actions might help achieve a goal (cf. Pollack 1992, Pollack 1990). When Max selects a
plan and binds or constrains its arguments, it becomes an intention on an intention stack, encompassing
information about the context and the goals responsible for it to come into existence. As a result, the plan is characterized by the agent’s attitudes towards the realization of his goal.
Moves 1–3 (cells in each row are separated by ‖; the speaker of each move is given in the Agent row):
Interaction move:   “Let us build a propeller.” ‖ “Ok.” ‖ “First, insert a bolt in the middle of a bar.”
Action:             propose.action ‖ inform.agree ‖ request.order
Goal:               (Achieve (Exists prop)) ‖ (Perform (Inform.agree)) ‖ (Achieve (Connected $s $b $p1 $p2))
Content:            (Build prop we) ‖ (Build prop we) ‖ (Connect $s $b $p1 $p2) (Inst $s bolt) (Inst $b bar) (Center_hole $b $p2)
Surface form:       <words> ‖ <words> ‖ <words>
Turn-taking:        take | give ‖ take | keep | give
Discourse function: start-segment (DSP=prop) ‖ contribute ‖ start-segment (DSP=prop-s1)
Agent:              User ‖ Max ‖ Max
Addressee:          Max ‖ User ‖ User

Moves 4–7:
Interaction move:   “Which bolt?” ‖ “Any bolt.” ‖ User puts bolt into the first hole of bar. ‖ “No, that was the wrong hole.”
Action:             query.ref ‖ inform.ref ‖ connect ‖ inform.disagree
Goal:               (Perform (Query.ref $s)) ‖ (Perform (Inform.ref $s)) ‖ (Achieve (Connected bolt-2 bar-1 port-1 port-3)) ‖ (Perform (Inform.disagree (Connect …)))
Content:            (Inst $s bolt) ‖ (Inst $s bolt) ‖ (Connect bolt-2 bar-1 port-1 port-3) ‖ (Not (Center-hole bar-1 port-3))
Surface form:       <words> ‖ <words> ‖ <manipulation> ‖ <words>
Turn-taking:        take | give ‖ take | yield ‖ take | yield ‖ take | yield
Discourse function: contribute ‖ contribute ‖ contribute ‖ contribute
Agent:              User ‖ Max ‖ User ‖ Max
Addressee:          Max ‖ User ‖ User

Table 1: An example dialogue analyzed into formal interaction moves.
3.1 Dealing with incoming interaction moves
A variety of plans are used for handling incoming interaction moves. Turn-taking functions are processed taking
into account the mental state of the agent, the goal he pursues, and the dominance relationship between the
interlocutors. A turn-taking model (Leßmann et al. 2004) is used that consists of two steps: First, a rule-based,
context-free evaluation of the possible turn-taking reactions takes into account the current conversational state and
the action of the partner‘s utterance. These rules are incessantly applied and integrated using data-driven conclude
plans to ensure cooperative dialogue behavior. The second step is the context-dependent decision upon different
response plans, possibly leading to new intentions.
The propositional content of an incoming interaction move, if present, is processed by plans for determining
the context it belongs to (e.g. to resolve anaphora). To this end, it is checked whether the move relates to one of
the current goals, or to an interaction move performed before. This is done by calculating a correlation value
between the content facts carried by the interaction move and the goal context. In addition, the agent needs to
resolve references to external objects in the virtual world, which is achieved using a constraint satisfaction-based
algorithm (Pfeiffer & Latoschik 2004).
If the agent succeeds in finding a candidate context, it adds an obligation-goal to handle the interaction
move as a sub-goal of the goal to which the move contributes. Otherwise, a new obligation-goal is added as a
top-level goal. By associating the interaction move with one of his goals, the agent is able to deal with multiple
threads at the same time and keep their individual contexts apart. As a result, incoming information is structured
not only according to its content, but also depending on the context, an important aspect of being situated in the
ongoing interaction. Finally, different plans are used to handle an incoming move depending on the action it
performs. For example, these plans may verify a proposition, answer a question, or constrain a parameter
involved in a plan in order to adapt it to events occurring during plan execution, e.g. the use of specific
objects or proposals made by the interlocutor.
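The context-association step can be illustrated with a minimal sketch. The overlap-based correlation measure below is an assumed stand-in for the system's actual computation, and the goal identifiers are invented:

```python
# Illustrative sketch: associating an incoming interaction move with a goal
# context via a correlation value between the move's content facts and the
# facts in each candidate goal context. The overlap measure is an assumed
# stand-in for the system's actual correlation computation.

def correlation(move_facts, context_facts):
    """Fraction of the move's content facts already present in the context."""
    if not move_facts:
        return 0.0
    return len(set(move_facts) & set(context_facts)) / len(move_facts)

def find_candidate_context(move_facts, goal_contexts, threshold=0.5):
    """Return the goal id with the highest correlation above threshold, or
    None, in which case a new top-level obligation-goal would be created."""
    best_id, best_val = None, threshold
    for goal_id, facts in goal_contexts.items():
        val = correlation(move_facts, facts)
        if val > best_val:
            best_id, best_val = goal_id, val
    return best_id

contexts = {
    "build-propeller": ["(Build prop we)", "(Inst $s bolt)", "(Inst $b bar)"],
    "greeting": ["(Greet user max)"],
}
# "Which bolt?" carries the fact (Inst $s bolt):
print(find_candidate_context(["(Inst $s bolt)"], contexts))  # -> build-propeller
```

A move whose facts match no context above threshold returns None, triggering the new top-level obligation-goal described in the text.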
3.2 Output planning
If the agent has the goal to achieve a certain state of the world, he will reason about which courses of action may
lead to this goal. As a result, he will formulate and aggregate situated plans that lead to the desired state of affairs.
Each behavior that any of these plans incorporates may be either a communicative or a manipulative move, both
represented as interaction moves in terms of the same features defined above. Both kinds of behaviors hence stand
in a meaningful relationship to each other and can be carried out in a nested way when suitable. The decision is
based upon the role that the agent currently fulfils. If Max is to act as the instructor (his initial role upon being
asked by the human) he will verbalize the steps.
When Max has decided to employ a communicative act, he has to decide what meaning should be conveyed
(content planning) and how to convey it in natural language, gesture, etc. (behavior planning). In general, the
behavior produced must be tailored to the agent's current role in the collaboration, expressing his beliefs and
goals. Our approach to natural language generation starts with a communicative goal to be achieved and relies on
various sources of knowledge, including task and conversational competencies, a model of the addressee, and a
discourse history (McTear 2002; Reiter & Dale 2000). Crucially, this process is an integral part of the agent's
plan-based deliberations and is carried out naturally by dedicated plans when Max's mental attitudes incline him
towards making a verbal utterance.
The communicative goal derives directly from the currently active intention on the agent’s plan stack, e.g. to
request the interaction partner to connect two objects: request.order “user” “connect” “bolt-2” $obj3. The goal
of the generation process, then, is to determine all information about this communicative move needed to render it
as a multimodal utterance. Starting from a message (goal) as above, the general performative (action) is first
derived directly from the type of the intended act (in our example request.order). Content selection then works to
determine the information that specifies the parameters of the communicative act to concrete values and, if
possible, refines the performative. Discourse planning determines the move’s discourse function and the discourse
segment (DSP) it is contributing to, both being derivable from the actual structure and ordering of the plans on the
intention stack (cf. Lochbaum, Grosz, Sidner, 1999).
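The generation steps just described (deriving the performative from the act type, content selection, discourse planning) might look roughly as follows in code. The data structure and the binding logic are illustrative assumptions, not the actual system:

```python
# Rough sketch of the output-generation steps described above. Field names
# and the content-selection rule are hypothetical; the actual system derives
# this information from its plan stack and knowledge sources.
from dataclasses import dataclass, field

@dataclass
class InteractionMove:
    action: str            # performative, e.g. "request.order"
    agent: str
    addressee: str
    content: list = field(default_factory=list)
    discourse_function: str = "contribute"
    discourse_segment: str = ""

def generate_move(intention, agent="Max", addressee="user"):
    """Derive the performative directly from the type of the intended act,
    then select content that specifies the act's parameters."""
    act_type, *args = intention
    move = InteractionMove(action=act_type, agent=agent, addressee=addressee)
    if act_type == "request.order" and args[1] == "connect":
        # Content selection: bind the connect relation's known arguments;
        # unbound variables (e.g. $obj3) stay open for the addressee.
        move.content = [("Connect", args[2], args[3])]
        move.discourse_function = "start-segment"
        move.discourse_segment = "prop-s1"
    return move

# e.g. the intention: request.order "user" "connect" "bolt-2" $obj3
move = generate_move(("request.order", "user", "connect", "bolt-2", "$obj3"))
print(move.action)  # -> request.order
```

The resulting structure corresponds to one column of Table 1; multimodal behavior planning and realization would then consume it downstream.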
The final stages of output generation, not presented here, are multimodal behavior planning and realization; see
(Leßmann, Kopp & Wachsmuth 2006) for details on the methods used in the present scenario.
4 Summary
We have presented an approach to formally model the moves that interactants take in a mixed-initiative,
collaborative construction scenario. This scenario provides both the virtual agent and the human interlocutor with
rich possibilities for interaction, comprising praxic actions and conversation about them as equal contributions. We have
defined a formal model that explicates and characterizes the dimensions along which these situated interactions
evolve, and which an embodied agent must actively manage in order to be able to participate as a collaborative
expert. Following an information state-based approach, we have further laid down in detail the notion of
interaction moves that encapsulate the different functions a contribution can fulfill with respect to the dimensions
of this conceptual framework. As these functions, among others not considered here (e.g. emotional display or
epistemic qualification), bear significant influence on the behavioral realization of an interaction move, FML
should be able to accommodate most or all of them.
References
Allwood, J. (1976). Linguistic Communication in Action and Co-Operation: A Study in Pragmatics. Gothenburg Monographs in Linguistics 2, University of Gothenburg, Dept. of Linguistics.
Austin, J. L. (1962). How to Do Things with Words. Harvard University Press, Cambridge, MA.
Blaylock, N., Allen, J., Ferguson, G. (2003). Managing communicative intentions with collaborative problem solving. In Jan van Kuppevelt and Ronnie W. Smith (eds.), Current and New Directions in Discourse and Dialogue, volume 22 of Kluwer Series on Text, Speech and Language Technology, pp. 63-84. Kluwer, Dordrecht.
Bratman, M. E. (1987). Intention, Plans, and Practical Reason. Harvard University Press, Cambridge, MA.
Carletta, J., Isard, A., Isard, S., Kowtko, J., Doherty-Sneddon, G., Anderson, A. (1997). The reliability of a dialogue structure coding scheme. Computational Linguistics 23:13-31.
Cassell, J., Bickmore, T., Campbell, L., Vilhjalmsson, H., & Yan, H. (2000). Human conversation as a system framework: Designing embodied conversational agents. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (eds.), Embodied Conversational Agents, pp. 29-63. Cambridge, MA: The MIT Press.
Clark, H. H. & Brennan, S. A. (1991). Grounding in communication. In L. B. Resnick, J. M. Levine, & S. D. Teasley (eds.), Perspectives on Socially Shared Cognition. Washington: APA Books.
Cooper, R., Larsson, S., Matheson, C., Poesio, M., Traum, D. (1999). Coding Instructional Dialogue for Information States. Trindi Project Deliverable D1.1.
Grosz, B. J., Sidner, C. L. (1986). Attention, Intentions, and the Structure of Discourse. Computational Linguistics 12(3), pp. 175-204. The MIT Press.
Huber, M. J. (1999). JAM: A BDI-theoretic mobile agent architecture. Proceedings of the Third Int. Conference on Autonomous Agents, pp. 236-243.
Kopp, S., Jung, B., Leßmann, N., Wachsmuth, I. (2003). Max - A Multimodal Assistant in Virtual Reality Construction. KI-Künstliche Intelligenz 4/03, pp. 11-17.
Kopp, S., Krenn, B., Marsella, S., Marshall, A., Pelachaud, C., Pirker, H., Thorisson, K., Vilhjalmsson, H. (2006). Towards a common framework for multimodal generation in ECAs: The behavior markup language. In Gratch, J. et al. (eds.), Intelligent Virtual Agents 2006, LNAI 4133, pp. 205-217. Springer.
Larsson, S., Ljunglöf, P., Cooper, R., Engdahl, E., Ericsson, S. (2000). GoDiS - An Accommodating Dialogue System. In Proceedings of the ANLP/NAACL-2000 Workshop on Conversational Systems, pp. 7-10.
Lee, J., Huber, M. J., Kenny, P. G., Durfee, E. H. (1994). UM-PRS: An Implementation of the Procedural Reasoning System for Multirobot Applications. Conference on Intelligent Robotics in Field, Factory, Service, and Space (CIRFFSS), Houston, Texas, pp. 842-849.
Lemon, O., Bracy, A., Gruenstein, A., Peters, S. (2001). Information States in a Multi-modal Dialogue System for Human-Robot Conversation. In Proceedings of Bi-Dialog, 5th Workshop on Formal Semantics and Pragmatics of Dialogue, pp. 57-67.
Leßmann, N., Kranstedt, A., Wachsmuth, I. (2004). Towards a Cognitively Motivated Processing of Turn-taking Signals for the Embodied Conversational Agent Max. AAMAS 2004 Workshop Proceedings: "Embodied Conversational Agents: Balanced Perception and Action", pp. 57-64.
Leßmann, N., Kopp, S., Wachsmuth, I. (2006). Situated Interaction with a Virtual Human - Perception, Action, and Cognition. In Rickheit, G., Wachsmuth, I. (eds.), Situated Communication, pp. 287-323. Mouton de Gruyter.
Lochbaum, K., Grosz, B. J., Sidner, C. (1999). Discourse Structure and Intention Recognition. In R. Dale, H. Moisl, and H. Somers (eds.), A Handbook of Natural Language Processing: Techniques and Applications for the Processing of Language as Text.
McTear, M. (2002). Spoken Dialogue Technology: Enabling the Conversational User Interface. ACM Computing Surveys 34(1), pp. 90-169.
Pfeiffer, T. & Latoschik, M. E. (2004). Resolving Object References in Multimodal Dialogues for Immersive Virtual Environments. In Y. Ikei et al. (eds.), Proceedings of the IEEE Virtual Reality 2004. Chicago, Illinois.
Poggi, I. & Pelachaud, C. (2000). Performative Facial Expressions in Animated Faces. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (eds.), Embodied Conversational Agents. Cambridge, MA: The MIT Press.
Pollack, M. E. (1990). Plans as Complex Mental Attitudes. In P. R. Cohen, J. Morgan, and M. E. Pollack (eds.), Intentions in Communication. MIT Press.
Pollack, M. E. (1992). The Uses of Plans. Artificial Intelligence 57(1), pp. 43-68.
Rao, A. & Georgeff, M. (1991). Modeling rational agents within a BDI-architecture. In Proceedings of the Int. Conference on Principles of Knowledge Representation and Reasoning, pp. 473-484.
Reiter, E. & Dale, R. (2000). Building Natural Language Generation Systems. Cambridge University Press.
Searle, J. R. (1969). Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press.
Traum, D. R. & Allen, J. F. (1994). Discourse Obligations in Dialogue Processing. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL-94), pp. 1-8.
Traum, D. R. (1996). Conversational Agency: The TRAINS-93 Dialogue Manager. In Proceedings of the Twente Workshop on Language Technology 11: Dialogue Management in Natural Language Systems, pp. 1-11.
Traum, D. & Rickel, J. (2002). Embodied Agents for Multi-party Dialogue in Immersive Virtual Worlds. In Proceedings of AAMAS 2002, pp. 766-773. ACM Press.
Traum, D. & Larsson, S. (2003). The Information State Approach to Dialogue Management. In J. van Kuppevelt, R. Smith (eds.), Current and New Directions in Discourse and Dialogue. Kluwer.
Functional Mark-up for Behaviour Planning: Theory and Practice

Brigitte Krenn+±, Gregor Sieber+
+Austrian Research Institute for Artificial Intelligence, Freyung 6, 1010 Vienna
±Research Studio Smart Agent Technologies, Hasnerstrasse 123, 1160 Vienna
+43 1 53246212
{brigitte.krenn, gregor.sieber}@ofai.at
ABSTRACT
We approach the requirements analysis for an FML from a high-
level perspective on communication in general, and the current
state of developments in ECA communication in particular. Our
focus of assessment lies on the two basic units associated with a
communicative event, i.e., the communication partners involved,
and the communication act itself. Apart from coming up with a
selection of properties to be specified in FML, we argue that one
of the major challenges for a widely used FML is how much
freedom the specification leaves in terms of interconnecting
behaviour planning and intent planning, and how much or how
little it enforces the specification of semantic descriptions.
1. INTRODUCTION
We approach the discussion of requirements for an FML from a
high-level perspective on communication and the current state of
developments in ECA communication. From a general point of
view, questions arise such as: Who is communicating to whom in
which socio-cultural and situational context? What is the overall
interaction history of the communication partners, and what is the
history of the ongoing dialogue? What is the intention of the
communication, and what is its content? Transferring these
questions to the ECA domain leads, at the least, to questions of
modelling the virtual character's persona, including some notion
of personality and emotion, and of modelling the communication
act itself, be it in terms of real-time action and response or in
terms of generating a complete dialogue scene in one go.
Our goal is mainly to come up with open questions and core
topics regarding a possible scope of an FML given the current
state of the art in ECA communication. From a practical point of
view, we start from a narrowed-down perspective on modelling
the communication partners and the communication act.
In section 2, we give a brief outline of the current state of ECA
development and its implications for the creation of a commonly
used mark-up or representation language at the interface of intent
and behaviour planning. We propose a set of person
characteristics and aspects of communication acts that need to be
considered in the specification of a functional mark-up language.
This is followed by a discussion of some basic building blocks
relevant for the computation of communicative events (section 3).
In section 4, we finally point out that one of the main challenges
of FML lies in finding a trade-off between detailed semantic
descriptions and interoperability of system components. We
round up our considerations with some words of caution
regarding the feasibility and desirability of a clear-cut separation
between intent and behaviour planning.
2. Current Situation in ECA Development --
Implications for the Creation of a Functional
Mark-up Language FML
Work on computational modelling of communicative behaviour is
tightly coupled with the development of Embodied
Conversational Agents (ECAs). In ECA systems,
communicative events consist of (i) face-to-face dialogues
between an interface character and a user [1], (ii) an interface
character presenting something to the user [2], (iii) two or more
characters communicating with each other in a virtual or mixed
environment, e.g. [3]. On the one hand, there are ECA systems
where only the generation side of multimodal communicative
behaviour is simulated as it is the case with presenter agents
where the whole dialogue scene is generated in one go, e.g. the
NECA system [4]. On the other hand, there are systems where the
whole action-reaction loop of communication is computed, i.e.,
the system interprets the input of a communication partner and
then generates the reactions of the other communication partner(s)
and so forth. See the REA system [5] as an early example for the
complete process of behaviour analysis and behaviour generation.
Depending on the approaches pursued, the kind and complexity of
information required for processing greatly differs. This
influences the requirements on a functional mark-up or
representation language.
In order to realize communicative behaviour, first of all the
communicative intent underlying the behaviours needs to be
computed. To do this in a principled way requires a good deal of
understanding of the motivational aspects of human behaviour,
i.e., why a human individual (re-)acts in a particular situation in a
certain way. This requires theoretical insights into the underlying
mechanisms that determine the mental, affective and
communicative state of the agent. From psychology and social
sciences we have a variety of evidence that human behaviour is
influenced by such factors as cultural norms, the situational
context the individual is in, and the personality traits and the
affective system of the individual. All of these are huge areas of
research where a variety of models and theories for sub-problems
exist, but we are still far from modelling the big picture of
how the different aspects relate and which mechanisms interoperate
in which way(s). At the same time, we aim at building ECA
applications with characters that display human-like
(communicative) behaviour as naturally and believably as
possible. In other words, we have to smartly simulate human-like
communicative behaviour, which requires shortcuts at various
levels of processing. For example, somewhere in the system it is
stipulated that, given certain context parameters, some character X
wants to express some fact Y in a certain mood Z. Such an internal
state of the system can be achieved by more or less complex processes.
To which extent these processes influence the inventory and the
mechanisms required for the FML still needs to be discussed.
This directly brings us to another crucial aspect for the design of
representation languages, i.e., the processing components used in
ECA systems. We need to study which subsystems are
implemented, which bits and pieces of information are required as
input to the individual processing components, and what kinds of
information the components produce as output. Especially if we
aim at developing representations that
will be shared within the community, there must be core
processing components that are made available to and can be used
by the community. The requirement for reusability of components
touches a crucial aspect of system and application development.
Current ECA systems are built in order to realize very specific
applications. Accordingly all processing components are geared
towards optimally contributing to achieve the goals set out by the
application. In our understanding, this is one of the major reasons
why every group and almost every new ECA project has a
demand for and thus creates their own, very specific
representations. As a consequence, the successful development of
representation languages that will be shared and further developed
in the community strongly depends on the ability to develop core
processing components for ECA systems that are flexible enough
to be customized for use in different applications and systems,
and, even more importantly, on the customization of such
components providing a clear advantage over the development of
new specialized ones.
Summing up, we believe that representations which have a chance
to be commonly used must be flexible enough to allow, on the one
hand, in-depth representation of theoretical insights into specific
phenomena and, on the other hand, to provide an inventory of
high-level representations of core information that is basic to all
systems generating communicative behaviour. The availability of
reusable processing components
that operate on this core is expected to foster the uptake of the
representation language within a wider community. These
considerations equally apply to the ongoing work within the
SAIBA [A] initiative on the development of a common behaviour
mark-up language (BML) [B] as well as to the newly started
endeavour of the development of a functional mark-up language
(FML) for the generation of multimodal behaviour.
In the remainder of the paper, we will start discussing a potential
inventory of an FML from the point of view of two major
building blocks of communicative events, namely the
communication partners and the communicative acts.
3. Some Basic Building Blocks to Realize Communicational Intent
Two basic units associated with a communicative event are the
communication partners involved, and the communication act
itself. See Table 1 for a tentative list of aspects of person
characteristics. The listed characteristics roughly relate to three
dimensions: 1. person information, such as naming, outer
appearance and voice of the character; 2. social aspects, including
the role a character plays in the communicative event, but also
including the evaluation of a character by the others based on the
outer appearance of a character, its gender, and with which voice
the character speaks; 3. personality and emotion. All this
influences how an individual (re-)acts in a certain
(communicative) situation. Even though it is not yet sufficiently
understood how these aspects interrelate to generate
communicative intent, in almost all current ECA systems emotion
plays an important role in intent and behaviour planning as well
as in behaviour realization.
In particular, appraisal models [6] have been shown to be well suited
for intent planning, basic emotion categories [7] are widely used
when it comes to facial display, and dimensional models of
emotion have been successfully employed in speech synthesis [8].
Personality models have been integrated in agents to model
behaviour tendencies as well as intent planning, e.g. [9]. The Five
Factor Model of personality [10] is the most widely used in these
works. The interplay between personality and emotion has also been
studied: [11], for instance, uses personality to ensure
coherency of reactions to similar events over time.
Thus, information on the emotional state of the communication
partners is important for planning and realization of the
communicative acts. From an emotion theoretical point of view, a
distinction between emotion proper, interpersonal stance, and
general mood of an agent should be possible in the representation
language, as well as a distinction between emotion felt and
emotion expressed. Due to culturally dependent display rules,
individuals will display different emotions depending on the
current social and situational context. A clear separation between
the role of emotion in intent planning versus behaviour planning,
however, is not easy to draw, and depends on the power of both
the intent and the behaviour planner. Some behaviour planners
will be able to make use of different aspects of emotion, while
others will only be able to handle emotion at the utterance level.
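The distinction between emotion felt and emotion expressed, filtered through culture-dependent display rules, can be sketched minimally. The rule table and emotion labels below are illustrative assumptions, not drawn from any particular emotion theory:

```python
# Minimal sketch of the distinction between emotion felt and emotion
# expressed, filtered by culture-dependent display rules. The rule table
# and emotion labels are illustrative assumptions only.

DISPLAY_RULES = {
    # (felt emotion, social/situational context) -> emotion to display
    ("anger", "formal_meeting"): "neutral",
    ("anger", "among_friends"): "anger",
    ("joy", "funeral"): "neutral",
}

def expressed_emotion(felt, context):
    """Apply display rules; by default the felt emotion is shown."""
    return DISPLAY_RULES.get((felt, context), felt)

print(expressed_emotion("anger", "formal_meeting"))  # -> neutral
print(expressed_emotion("joy", "among_friends"))     # -> joy
```

A representation language that only carries one emotion attribute cannot capture this divergence; separate emotionFelt and emotionExpressed slots, as proposed below, can.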
Looking at a communicative event from a dialogue perspective
(cf. Table 2), we have a structuring of the dialogue into turns, and
a turn into individual communication acts. Communication acts
are either verbal or non-verbal. The verbal communication acts
are assigned dialogue acts in order to specify communicative
intent, e.g. ask, inform, explain, refuse, etc. For the non-verbal
communication acts, communicative intent can be specified via
backchannel functions such as keep contact, signal understanding,
agree, disagree, etc. For an FML, the question arises to which
extent functional labels of verbal and non-verbal communication
acts overlap and where the representational inventory differs. At
the level of communication act different strands of information
come together, such as information on the sender/receiver, on the
emotion expressed, on the communicative intent in terms of
dialogue acts and backchannel functions, as well as on
information structure in terms of links to the previously
communicated information versus providing new information. All
this has a potential to be encoded in FML, core aspects of which
we have listed in the following tables. In Table 1, we have also
included a number of features which are important for the
description of the participants of a communicative event, such as
participants, person, realname, gender, appearance, type, voice,
but which are not core FML features.
Table 1: Aspects of Person Characteristics – An Initial List for Discussion

participants: Collection of personal descriptions of all individuals (characters) that take part in the communicative event.

person: Description of an individual taking part in the communicative event, including a unique identifier and a nickname of the character.

realname: Specifies the real name of the character. Useful in cases where real humans are represented by avatars and the connection to the real person still needs to be kept.

gender: Specifies the gender of the character. Gender may have various implications for the behaviour of the character itself and for how the character's behaviour is interpreted by the communication partners.

type: Specifies whether the individual represented by the character is a human or a system-generated character. Useful in a mixed environment where user avatars and system agents interact.

appearance: Determines the graphical realization of the character, i.e. how the character looks, how it dresses, and what its neutral posture, base-level muscle tone and velocity are.

voice: Determines which voice should be used for the character in speech synthesis and what the basic prosody parameters are, such as pitch level and speech rate.

personality: Determines the personality type of a character. The labels and values used depend on the personality model employed, e.g. extroversion, neuroticism, agreeableness in the case of a simple factor model, but labels such as politeness and friendliness may also be useful in certain applications. Depending on the underlying model, values may be represented by labels or via integers or floats.

role: A domain-specific attribute of the character that determines the specific role the character plays in the given application, such as buyer or seller, pupil or teacher, bully or bullied, husband or wife, mother or child, storyteller or hearer, etc. Thus role has a variety of (implicit and explicit) social implications which may be explicitly specified in the FML or modelled inside a processing component.

emotion: Depending on the emotion theory (such as dimensional models, appraisals, emotion categories) the representations of emotion differ. As a starting point for emotion representations related to the three different models, see the work on the emotion representation language EARL [3].

emotionFelt: Kind and intensity of the emotional state of the character.

emotionExpressed: Kind and intensity of the emotion displayed. Felt emotion and displayed emotion are not necessarily identical, cf. display rules.

interpersonalStance: The affective relation to the communication partner.

mood: The base-level affective state of the character.
Table 2: Aspects of Communication Act – An Initial List for Discussion

turn: A turn comprises a sequence of communication acts of one speaker. Turns are the main building blocks which describe how the dialogue is structured.

communicationAct: Specifies a communicative act (as opposed to a non-communicative act). This may be a verbal or a nonverbal act, each of which has a communicative function or goal and can be colored by emotion. Note that, because of the embodiment of ECAs, verbal acts inherently contain bodily aspects. A communication act can be a reaction to some other communication act, and it can introduce new information to the dialogue. A communicative act has its underlying producer-side intentions and goals, such as to provide or get information, improve a relationship, maintain or gain power, cheat, lie, etc. All of these may require generalized high-level representations as well as theory-dependent in-depth representations.

dialogueAct: Refers to a verbal communication act and may consist of one or more utterances. As a starting point for the mark-up of communicative intent, models for dialogue act mark-up such as the DAMSL [D] annotation scheme can be used, but agent mark-up languages such as FIPA ACL [E] should also be taken into account. While DAMSL (and its extension SWBD-DAMSL [12]) is a high-level framework that has been developed for the annotation of human dialogue, FIPA ACL has a defined semantics for each communicative act that is exchanged between software agents. In practice, however, additional application-specific labels may be useful for concrete ECA applications.

informationStructure: Looking from a high-level and coarse-grained perspective, information structure anchors what is being communicated onto what has previously been communicated (theme) and marks what the new contribution is (rheme). Information structure also influences prosody and thus may be valuable input for speech synthesis [13].

nonVerbalAct: A communication act that consists entirely of nonverbal behaviour. Typical non-verbal acts in communicative situations are backchannels. The functional labels from Elisabetta Bevacqua's feedback lexicon could be a good starting point here.

producer: Who the producer of a verbal or nonverbal act is.

addressee: Who the addressee is. Producer, addressee and hearer refer to the persons specified in the participants list of the communication event.

receiver: The individual who feels addressed by the producer's utterance or nonverbal act. Receiver and addressee are not necessarily identical.

perceiver: The overhearer or onlooker of a communicative act. Perceivers, in contrast to receivers, do not feel affected by the communicative act. Producer, addressee, receiver and perceiver are the communication-act side of person characteristics.
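As a rough illustration of how the properties in Tables 1 and 2 might combine in a concrete fragment, the following builds a hypothetical FML-like snippet. All element and attribute names are invented for discussion and do not constitute a proposed standard:

```python
# Hypothetical FML-like fragment combining person characteristics (Table 1)
# and communication-act properties (Table 2). All element and attribute
# names are invented for illustration only.
import xml.etree.ElementTree as ET

fml = ET.Element("fml")

# Person characteristics (Table 1): identity, personality, role.
participants = ET.SubElement(fml, "participants")
person = ET.SubElement(participants, "person",
                       id="p1", nickname="Eve", gender="female", type="agent")
ET.SubElement(person, "personality", extroversion="0.8", neuroticism="0.2")
ET.SubElement(person, "role", value="seller")

# Communication act (Table 2): a turn with one dialogue act, the emotion
# expressed, and theme/rheme information structure.
turn = ET.SubElement(fml, "turn", producer="p1", addressee="p2")
act = ET.SubElement(turn, "communicationAct")
ET.SubElement(act, "dialogueAct", type="inform")
ET.SubElement(act, "emotionExpressed", category="joy", intensity="0.6")
ET.SubElement(act, "informationStructure", theme="price", rheme="discount")

print(ET.tostring(fml, encoding="unicode"))
```

Even this toy fragment shows the trade-off discussed in section 4: the person block could be resolved entirely inside an intent planner, or serialized, as here, for a separate behaviour planner to consume.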
4. Further Challenges: Separation of Intent and Behaviour Planner
Apart from coming up with a selection of properties to be
specified in FML, we suppose that one of the major challenges for
the specification of an FML is how much freedom the
specification leaves in terms of interconnecting behaviour
planning and intent planning. Consider the problem of deciding
whether to use a non-verbal act such as an iconic gesture to
convey a certain intention. This could, for example, be a good
solution in a situation where the addressee is busy talking to
someone else, where it would be impolite to interrupt due to
cultural or social restrictions, and where the agent would prefer
not to wait with the communicative act until the addressee has
finished the other conversation.
If completely independent planning components are assumed, a
rather detailed semantic description of the content to be
communicated and of the situation the agent is in is required.
Since FML should not contain information on the physical
realisation, and if intention planning does not get feedback from
behaviour planning, the intent planning component has no
knowledge of whether a certain gesture is available to the agent
that will serve the communicative intention. Thus the behaviour planning
component needs to receive input in a detailed enough semantic
description that allows for the decision that a) it would be good to
use a gesture in the current situation, b) there is a gesture that
conveys the meaning of the message such that no essential
information is lost. In contrast, a system with less distinct
boundaries between intention and behaviour planning would
require less detailed semantic descriptions. For instance, if the
intent planner has access to the gestures available in the system,
it can decide to use a certain gesture at the moment it defines
the agent's intentions. Thus there would be no need to further
serialize the information, read it in, and interpret it inside the
behaviour planner.
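The two designs can be contrasted in a small sketch: an intent planner with access to the gesture inventory can commit to a gesture directly, while a strictly separated one can only emit a semantic description of intent and situation. All names below are illustrative assumptions:

```python
# Sketch of the design choice discussed above. An intent planner that can
# inspect the system's gesture inventory commits to a gesture directly;
# a strictly separated one emits only a semantic description and leaves
# the decision to the behaviour planner. Names are illustrative only.

GESTURE_LEXICON = {"open_door": "iconic_door_push", "stop": "palm_out"}

def plan_with_access(intent, addressee_busy):
    """Less strict separation: the intent planner inspects the lexicon and
    may decide on a non-verbal act itself."""
    if addressee_busy and intent in GESTURE_LEXICON:
        return ("gesture", GESTURE_LEXICON[intent])
    return ("speech", intent)

def plan_with_separation(intent, addressee_busy):
    """Strict separation: only a semantic description crosses the interface;
    the behaviour planner decides whether a suitable gesture exists."""
    return {"intent": intent,
            "situation": {"addressee_busy": addressee_busy,
                          "interrupting_impolite": True}}

# The addressee is busy talking to someone else, so a gesture is preferred:
print(plan_with_access("stop", addressee_busy=True))  # -> ('gesture', 'palm_out')
```

The second function illustrates why strict separation demands richer semantics: everything the first function reads directly from the lexicon must instead be encoded in, and later decoded from, the interface representation.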
In practice, not every system will be able to provide or process
detailed semantic information as may be required by a strict
separation of intent and behaviour planning. This may be due to
the real-time requirements of ECA systems, a lack of a suitable
semantic representation language, or the lack of suitable and
efficient semantic processing components.
The success of FML within the ECA community is thus also likely to depend on how much, or how little, it enforces the specification of semantic descriptions: on the one hand, leaving enough flexibility to remain usable in systems that do not make use of detailed semantic representations, and on the other hand, providing enough semantic detail to ensure interoperability between conforming components.
5. ACKNOWLEDGMENTS
This research is supported by the EU-FP6 Cognitive Systems Project IST-027596-2004 RASCALLI.
6. REFERENCES
[1] Matheson, C., Pelachaud, C., de Rosis, F., and Rist, T. 2003. MagiCster: Believable Agents and Dialogue. Künstliche Intelligenz, special issue on "Embodied Conversational Agents", November 2003, 4, 24-29.
[2] Nijholt, A. 2006. Towards the Automatic Generation of Virtual Presenter Agents. In: Proceedings InSITE 2006, Informing Science Conference, Salford, UK, June 2006, CD Proceedings, E. Cohen & E. Boyd (eds.).
[3] Rehm, M., and André, E. 2005. From chatterbots to natural
interaction - Face to face communication with Embodied
Conversational Agents. IEICE Transactions on Information
and Systems, Special Issue on Life-Like Agents and
Communication.
[4] Krenn, B. 2003. The NECA Project: Net Environments for
Embodied Emotional Conversational Agents Project Note. In
Künstliche Intelligenz Themenheft Embodied Conversational
Agents, Springer-Verlag, 2003, p. 30-33.
[5] Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L.,
Chang, K., Vilhjálmsson, H., and Yan, H. 1999.
"Embodiment in Conversational Interfaces: Rea."
Proceedings of the CHI'99 Conference, pp. 520-527.
Pittsburgh, PA.
[6] Ortony, A., Clore, G.L., and Collins, A. 1988. The Cognitive
Structure of Emotions. Cambridge University Press.
[7] Ekman, P. 2007. Emotions Revealed: Recognizing Faces and
Feelings to Improve Communication and Emotional Life.
2nd. ed. Owl Books, New York.
[8] Schröder, M. 2004. Speech and emotion research: an overview of research frameworks and a dimensional approach to emotional speech synthesis (Ph.D. thesis). Vol. 7 of Phonus, Research Report of the Institute of Phonetics, Saarland University.
[9] André, E., Klesen, M., Gebhard, P., Allen, S. and Rist, T.
1999. Integrating models of personality and emotions into
lifelike characters. In Proceedings International Workshop
on Affect in Interactions. Towards a New Generation of
Interfaces.
[10] McCrae, R. R., and Costa, P. T., Jr. 1996. Toward a new generation of personality theories: Theoretical contexts for the five-factor model. In J. S. Wiggins (Ed.), The five-factor model of personality: Theoretical perspectives (pp. 51-87). New York: Guilford.
[11] Ortony, A. 2003. On Making Believable Emotional Agents
Believable. In R. Trappl, P. Petta, S. Payr (eds). Emotions in
Humans and Artefacts. MIT Press.
[12] Jurafsky, D., Shriberg, E., and Biasca, D. 1997. Switchboard
SWBD-DAMSL shallow- discourse-function annotation
coders manual, draft 13. Technical Report 97-01, University
of Colorado Institute of Cognitive Science, 1997.
[13] Baumann, S. 2006. The Intonation of Givenness - Evidence
from German. Linguistische Arbeiten 508, Tübingen:
Niemeyer (PhD thesis, Saarland University).
Web Links
[A] SAIBA http://www.mindmakers.org/projects/SAIBA
[B] BML http://www.mindmakers.org/projects/BML
[C] EARL http://emotion-research.net/earl/
[D] DAMSL
http://www.cs.rochester.edu/research/speech/damsl/RevisedManu
al/RevisedManual.htm
[E] FIPA ACL
http://www.fipa.org/specs/fipa00037/SC00037J.html
Thoughts on FML: Behavior Generation in the Virtual Human Communication Architecture
Jina Lee
University of Southern California
Information Sciences Institute
4676 Admiralty Way, # 1001
Marina del Rey, CA 90292

David DeVault
USC Institute for Creative Technologies
13274 Fiji Way
Marina del Rey, CA
[email protected]

Stacy Marsella
University of Southern California
Information Sciences Institute
4676 Admiralty Way, # 1001
Marina del Rey, CA 90292
[email protected]

David Traum
USC Institute for Creative Technologies
13274 Fiji Way
Marina del Rey, CA
[email protected]
ABSTRACT
We discuss our current architecture for the generation of natural language and non-verbal behavior in ICT virtual humans. We draw on our experience developing this architecture to present our current perspective on several issues related to the standardization of FML and to the SAIBA framework more generally. In particular, we discuss our current use, and non-use, of FML-inspired representations in generating natural language, eye gaze, and emotional displays. We also comment on some of the shortcomings of our design as currently implemented.
1. OVERVIEW
In this paper, we discuss our experience developing multimodal generation capabilities within the ICT virtual human architecture. This paper is intended to contribute to an ongoing effort to standardize Functional Markup Language (FML) as a representation scheme for describing communicative and expressive intents across diverse conversational agents. Our discussion focuses on how our current approach to generating natural language, eye gaze, and emotional displays relates to FML and to the SAIBA framework within which FML has been characterized [8].
The SAIBA framework makes a distinction between processes of intention planning, behavior planning, and behavior realization. It then situates these processes within a generation pipeline, and proposes two communication languages to mediate between these processes: FML to specify the result of intention planning to behavior planning, and BML to specify the result of behavior planning to behavior realization.
While there has been a lot of work on BML, there has been comparatively less work on FML and the various real-world architectural issues associated with implementing the SAIBA framework. We begin with a high-level discussion of some of these architectural issues.
One high-level consideration is that the distinction between intention planning, behavior planning, and behavior realization is only one of many organizing distinctions that could be made in a communication/action planning framework. Some others include the following.
One can distinguish actions according to the different kinds of intentions that can be behind them. Allwood [1] distinguishes three types of communication: Indicate, Display, and Signal. A sender indicates information if that information is conveyed without conscious intention. Displays are consciously shown, and signals are conscious showings of the showing (i.e. intending the receiver to recognize the conscious showing). An embodied agent may perform an action intentionally without intending to communicate anything; if another agent or person is present, important information may nevertheless be conveyed by indication. Should the planning of actions that are not intended to be communicative be part of the FML/BML pathway, or should these actions reach the behavior realizer through some other channel? Moreover, some behaviors that embodied agents need to realize (e.g., breathing) are not "intentional" in the relevant sense, and thus the notion of intention planning is inappropriate. If information about agent state is relevant to realizing such behaviors, is this information also channeled to the realizer outside the FML/BML pathway?
Another organizing distinction could be the type of behavior. Traditionally, verbal behavior and non-verbal behavior have been generated at different times and using different means. Verbal communication has discrete units, a fairly arbitrary relationship of form to meaning, and deep lexical, syntactic and semantic structures, while non-verbal communication often is more continuous, has a closer relationship of form to meaning, and shallow syntactic structure. Traditional text generation often has more stages in processing, and uses more contextual information. Most SAIBA work has focused on non-verbal behavior. Should the same pathways be used for text generation and non-verbal behavior, or should these paths be split (e.g., with text generated first)? And of course, this issue extends to other kinds of behaviors that are not realizing a communicative function.
Another architectural issue arises from real-time interactive considerations. Even though the proponents of the SAIBA framework are keenly aware of the importance of real-time interaction, the SAIBA framework remains suggestive of a traditional pipeline architecture of planning followed immediately by plan execution. This is fine for a virtual agent that resides in a static environment. However, in a more dynamic environment, an agent must respond to unexpected events in the environment. For example, many communication decisions must rely not just on individual intention planning, but also on monitoring the effects of previously planned action, and especially on monitoring new actions by people and other agents. Intention planning thus must have access to this information and must also be able to adjust or cancel communication that has been planned but not yet performed. This suggests not only additional requirements on what is provided by the intention planner to the behavior planner but also on what is provided by the behavior planner and realizer to the intention planner.
Finally, there is a more general architectural question of how to modularize a real-world generation system in a way that provides each module with all the sources of information it needs. For example, as we discuss in further detail below, our current gaze generation system relies on fine-grained, dynamic information about upstream cognitive processing. Similarly, natural language generation can sometimes require detailed information about the agent's cognitive state and other contextual factors. Such rich information needs can create pressures that work against maintaining a clean theoretical modularity such as that suggested in the SAIBA framework.
In the remainder of the paper, we discuss our virtual human architecture and then our perspective on how our current design might inform the standardization of FML.
2. ICT VIRTUAL HUMAN COMMUNICATION ARCHITECTURE
The virtual human project at ICT [14, 20, 17] has produced several virtual humans and a developing architecture, which is depicted in Figure 1. In this section, we describe the control flow and representations involved in generating multimodal output within this architecture.
For intentional communication signals, the generation process starts with configurations of the agent's information state that match a proposal rule. Examples include obligations to answer a question, ground or repair previously communicated information, or make a suggestion. These proposals to communicate compete with many other goals of the agent, both to say other things and to perform other actions such as monitoring the communication of others or acting in the world. Once a proposal is selected, the generation process begins.
2.1 Natural language generation
In our current system, natural language generation (NLG) occurs before non-verbal behavior generation (NVBG). In general, the dialogue manager initiates NLG by sending a generation request to an external generator. However, currently the dialogue manager sometimes bypasses the external generator if it already knows a good text string for its desired output, according to hand-implemented SOAR rules, or rules generated from an ontology. We have four different external generators that may be used, including two statistical generators, a hand-crafted grammar-based generator, and a hybrid generator. [19] has more details on a previous version of the generation process.

Figure 1: The virtual human system architecture.

The dialogue manager sends requests to the generator in
the form of one or more speech acts and dialogue acts to realize. The messages to the generator are of the form given in Figure 2. The vrGenerate message can be received by any external generator. In this case the dialogue manager is asking for a greeting speech act from the virtual human elder-al-hassan to a human addressee, who plays the role of a U.S. Army captain (captain). This act is also the response to a previous utterance. One or more generators can reply to this request with vrGeneration messages such as those in Figure 3. There can be one or more vrGeneration interp messages, each one with a candidate text for this output and with an interpretation identifier (1) and a quality value (-3.742008). The vrGeneration done message tells the dialogue manager that the generator(s) are finished sending interpretations.
Figures 4 and 5 show a similar request and response. This time another virtual human, doctor-perez, is trying to negotiate, and wants to address a problem in a plan involving moving downtown by telling the elder that his agreement is important for the success of the plan. When the dialogue manager has received the generation results, it can decide which one to use (if there is more than one result), based on both the quality of the generation and other factors (e.g., whether it has said this same string before). The dialogue manager might also decide to cancel the speech if it is no longer relevant (or if, e.g., another character starts speaking and this character does not want to interrupt).
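The selection step just described, ranking candidate generator outputs by quality while avoiding repeated strings, can be sketched as follows. The message layout mirrors Figures 2 and 3, but the parsing helper and field names are our own illustration, not the actual ICT wire protocol.

```python
# A minimal sketch (not the actual ICT implementation) of how a dialogue
# manager might rank candidate outputs from "vrGeneration interp" messages.
# Field order follows Figure 3; helper names are our own.

def parse_interp(header: str, text: str) -> dict:
    # header: "vrGeneration interp <speaker> <msg-id> <interp-id> <quality>"
    _, _, speaker, msg_id, interp_id, quality = header.split()
    return {"speaker": speaker, "msg_id": msg_id,
            "interp": interp_id, "quality": float(quality), "text": text}

def select_best(candidates: list, already_said=()) -> dict:
    # Prefer candidates not uttered before, then take the highest quality.
    fresh = [c for c in candidates if c["text"] not in already_said]
    return max(fresh or candidates, key=lambda c: c["quality"])

candidates = [
    parse_interp("vrGeneration interp elder-al-hassan elder-al-hassan203 "
                 "1 -3.742008", "hello captain"),
    parse_interp("vrGeneration interp elder-al-hassan elder-al-hassan203 "
                 "2 -5.100000", "greetings"),
]
print(select_best(candidates)["text"])  # -> hello captain
```

The quality values here are log-probability-style scores, so higher (less negative) is better; a real system would also fold in the other cancellation factors discussed above.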
Thus, in our current architecture, NLG is not part of a pure pipeline, since the upstream dialogue manager chooses between alternative NLG outputs and sometimes cancels output altogether. After the dialogue manager decides to go forward, a call is sent to carry out this utterance. This call includes information on the speech acts and dialogue acts as well as the text, and results in an XML message
vrGenerate elder-al-hassan elder-al-hassan203
addressee captain
speech-act<A135>.type csa
speech-act<A135>.action greeting
speech-act<A135>.actor elder-al-hassan
speech-act<A135>.response-to gsym1
speech-act<A135>.addressee captain
Figure 2: Generator request
vrGeneration interp elder-al-hassan
elder-al-hassan203 1 -3.742008
hello captain
vrGeneration done elder-al-hassan
elder-al-hassan203
Figure 3: Generator response
being sent to the NVBG module.
2.2 Nonverbal and other physical behaviors
In addition to dialogue management, a virtual human's cognitive processes include task planning, a gaze model, and an appraisal-based model of emotion. These processes provide a range of information to our NVBG module [10] through FML-inspired constructs. This information includes a specification of the communicative intent (including the speech acts and dialogue acts), the surface text of the utterance, the agent's gaze state, and a range of factors associated with the emotion model.
In this section, we present our current use of FML-inspired constructs to pass gaze and emotion information to the behavior planner. We will not discuss further the simple FML elements we currently use to capture the communicative intent and the surface text. It is important to note, however, that this is a hybrid approach that assumes NLG is upstream of the behavior planner but also assumes the intentional/semantic content can help refine non-verbal behavior choices. In terms of the SAIBA framework, one way to view this approach is that in some implementations both FML elements and BML elements are passed to the behavior planner. More generally, this raises fundamental issues for FML and SAIBA as to what assumptions are being made in the framework about how verbal and nonverbal behaviors are generated (or co-generated). We discuss this in greater detail in Section 3.3. Presently, we are actively considering alternative generation schemes and therefore expect our perspective on the appropriate FML elements to evolve as our design process continues. Our focus in this section is on aspects of our current use of FML that are somewhat more stable and, we believe, more transferable to other systems.
2.2.1 Gaze
The reader may think that gaze is not a function but a behavior, and thus should not be an element in FML at all, but rather solely in BML. In the abstract, we would tend to agree. However, given the real-time changes in human gaze directions and targets during communication, and the myriad functions that gaze plays in human cognitive and social behavior, it is important to consider its role in detail.
In our current virtual human system, the gaze model [11]
vrGenerate doctor-perez doctor-perez386
addressee elder-al-hassan
speech-act<A348>.motivation<V22>.reason
downtown
speech-act<A348>.motivation<V22>.goal
address-problem
speech-act<A348>.content<V21>.
modality<V23>.conditional should
speech-act<A348>.content<V21>.type action
speech-act<A348>.content<V21>.theme downtown
speech-act<A348>.content<V21>.event agree
speech-act<A348>.content<V21>.agent
elder-al-hassan
speech-act<A348>.content<V21>.time present
speech-act<A348>.addressee elder-al-hassan
speech-act<A348>.action assert
speech-act<A348>.actor doctor-perez
Figure 4: Generator request
vrGeneration interp doctor-perez
doctor-perez386 1 -2.9832053
you should agree to this before we can think
about moving elder
vrGeneration done doctor-perez doctor-perez386
Figure 5: Generator response
resides in the cognitive module and generates various gaze commands. The key principle behind the model is that gaze should reflect the agent's underlying cognitive state; this has historically led us to locate it within the cognitive module, not the behavior planner. Since gaze movement is a fast and immediate process, the gaze model is closely intertwined with the agent's task planner, dialog manager, and emotion model. Each of these components, which constitute the cognitive module, generates a set of cognitive operators that represent the agent's internal processing. The role of the gaze model is then to associate these operators with corresponding gaze behaviors.
The generated cognitive operators can be understood in terms of several broad categories of cognitive processes in conversation. For example, as illustrated in Table 1, there are cognitive operators related to conversation regulation, update of internal cognitive state, and monitoring of events or goal status. While most operators related to conversation regulation generate gaze commands accompanying verbal utterances, others do not. For instance, monitoring for expected/unexpected changes, attending to a physical stimulus in the environment, or checking a condition for a pursued goal are internal intentions that are reflected intentionally or unintentionally through various nonverbal behaviors. Additionally, there are cognitive operators related to the agent's coping strategies (discussed further below).
The gaze model associates these cognitive operators with gaze behaviors by providing a specification of both the physical manner of gaze (e.g. target, type, speed, priority) and its functional role. The functional role, or the reason for the gaze command, is a description of the cognitive operator that triggers the gaze command. This may be a sub-phase of a higher-level cognitive operator. For example, during
Conversation Regulation
  output-speech: planning speech (look at hearer, hold turn, rejection, rejection goal satisfied, acceptance reluctant, remembering); speaking; speech done; speech done hold turn
  listen-to-speaker: listen to speaker
  interpret-speech: interpret speech
  expect-speech: expect speech
  wait-for-grounding: expect (acknowledgment, expect repair)

Update Internal Cognitive State
  update-desire
  update-relevance: planning
  update-intention
  update-belief: monitor goal

Monitor for Events / Goal Status
  attend-to-sound: attend to sound object
  check-goal-status: monitor goal
  monitor-goal-status: monitor goal refresh
  monitor-for-expected-effect: monitor for expected effect
  monitor-for-expected-action: monitor expected action; monitor expected action (assert intention to perform the action, take action against an action)

Coping Strategy
  Coping-focus: convey displeasure; accept responsibility; make amends; resignation; avoidance (by-distancing, by-wishing-away); seek social support; monitor goal

Table 1: Partial overview of cognitive operators, gaze reasons, and gaze behaviors
the output-speech phase, there are sub-phases such as planning speech, speaking, complete speaking, holding the turn, etc. Table 1 shows how various gaze reasons correspond to cognitive operators in our system.
In our system, we use an FML <gaze> element with the properties of gaze behaviors specified in its attributes and send it to NVBG. NVBG then transforms it into a BML <gaze> element and sends it to SmartBody [18], the behavior realization module.
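The FML-to-BML gaze hand-off just described can be sketched as follows. The input attributes follow the proposed structure in Table 3, but the transformation logic and the output attributes are toy assumptions of ours, not SmartBody's actual BML interface.

```python
import xml.etree.ElementTree as ET

# Hedged sketch of NVBG's gaze hand-off: an FML <gaze> element (attributes
# as in Table 3) is rewritten as a BML-style <gaze> element for the realizer.
# The output attributes are illustrative assumptions, not SmartBody's API.

def fml_gaze_to_bml(fml_gaze: ET.Element) -> ET.Element:
    bml = ET.Element("gaze")
    bml.set("target", fml_gaze.get("target", ""))
    # The functional attributes (reason, priority) would inform the choice
    # of physical manner; here they collapse to a single toy decision.
    direction = "away" if fml_gaze.get("gaze-type") == "avert" else "toward"
    bml.set("direction", direction)
    return bml

fml = ET.fromstring('<gaze gaze-type="avert" target="captain" '
                    'priority="high" reason="resignation"/>')
print(ET.tostring(fml_gaze_to_bml(fml), encoding="unicode"))
```

A fuller planner would branch on the reason attribute as well, which is exactly the expressive variation the reason parameter is meant to enable.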
As the gaze model was originally developed, the gaze manner specified by the model provided parameters to a procedural animation of gaze by a behavior realizer. However, in our current work, we are providing the reason parameter to the behavior planner. This specification will allow for more expressive variations, as well as variations tied to other aspects of the body's state and to the capabilities of the animation system.
2.2.2 Emotion
In our system, we model both the generation of emotional states that arise as the virtual human reacts to events and how the virtual human copes as it attempts to regulate its emotional state. EMA (EMotion and Adaptation) [7] is the emotion model in our virtual human system. EMA is largely based on Lazarus' work on appraisal theory [9].
Appraisal
EMA assesses emotion-eliciting events along a range of appraisal dimensions (or checks or variables), such as perspective, desirability, likelihood, expectedness, causal attribution, temporal status, controllability, and changeability. The appraisal dimensions are then mapped to various emotion labels and intensities of those emotions. For example, an undesirable and uncontrollable future state is mapped as fear-eliciting. In general, a set of appraisal patterns can generate one or more emotion labels.
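A pattern-matching rule of the kind just described can be sketched like this. The fear rule (undesirable, uncontrollable, future) follows the example in the text; the hope rule and all numeric thresholds are invented for illustration and are not EMA's actual mapping.

```python
# Sketch of mapping appraisal dimensions to emotion labels. The fear rule
# (undesirable + uncontrollable + future) is taken from the text; the hope
# rule and the numeric thresholds are illustrative assumptions.

def emotion_labels(appraisal: dict) -> list:
    labels = []
    if (appraisal.get("desirability", 0.0) < 0.0
            and appraisal.get("controllability", 1.0) < 0.5
            and appraisal.get("temporal-status") == "future"):
        labels.append("fear")
    if (appraisal.get("desirability", 0.0) > 0.0
            and appraisal.get("likelihood", 0.0) < 0.5
            and appraisal.get("temporal-status") == "future"):
        labels.append("hope")  # assumed pattern, not from EMA
    return labels

print(emotion_labels({"desirability": -0.7, "controllability": 0.2,
                      "temporal-status": "future"}))  # -> ['fear']
```

As the text notes, a single appraisal pattern may yield more than one label, which is why the sketch returns a list rather than a single emotion.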
Currently in our system, an FML <affect> element is used to specify the emotion labels along with the intensity, target, and stance (leaked or intended) of the emotion. Whenever the agent's emotion is re-assessed, this information is sent to the NVBG module, which uses it to modify the gestures created. Note that in this section we discuss how we model "leaked" emotions, or more accurately "felt" emotions, as opposed to emotional expression used intentionally as a signal, which we discuss in the Coping Strategy section below.
Once the appraisal dimensions are (re-)evaluated, they are also used to generate Facial Action Unit codes, based on the work of Ekman [5]. As opposed to emotion labels, the action units are specified in BML (instead of FML) within the <face> element and sent to NVBG. Since NVBG receives the action units in BML, it simply passes them to SmartBody. However, conceptually it should be the behavior planner that generates action units along with other gestures after receiving the agent's affective state. Below, we suggest alternative ways to express the agent's affect depending on the level of detail available; Section 3 describes our proposed FML specifications.
Note that there is a range of research issues concerning the mapping from appraisals and emotions to action units that we are glossing over here. Whereas several psychological theories have postulated a mapping from appraisal variables to action units, they differ on the specifics of the mapping. Further, given any specific appraisal, there may not be a unique mapping to action units even within the same theory. There are individual differences in how to map appraisals or emotions to action units. There are also alternative theories that postulate that there is no mapping from appraisals to action units but rather mappings from emotions to action units. There are also issues in dynamics. Psychological theories differ in whether they postulate temporal ordering relations between appraisal checks and whether they argue that this ordering is reflected in temporal differences in the ordering of associated action units. There are, finally, even some psychologists who argue against facial expressions revealing "true" underlying emotional states, arguing instead that facial expressions are social signals.
Coping Strategy
EMA also incorporates a computational model of coping strategies integrated with the appraisal dimensions [7]. EMA analyzes the causality of events that produce the given appraisal dimensions and suggests strategies to either preserve desirable states or overturn undesirable states. These strategies may propose to execute certain plans, alter goals and beliefs, or shift blame for an undesirable event to another entity. The coping strategies modeled in EMA are organized by their impact on the agent's focus of attention, beliefs, desires, or intentions. Table 2 gives an overview of the coping strategies.
In the current virtual human system, coping strategies are propagated to the behavior planner in two ways. One is by implicitly influencing the agent's affective state and generating a new emotion label, which is then taken into account during behavior generation. The other is by directly influencing the nonverbal behaviors generated. In particular, the gaze model described above has certain gaze behaviors associated with different coping strategies. For example, seek instrumental support shifts gaze towards some other agent, whereas resignation causes the agent to avert gaze from its current target. However, as with the case of appraisal, it would be more appropriate to describe the coping strategy within FML and let the behavior planner decide how this would influence the behavior generation process.
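The two direct coping-to-gaze associations named above can be written as a small lookup. Only the two mappings from the text are shown; the key and value tokens are our own illustrative identifiers, not the system's.

```python
# Sketch of the direct coping-to-gaze association described above. Only
# the two mappings named in the text are shown; key and value tokens
# are illustrative, not the virtual human system's actual identifiers.

COPING_GAZE = {
    "seek-instrumental-support": {"gaze-type": "look", "target": "other-agent"},
    "resignation": {"gaze-type": "avert", "target": "current-target"},
}

def gaze_for_coping(strategy: str):
    # Returns None for strategies with no direct gaze correlate.
    return COPING_GAZE.get(strategy)

print(gaze_for_coping("resignation"))
```

Moving this table behind an FML description of the coping strategy, as the text suggests, would let each behavior planner substitute its own realization of the same function.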
Coping also provides the agent with the means to convey emotional states intentionally, for example, by showing displeasure or anger. This expression or signaling of emotional state may differ from the true or felt underlying emotional state of the virtual human. It is this distinction which motivated the original FML idea of distinguishing "leaked" from "intended" emotions; see our proposed FML <affect> element in Section 3.2.
Currently, modeling of coping strategies is not common in virtual human systems. Unlike other cognitive operations described in this paper, coping strategies may not have an immediate effect on the behavior generation process. Rather, a coping response may influence how the agent selects, plans, and executes its internal goals. This in turn influences the choices of behaviors. On the other hand, a coping response can be an immediate reaction with well-defined behavioral correlates, such as avoidance responses impacting gaze or shifting blame impacting an expression of anger.
3. PROPOSED SPECIFICATIONS OF FML
In this section, we propose several elements of FML based on our current provisional use of FML-inspired constructs.
3.1 Gaze
As described in the previous section, the key principle in our model of gaze is that it should reflect the agent's inner processing. In line with this, our current and proposed specification of the <gaze> element in FML includes the reason for the gaze command in fine-grained detail along with the target and type of gaze (see Table 3). This allows different behavior planners to represent the same communicative intent with varying expressivity depending on the capability of the virtual human system (e.g. full human embodiment vs. a simplified character with only a head figure).
A second alternative is to back away from the commitment that the link from cognitive processes to behavior planner is captured solely in FML and the link from behavior planner to realizer is captured solely in BML. Rather, various modules along a path (or paths) may be allowed to add FML or BML elements. This allows for considerable flexibility in how modules are realized but may also impact the sharing of modules across research efforts.
Finally, we could go even further towards a functional specification. FML may want to avoid even calling this element 'gaze'. Perhaps 'attention'? However, that also does not quite capture the range of functions performed by different gaze types. That range might be best expressed by the general categories in Table 1: Conversation Regulation, Update Internal Cognitive State, Monitor, and Coping Strategy. In this view, the FML element would be one of those categories, with the Reason being a further specialization of that element. We believe this view is most consistent with the goals of specifying FML.
3.2 Emotion
Our proposal for representing emotion in FML is to have alternative ways to express the agent's affect. These alternative ways would be tied to the underlying class of emotion model used by a system. For instance, we suggest an FML structure that allows the system to represent either the emotion labels (categories) or the more detailed appraisal dimensions. Table 4 gives the suggested structure of two FML elements for this purpose. Here are examples of both cases:
1. Representing an emotion label:
<affect type="joy" intensity="1.0" target="captain-kirk"/>

2. Representing appraisal dimensions:
<affect type="appraisals" target="captain-kirk">
  <appraisal type="desirability" value="0.2" />
  <appraisal type="controllability" value="0.5" />
  ...
</affect>
In the latter case, if the value of the affect type is 'appraisals', the type, target, and stance of the emotion should still be specified. But we propose that the <affect> element have an arbitrary number of <appraisal> elements embedded to represent the different appraisal variables and values.
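The nested structure just proposed can be generated mechanically. Below is a sketch using Python's standard XML library; the helper name and the default stance value are our own additions, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Sketch of building the proposed <affect> element in its appraisal-
# dimension form (second example above). The helper name and default
# stance value are our own choices, not part of the proposal.

def make_affect(target: str, appraisals: dict,
                stance: str = "leaked") -> ET.Element:
    affect = ET.Element("affect", type="appraisals",
                        target=target, stance=stance)
    for dim, value in appraisals.items():
        # One <appraisal> child per appraisal variable/value pair.
        ET.SubElement(affect, "appraisal", type=dim, value=str(value))
    return affect

el = make_affect("captain-kirk", {"desirability": 0.2, "controllability": 0.5})
print(ET.tostring(el, encoding="unicode"))
```

Allowing an arbitrary number of <appraisal> children keeps the element open to appraisal models with different or additional dimensions.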
Table 2: Coping strategies modeled in EMA

Attention Related
  Seek Information: Form a positive intention to monitor the pending, unexpected, or uncertain state that produced the appraisal values.
  Suppress Information: Form a negative intention to monitor the pending, unexpected, or uncertain state that produced the appraisal values.

Belief Related
  Shift Responsibility: Shift a causal attribution of blame/credit from/towards self and towards/from another agent.
  Wishful Thinking: Increase/lower the probability of a pending desirable/undesirable outcome, or assume some intervening act/actor will improve desirability.

Desire Related
  Distance/Mental Disengagement: Lower the utility attributed to a desired, but threatened, state.
  Positive Reinterpretation / Silver Lining: Increase the utility of a positive side-effect of some action with a negative outcome.

Intention Related
  Planning / Action Selection: Form an intention to perform some external action that improves an appraised negative outcome.
  Seek Instrumental Support: Form an intention to get some other agent to perform an external action that changes the agent-environment relationship.
  Make Amends: Form an intention to redress a wrong.
  Procrastination: Defer an intention to some time in the future.
  Resignation: Abandon an intention to achieve a desired state.
  Avoidance: Take action that attempts to remove the agent from a looming threat.
Table 3: Proposed structure of <gaze> element in FML

Element: <gaze>
  gaze-type  A symbol describing the type of gaze at the target (e.g. avert, cursory, look, focus, weak-focus).
  target     The name of an object that the agent is gazing at or shifting gaze to, or averting from in the case of gaze aversion.
  priority   A symbol describing the priority of the cognitive operation that triggered this gaze command.
  reason     A detailed rationale behind why we are doing the gaze (currently represented as a token).
Table 4: Proposed structure of <affect> element in FML

Element: <affect>
  type       Indicates the category of affect (joy, anger, fear, ...) or whether the affect will be represented by appraisal dimensions (appraisals).
  target     Person who is possibly being targeted by the resulting affective behavior.
  stance     Whether the emotion is intentionally given off or involuntarily leaked (intended, leaked).
  intensity  The intensity of emotion.

Element: <appraisal>
  type       A single appraisal variable (desirability, controllability, ...).
  value      The intensity of the appraisal variable.
As discussed above, researchers have developed a number of theories of emotion, each varying in how it models the dynamics of emotional processes. Here we have suggested two ways to represent emotion, drawn from two emotion theories, namely the categorical theory of emotion and appraisal theory. The expressivity to represent not only the emotion labels but also the appraisal variables allows the behavior planner to draw on a deeper understanding of the impact an event has for an agent and to generate behaviors accordingly. However, to employ models of other emotion theories, more discussion is needed about how to represent the properties of those models. In particular, we should also consider incorporating dimensional models such as Mehrabian and Russell's PAD (Pleasure-Arousal-Dominance) model [12], or more recent work related to such dimensional models (e.g., Core Affect). Finally, we should also explore the emotion annotation schemes being developed by other consortia such as the HUMAINE work [15].
3.3 Language Generation and FML

As discussed in Section 2.1, we currently use a system-specific representation scheme to formulate NLG requests and responses. We have not attempted to transform this scheme into an FML representation that might be used across different systems. In this section, we discuss some of the challenges we believe would be associated with standardizing a messaging protocol for NLG across systems.
In general, our perspective is that if NLG is to be assimilated into the SAIBA framework, it should be viewed as part of behavior planning rather than intent planning. This is because, first, at a conceptual level, language use is planned behavior. Indeed, NLG systems typically frame their language generation problem as one of planning a linguistic output that accomplishes an incoming communicative intention or communicative goal [13]. Second, in many systems, there may be advantages in terms of naturalness and efficiency of communication that come with planning verbal and non-verbal behavior simultaneously, as in, for example, [2].
Let us consider, then, what the implications for the standardization of FML would be if NLG were to be generally situated within the behavior planning stage of the SAIBA framework. In the canonical NLG pipeline [13], an NLG algorithm is internally divided into three successive stages: document planning, microplanning, and realization. Document planning is the process of deciding what information should be communicated, while microplanning and realization plan an output text that achieves this communicative goal. An intuitive approach would therefore locate document planning within the intent planning stage of SAIBA, and locate microplanning and realization within the behavior planning stage.
To understand the implications for FML, we need to look at the typical inputs needed by microplanners and realizers. While the division of labor between microplanning and realization, and the interface between them, varies considerably between systems [13], we may generally observe that both processes depend on relatively rich input specifications to achieve high quality output. For example, one subtask that microplanners typically solve is the generation of referring expressions (GRE) for particular objects or individuals that are implicated in the communicative goal. In general, GRE requires as input a ranking of the relative salience of various objects and properties in the non-linguistic context, as well as the dialogue/discourse history, so that an appropriate level of detail can be selected for the referent of the expression (e.g., the choice of a pronoun versus a complex definite noun phrase); see, e.g., [3, 16].
More generally, the fact that microplanning and realization involve fine-grained lexical choices can add additional input requirements. For example, the SPUD microplanner [16] requires as input the communicative goal (expressed as a set of logical formulas), a grammar, and a representation of the current context (including elements of dialogue/discourse history as well as non-linguistic context). Because SPUD expects the communicative goal to be expressed using logical formulas, it would not be trivial to translate a virtual human generation request such as those in Figures 2 and 4 into a communicative goal for SPUD. Further, the input context representation needs to extend down to the granularity of lexical semantics in the language to be generated. One way of providing this information to SPUD is to provide a knowledge interface, as in [4]. The knowledge interface allows SPUD to interactively query for salience information and to evaluate semantic constraints associated with alternative lexical choices in the current context. This creates another question about how to provide, within the SAIBA framework, an NLG module with all the resources it potentially needs. It would seem that an FML-ized generation request would either need to carry a quite exhaustive description of context, or else the generator would need to be provided with some mechanism by which upstream modules can be interactively queried for additional information as needed.
Another challenge is that different realizers can also expect different input formats. For example, the FUF realizer [6] requires as input a functional description, which is a hierarchical set of attribute-value pairs that partially specify the lexico-syntactic structure of the output utterance. The OpenCCG realizer [21] requires as input the logical form of the utterance to be realized, expressed (in XML) as a semantic dependency graph or (equivalently) in a hybrid logic dependency semantics formalism. Typically, for a given realizer, a paired microplanner draws on a lexicon and/or grammar, as well as various domain-specific rules and context information, to automatically translate a communicative goal into the appropriate inputs to the realizer. The challenge for FML is that the particular representation scheme that is chosen for FML should aim to remain compatible with, and easily converted into, the particular input formats and internal pipelines assumed by such different NLG components. We do not immediately see how to achieve this goal, especially given the widely varying approaches to NLG that are currently being explored. However, this is an area where detailed discussion between researchers might yield an operational interim approach.
4. CONCLUSION

In this paper we have presented our implementation of multimodal generation capabilities in the ICT virtual human architecture. We have drawn on our experience with this architecture to present our perspective on the standardization of FML elements for generating eye gaze, emotional displays, and natural language. While our conclusions have generally been tentative, we hope to have achieved our aim of furthering the ongoing discussion of FML and the SAIBA framework as a useful approach to multimodal generation across diverse conversational agents.
5. ACKNOWLEDGMENTS

This work was sponsored by the U.S. Army Research, Development, and Engineering Command (RDECOM), and the content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
6. REFERENCES

[1] J. Allwood. Bodily communication - dimensions of expression and content. In B. Granstrom, D. House, and I. Karlsson, editors, Multimodality in Language and Speech Systems, pages 7–26. Kluwer Academic Publishers.
[2] J. Cassell, M. Stone, and H. Yan. Coordination and context-dependence in the generation of embodied conversation. In Proceedings of INLG, 2000.
[3] R. Dale and E. Reiter. Computational interpretations of the Gricean maxims in the generation of referring expressions. Cognitive Science, 19(2):233–263, 1995.
[4] D. DeVault, C. Rich, and C. L. Sidner. Natural language generation and discourse context: Computing distractor sets from the focus stack. In Proceedings of the 17th International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004), pages 887–892, 2004.
[5] P. Ekman and W. Friesen. The Facial Action Coding System (FACS): A technique for the measurement of facial action. Consulting Psychologists Press, Palo Alto, CA, USA, 1978.
[6] M. Elhadad. FUF: the universal unifier user manual version 5.0. Technical Report CUCS-038-91, 1991.
[7] J. Gratch and S. Marsella. A domain-independent framework for modeling emotion. Cognitive Systems Research, 5(4):269–306, 2004.
[8] S. Kopp, B. Krenn, S. Marsella, A. N. Marshall, C. Pelachaud, H. Pirker, K. R. Thorisson, and H. H. Vilhjalmsson. Towards a common framework for multimodal generation: The behavior markup language. In IVA, pages 205–217, 2006.
[9] R. Lazarus. Emotion and Adaptation. Oxford University Press, New York, NY, USA, 2000.
[10] J. Lee and S. Marsella. Nonverbal behavior generator for embodied conversational agents. In Proceedings of the 5th International Conference on Intelligent Virtual Agents, 2006.
[11] J. Lee, S. Marsella, J. Gratch, and B. Lance. The Rickel gaze model: A window on the mind of a virtual human. In Proceedings of the 6th International Conference on Intelligent Virtual Agents, 2007.
[12] A. Mehrabian and J. A. Russell. An approach to environmental psychology. MIT Press, Cambridge, MA, USA; London, UK, 1974.
[13] E. Reiter and R. Dale. Building Natural Language Generation Systems. Cambridge University Press, New York, NY, USA, 2000.
[14] J. Rickel, S. Marsella, J. Gratch, R. Hill, D. Traum, and W. Swartout. Toward a new generation of virtual humans for interactive experiences. IEEE Intelligent Systems, 17:32–38, 2002.
[15] M. Schroder, L. Devillers, K. Karpouzis, J.-C. Martin, C. Pelachaud, C. Peter, H. Pirker, B. Schuller, J. Tao, and I. Wilson. What should a generic emotion markup language be able to represent? In Proc. 2nd International Conference on Affective Computing and Intelligent Interaction (ACII), pages 440–451, 2007.
[16] M. Stone, C. Doran, B. Webber, T. Bleam, and M. Palmer. Microplanning with communicative intentions: the SPUD system. Computational Intelligence, 19(4):314–381, 2003.
[17] W. R. Swartout, J. Gratch, R. W. Hill, E. H. Hovy, S. Marsella, J. Rickel, and D. R. Traum. Toward virtual humans. AI Magazine, 27(2):96–108, 2006.
[18] M. Thiebaux, A. Marshall, S. Marsella, and M. Kallmann. SmartBody: Behavior realization for embodied conversational agents. In Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems, to appear.
[19] D. Traum, M. Fleischman, and E. Hovy. NL generation for virtual humans in a complex social environment. In Working Notes, AAAI Spring Symposium on Natural Language Generation in Spoken and Written Dialogue, March 2003.
[20] D. Traum, W. Swartout, S. Marsella, and J. Gratch. Virtual humans for non-team interaction training. In Proceedings of the AAMAS Workshop on Creating Bonds with Embodied Conversational Agents, July 2005.
[21] M. White, R. Rajkumar, and S. Martin. Towards broad coverage surface realization with CCG. In Proc. of the Workshop on Using Corpora for NLG: Language Generation and Machine Translation (UCNLG+MT), 2007.
The FML-APML language

Maurizio Mancini
University of Paris 8
140 rue de la Nouvelle France
93100, Montreuil, France

Catherine Pelachaud
University of Paris 8, INRIA
INRIA Rocquencourt, Mirages
BP 105, 78153 Le Chesnay Cedex, France
1. INTRODUCTION

In this paper we present a new version of the APML (Affective Presentation Markup Language, [6]) representation language, called FML-APML. This new version encompasses the tags of APML as well as other tags related, for example, to world references and emotional state. The presented language has been developed in the Greta framework [12]. Greta is an ECA (Embodied Conversational Agent) that, starting from a representation of its communicative intention, plans the verbal (speech) and nonverbal signals (facial expressions, head movements, gestures) that must be produced in order to convey it. We use the FML-APML language to model the agent's communicative intention.
2. RELATED WORK: APML

APML is an XML-based markup language for representing the agent's communicative intention and the text to be uttered by the agent [6]. APML tags refer to the possible information a person may seek to communicate: information on the world, on the speaker's mind and on the speaker's identity. Based on Poggi's work [13], the APML language encodes the first and second types of information in ECAs [6]. In APML, each tag corresponds to one of the communicative intentions described in [13], namely:
• certainty: this is used to specify the degree of certainty the agent intends to express.
Possible values: certain, uncertain, certainly not, doubt.
• meta-cognitive: this is used to communicate the source of the agent's beliefs.
Possible values: planning, thinking, remembering.
• performative: this represents the agent's performative [1][14].

Possible values: implore, order, suggest, propose, warn, approve, praise, recognize, disagree, agree, criticize, accept, advice, confirm, incite, refuse, question, ask, inform, request, announce, beg, greet.

• theme/rheme: these represent the topic/comment of conversation; that is, respectively, the part of the discourse which is already known or new for the conversation's participants.
• belief-relation: this corresponds to the metadiscursive goal, that is, the goal of stating the relationship between different parts of the discourse; it can be used to indicate contradiction between two concepts or a cause-effect link.

Possible values: gen-spec, cause-effect, solutionhood, suggestion, modifier, justification, contrast.
• turnallocation: this models the agent's metaconversational goals, that is, the agent's intention to take or give the conversation floor.
Possible values: take, give.
• affect: this represents the agent's emotional state. Emotion labels are taken from the OCC model of emotion.

Possible values: anger, disgust, joy, distress, fear, sadness, surprise, embarrassment, happy-for, gloating, resentment, relief, jealousy, envy, sorry-for, hope, satisfaction, fear-confirmed, disappointment, pride, shame, reproach, liking, disliking, gratitude, gratification, remorse, love, hate.

• emphasis: this is used to emphasize (that is, to convey its importance) what the agent communicates either vocally (by adding pitch accents to the synthesized agent's speech) or through body movements (by raising the eyebrows, producing beat gestures, etc.).
Possible values: low, medium, high.
3. FML-APML OVERVIEW

In the SAIBA framework [8][17], the FML language encodes the agent's communicative intentions. FML-APML is an evolution of APML and presents some similarities and differences. The FML-APML tags are an extension of the ones defined by APML, so all the communicative intentions that we can represent in APML are also present in FML-APML. We introduced the following changes in creating FML-APML:

• Temporization of tags: APML tags have a nesting structure imposed by the way in which the language is defined. For example, the top-level tag must always be a performative tag. The other tags, for example the one representing the agent's certainty, must be nested inside a performative:
<apml>
  <performative type="inform">
    <rheme certainty="certain">
      I'm the Greta agent
    </rheme>
  </performative>
</apml>
The timing of these tags (i.e. the starting and ending of a certain communicative intention) is inferred from the duration of the text nested inside the tags. In the above example, the performative, affect and certainty communicative intentions have the same starting and duration time. It is not possible, for example, to extend the three communicative intentions for a time slightly longer than the spoken text.

In FML-APML each tag contains explicit timing data, similarly to BML tags. We also maintain coherence between the two languages defined inside the SAIBA framework. So, in FML-APML we can freely define the starting and ending time of each tag, or make tags refer to each other using symbolic labels. This also allows us to specify tags that are not linked to any spoken text. That is, with FML-APML we can define the communicative intention of non-speaking agents: for example, we can represent the listener's communicative intention (e.g. the listener can have the intention to communicate that it is approving what the speaker says).

• Emotional state: we have extended the way in which the agent's emotional state is coded. In the APML representation, we can only specify the actually expressed emotion. In FML-APML we can model more complex situations, for example, if the speaker is feeling a certain emotion but hides it by showing another, fake, emotional state [10]. We base our extension on EARL [15].

• Information on the world: when communicating with others, we could have the intention of communicating some physical or abstract properties of objects, persons, or events. For example, we can accompany speech with hand shapes that mimic the shape of an object, or perform large arm movements to give the idea of an "amazing" event. APML syntax allowed one to specify only some of these kinds of intentions, sometimes in too generic a way. In APML the signal information was erroneously considered instead of the communicative intention: for example, the deictic tag could be used to explicitly perform deictic gestures. In FML-APML, we can specify that the agent is referring to an entity in the world, and possibly to one of its properties. We leave the behavior planning system with the task of deciding whether, to refer to this entity, the agent has to perform a deictic gesture, mimic its property, etc.
In the next sections we give an overview of the FML-APML syntax: we present the FML-APML tags; then we describe the tags' attributes and temporization.
4. FML-APML TAGS: COMMON ATTRIBUTES AND SYNCHRONIZATION

FML-APML tags are used to model the agent's communicative intention. Each tag represents a communicative intention (to inform about something, to refer to a place, an object or a person, to express an emotional state, etc.) that lasts from a certain starting time, for a certain number of seconds. The attributes common to all the FML-APML tags are:
• name: the name of the tag, representing the communicative intention modeled by the tag. For example, the name performative represents a performative communicative intention [7].

• id: a unique identifier associated with the tag; it allows one to refer to it in an unambiguous way.

• type: this attribute allows us to better specify the communicative meaning of the tag. For example, a performative tag has many possible values for the type attribute: implore, order, suggest, propose, warn, approve, praise, etc. Depending on both the tag name (performative) and type (one of the above values), our Behavior Planning module determines the nonverbal behaviors the agent has to perform.

• start: starting time of the tag, in seconds. Can be absolute (time 0 corresponds to the start of the FML-APML file) or relative to another tag. It represents the point in time at which the communicative intention modeled by the tag begins.

• end: duration of the tag. Can be a numeric value (in seconds) relative to the beginning of the tag or a reference to the beginning or end of another tag (or a mathematical expression involving them). It represents the duration of the communicative intention modeled by the tag.

• importance: a value between 0 and 1 which determines the probability that the communicative intention encoded by the tag is communicated through nonverbal behavior, as well as the number of modalities on which the communication happens. We describe this attribute in detail in Section 5.
The timing attributes start and end also allow us to model the synchronization of the FML-APML tags. They both can assume absolute or relative values. In the first case, the attributes are numeric non-negative values, considering time 0 as the beginning of the FML-APML file. In the second case we can specify the starting or ending time of other tags, or a mathematical operation involving them. Note that the optional end attribute allows us to define communicative intentions that start at a certain point in time and last until new communicative intentions are defined. Here is an example of absolute and relative timings.
<FML-APML>
  <tag1 id="id1" start="0" end="2"/>
  <tag2 id="id2" start="2" end="3"/>
</FML-APML>
In the above FML-APML code, tag1 starts at time 0 and lasts 2 seconds; tag2 starts at time 2 and lasts 3 seconds. All the timings are absolute, that is, they are both relative only to the beginning of the actual FML-APML file (equivalent to time 0).
<FML-APML>
  <tag3 id="id3" start="0" end="2"/>
  <tag4 id="id4" start="id3:end+1" end="id3:end+3"/>
</FML-APML>
In this case, the first tag is the same as before. On the other hand, tag4 has a relative timing: its start is specified one second after the end of the first tag, and its end three seconds after it.

FML-APML tags can be attached and synchronized to the text spoken by the agent. This is modeled by including a special tag, called speech, in the FML-APML syntax. Within this tag, we write the text to be spoken along with synchronization points (called time markers) which can be referred to by the other FML-APML tags in the same file. For example:
<FML-APML>
  <speech id="s1">
    <tm id="tm1"/> what are you
    <tm id="tm2"/> doing
    <tm id="tm3"/> here
    <tm id="tm4"/>
  </speech>
  <tag3 id="id3" start="s1:tm2" end="s1:tm4"/>
</FML-APML>
With the above code, we specify that the communicative intention of tag3 starts in correspondence with the word doing and ends at the end of the word here.
5. FML-APML IMPORTANCE ATTRIBUTE

We say that a message is important if it has a particular relevance to the Sender's goals: if a message is important we want to be sure that it is delivered to the receiver. The same situation occurs with communicative intentions.
Not all the communicative intentions we communicate to others have the same level of importance. Poggi et al. [14] note that, in the domain of goals (not necessarily communicative goals), different people may attribute a different importance to the same goal. For example, generous people attribute high importance to the goal of being helpful toward others; an independent person attributes high importance to the goal of making choices freely and without the others' help. De Carolis et al. [3] show that in nonverbal discourse planning the association of nonverbal signs to verbal information can be done by giving goals a priority. The concept of urgency defined by Castelfranchi [5] seems to be related to importance: it is possible to sort the agent's goals depending on their urgency, and choose to display those goals which have a higher urgency value. Importance is also cited by Theune [16]. She claims that gesture frequency has to be increased if the speaker attaches a high importance to the message being communicated. For their conversational agents, Cassell et al. [4] choose to activate many modalities at the same time if the information importance is high. For example, information which is new or in contrast with respect to what has already been said is considered as having a higher priority, and thus more modalities are activated. The importance of body actions is also referred to by Nayak in [9]. In this case the importance level is directly translated into the priority of the corresponding body action, and higher priority actions are chosen first during behavior generation, while lower priority actions are discarded in case of conflict.
In FML-APML we introduce an attribute, common to each tag, called importance. Depending on its value, the agent may change the way the corresponding communicative intention is encoded. Similarly to the works discussed above, in our system the FML-APML importance attribute allows us to sort the agent's concurrent communicative intentions, giving them a higher (resp. lower) priority if their importance is high (resp. low). We ensure that more important communicative intentions are communicated first by the agent, while the least important intentions may be communicated through the communication modalities that remain free. Then, we use the same importance parameter to choose the multiplicity of multimodal behaviors. As the importance rises, we increase the number of modalities on which the agent's intentions are communicated. If, for example, importance is low and the agent is giving the user directions to reach a particular place in the environment, it produces only an iconic gesture. If importance is very high, it adds redundancy: the agent produces a deictic eye gesture (looking at the target in space), rotating the torso towards this position, while performing an iconic gesture.
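A minimal sketch of how the importance attribute could combine with the common attributes introduced in Section 4; the particular tags and attribute values below are our illustrative assumptions, not examples from the paper:

```xml
<FML-APML>
  <!-- High importance: communicated first, redundantly on several modalities -->
  <performative id="p1" type="inform" importance="0.9" start="0" end="4"/>
  <!-- Low importance: may be realized only on a remaining free modality -->
  <world id="w2" ref_type="place" ref_id="station" importance="0.2"
         start="0" end="4"/>
</FML-APML>
```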
6. EMOTION TAG

Emotion has a central role in communication, and ECAs should be able to communicate their emotional state in order to increase the effectiveness of interaction with humans. In the FML-APML language we have introduced the emotion tag, which models the speaker's felt and expressed emotional states. The former is the emotional state the speaker is really experiencing (which can be caused by an event, a person, a situation, etc.) while the latter is the one the speaker wants to communicate to the others. These two emotional states can be completely different: for example, a person can produce a "polite smile" to his superior even if he is angry at him. In general, people can show (the expressed state is the felt one), suppress (the felt state is expressed as little as possible), or mask (the expressed state is different from the felt one) their emotional state [11]. In FML-APML we model these relations between felt and expressed emotional states by including the syntax of the EARL (Emotion Annotation and Representation Language) language, described in [15]. The emotion tag allows us to specify complex emotional states, as reported in [2]. We can for example model situations in which our agent is feeling a particular emotional state but simulates another emotion, hiding the felt one. This is done by controlling the felt and expressed emotional states with the regulation attribute of the emotion tag. The possible values of the regulation attribute are:
• felt : this indicates that the tag refers to a felt emotion;
• fake: this indicates that the tag refers to a fake emotion, an emotion that the agent aims at simulating;

• inhibit: the emotion in the tag is felt by the agent but it aims at inhibiting it as much as possible.
Let us consider the following example:
<FML-APML>
  <emotion id="e1" type="anger" regulation="felt"
           intensity="0.5" start="0" end="3"/>
  <emotion id="e2" type="joy" regulation="fake"
           intensity="0.9" start="0" end="3"/>
</FML-APML>
The agent's real emotional state is medium anger (the regulation attribute of the emotion tag is set to felt; intensity is 0.5, in a range going from 0 to 1) but it wants to hide it with an intense fake happiness (the regulation attribute of the emotion tag is set to fake; intensity is 0.9).
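By analogy, the third regulation value could be written as below; this snippet is our illustrative sketch following the same syntax, not an example given in the paper:

```xml
<FML-APML>
  <!-- The agent feels strong fear but tries to suppress its display -->
  <emotion id="e3" type="fear" regulation="inhibit"
           intensity="0.8" start="0" end="3"/>
</FML-APML>
```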
7. WORLD TAG

As explained in [13], while communicating with others, we seek to convey our knowledge about the world: objects and their characteristics (size, shape, location, etc.), events (real or abstract), places (relation, distance, etc.). Compared to APML, the FML-APML language introduces a world tag to indicate this kind of communicative intention. The tag has the following attributes:
• ref type: the first attribute identifies the class of the referenced world entity: an object, a place, a time, an event. This attribute is required.

• ref id: an identifier that we can use to specify one or more world entities. This attribute is required.

• prop type (optional feature): allows us to refer to a property of the referenced entity: its shape, location or duration.

• prop value (optional feature): describes the value of the property specified with the previous attribute.
So, in FML-APML we can refer to an object in the world in a generic way, for example, if we want to refer to a book:
<FML-APML>
  <world id="w1" ref_type="object" ref_id="book"/>
</FML-APML>
Or, we can refer to the book which is on the table:
<FML-APML>
  <world id="w1" ref_type="object" ref_id="book"
         prop_type="location" prop_value="table"/>
</FML-APML>
8. CONCLUSIONS

In this paper we present FML-APML, a language which is used to model the communicative intention of an ECA. It is an extension of the previously developed APML language, and it addresses some of APML's weaknesses and missing features. We propose FML-APML as an implementation of the FML language of the SAIBA framework.
9. REFERENCES

[1] J. L. Austin. How to Do Things with Words. The William James Lectures at Harvard University 1955. Oxford University Press, London, 1962.
[2] E. Bevacqua, M. Mancini, and R. Niewiadomski. An expressive ECA showing complex emotions. In Artificial Intelligence and the Simulation of Behaviour: Artificial and Ambient Intelligence, Newcastle, England, 2007.
[3] B. De Carolis, C. Pelachaud, and I. Poggi. Verbal and nonverbal discourse planning. In Workshop on Achieving Human-like Behaviors, Autonomous Agents, 2000.
[4] J. Cassell and S. Prevost. Distribution of semantic features across speech & gesture by humans and machines. In Proceedings of the Integration of Gesture in Language and Speech, 1996.
[5] C. Castelfranchi. Reasons: Belief support and goal dynamics. Mathware & Soft Computing, 3:233–247, 1996.
[6] B. DeCarolis, C. Pelachaud, I. Poggi, and M. Steedman. APML, a mark-up language for believable behavior generation. In H. Prendinger and M. Ishizuka, editors, Life-Like Characters, Cognitive Technologies, pages 65–86. Springer, 2004.
[7] S. Duncan. The dance of communication. Interim Reports of the ZiF: Embodied Communication in Humans and Machines, 2006.
[8] S. Kopp, B. Krenn, S. Marsella, A. Marshall, C. Pelachaud, H. Pirker, K. Thorisson, and H. Vilhjalmsson. Towards a common framework for multimodal generation in ECAs: the behavior markup language. In Proceedings of the 6th International Conference on Intelligent Virtual Agents, 2006.
[9] V. Nayak. Emotional expressiveness through the body language of characters in interactive game environments. PhD thesis, Media Arts and Technology, University of California, Santa Barbara, 2005.
[10] R. Niewiadomski and C. Pelachaud. Intelligent expressions of emotions. In Affective Computing and Intelligent Interaction, volume 4738 of Lecture Notes in Computer Science, pages 12–23. Springer, 2007.
[11] M. Ochs, R. Niewiadomski, C. Pelachaud, and D. Sadek. Intelligent expressions of emotions. In J. Tao, T. Tan, and R. W. Picard, editors, Affective Computing and Intelligent Interaction, First International Conference, volume 3784 of Lecture Notes in Computer Science, pages 707–714. Springer, 2005.
[12] C. Pelachaud. Multimodal expressive embodied conversational agents. In MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia, pages 683–689, New York, NY, USA, 2005. ACM Press.
[13] I. Poggi. Mind, hands, face and body. A goal and belief view of multimodal communication. Weidler, Berlin, 2007.
[14] I. Poggi and C. Pelachaud. Performative facial expressions in animated faces. In Embodied conversational agents, pages 155–188. MIT Press, Cambridge, MA, USA, 2000.
[15] M. Schroder, H. Pirker, and M. Lamolle. First suggestions for an emotion annotation and representation language. In L. Devillers, J.-C. Martin, R. Cowie, E. Douglas-Cowie, and A. Batliner, editors, Proceedings of the International Conference on Language Resources and Evaluation: Workshop on Corpora for Research on Emotion and Affect, pages 88–92, Genova, Italy, 2006.
[16] M. Theune. ANGELICA: choice of output modality in an embodied agent. In International Workshop on Information Presentation and Natural Multimodal Dialogue, pages 89–93, Verona, Italy, 2001.
[17] H. Vilhjalmsson, N. Cantelmo, J. Cassell, N. E. Chafai, M. Kipp, S. Kopp, M. Mancini, S. Marsella, A. N. Marshall, C. Pelachaud, Z. Ruttkay, K. R. Thorisson, H. van Welbergen, and R. van der Werf. The behavior markup language: Recent developments and challenges. In 7th International Conference on Intelligent Virtual Agents, 2007.
Situation and Agency in the SAIBA Framework, and Consequences for FML

Zsófia Ruttkay
HMI, University of Twente, The Netherlands
zsofi@cs.utwente.nl
1   Introduction
The SAIBA acronym stands for Situation, Agent, Intention, Behavior, Animation, as major factors contributing to the final behavior of a humanoid. On the SAIBA web page, and in publications stemming from the initiative, the first two factors are somehow not dealt with further, though the importance of the "perception loop" has been stated also recently [1, ?]. In the SAIBA framework, the following three major processing stages and corresponding major modules are identified [?]:

1. Planning of a communicative intent
2. Planning of multimodal behaviors that carry out this intent
3. Realization of the planned behaviors

Steps have been made towards specifying the interface BML (Behaviour Markup Language) between stages 2 and 3, and the topic of this paper (and workshop) is to make the first step to propose a similar language between stages 1 and 2. In connection with BML, a closer look has shown that there are several critical issues which are hard to incorporate into either the behaviour planning or the behaviour realization stage. For instance, actual characteristics of the environment the agent is acting in (visibility, characteristics of the ground, location of moving objects) may have an influence on the details or the choice of the behaviour to be realized. It has been suggested that feedback from the "world" is needed to be able to plan certain behaviours: e.g. if it is misty, use big hand waves and a loud voice (selection of modality, amplitude); if the terrain is very rough, one cannot run very fast (timing of behaviour). Obviously, if one has a sophisticated realizer, capable of taking care of physical balancing and simulation, there is no need to specify the behaviour with respect to the surface conditions, as the realizer will take care of this aspect. On the other hand, for a realizer without such a feature, the planner has to provide as much detail about the behaviour to be realized as possible, and the realizer will just literally realize the specified behaviour. Hence, as long as we do not settle what the responsibilities of the two involved modules are, we open the space for, and the need for, different variants. In the case of BML, this led to the notion of level-of-detail specifications, and the differentiation between core and proprietary language elements. See Figure 1.
Figure 1. The core BML specification of a behavior can be further refined through greater levels of description, while namespaces can provide general extensions. From [6].
Moreover, it was noticed that feedback from realization to behaviour planning is essential, to know whether the behaviour could be realized as planned, with some modification, or not at all. It was also noted that somewhere an update of the frame of the world is to be taken care of. For instance, in the FearNot! system, the effect of a push may be that the other agent has fallen, or that the self agent got hurt. When such issues related to behaviours were discussed at the BML workshops, a feedback mechanism was elaborated on, and the role of intent planning was also raised.
We believe that it is necessary to identify the cast of roles of modules and factors at the very beginning of the work on FML design. In this short paper we would like to point out factors of the Situation and characteristics of the acting Agent which do have an influence on the behaviour. The major question is how to incorporate these factors into the SAIBA framework; particularly, how they should be distributed among the major processing modules. As a consequence, what are the requirements posed for the FML language?

We address the Situation- and Agent-related issues in the coming two chapters. In Chapter 4, we give concrete examples, with the intention that they will serve as cases for discussion. Finally, we sum up questions and recommendations related to the FML specification.
2   The Agent
In connection with planning communicative function and behaviour, the following characteristics of the acting agent are relevant:
• Perception capabilities: Can the humanoid "see", "hear", maybe sense heat? From the point of view of a highly modular architecture, and for efficiency of implementation, it is relevant how perception is realised: e.g. whether sensing by vision is simulated vision as in [4], or whether the perceived portion of the world the agent is acting in is derived by other, non-vision-based means.
• (Bodily) action capabilities: What are the traditional output modalities of the humanoid: can he talk, move around? What other means (of locomotion, of augmented communication) does the agent have? What is its physical state, which may influence these bodily capabilities, e.g. hands full, or exhausted?
• Knowledge of the world: That some "world knowledge" is necessary for agency is AI common sense. For the sake of social and communicative behaviour, some knowledge of the physical characteristics and social protocols of communication is needed: e.g. from what distance one can be heard, at what distance to stand in front of a person to chat with him, what the social status of the person to be greeted is, etc.
• Agent's identity: The identity of the agent, in terms of age, gender, social status and personality, but also changing factors such as mood and physical or mental state, as well as a personal history and past affairs with, and relationship to, the interlocutor all "can be seen" in what and how people are doing when communicating with each other. Hence these factors must be available to tune the intent and behaviour of humanoids.
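The four groups of characteristics above could be bundled into a single agent profile that both intent planning and behaviour planning can consult. A minimal sketch; all field names and values are illustrative assumptions, not part of any SAIBA proposal:

```python
# Sketch of an agent profile covering the four characteristic groups above.
from dataclasses import dataclass

@dataclass
class AgentProfile:
    perception: list        # perception capabilities, e.g. ["vision", "hearing"]
    action_modalities: list # bodily action capabilities, e.g. ["speech", "face"]
    world_knowledge: dict   # e.g. physical/social protocol facts
    identity: dict          # age, gender, status, personality, mood, history

# Profile of the professor H from the case study in Chapter 4
H = AgentProfile(
    perception=["vision", "hearing"],
    action_modalities=["speech", "face"],   # corpulent: avoids locomotion
    world_knowledge={"hearing_distance_m": 10},
    identity={"personality": "jovial", "build": "corpulent"},
)
```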
3   The Situation
The humanoid is communicating with his real or virtual interlocutor in a real or virtual world situation. The world situation may be characterised as a type (e.g. open public space, restaurant, office, ...). Some general knowledge about actors and activities in these situations may be used, but concrete parameters (e.g. location) of specific participants may be of interest too. The following aspects of the situation are relevant:
• Visibility and audibility between the humanoid and the interlocutor: Where are the addressee and the topic of communication? Is he/she visible? What are the obstacles of the world? How is the time of the day, the lighting? How is the noise? Is a referred object visible for both the humanoid and his addressee?
• Other physical circumstances of the situation: The ground the humanoid is to navigate on, how crowded the environment is to move around in, how "busy" (that is, dynamically changing) the environment is.
• The social aspects of the situation: What is the current location (official/private, closed or open), what is the stage of the event going on (e.g. somebody presenting, a ceremony going on, ...), what is the personal and formal relationship between the conversants?
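The three aspect groups above could likewise be collected into one situation record that the planning modules query. A sketch; every key and value is an illustrative assumption:

```python
# Sketch of a situation record mirroring the three aspect groups above.
situation = {
    "visibility": {"addressee_visible": False, "lighting": "indoor",
                   "noise": "medium", "referred_object_visible": True},
    "physical":   {"ground": "flat", "crowdedness": "high", "busy": True},
    "social":     {"location_type": "restaurant", "event": "dinner",
                   "relationship": "colleague"},
}

def lookup(aspect: str, key: str):
    """Query one parameter of the current situation."""
    return situation[aspect][key]
```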
4   Case studies of greeting
Below we take some examples of greetings under the loop, and identify how some of the aforementioned factors influence the intent and/or behaviour planning stages.

In our example we assume a virtual restaurant with a waiter (W) and guests (G1, G2, ...). Our humanoid of interest, H, is a professor, and he is sitting in his favourite restaurant. He has an appointment with a friend F; he is checking if F is already there, but cannot see him. He notices though that one of the guests, G1, is a colleague he needs to talk to. G1 is sitting with his back to H, engaged in deep conversation with the other person, G2, at his table. H decides to wait for the right moment to approach G1. Meanwhile some new people, G3 and G4, arrive. G3 is an ex-colleague whom H does not like; G4 is a student he knows by face. The waiter W, a good acquaintance of H, shows up to take his order.
In this scene, we expect the following greetings to take place, in analogy to a real-life situation:
1. H is gazing around, even checking some remote corners by standing up, to find and greet F, but as he gets convinced that F is not there, decides to sit down and wait.
2. H sits in a position to be able to see what G1 is doing, as his intention is to greet him when the right moment arrives.
3. The waiter W greets H, in an informal way, but only if he has not already done it earlier.
4. G3 greets H with a smile and a bow, but H avoids eye contact with G3, and pretends that he has not noticed him.
5. G4 greets H politely, and H nods back to his student.
S<.=' &4' %4*34*' E.%4*' .$' A%32?' 23' 7&A3' 0&"3"' .$' ' Q4.*' *.R' 0%+''!2& ,3& (-!'-!1' 673'
$.55.2%4@'&"E30*"'%4$5)3403'27&*'@<33*%4@'2%55'*&+3'E5&03'%4'3&07'0&"3K'
'
1. Personality of H: H is a jovial person, so he usually greets people he knows, and returns greetings, even by unknowns. This is like a reflex: if somebody nods at him, he nods back.
2. Bodily capabilities of H: H is rather corpulent, so he does not like to move around much, but he uses his voice and face.
3. The physical situation: As H is out of sight of G1, H can either shout at him or walk up to him. However, G1 sits too far from him to be addressed by speech, so H checks how he could walk up to him in the rather crowded place.
4. The social aspect of the situation: It is not polite, and not promising, to interrupt a conversation the other person is highly involved in.
5. Relationship between the interlocutors: H does not take the initiative to greet G4, nor W or G3. However, he returns their greetings in different ways: shaking hands with W in a jovial way, nodding back to G4, but hiding himself behind the menu card to avoid gaze contact with G3.
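How these five factors might combine can be sketched as a small decision rule. The rules below merely paraphrase the case study; the predicate names and their ordering are our own illustrative assumptions:

```python
# Sketch: combining the five factors above into a (not to) greet decision.
def plan_greeting(jovial, likes_other, other_engaged, within_speech_range):
    if not likes_other:
        return "avoid gaze"                # factor 5: relationship (G3)
    if not jovial:
        return "no greeting"               # factor 1: personality
    if other_engaged:
        return "wait for the right moment" # factor 4: social situation (G1)
    if not within_speech_range:
        return "walk up, then greet"       # factor 3: physical situation
    return "greet now"
```

With H's parameters, the liked but deeply engaged colleague G1 yields a deferred greeting, while the disliked ex-colleague G3 yields gaze avoidance.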
Where, and on what level(s), to take care of these distinct parameters? One extreme approach is to deal with everything on the cognitive level, and create a communicative intent "greet" of low granularity, with parameters in FML such as:

<fml> greet speaker="PERSON1" addressee="PERSON2"
      speaker_personality="jovial" speaker_mood="lazy"
      distance="close" environment_noise="medium" ... </fml>
One may argue that the above parameters only influence how the intent of greeting is to be realized in terms of bodily behaviour, so they should be taken care of on the behaviour planning level. The behaviour planner then is informed about the greeting intent, with parameters:

<fml> greet speaker="person1" addressee="person2" location="restaurant1" </fml>
Then the Behaviour Planner has access to details on PERSON1 and PERSON2, and their exact location in RESTAURANT1, and has knowledge about circumstances like the noise and the arrangement of tables in RESTAURANT1. This may lead to a complex greeting behaviour, consisting of the behaviour steps wait for turn opportunity, approach PERSON1, wait for gaze contact with PERSON1, and initiate hand shake with PERSON1.
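The expansion of the coarse greet intent into those behaviour steps can be sketched as follows; the step strings come from the text, while the expansion logic and flags are illustrative assumptions:

```python
# Sketch: a behaviour planner expanding a coarse "greet" intent into the
# behaviour steps listed above, conditioned on the situation.
def expand_greet(target: str, has_turn: bool, has_gaze_contact: bool):
    steps = []
    if not has_turn:
        steps.append("wait for turn opportunity")
    steps.append(f"approach {target}")
    if not has_gaze_contact:
        steps.append(f"wait for gaze contact with {target}")
    steps.append(f"initiate hand shake with {target}")
    return steps
```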
5   Summary of challenges
The issues discussed so far can be summarized as a list of challenges: questions to be decided upon a priori, before setting out with the design of FML. They involve the language of FML, the feedback between the modules, and the placement of, and access to, knowledge within or outside the three modules of Figure 1 in [6].
G1 V7&*' *,E3' .$' %4*34*"' &<3' 23' 5..+%4@' &*X' D45,' 0.==)4%0&*%A3?' .<' &5".'
5.0.=.*%A3?'=&4%E)5&*%A3' &4>' .E3<&*%.4&5X'94'&'#&D&+.(&;( "'#,'#9( %"' *.'
C3'>3A35.E3>?'27%07'=&,'C3'3[*34>3>1'''
2. What are the basic, elementary intent units? Do we model routine or reflex-like behaviors based on intent (e.g. getting information about the world by vision)? If yes, we end up with a verbose description of (alternatives of) intents and computationally slow feedback from the Behaviour Realization. If not, where are these "idle intents" dealt with?
3. How to model and maintain the "life-time" of intents, their co-existence, their priorities? An event of the world may make an intent obsolete, or lower its priority.
4. In the SAIBA framework nothing is assumed about the source and form of knowledge concerning the (changing) world the behavior is taking place in. The same is true about the identity (and other characteristics) of the humanoids in the (real or physical) world. However, many aspects of the world do matter, as preconditions or selection criteria for planning an intent. Where, and in what format, to store relevant world information, and how to get access to it?
H1 T.2' %"' *73' 89:;9' $<&=32.<+?' &4>' E&<*%0)5&<5,?' SOP?' <35&*3' *.' &#/,3(
-&+'"#"R,( %3-/"#,-#$3,9?' ")07' &"' FM?' NIX'V7&*' 0&4'23' 53&<4' $<.=' *73"3'
,**1(.,!(#-3J& ;,-& !"'3'& 4'& $3'5& !#& #$!1(-'& ,& 0.#+'& +'K$(+')'-!32& ,3& #/&
3[E<3""%A%*,' .$' SOPX' ' V7&*' "7.)5>' C3' *73' <35&*%.4"7%E' *.' ,4&#"&'(
4%36$B'FbIX'
6. Particularly, how to interpret low-level feedback from the behavior realizer in order to give up an intent and, based on the information, generate another, different one?
6   Recommendations
Below I put forward my ideas on how to deal with some of the issues raised.
1. Let's list the objectives and envisioned merits of using/creating an FML. I think of merits such as:
   a. designing, defining and maybe allowing styled/individual multimodal behavior expressing functions,
   b. analyzing behavior of humanoids on a functional level, in a uniform language.
2. Let's adopt/develop an ontology of functions (intents), with categories like biological functions (e.g. not to get stiff, to gather information), communicative functions with sub-categories such as presenting, or conversing with a single interlocutor or with a varying number of them, and functions related to domains like (verbal) tutoring, or controlling e.g. an orchestra or traffic.
3. A Core FML, with a corresponding functionary (analogous to the gestuary), may be designed for communicative functions. Domain-specific collections of functions may be "plugged in" to enrich behavior.
4. In our ontology let's be conscious about the time and durational aspects of the functions, both when designing the description of functions and when processing them. E.g. expressing a state as long as no explicit "end of state" intent is given (such as expressing an emotional state), versus expressing an intent as long as the goal is not achieved, but with a maximum duration (e.g. getting the floor in conversation). In the latter case, mechanisms to check the fulfillment of the intent, and to provide feedback about it, have to be devised.
5. In the definition of functions, preconditions and consequences, as well as, possibly, decomposition into more refined functions, should be dealt with. For instance, a precondition for "greet from close" may be proximity, with a realization of a nod, or a smile, and/or some verbal greeting, while "greet from a distance" assumes a distant but visible other person. It should be decided where and when these conditions are checked. Similar considerations apply for assumed and expected consequences of a function, to be checked or waited on to become true (e.g. the greeting being returned).
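Recommendation 5 can be sketched as function definitions carrying explicit precondition and consequence predicates. The distance threshold and the field names are assumptions; the realizations and the returned-greeting consequence come from the text:

```python
# Sketch: a function definition with a precondition, candidate realizations,
# and an expected consequence to be waited on (recommendation 5).
GREET_FROM_CLOSE = {
    "precondition": lambda ctx: ctx["distance_m"] < 2.0,  # proximity (assumed threshold)
    "realizations": ["nod", "smile", "verbal greeting"],
    "expected_consequence": lambda ctx: ctx.get("greeting_returned", False),
}

def applicable(function: dict, ctx: dict) -> bool:
    """Check a function's precondition against the current context."""
    return function["precondition"](ctx)

def fulfilled(function: dict, ctx: dict) -> bool:
    """Check whether the expected consequence has become true."""
    return function["expected_consequence"](ctx)
```

Where and when such checks run, in the intent planner or the behaviour planner, is exactly the open question the recommendation raises.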
6. Finally, it may be useful to analyse, formally or informally, how complex functions and their consequences are communicated in mediated worlds. For instance, greeting seems to be, on the one hand, an every-day and highly routine act and, on the other hand, one with a multitude of variants and subtleties. How are greetings expressed and returned e.g. in Second Life? What is the state of the art, as of situated expressivity?
Acknowledgements
This research has been supported by the GATE project, funded by the Netherlands Organization for Scientific Research (NWO) and the Netherlands ICT Research and Innovation Authority (ICT Regie).
References
1. Bevacqua, E., Raouzaiou, A., Peters, C., Caridakis, G., Karpouzis, K., Pelachaud, C., Mancini, M. (2006). Multimodal sensing, interpretation and copying of movements by a virtual agent. In: Proceedings of Perception and Interactive Technologies (PIT'06).
2. Funge, J., Tu, X. and Terzopoulos, D. (1999). Cognitive Modeling: Knowledge, Reasoning and Planning for Intelligent Characters. In: Proc. of SIGGRAPH 99, Los Angeles, CA.
3. Kopp, S., Becker, C., Wachsmuth, I.: The Virtual Human Max - Modeling Embodied Conversation. In: KI 2006 - Demo Presentation, Extended Abstracts, pp. 21-24, 2006.
4. Peters, C. (2006). A Perceptually-Based Theory of Mind Model for Agent Interaction Initiation. International Journal of Humanoid Robotics (IJHR), 3(3), pp. 321-340.
5. Thórisson, K. R. (2007). Avatar Intelligence Infusion: Key Noteworthy Issues. Keynote presentation, 10th International Conference on Computer Graphics and Artificial Intelligence, 3IA 2007, Athens, Greece, May 30-31, 123-134.
6. Vilhjálmsson, H., Cantelmo, N., Cassell, J., Chafai, N., Kipp, M., Kopp, S., Mancini, M., Marsella, S., Marshall, A., Pelachaud, C., Ruttkay, Zs., Thórisson, K., van Welbergen, H., van der Werf, R. (2007): The Behavior Markup Language: Recent Developments and Challenges. In: C. Pelachaud, J.-C. Martin, E. André, G. Chollet, K. Karpouzis, D. Pelé (eds): Intelligent Virtual Agents, Proc. of IVA'07, Paris, LNAI 4722, pp. 99-111. Springer Verlag, Berlin.
7. W3C Incubator Group Report (2007). http://www.w3.org/2005/Incubator/emotion/XGR-emotion/
Applying the SAIBA framework to the Tactical Language and Culture Training System

Prasan Samtani, Andre Valente and W. Lewis Johnson
Alelo, Inc
11965 Venice Blvd, Los Angeles CA 90066
{psamtani,avalente,ljohnson}@tacticallanguage.com
1. INTRODUCTION
The Tactical Language and Culture Training System (TLCTS) helps learners acquire basic communicative skills in a foreign language and culture. The system is broadly divided into two main sections. In the Skill Builder, learners are coached through a set of lessons on language and culture by a virtual tutor. The tutor offers attributional and motivational feedback in order to keep the learner on track [4].

Following the acquisition of skills, learners must proceed to complete missions in a simulated environment populated with virtual characters - the Mission Game. Learners can speak to AI characters in the game through a microphone, and can select appropriate gestures using the mouse. In order to successfully complete missions, the learner must display a mastery of the specific linguistic skills in the target language, as well as a knowledge of the culture. The learner is accompanied by an aide character, who can offer recommendations if the player gets stuck.
Figure 1: TLCTS Mission Game
Three training systems have been built so far using TLCTS: Tactical Iraqi (for Iraqi Arabic), Tactical Pashto (for Pashto, spoken in Afghanistan), and Tactical French. Tactical Iraqi is currently in use by thousands of learners in the US armed
Cite as: Title, Author(s), Proc. of 7th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2008), Padgham, Parkes, Müller and Parsons (eds.), May 12-16, 2008, Estoril, Portugal, pp. XXX-XXX. Copyright © 2008, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
forces as they prepare for deployment overseas. Reports from users suggest that the program is engaging and highly effective. Anecdotal reports indicate that trainees start acquiring basic communication skills very rapidly, and acquire basic functional proficiency in conversational Arabic relevant to their mission in just a few weeks.
The Mission Game is fundamentally a social simulation. Processing and generating multimodal behavior is a fundamental component of the Tactical Language and Culture Training System. The user controls a player character and must communicate with agents in the scenario in order to achieve his/her goals. The agents receive a stimulus from the user and produce an output intent. A key challenge in these types of simulations is to produce the behavior of autonomous non-player characters given abstract specifications of communicative intent. This process has been documented extensively in previous publications [4][5].
In this paper, we describe the framework we adopted for generating multimodal behavior in TLCTS and draw implications for the design of FML. We start by describing the framework we adopted, which closely matches the SAIBA Framework1. Then, we discuss the representation we created for communicative acts in TLCTS. We compare this representation to elements of FML, and propose some modifications to FML as well as the addition of a language to represent context information. Finally, we present our conclusions.
2. BEHAVIOR GENERATION IN TLCTS
Previously, we had simplified the behavior generation problem in TLCTS by (a) treating the inputs and outputs to the agents as symbols and (b) maintaining a strict one-to-one mapping from intent to behavior (see figure 2). As our system was scaling up, this was no longer a feasible solution, as it forced authors to micromanage the behavior. Any input from the user that does not map perfectly to the set of acceptable output symbols produced a generic "What was that?" response from the agents.
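The old one-to-one scheme, including its generic fallback, can be sketched as a plain lookup table; the symbol names are illustrative, but the fallback response is the one described above:

```python
# Sketch of the old TLCTS pipeline: a strict symbol-to-symbol mapping with a
# generic fallback for any unmapped input.
RESPONSES = {
    "greet-symbol": "return-greeting-symbol",
    "ask-directions-symbol": "give-directions-symbol",
}

def respond(input_symbol: str) -> str:
    # Unmapped inputs all collapse to the generic "What was that?" response.
    return RESPONSES.get(input_symbol, "what-was-that")
```

Every new input an author wanted handled required a new table entry, which is the micromanagement problem the new pipeline addresses.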
The process of intent planning in the current-generation version of TLCTS is realized through a state machine. Earlier versions of TLCTS used PsychSim, a social simulation framework developed at USC/ISI [8]; however, its usage was discontinued for runtime performance reasons. The importance of following social norms with respect to the target culture has to be encoded implicitly within the state machine description by the authors. While this approach had
1http://wiki.mindmakers.org/projects:saiba:main/
worked excellently when the system was small, it began to lead to inconsistency as the system grew in size. Other limitations include:
1. It does not lend itself to distributed processing because all processing is done at a single module (the Mission Manager).
2. The input processing is too simple, relying on a mapping process to match a speech act to each input (utterance and/or gesture).
3. Speech acts are currently represented as symbols, which does not easily lend itself to the representation of complex speech acts; e.g., acts that need parameters ("what is the name of this object?").
4. The behaviors of all agents are decided based on a centralized finite state machine, which is computationally efficient but is limited in expressivity and hard to maintain.
5. Conversational outputs are produced manually with bundles of speech recordings and animations, and selected by the finite state machine.
6. It assumes that there is only one player participating in the simulation.
Figure 2: Old pipeline
To solve these limitations, we designed an improved behavior generation process for TLCTS, shown in figure 3. The basic process adopts the SAIBA Framework [6]. The latest user input (in the form of a speech act) is passed to agents that perform intent planning; that is, decide which communicative acts (if any) to perform. The action is specified as a communicative act (usually a speech act) that goes through a behavior generation step that ultimately produces character behavior that is realized through the game engine (currently Unreal Engine).
To generate culturally and contextually appropriate behavior that automatically adapts based on the current social and environmental context, we must encode a significant amount of knowledge about the dialog, the world, and the target culture. Some of these areas are well researched. For example, the maintenance of dialog context is a popular
Figure 3: New output pipeline.
topic amongst researchers. The three knowledge bases (dialog context, cultural context and environmental context) carry separate sets of information. For example, the dialog context contains knowledge such as the current and previous topics of conversation and the level of formality of the dialog. When generating behavior, the translation rules select appropriate behaviors; for example, selecting "marHaba" in an informal Arabic conversation and "as-salaamu 9aleykum" in more formal ones. The cultural ontology specifies appropriate behaviors for an intent in a particular culture; for example, nodding could show approval or disapproval depending on the culture. The corresponding input pipeline also uses these knowledge bases.
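A translation rule of the kind just described can be sketched as a lookup keyed on the culture and the formality level drawn from the dialog context. The table shape is an illustrative assumption; the two Arabic greetings are the ones from the text:

```python
# Sketch: a translation rule selecting a surface greeting from the dialog
# context, as in the Arabic example above.
GREETING_RULES = {
    ("arabic", "informal"): "marHaba",
    ("arabic", "formal"):   "as-salaamu 9aleykum",
}

def select_greeting(culture: str, formality: str) -> str:
    """Pick the surface form of a 'greet' act for the given context."""
    return GREETING_RULES[(culture, formality)]
```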
3. REQUIREMENTS FOR REPRESENTING COMMUNICATIVE ACTS IN TLCTS
Developing a system to model dialog in TLCTS is complex, because communicative acts are used within the system for both generation and interpretation. There are a large number of potential inputs a learner can produce, and we need a system that can effectively manage all of them without undue effort from the authors. On the speech recognition side, we introduced the use of utterance templates, which use a more flexible grammar definition syntax. For more details on utterance templates, see [7].
Using communicative acts that are useful for interpretation places a unique requirement on our system. Our communicative acts need to be more content-rich than they would be if they were simply used for generation of behavior. We identified three main roles for communicative acts within TLCTS:
To specify function: What does the act do? Does it inform, make a request, offer a greeting, accept a proposal?
To modulate function: How polite or forceful was the speaker? What level of redress was used when making a request?
To specify the context under which the act is applicable: Is it directed at a male or a female listener? Is it used between people of equal standing or to someone of higher standing?
In addition to the requirements of authorability and flexibility, our development of systems for multiple target languages and cultures places a new requirement: adaptability. Our plan is to construct libraries of communicative acts that are reusable across different languages. However, each target culture has its own definition of what acceptable behavior is. Therefore, we need to be able to define how different communicative acts are interpreted in different cultures and situations. In order to do this, we set about creating a unified representation of context, further described in Section 5.1.
4. THE TLCTS COMMUNICATIVE ACT ONTOLOGY
Several researchers have worked on creating representations of communicative intent. We initially looked at FrameNet, an online lexical resource based on semantic frames [1]. FrameNet possesses many of the features that we require within a representation language; however, it is significantly larger and more complex than we would like. Since it is used by the natural language community, it tends to contain extraneous linguistic information that we do not require (such as word senses). Its size and complexity are also a barrier to authorability. However, it has been an excellent resource for us as we create our ontology, and we adapted several of its concepts.
We then looked at the paper by David Traum and Elizabeth Hinkelman [9], which presents a typology that identifies the various functions of communicative acts. A summary of this is presented in table 1. We adopted this as the starting point of our ontology.
Turn-taking: These acts model taking and receiving the turn within a conversation (take-turn, release-turn, keep-turn, assign-turn)
Grounding: These acts are used to frame the core speech acts (initiate, continue, ack, repair, req-repair, req-ack, cancel)
Core Speech Acts: These are the traditional types of speech acts (inform, whq, ynq, accept, request, reject, suggest, eval, req-perm, offer, promise)
Argumentation: These are used to build more complex actions out of the core speech acts (elaborate, summarize, clarify, q&a, convince, find-plan)

Table 1: Traum and Hinkelman's typology of speech acts
We found that several categories of core acts in Traum's original typology could be seen as subclasses of one another. In addition, there are acts that were not covered by the ontology - including offering greetings, thanks, support etc. Based on the work done by Feng, Hovy, Kim and Shaw at ISI [3], we added a new class of core speech acts called 'social' acts, which include the aforementioned types. The resulting specification of communicative acts is shown in table 3. The different classes for core acts are organized in a hierarchy, shown in table 2.
4.1 Modulating communicative acts
We mentioned that the structure of communicative acts was divided into three broad categories. The first, describing function, was described in the above section. The second category describes how communicative acts are modulated.
core
  inform
    accept, reject, offer, promise
  request
    request-info
      whq, ynq
    request-action
  eval
    compliment, criticize
  social
    greet, thank

Table 2: Class Hierarchy for core acts
There are many possible models for modulating communicative acts. Our interest is primarily in models of politeness. We adopted a model of politeness based on Brown and Levinson's theory of politeness [2]. The constructs used are the following:
Degree of imposition: The degree of imposition inherent in the act itself, independent of the politeness tactics used. For example, asking someone to lie down and put their hands flat on the ground is inherently more imposing than asking them the time.
Negative face threat: Degree to which the act affects the receiver's negative face (desire to remain autonomous).
Positive face threat: Degree to which the act affects the receiver's positive face (desire to be appreciated/avoid criticism).
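An act annotated with these three modulation values can be sketched as a small record. The field names and the 0-to-1 scale are illustrative assumptions; the two example acts are the ones contrasted above:

```python
# Sketch: a communicative act carrying the three Brown-and-Levinson-style
# modulation values described above (assumed 0.0-1.0 scale).
from dataclasses import dataclass

@dataclass
class CommunicativeAct:
    function: str                 # e.g. "request-action" from the core hierarchy
    degree_of_imposition: float   # inherent in the act itself
    negative_face_threat: float   # threat to the hearer's autonomy
    positive_face_threat: float   # threat to the hearer's desire to be appreciated

hands_on_ground = CommunicativeAct("request-action", 0.9, 0.8, 0.3)
ask_the_time    = CommunicativeAct("request-action", 0.1, 0.1, 0.0)
```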
4.2 Specifying applicability
The final part of the communicative act specification deals with when an act is applicable. Certain communicative acts are only appropriate at certain times of the day (for example, "Good night"). Others are only applicable when the receiver is female ("Yes ma'am"). We need a way to specify the context in which a communicative act is applicable. This is separate from the description of context itself, which describes the actual context in which the conversation takes place. In particular, this feature of being able to specify the applicability of a communicative act is especially useful when there is a mismatch between the actual context and the applicable context of the communicative act - for example, saying "Yes ma'am" to a male commanding officer. Functionally, this act is very similar to "Yes, sir", but it is likely to be interpreted as a serious insult. Therefore, all communicative acts are interpreted based on the current context.
5. MATCHES AND MISMATCHES WITH FML
Since at a high level we are adopting the SAIBA framework, we decided to evaluate whether we could adopt its representation languages, FML and BML. We examined the
Grounding
  initiate: Start the grounding of the conversation; typically includes greeting, pleasantries
  continue: Continue the grounding of the conversation, for example, introducing oneself, asking for information
  ack: Acknowledge an action/suggestion made by the hearer
  repair: Repair ground that was lost by making an inappropriate/offensive statement
  req-repair: Request the receiver to initiate repairing the grounding of the conversation
  req-ack: Prompt the listener for an acknowledgement (for example: "How does that plan sound?")
  cancel: Nullify the impact of a previous action (for example: "Disregard what I said earlier")

Core-Act
  inform: Informs the hearers about some subject (person/place/question)
  accept: Informs the hearer that a particular offer has been accepted
  reject: Informs the hearer that a particular offer has been rejected
  offer: Propose a suggestion/action/deal to the hearer
  promise: A promise to the hearer that a particular action will be performed at a later time
  request: Can be a request for information or an action
  request-info: A request for some information
  whq: A request for specific information about a subject ("What is your name?")
  ynq: A simple yes/no OR true/false type question - does not have to be a yes or no - e.g. "Was he young or old?"
  request-action: A request for some action to be performed ("sit down")
  eval: Offers an evaluation to the hearer
  compliment: A positive evaluation
  criticize: A negative evaluation
  social: An act that only has a social function (and no other)
  greet: Serves to initiate a conversation
  thank: Thank a hearer for something they have done/will do; can also be used sarcastically

Table 3: Core Act Specification
structure of BML, and found it extremely suitable with regards to our needs. While we have not implemented a BML realizer, we use a messaging system to realize behavior, and we perceive that incorporating BML should not be too difficult.
However, we feel there are some fundamental mismatches between our needs for representing function and the current proposal for FML. First, part of the mismatch is related to the lack of context in the SAIBA framework. We feel that the mapping from FML to BML cannot take place without the use of context, and propose that a unified description of context (both cultural and environmental) is absolutely necessary within the SAIBA framework. In addition, we believe that there are some representational problems even at the top level structure of FML, which need to be addressed in order for SAIBA to be a complete framework for a wide variety of interactive social simulations. Below we outline what such a representation of context should contain.
Second, the structure of FML as proposed seems to invert core and auxiliary elements. We believe that at the top level, a unit of function needs to represent a communicative act (as we do in our system). Our understanding is that the closest match to a communicative act in the current FML is the performative element.
We therefore propose that the performative element should be the top level element within an FML block. Our analysis of communicative acts indicates that the remaining elements above performative are either unnecessary or out of place in the structure. For example, the decision of whether to take the turn by force or to wait for the turn to be freed should be made by the agent, and thus it is not an attribute of the communicative function - see [10] for more details. Further, we recommend that the topic element should be moved within the performative element.
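As an illustration of this restructuring (the tag and attribute names below are our own hypothetical choices, not a spec), an FML fragment with the performative at the top level and the topic nested inside it could be built like this:

```python
# Sketch only: constructs a hypothetical FML fragment in which the
# performative is the top-level element and topic is moved inside it.
import xml.etree.ElementTree as ET

fml = ET.Element("fml")
perf = ET.SubElement(fml, "performative", {"type": "inform"})
ET.SubElement(perf, "topic", {"name": "directions"})  # topic nested inside
ET.SubElement(perf, "content").text = "the market is north of here"

print(ET.tostring(fml, encoding="unicode"))
```

The point is structural: everything that characterizes the communicative act hangs off the performative element, rather than the performative being buried among sibling elements.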
5.1 Representing context - a new language?
As mentioned above, a fundamental issue with the original SAIBA framework is that there is no explicit representation of context. In our opinion, the mapping from FML to BML cannot take place without a very detailed representation of context. For example, even an extremely simple task such as exiting a conversation is dependent on the time of day ("Have a great day" vs "Good night"). An alternative to maintaining an explicit context could be to move this information into the FML representation. However, we believe this is against the goal of SAIBA, which is to modularize the generation of communicative behavior into function and realization.
We therefore propose that there should be a formal language explicitly representing context. We tentatively name this new language "Context Markup Language" or CML. CML would be divided broadly into three modules:
Dialog context: Includes the history of what has happened in the dialog, the current topic of conversation, the level of tension, etc. This is updated very frequently.
Environmental context: This includes information about the time of day, current setting (certain settings, like places of worship, can influence how certain communicative functions should be realized), etc. This is updated less frequently (whenever the location changes or at significant points in time, such as noon or dusk).
Cultural context: Provides information on the culturally appropriate way to express certain communicative functions. For example, the palm-over-heart gesture to express sincerity among Iraqis, or the folded hands to express respect in Hindu culture. This can be considered read-only for each culture and does not change. This module would also contain representations of social norms - what is acceptable and unacceptable in a culture, and what are appropriate (or at least commonly acceptable) responses to norm violations.
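The three modules and their very different update rates can be sketched as follows (a hypothetical illustration; every field name and value here is invented, not part of the CML proposal):

```python
# Hypothetical sketch of the three proposed CML modules.
context = {
    "dialog": {        # updated very frequently
        "history": ["greet", "inform"],
        "topic": "directions",
        "tension": "low",
    },
    "environment": {   # updated when location or time-of-day changes
        "time-of-day": "night",
        "setting": "market",
    },
    "culture": {       # effectively read-only for a given culture
        "sincerity-gesture": "palm-over-heart",
    },
}

def realize_exit(ctx):
    # The same communicative function (exiting a conversation) is
    # realized differently depending on the environmental context.
    if ctx["environment"]["time-of-day"] == "night":
        return "Good night"
    return "Have a great day"
```

Keeping these fields out of FML itself is what lets an FML block state only the desired function, with the FML-to-BML mapping consulting the context module at realization time.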
If context is explicitly maintained in a separate module, it is possible for FML to focus solely on representing the desired function of a communicative act. In our view, FML should not encode any information on how this act is to be realized.
We have started a research project to define the detailed structure of CML. Our working assumption is that CML should be based on an ontology of context. Most of the content inside the modules will be propositional - mostly facts and statements. Appendix A provides an example of an ontology representing environmental context, written in KIF syntax and implemented in the knowledge representation system PowerLoom2. We intend to propose a detailed XML structure at a later date.
6. CONCLUSION
The TLCTS Mission Game is a practical, heavily used social simulation that aligns well with the SAIBA framework. However, we found that some features in the original FML proposal did not match our needs. Based on these needs and our experience in social simulations, we recommended some modifications to FML that will make it a better match with our needs and, we believe, the needs of other groups as well. In addition, we proposed that an additional language called CML should be added to SAIBA, unifying the representation of context information as an aid in the translation from FML to BML for a target situation and culture.
7. REFERENCES
[1] Baker, Collin F., Fillmore, Charles J., and Lowe, John B. The Berkeley FrameNet project. In Proceedings of the COLING-ACL, 1998.
[2] P. Brown and S. Levinson. Politeness: Some universals in language usage. Cambridge University Press, 1987.
[3] D. Feng, E. Shaw, J. Kim, and E. Hovy. Learning to detect conversation focus of threaded discussions.
[4] Johnson, W.L., Beal, C., Fowles-Winkler, A., Lauper, U., Marsella, S., Narayanan, S., Papachristou, D., Valente, A., and Vilhjalmsson, H. Tactical Language Training System: An interim report. In Proceedings of Intelligent Tutoring Systems, 2004.
[5] Johnson, W.L., Marsella, S., and Vilhjalmsson, H. The DARWARS Tactical Language Training System. Interservice/Industry Training, Simulation and Education Conference, 2004.
[6] S. Kopp, B. Krenn, S. Marsella, A. N. Marshall, C. Pelachaud, H. Pirker, K. R. Thórisson, and H. H. Vilhjálmsson. Towards a common framework for multimodal generation: The behavior markup language. In Proceedings of Intelligent Virtual Agents 2006, pages 205-217, 2006.
2 See http://www.isi.edu/isd/LOOM/PowerLoom.
[7] J. Meron and W. Johnson. Improving the authoring of foreign language interactive lessons in the Tactical Language Training System. SLaTE Workshop on Speech and Language Technology in Education, 2007.
[8] D. V. Pynadath. PsychSim: Agent-based modeling of social interactions and influence. In ICCM, pages 243-248, 2004.
[9] D. R. Traum and E. A. Hinkelman. Conversation acts in task-oriented spoken dialogue. Computational Intelligence, 8:575-599, 1992.
[10] H. H. Vilhjálmsson, C. Merchant, and P. Samtani. Social puppets: Towards modular social animation for agents and avatars. In D. Schuler, editor, HCI (15), volume 4564 of Lecture Notes in Computer Science, pages 192-201. Springer, 2007.
APPENDIX
A. ENVIRONMENTAL ONTOLOGY IN POWERLOOM

(defmodule "TACTLANG-ENVIRONMENT"
  :includes ("PL-USER"))
(in-module "TACTLANG-ENVIRONMENT")
(defconcept environment)
(defconcept time-of-day
  :documentation "Represents the time of day
   for the environmental context")
(assert (and (time-of-day morning)
             (time-of-day afternoon)
             (time-of-day evening)
             (time-of-day night)))
(assert (closed time-of-day))

(deffunction current-time ((?e environment))
  :-> (?t time-of-day))
(defconcept physical-thing)
(defconcept person ((?t physical-thing)))
(defconcept vector)
(deffunction vector-x ((?v vector)) :-> (?x FLOAT))
(deffunction vector-y ((?v vector)) :-> (?y FLOAT))
(deffunction vector-z ((?v vector)) :-> (?z FLOAT))
(deffunction location ((?t physical-thing))
  :-> (?v vector))
(deffunction geo-distance
    ((?a1 physical-thing) (?a2 physical-thing))
  :-> (?dist FLOAT))
(defconcept place ((?t physical-thing)))
(deffunction current-place ((?x person)) :-> (?p place))
(defconcept place-of-worship ((?p place))
  :documentation "A place of worship may impose
   additional restrictions on what is culturally
   appropriate. The exact restrictions should be
   represented in the cultural ontology")
(assert (and (place-of-worship mosque)
             (place-of-worship temple)
             (place-of-worship church)
             (place-of-worship synagogue)
             (place-of-worship gurdwara)))
(defconcept market ((?p place))
  :documentation "A market may ease restrictions
   on appropriate behavior - for example, those
   on maintaining physical distance or on
   appropriate volume of voice.")
(defconcept weather-conditions)

(defconcept precipitation-conditions)
(assert (and (precipitation-conditions clear)
             (precipitation-conditions rain)
             (precipitation-conditions snow)
             (precipitation-conditions hail)
             (precipitation-conditions fog)))
(defconcept overhead-conditions)
(assert (and (overhead-conditions sunny)
             (overhead-conditions cloudy)
             (overhead-conditions partially-cloudy)))
(deffunction weather-precipitation ((?w weather-conditions))
  :-> (?p precipitation-conditions))

(deffunction weather-overhead ((?w weather-conditions))
  :-> (?o overhead-conditions))

(deffunction weather ((?p place))
  :-> (?w weather-conditions))
A Brief History of Function Representation from Gandalf to SAIBA

Hannes Vilhjálmsson and Kristinn R. Thórisson
Center for Analysis and Design of Intelligent Agents
and School of Computer Science, Reykjavík University, Iceland
{hannes, thorisson}@ru.is
ABSTRACT
The first half of this paper introduces the aim of SAIBA and the functional markup language (FML) from the perspective of the Center for Analysis and Design of Intelligent Agents (CADIA) at Reykjavik University. The second half provides a brief historic overview of the functional representation of communicative intent in a line of communicative humanoids and related systems, starting with Gandalf and leading up to one of the early proposals for FML in the SAIBA framework.
Categories and Subject Descriptors
I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods - frames and scripts, representation languages, representations.
General Terms
Algorithms, Design, Standardization, Languages, Theory.
Keywords
Embodied Conversational Agents, Functional Representation, Multimodal Communication, Human Computer Interaction.
1. INTRODUCTION
As the SAIBA consortium gears up for the second phase of the planned work on representations for multimodal generation, we would like to summarize what we consider some of the key aspects of this work as well as give a view of some of the historical roots that the effort grew out of.

The SAIBA (situation, agent, intention, behavior, animation) effort exists first and foremost for the purpose of increasing the synergy within the research community focused on multimodal communication in robots and virtual humanoids. Computer graphics work has enjoyed enormous success in standardization efforts for what we see as the lowest layer in a stack upon which systems and abstractions layer ever-increasing complexity and, eventually, intelligence. The observation is simple: As more people do research in interactive humanoids the potential for duplication of effort increases. The consortium has addressed the problem of multimodal generation "one abstraction level" above the graphics level; this resulted in the BML effort, which was based on a lot of prior work.
At the BML level we see groupings of primitive moves, what in robotics is called e-moves, into larger, more complex sets of instructions, into what in the robotics world is often referred to as action and which in BML are called behaviors. BML is thus a language for describing events that are supposed to happen. The events are not as simple as graphics commands (otherwise they would not save the developer any time), so they cannot represent the same fine level of detail - and that is precisely the point. They are higher level than e-moves and therefore can be used to program large sets of e-moves in single strokes. However, human behavior is highly complex, and just as basic computer graphics commands are not well-suited to describe complex humanoid behaviors, BML is not convenient for representing long chains of multimodal events - what in the A.I. world are called plans.
Enter FML - functional markup language. The aim with FML is to develop the next level of description language up from BML, one that can describe what should happen in a multimodal agent at what we could call a functional level - representing in essence what the agent's behavior(s) should achieve - its goals.

The SAIBA consortium's approach to this effort is based on one tenet that is extremely important for the success of achieving the stated aims, i.e. increased collaboration, ease of sharing results and actual working systems: a clear separation of representation language and the processes that produce and consume the language. This is, for the most part, practically motivated and is based on a long history and can, in our view, help keep the effort on a prosperous track. A successful separation keeps open the possibility that anyone can create their own planning mechanisms. For this to be possible the future FML cannot and must not put constraints on the kinds of processes that consume and produce it.
Whether this is possible remains to be seen. But since the main research focus in artificial intelligence and communicative humanoids on the topic of multimodal generation are the mechanisms that control and produce the behaviors, this must be a free variable, unconstrained by the languages that the processes work with. The language, FML, will describe the intentions that an agent may have in what it does. This, of course, should be possible to do without saying anything about the mechanisms that are required to manipulate those intentions.
It is important to follow the basic idea behind the SAIBA effort of layers or bands - that is, a given markup language is not only limited in that there are details - lower-level things - that it cannot (and should not) represent, there will also be larger - higher level - things that it should not represent; BML and FML are constrained to bands of operation. These bands are limited by time and scale, that is, the timescales covered by BML are smaller than those covered by FML. Likewise, as we build FML there may be large timescales for which FML will be inappropriate; this, however, remains to be seen.
We will now turn to some of the historical precedents for the current FML efforts.
2. FUNCTION / INTENTION REPRESENTATION IN GANDALF
In the Ymir architecture, on which the communicative humanoid Gandalf was built [1] (Figure 1), a number of ideas were presented that relate to the present effort. The main components relevant to FML include the Action Scheduler (AS), which could receive and execute goals representing both functional and behavioral specifications.
Figure 1: The Gandalf/Ymir agent was capable of real-time face-to-face conversations with human users. Notice Gandalf's gaze responding to the hand gesture's interpreted function - i.e. pointing - within "a human-like delay" of 300-400 ms.
On the perception side, Ymir had a set of processes called Multimodal Descriptors that could aggregate information from Unimodal Perceptors (both these modules worked with real-time perceptual data generated by the behavior of a person); the output of these were "sketches" - descriptions of human behavior - at various levels of detail. Examples of some of the higher-level descriptors are given in Table 1.
Table 1. Example higher-level descriptors in Gandalf

giving-turn
taking-turn
wanting-turn
want-back-channel-feedback
has-turn
addressing-me
greet
greet-happily
Upon the reliable detection of any of these descriptors (and their temporal inter-relationships), Decider modules would fire goals for being achieved through movement or speech in the Gandalf agent. The Action Scheduler would receive these goals and generate the appropriate animation commands (i.e. e-moves). In Ymir the AS is the last stop before ballistic execution of animation. To enable interruptibility, the ability to cancel actions quickly for any degree of freedom, the AS would never commit more than 200 ms at a time to the animation level below. The goals generated by the Deciders could specify the shape/look of an action, e.g. hand-raise-palm-forward, what it should achieve, e.g. greet, or both, e.g. greet-happily. The AS resolved these goals down to the graphics level by selecting between options such as whether to greet with a wink, nodding or waving, etc., down to the level of primitive animation commands. Thus, the AS handled, in one place with a single mechanism, both what we would later refer to as the BML level and FML level.
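The goal resolution performed by the Action Scheduler can be sketched roughly as follows. This is a toy reconstruction, not Ymir's actual code; every option and primitive-move mapping here is invented for illustration:

```python
# Toy sketch of resolving a functional goal (e.g. greet-happily) down
# to primitive animation commands, as an Action Scheduler might.
OPTIONS = {
    "greet": ["wink", "nod", "wave"],  # alternative realizations
}
PRIMITIVES = {
    "wink": ["close-left-eyelid", "open-left-eyelid"],
    "nod": ["tilt-head-down", "tilt-head-up"],
    "wave": ["raise-hand", "oscillate-wrist", "lower-hand"],
}

def resolve(goal, prefer=None):
    # Split a goal like "greet-happily" into function and manner.
    function, _, manner = goal.partition("-")
    choice = prefer or OPTIONS[function][0]  # select among realizations
    moves = list(PRIMITIVES[choice])
    if manner == "happily":
        moves.insert(0, "smile")             # manner colors the behavior
    return moves
```

The single function covers both the functional choice (which realization of greet) and the behavioral expansion (which primitive moves), mirroring the observation that the AS handled both levels in one mechanism.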
3. FRAMES OF FUNCTIONS IN REA
The general approach of Gandalf/Ymir was maintained in a later agent called REA (Figure 2), although REA's architecture replaced shared descriptor blackboards with a fixed messaging pipeline that passed around a multimodal Frame [2].
Figure 2. REA was a real-estate agent capable of multimodal natural language generation and understanding.
A user event generated an input Frame in REA's system that contained a field for observed visual or audible behavior and two fields for functional interpretations of these behaviors: A Propositional interpretation (content related) and an Interactional (process related) interpretation. These interpretations were added by an Understanding Module before a Decision Module would then use the interpretations to create an appropriate response for REA. The response was encoded in an output Frame similar to the input Frame in that it contained the same Propositional and Interactional fields, which were now filled with communicative functions that needed to be realized by the agent. The output Frame was sent through a Generation Module that generated appropriate behaviors, fulfilling the functions by placing a description of those behaviors in a special output field.

Therefore REA's central decision mechanism only operated on a functional representation of the user's input and produced only a functional representation of her communicative intent.
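The Frame pipeline described above can be illustrated with a toy sketch (not REA's actual code; module behavior, field names and values are all invented). The key property shown is that the decision stage touches only the functional fields, never concrete behavior:

```python
# Toy sketch of a REA-style pipeline: Understanding -> Decision -> Generation.
def understanding(frame):
    # Adds a functional (Interactional) interpretation of raw behavior.
    if frame["behavior"] == "user-appears":
        frame["interactional"] = "inviting"
    return frame

def decision(frame):
    # Operates only on functional fields, producing an output Frame
    # filled with functions to realize - no concrete behavior yet.
    out = {"behavior": None, "propositional": None, "interactional": None}
    if frame["interactional"] == "inviting":
        out["interactional"] = "giving-feedback"
        out["propositional"] = "greeting"   # a Ritual speech act
    return out

def generation(frame):
    # Fulfills the functions by filling in a behavior description.
    if frame["propositional"] == "greeting":
        frame["behavior"] = ["wave", 'say "Hello"']
    return frame

reply = generation(decision(understanding(
    {"behavior": "user-appears", "propositional": None, "interactional": None})))
```

Swapping the Generation Module for another realizer would change the behavior field without touching the decision logic - the separation the pipeline was designed for.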
REA's Interactional functions were in part drawn from Gandalf's descriptors and are shown in Table 2.
Table 2. Interactional functions in REA

giving-turn
taking-turn
wanting-turn
keeping-turn
dropping-turn
listening (to a speaker)
wanting-feedback
giving-feedback
expecting (some input)
present (within conversational distance)
inviting (to start a conversation)
leaving (the conversation)
REA's Propositional functions were in the form of Speech Acts, which were for the most part domain dependent, but divided into imperatives, interrogatives and declaratives. There was also a special Ritual category that contained a greeting and a farewell.
4. FML IN SPARK
Influenced by work on REA and the need for something more lightweight, BEAT was built as a tool for generating multimodal co-verbal behavior based on analyzing the text to be spoken [3]. Unlike Gandalf and REA, BEAT only dealt with multimodal generation, not perception. The most comprehensive implementation of BEAT existed as part of the Spark system (Figure 3), which automated avatar behavior in an online virtual environment based on chat messages exchanged by its users [4].
Figure 3. Online avatars automated in Spark based on functional annotation of text exchanged by users.
In this implementation, each chat message got analyzed and annotated in terms of various discourse functions (Figure 4).
<utterance speaker="map1" speaker="person1"><clause>
  <theme><turn type="take">
    <action><new>give</new></action>
    <object>him</object></turn></theme>
  <rheme>
    <turn type="give" target="person2">
      <emphasis type="phrase">
        <reference type="visual" target="map:mine">
          <reference type="textual" source="person3">
            <object id="map:mine">some
              <emphasis type="word">
                <new>gold</new>
              </emphasis>
            </object>
          </reference></reference></emphasis></turn></rheme>
</clause></utterance>

Figure 4. The text "give him some gold" automatically annotated in terms of discourse function in Spark.

The discourse functions annotated by the improved BEAT of the Spark system are shown in Table 3.

Table 3. Discourse functions in the Spark FML
&85".%3/"0&'
&'($)@-<,%&H'-#A%.'A%%,'/"'1+C%B'
.6*'31)@-<,%&H'(/"$#0.'%I*0#$#-+/('/"'K2%&-+/(B'
$1,)@0%I+*#0'1+C%((%&&B'
&/141)@0+(A'-/',"%C+/2&'2--%"#(*%B'
(/141'@(%;'*/(-"+L2-+/('-/'*/(C%"&#-+/(B'
145/*3"3)
.8$&(*3&)
(101(1$.1)@-<,%&H'C+&2#0'/"'-%I-2#0B'
"66'3&(*&1)@%0#L/"#-%'!%#-2"%'-5"/215'+002&-"#-+/(B'
!(8'$2"$!)@-<,%&H'"%K2%&-B'
"#$($!%0.1)&,.(!G,0+/!)#$.!F$!2*--$/!&.),!(0--,')&.8!.,.>$'F*+!
F$#*>&,'! ,.! )#$! '$1$&>&.8! $./! 9*11,'/&.8! ),! $T&()&.8! $2-&'&1*+!
/*)*!,.!%*1$U),U%*1$!/&(1,0'($=!%,'!*!%0++!20+)&2,/*+!/$+&>$'@E!
"#$! %0.1)&,.(! G$'$! /'*G.! %',2! )#$! +&)$'*)0'$! ,.! /&(1,0'($! *./!
1,.>$'(*)&,.! *.*+@(&(H! *./! '$-'$($.)! (,2$! ,%! )#$!2,()! 1,22,.!
$+$2$.)(! )#*)! 8&>$! '&($! ),! 1,.>$'(*)&,.*+! .,.>$'F*+! F$#*>&,'E!!
3,2$! ,%! )#$2! *'$! >(-%"#*-+/(#0! &.! .*)0'$! 9(01#! *(! )0'.=! G#&+$!
,)#$'(! *'$!9"/,/&+-+/(#0! 9(01#! *(! 1,.)'*()=E! !S,G$>$'H! (,2$! *'$!
/&%%&10+)! ),! 1+*((&%@! *11,'/&.8! ),! )#$($! 1*)$8,'&$(H! (01#! *(! )#$!
-',1$((!,%!8',0./&.8H!G#&1#!2*@!'$+@!,.!F,)#E!
5.! 6R4"! )#$! %0.1)&,.! *..,)*)&,.(! G$'$! /,.$! 0(&.8! d;<! )*8(!
-+*1$/! /&'$1)+@! G&)#&.! )#$! *..,)*)$/! )$T)E! ! "#$! )$'2! =2(*-+/('
6#"A2,'M#(12#1%' 9:;<=!G*(!0($/! ),!/$(1'&F$! )#$($! )*8(! &.! )#$!
3-*'7!(@()$2!),!1,.)'*()!)#$2!G&)#!)#$!($)!,%!)*8(!0($/!),!/$(1'&F$!
)#$!(0--,')&.8!N%5#C+/"!96;<=E!!"#&(!.*2&.8!,%!)#$!)G,!/&%%$'$.)!
)*8! ($)(! #*(! F$$.!2*&.)*&.$/! &.! )#$! 34564! %'*2$G,'7H! F0)! )#$!
*1)0*+!)*8!()'01)0'$!#*(!$>,+>$/E!
5. THE EVOLVING FML IN SAIBA
Perhaps one of the largest differences between BEAT tags and SAIBA tags is that the latter breaks free of the strict hierarchical ordering of tags with the introduction of Sync Points [5]. SAIBA XML descriptions can be flat, with ordering constraints provided through synch attributes. This allows partially overlapping tags.
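Why flat descriptions matter can be illustrated with a small sketch (our own, hypothetical representation; the span names and times are invented): two functions can intersect without either containing the other, a relationship a strict tag hierarchy cannot express.

```python
# Sketch: functional spans with explicit start/end times instead of nesting.
spans = {
    "turn1": (0.0, 2.0),
    "emphasis1": (1.5, 2.5),   # begins inside turn1 but ends after it
}

def partially_overlap(a, b):
    # True when the spans intersect but neither contains the other -
    # exactly the case that strict hierarchical nesting forbids.
    (s1, e1), (s2, e2) = spans[a], spans[b]
    intersect = s1 < e2 and s2 < e1
    contains = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    return intersect and not contains
```

With sync-point-style constraints, such a pair is simply two flat elements whose boundaries are ordered relative to each other; with nested tags it would have no well-formed encoding.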
Z#&+$!6R4"!1,0+/!2*7$!)#$!*((02-)&,.!)#*)!)#$!&.-0)!G,0+/!F$!
)#$! )$T)! ),!F$! (-,7$.H!*./! )#$'$%,'$! )#*)!:;<!)*8(!1,0+/!(&2-+@!
F$! -+*1$/! *',0./! *--',-'&*)$! )$T)! $+$2$.)(H! )#$! 34564! :;<!
'$-'$($.)*)&,.!1*.!2*7$!.,!(01#!*((02-)&,.E! !"#$!8$.$'*)&,.!,%!
)#$! )$T)! &)($+%! 2*@! .,)! #*>$! ,110''$/! *)! )#$! )&2$! ,%! %0.1)&,.!
/$(1'&-)&,.E!
"#*)! &(!G#@! )#$! :;<! F'$*7U,0)! 8',0-! *)! )#$!B$@7C*>&7! 34564!
G,'7(#,-! &.! J__e! -',-,($/! ),! /&>&/$! :;<! )*8(! &.),! )G,! ($)(E!!
"#$!%&'()!($)!/$%&.$(!1$')*&.!F*(&1!%0.1)&,.*+!,'!($2*.)&1!8(+-&!)#*)!
*'$! *((,1&*)$/! G&)#! )#$! 1,220.&1*)&>$! $>$.)E! ! "#$! ($1,./! ($)!
&.1+0/$(!O,%"#-+/(&!)#*)!$(($.)&*++@!,-$'*)$!,.!-'$>&,0(+@!/$%&.$/!
0.&)(! ),! *--+@! 1$')*&.! %0.1)&,.*+! $%%$1)(! ,.! )#$2E! ! "#$! &.&)&*++@!
-',-,($/!0.&)(!*'$!+&()$/!&.!"*F+$!KE!
Table 4. First SAIBA FML proposal: Units

participant
turn
topic
performative (speech act)
content (detailed proposition)

These units are ordered here from the widest scope to the smallest scope. That means that the widest scoped FML may contain one or more of the smaller scoped elements. The Operation tags can affect any of these units (and therefore their scope will vary greatly). A preliminary list of the suggested Operation tags is provided in Table 5.

Table 5. First SAIBA FML proposal: Operations
145/*3"3)
.8$&(*3&)
"66'3&(*&"8$)
*001.&)
38."*6)@"%0#-+/(#0'1/#0&B)
.8!$"&"#1)@$%-#:*/1(+-+C%'%P1P')+!!+*20-<'/!',"/*%&&+(1B)
.1(&*"$&7)@,"/)2*%"Q&'*%"-#+(-<'/!'2(+-Q&'-"2-5B)
!
An example of how an FML block could be constructed using these two sets of tags was given in [6]. The example is reproduced here in Figure 5. As the SAIBA effort focused more on BML during the following phase, this early FML proposal has not been fleshed out so far, and is therefore very much work in progress.
<!-- Defining units - first set -->
<participant id="ali" role="speaker"/>
<participant id="trainee" role="addressee"/>
<turn id="turn1" start="take" end="give">
    <topic id="topic1" type="new">
        <performative id="perform1" type="enquiry">
            <content>goal trainee ? here</content>
        </performative>
    </topic>
</turn>

<!-- Operating on units - second set -->
<emphasis type="new">perform1:here</emphasis>
<affect type="fear">perform1:goal</affect>
<social type="maintain_distance">trainee</social>

Figure 5. An example of an FML description that might result in leaning away and speaking "What are you doing here?"
6. CONCLUDING REMARKS
Looking back over these related projects, one thing is striking: The first systems focused on the essential mechanism or process of maintaining real-time dialogue, while the later ones start looking into the presentation of content. There seems to be a relatively good agreement about the process or interactional functions, so perhaps this is a good place to start with a shared specification. It is important that interactional functions continue to be first-class citizens in any FML representation and that they not only be useful for generation of behavior but also for interpretation of behavior.
7. REFERENCES
[1] Thórisson, K. R., 1996. Communicative Humanoids: A Computational Model of Psychosocial Dialogue Skills. Ph.D. Thesis, Massachusetts Institute of Technology, MA.
[2] Cassell, J., Vilhjálmsson, H., Chang, K., Bickmore, T., Campbell, L. and Yan, H., 1999. Requirements for an Architecture for Embodied Conversational Characters. In Computer Animation and Simulation '99 (Eurographics Series). Vienna, Austria: Springer Verlag.
[3] Cassell, J., Vilhjálmsson, H., and Bickmore, T., 2001. BEAT: the Behavior Expression Animation Toolkit. In Proceedings of ACM SIGGRAPH, Los Angeles, Aug. 12-17, p. 477-486.
[4] Vilhjálmsson, H., 2005. Augmenting Online Conversation through Automated Discourse Tagging. In Proceedings of the 6th Annual Minitrack on Persistent Conversation at the 38th Hawaii International Conference on System Sciences, Jan. 3-6, 2005, Hilton Waikoloa Village, Big Island, Hawaii, IEEE.
[5] Kopp, S., Krenn, B., Marsella, S., Marshall, A., Pelachaud, C., Pirker, H., Thórisson, K., Vilhjálmsson, H., 2006. Towards a Common Framework for Multimodal Generation in ECAs: The Behavior Markup Language. In Proceedings of the 6th International Conference on Intelligent Virtual Agents, Aug. 21-23, Marina del Rey, CA.
[6] Vilhjálmsson, H. and Marsella, S., 2005. Social Performance Framework. In Proceedings of the Workshop on Modular Construction of Human-Like Intelligence at the 20th National AAAI Conference on Artificial Intelligence, July 9th, Pittsburgh, PA.