Workshop 2: Functional Markup Language
The Seventh International
Conference on Autonomous
Agents and Multiagent Systems
Estoril, Portugal, May 12-16, 2008
Workshop 2: Functional Markup Language
Dirk Heylen
Stefan Kopp
Stacy Marsella
Catherine Pelachaud
Hannes Vilhjálmsson
(Editors)
Why Conversational Agents do what they do
Functional Representations for Generating Conversational Agent Behavior
The First Functional Markup Language Workshop
Dirk Heylen (University of Twente)
Stefan Kopp (University of Bielefeld)
Stacy Marsella (University of Southern California)
Catherine Pelachaud (Paris VIII University and INRIA)
Hannes Vilhjálmsson (Reykjavík University)
April 9, 2008
Framing and Interpersonal Stance in Relational Agents

Timothy Bickmore
Northeastern University College of Computer and Information Science
360 Huntington Ave, WVH202, Boston, MA 02115
bickmore@ccs.neu.edu
ABSTRACT
In this paper, I describe the concepts of interpersonal stance and
conversational frame, and why they are important functions to
represent for embodied conversational agents in general, but
especially those designed for social and relational interactions
with users.
Categories and Subject Descriptors
H5.2 [Information Interfaces and Presentation]: User Interfaces—
Evaluation/methodology, Graphical user interfaces
General Terms
Algorithms, Design, Human Factors, Standardization, Theory.
Keywords
Relational agent, embodied conversational agent, virtual agent,
conversational frame, contextualization cue.
1. INTRODUCTION

The vast majority of dialogue systems developed to date
(embodied and otherwise) have been designed to engage people in
a strictly collaborative, task-oriented form of conversation in
which the communication of propositional information is the
primary, if not only, concern. As our agents expand their
interactional repertoires to include social interaction, play, role-
playing, rapport-building, comforting, chastising, encouraging
and other forms of interaction they will need both explicit ways of
representing these kinds of conversation internally and ways of
signaling to users that a new kind of interaction has begun.
Various social scientists have coined the term conversational
frame to describe these different forms of interaction.
Another emerging trend in embodied agent research is the desire
to model relational interactions, in which one of the objectives is
the establishment of rapport, trust, liking, therapeutic alliance, and
other forms of social relationship between a user and an agent [6].

Among the most important behavioral cues in these interactions are
those that display social deixis, or interpersonal
stance, in which one person exhibits their presumed social
relationship with their interlocutor by means of behaviors such as
facial displays of emotion, proxemics, and overall gaze and hand
gesture frequency.
While these two phenomena—framing and interpersonal stance—
have very different conversational functions, their effect on verbal
and nonverbal conversational behavior is very similar. As with
the effects of affective state or attitude, they have a global impact
on verbal and nonverbal behavior, affecting both the choice of
whether a given behavior is exhibited or not (e.g., a particular
hand gesture) as well as the quality of behaviors selected.
In this paper, I discuss each of these conversational functions,
reviewing work from linguistics, sociolinguistics and the social
psychology of personal relationships, and motivate their use in
virtual agents in a health counseling domain. I then discuss the
current implementation of these conversational functions in the
health counseling agents my students and I have been developing
over the last several years, and desiderata for including these in
the emerging Functional Markup Language (FML) specification.
2. FRAMING

Gregory Bateson introduced the notion of frame in 1955, and
showed that no communication could be interpreted without a
meta-message about what was going on, i.e., what the frame of
interaction was [4]. He showed that even monkeys exchange
signals that allow them to specify when the "play" frame is active
so that hostile moves are interpreted in a non-standard way.
Charles Fillmore (1975) defined frame as any system of linguistic
choices associated with a scene (where a scene is any kind of
coherent segment of human actions) [11]. Gumperz (1982)
described this phenomenon (which he called contextualization) as
exchanges representative of socio-culturally familiar activities,
and coined "contextualization cue" as any aspect of the surface
form of utterances which can be shown to be functional in the
signaling of interpretative frames [14]. Tannen went on to define
conversational frames as repositories for sociocultural norms of
how to do different types of conversation, such as storytelling,
teasing, small talk, or collaborative problem-solving talk [24].
The sociocultural norms take the form of assumptions, scripts
(prototypical cases of what one should do), and constraints (what
one ought not to do). These parallel the description of topics that
can be taken for granted, reportable (talked about; relevant) or
excluded based on sociocultural situations, as described in [22].

Scripts can dictate parts of interactions explicitly (as in ritual
greetings), describe initiative or turn-taking strategies (e.g., the
entry and exit transitions in storytelling and the imperative for the
storyteller to hold the floor [15]) or describe the obligations one
has in a given situation (as done in [26] for the collaborative
task-talk frame).
Padgham, Parkes, Mueller and Parsons (eds.): Proc. of AAMAS 2008,
Estoril, May, 12-16, 2008, Portugal, pp. XXX-XXX
Copyright © 2008, International Foundation for Autonomous Agents
and Multiagent Systems (www.ifaamas.org). All rights reserved.
The contextualization cues must be used by an agent to indicate
an intention to switch frames. While many contextualization cues
are nonverbal (see [14] and most of [3]), there are many
examples of subtle linguistic cues as well (people rarely say “let’s
do social chat now”). Often these can be ritualized or
stereotypical opening moves or topics, for example a question
about the weather or immediate context for small talk [21], or a
story-initial cue phrase ("Oh, that reminds me of when…") [10].
In many ways, frames act like recipes in the SharedPlans theory
[12]. They are instantiated in response to a shared goal of the
interlocutors to work towards satisfaction of that goal. They
specify sub-goals that must be satisfied, and they are placed in the
intentional structure (plan tree) in the same manner as recipes to
indicate embedding relationships among discourse segments.
However, there are many significant differences between frames
and recipes. First, while the discourse segment purposes [13] associated with recipes are pushed onto the focus stack, and (in
most situations) only the top of this stack is inspected during
generation or interpretation, the same is not true for frames.
Frames can also be nested (for example, interlocutors within a
storytelling frame embedded in a small talk frame embedded in a
doing conversation frame). But, whereas the intentional state
reflects the why of a dialogue and the attentional state reflects the
what, the set of nested frames reflects the how; that is the set of
sociocultural norms and conventions in use. Further, the top-level
frame is not the only one used; conventions and assumptions from
all enclosing frames are still in effect unless overridden by those
higher up on the stack.
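This stack discipline can be sketched in code. The following is a minimal illustration under our own assumptions (the class and function names are invented; the paper does not describe an implementation of frame stacks):

```python
# Minimal sketch of a nested frame stack: conventions from every enclosing
# frame stay in effect unless overridden by a frame higher on the stack.
# All names here are illustrative, not from any published implementation.

class Frame:
    def __init__(self, name, norms):
        self.name = name
        self.norms = norms  # dict: convention name -> value

def effective_norms(frame_stack):
    """Merge norms bottom-up so frames pushed later override earlier ones."""
    merged = {}
    for frame in frame_stack:  # bottom of the stack first
        merged.update(frame.norms)
    return merged

# A storytelling frame embedded in a small-talk frame embedded in a
# "doing conversation" frame, as in the example above.
stack = [
    Frame("conversation", {"turn_taking": "free", "register": "neutral"}),
    Frame("small_talk", {"register": "casual", "topics": "safe"}),
    Frame("storytelling", {"turn_taking": "teller_holds_floor"}),
]

norms = effective_norms(stack)
# The storytelling frame overrides turn-taking, while the casual register
# and safe-topic constraints of the enclosing frames remain in effect.
```

The merge order realizes the rule stated above: the top-level frame is not the only one consulted; every enclosing frame contributes unless overridden.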
3. INTERPERSONAL STANCE

One way in which language can be used to set relational
expectations is through social deixis, or what Svennevig calls
“relational contextualization cues” [23], which are “those aspects
of language structure that encode the social identities of
participants…or the social relationship between them, or between
one of them and persons and entities referred to” [17]. Politeness
strategies fall under this general category (facework strategies are
partly a function of relationship [7]), but there are many other
language phenomena which also fit, including honorifics and
forms of address. Various types of relationship can be
grammaticalized differently in different languages, including
whether the relationship is between the speaker and hearer as
referent, between the speaker and hearer when referring to
another person or entity, between the speaker and bystanders, or
based on type of kinship relation, clan membership, or relative
rank [17]. One of the most cited examples of this is the tu/vous
distinction in French and other languages. For example, Laver
encoded the rules for forms of address and greeting and parting in
English as a (partial) function of the social relationship between
the interlocutors, with titles ranging from professional forms (“Dr.
Smith”) to first names (“Joe”) and greetings ranging from a
simple “Hello” to the more formal “Good Morning”, etc [16].
Forms of language may not only reflect existent relational status,
but may be used to negotiate changes in the relationship, by
simply using language forms that are congruent with the desired
relationship. Lim observed that partners may change their
facework strategies in order to effect changes in the relationship
[18]. And, according to Svennevig:
The language forms used are seen as reflecting a
certain type of relationship between the interlocutors.
Cues may be used strategically so that they do not
merely reflect, but actively define or redefine the
relationship. The positive politeness strategies may
thus … contribute to strengthening or developing the
solidarity, familiarity and affective bonds between the
interactants. The focus is here shifted from
maintaining the relational equilibrium toward setting
and changing the values on the distance parameter
(Svennevig, 1999, pg. 46-47).
In terms of nonverbal behavior, the most consistent finding in this
area is that the use of nonverbal "immediacy behaviors"—close
conversational distance, direct body and facial orientation,
forward lean, increased and direct gaze, smiling, pleasant facial
expressions and facial animation in general, nodding, frequent
gesturing and postural openness—projects liking for the other and
engagement in the interaction, and is correlated with increased
solidarity (perception of “like-mindedness”) [2,19]. Other
nonverbal aspects of "warmth" include kinesic behaviors such as
head tilts, bodily relaxation, lack of random movement, open
body positions, and postural mirroring and vocalic behaviors such
as more variation in pitch, amplitude, duration and tempo,
reinforcing interjections such as "uh-huh" and "mm-hmmm",
greater fluency, warmth, pleasantness, expressiveness, and clarity
and smoother turn-taking [1]. The verbal and nonverbal cues
associated with conversational “rapport” have also been
investigated [8,25].
4. CURRENT IMPLEMENTATION

The health counseling agents we are currently developing (e.g.,
[5]) use the BEAT text-to-embodied-speech translator [9].
However, the concepts and features we have implemented within
BEAT would map equally well to FML.
4.1 Interpersonal Stance Functions

As discussed above, one of the most consistent findings in the
area of interpersonal attitude is that immediacy behaviors—close
conversational distance, direct body and facial orientation,
forward lean, increased and direct gaze, smiling, pleasant facial
expressions and facial animation in general, nodding, and
frequent gesturing—demonstrate warmth and liking for one’s
interlocutor and engagement in the conversation. BEAT was
extended so that these cues would be generated based on whether
the agent’s attitude towards the user was relatively neutral or
relatively warm.
Since BEAT is designed to over-generate, and produce nonverbal
behaviors at every point in an utterance that is sanctioned by
theory, attitudes are realized primarily by reducing the number of
suggested nonverbal behaviors, as appropriate. For example, in a
warm stance (high immediacy), fewer gaze away suggestions are
generated, resulting in increased gaze at the interlocutor, whereas,
in the neutral stance (low immediacy), fewer facial animation
(eyebrow raises and headnods) and hand gesture
Table 1. Effects of Stance and Frame on Nonverbal Behavior.
Frequencies are relative to baseline BEAT behavior. Proximity of 0.0
is a full body shot (most distant); 1.0 is a close-up shot on the face.

                         Relational Stance
Frame       High Immediacy (Warm)        Low Immediacy (Neutral)
TASK        Proximity=0.2                Proximity=0.0
            Neutral facial expression    Neutral facial expression
            Less frequent gaze aways     Less frequent gestures
                                         Less frequent headnods
                                         Less frequent brow flashes
SOCIAL      Proximity=0.2                Proximity=0.0
            Smiling facial expression    Smiling facial expression
            Less frequent gaze aways     Less frequent gestures
                                         Less frequent headnods
                                         Less frequent brow flashes
EMPATHY     Proximity=1.0                Proximity=0.5
            Concerned facial expression  Concerned facial expression
            Slower speech rate           Slower speech rate
            Less frequent gaze aways     Less frequent gestures
                                         Less frequent headnods
                                         Less frequent brow flashes
ENCOURAGE   Proximity=0.5                Proximity=0.1
            Smiling facial expression    Smiling facial expression
            Less frequent gaze aways     Less frequent gestures
                                         Less frequent headnods
                                         Less frequent brow flashes
[Figure 1. Example Effects of Stance and Frame on Proximity and Facial
Expression for the “Laura” Health Counseling Agent. Panels: Low
Immediacy Task Frame; High Immediacy Empathy Frame; Low Immediacy
Encourage Frame; High Immediacy Task Frame; High Immediacy Encourage
Frame; High Immediacy Social Frame.]
suggestions are generated. Such cues that are encoded through
relative frequency of behavior are currently implemented by
means of a StanceManager module which tracks the relational
stance for the current utterance being processed, and is consulted
by the relevant behavior generators at the time they consider
suggesting a new behavior. Centralizing this function in a new
module was important for coordination—since attitude (and
emotion in general) affects all behaviors systemically.
Modifications to baseline BEAT behavior were made at the
generation stage rather than the filtering stage, since at least some
of the behaviors of interest (e.g., eyebrow raises) are generated in
pairs and it makes no sense to filter out a gaze away suggestion
without also filtering out its accompanying gaze towards
suggestion.
Relational stance affects not only whether certain nonverbal
behaviors occur (i.e. their frequency), but the manner in which
they occur. To handle this, the behavior generation module
consults the StanceManager at animation compilation time to get
a list of modifications that should be applied to the animation to
encode manner (the “adverbs” of behavior). Currently, only
proximity cues are implemented in this way, by simply mapping
the current relational stance to a baseline proximity (camera shot)
for the agent, however, in general these modifications should be
applied across the board to all aspects of nonverbal behavior and
intonation (ultimately using some kind of animation blending, as
in [20]).
Currently, interpersonal stance is indicated functionally via an
attribute in the root-level UTTERANCE tag that simply specifies
what the relational stance is for the current utterance being
generated. For example:
<UTTERANCE STANCE="WARM">Hi there.</UTTERANCE>
The generators for gaze, gesture, headnods, and eyebrow
movement consult the StanceManager at the time they are about
to suggest their respective behaviors, and the StanceManager tells
them whether they can proceed with generation or not.
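The consultation protocol might look like the following sketch. All names and the keep-rate figures are our own assumptions for illustration; the paper does not publish the StanceManager interface or its parameters:

```python
import random

# Illustrative sketch of a StanceManager that modulates behavior frequency.
# Generators call may_generate() before suggesting a behavior; the manager
# thins suggestions according to the current relational stance.

class StanceManager:
    # Fraction of baseline suggestions allowed per behavior, by stance.
    # These rates are invented for illustration.
    KEEP_RATE = {
        "WARM":    {"gaze_away": 0.5, "headnod": 1.0, "gesture": 1.0},
        "NEUTRAL": {"gaze_away": 1.0, "headnod": 0.5, "gesture": 0.5},
    }

    def __init__(self, stance, rng=None):
        self.stance = stance
        self.rng = rng or random.Random(0)

    def may_generate(self, behavior):
        """Consulted by a behavior generator before it suggests a behavior."""
        rate = self.KEEP_RATE[self.stance].get(behavior, 1.0)
        return self.rng.random() < rate

mgr = StanceManager("WARM")
# A gaze generator would only emit a gaze-away suggestion when
# mgr.may_generate("gaze_away") returns True, yielding fewer gaze-aways
# in the warm stance than at baseline.
```

Centralizing the decision in one module, as the text notes, keeps the systemic effect of attitude coordinated across all generators.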
4.2 Framing Functions

As mentioned above, people clearly act differently when
they are gossiping than when they are conducting a job interview,
not only in the content of their speech but in their entire manner,
with many of these “contextualization cues” encoded in
intonation, facial expression and other nonverbal and paraverbal
behavior.
Contextualization cues are currently implemented in the
StanceManager. Conversational frames are marked in the input
text using XML tags, such as the following:
<UTTERANCE><EMPATHY>Sorry to hear that you're stressed out.</EMPATHY></UTTERANCE>
During translation of the utterance into “embodied speech”, the
behavior generation module keeps track of the current frame and
when it detects a change in frame it consults the StanceManager
for the animation instructions which encode the requisite
contextualization cues. We have implemented four conversational
frames for our health counseling agents, based on empirical
studies of human counselor-patient interactions: TASK (for
information exchange), SOCIAL (for social chat and small talk
interactions), EMPATHY (for comforting interactions), and
ENCOURAGE (for coaching, motivating and cheering up
interactions).
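The frame-change detection described here can be sketched as follows. The cue values loosely follow Table 1, but the function and instruction format are our own illustrative assumptions:

```python
# Sketch of frame-change detection during utterance translation.
# The behavior generation module tracks the current frame and, on a
# change, looks up animation instructions encoding the requisite
# contextualization cues (instruction format invented for illustration).

FRAME_CUES = {
    "TASK":      {"proximity": 0.2, "face": "neutral"},
    "SOCIAL":    {"proximity": 0.2, "face": "smiling"},
    "EMPATHY":   {"proximity": 1.0, "face": "concerned", "speech_rate": "slow"},
    "ENCOURAGE": {"proximity": 0.5, "face": "smiling"},
}

def cues_for_frame_changes(frames):
    """Emit animation instructions only where the active frame changes."""
    instructions = []
    current = None
    for frame in frames:
        if frame != current:
            instructions.append((frame, FRAME_CUES[frame]))
            current = frame
    return instructions

# Two TASK segments followed by an EMPATHY segment trigger two cue sets.
out = cues_for_frame_changes(["TASK", "TASK", "EMPATHY"])
```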
4.3 Combined Influence

The interpersonal stance and conversational frame specifications
are combined within the StanceManager to yield a final set of
modifications to behavior generation and animation modulation,
as shown in Table 1. Figure 1 shows several examples of the
effects of stance and frame on proximity and facial expression.
For example, in the high immediacy, ENCOURAGE frame
condition (lower left cell of Table 1) the agent is displayed in a
medium shot (half way between a wide, full body shot and a close
up shot), has a smiling facial expression, and does 50% fewer
gaze aways than the default BEAT behavior (thereby spending
more time looking at the user). Most of the parameters specified
in Table 1 are design-based (i.e. ad hoc) and ultimately need to be
grounded in human behavior from relevant empirical studies. In
addition, more general and principled methods for combining the
influence of such functional specifications need to be developed.
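The combination is, at present, a plain table lookup; the sketch below encodes the proximity values of Table 1 and the 50%-fewer-gaze-aways figure quoted above (the function and rate names are our own):

```python
# Sketch of the combined stance-and-frame lookup implied by Table 1.
# Proximity values come from Table 1; the 0.5 rates reflect the "50%
# fewer gaze aways" example in the text and the "less frequent" entries.

PROXIMITY = {  # (frame, stance) -> camera proximity
    ("TASK", "WARM"): 0.2,      ("TASK", "NEUTRAL"): 0.0,
    ("SOCIAL", "WARM"): 0.2,    ("SOCIAL", "NEUTRAL"): 0.0,
    ("EMPATHY", "WARM"): 1.0,   ("EMPATHY", "NEUTRAL"): 0.5,
    ("ENCOURAGE", "WARM"): 0.5, ("ENCOURAGE", "NEUTRAL"): 0.1,
}

def modifications(frame, stance):
    mods = {"proximity": PROXIMITY[(frame, stance)]}
    if stance == "WARM":
        mods["gaze_away_rate"] = 0.5  # 50% fewer gaze-aways than baseline
    else:
        mods["gesture_rate"] = 0.5    # fewer gestures/headnods/brow flashes
    return mods

# The high-immediacy ENCOURAGE cell: medium shot, fewer gaze-aways.
m = modifications("ENCOURAGE", "WARM")
```

As the text notes, these parameters are design-based, and a principled combination rule would replace this lookup.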
5. SUMMARY

As our agents leave the confined world of information-deliverers,
they will need the ability to signal the kinds of interactions they
are initiating with users and the level of relationship they are
expecting to participate in. Representations of conversational
frame and interpersonal stance are thus important elements of a
Functional Markup Language for current and future
conversational agents.
6. REFERENCES

[1] Andersen, P. and Guerrero, L. 1998. The Bright Side of
Relational Communication: Interpersonal Warmth as a
Social Emotion. In P. Andersen and L. Guerrero, Eds.
Handbook of Communication and Emotion. Academic Press,
New York, pp. 303-329.
[2] Argyle, M. 1988 Bodily Communication. Methuen & Co.
Ltd, New York.
[3] Auer, P. and Luzio, A. d. 1992 The Contextualization of
Language. John Benjamins Publishing, Philadelphia.
[4] Bateson, G. 1955. A theory of play and fantasy. In Steps to an
Ecology of Mind. Ballantine, New York.
[5] Bickmore, T. and Pfeiffer, L. 2008. Relational Agents for
Antipsychotic Medication Adherence CHI'08 Workshop on
Technology in Mental Health.
[6] Bickmore, T. and Picard, R. 2005. Establishing and
Maintaining Long-Term Human-Computer Relationships.
ACM Transactions on Computer Human Interaction. 12, 2,
293-327.
[7] Brown, P. and Levinson, S. C. 1987 Politeness: Some
universals in language usage. Cambridge University Press,
Cambridge.
[8] Cassell, J., Gill, A., and Tepper, P. 2007. Coordination in
Conversation and Rapport. ACL Workshop on Embodied
Natural Language, pp. 40-50.
[9] Cassell, J., Vilhjálmsson, H., and Bickmore, T. 2001. BEAT:
The Behavior Expression Animation Toolkit. SIGGRAPH
'01, pp. 477-486.
[10] Ervin-Tripp, S. and Kuntay, A. 1997. The Occasioning and
Structure of Conversational Stories. In T. Givon, Ed.,
Conversation: Cognitive, communicative and social
perspectives. John Benjamins, Philadelphia, pp. 133-166.
[11] Fillmore, C. 1975. Pragmatics and the description of
discourse. In P. Cole, Ed., Radical pragmatics. Academic
Press, New York, pp. 143-166.
[12] Grosz, B. and Kraus, S. The Evolution of SharedPlans. In A.
Rao and M. Wooldridge, Eds. Foundations and Theories of
Rational Agency.
[13] Grosz, B. and Sidner, C. 1986. Attention, Intentions, and the
Structure of Discourse. Computational Linguistics. 12, 3,
175-204.
[14] Gumperz, J. 1977. Sociocultural Knowledge in
Conversational Inference. In M. Saville-Troike, Ed.,
Linguistics and Anthropology. Georgetown University Press,
Washington DC, pp. 191-211.
[15] Jefferson, G. 1978. Sequential aspects of storytelling in
conversation. In J. Schenkein, Ed., Studies in the
organization of conversational interaction. Academic Press,
New York, pp. 219-248.
[16] Laver, J. 1981. Linguistic routines and politeness in greeting
and parting. In F. Coulmas, Ed., Conversational routine.
Mouton, The Hague, pp. 289-304.
[17] Levinson, S. C. 1983 Pragmatics. Cambridge University
Press, Cambridge.
[18] Lim, T. 1994. Facework and Interpersonal Relationships. In
S. Ting-Toomey, Ed., The challenge of facework: Cross-
cultural and interpersonal issues. State University of New
York Press, Albany, NY, pp. 209-229.
[19] Richmond, V. and McCroskey, J. 1995. Immediacy. In
Nonverbal Behavior in Interpersonal Relations. Allyn &
Bacon, Boston, pp. 195-217.
[20] Rose, C., Bodenheimer, B., and Cohen, M. 1998. Verbs and
Adverbs: Multidimensional motion interpolation using radial
basis functions. IEEE Computer Graphics and Applications.
[21] Schneider, K. P. 1988 Small Talk: Analysing Phatic
Discourse. Hitzeroth, Marburg.
[22] Sigman, S. J. 1983. Some Multiple Constraints Placed on
Conversational Topics. In R. T. Craig and K. Tracy, Eds.
Conversational Coherence: Form, Structure and Strategy.
Sage Publications, Beverly Hills, pp. 174-195.
[23] Svennevig, J. 1999 Getting Acquainted in Conversation.
John Benjamins, Philadelphia.
[24] Tannen, D. 1993. What's in a Frame? Surface Evidence for
Underlying Expectations. In D. Tannen, Ed., Framing in
Discourse. Oxford University Press, New York, pp. 14-56.
[25] Tickle-Degnen, L. and Rosenthal, R. 1990. The Nature of
Rapport and Its Nonverbal Correlates. Psychological Inquiry.
1, 4, 285-293.
[26] Traum, D. and Allen, J. 1994. Discourse Obligations in
Dialogue Processing. ACL '94.
Modular definition of multimodal ECA communication acts to improve dialogue robustness and depth of intention
Alvaro Hernández, Beatriz López, David Pardo, Raúl Santos,
Luis Hernández
Signals, Systems and Radiocommunications Department, Universidad Politécnica de Madrid
(UPM), Madrid 28040, Spain [email protected]
José Relaño, Mª Carmen Rodríguez
Telefónica I+D, Spain [email protected]
Abstract
In this paper we propose a modular structure to define communication acts with verbal and nonverbal elements, inspired by the SAIBA model. Our modular structure is a conceptual interpretation of the functional features of a multimodal interaction platform we have developed, with an embodied conversational agent (ECA) that implements verbal and gestural communication strategies aimed at minimising the robustness and fluency problems typically encountered in spoken language dialogue systems (SLDSs). We conclude that it is useful to add a pre-verbal level on top of the FML-BML scheme in the SAIBA framework, and we propose a category extension for FML to account for communication elements that have to do with the speaker’s non-declared intentions.
Keywords: Embodied conversational agent, communication act, intentions, literality, multimodality, dialogue robustness, functional markup language.
1 INTRODUCTION

The increasing presence of spoken language dialogue systems and embodied conversational agents on the interfaces of new “in-home” and videotelephony digital services is bringing to the fore a number of typical problems with dialogue robustness and fluency [1] as well as new ones related to the increasingly multimodal character of these systems. In our research efforts we are paying particular attention to the effects of using visual communication channels, in particular attaching a human-like animated figure to an SLDS (thus upgrading it to an ECA) [2], with a view not only to enrich the overall communication act, but also to improve dialogue flow. A few potential benefits ECAs offer the dialogue, such as increased efficiency in turn management ([3], [4]) and better error recovery ([5], [6], [7]), have already been identified by various leading authors in the field ([8], [9], [10], [11]).
According to Poggi [12], an important thing to bear in mind when designing a system that features a conversational agent with expressive communication abilities is to define how the ECA’s acts of communication are constructed as coordinated verbal and nonverbal messages. This is currently a hot area of research, and a most noteworthy effort is that behind the SAIBA framework [13] to define and standardize ECA verbal and gestural communication.
In this paper we propose a modular structure to define communication acts with verbal and nonverbal elements. Our proposal emerges from efforts to conceptually adapt the definition of an ECA engine we have developed to be used in a variety of application domains to a SAIBA-like modular “model.” As a result we suggest adding a communication definition level above that defined in FML, and we propose a category expansion in FML to express a certain kind of communication intentions, as we shall see in Section 2.
The paper is structured as follows: In Section 2 we describe our modular approach to forming ECA communication acts. In Section 3 we propose an adaptation of our communication act generation scheme to FML, expanding the latter as we have seen appropriate. Finally, in Section 4 we sum up the main points of what our approach offers.
2 SYSTEM STRUCTURE
Overall, a multimodal interaction system can be thought of as a black box that receives input information from the user through a variety of modes of interaction and produces an information output, also choosing a combination of interaction modes, as a reaction to the input. How the system actually reacts, precisely what information is provided as output and how it is provided will depend on a variety of contextual parameters, most importantly, of course, on the application that motivates the interaction and the associated communication goals (by which, here, we mean the major goals directly related to the overall object of the interaction, not the message-bound communication goals that may exist at any particular moment during the interaction). Indeed, we believe that contextual parameters, and not merely the user’s input, affect the communication goals themselves, and therefore should be taken into account when designing the structure of the multimodal output generation system.
2.1 Interaction scenarios
We have designed our conceptual multimodal interaction platform to be flexible enough to be used for a variety of purposes. In particular, we have implemented three distinct (though combinable) functions. The first is to handle a spoken dialogue-driven application to control household devices with user speech as the sole input and a human-like avatar producing speech and gestures (with body and face) as output ([2], [14]). The second is to provide a virtual “companion” (which is being developed within the activities of the COMPANIONS project [15]) with multimodal interaction capability. The idea of “companion” is for it to be a sort of virtual agent that “knows” the user enough to make suggestions in a variety of areas such as what film to watch or what to cook for dinner. Interaction here is more flexible: as input we may have any combination of speech, text, and taps and strokes on a touch-screen, with the user’s physical position in the room as a context parameter. The third scenario is a spoken dialogue-driven biometrics system for secure access to a variety of services and applications such as the two previously mentioned. Here the input is the user’s speech and the output is provided by the ECA’s speech and gestures, just as in the household device control scenario.
2.2 Main interaction modules
The multimodal interaction platform we have conceived in order to handle multimodal expression of communication goals conceptually has three main internal modules, as shown in Figure 1: the interaction manager, the phrase generator and the behaviour generator.
2.2.1 The interaction manager
First the user’s input, which may be simultaneously or sequentially multimodal (users may use speech, text, and haptic modes of interaction), is captured and fed to the interaction manager module, together with the contextual parameters that are to be taken into account. These include the following:
• Knowledge of the particular interaction scenario and application in which the current interaction is framed. • Knowledge of the user’s current interaction capabilities. For instance, the user’s voice may be different than
usual, or he/she might be having difficulty in speaking, in which case it may be wise to adopt an interaction strategy that requires the user to speak as little as possible.
• Knowledge of the user’s emotional state. For instance, it is useful to know whether the user is getting frustrated (by analysing the physical characteristics of the user’s voice, his/her choice of words or the flow of the dialogue) since such knowledge may be used to put together specific interaction strategies (for instance to try to reduce user frustration), which will generally have a gestural component.
Taking this information the interaction manager analyses the specific situation the interaction is in at each particular moment and produces a communication intention base (CIB). The CIB defines the ECA’s response on a pre-verbal level. It is composed of three elements: interaction control, open discourse and non-declared intentions.
• The interaction control element defines turn management (i.e., turn offering, turn giving, turn requesting and turn taking) and theme structure (i.e., how the various content elements will be put together following a discourse strategy).
• The open discourse (or open communication) element deals with the literal meaning the system wishes to communicate. In other words, it defines what is to be communicated in words.
• The non-declared communication element describes an intention (with regard to the system’s interaction goals) behind the literal meaning of the message. Note that hidden intentions may be quite removed from the literal meaning communicated.
Both the open and the non-declared parts may have verbal and gestural components, but in the actual system we have designed the interaction manager module produces a simplified CIB in which the open communication level is associated solely with a verbal intention (an abstract representation of a meaning that the next module, the phrase generator, will put into words) and the non-declared level sets the ECA’s general attitude and emotional response, as well as certain intentions or goals that the gestures ultimately performed should aim to achieve.
Here is a brief example to clarify these ideas. Suppose we are using the biometric access application and trying to verify our identity through voice recognition, and suppose the system doesn’t positively recognize us at the first attempt, after providing the system with a sample of our voice (by answering a question the system asked us, for example). If we are told that there’s been a verification failure we are likely to become somewhat frustrated and perhaps anxious, which besides being undesirable states of mind in themselves may also affect our voice in further attempts making it decreasingly likely that the system will manage to recognise us. In such a situation the best thing the system can do is to have us provide new voice samples while trying to ensure good sample quality by getting us to speak calmly. A possible strategy the interaction manager might adopt to achieve this is, firstly, to hide from us the fact that a recognition failure has occurred and make us speak again as if it were all part of the normal verification process (of course, experienced users might realise that something odd is going on!), and secondly, to make the verbal request with a certain attitude (e.g., calmness) and complement it with an ECA gesture sequence designed with getting the speaker to focus and stay calm in mind. These general attitudinal and general gestural indications would constitute the non-declared part of the CIB provided by the interaction manager when adopting this specific interaction strategy, while the system’s explicit verbal intention would constitute the open part.
Note that the actual gesture sequence corresponding to the general gestural indications given by the interaction manager will be determined later. This provides flexibility, very much in the FML-BML vein, to allow for pursuing the same goal (gestural intention) with different gesture sequences as the context or culture may require.
2.2.2 The phrase generator
The communication intention base generated by the interaction manager is passed on to the phrase generator. Admittedly, this is something of a misnomer because the function of this module is not only to generate phrases but also to tag the text with specific gestural indications (as opposed to the general gestural indications of the non-declared part of the CIB). We call the collection of tags pointing to their corresponding places in the text the behaviour descriptor. As for the verbal intention conveyed in the CIB, it is now converted into text in the form of a set of successive linguistic units.
The output of the phrase generator is, thus, functionally similar to FML, although our implementation doesn’t follow any FML specifications. We may stress two peculiarities of our approach, however:
1. The flow of the interaction is entirely determined by the interaction manager. Hence, although all tags are introduced at this stage (since no text exists before), the composition of interaction-level tags, such as those related to turn management for instance, is already established in the CIB.
2. For our scheme to be in greater harmony with FML, the latter should include tag categories corresponding to our non-declared communication level, so that this information may be carried down with all the rest to the gesture implementation stage. In Section 3 we explain these tag categories in greater detail.
2.2.3 The behaviour generator
Finally, the behaviour generator concatenates the linguistic units produced by the phrase generator and translates the gesture tag structure into specific ECA body gestures and facial expressions to be performed in synchrony with the verbal rendition of the text. Thus the ECA’s behaviour is assembled.
2.3 Example: acknowledgement of misunderstanding
In order to better illustrate the functional flow of the proposed modular structure for generating ECA verbal + nonverbal communication acts, we present an example of a dialogue strategy for use when the system realises it has previously misunderstood the user. We describe the overall interaction scenario, and briefly sketch the situation, context of interaction, response strategy and output ultimately offered to the user.
Motivating situation: A critical situation arises when the system fails to correctly understand something the user has said, more precisely, when the system believes it has understood the user’s utterance, but in fact the user has said something else. If the user tries to tell the system that it has misunderstood, or if he/she tries to correct the misunderstanding by repeating or rephrasing, the system will (hopefully!) realise what has happened. This is crucial since in such situations (especially if they occur, as they usually do, fairly often) there is a risk of the user losing confidence in the system’s capabilities and becoming irritated, thus making it more difficult for the system to understand his/her subsequent utterances. This is one reason why error cycles often ensue, and it is important for the system to try to cut them short by adopting an adequate strategy.
Scenario: John wants to cook a Spanish omelette. He doesn’t remember what the measures for the ingredients are, so he asks the cooking assistant to give him the recipe. The fume extractor is on, and in this noisy environment the speech engine recognizes ‘spaghetti.’ The system replies following an explicit confirmation strategy and asks John whether indeed he wants to prepare spaghetti. To this John simply answers “No.”
Response strategy and generation of the corresponding ECA communication act: When the Interaction Manager realizes there has been a speech recognition error, the immediate objective is to try to keep the user in a positive attitude while moving on with the interaction. Taking into account the relevant context variables, it decides to implement a communication act in two parts: the first is to show “remorse” for having misunderstood, the second to encourage the user to repeat the utterance while trying not to annoy him/her. (The system needs the user to repeat the utterance since in this case he/she has only indicated that there was a misunderstanding, without correcting the information at the same time, which would call for a different response strategy, such as implicit or explicit confirmation.) These guidelines are passed on to the Phrase Generator, which then selects the suitable linguistic units and the associated ECA behaviour descriptors. In this case, the phrases generated might be “I am sorry” and “Could you repeat again?”, with Remorse and Interest as their respective behaviour descriptors (tags). Finally, the Behaviour Generator translates these descriptors into specific gesture instructions (movements) for the ECA to perform. A detailed description of the ECA’s two-part verbal + gestural communication act is shown in Figure 2.
Figure 2: Text and ECA behaviour assembly for an acknowledgement of misunderstanding situation
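The two-part communication act described above could be sketched, in an FML-style notation with illustrative element names, as:

```xml
<!-- Illustrative sketch: the two linguistic units selected by the
     phrase generator, each tagged with its behaviour descriptor. -->
<communication-act>
  <unit descriptor="Remorse">I am sorry.</unit>
  <unit descriptor="Interest">Could you repeat again?</unit>
</communication-act>
```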
If we were to adapt this example to an implementation scheme based on FML, we would find that no further tags are needed beyond regular FML to implement the non-declared part of the CIB. It seems reasonable that both “remorse” and “showing interest” (which stem directly from the non-declared intention of keeping the user in a calm and positive frame of mind) can and should be implemented using literal, text-based tags. Section 3 will make this statement clear. In it we hope to identify the need for new sets of tags that extend the regular text-based FML.
3 ADAPTING TO (AN EXTENDED) FML
In the previous section we described how we define the message at the pre-textual level using a structure we call the communication intention base (which is the output of the interaction manager module). Now we consider how the information contained in the CIB could be carried down into an FML structure. In other words, we describe what the FML structure would look like if we want it to carry through the information determined in the CIB. Figure 1 (a) succinctly illustrates how our ECA’s verbal and gestural behaviour is put together (FML is not used). Figure 1 (b), in contrast, shows the links we propose between the CIB and the FML level.
Figure 1. Description of the modular verbal and gestural communication act generation: (a) Implemented system; (b) Correspondence between the CIB and FML.
The open discourse part of the CIB is the primary basis on which the textual message is formed together with all
the ECA gesture tags directly associated with the text (for instance, marks to emphasise particular words). In this sense the tags are literal: they express the text through gesture. Non-declared intentions could also modulate the gestures attached to the verbal message, the literal tags, partly determining how the ECA says a message (mainly in order to achieve a certain effect such as influencing the user’s response). For instance, continuing with the example introduced in the previous section, if we want the user to stay calm and unaware of an error we might choose to make the ECA smile while emphasising a particular part of its utterance. Both the smile and the emphasis can be expressed with regular “literal” FML tags like “emphasise” and “affect,” but either their presence or the nature of the smile are influenced by the non-declared intentions.
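For instance, the smile-plus-emphasis combination just described might be written with literal tags along these lines (the attribute names are illustrative assumptions, not part of any FML specification):

```xml
<!-- Illustrative: literal tags whose presence and realisation are
     modulated by the non-declared intention of keeping the user calm -->
<affect type="smile" intensity="mild">
  Please tell me your name <emphasise>once more</emphasise>.
</affect>
```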
The most important function of the non-declared intentions, however, would be to determine a non-literal communication level that defines behaviour that is not directly related to a verbal message. This allows two things: a) defining “text-free” ECA behaviour (i.e., gesturing without saying anything); and b) while speaking, displaying a behaviour that is overlapped with the expression of the verbal message, but is semantically independent of it (or at least not directly related to it). In this sense it “overarches” the text-based message.
Allowing text-free behaviour could be useful to specify ECA behaviour during the user’s turn. This could be a waiting gesture if the user remains silent, for instance. By text-free we mean “lacking a verbal basis from the ECA.” However, in especially advanced systems ECA-text-free behaviour could be dependent on the verbal message from the user, as a reaction to it (thus, user-text-dependent). This allows introducing gestural reactions to what the user is saying, while he or she is still speaking.
The interaction control element in the CIB can be carried through in three different ways. Taking turn-giving cues, for instance, they could be defined a) via a literal interactional tag associated with some verbal indication that the ECA wants to give the turn to the user; b) via a non-literal tag overarching a verbal message that has nothing to do with turn management (the ECA gives visual cues to invite the user to speak, while finishing (overarching) an utterance to give the user several response options); or c) via text-free tags (the ECA performs a turn-giving gesture without saying anything).
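The three variants might be sketched as follows (tag and attribute names are illustrative only):

```xml
<!-- a) literal: the cue is tied to a verbal turn-giving indication -->
<turn-giving>Now it is your turn.</turn-giving>

<!-- b) non-literal: the cue overarches an unrelated utterance -->
<turn-giving overarching="true">
  You can ask for a recipe or for the shopping list.
</turn-giving>

<!-- c) text-free: a turn-giving gesture with no accompanying speech -->
<turn-giving text-free="true"/>
```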
In the following subsection we will see a few more examples to illustrate these ideas. We now very briefly propose a small set of tags belonging to the non-literal category that expands the regular “literal” FML set of tags.
3.1 Non-literal tags
As mentioned above, we introduce non-literal tags to define behaviour that is not directly related to the verbal message, but may be superimposed on it. Such behaviour may be useful in pursuing intentions that are hidden or not openly declared to the interlocutor. We propose three non-literal tags: empathy, knowledge and persuasion.
Empathy: This tag defines the ECA’s attitude toward the user (kind and understanding, or aggressive, for example). Its main attributes are valence and level:
• Valence values: positive, neutral or negative.
• Level: strong, medium, weak.
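A possible rendering of the empathy tag, under the assumption that valence and level appear as attributes, is:

```xml
<!-- Illustrative: a kind, strongly understanding attitude -->
<empathy valence="positive" level="strong">
  Don't worry, we can simply try again.
</empathy>
```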
Knowledge: Deals both with information related to dialogue management that the ECA wants to suggest but not put in words (because this can improve the fluency of the dialogue), and with deceiving or hiding information from the user. We distinguish the following types of non-literal knowledge tags:
• Dialogue stage: defines behavioural cues that may clarify for the user the stage the dialog is in. Examples: If the system is waiting for the user to say or do something, this tag could be used to indicate the generation of a waiting gesture. Behavioural sequences could be defined for dialogue initiation and termination. The ECA could indicate through gestures that it wants to give the turn to the user (as we discussed earlier) or that it wants to take the turn from the user.
• Recognition confidence: defines what sort of visible reactions the ECA should perform to show the user how confident the system is that it is correctly understanding a user’s utterance. This behavioural stance would be superimposed on whatever gestures are performed to express the verbal message the ECA is giving at the time. Another option would be to introduce it in the user’s turn, while he or she is speaking (an instance of user-text-dependent reactive behaviour). Type: high, intermediate, low. (For instance, an indication to perform low confidence cues at the FML level, could be translated at the BML level into a leaning of the head and squinting.)
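An illustrative rendering, assuming the confidence type appears as an attribute on a knowledge tag:

```xml
<!-- Illustrative: low-confidence cues (e.g. a leaning of the head and
     squinting at the BML level) superimposed on the verbal message -->
<recognition-confidence type="low">
  So you would like the spaghetti recipe?
</recognition-confidence>
```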
• Manipulation: to manipulate the information offered to the user. We may distinguish four subtypes of manipulation tags:
o Conceal: to hide information from the user. For instance, hiding the fact that recognition has failed in order to maintain the user’s trust (and then trying to obtain the correct information further along in the dialogue).
o Focus: to draw the user’s attention to, or away from, certain facts the system has to tell the user.
o Deceive: to try to make the user believe something that is false. (The difference with outright lying is that deceit is done through gestural behaviour, not by saying something that isn’t true; the latter could be implemented with ordinary literal tags.)
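The manipulation subtypes might be expressed with a type attribute, for example (names illustrative):

```xml
<!-- Illustrative: conceal the recognition failure while asking again -->
<manipulation type="conceal">
  Now, please tell me your name once more.
</manipulation>

<!-- Illustrative: gesturally steer the user's attention away from an item -->
<manipulation type="focus" direction="away">carrot cake</manipulation>
```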
Persuasion: Marks behaviour to influence the user in order to persuade him or her to do something, or to do it in a certain manner. There can be many types of persuasion marks. Here we propose two:
• Negotiation: defining behaviour to influence the user in a negotiation. For instance, if in a cooking recipe application we are trying to persuade the user not to take a dessert, we might want to perform gestures to put the user off (perhaps involving an expression of disgust constructed in the BML stage) while the ECA says something on the literal level as innocent as “Are you sure you want to eat that?” The corresponding extended FML section could read:

<persuasion type="discourage">
  <performative type="enquiry">
    Are you sure you want to eat <emphasise>that?</emphasise>
  </performative>
</persuasion>
• Speech: influencing the way the user says something. A relatively simple attribute could be “rhythm”, to try to get the user to follow a certain rhythmic pace when speaking.
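Analogously to the negotiation example, a speech-persuasion mark with the suggested rhythm attribute might read (illustrative syntax):

```xml
<!-- Illustrative: nudge the user towards a slow, calm speaking pace -->
<persuasion type="speech" rhythm="slow">
  Please repeat your name, calmly and clearly.
</persuasion>
```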
3.2 Playing with other interface modes
Beyond the literal and non-literal tags we could think of yet another category to take into account other nonverbal information that may affect the interaction. An example of such information would be the presentation of pictures onscreen alongside the ECA. It would be interesting to be able to conveniently define how the ECA reacts to (or uses) these other information sources.
We can think of interaction examples that involve all three kinds of “FML” tags. Take, for instance, the system’s response to the user’s request for a list of recipes containing carrots. A list (perhaps with pictures) might then appear onscreen, and the ECA could start reading it. This would involve gesturing on the literal level (adding expression to the reading of the recipes). Performing deictic gestures to point successively at the different elements of the list the ECA is mentioning would involve tags that link the ECA’s verbal and gestural behaviour to other interaction elements (in this case in visual/textual modes). Finally, non-literal knowledge manipulation tags might be introduced to implement behaviour to draw the user’s attention away from the item “carrot cake” (perhaps by turning the head and looking uninterested) which the system knows is too calorie-laden but which it must mention because it’s on the list requested.
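Combining the three kinds of tags in the recipe-list scenario might look roughly like this (the deictic tag linking the ECA’s utterance to on-screen elements is our illustrative invention):

```xml
<!-- Illustrative: literal reading with deictic links to on-screen list
     items, wrapped in a non-literal focus-away manipulation so the ECA
     plays down "carrot cake" while still mentioning it -->
<manipulation type="focus" direction="away" target="carrot-cake">
  <deictic item="list-item-1">Carrot soup,</deictic>
  <deictic item="list-item-2">carrot salad,</deictic>
  <deictic item="list-item-3">and carrot cake.</deictic>
</manipulation>
```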
It is, of course, debatable whether such extra-conversational mode driven tags are really needed, or indeed whether any non-literal tags are needed for that matter, or if, on the contrary, everything in the CIB could be translated into gestures tightly synchronised with the verbal message, as specified in ordinary FML. We believe, however, that at the very least the new tag categories we propose add a considerable degree of conceptual clarity.
4 DISCUSSION
The ECA communication act generation process described in this paper is part of an effort we are undertaking to design natural, smart and multimodal human-computer interfaces. We have identified a number of correspondences between our scheme and that of the SAIBA framework. We believe the latter provides a clear conceptual demarcation of communication act parts and generation phases, and one that allows considerable implementation flexibility. We are working to adapt our system so that it implements the framework more closely. Conversely, there are certain aspects which we identified when defining our system that we believe could represent a useful supplement to the SAIBA concept, especially as regards the FML specification (as proposed in [16], for instance).
Firstly, in our work on dialogue robustness and fluency it has become clear that interaction-level information, such as that relating to turn management and to how the different available modalities for expression are going to be used (ECA voice and gestures, text and other on-screen elements, etc.), should preferably be determined at an early stage that is close to when the next course of action is decided, and which comes before the ECA’s specific verbal intention is formed. Conceptually, this sort of information would belong in a supra-FML level of communication act formation. We suggest this supra-FML level should also include general verbal and gestural indications corresponding both to explicit and non-declared communication intentions, but which are text-independent (indeed, this is a pre-textual stage). Together, all these information elements constitute what we have called the communication intention base (CIB). When forming the CIB a number of contextual parameters should be taken into account (scenario, user capability, user preferences and emotional state, modalities in use, interaction history, etc.).
Secondly, we believe FML may be usefully enhanced by including categories to account for what we have called the non-declared communication level, which is the part of the communication act devised to achieve an interaction goal that is not explicitly declared to the interlocutor (for instance, a gesture strategy to produce an effect on, or to induce a certain reaction from, the interlocutor). The interaction goal in question may not only not be declared, but actually intended to remain hidden from the user, a possibility that would allow specifying that a gesture sequence should be implemented, for instance, in such a way as to deceive the interlocutor.
5 ACKNOWLEDGEMENTS
This work was carried out with the support of the European Union IST FP6 program through the COMPANION project, IST-34434, and the support of the Spanish Ministry of Science and Technology under project TEC2006-13170-C02-01.
REFERENCES
[1] Boyce, S. J., Spoken natural language dialogue systems: user interface issues for the future. In Human Factors and Voice Interactive Systems, D. Gardner-Bonneau, Ed. Norwell, Massachusetts, Kluwer Academic Publishers: 37-62, (1999).
[2] B. López, Á. Hernández, D. Díaz, R. Fernández, L. Hernández, and D. Torre, Design and validation of ECA gestures to improve dialogue system robustness, Workshop on Embodied Language Processing, in the 45th Annual Meeting of the Association for Computational Linguistics, ACL, pp. 67-74, Prague 2007.
[3] T. Bickmore, J. Cassell, J. Van Kuppevelt, L. Dybkjaer, and N. Bernsen, (eds.), Natural, Intelligent and Effective Interaction with Multimodal Dialogue Systems, chapter Social Dialogue with Embodied Conversational Agents. Kluwer Academic, 2004.
[4] M. White, M. E. Foster, J. Oberlander, and A. Brown, Using facial feedback to enhance turn-taking in a multimodal dialogue system, Proceedings of HCI International 2005, Las Vegas, July 2005.
[5] S. Oviatt and R. VanGent, Error resolution during multimodal human-computer interaction, Proc. International Conference on Spoken Language Processing, 1, 204-207, (1996).
[6] K. Hone, Animated Agents to reduce user frustration, in The 19th British HCI Group Annual Conference, Edinburgh, UK, 2005.
[7] S. Oviatt, M. MacEachern, and G. Levow, Predicting hyperarticulate speech during human-computer error resolution, Speech Communication, vol.24, 2, 1-23, (1998).
[8] Cassell J., Thorisson K.R., The power of a nod and a glance: envelope vs. emotional feedback in animated conversational agents. Applied Artificial Intelligence, vol.13, pp.519-538, (1999).
[9] Cassell, J. and Stone, M., Living Hand to Mouth: Psychological Theories about Speech and Gesture in Interactive Dialogue Systems. Proceedings of the AAAI 1999 Fall Symposium on Psychological Models of Communication in Collaborative Systems, pp. 34-42. November 5-7, North Falmouth, MA, 1999.
[10] Massaro, D. W., Cohen, M. M., Beskow, J., and Cole, R. A., Developing and evaluating conversational agents. In Embodied Conversational Agents MIT Press, Cambridge, MA, 287-318, (2000).
[11] S. Oviatt, Interface techniques for minimizing disfluent input to spoken language systems, in Proc. CHI'94, pp. 205-210, Boston, ACM Press, 1994.
[12] Poggi I., Pelachaud C., De Rosis F., “Eye communication in a conversational 3D synthetic agent”, AI Communications 13, 3 (2000), 169-182.
[13] SAIBA: http://wiki.mindmakers.org/projects:saiba:main/
[14] B. López Mencía, A. Hernández Trapote, D. Díaz Pardo de Vera, D. Torre Toledano, L. Hernández Gómez, and E. López Gonzalo, "A Good Gesture: Exploring nonverbal communication for robust SLDSs," IV Jornadas en Tecnología del Habla, Zaragoza, Spain, 2006.
[15] COMPANION, European Commission Sixth Framework Programme Information Society Technologies Integrated Project IST-34434, http://www.companions-project.org/.
[16] Van Oijen J. “A Framework to support the influence of culture on nonverbal behaviour generation in embodied conversational agents” Master’s Thesis in Computer Science. HMI - University of Twente; ISI - University of Southern California. August 2007.
A Linguistic View on Functional Markup Languages
Dirk Heylen
Human Media Interaction
University of Twente
The Netherlands

Mark ter Maat
Human Media Interaction
University of Twente
The Netherlands
ABSTRACT
We make a start with an inventory of the functionality that a Functional Markup Language needs to cover by looking at the literature on some forms of nonverbal communication and discussing the functions these behaviours serve. Also, an analysis of conversations as they are found in the linguistic and computational linguistic literature provides pointers to the elements that need to be incorporated in FML.
1. INTRODUCTION
The functional markup language (FML) as envisioned by the SAIBA framework¹ forms the interface between two levels in the planning of multimodal communicative behaviours of embodied conversational agents, viz. the Communicative Intent Planning and the Behavior Planning. Descriptions in FML represent communicative and expressive intent without any reference to physical behavior. “It is meant to provide a semantic description that accounts for the aspects that are relevant and influential in the planning of verbal and nonverbal behavior.”
In order to arrive at a list of required features of FML we need to consider the following question: what determines what people do in conversations? This may be a more general question than asking what functions are served by the behaviours that are displayed in a conversation. The notion of function suggests a purpose and an intent. What factors influence which behaviour is carried out (whether it is carried out, and the specific form of the action)? What role do intentions play? Rather than talking about functions one could use the more general term determinant, as the factors that determine an action are not just conscious intentions. Many elements of communicative actions are not consciously intended but result from some automatic mechanism which may have arisen from habit (personal) or convention (social/cultural).
If we look at functional aspects of behaviours, or determinants more generally, we need to look at at least two things. One is the kind of functions that behaviours serve, and the second the way in which the behaviours serve a function. We first illustrate this idea and make a start with an inventory of functions by looking at the functions that have been ascribed to two kinds of nonverbal behaviours, various kinds of head movements and gaze, which we take from our earlier discussion on this topic [4]. Secondly, we look at some theoretical perspectives on the association of form and function of communicative behaviours. Finally, we look at the architecture of current conversational systems to get another view on the same subject.

¹ http://wiki.mindmakers.org/projects:saiba:main/
2. THE FUNCTIONS OF NONVERBAL BEHAVIOURS
In this section we look at two related behaviours, head movements and gaze, and analyse the functions and determinants that have been assigned to particular patterns in conversation by different researchers. Amongst others, head movements may be used to:
(1) signal yes or no, (2) signal interest or impatience, (3) enhance communicative attention, (4) anticipate an attempt to capture the floor, (5) signal the intention to continue, (6) express inclusivity and intensification, (7) control and organize the interaction, (8) mark the listing or presenting of alternatives, (9) mark the contrast with the immediately preceding utterances. Furthermore, synchrony of movements may (10) communicate the degree of understanding, agreement, or support that a listener is experiencing. Greater activity by the interviewer (e.g., head nodding) (11) indicates that the interviewer is more interested in, or more empathic toward, the interviewee, or that he otherwise values the interviewee more. Head movements serve as (12) accompaniments of the rhythmic aspects of speech. Typical head movement patterns can be observed marking (13) uncertain statements and (14) lexical repairs. Postural head shifts (15) mark switches between direct and indirect discourse.
From this list of communicative processes that head movements are involved in one can extract a first list of the kinds of ways in which head movements operate in communicative settings. They signal, enhance, anticipate, express, control, organize, mark, communicate, indicate, and accompany. These terms are often used in different senses. Some may be used as synonyms in certain contexts and as antonyms in others.
1. Signal, express, communicate are verbs that are mostly used to express the fact that behaviours can carry meaning in various ways. The precise meaning of the terms may depend on who is using them, as there are various technical definitions of these terms. For instance, turning to the nominalised equivalents of the terms, a signal may be used as a synonym of a sign (a form/meaning unit) by some people, or only as the physical realisation (the form).
2. Mark, reflect, indicate are verbs that are similar to the previous ones in many respects, but whereas the previous ones can be paraphrased, in some of their uses, as “means”, this is not appropriate for these kinds.
3. Accompany is a still looser notion than mark, reflect and indicate. An indication is a phenomenon that accompanies another in such a way that the indication can be taken as a clue that the other phenomenon occurs, which need not be the case for an accompaniment.
The distinction in verbs expresses a distinction in the way the behaviour signifies something or relates to some other phenomenon. It is a difference in semiotics. Notions that figure in this are (1) a notion of intentionality and (2) the nature of the relation between the signifier and what is signified. In the case of “communication”, intentionality on the part of the producer is typically involved: a) the intention to produce the behaviour, and b) the intention for it to mean something for someone else. In these cases the sign is a behaviour whose meaning is shared between the partners that communicate with each other, by convention, for instance. Another typical relation, which holds for many so-called “cues”, is one of causality, comprising the Gricean natural signs.
For the specification of FML it will be useful to delimit more precisely what one wants the language to cover. If one wants FML to account for all the determinants of behaviours, this will also have to include non-intentional factors. On the other hand, if one fixes FML as covering the “intended” determinants, one will need to find another way to integrate the non-intentional factors in a complete model.
Besides the question of how behaviours provide information (intended or unintended), one also needs to look at the kinds of information the behaviours provide. For the head movement functions above this amounts to the following.
1. Yes/No: these are equivalent to propositions or full utterances (speech acts) or sentences in the linguistic sense (depending on whether one considers the semantic, pragmatic or syntactic equivalent).
2. Interest, impatience, attention, understanding, empathy, uncertainty: can all be qualified as mental states.
3. Floor-grabs deal with the way the interaction proceeds; the role in which one participates in the conversation.
4. Listing, alternatives, contrast, switches between direct and indirect discourse are notions that can be grouped under the heading of discourse structure: the way in which information parts connect.
5. Rhythmic accompaniments of speech are of a different order. It is not clear whether these accompaniments are meant to mark the rhythm, and thus the information structure of what is being said, or whether they facilitate speaking, as some have claimed.
6. Movements during repairs are a reflection of the cognitive processes involved in language generation.
The main categories along which the content of the head movement expressions can be categorised are the following.
1. Equivalents of linguistic expressions.
2. Expressions of mental states.
3. Regulators of interaction/participation.
4. Rhythmic accompaniments.
To further complement this list we look at gaze behaviours. Gaze behaviour has been observed to play a role in (16) indicating addresseehood, (17) effecting turn transitions, and (18) the display of attentiveness. A typical gaze pattern occurs (19) when doing a word search. Gaze (20) may reflect the social status. Looking away (21) is used to avoid distraction, to concentrate, or (22) to indicate one doesn’t want to be interrupted. One looks at the other in order (23) to get cues about the mood and disposition of the other, or (24) to establish or maintain social contact. Gazing away (25) may reflect hesitation, embarrassment or shyness. Furthermore, gaze is used (26) to locate referents in abstract space, and (27) to request listeners to provide backchannels.
The list of verbs indicating the “function” of gaze is:
1. effecting
2. displaying
3. occurs when
4. accompany
5. to get cues
6. establish or maintain (contact)
7. locate referents
8. request actions
This list of verbs adds interesting new perspectives to the list of functions. Several verbs refer to a function of the behaviour that involves the interlocutor in some way or another. Gaze can both be an expressive signal and function as a request for action by the interlocutor. Verbs such as to get cues, establish contact and request actions show the intention behind actions that are directed towards engaging the action of others. These can be called eliciting actions.
The following lists the domains in which gaze behaviours are involved:
1. Participation (interaction): addressing, floor (conversation/interaction management)

2. Cognitive expression (mental states): attentiveness, word search, distraction avoidance, concentration, hesitation

3. Social (interpersonal) relations: expression of social status

4. Elicitation/Monitoring: to evoke and get cues about mood and disposition

5. Contact regulation (part of conversation/interaction management)
To look at specific behaviours in conversation and list what is involved in their production is one way to come to an understanding of what might need to be specified in a functional markup language. Another way is by reviewing the theoretical literature on conversation. We make a start in the next section.
3. CONVERSATIONAL FUNCTIONS
The linguistic, sociological and psychological literature on conversation – or interaction more generally – is very diverse. There is an abundance of theoretical positions within the various fields, which results in many different analyses of the same phenomena. For the specification of a functional markup language it will be important to have clear definitions of terms, possibly referring to the linguistic tradition that provides the correct context for a term.
When one tries to determine what motivates a communicative act one should take into account that in most cases conversation is not an end in itself but part of other joint activities that people engage in. It is a means to an end: buying coffee, proving your argument, comforting someone, etc., and it is often intrinsically connected to non-linguistic activities that accompany the speech act: handing over a cup of coffee and saying “here you are”. Saying “Thank you” in reply is a ritual act that acknowledges the previous act, and in particular the fact that the other has done something for your benefit which you are grateful for. Actions in conversation can be said to accomplish three different, interrelated goals (often at the same time): taking care that business is executed (task), that conversation runs smoothly (system), and that the proper interpersonal relationship is established or maintained (social/ritual).
• Task dimension: actions that accomplish the business at hand
• System dimension: conversation acts performed to make conversations work properly as conversations
• Ritual dimension: actions handling social commitments and obligations
Conversation is an activity that the participants engage in together. Actions by one participant are intrinsically dependent on actions by the other. A linguistic utterance (speech act) is intended to be heard, understood and acted upon by another. An utterance by one often leads to another by the other participant.
Since Austin [1], it is common to assume that a communicative action can be viewed from different levels (locution, illocution, perlocution). In Clark’s version [2], a speaker acts on four levels. (1) A speaker executes a behaviour for the addressee to attend to. This could be uttering a sentence but also holding up your empty glass in a bar (to signal to the waiter you want a refill). (2) The behaviour is presented as a signal that the addressee should identify as such. It should be clear to the waiter that you are holding up the glass to signal to him and not just for some other reason. (3) The speaker signals something which the addressee should recognize. (4) The speaker proposes a project for the addressee to consider (believe what is being said, accept the offer, execute the command, for instance). In this formulation of levels, every action by the speaker is matched by an action that the addressee is supposed to execute: attend to the behaviour, identify it as a signal, interpret it correctly and consider the request that is made.
So the producer of a communicative act acts on different levels. For each of these different levels the producer expects a matching act of the intended recipient of the act. Besides producing the act, the producer will therefore also monitor the recipient, who will indicate (through orientation, gaze, facial expressions, back-channels, and other actions) that he has or has not been able to hear or see the action, to understand what was meant by it, and whether he will follow it up as intended or not. In short, there are processes of production and monitoring going on in parallel that are complemented by processes of reception and feedback. All of these actions that go on in parallel are associated with different but related intentions.
In the next section we zoom in on some of the system constraints to refine the inventory of functions.
4. SYSTEM CONSTRAINTS
Goffman ([3]) lists several kinds of normative principles that are helpful to ensure effective transmission in conversations, such as the principle that constrains interruptions, or against simultaneous talk, against withholding answers, norms that oblige the use of “back-channel” cues, that encourage the use of “hold” and “all-clear” cues if the hearer is not able to attend temporarily, and norms to show whether or not the message has been heard and understood immediately following the utterance. The latter are back-channel cues which consist of nods, facial gestures and nonverbal vocalizations from hearers during the talk of a speaker that inform him “among other things, that he was succeeding or failing to get across [...] while attempting to get across.” (page 12). Requirements such as these Goffman groups under the heading of “system requirements” or “system constraints”. He provides the following preliminary list of requirements.
1. There has to be a two-way capability for transceiving readily interpretable messages. Interlocutors may engage in certain actions to establish this capability.
2. Back-channel feedback capabilities “for informing on reception while it is occurring”.
3. Contact signals involve actions such as signalling the search for an open channel, the availability of a channel, the closing of a channel, etcetera.
4. Turnover signals regulate turn-taking.
5. Preemption signals are “means of inducing a rerun, holding off channel requests, interrupting a talker in progress”.
6. Framing capabilities indicate that a particular utterance is ironic, meant jokingly, “quoted”, etcetera.
7. Constraints regarding nonparticipants, who should not eavesdrop or make competing noise.
The latter constraints follow from social norms. Besides these, Goffman also mentions social norms that oblige respondents to reply honestly, in the manner of Grice’s conversational maxims. Besides these system constraints that tell how individuals ought to handle themselves to ensure smooth interaction, an additional set of constraints can be identified “regarding how each individual ought to handle himself with respect to each of the others, so that he not discredit his own tacit claim to good character or the tacit claim of the others that they are persons of social worth whose various forms of territoriality are to be respected.” These, Goffman calls “ritual constraints”.
Several of the system constraints refer to elements of interaction that were encountered above already; others are new. For instance, the notions of pre-emption signals, framing and norms were not mentioned above as such, though they are related to some phenomena discussed above. For instance, quoting (the head movements that occur when a person quotes someone else) can be taken as a kind of framing.
A third way to approach the subject of FML is by looking at the computational literature on dialogue systems or conversational agents. This has often borrowed freely from various theoretical models but has given them its own twist. So, what aspects of conversation do current versions of dialogue systems take into account?
5. MODELING DIALOGUES
Conversational agents as instantiations of dialogue systems may deal with more or fewer of the aspects mentioned in the previous sections, depending on their complexity. We give a very short introduction to dialogue systems based in part on [6]. A conversation manager (dialogue system) is concerned with the following tasks.
• Interpret contributions to the dialogue as they are observed.
• Update the dialogue context.
• Select what to do: when to speak and what to say, when to stop speaking, when to give feedback and how.
In the Trindi conception of conversation management ([5]), the parameters that are relevant for processing are kept in a so-called “information state”. Next, there are all kinds of modules that update the information state (effects of dialogue moves or dialogue inferencing) and modules that select actions given a particular configuration of the information state. These actions may involve updates to the information state as well.
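Such an update cycle can be sketched in code. The following is a minimal, illustrative sketch; the state fields, move types, and rule bodies are hypothetical stand-ins and not taken from any Trindi implementation:

```python
# Minimal sketch of an information-state update cycle: an information
# state, update rules applied per observed dialogue move, and action
# selection. All field names, move types, and rules are hypothetical.

class InformationState:
    def __init__(self):
        self.common_ground = []   # propositions taken to be grounded
        self.agenda = []          # actions the system still intends to perform
        self.turn = "user"        # crude indicator of whose turn it is

def update(state, move):
    """Update rule: apply the effects of an observed dialogue move."""
    if move["type"] == "inform":
        state.common_ground.append(move["content"])
    elif move["type"] == "ask":
        state.agenda.append(("answer", move["content"]))
    state.turn = "system"
    return state

def select(state):
    """Selection rule: choose the next action from the current state."""
    if state.agenda:
        return state.agenda.pop(0)
    return ("listen", None)

s = update(InformationState(), {"type": "ask", "content": "departure-city?"})
action = select(s)   # the pending obligation to answer
```

Note that, as in the text, selection itself alters the information state (here, the agenda shrinks).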
The various parameters of the information state can be classified in a number of ways. Traum partitions the information state and dialogue moves into a set of layers “each dealing with a coherent aspect of dialogue that is somewhat distinct from other aspects”. The following are some of the layers that are distinguished.
• Contact (whether and how individuals are accessible for communication)
• Attention (who is attending to what)
• Conversation (which comprises the following points)
1. Participants (who is there, in what role)
2. Turn (who has the turn, who is claiming it...)
3. Grounding (is the information added to the common ground; an important function of feedback)
4. Topic
5. Rhetorical (the meaning relations that hold between different utterances or between clauses in one utterance)
• Social commitments (obligations)
• Task
One can immediately see that the terms used here overlap to a great extent with the terms used in the previous section. Grounding is a new term, but actions such as back-channels mentioned in the previous sections typically serve the function to acknowledge that a message has been received and understood. Topic management is also a new feature, related to the structure of information on a discourse level. The rhetorical structure refers to an element of discourse structure as well.
6. SUMMARY
The specification of a functional markup language can build upon the existing analysis of behaviours, theoretical notions introduced in several disciplines from the humanities, and the architecture of current conversational systems. A thorough analysis is required, leading to a common understanding of the central notions.
This kind of analysis will have repercussions on the definition of the SAIBA framework. The notion of function needs to be defined more clearly, as there is quite some variation in the way in which a behaviour can function in conversations.
As the above suggests, a participant in a conversation may produce a communicative behaviour or a particular part of the behaviour with the intention:
• to establish that there is a channel open for communication
• to have the interlocutor pay attention to the communicative behaviour
• to have the interlocutor understand that one is performing an informative communicative behaviour
• to inform the interlocutor of something, i.e. to mean something, to send a message (more or less by definition of communicative behaviour, it seems)
• to make clear in what way the message should be interpreted (framing)
• to engage the interlocutor in a project (speech act, perlocutionary intent, task)
• to change the participant status of self or other (turn-taking/floor)
• to have the message directed to someone in particular (addressing)
• to show the structure of the message (for reasons of clarity, emphasis - information structure and discourse structure)
• to show reception and understanding of, and attitude towards, what the other interlocutor is trying to communicate
• to establish a particular interpersonal relation with the interlocutor
• to express social status
• to convey a particular impression (impression management)
• to display a particular mental state
• to hide a subjective state such as an emotion
Etcetera.
Acknowledgement
The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 211486 (SEMAINE).
7. REFERENCES
[1] Austin, J.L.: How to Do Things with Words. Oxford University Press, London (1962)
[2] Clark, H.H.: Using Language. Cambridge University Press, Cambridge (1996)
[3] Goffman, E.: Replies and responses. Language in Society 5(3), 257–313 (1976)
[4] Heylen, D.: Head gestures, gaze and the principles of conversational structure. International Journal of Humanoid Robotics 3(3), 241–267 (2006)
[5] Traum, D., Larsson, S.: The information state approach to dialogue management. In: J. van Kuppevelt, R. Smith (eds.) Current and New Directions in Discourse and Dialogue, pp. 325–335. Kluwer (2003)
[6] Traum, D., Swartout, W., Gratch, J., Marsella, S.: A virtual human dialogue model for non-team interaction (to appear)
Functions of Speaking and Acting -
An Interaction Model for Collaborative Construction Tasks
Stefan Kopp & Nadine Pfeiffer-Leßmann
Artificial Intelligence Group
Faculty of Technology, Bielefeld University
{skopp, nlessman}@techfak.uni-bielefeld.de
Abstract. This paper describes how a virtual agent assists a human interlocutor in collaborative
construction tasks by combining manipulative capabilities for assembly actions with conversational
capabilities for mixed-initiative dialogue. We present an interaction model representing the evolving
information states of the participants in this interaction. It includes multiple dimensions along which
an interaction move in general can be functional, independent of the concrete communicative or
manipulative behaviors comprised. These functions are used in our model to interpret and plan the
contributions that both interactants make.
1 Introduction
Virtual humanoid agents offer an exciting potential for interactive Virtual Reality (VR). Cohabiting a virtual
environment with their human interlocutors, virtual agents may ultimately appear as equal partners that share the
very situation with their partner and can collaborate on or assist in any task to be carried out. To investigate such a
scenario, the embodied agent Max (Kopp, Jung, Leßmann, Wachsmuth, 2003) is visualized at human size in a
CAVE-like VR environment where he joins a human in assembling complex aggregates out of virtual Baufix
parts, a toy construction kit (see Fig. 1). The two interactants meet face-to-face over a table with a number of parts
on it. The human interlocutor—who is equipped with stereo glasses, data gloves, optical position trackers, and a
microphone—can issue natural language commands along with coverbal gestures or can directly grasp and
manipulate the 3D Baufix models to carry out assembly actions. Further, the human can address Max in natural
language and gesture. The agent is, likewise, able to initiate assembly actions or to engage in multimodal dialogue
using prosodic speech, gesture, eye gaze, and emotional facial expressions.
In this setting, the two interactants can become collaboration partners in a situated interaction task as follows; see
(Leßmann, Kopp & Wachsmuth 2006) for a detailed analysis. The partner wants to (or is to) construct a certain
Baufix aggregate (e.g. a propeller) and is free to directly carry out assembly steps, either if they are known to her
or if she wants to give certain assembly procedures a try. At any time, she can ask Max, who shares the situation
with her and attends her actions, for assistance. Max‘s task is then to collaborate with the human user in jointly
walking step-by-step through the construction procedure and to provide support whenever his partner does not
know how to go on constructing the aggregate.
Figure 1: In his CAVE-like Virtual Reality environment, Max guides the human partner through interactive construction procedures.
While the overall interaction is guided by the human‘s wish to build a certain assembly, once the human and
Max have engaged in the collaborative construction activity the scenario is symmetric in that roles may flexibly
switch between the interaction partners according to their competences. That is, either the human or Max may
carry out an action or may instruct the other to perform an action. This demands that the agent be able both to
collaborate by taking actions in the world and to converse about them, possibly in an intertwined manner. The
scenario is hence characterized by a high degree of interactivity, with frequent switches of initiative and roles of
the interactants. The participants hence need to reconcile their contributions to the interaction with the need for
regulating it by performing multiple behaviors simultaneously, asynchronously, and in multiple modalities. That
is, their multimodal contributions are multi-functional, either communicative or manipulative, and the effects of
the latter can only be taken in from the situational context. We subsume each of those contributions that Max or
the human can perform under the term interaction move (as an extension of the common term dialogue move).
Enabling Max to engage in such an interaction requires embedding him tightly in the situational context and,
based on the perception of the human‘s interaction moves, to reason about what the proper next interaction move
may be right now. The result of this deliberation process is passed on to modules that generate the required
behaviors (Leßmann, Kopp & Wachsmuth 2006). This general layout resembles the SAIBA pipeline architecture
(Kopp et al. 2006), with dedicated representations to interface (1) between the planning of an interaction move
and its behavioral realization, and (2) between behavior planning and realization. In the remainder of this paper,
we describe the first representation as part of an information state-based interaction model that attempts to
explicate all the functional aspects of an interaction move which must be taken into account in order to keep track
of the interaction and adequately specify an interaction move independent of its behavioral realization.
2 Interaction Model
Adopting the information state approach to dialogue (Traum & Larsson, 2003), we frame an interaction model
that defines the aspects along which the collaborative interaction evolves from the agent’s point of view. This
model stipulates what facets of interaction Max has to take into account, without making any provisions as to how
these aspects can be fully represented or reasoned about.
2.1 Information state-based interaction model
The information state approach to dialogue modeling provides a framework for formalizing both state-based/
frame-based and agent-based dialogue models in a unified manner (Traum & Larsson, 2003). It assumes
that each dialogue participant maintains information states (IS) that are employed in deciding on next actions and
are updated in effect of dialogue acts performed by either interactant. A particular dialogue manager, then, consists
in a formal representation of the contents of the ISs plus update processes that map from IS to IS given certain
dialogue moves. Several systems have been based on this framework, concentrating on different aspects of
dialogue, e.g., the GODIS system (Larsson et al. 2000) or the dialogue system of the WITAS project (Lemon et al. 2001). Traum &
Rickel (2002) have proposed a model of multimodal dialogues in immersive virtual worlds that comprises layers
for contact, attention, conversation, obligations, and negotiation. The conversation layer defines separate dialogue
episodes in terms of participants, turn, initiative, grounding, topic, and rhetorical connections to other episodes.
Rules state how communicative signals can be recognized and selected to cause well-defined changes to a layer.
However, most existing models have focused either on dialogues where the agents are only planning—with
the plan to be executed at a later time—or on dialogues where the agents are only executing some previously
created plan. As Blaylock et al. (2003) point out, this does not allow for modeling dialogues where mixed-
initiative collaboration and interleaved acting and planning take place, as in our setting. In addition, we posit that
not only spoken communicative acts but also manipulative actions must be characterized as interaction moves. We
therefore introduce a model that includes a layer accounting for the formation, maintenance, overlap, and rejection
of the goals of interactants. Goals cover the rationale behind any kind of intentional action and abstract away from their situational realization. The interaction model consists of the following layers:
• Initiative: Who has brought up the goal that the interactants are pursuing in the current discourse segment.
• Turn: Who has the turn. We distinguish between four states: my-turn, others-turn (or a unique name for a specific interaction partner, respectively), gap, overlap.
• Goals: The goals that have been pursued so far as well as the, possibly nested, goals that are still on the agent’s agenda for the remainder of the interaction. Each goal may either have arisen from the agent’s own desires, or was induced due to obligations following social norms or a power relation Max is committed to (Traum & Allen 1994).
• Content: The propositional content that has been or will be subject of the discourse, defined in a logic-based notation.
• Grounding: The discourse state of content facts, denoting whether the agent assumes a fact to be new to the conversant or part of the common ground (Clark & Brennan, 1991).
• Discourse structure: The organization of the discourse in segments that can relate to goals and refer to or group the entries in the content and goal layers. Each discourse segment has a purpose (DSP; Grosz & Sidner 1986) and they are related based on the relationships among their DSPs, all of which are part of one intentional structure.
• Partner model: What is known about the dialogue partner(s), also covering aspects of negotiation and obligations. It is the basis of retrospective analysis and thus plays a central role for the agent being situated in an evolving interaction.
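As an illustration, the layers of such an interaction model can be collected into a single record. This is only a sketch; the field types and default values are our assumptions, not part of the cited model:

```python
from dataclasses import dataclass, field

# Sketch of a layered interaction state; layer names follow the text,
# value encodings are illustrative assumptions.

@dataclass
class InteractionState:
    initiative: str = "user"                  # who brought up the current goal
    turn: str = "gap"                         # my-turn | others-turn | gap | overlap
    goals: list = field(default_factory=list)       # pending, possibly nested goals
    content: list = field(default_factory=list)     # logic-based propositions
    grounding: dict = field(default_factory=dict)   # fact -> "new" | "common-ground"
    discourse_segments: list = field(default_factory=list)  # segments with their DSPs
    partner_model: dict = field(default_factory=dict)       # beliefs about the partner

state = InteractionState()
state.turn = "my-turn"
state.goals.append(("achieve", "(Exists prop)"))
```

Using `default_factory` keeps each state's mutable layers independent, so several such records (e.g. one per participant) do not share lists.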
2.2 Interaction moves
Any action by an interactant may alter the agent’s internal representation of the state of interaction. We focus here
on intentional actions, either communicative or manipulative, which account for most of the progress of the
interaction. They are selected and performed in order to achieve a desired state of the world. In terms of mental
attitudes, we think of them as being produced following specific kinds of nested intentions, which in turn
result from some goal-directed planning process. For example, when touching a bolt or saying “move the bolt
closer to the bar”, one might have the intention not just of manipulating the object but of mating it with a bar,
building a propeller, and assisting the partner.
Modeling such behaviors requires having an account of their functional aspects, in addition to their mere overt
actions. For manipulative acts, these aspects can easily be defined as the direct consequences (or post-conditions)
of the manipulation in the world, which in turn need to be related to the current intention structure. For
communicative acts, the functional aspects are not so easy to pinpoint, despite a long tradition of viewing
speaking as acting. Austin (1962) pointed out multiple acts (or “forces”) of making a single utterance: the
locutionary act of uttering words, and the perlocutionary act of having some effects on the listeners, possibly even
effecting some change in the world. To its perlocutionary end, an utterance always performs a certain kind of
action, the illocutionary act. Adopting this notion, Searle (1969) coined the term speech act which is supposed to
include an attitude and a propositional content. Other researchers have used terms like communicative acts
(Allwood, 1976; Poggi & Pelachaud, 2000), conversational moves (Carletta et al., 1997), or dialogue moves
(Cooper et al., 1999) for related ideas.
The notion of functional aspects of communicative actions is particularly beneficial to multimodal systems,
for it allows abstracting away from a signal’s overt form to core aspects that only get realized in certain ways
(Cassell et al. 2000; Traum & Rickel 2002). In Cassell et al.‘s model, the function a behavior fulfils is either
propositional (meaning-bearing) or interactional (regulative), and several behaviors are frequently employed at
once in order to pursue both facets of discourse in parallel. Poggi & Pelachaud (2000) define a communicative act,
the minimal unit of communication, as a pair of a signal and a meaning. The meaning includes the propositional
content conveyed, along with a performative that represents the action the act performs (e.g. request, inform, etc.).
Based on the interaction model laid out above, we define interaction moves as the basic units of interaction in terms of the following slots:
• Action: The illocutionary act the move performs. The act can either be purely manipulative (connect, disconnect, take, or rotate) or communicative. In the latter case, a performative encodes the kind of action as described below.
• Goal: The perlocutionary force of the move, i.e., what the move is meant to accomplish. This can be either a desired world state (achieve something), or it can be the mere performance of an action (perform something).
• Content: Information conveyed by the move, needed to further specify the action and the goal. This can accommodate either propositional specifications (e.g. for language or symbolic gestures) or an analog, quantitative representation of imagistic content (e.g. for iconic gestures).
• Surface form: The entirety of the move’s overt verbal and nonverbal behaviors, employed to convey all of the aspects represented here.
• Turn-taking: The function of the move with respect to turn-taking, either take-turn, want-turn, yield-turn, give-turn, or keep-turn.
• Discourse function: The function of the move with respect to the segmental discourse structure, either start-segment, contribute, or close-segment (cf. Lochbaum, Grosz, Sidner, 1999).
• Agent: The agent that performs the move.
• Addressee: The addressee(s) of the move.
These slots together capture the informational aspects that are relevant about an action in our model of interaction. They are not independent of each other, nor are they self-sufficient. Instead, the slots are supposed to mark the specific signification of particular informational aspects. In general, they provide a frame structure that can be incrementally filled when generating an action through subsequent content planning and behavior planning. Some of the slots may thereby remain empty, e.g., for smallish moves like raising a hand which may solely serve a turn-taking function.
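A frame structure of this kind, with optionally filled slots, can be sketched as follows; the slot names follow the model described in the text, while the value encodings are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of the interaction-move frame; every slot is optional so a move
# can stay partially filled during incremental generation.

@dataclass
class InteractionMove:
    action: Optional[str] = None              # e.g. "connect" or a performative
    goal: Optional[str] = None                # achieve/perform specification
    content: Optional[str] = None             # propositional or imagistic content
    surface_form: Optional[str] = None        # overt verbal/nonverbal behaviors
    turn_taking: Optional[str] = None         # take-turn, want-turn, yield-turn, ...
    discourse_function: Optional[str] = None  # start-segment, contribute, close-segment
    agent: Optional[str] = None
    addressee: Optional[str] = None

# A "smallish" move: raising a hand solely to claim the turn.
hand_raise = InteractionMove(surface_form="<raise-hand>",
                             turn_taking="want-turn",
                             agent="User", addressee="Max")
```

The unfilled slots (action, goal, content) model exactly the case described above, where a behavior carries only a turn-taking function.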
2.3 Semantics – Expectations – Obligations
The meaning of an interaction move is defined by what it realizes, i.e. by the effects it brings about in terms of the
aspects captured by the interaction model. For a manipulative move, the effects are directly implied by the action
and can be extracted via perception of the environment. Given powerful enough perceptive capabilities, the
current scene thus serves as an external part of the representation of the interaction state for the agent.
The effects of communicative moves often cannot be defined in a clear-cut way, as they depend on multiple
aspects of the interaction move and the context in which it is performed. For example, even a simple move that
fulfils the turn-taking function want-turn will result in a new state my-turn only when executed in the state gap (no
one has the turn). For this reason, information state-based systems typically employ a large number of update rules
to model the context-sensitive effects of particular moves. Furthermore, the effects of a communicative move
depend on social attitudes like the expectations a sender connects to a move or the obligations it imposes on the
recipient. Traum (1996) argues that obligations guide the agent’s behavior, without the need for recognizing a goal
behind an incoming move, and enable the use of shared plans at the discourse level. As Max is supposed to be
cooperative, obligations are therefore modeled to directly lead to the instantiation of a perform-goal in response to
an interaction move. If this move was a query or request, Max will thus be conducting the action asked for in a
reactive manner. In case of a proposal, he is only obliged to address the request, and his further deliberation
decides upon how to react. We hence explicitly encode the Action of each move by distinguishing between four
types of performative (e.g. cf. Poggi & Pelachaud 2000):
1. inform-performatives: provide informational facts, characterized by the desire to change the addressee’s beliefs
2. query-performatives: procure informational content to establish a new belief or to verify an existing one
3. request-performatives: request a manipulative action
4. propose-performatives: propose propositional content or an action
Derived from one of these general types, a performative can often be narrowed down through subsequent
specification during interpretation or generation (Poggi & Pelachaud 2000). The final performative will be tailored
to the content on which it operates (e.g., whether it asserts or requests a propositional fact, an action, or an
opinion) as well as to contextual factors like the actual situation, the addressee, the type of encounter, or the
degree of certainty. This can even happen retrospectively, when the performative of a previous move is fully
disclosed at a later point in the conversation. We represent these refinements using a compositional notation, e.g.
inform.agree or propose.action.
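The compositional notation can be mimicked with a small helper; the function names below are hypothetical and only illustrate how a performative derived from one of the four general types is narrowed step by step:

```python
# Sketch of the compositional performative notation: a performative starts
# as one of the four general types and is narrowed by appending dot-
# separated refinements, e.g. propose -> propose.action. Helper names
# are hypothetical.

GENERAL_TYPES = {"inform", "query", "request", "propose"}

def refine(performative, *refinements):
    """Narrow a performative by appending refinement steps."""
    base = performative.split(".")[0]
    assert base in GENERAL_TYPES, "must derive from a general type"
    return ".".join([performative, *refinements])

def general_type(performative):
    """Recover the general type a refined performative derives from."""
    return performative.split(".")[0]

p = refine("propose", "action")
q = refine("inform", "agree")
```

Refinement can be applied again later, which matches the retrospective disclosure described above: a stored performative string is simply extended once more context is available.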
See (Leßmann, Kopp & Wachsmuth 2006) for a description of how the different performatives are endowed
with semantics, obligations and expectations, and how they are used along with the discourse function of a move
to state rules for identifying the holder of the initiative. To illustrate our model here, we analyse in Table 1 some
interaction moves of an example dialogue. To represent propositional facts in the goal and content slots, we use a
formal, logic-like notation in which predicates/relations are indicated by a capital letter, and unbound variables are
prefixed with a $. For example, “(Inform.ref $s)” represents that a reference will be communicated, with the
unbound variable $s being the referent. An unbound variable indicates underspecification inherent to the content
of the move (in this example, there is no particular entity being referred to, yielding an indefinite article). Note
further that the content and the goal slot together specify the informational aspects of the move, with some of
these aspects marked as being the goal.
3 Modeling situated interaction management
In this section we give a brief overview of how the functional aspects explicated in the move representation
inform interaction management. Interaction moves are the main data structure for interfacing in Max‘s
architecture between modules for perception, deliberation, and behavior planning. On the input side, interaction
moves are used to specify and structure the incoming information, possibly relating the information to external
objects or previous interaction moves; on the output side, they serve as a container which gets filled during the
generation process of an agent’s response. How the agent behaves in the interaction is determined by a Belief-
Desire-Intention control architecture (Bratman, 1987; Rao & Georgeff, 1991), for which we extend JAM/UM-PRS
(Huber 1999, Lee et al. 1994). It draws upon specific plans for analysing input moves as well as generating
responses in a context-dependent manner.
Plans can either directly trigger specific behaviors, or invoke dynamic, self-contained planners that construct context-dependent actions or, again, plans. A judicious use of plans allows the agent to reduce the complexity of controlling dynamic behavior and to constrain itself to work towards goals. Plans are therefore kept as general as possible, using a constraint-based representation, and are not refined to the situational context until necessary. That is, we too regard plans as mental attitudes, as opposed to recipes that just consist of the
knowledge about which actions might help achieve a goal (cf. Pollack 1992, Pollack 1990). When Max selects a
plan and binds or constrains its arguments, it becomes an intention on an intention stack, encompassing
information about the context and the goals responsible for it to come into existence. As a result, the plan is characterized by the agent’s attitudes towards the realization of his goal.
Moves 1–3 (cells in each row are separated by ‖; the speaker of each move is given in the Agent row):
Interaction move:   “Let us build a propeller.” ‖ “Ok.” ‖ “First, insert a bolt in the middle of a bar.”
Action:             propose.action ‖ inform.agree ‖ request.order
Goal:               (Achieve (Exists prop)) ‖ (Perform (Inform.agree)) ‖ (Achieve (Connected $s $b $p1 $p2))
Content:            (Build prop we) ‖ (Build prop we) ‖ (Connect $s $b $p1 $p2) (Inst $s bolt) (Inst $b bar) (Center_hole $b $p2)
Surface form:       <words> ‖ <words> ‖ <words>
Turn-taking:        take | give ‖ take | keep | give
Discourse function: start-segment (DSP=prop) ‖ contribute ‖ start-segment (DSP=prop-s1)
Agent:              User ‖ Max ‖ Max
Addressee:          Max ‖ User ‖ User

Moves 4–7:
Interaction move:   “Which bolt?” ‖ “Any bolt.” ‖ User puts bolt into the first hole of bar. ‖ “No, that was the wrong hole.”
Action:             query.ref ‖ inform.ref ‖ connect ‖ inform.disagree
Goal:               (Perform (Query.ref $s)) ‖ (Perform (Inform.ref $s)) ‖ (Achieve (Connected bolt-2 bar-1 port-1 port-3)) ‖ (Perform (Inform.disagree (Connect …)))
Content:            (Inst $s bolt) ‖ (Inst $s bolt) ‖ (Connect bolt-2 bar-1 port-1 port-3) ‖ (Not (Center-hole bar-1 port-3))
Surface form:       <words> ‖ <words> ‖ <manipulation> ‖ <words>
Turn-taking:        take | give ‖ take | yield ‖ take | yield ‖ take | yield
Discourse function: contribute ‖ contribute ‖ contribute ‖ contribute
Agent:              User ‖ Max ‖ User ‖ Max
Addressee:          Max ‖ User ‖ User

Table 1: An example dialogue analyzed into formal interaction moves.
3.1 Dealing with incoming interaction moves
A variety of plans are used for handling incoming interaction moves. Turn-taking functions are processed taking
into account the mental state of the agent, the goal he pursues, and the dominance relationship between the
interlocutors. A turn-taking model (Leßmann et al. 2004) is used that consists of two steps: First, a rule-based,
context-free evaluation of the possible turn-taking reactions takes into account the current conversational state and
the action of the partner‘s utterance. These rules are incessantly applied and integrated using data-driven conclude
plans to ensure cooperative dialogue behavior. The second step is the context-dependent decision upon different
response plans, possibly leading to new intentions.
The propositional content of an incoming interaction move, if present, is processed by plans for determining
the context it belongs to (e.g. to resolve anaphora). To this end, it is checked whether the move relates to one of
the current goals, or to an interaction move performed before. This is done by calculating a correlation value
between the content facts carried by the interaction move and the goal context. In addition, the agent needs to
resolve references to external objects in the virtual world, which is achieved using a constraint satisfaction-based
algorithm (Pfeiffer & Latoschik 2004).
If the agent succeeds in finding a candidate context, it adds an obligation-goal to handle the interaction
move as a sub-goal of the goal to which the move contributes. Otherwise, a new obligation-goal is added as a
top-level goal. By associating the interaction move with one of his goals, the agent is able to deal with multiple
threads at the same time and keep their individual contexts apart. As a result, incoming information is structured
not only according to its content, but also depending on the context, an important aspect of being situated in the
ongoing interaction. Finally, different plans are used to handle an incoming move depending on the action it
performs. For example, these plans may verify a proposition, answer a question, or constrain a parameter
involved in a plan in order to adapt it to events occurring during plan execution, e.g. the use of specific
objects or proposals made by the interlocutor.
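The context-association step can be illustrated with a minimal sketch. The overlap-based correlation measure below is an assumed stand-in for the system's actual computation, and the goal identifiers are invented:

```python
# Illustrative sketch: associating an incoming interaction move with a goal
# context via a correlation value between the move's content facts and the
# facts in each candidate goal context. The overlap measure is an assumed
# stand-in for the system's actual correlation computation.

def correlation(move_facts, context_facts):
    """Fraction of the move's content facts already present in the context."""
    if not move_facts:
        return 0.0
    return len(set(move_facts) & set(context_facts)) / len(move_facts)

def find_candidate_context(move_facts, goal_contexts, threshold=0.5):
    """Return the goal id with the highest correlation above threshold, or
    None, in which case a new top-level obligation-goal would be created."""
    best_id, best_val = None, threshold
    for goal_id, facts in goal_contexts.items():
        val = correlation(move_facts, facts)
        if val > best_val:
            best_id, best_val = goal_id, val
    return best_id

contexts = {
    "build-propeller": ["(Build prop we)", "(Inst $s bolt)", "(Inst $b bar)"],
    "greeting": ["(Greet user max)"],
}
# "Which bolt?" carries the fact (Inst $s bolt):
print(find_candidate_context(["(Inst $s bolt)"], contexts))  # -> build-propeller
```

A move whose facts match no context above threshold returns None, triggering the new top-level obligation-goal described in the text.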
3.2 Output planning
If the agent has the goal to achieve a certain state of the world, he will reason about which courses of action may
lead to this goal. As a result, he will formulate and aggregate situated plans that lead to the desired state of affairs.
Each behavior that any of these plans incorporates may be either a communicative or a manipulative move, both
represented as interaction moves in terms of the same features defined above. Both kinds of behaviors hence stand
in a meaningful relationship to each other and can be carried out in a nested way when suitable. The decision is
based upon the role that the agent currently fulfils. If Max is to act as the instructor (his initial role upon being
asked by the human) he will verbalize the steps.
When Max has decided to employ a communicative act, he has to decide what meaning should be conveyed
(content planning) and how to convey it in natural language, gesture, etc. (behavior planning). In general, the
behavior produced must be tailored to the agent's current role in the collaboration, expressing his beliefs and
goals. Our approach to natural language generation starts with a communicative goal to be achieved and relies on
various sources of knowledge, including task and conversational competencies, a model of the addressee, and a
discourse history (McTear 2002; Reiter & Dale 2000). Crucially, this process is an integral part of the agent's
plan-based deliberations and is carried out naturally by dedicated plans when Max's mental attitudes incline him
towards making a verbal utterance.
The communicative goal derives directly from the currently active intention on the agent’s plan stack, e.g. to
request the interaction partner to connect two objects: request.order “user” “connect” “bolt-2” $obj3. The goal
of the generation process, then, is to determine all information about this communicative move needed to render it
as a multimodal utterance. Starting from a message (goal) as above, the general performative (action) is first
derived directly from the type of the intended act (in our example request.order). Content selection then works to
determine the information that specifies the parameters of the communicative act to concrete values and, if
possible, refines the performative. Discourse planning determines the move’s discourse function and the discourse
segment (DSP) it is contributing to, both being derivable from the actual structure and ordering of the plans on the
intention stack (cf. Lochbaum, Grosz, Sidner, 1999).
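The generation steps just described (deriving the performative from the act type, content selection, discourse planning) might look roughly as follows in code. The data structure and the binding logic are illustrative assumptions, not the actual system:

```python
# Rough sketch of the output-generation steps described above. Field names
# and the content-selection rule are hypothetical; the actual system derives
# this information from its plan stack and knowledge sources.
from dataclasses import dataclass, field

@dataclass
class InteractionMove:
    action: str            # performative, e.g. "request.order"
    agent: str
    addressee: str
    content: list = field(default_factory=list)
    discourse_function: str = "contribute"
    discourse_segment: str = ""

def generate_move(intention, agent="Max", addressee="user"):
    """Derive the performative directly from the type of the intended act,
    then select content that specifies the act's parameters."""
    act_type, *args = intention
    move = InteractionMove(action=act_type, agent=agent, addressee=addressee)
    if act_type == "request.order" and args[1] == "connect":
        # Content selection: bind the connect relation's known arguments;
        # unbound variables (e.g. $obj3) stay open for the addressee.
        move.content = [("Connect", args[2], args[3])]
        move.discourse_function = "start-segment"
        move.discourse_segment = "prop-s1"
    return move

# e.g. the intention: request.order "user" "connect" "bolt-2" $obj3
move = generate_move(("request.order", "user", "connect", "bolt-2", "$obj3"))
print(move.action)  # -> request.order
```

The resulting structure corresponds to one column of Table 1; multimodal behavior planning and realization would then consume it downstream.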
The final stages of output generation, not presented here, are multimodal behavior planning and realization; see
(Leßmann, Kopp & Wachsmuth 2006) for details on the methods used in the present scenario.
4 Summary
We have presented an approach to formally model the moves that interactants take in a mixed-initiative,
collaborative construction scenario. This scenario provides both the virtual agent and the human interlocutor with
rich possibilities for interaction, comprising praxic actions and conversation about them as equal contributions. We have
defined a formal model that explicates and characterizes the dimensions along which these situated interactions
evolve, and which an embodied agent must actively manage in order to be able to participate as a collaborative
expert. Following an information state-based approach, we have further laid down in detail the notion of
interaction moves that encapsulate the different functions a contribution can fulfill with respect to the dimensions
of this conceptual framework. As these functions, among others not considered here (e.g. emotional display or
epistemic qualification), bear significant influence on the behavioral realization of an interaction move, FML
should be able to accommodate most or all of them.
References
Allwood, J. (1976). Linguistic Communication in Action and Co-Operation: A Study in Pragmatics. Gothenburg Monographs in Linguistics 2, University of Gothenburg, Dept. of Linguistics.
Austin, J. L. (1962). How to Do Things with Words. Harvard University Press, Cambridge, MA.
Blaylock, N., Allen, J., Ferguson, G. (2003). Managing communicative intentions with collaborative problem solving. In Jan van Kuppevelt and Ronnie W. Smith (eds.), Current and New Directions in Discourse and Dialogue, volume 22 of Kluwer Series on Text, Speech and Language Technology, pp. 63-84. Kluwer, Dordrecht.
Bratman, M. E. (1987). Intention, Plans, and Practical Reason. Harvard University Press, Cambridge, MA.
Carletta, J., Isard, A., Isard, S., Kowtko, J., Doherty-Sneddon, G., Anderson, A. (1997). The reliability of a dialogue structure coding scheme. Computational Linguistics 23:13-31.
Cassell, J., Bickmore, T., Campbell, L., Vilhjalmsson, H., & Yan, H. (2000). Human conversation as a system framework: Designing embodied conversational agents. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (eds.), Embodied Conversational Agents, pp. 29-63. Cambridge, MA: The MIT Press.
Clark, H. H. & Brennan, S. A. (1991). Grounding in communication. In L. B. Resnick, J. M. Levine, & S. D. Teasley (eds.), Perspectives on Socially Shared Cognition. Washington: APA Books.
Cooper, R., Larsson, S., Matheson, C., Poesio, M., Traum, D. (1999). Coding Instructional Dialogue for Information States. Trindi Project Deliverable D1.1.
Grosz, B. J., Sidner, C. L. (1986). Attention, Intentions, and the Structure of Discourse. Computational Linguistics 12(3), pp. 175-204. The MIT Press.
Huber, M. J. (1999). JAM: A BDI-theoretic mobile agent architecture. Proceedings of the Third Int. Conference on Autonomous Agents, pp. 236-243.
Kopp, S., Jung, B., Leßmann, N., Wachsmuth, I. (2003). Max - A Multimodal Assistant in Virtual Reality Construction. KI-Künstliche Intelligenz 4/03, pp. 11-17.
Kopp, S., Krenn, B., Marsella, S., Marshall, A., Pelachaud, C., Pirker, H., Thorisson, K., Vilhjalmsson, H. (2006). Towards a common framework for multimodal generation in ECAs: The behavior markup language. In Gratch, J. et al. (eds.), Intelligent Virtual Agents 2006, LNAI 4133, pp. 205-217. Springer.
Larsson, S., Ljunglöf, P., Cooper, R., Engdahl, E., Ericsson, S. (2000). GoDiS - An Accommodating Dialogue System. In Proceedings of the ANLP/NAACL-2000 Workshop on Conversational Systems, pp. 7-10.
Lee, J., Huber, M. J., Kenny, P. G., Durfee, E. H. (1994). UM-PRS: An Implementation of the Procedural Reasoning System for Multirobot Applications. Conference on Intelligent Robotics in Field, Factory, Service, and Space (CIRFFSS), Houston, Texas, pp. 842-849.
Lemon, O., Bracy, A., Gruenstein, A., Peters, S. (2001). Information States in a Multi-modal Dialogue System for Human-Robot Conversation. In Proceedings of Bi-Dialog, 5th Workshop on Formal Semantics and Pragmatics of Dialogue, pp. 57-67.
Leßmann, N., Kranstedt, A., Wachsmuth, I. (2004). Towards a Cognitively Motivated Processing of Turn-taking Signals for the Embodied Conversational Agent Max. AAMAS 2004 Workshop Proceedings: "Embodied Conversational Agents: Balanced Perception and Action", pp. 57-64.
Leßmann, N., Kopp, S., Wachsmuth, I. (2006). Situated Interaction with a Virtual Human - Perception, Action, and Cognition. In Rickheit, G., Wachsmuth, I. (eds.), Situated Communication, pp. 287-323. Mouton de Gruyter.
Lochbaum, K., Grosz, B. J., Sidner, C. (1999). Discourse Structure and Intention Recognition. In R. Dale, H. Moisl, and H. Somers (eds.), A Handbook of Natural Language Processing: Techniques and Applications for the Processing of Language as Text.
McTear, M. (2002). Spoken Dialogue Technology: Enabling the Conversational User Interface. ACM Computing Surveys 34(1), pp. 90-169.
Pfeiffer, T. & Latoschik, M. E. (2004). Resolving Object References in Multimodal Dialogues for Immersive Virtual Environments. In Y. Ikei et al. (eds.), Proceedings of the IEEE Virtual Reality 2004. Chicago, Illinois.
Poggi, I. & Pelachaud, C. (2000). Performative Facial Expressions in Animated Faces. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (eds.), Embodied Conversational Agents. Cambridge, MA: The MIT Press.
Pollack, M. E. (1990). Plans as Complex Mental Attitudes. In P. R. Cohen, J. Morgan, and M. E. Pollack (eds.), Intentions in Communication. MIT Press.
Pollack, M. E. (1992). The Uses of Plans. Artificial Intelligence 57(1), pp. 43-68.
Rao, A. & Georgeff, M. (1991). Modeling rational agents within a BDI-architecture. In Proceedings of the Int. Conference on Principles of Knowledge Representation and Reasoning, pp. 473-484.
Reiter, E. & Dale, R. (2000). Building Natural Language Generation Systems. Cambridge University Press.
Searle, J. R. (1969). Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press.
Traum, D. R. & Allen, J. F. (1994). Discourse Obligations in Dialogue Processing. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL-94), pp. 1-8.
Traum, D. R. (1996). Conversational Agency: The TRAINS-93 Dialogue Manager. In Proceedings of the Twente Workshop on Language Technology 11: Dialogue Management in Natural Language Systems, pp. 1-11.
Traum, D. & Rickel, J. (2002). Embodied Agents for Multi-party Dialogue in Immersive Virtual Worlds. In Proceedings of AAMAS 2002, pp. 766-773. ACM Press.
Traum, D. & Larsson, S. (2003). The Information State Approach to Dialogue Management. In J. van Kuppevelt, R. Smith (eds.), Current and New Directions in Discourse and Dialogue. Kluwer.
Functional Mark-up for Behaviour Planning: Theory and Practice

Brigitte Krenn+±, Gregor Sieber+
+Austrian Research Institute for Artificial Intelligence, Freyung 6, 1010 Vienna
±Research Studio Smart Agent Technologies, Hasnerstrasse 123, 1160 Vienna
+43 1 53246212
{brigitte.krenn, gregor.sieber}@ofai.at
ABSTRACT
We approach the requirements analysis for an FML from a high-
level perspective on communication in general, and the current
state of developments in ECA communication in particular. Our
focus of assessment lies on the two basic units associated with a
communicative event, i.e., the communication partners involved,
and the communication act itself. Apart from coming up with a
selection of properties to be specified in FML, we argue that one
of the major challenges for a widely used FML is how much
freedom the specification leaves in terms of interconnecting
behaviour planning and intent planning, and how much or how
little it enforces the specification of semantic descriptions.
1. INTRODUCTION
We approach the discussion of requirements for an FML from a
high-level perspective on communication and the current state of
developments in ECA communication. From a general point of
view, questions arise such as: Who is communicating to whom in
which socio-cultural and situational context? What is the overall
interaction history of the communication partners, and what is the
history of the ongoing dialogue? What is the intention of the
communication, and what is its content? Transferring these
questions to the ECA domain leads, at the least, to questions of
modelling the virtual character's persona, including some notion
of personality and emotion, and of modelling the communication
act itself, be it in terms of real-time action and response or in
terms of generating a complete dialogue scene in one go.
Our goal is mainly to come up with open questions and core
topics regarding a possible scope of an FML given the current
state of the art in ECA communication. From a practical point of
view, we start from a narrowed-down perspective on modelling
the communication partners and the communication act.
In section 2, we give a brief outline of the current state of ECA
development and its implications for the creation of a commonly
used mark-up or representation language at the interface of intent
and behaviour planning. We propose a set of person
characteristics and aspects of communication acts that need to be
considered in the specification of a functional mark-up language.
This is followed by a discussion of some basic building blocks
relevant for the computation of communicative events (section 3).
In section 4, we finally point out that one of the main challenges
of FML lies in finding a trade-off between detailed semantic
descriptions and interoperability of system components. We
round up our considerations with some words of caution
regarding the feasibility and desirability of a clear-cut separation
between intent and behaviour planning.
2. Current Situation in ECA Development --
Implications for the Creation of a Functional
Mark-up Language FML
Work on computational modelling of communicative behaviour is
tightly coupled with the development of Embodied
Conversational Agents (ECAs). In ECA systems,
communicative events consist of (i) face-to-face dialogues
between an interface character and a user [1], (ii) an interface
character presenting something to the user [2], (iii) two or more
characters communicating with each other in a virtual or mixed
environment, e.g. [3]. On the one hand, there are ECA systems
where only the generation side of multimodal communicative
behaviour is simulated as it is the case with presenter agents
where the whole dialogue scene is generated in one go, e.g. the
NECA system [4]. On the other hand, there are systems where the
whole action-reaction loop of communication is computed, i.e.,
the system interprets the input of a communication partner and
then generates the reactions of the other communication partner(s)
and so forth. See the REA system [5] as an early example for the
complete process of behaviour analysis and behaviour generation.
Depending on the approaches pursued, the kind and complexity of
information required for processing greatly differs. This
influences the requirements on a functional mark-up or
representation language.
In order to realize communicative behaviour, first of all the
communicative intent underlying the behaviours needs to be
computed. To do this in a principled way requires a good deal of
understanding of the motivational aspects of human behaviour,
i.e., why a human individual (re-)acts in a particular situation in a
certain way. This requires theoretical insights into the underlying
mechanisms that determine the mental, affective and
communicative state of the agent. From psychology and social
sciences we have a variety of evidence that human behaviour is
influenced by such factors as cultural norms, the situational
context the individual is in, and the personality traits and the
affective system of the individual. All of these are huge areas of
research where a variety of models and theories for sub-problems
exist, but we are still far from modelling the big picture of
how the different aspects relate and which mechanisms interoperate
in which way(s). At the same time, we aim at building ECA
applications with characters that display human-like
(communicative) behaviour as naturally and believably as
possible. In other words, we have to smartly simulate human-like
communicative behaviour, which requires shortcuts at various
levels of processing. For example, somewhere in the system it is
stipulated that, given certain context parameters, some character X
wants to express some fact Y in a certain mood Z. Such an internal
state of the system can be achieved by more or less complex processes.
To which extent these processes influence the inventory and the
mechanisms required for the FML still needs to be discussed.
This directly brings us to another crucial aspect for the design of
representation languages, i.e., the processing components used in
ECA systems. We need to study which subsystems are
implemented, which bits and pieces of information are required as
input to the individual processing components, and what kinds of
information the components produce as output. Especially if we
aim at developing representations that
will be shared within the community, there must be core
processing components that are made available to and can be used
by the community. The requirement for reusability of components
touches a crucial aspect of system and application development.
Current ECA systems are built in order to realize very specific
applications. Accordingly all processing components are geared
towards optimally contributing to achieve the goals set out by the
application. In our understanding, this is one of the major reasons
why every group and almost every new ECA project has a
demand for and thus creates their own, very specific
representations. As a consequence, the successful development of
representation languages that will be shared and further developed
in the community strongly depends on the ability to develop core
processing components for ECA systems that are flexible enough
to be customized for use in different applications and systems,
and, even more importantly, on the customization of such
components providing a clear advantage over the development of
new specialized ones.
Summing up, we believe that representations which have a chance
to be commonly used must be flexible enough to allow, on the one
hand, in-depth representation of theoretical insights into specific
phenomena and, on the other hand, to provide an inventory of
high-level representations of core information that is basic to all
systems generating communicative behaviour. The availability of
reusable processing components
that operate on this core is expected to foster the uptake of the
representation language within a wider community. These
considerations equally apply to the ongoing work within the
SAIBA [A] initiative on the development of a common behaviour
mark-up language (BML) [B] as well as to the newly started
endeavour of the development of a functional mark-up language
(FML) for the generation of multimodal behaviour.
In the remainder of the paper, we will start discussing a potential
inventory of an FML from the point of view of two major
building blocks of communicative events, namely the
communication partners and the communicative acts.
3. Some Basic Building Blocks to Realize Communicational Intent
Two basic units associated with a communicative event are the
communication partners involved, and the communication act
itself. See Table 1 for a tentative list of aspects of person
characteristics. The listed characteristics roughly relate to three
dimensions: 1. person information, such as naming, outer
appearance and voice of the character; 2. social aspects, including
the role a character plays in the communicative event, but also
including the evaluation of a character by the others based on the
outer appearance of a character, its gender, and with which voice
the character speaks; 3. personality and emotion. All this
influences how an individual (re-)acts in a certain
(communicative) situation. Even though it is not yet sufficiently
understood how these aspects interrelate to generate
communicative intent, in almost all current ECA systems emotion
plays an important role in intent and behaviour planning as well
as in behaviour realization.
In particular, appraisal models [6] have been shown to be well suited
for intent planning, basic emotion categories [7] are widely used
when it comes to facial display, and dimensional models of
emotion have been successfully employed in speech synthesis [8].
Personality models have been integrated in agents to model
behaviour tendencies as well as intent planning, e.g. [9]. The Five
Factor Model of personality [10] is the most widely used in these
works. The interplay between personality and emotion has also been
studied: [11], for instance, uses personality to ensure
coherency of reactions to similar events over time.
Thus, information on the emotional state of the communication
partners is important for planning and realization of the
communicative acts. From an emotion theoretical point of view, a
distinction between emotion proper, interpersonal stance, and
general mood of an agent should be possible in the representation
language, as well as a distinction between emotion felt and
emotion expressed. Due to culturally dependent display rules,
individuals will display different emotions depending on the
current social and situational context. A clear separation between
the role of emotion in intent planning versus behaviour planning,
however, is not easy to draw, and depends on the power of both
the intent and the behaviour planner. Some behaviour planners
will be able to make use of different aspects of emotion, while
others will only be able to handle emotion at the utterance level.
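The distinction between emotion felt and emotion expressed, filtered through culture-dependent display rules, can be sketched minimally. The rule table and emotion labels below are illustrative assumptions, not drawn from any particular emotion theory:

```python
# Minimal sketch of the distinction between emotion felt and emotion
# expressed, filtered by culture-dependent display rules. The rule table
# and emotion labels are illustrative assumptions only.

DISPLAY_RULES = {
    # (felt emotion, social/situational context) -> emotion to display
    ("anger", "formal_meeting"): "neutral",
    ("anger", "among_friends"): "anger",
    ("joy", "funeral"): "neutral",
}

def expressed_emotion(felt, context):
    """Apply display rules; by default the felt emotion is shown."""
    return DISPLAY_RULES.get((felt, context), felt)

print(expressed_emotion("anger", "formal_meeting"))  # -> neutral
print(expressed_emotion("joy", "among_friends"))     # -> joy
```

A representation language that only carries one emotion attribute cannot capture this divergence; separate emotionFelt and emotionExpressed slots, as proposed below, can.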
Looking at a communicative event from a dialogue perspective
(cf. Table 2), we have a structuring of the dialogue into turns, and
a turn into individual communication acts. Communication acts
are either verbal or non-verbal. The verbal communication acts
are assigned dialogue acts in order to specify communicative
intent, e.g. ask, inform, explain, refuse, etc. For the non-verbal
communication acts, communicative intent can be specified via
backchannel functions such as keep contact, signal understanding,
agree, disagree, etc. For an FML, the question arises to which
extent functional labels of verbal and non-verbal communication
acts overlap and where the representational inventory differs. At
the level of communication act different strands of information
come together, such as information on the sender/receiver, on the
emotion expressed, on the communicative intent in terms of
dialogue acts and backchannel functions, as well as on
information structure in terms of links to the previously
communicated information versus providing new information. All
this has a potential to be encoded in FML, core aspects of which
we have listed in the following tables. In Table 1, we have also
included a number of features which are important for the
description of the participants of a communicative event, such as
participants, person, realname, gender, appearance, type, voice,
but which are not core FML features.
Table 1: Aspects of Person Characteristics – An Initial List for Discussion

participants: Collection of personal descriptions of all individuals (characters) that take part in the communicative event.

person: Description of an individual taking part in the communicative event, including a unique identifier and a nickname of the character.

realname: Specifies the real name of the character. Useful in cases where real humans are represented by avatars and the connection to the real person still needs to be kept.

gender: Specifies the gender of the character. Gender may have various implications for the behaviour of the character itself and for how the character's behaviour is interpreted by the communication partners.

type: Specifies whether the individual represented by the character is a human or a system-generated character. Useful in a mixed environment where user avatars and system agents interact.

appearance: Determines the graphical realization of the character, i.e. how the character looks, how it dresses, and what its neutral posture, base-level muscle tone and velocity are.

voice: Determines which voice should be used for the character in speech synthesis and what the basic prosody parameters are, such as pitch level and speech rate.

personality: Determines the personality type of a character. The labels and values used depend on the personality model employed, e.g. extroversion, neuroticism, agreeableness in the case of a simple factor model, but labels such as politeness and friendliness may also be useful in certain applications. Depending on the underlying model, values may be represented by labels or via integers or floats.

role: A domain-specific attribute of the character that determines the specific role the character plays in the given application, such as buyer or seller, pupil or teacher, bully or bullied, husband or wife, mother or child, storyteller or hearer, etc. Thus role has a variety of (implicit and explicit) social implications which may be explicitly specified in the FML or modelled inside a processing component.

emotion: Depending on the emotion theory (such as dimensional models, appraisals, emotion categories) the representations of emotion differ. As a starting point for emotion representations related to the three different models, see the work on the emotion representation language EARL [3].

emotionFelt: Kind and intensity of the emotional state of the character.

emotionExpressed: Kind and intensity of the emotion displayed. Felt emotion and displayed emotion are not necessarily identical, cf. display rules.

interpersonalStance: The affective relation to the communication partner.

mood: The base-level affective state of the character.
Table 2: Aspects of Communication Act – An Initial List for Discussion

turn: A turn comprises a sequence of communication acts of one speaker. Turns are the main building blocks which describe how the dialogue is structured.

communicationAct: Specifies a communicative act (as opposed to a non-communicative act). This may be a verbal or a nonverbal act, each of which has a communicative function or goal and can be colored by emotion. Note that, because of the embodiment of ECAs, verbal acts inherently contain bodily aspects. A communication act can be a reaction to some other communication act, and it can introduce new information to the dialogue. A communicative act has its underlying producer-side intentions and goals, such as to provide or get information, improve a relationship, maintain or gain power, cheat, lie, etc. All of these may require generalized high-level representations as well as theory-dependent in-depth representations.

dialogueAct: Refers to a verbal communication act and may consist of one or more utterances. As a starting point for the mark-up of communicative intent, models for dialogue act mark-up such as the DAMSL [D] annotation scheme can be used, but agent mark-up languages such as FIPA ACL [E] should also be taken into account. While DAMSL (and its extension SWBD-DAMSL [12]) is a high-level framework that has been developed for the annotation of human dialogue, FIPA ACL has a defined semantics for each communicative act that is exchanged between software agents. In practice, however, additional application-specific labels may be useful for concrete ECA applications.

informationStructure: Looking from a high-level and coarse-grained perspective, information structure anchors what is being communicated onto what has previously been communicated (theme) and marks what the new contribution is (rheme). Information structure also influences prosody and thus may be valuable input for speech synthesis [13].

nonVerbalAct: A communication act that consists entirely of nonverbal behaviour. Typical non-verbal acts in communicative situations are backchannels. The functional labels from Elisabetta Bevacqua's feedback lexicon could be a good starting point here.

producer: Who the producer of a verbal or nonverbal act is.

addressee: Who the addressee is. Producer, addressee and hearer refer to the persons specified in the participants list of the communication event.

receiver: The individual who feels addressed by the producer's utterance or nonverbal act. Receiver and addressee are not necessarily identical.

perceiver: The overhearer or onlooker of a communicative act. Perceivers, in contrast to receivers, do not feel affected by the communicative act. Producer, addressee, receiver and perceiver are the communication-act side of person characteristics.
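As a rough illustration of how the properties in Tables 1 and 2 might combine in a concrete fragment, the following builds a hypothetical FML-like snippet. All element and attribute names are invented for discussion and do not constitute a proposed standard:

```python
# Hypothetical FML-like fragment combining person characteristics (Table 1)
# and communication-act properties (Table 2). All element and attribute
# names are invented for illustration only.
import xml.etree.ElementTree as ET

fml = ET.Element("fml")

# Person characteristics (Table 1): identity, personality, role.
participants = ET.SubElement(fml, "participants")
person = ET.SubElement(participants, "person",
                       id="p1", nickname="Eve", gender="female", type="agent")
ET.SubElement(person, "personality", extroversion="0.8", neuroticism="0.2")
ET.SubElement(person, "role", value="seller")

# Communication act (Table 2): a turn with one dialogue act, the emotion
# expressed, and theme/rheme information structure.
turn = ET.SubElement(fml, "turn", producer="p1", addressee="p2")
act = ET.SubElement(turn, "communicationAct")
ET.SubElement(act, "dialogueAct", type="inform")
ET.SubElement(act, "emotionExpressed", category="joy", intensity="0.6")
ET.SubElement(act, "informationStructure", theme="price", rheme="discount")

print(ET.tostring(fml, encoding="unicode"))
```

Even this toy fragment shows the trade-off discussed in section 4: the person block could be resolved entirely inside an intent planner, or serialized, as here, for a separate behaviour planner to consume.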
4. Further Challenges: Separation of Intent and Behaviour Planner
Apart from coming up with a selection of properties to be
specified in FML, we suppose that one of the major challenges for
the specification of an FML is how much freedom the
specification leaves in terms of interconnecting behaviour
planning and intent planning. Consider the problem of deciding
whether to use a non-verbal act such as an iconic gesture to
convey a certain intention. This could, for example, be a good
solution in a situation where the addressee is busy talking to
someone else, where it would be impolite to interrupt due to
cultural or social restrictions, and where the agent would prefer
not to wait with the communicative act until the addressee has
finished the other conversation.
If completely independent planning components are assumed, a
rather detailed semantic description of the content to be
communicated and of the situation the agent is in is required.
Since FML should not contain information on the physical
realisation, and if intention planning does not get feedback from
behaviour planning, the intent planning component has no
knowledge of whether a certain gesture is available to the agent
that will serve the communicative intention. Thus the behaviour planning
component needs to receive input in a detailed enough semantic
description that allows for the decision that a) it would be good to
use a gesture in the current situation, b) there is a gesture that
conveys the meaning of the message such that no essential
information is lost. In contrast, a system with less distinct
boundaries between intention and behaviour planning would
require less detailed semantic descriptions. For instance, if the
intent planner has access to the gestures available in the system,
it can decide to use a certain gesture at the moment it defines
the agent's intentions. Thus there would be no need to further
serialize the information, read it in, and interpret it inside the
behaviour planner.
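The two designs can be contrasted in a small sketch: an intent planner with access to the gesture inventory can commit to a gesture directly, while a strictly separated one can only emit a semantic description of intent and situation. All names below are illustrative assumptions:

```python
# Sketch of the design choice discussed above. An intent planner that can
# inspect the system's gesture inventory commits to a gesture directly;
# a strictly separated one emits only a semantic description and leaves
# the decision to the behaviour planner. Names are illustrative only.

GESTURE_LEXICON = {"open_door": "iconic_door_push", "stop": "palm_out"}

def plan_with_access(intent, addressee_busy):
    """Less strict separation: the intent planner inspects the lexicon and
    may decide on a non-verbal act itself."""
    if addressee_busy and intent in GESTURE_LEXICON:
        return ("gesture", GESTURE_LEXICON[intent])
    return ("speech", intent)

def plan_with_separation(intent, addressee_busy):
    """Strict separation: only a semantic description crosses the interface;
    the behaviour planner decides whether a suitable gesture exists."""
    return {"intent": intent,
            "situation": {"addressee_busy": addressee_busy,
                          "interrupting_impolite": True}}

# The addressee is busy talking to someone else, so a gesture is preferred:
print(plan_with_access("stop", addressee_busy=True))  # -> ('gesture', 'palm_out')
```

The second function illustrates why strict separation demands richer semantics: everything the first function reads directly from the lexicon must instead be encoded in, and later decoded from, the interface representation.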
In practice, not every system will be able to provide or process
detailed semantic information as may be required by a strict
separation of intent and behaviour planning. This may be due to
the real-time requirements of ECA systems, a lack of a suitable
semantic representation language, or the lack of suitable and
efficient semantic processing components.
The success of FML within the ECA community is thus also likely to depend on how much, or how little, it enforces the specification of semantic descriptions: on the one hand, leaving enough flexibility to remain usable in systems that do not make use of detailed semantic representations, and on the other hand, providing enough semantic detail to ensure interoperability between conforming components.
5. ACKNOWLEDGMENTS
This research is supported by the EU-FP6 Cognitive Systems Project IST-027596-2004 RASCALLI.
6. REFERENCES
[1] Matheson, C., Pelachaud, C., de Rosis, F., and Rist, T. 2003. MagiCster: Believable Agents and Dialogue. Künstliche Intelligenz, special issue on "Embodied Conversational Agents", November 2003, 4, 24-29.
[2] Nijholt, A. 2006. Towards the Automatic Generation of Virtual Presenter Agents. In: Proceedings InSITE 2006, Informing Science Conference, Salford, UK, June 2006, CD Proceedings, E. Cohen & E. Boyd (eds.).
[3] Rehm, M., and André, E. 2005. From chatterbots to natural
interaction - Face to face communication with Embodied
Conversational Agents. IEICE Transactions on Information
and Systems, Special Issue on Life-Like Agents and
Communication.
[4] Krenn, B. 2003. The NECA Project: Net Environments for
Embodied Emotional Conversational Agents Project Note. In
Künstliche Intelligenz Themenheft Embodied Conversational
Agents, Springer-Verlag, 2003, p. 30-33.
[5] Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L.,
Chang, K., Vilhjálmsson, H., and Yan, H. 1999.
"Embodiment in Conversational Interfaces: Rea."
Proceedings of the CHI'99 Conference, pp. 520-527.
Pittsburgh, PA.
[6] Ortony, A., Clore, G.L., and Collins, A. 1988. The Cognitive
Structure of Emotions. Cambridge University Press.
[7] Ekman, P. 2007. Emotions Revealed: Recognizing Faces and
Feelings to Improve Communication and Emotional Life.
2nd. ed. Owl Books, New York.
[8] Schröder, M. 2004. Speech and emotion research: an overview of research frameworks and a dimensional approach to emotional speech synthesis (Ph.D. thesis). Vol. 7 of Phonus, Research Report of the Institute of Phonetics, Saarland University.
[9] André, E., Klesen, M., Gebhard, P., Allen, S. and Rist, T.
1999. Integrating models of personality and emotions into
lifelike characters. In Proceedings International Workshop
on Affect in Interactions. Towards a New Generation of
Interfaces.
[10] McCrae, R. R., and Costa, P. T., Jr. 1996. Toward a new generation of personality theories: Theoretical contexts for the five-factor model. In J. S. Wiggins (Ed.), The five-factor model of personality: Theoretical perspectives (pp. 51-87). New York: Guilford.
[11] Ortony, A. 2003. On Making Believable Emotional Agents
Believable. In R. Trappl, P. Petta, S. Payr (eds). Emotions in
Humans and Artefacts. MIT Press.
[12] Jurafsky, D., Shriberg, E., and Biasca, D. 1997. Switchboard
SWBD-DAMSL shallow- discourse-function annotation
coders manual, draft 13. Technical Report 97-01, University
of Colorado Institute of Cognitive Science, 1997.
[13] Baumann, S. 2006. The Intonation of Givenness - Evidence
from German. Linguistische Arbeiten 508, Tübingen:
Niemeyer (PhD thesis, Saarland University).
Web Links
[A] SAIBA http://www.mindmakers.org/projects/SAIBA
[B] BML http://www.mindmakers.org/projects/BML
[C] EARL http://emotion-research.net/earl/
[D] DAMSL
http://www.cs.rochester.edu/research/speech/damsl/RevisedManu
al/RevisedManual.htm
[E] FIPA ACL
http://www.fipa.org/specs/fipa00037/SC00037J.html
Thoughts on FML: Behavior Generation in the Virtual Human Communication Architecture
Jina Lee
University of Southern California
Information Sciences Institute
4676 Admiralty Way, # 1001
Marina del Rey, CA 90292

David DeVault
USC Institute for Creative Technologies
13274 Fiji Way
Marina del Rey, CA
[email protected]

Stacy Marsella
University of Southern California
Information Sciences Institute
4676 Admiralty Way, # 1001
Marina del Rey, CA 90292
[email protected]

David Traum
USC Institute for Creative Technologies
13274 Fiji Way
Marina del Rey, CA
[email protected]
ABSTRACT
We discuss our current architecture for the generation of natural language and non-verbal behavior in ICT virtual humans. We draw on our experience developing this architecture to present our current perspective on several issues related to the standardization of FML and to the SAIBA framework more generally. In particular, we discuss our current use, and non-use, of FML-inspired representations in generating natural language, eye gaze, and emotional displays. We also comment on some of the shortcomings of our design as currently implemented.
1. OVERVIEW
In this paper, we discuss our experience developing multimodal generation capabilities within the ICT virtual human architecture. This paper is intended to contribute to an ongoing effort to standardize Functional Markup Language (FML) as a representation scheme for describing communicative and expressive intents across diverse conversational agents. Our discussion focuses on how our current approach to generating natural language, eye gaze, and emotional displays relates to FML and to the SAIBA framework within which FML has been characterized [8].
The SAIBA framework makes a distinction between processes of intention planning, behavior planning, and behavior realization. It then situates these processes within a generation pipeline, and proposes two communication languages to mediate between these processes: FML to specify the result of intention planning to behavior planning, and BML to specify the result of behavior planning to behavior realization.
While there has been a lot of work on BML, there has been comparatively less work on FML and the various real-world architectural issues associated with implementing the SAIBA framework. We begin with a high-level discussion of some of these architectural issues.
One high-level consideration is that the distinction between intention planning, behavior planning, and behavior realization is only one of many organizing distinctions that could be made in a communication/action planning framework. Some others include the following.
One can distinguish actions according to the different kinds of intentions that can be behind them. Allwood [1] distinguishes three types of communication: Indicate, Display, and Signal. A sender indicates information if that information is conveyed without conscious intention. Displays are consciously shown, and signals are conscious showings of the showing (i.e. intending the receiver to recognize the conscious showing). An embodied agent may perform an action intentionally without intending to communicate anything; if another agent or person is present, important information may nevertheless be conveyed by indication. Should the planning of actions that are not intended to be communicative be part of the FML/BML pathway, or should these actions reach the behavior realizer through some other channel? Moreover, some behaviors that embodied agents need to realize (e.g., breathing) are not "intentional" in the relevant sense, and thus the notion of intention planning is inappropriate. If information about agent state is relevant to realizing such behaviors, is this information also channeled to the realizer outside the FML/BML pathway?
Another organizing distinction could be the type of behavior. Traditionally, verbal behavior and non-verbal behavior have been generated at different times and using different means. Verbal communication has discrete units, a fairly arbitrary relationship of form to meaning, and deep lexical, syntactic and semantic structures, while non-verbal communication often is more continuous, has a closer relationship of form to meaning, and shallow syntactic structure. Traditional text generation often has more stages in processing, and uses more contextual information. Most SAIBA work has focused on non-verbal behavior. Should the same pathways be used for text generation and non-verbal behavior, or should these paths be split (e.g., with text generated first)? And of course, this issue extends to other kinds of behaviors that are not realizing a communicative function.
Another architectural issue arises from real-time interactive considerations. Even though the proponents of the SAIBA framework are keenly aware of the importance of real-time interaction, the SAIBA framework remains suggestive of a traditional pipeline architecture of planning followed immediately by plan execution. This is fine for a virtual agent that resides in a static environment. However, in a more dynamic environment, an agent must respond to unexpected events in the environment. For example, many communication decisions must rely not just on individual intention planning, but also on monitoring the effects of previously planned action, and especially on monitoring new actions by people and other agents. Intention planning thus must have access to this information and must also be able to adjust or cancel communication that has been planned but not yet performed. This suggests not only additional requirements on what is provided by the intention planner to the behavior planner but also on what is provided by the behavior planner and realizer to the intention planner.
Finally, there is a more general architectural question of how to modularize a real-world generation system in a way that provides each module with all the sources of information it needs. For example, as we discuss in further detail below, our current gaze generation system relies on fine-grained, dynamic information about upstream cognitive processing. Similarly, natural language generation can sometimes require detailed information about the agent's cognitive state and other contextual factors. Such rich information needs can create pressures that work against maintaining a clean theoretical modularity such as that suggested in the SAIBA framework.
In the remainder of the paper, we discuss our virtual human architecture and then our perspective on how our current design might inform the standardization of FML.
2. ICT VIRTUAL HUMAN COMMUNICATION ARCHITECTURE
The virtual human project at ICT [14, 20, 17] has produced several virtual humans and a developing architecture, which is depicted in Figure 1. In this section, we describe the control flow and representations involved in generating multimodal output within this architecture.
For intentional communication signals, the generation process starts with configurations of the agent's information state that match a proposal rule. Examples include obligations to answer a question, ground or repair previously communicated information, or make a suggestion. These proposals to communicate compete with many other goals of the agent, both to say other things and to perform other actions such as monitoring the communication of others or acting in the world. Once a proposal is selected, the generation process begins.
2.1 Natural language generation
In our current system, natural language generation (NLG) occurs before non-verbal behavior generation (NVBG). In general, the dialogue manager initiates NLG by sending a generation request to an external generator. However, currently the dialogue manager sometimes bypasses the external generator if it already knows a good text string for its desired output, according to hand-implemented SOAR rules, or rules generated from an ontology. We have four different external generators that may be used, including two statistical generators, a hand-crafted grammar-based generator, and a hybrid generator. [19] has more details on a previous version of the generation process.

Figure 1: The virtual human system architecture.

The dialogue manager sends requests to the generator in
the form of one or more speech acts and dialogue acts to realize. The messages to the generator are of the form given in Figure 2. The vrGenerate message can be received by any external generator. In this case the dialogue manager is asking for a greeting speech act from the virtual human elder-al-hassan to a human addressee, who plays the role of a U.S. Army captain (captain). This act is also the response to a previous utterance. One or more generators can reply to this request with vrGeneration messages such as those in Figure 3. There can be one or more vrGeneration interp messages, each one with a candidate text for this output and with an interpretation identifier (1) and a quality value (-3.742008). The vrGeneration done message tells the dialogue manager that the generator(s) are finished sending interpretations.
Figures 4 and 5 show a similar request and response. This time another virtual human, doctor-perez, is trying to negotiate, and wants to address a problem in a plan involving moving downtown by telling the elder that his agreement is important for the success of the plan. When the dialogue manager has received the generation results, it can decide which one to use (if there is more than one result), based on both the quality of the generation and other factors (e.g., whether it has said this same string before). The dialogue manager might also decide to cancel the speech if it is no longer relevant (or if, e.g., another character starts speaking and this character does not want to interrupt).
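The selection step just described, ranking candidate generator outputs by quality while avoiding repeated strings, can be sketched as follows. The message layout mirrors Figures 2 and 3, but the parsing helper and field names are our own illustration, not the actual ICT wire protocol.

```python
# A minimal sketch (not the actual ICT implementation) of how a dialogue
# manager might rank candidate outputs from "vrGeneration interp" messages.
# Field order follows Figure 3; helper names are our own.

def parse_interp(header: str, text: str) -> dict:
    # header: "vrGeneration interp <speaker> <msg-id> <interp-id> <quality>"
    _, _, speaker, msg_id, interp_id, quality = header.split()
    return {"speaker": speaker, "msg_id": msg_id,
            "interp": interp_id, "quality": float(quality), "text": text}

def select_best(candidates: list, already_said=()) -> dict:
    # Prefer candidates not uttered before, then take the highest quality.
    fresh = [c for c in candidates if c["text"] not in already_said]
    return max(fresh or candidates, key=lambda c: c["quality"])

candidates = [
    parse_interp("vrGeneration interp elder-al-hassan elder-al-hassan203 "
                 "1 -3.742008", "hello captain"),
    parse_interp("vrGeneration interp elder-al-hassan elder-al-hassan203 "
                 "2 -5.100000", "greetings"),
]
print(select_best(candidates)["text"])  # -> hello captain
```

The quality values here are log-probability-style scores, so higher (less negative) is better; a real system would also fold in the other cancellation factors discussed above.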
Thus, in our current architecture, NLG is not part of a pure pipeline, since the upstream dialogue manager chooses between alternative NLG outputs and sometimes cancels output altogether. After the dialogue manager decides to go forward, a call is sent to carry out this utterance. This call includes information on the speech acts and dialogue acts as well as the text, and results in an XML message
vrGenerate elder-al-hassan elder-al-hassan203
addressee captain
speech-act<A135>.type csa
speech-act<A135>.action greeting
speech-act<A135>.actor elder-al-hassan
speech-act<A135>.response-to gsym1
speech-act<A135>.addressee captain
Figure 2: Generator request
vrGeneration interp elder-al-hassan
elder-al-hassan203 1 -3.742008
hello captain
vrGeneration done elder-al-hassan
elder-al-hassan203
Figure 3: Generator response
being sent to the NVBG module.
2.2 Nonverbal and other physical behaviors
In addition to dialogue management, a virtual human's cognitive processes include task planning, a gaze model, and an appraisal-based model of emotion. These processes provide a range of information to our NVBG module [10] through FML-inspired constructs. This information includes a specification of the communicative intent (including the speech acts and dialogue acts), the surface text of the utterance, the agent's gaze state, and a range of factors associated with the emotion model.
In this section, we present our current use of FML-inspired constructs to pass gaze and emotion information to the behavior planner. We will not discuss further the simple FML elements we currently use to capture the communicative intent and the surface text. It is important to note, however, that this is a hybrid approach that assumes NLG is upstream of the behavior planner but also assumes the intentional/semantic content can help refine non-verbal behavior choices. In terms of the SAIBA framework, one way to view this approach is that in some implementations both FML elements and BML elements are passed to the behavior planner. More generally, this raises fundamental issues for FML and SAIBA as to what assumptions are being made in the framework about how verbal and nonverbal behaviors are generated (or co-generated). We discuss this in greater detail in Section 3.3. Presently, we are actively considering alternative generation schemes and therefore expect our perspective on the appropriate FML elements to evolve as our design process continues. Our focus in this section is on aspects of our current use of FML that are somewhat more stable and, we believe, more transferable to other systems.
2.2.1 Gaze
The reader may think that gaze is not a function but a behavior, and thus should not be an element in FML at all, but rather solely in BML. In the abstract, we would tend to agree. However, given the real-time changes in human gaze directions and targets during communication, and the myriad functions that gaze plays in human cognitive and social behavior, it is important to consider its role in detail.
In our current virtual human system, the gaze model [11]
vrGenerate doctor-perez doctor-perez386
addressee elder-al-hassan
speech-act<A348>.motivation<V22>.reason
downtown
speech-act<A348>.motivation<V22>.goal
address-problem
speech-act<A348>.content<V21>.
modality<V23>.conditional should
speech-act<A348>.content<V21>.type action
speech-act<A348>.content<V21>.theme downtown
speech-act<A348>.content<V21>.event agree
speech-act<A348>.content<V21>.agent
elder-al-hassan
speech-act<A348>.content<V21>.time present
speech-act<A348>.addressee elder-al-hassan
speech-act<A348>.action assert
speech-act<A348>.actor doctor-perez
Figure 4: Generator request
vrGeneration interp doctor-perez
doctor-perez386 1 -2.9832053
you should agree to this before we can think
about moving elder
vrGeneration done doctor-perez doctor-perez386
Figure 5: Generator response
resides in the cognitive module and generates various gaze commands. The key principle behind the model is that gaze should reflect the agent's underlying cognitive state; this has historically led us to locate it within the cognitive module, not the behavior planner. Since gaze movement is a fast and immediate process, the gaze model is closely intertwined with the agent's task planner, dialog manager, and emotion model. Each of these components, which constitute the cognitive module, generates a set of cognitive operators that represent the agent's internal processing. The role of the gaze model is then to associate these operators with corresponding gaze behaviors.
The generated cognitive operators can be understood in terms of several broad categories of cognitive processes in conversation. For example, as illustrated in Table 1, there are cognitive operators related to conversation regulation, update of internal cognitive state, and monitoring of events or goal status. While most operators related to conversation regulation generate gaze commands accompanying verbal utterances, others do not. For instance, monitoring for expected/unexpected changes, attending to a physical stimulus in the environment, or checking a condition for a pursued goal are internal intentions that are reflected intentionally or unintentionally through various nonverbal behaviors. Additionally, there are cognitive operators related to the agent's coping strategies (discussed further below).
The gaze model associates these cognitive operators with gaze behaviors by providing a specification of both the physical manner of gaze (e.g. target, type, speed, priority) and its functional role. The functional role, or the reason for the gaze command, is a description of the cognitive operator that triggers the gaze command. This may be a sub-phase of a higher-level cognitive operator. For example, during
Conversation Regulation
  output-speech: planning speech (look at hearer, hold turn, rejection, rejection goal satisfied, acceptance reluctant, remembering); speaking; speech done; speech done hold turn
  listen-to-speaker: listen to speaker
  interpret-speech: interpret speech
  expect-speech: expect speech
  wait-for-grounding: expect (acknowledgment, expect repair)

Update Internal Cognitive State
  update-desire
  update-relevance: planning
  update-intention
  update-belief: monitor goal

Monitor for Events / Goal Status
  attend-to-sound: attend to sound object
  check-goal-status: monitor goal
  monitor-goal-status: monitor goal refresh
  monitor-for-expected-effect: monitor for expected effect
  monitor-for-expected-action: monitor expected action; monitor expected action (assert intention to perform the action, take action against an action)

Coping Strategy
  Coping-focus: convey displeasure; accept responsibility; make amends; resignation; avoidance (by-distancing, by-wishing-away); seek social support; monitor goal

Table 1: Partial overview of cognitive operators, gaze reasons, and gaze behaviors
the output-speech phase, there are sub-phases such as planning speech, speaking, complete speaking, holding the turn, etc. Table 1 shows how various gaze reasons correspond to cognitive operators in our system.
In our system, we use an FML <gaze> element with the properties of gaze behaviors specified in its attributes and send it to NVBG. NVBG then transforms it into a BML <gaze> element and sends it to SmartBody [18], the behavior realization module.
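The FML-to-BML gaze hand-off just described can be sketched as follows. The input attributes follow the proposed structure in Table 3, but the transformation logic and the output attributes are toy assumptions of ours, not SmartBody's actual BML interface.

```python
import xml.etree.ElementTree as ET

# Hedged sketch of NVBG's gaze hand-off: an FML <gaze> element (attributes
# as in Table 3) is rewritten as a BML-style <gaze> element for the realizer.
# The output attributes are illustrative assumptions, not SmartBody's API.

def fml_gaze_to_bml(fml_gaze: ET.Element) -> ET.Element:
    bml = ET.Element("gaze")
    bml.set("target", fml_gaze.get("target", ""))
    # The functional attributes (reason, priority) would inform the choice
    # of physical manner; here they collapse to a single toy decision.
    direction = "away" if fml_gaze.get("gaze-type") == "avert" else "toward"
    bml.set("direction", direction)
    return bml

fml = ET.fromstring('<gaze gaze-type="avert" target="captain" '
                    'priority="high" reason="resignation"/>')
print(ET.tostring(fml_gaze_to_bml(fml), encoding="unicode"))
```

A fuller planner would branch on the reason attribute as well, which is exactly the expressive variation the reason parameter is meant to enable.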
As the gaze model was originally developed, the gaze manner specified by the model provided parameters to a procedural animation of gaze by a behavior realizer. However, in our current work, we are providing the reason parameter to the behavior planner. This specification will allow for more expressive variations, as well as variations tied to other aspects of the body's state and to the capabilities of the animation system.
2.2.2 Emotion
In our system, we model both the generation of emotional states that arise as the virtual human reacts to events and how the virtual human copes as it attempts to regulate its emotional state. EMA (EMotion and Adaptation) [7] is the emotion model in our virtual human system. EMA is largely based on Lazarus' work on appraisal theory [9].
Appraisal
EMA assesses emotion-eliciting events along a range of appraisal dimensions (or checks or variables), such as perspective, desirability, likelihood, expectedness, causal attribution, temporal status, controllability, and changeability. The appraisal dimensions are then mapped to various emotion labels and intensities of those emotions. For example, an undesirable and uncontrollable future state is mapped as fear-eliciting. In general, a set of appraisal patterns can generate one or more emotion labels.
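A pattern-matching rule of the kind just described can be sketched like this. The fear rule (undesirable, uncontrollable, future) follows the example in the text; the hope rule and all numeric thresholds are invented for illustration and are not EMA's actual mapping.

```python
# Sketch of mapping appraisal dimensions to emotion labels. The fear rule
# (undesirable + uncontrollable + future) is taken from the text; the hope
# rule and the numeric thresholds are illustrative assumptions.

def emotion_labels(appraisal: dict) -> list:
    labels = []
    if (appraisal.get("desirability", 0.0) < 0.0
            and appraisal.get("controllability", 1.0) < 0.5
            and appraisal.get("temporal-status") == "future"):
        labels.append("fear")
    if (appraisal.get("desirability", 0.0) > 0.0
            and appraisal.get("likelihood", 0.0) < 0.5
            and appraisal.get("temporal-status") == "future"):
        labels.append("hope")  # assumed pattern, not from EMA
    return labels

print(emotion_labels({"desirability": -0.7, "controllability": 0.2,
                      "temporal-status": "future"}))  # -> ['fear']
```

As the text notes, a single appraisal pattern may yield more than one label, which is why the sketch returns a list rather than a single emotion.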
Currently in our system, an FML <affect> element is used to specify the emotion labels along with the intensity, target, and stance (leaked or intended) of the emotion. Whenever the agent's emotion is re-assessed, this information is sent to the NVBG module, which uses it to modify the gestures created. Note that in this section we discuss how we model "leaked" emotions, or more accurately "felt" emotions, as opposed to emotional expression used intentionally as a signal, which we discuss in the Coping Strategy section below.
Once the appraisal dimensions are (re-)evaluated, they are also used to generate Facial Action Unit codes, based on the work of Ekman [5]. As opposed to emotion labels, the action units are specified in BML (instead of FML) within the <face> element and sent to NVBG. Since NVBG receives the action units in BML, it simply passes them to SmartBody. However, conceptually it should be the behavior planner that generates action units along with other gestures after receiving the agent's affective state. Below, we suggest alternative ways to express the agent's affect depending on the level of detail available; Section 3 describes our proposed FML specifications.
Note that there is a range of research issues concerning the mapping from appraisals and emotions to action units that we are glossing over here. Whereas several psychological theories have postulated a mapping from appraisal variables to action units, they differ on the specifics of the mapping. Further, given any specific appraisal, there may not be a unique mapping to action units even within the same theory. There are individual differences in how to map appraisals or emotions to action units. There are also alternative theories that postulate that there is no mapping from appraisals to action units but rather mappings from emotions to action units. There are also issues in dynamics. Psychological theories differ in whether they postulate temporal ordering relations between appraisal checks and whether they argue that this ordering is reflected in temporal differences in the ordering of associated action units. There are, finally, even some psychologists who argue against facial expressions revealing "true" underlying emotional states, arguing instead that facial expressions are social signals.
Coping Strategy
EMA also incorporates a computational model of coping strategies integrated with the appraisal dimensions [7]. EMA analyzes the causality of events that produce the given appraisal dimensions and suggests strategies to either preserve desirable states or overturn undesirable states. These strategies may propose to execute certain plans, alter goals and beliefs, or shift blame for an undesirable event to another entity. The coping strategies modeled in EMA are organized by their impact on the agent's focus of attention, beliefs, desires, or intentions. Table 2 gives an overview of the coping strategies.
In the current virtual human system, coping strategies are propagated to the behavior planner in two ways. One is by implicitly influencing the agent's affective state and generating a new emotion label, which is then taken into account during behavior generation. The other is by directly influencing the nonverbal behaviors generated. In particular, the gaze model described above has certain gaze behaviors associated with different coping strategies. For example, seek instrumental support shifts gaze towards some other agent, whereas resignation causes the agent to avert gaze from its current target. However, as with the case of appraisal, it would be more appropriate to describe the coping strategy within FML and let the behavior planner decide how this would influence the behavior generation process.
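The two direct coping-to-gaze associations named above can be written as a small lookup. Only the two mappings from the text are shown; the key and value tokens are our own illustrative identifiers, not the system's.

```python
# Sketch of the direct coping-to-gaze association described above. Only
# the two mappings named in the text are shown; key and value tokens
# are illustrative, not the virtual human system's actual identifiers.

COPING_GAZE = {
    "seek-instrumental-support": {"gaze-type": "look", "target": "other-agent"},
    "resignation": {"gaze-type": "avert", "target": "current-target"},
}

def gaze_for_coping(strategy: str):
    # Returns None for strategies with no direct gaze correlate.
    return COPING_GAZE.get(strategy)

print(gaze_for_coping("resignation"))
```

Moving this table behind an FML description of the coping strategy, as the text suggests, would let each behavior planner substitute its own realization of the same function.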
Coping also provides the agent with the means to convey emotional states intentionally, for example, by showing displeasure or anger. This expression or signaling of emotional state may differ from the true or felt underlying emotional state of the virtual human. It is this distinction which motivated the original FML idea of distinguishing "leaked" from "intended" emotions; see our proposed FML <affect> element in Section 3.2.
Currently, modeling of coping strategies is not common in virtual human systems. Unlike other cognitive operations described in this paper, coping strategies may not have an immediate effect on the behavior generation process. Rather, a coping response may influence how the agent selects, plans, and executes its internal goals. This in turn influences the choices of behaviors. On the other hand, a coping response can be an immediate reaction with well-defined behavioral correlates, such as avoidance responses impacting gaze or shifting blame impacting an expression of anger.
3. PROPOSED SPECIFICATIONS OF FML
In this section, we propose several elements of FML based on our current provisional use of FML-inspired constructs.
3.1 Gaze
As described in the previous section, the key principle in our model of gaze is that it should reflect the agent's inner processing. In line with this, our current and proposed specification of the <gaze> element in FML includes the reason for the gaze command in fine-grained detail along with the target and type of gaze (see Table 3). This allows different behavior planners to represent the same communicative intent with varying expressivity depending on the capability of the virtual human system (e.g. full human embodiment vs. a simplified character with only a head figure).
A second alternative is to back away from the commitment that the link from cognitive processes to behavior planner is captured solely in FML and the link from behavior planner to realizer is captured solely in BML. Rather, various modules along a path (or paths) may be allowed to add FML or BML elements. This allows for considerable flexibility in how modules are realized but may also impact the sharing of modules across research efforts.
Finally, we could go even further towards a functional specification. FML may want to avoid even calling this element 'gaze'. Perhaps 'attention'? However, that also does not quite capture the range of functions performed by different gaze types. That range might be best expressed by the general categories in Table 1: Conversation Regulation, Update Internal Cognitive State, Monitor, and Coping Strategy. In this view, the FML element would be one of those categories, with the Reason being a further specialization of that element. We believe this view is most consistent with the goals of specifying FML.
3.2 Emotion
Our proposal for representing emotion in FML is to have alternative ways to express the agent's affect. These alternative ways would be tied to the underlying class of emotion model used by a system. For instance, we suggest an FML structure that allows the system to represent either the emotion labels (categories) or the more detailed appraisal dimensions. Table 4 gives the suggested structure of two FML elements for this purpose. Here are examples of both cases:
1. Representing an emotion label:
<affect type="joy" intensity="1.0" target="captain-kirk"/>

2. Representing appraisal dimensions:
<affect type="appraisals" target="captain-kirk">
  <appraisal type="desirability" value="0.2" />
  <appraisal type="controllability" value="0.5" />
  ...
</affect>
In the latter case, if the value of the affect type is 'appraisals', the type, target, and stance of the emotion should still be specified. But we propose that the <affect> element have an arbitrary number of <appraisal> elements embedded to represent the different appraisal variables and values.
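The nested structure just proposed can be generated mechanically. Below is a sketch using Python's standard XML library; the helper name and the default stance value are our own additions, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Sketch of building the proposed <affect> element in its appraisal-
# dimension form (second example above). The helper name and default
# stance value are our own choices, not part of the proposal.

def make_affect(target: str, appraisals: dict,
                stance: str = "leaked") -> ET.Element:
    affect = ET.Element("affect", type="appraisals",
                        target=target, stance=stance)
    for dim, value in appraisals.items():
        # One <appraisal> child per appraisal variable/value pair.
        ET.SubElement(affect, "appraisal", type=dim, value=str(value))
    return affect

el = make_affect("captain-kirk", {"desirability": 0.2, "controllability": 0.5})
print(ET.tostring(el, encoding="unicode"))
```

Allowing an arbitrary number of <appraisal> children keeps the element open to appraisal models with different or additional dimensions.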
Table 2: Coping strategies modeled in EMA

Attention Related
  Seek Information: Form a positive intention to monitor the pending, unexpected, or uncertain state that produced the appraisal values.
  Suppress Information: Form a negative intention to monitor the pending, unexpected, or uncertain state that produced the appraisal values.

Belief Related
  Shift Responsibility: Shift a causal attribution of blame/credit from/towards self and towards/from another agent.
  Wishful Thinking: Increase/lower the probability of a pending desirable/undesirable outcome, or assume some intervening act/actor will improve desirability.

Desire Related
  Distance/Mental Disengagement: Lower the utility attributed to a desired, but threatened, state.
  Positive Reinterpretation / Silver Lining: Increase the utility of a positive side-effect of some action with a negative outcome.

Intention Related
  Planning / Action Selection: Form an intention to perform some external action that improves an appraised negative outcome.
  Seek Instrumental Support: Form an intention to get some other agent to perform an external action that changes the agent-environment relationship.
  Make Amends: Form an intention to redress a wrong.
  Procrastination: Defer an intention to some time in the future.
  Resignation: Abandon an intention to achieve a desired state.
  Avoidance: Take action that attempts to remove the agent from a looming threat.
Table 3: Proposed structure of <gaze> element in FML

Element: <gaze>
  gaze-type  A symbol describing the type of gaze at the target (e.g. avert, cursory, look, focus, weak-focus).
  target     The name of an object that the agent is gazing at or shifting gaze to, or averting from in the case of gaze aversion.
  priority   A symbol describing the priority of the cognitive operation that triggered this gaze command.
  reason     A detailed rationale behind why we are doing the gaze (currently represented as a token).
Table 4: Proposed structure of <affect> element in FML

Element: <affect>
  type       Indicates the category of affect (joy, anger, fear, ...) or whether the affect will be represented by appraisal dimensions (appraisals).
  target     Person who is possibly being targeted by the resulting affective behavior.
  stance     Whether the emotion is intentionally given off or involuntarily leaked (intended, leaked).
  intensity  The intensity of emotion.

Element: <appraisal>
  type       A single appraisal variable (desirability, controllability, ...).
  value      The intensity of the appraisal variable.
As discussed above, researchers have developed a number of theories of emotion, each varying in how it models the dynamics of emotional processes. Here we have suggested two ways to represent emotion, drawn from two emotion theories, namely the categorical theory of emotion and appraisal theory. The expressivity to represent not only the emotion labels but also the appraisal variables allows the behavior planner to draw on a deeper understanding of the impact an event has for an agent and to generate behaviors accordingly. However, to employ models of other emotion theories, more discussion is needed about how to represent the properties of those models. In particular, we should also consider incorporating dimensional models such as Mehrabian and Russell's PAD (Pleasure-Arousal-Dominance) model [12], or more recent work related to such dimensional models (e.g., Core Affect). Finally, we should also explore the emotion annotation schemes being developed by other consortia such as the HUMAINE work [15].
3.3 Language Generation and FML

As discussed in Section 2.1, we currently use a system-specific representation scheme to formulate NLG requests and responses. We have not attempted to transform this scheme into an FML representation that might be used across different systems. In this section, we discuss some of the challenges we believe would be associated with standardizing a messaging protocol for NLG across systems.
In general, our perspective is that if NLG is to be assimilated into the SAIBA framework, it should be viewed as part of behavior planning rather than intent planning. This is because, first, at a conceptual level, language use is planned behavior. Indeed, NLG systems typically frame their language generation problem as one of planning a linguistic output that accomplishes an incoming communicative intention or communicative goal [13]. Second, in many systems, there may be advantages in terms of naturalness and efficiency of communication that come with planning verbal and non-verbal behavior simultaneously, as in, for example, [2].
Let us consider, then, what the implications for the standardization of FML would be if NLG were to be generally situated within the behavior planning stage of the SAIBA framework. In the canonical NLG pipeline [13], an NLG algorithm is internally divided into three successive stages: document planning, microplanning, and realization. Document planning is the process of deciding what information should be communicated, while microplanning and realization plan an output text that achieves this communicative goal. An intuitive approach would therefore locate document planning within the intent planning stage of SAIBA, and locate microplanning and realization within the behavior planning stage.
To understand the implications for FML, we need to look at the typical inputs needed by microplanners and realizers. While the division of labor between microplanning and realization, and the interface between them, varies considerably between systems [13], we may generally observe that both processes depend on relatively rich input specifications to achieve high quality output. For example, one subtask that microplanners typically solve is the generation of referring expressions (GRE) for particular objects or individuals that are implicated in the communicative goal. In general, GRE requires as input a ranking of the relative salience of various objects and properties in the non-linguistic context, as well as the dialogue/discourse history, so that an appropriate level of detail can be selected for the referent of the expression (e.g., the choice of a pronoun versus a complex definite noun phrase); see, e.g., [3, 16].
More generally, the fact that microplanning and realization involve fine-grained lexical choices can add additional input requirements. For example, the SPUD microplanner [16] requires as input the communicative goal (expressed as a set of logical formulas), a grammar, and a representation of the current context (including elements of dialogue/discourse history as well as non-linguistic context). Because SPUD expects the communicative goal to be expressed using logical formulas, it would not be trivial to translate a virtual human generation request such as those in Figures 2 and 4 into a communicative goal for SPUD. Further, the input context representation needs to extend down to the granularity of lexical semantics in the language to be generated. One way of providing this information to SPUD is to provide a knowledge interface, as in [4]. The knowledge interface allows SPUD to interactively query for salience information and to evaluate semantic constraints associated with alternative lexical choices in the current context. This creates another question about how to provide, within the SAIBA framework, an NLG module with all the resources it potentially needs. It would seem that an FML-ized generation request would either need to carry a quite exhaustive description of context, or else the generator would need to be provided with some mechanism by which upstream modules can be interactively queried for additional information as needed.
Another challenge is that different realizers can also expect different input formats. For example, the FUF realizer [6] requires as input a functional description, which is a hierarchical set of attribute-value pairs that partially specify the lexico-syntactic structure of the output utterance. The OpenCCG realizer [21] requires as input the logical form of the utterance to be realized, expressed (in XML) as a semantic dependency graph or (equivalently) in a hybrid logic dependency semantics formalism. Typically, for a given realizer, a paired microplanner draws on a lexicon and/or grammar, as well as various domain-specific rules and context information, to automatically translate a communicative goal into the appropriate inputs to the realizer. The challenge for FML is that the particular representation scheme that is chosen for FML should aim to remain compatible with, and easily converted into, the particular input formats and internal pipelines assumed by such different NLG components. We do not immediately see how to achieve this goal, especially given the widely varying approaches to NLG that are currently being explored. However, this is an area where detailed discussion between researchers might yield an operational interim approach.
4. CONCLUSION

In this paper we have presented our implementation of multimodal generation capabilities in the ICT virtual human architecture. We have drawn on our experience with this architecture to present our perspective on the standardization of FML elements for generating eye gaze, emotional displays, and natural language. While our conclusions have generally been tentative, we hope to have achieved our aim of furthering the ongoing discussion of FML and the SAIBA framework as a useful approach to multimodal generation across diverse conversational agents.
5. ACKNOWLEDGMENTS

This work was sponsored by the U.S. Army Research, Development, and Engineering Command (RDECOM), and the content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
6. REFERENCES

[1] J. Allwood. Bodily communication - dimensions of expression and content. In B. Granstrom, D. House, and I. Karlsson, editors, Multimodality in Language and Speech Systems, pages 7–26. Kluwer Academic Publishers.
[2] J. Cassell, M. Stone, and H. Yan. Coordination and context-dependence in the generation of embodied conversation. In Proceedings of INLG, 2000.
[3] R. Dale and E. Reiter. Computational interpretations of the Gricean maxims in the generation of referring expressions. Cognitive Science, 19(2):233–263, 1995.
[4] D. DeVault, C. Rich, and C. L. Sidner. Natural language generation and discourse context: Computing distractor sets from the focus stack. In Proceedings of the 17th International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004), pages 887–892, 2004.
[5] P. Ekman and W. Friesen. The Facial Action Coding System (FACS): A technique for the measurement of facial action. Consulting Psychologists Press, Palo Alto, CA, USA, 1978.
[6] M. Elhadad. FUF: the universal unifier user manual version 5.0. Technical Report CUCS-038-91, 1991.
[7] J. Gratch and S. Marsella. A domain-independent framework for modeling emotion. Cognitive Systems Research, 5(4):269–306, 2004.
[8] S. Kopp, B. Krenn, S. Marsella, A. N. Marshall, C. Pelachaud, H. Pirker, K. R. Thorisson, and H. H. Vilhjalmsson. Towards a common framework for multimodal generation: The behavior markup language. In IVA, pages 205–217, 2006.
[9] R. Lazarus. Emotion and Adaptation. Oxford University Press, New York, NY, USA, 2000.
[10] J. Lee and S. Marsella. Nonverbal behavior generator for embodied conversational agents. In Proceedings of the 5th International Conference on Intelligent Virtual Agents, 2006.
[11] J. Lee, S. Marsella, J. Gratch, and B. Lance. The Rickel gaze model: A window on the mind of a virtual human. In Proceedings of the 6th International Conference on Intelligent Virtual Agents, 2007.
[12] A. Mehrabian and J. A. Russell. An approach to environmental psychology. MIT Press, Cambridge, MA, USA; London, UK, 1974.
[13] E. Reiter and R. Dale. Building Natural Language Generation Systems. Cambridge University Press, New York, NY, USA, 2000.
[14] J. Rickel, S. Marsella, J. Gratch, R. Hill, D. Traum, and W. Swartout. Toward a new generation of virtual humans for interactive experiences. IEEE Intelligent Systems, 17:32–38, 2002.
[15] M. Schroder, L. Devillers, K. Karpouzis, J.-C. Martin, C. Pelachaud, C. Peter, H. Pirker, B. Schuller, J. Tao, and I. Wilson. What should a generic emotion markup language be able to represent? In Proc. 2nd International Conference on Affective Computing and Intelligent Interaction (ACII), pages 440–451, 2007.
[16] M. Stone, C. Doran, B. Webber, T. Bleam, and M. Palmer. Microplanning with communicative intentions: the SPUD system. Computational Intelligence, 19(4):314–381, 2003.
[17] W. R. Swartout, J. Gratch, R. W. Hill, E. H. Hovy, S. Marsella, J. Rickel, and D. R. Traum. Toward virtual humans. AI Magazine, 27(2):96–108, 2006.
[18] M. Thiebaux, A. Marshall, S. Marsella, and M. Kallmann. SmartBody: Behavior realization for embodied conversational agents. In Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems, to appear.
[19] D. Traum, M. Fleischman, and E. Hovy. NL generation for virtual humans in a complex social environment. In Working Notes, AAAI Spring Symposium on Natural Language Generation in Spoken and Written Dialogue, March 2003.
[20] D. Traum, W. Swartout, S. Marsella, and J. Gratch. Virtual humans for non-team interaction training. In Proceedings of the AAMAS Workshop on Creating Bonds with Embodied Conversational Agents, July 2005.
[21] M. White, R. Rajkumar, and S. Martin. Towards broad coverage surface realization with CCG. In Proc. of the Workshop on Using Corpora for NLG: Language Generation and Machine Translation (UCNLG+MT), 2007.
The FML-APML language

Maurizio Mancini
University of Paris 8
140 rue de la Nouvelle France
93100, Montreuil, France

Catherine Pelachaud
University of Paris 8, INRIA
INRIA Rocquencourt, Mirages
BP 105, 78153 Le Chesnay Cedex, France
1. INTRODUCTION

In this paper we present a new version of the APML (Affective Presentation Markup Language, [6]) representation language, called FML-APML. This new version encompasses the tags of APML as well as other tags related, for example, to world references and emotional state. The presented language has been developed in the Greta framework [12]. Greta is an ECA (Embodied Conversational Agent) that, starting from a representation of its communicative intention, plans the verbal (speech) and nonverbal signals (facial expressions, head movements, gestures) that must be produced in order to convey it. We use the FML-APML language to model the agent's communicative intention.
2. RELATED WORK: APML

APML is an XML-based markup language for representing the agent's communicative intention and the text to be uttered by the agent [6]. APML tags refer to the possible information a person may seek to communicate: information on the world, on the speaker's mind and on the speaker's identity. Based on Poggi's work [13], the APML language encodes the first and second types of information in ECAs [6]. In APML, each tag corresponds to one of the communicative intentions described in [13], namely:
• certainty: this is used to specify the degree of certainty the agent intends to express.
Possible values: certain, uncertain, certainly not, doubt.
• meta-cognitive: this is used to communicate the source of the agent's beliefs.
Possible values: planning, thinking, remembering.
• performative: this represents the agent's performative [1][14].

Possible values: implore, order, suggest, propose, warn, approve, praise, recognize, disagree, agree, criticize, accept, advice, confirm, incite, refuse, question, ask, inform, request, announce, beg, greet.

• theme/rheme: these represent the topic/comment of conversation; that is, respectively, the part of the discourse which is already known or new for the conversation's participants.
• belief-relation: this corresponds to the metadiscursive goal, that is, the goal of stating the relationship between different parts of the discourse; it can be used to indicate contradiction between two concepts or a cause-effect link.

Possible values: gen-spec, cause-effect, solutionhood, suggestion, modifier, justification, contrast.
• turnallocation: this models the agent's metaconversational goals, that is, the agent's intention to take or give the conversation floor.
Possible values: take, give.
• affect: this represents the agent's emotional state. Emotion labels are taken from the OCC model of emotion.

Possible values: anger, disgust, joy, distress, fear, sadness, surprise, embarrassment, happy-for, gloating, resentment, relief, jealousy, envy, sorry-for, hope, satisfaction, fear-confirmed, disappointment, pride, shame, reproach, liking, disliking, gratitude, gratification, remorse, love, hate.

• emphasis: this is used to emphasize (that is, to convey its importance) what the agent communicates either vocally (by adding pitch accents to the synthesized agent's speech) or through body movements (by raising the eyebrows, producing beat gestures, etc.).
Possible values: low, medium, high.
3. FML-APML OVERVIEW

In the SAIBA framework [8][17], the FML language encodes the agent's communicative intentions. FML-APML is an evolution of APML and presents some similarities and differences. The FML-APML tags are an extension of the ones defined by APML, so all the communicative intentions that we can represent in APML are also present in FML-APML. We introduced the following changes in creating FML-APML:

• Temporization of tags: APML tags have a nesting structure imposed by the way in which the language is defined. For example, the top-level tag must always be a performative tag. The other tags, for example the one representing the agent's certainty, must be nested inside a performative:
<apml>
  <performative type="inform">
    <rheme certainty="certain">
      I'm the Greta agent
    </rheme>
  </performative>
</apml>
The timing of these tags (i.e. the starting and ending of a certain communicative intention) is inferred from the duration of the text nested inside the tags. In the above example, the performative, affect and certainty communicative intentions have the same starting and duration time. It is not possible, for example, to extend the three communicative intentions for a time slightly longer than the spoken text.

In FML-APML each tag contains explicit timing data, similarly to BML tags. We also maintain coherence between the two languages defined inside the SAIBA framework. So, in FML-APML we can freely define the starting and ending time of each tag, or make tags refer to each other using symbolic labels. This also allows us to specify tags that are not linked to any spoken text. That is, with FML-APML we can define the communicative intention of non-speaking agents: for example, we can represent the listener's communicative intention (e.g. the listener can have the intention to communicate that it is approving what the speaker says).

• Emotional state: we have extended the way in which the agent's emotional state is coded. In the APML representation, we can only specify the actually expressed emotion. In FML-APML we can model more complex situations, for example, if the speaker is feeling a certain emotion but hides it by showing another, fake, emotional state [10]. We base our extension on EARL [15].

• Information on the world: when communicating with others, we could have the intention of communicating some physical or abstract properties of objects, persons, or events. For example, we can accompany speech with hand shapes that mimic the shape of an object, or perform large arm movements to give the idea of an "amazing" event. APML syntax allowed one to specify only some of these kinds of intentions, sometimes in too generic a way. In APML the signal information was erroneously considered instead of the communicative intention: for example, the deictic tag could be used to explicitly perform deictic gestures. In FML-APML, we can specify that the agent is referring to an entity in the world, and possibly to one of its properties. We leave the behavior planning system with the task of deciding whether, to refer to this entity, the agent has to perform a deictic gesture, mimic its property, etc.
In the next sections we give an overview of the FML-APML syntax: we present the FML-APML tags; then we describe the tags' attributes and temporization.
4. FML-APML TAGS: COMMON ATTRIBUTES AND SYNCHRONIZATION

FML-APML tags are used to model the agent's communicative intention. Each tag represents a communicative intention (to inform about something, to refer to a place, an object or a person, to express an emotional state, etc.) that lasts from a certain starting time, for a certain number of seconds. The attributes common to all the FML-APML tags are:
• name: the name of the tag, representing the communicative intention modeled by the tag. For example, the name performative represents a performative communicative intention [7].

• id: a unique identifier associated with the tag; it allows one to refer to it in an unambiguous way.

• type: this attribute allows us to better specify the communicative meaning of the tag. For example, a performative tag has many possible values for the type attribute: implore, order, suggest, propose, warn, approve, praise, etc. Depending on both the tag name (performative) and type (one of the above values), our Behavior Planning module determines the nonverbal behaviors the agent has to perform.

• start: starting time of the tag, in seconds. Can be absolute (time 0 corresponds to the start of the FML-APML file) or relative to another tag. It represents the point in time at which the communicative intention modeled by the tag begins.

• end: duration of the tag. Can be a numeric value (in seconds) relative to the beginning of the tag or a reference to the beginning or end of another tag (or a mathematical expression involving them). It represents the duration of the communicative intention modeled by the tag.

• importance: a value between 0 and 1 which determines the probability that the communicative intention encoded by the tag is communicated through nonverbal behavior, as well as the number of modalities on which the communication happens. We describe this attribute in detail in Section 5.
The timing attributes start and end also allow us to model the synchronization of the FML-APML tags. They both can assume absolute or relative values. In the first case, the attributes are numeric non-negative values, considering time 0 as the beginning of the FML-APML file. In the second case we can specify the starting or ending time of other tags, or a mathematical operation involving them. Note that the optional end attribute allows us to define communicative intentions that start at a certain point in time and last until new communicative intentions are defined. Here is an example of absolute and relative timings.
<FML-APML>
  <tag1 id="id1" start="0" end="2"/>
  <tag2 id="id2" start="2" end="3"/>
</FML-APML>
In the above FML-APML code, tag1 starts at time 0 and lasts 2 seconds; tag2 starts at time 2 and lasts 3 seconds. All the timings are absolute, that is, they are both relative only to the beginning of the actual FML-APML file (equivalent to time 0).
<FML-APML>
  <tag3 id="id3" start="0" end="2"/>
  <tag4 id="id4" start="id3:end+1" end="id3:end+3"/>
</FML-APML>
In this case, the first tag is the same as before. On the other hand, tag4 has a relative timing: its start is specified one second after the end of the first tag, and its end three seconds after it.

FML-APML tags can be attached and synchronized to the text spoken by the agent. This is modeled by including a special tag, called speech, in the FML-APML syntax. Within this tag, we write the text to be spoken along with synchronization points (called time markers) which can be referred to by the other FML-APML tags in the same file. For example:
<FML-APML>
  <speech id="s1">
    <tm id="tm1"/> what are you
    <tm id="tm2"/> doing
    <tm id="tm3"/> here
    <tm id="tm4"/>
  </speech>
  <tag3 id="id3" start="s1:tm2" end="s1:tm4"/>
</FML-APML>
With the above code, we specify that the communicative intention of tag3 starts in correspondence with the word doing and ends at the end of the word here.
5. FML-APML IMPORTANCE ATTRIBUTE

We say that a message is important if it has a particular relevance to the Sender's goals: if a message is important we want to be sure that it is delivered to the receiver. The same situation occurs with communicative intentions.
Not all the communicative intentions we communicate to others have the same level of importance. Poggi et al. [14] note that, in the domain of goals (not necessarily communicative goals), different people may attribute a different importance to the same goal. For example, generous people attribute high importance to the goal of being helpful toward others; an independent person attributes high importance to the goal of making choices freely and without the others' help. De Carolis et al. [3] show that in nonverbal discourse planning the association of nonverbal signs to verbal information can be done by giving goals a priority. The concept of urgency defined by Castelfranchi [5] seems to be related to importance: it is possible to sort the agent's goals depending on their urgency, and choose to display those goals which have a higher urgency value. Importance is also cited by Theune [16]. She claims that gesture frequency has to be increased if the speaker attaches a high importance to the message being communicated. For their conversational agents, Cassell et al. [4] choose to activate many modalities at the same time if the information importance is high. For example, information which is new or in contrast with respect to what has already been said is considered as having a higher priority, and thus more modalities are activated. The importance of body actions is also referred to by Nayak in [9]. In this case the importance level is directly translated into the priority of the corresponding body action, and higher priority actions are chosen first during behavior generation, while lower priority actions are discarded in case of conflict.
In FML-APML we introduce an attribute, common to each tag, called importance. Depending on its value, the agent may change the way the corresponding communicative intention is encoded. Similarly to the works discussed above, in our system the FML-APML importance attribute allows us to sort the agent's concurrent communicative intentions, giving them a higher (resp. lower) priority if their importance is high (resp. low). We ensure that more important communicative intentions are communicated first by the agent, while the least important intentions may be communicated through the communication modalities that remain free. Then, we use the same importance parameter to choose the multiplicity of multimodal behaviors. As the importance rises, we increase the number of modalities on which the agent's intentions are communicated. If, for example, importance is low and the agent is giving the user directions to reach a particular place in the environment, it produces only an iconic gesture. If importance is very high, it adds redundancy: the agent produces a deictic eye gesture (looking at the target in space), rotating the torso towards this position, while performing an iconic gesture.
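A minimal sketch of how the importance attribute could combine with the common attributes introduced in Section 4; the particular tags and attribute values below are our illustrative assumptions, not examples from the paper:

```xml
<FML-APML>
  <!-- High importance: communicated first, redundantly on several modalities -->
  <performative id="p1" type="inform" importance="0.9" start="0" end="4"/>
  <!-- Low importance: may be realized only on a remaining free modality -->
  <world id="w2" ref_type="place" ref_id="station" importance="0.2"
         start="0" end="4"/>
</FML-APML>
```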
6. EMOTION TAG

Emotion has a central role in communication, and ECAs should be able to communicate their emotional state in order to increase the effectiveness of interaction with humans. In the FML-APML language we have introduced the emotion tag, which models the speaker's felt and expressed emotional states. The former is the emotional state the speaker is really experiencing (which can be caused by an event, a person, a situation, etc.) while the latter is the one the speaker wants to communicate to the others. These two emotional states can be completely different: for example, a person can produce a "polite smile" to his superior even if he is angry at him. In general, people can show (the expressed state is the felt one), suppress (the felt state is expressed as little as possible), or mask (the expressed state is different from the felt one) their emotional state [11]. In FML-APML we model these relations between felt and expressed emotional states by including the syntax of the EARL (Emotion Annotation and Representation Language) language, described in [15]. The emotion tag allows us to specify complex emotional states, as reported in [2]. We can for example model situations in which our agent is feeling a particular emotional state but simulates another emotion, hiding the felt one. This is done by controlling the felt and expressed emotional states with the regulation attribute of the emotion tag. The possible values of the regulation attribute are:
• felt : this indicates that the tag refers to a felt emotion;
• fake: this indicates that the tag refers to a fake emotion, an emotion that the agent aims at simulating;

• inhibit: the emotion in the tag is felt by the agent but it aims at inhibiting it as much as possible.
Let us consider the following example:
<FML-APML>
  <emotion id="e1" type="anger" regulation="felt"
           intensity="0.5" start="0" end="3"/>
  <emotion id="e2" type="joy" regulation="fake"
           intensity="0.9" start="0" end="3"/>
</FML-APML>
The agent's real emotional state is medium anger (the regulation attribute of the emotion tag is set to felt; intensity is 0.5, in a range going from 0 to 1) but it wants to hide it with an intense fake happiness (the regulation attribute of the emotion tag is set to fake; intensity is 0.9).
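By analogy, the third regulation value could be written as below; this snippet is our illustrative sketch following the same syntax, not an example given in the paper:

```xml
<FML-APML>
  <!-- The agent feels strong fear but tries to suppress its display -->
  <emotion id="e3" type="fear" regulation="inhibit"
           intensity="0.8" start="0" end="3"/>
</FML-APML>
```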
7. WORLD TAG

As explained in [13], while communicating with others, we seek to convey our knowledge about the world: objects and their characteristics (size, shape, location, etc.), events (real or abstract), places (relation, distance, etc.). Compared to APML, the FML-APML language introduces a world tag to indicate this kind of communicative intention. The tag has the following attributes:
• ref type: the first attribute identifies the class of the referenced world entity: an object, a place, a time, an event. This attribute is required.

• ref id: an identifier that we can use to specify one or more world entities. This attribute is required.

• prop type (optional feature): allows us to refer to a property of the referenced entity: its shape, location or duration.

• prop value (optional feature): describes the value of the property specified with the previous attribute.
So, in FML-APML we can refer to an object in the world in a generic way, for example, if we want to refer to a book:
<FML-APML>
  <world id="w1" ref_type="object" ref_id="book"/>
</FML-APML>
Or, we can refer to the book which is on the table:
<FML-APML>
  <world id="w1" ref_type="object" ref_id="book"
         prop_type="location" prop_value="table"/>
</FML-APML>
8. CONCLUSIONS

In this paper we present FML-APML, a language which is used to model the communicative intention of an ECA. It is an extension of the previously developed APML language, and it addresses some of APML's weaknesses and missing features. We propose FML-APML as an implementation of the FML language of the SAIBA framework.
9. REFERENCES

[1] J. L. Austin. How to Do Things with Words. The William James Lectures at Harvard University 1955. Oxford University Press, London, 1962.
[2] E. Bevacqua, M. Mancini, and R. Niewiadomski. An expressive ECA showing complex emotions. In Artificial Intelligence and the Simulation of Behaviour: Artificial and Ambient Intelligence, Newcastle, England, 2007.
[3] B. De Carolis, C. Pelachaud, and I. Poggi. Verbal and nonverbal discourse planning. In Workshop on Achieving Human-like Behaviors, Autonomous Agents, 2000.
[4] J. Cassell and S. Prevost. Distribution of semantic features across speech & gesture by humans and machines. In Proceedings of the Integration of Gesture in Language and Speech, 1996.
[5] C. Castelfranchi. Reasons: Belief support and goal dynamics. Mathware & Soft Computing, 3:233–247, 1996.
[6] B. DeCarolis, C. Pelachaud, I. Poggi, and M. Steedman. APML, a mark-up language for believable behavior generation. In H. Prendinger and M. Ishizuka, editors, Life-Like Characters, Cognitive Technologies, pages 65–86. Springer, 2004.
[7] S. Duncan. The dance of communication. Interim Reports of the ZiF: Embodied Communication in Humans and Machines, 2006.
[8] S. Kopp, B. Krenn, S. Marsella, A. Marshall, C. Pelachaud, H. Pirker, K. Thorisson, and H. Vilhjalmsson. Towards a common framework for multimodal generation in ECAs: the behavior markup language. In Proceedings of the 6th International Conference on Intelligent Virtual Agents, 2006.
[9] V. Nayak. Emotional expressiveness through the body language of characters in interactive game environments. PhD thesis, Media Arts and Technology, University of California, Santa Barbara, 2005.
[10] R. Niewiadomski and C. Pelachaud. Intelligent expressions of emotions. In Affective Computing and Intelligent Interaction, volume 4738 of Lecture Notes in Computer Science, pages 12–23. Springer, 2007.
[11] M. Ochs, R. Niewiadomski, C. Pelachaud, and D. Sadek. Intelligent expressions of emotions. In J. Tao, T. Tan, and R. W. Picard, editors, Affective Computing and Intelligent Interaction, First International Conference, volume 3784 of Lecture Notes in Computer Science, pages 707–714. Springer, 2005.
[12] C. Pelachaud. Multimodal expressive embodied conversational agents. In MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia, pages 683–689, New York, NY, USA, 2005. ACM Press.
[13] I. Poggi. Mind, hands, face and body. A goal and belief view of multimodal communication. Weidler, Berlin, 2007.
[14] I. Poggi and C. Pelachaud. Performative facial expressions in animated faces. In Embodied conversational agents, pages 155–188. MIT Press, Cambridge, MA, USA, 2000.
[15] M. Schroder, H. Pirker, and M. Lamolle. First suggestions for an emotion annotation and representation language. In L. Devillers, J.-C. Martin, R. Cowie, E. Douglas-Cowie, and A. Batliner, editors, Proceedings of the International Conference on Language Resources and Evaluation: Workshop on Corpora for Research on Emotion and Affect, pages 88–92, Genova, Italy, 2006.
[16] M. Theune. ANGELICA: choice of output modality in an embodied agent. In International Workshop on Information Presentation and Natural Multimodal Dialogue, pages 89–93, Verona, Italy, 2001.
[17] H. Vilhjalmsson, N. Cantelmo, J. Cassell, N. E. Chafai, M. Kipp, S. Kopp, M. Mancini, S. Marsella, A. N. Marshall, C. Pelachaud, Z. Ruttkay, K. R. Thorisson, H. van Welbergen, and R. van der Werf. The behavior markup language: Recent developments and challenges. In 7th International Conference on Intelligent Virtual Agents, 2007.
Situation and Agency in the SAIBA Framework, and Consequences for FML

Zsófia Ruttkay
HMI, University of Twente, The Netherlands
zsofi@cs.utwente.nl
1   Introduction
The SAIBA acronym stands for Situation, Agent, Intention, Behavior, Animation, as major factors contributing to the final behavior of a humanoid. On the SAIBA web page, and in publications stemming from the initiative, the first two factors are somehow not dealt with further, though the importance of the "perception loop" has been stated also recently [1, ?]. In the SAIBA framework, the following three major processing stages and corresponding major modules are identified [?]:

1. Planning of a communicative intent
2. Planning of multimodal behaviors that carry out this intent
3. Realization of the planned behaviors

Steps have been made towards specifying the interface BML (Behaviour Markup Language) between stages 2 and 3, and the topic of this paper (and workshop) is to make the first step to propose a similar language between stages 1 and 2. In connection with BML, a closer look has shown that there are several critical issues which are hard to incorporate into either the behaviour planning or the behaviour realization stage. For instance, actual characteristics of the environment the agent is acting in (visibility, characteristics of the ground, location of moving objects) may have an influence on the details or the choice of the behaviour to be realized. It has been suggested that feedback from the "world" is needed to be able to plan certain behaviours: e.g. if it is misty, use big hand waves and a loud voice (selection of modality, amplitude); if the terrain is very rough, one cannot run very fast (timing of behaviour). Obviously, if one has a sophisticated realizer, capable of taking care of physical balancing and simulation, there is no need to specify the behaviour with respect to the surface conditions, as the realizer will take care of this aspect. On the other hand, for a realizer without such a feature, the planner has to provide as much detail about the behaviour to be realized as possible, and the realizer will just literally realize the specified behaviour. Hence, as long as we do not settle what the responsibilities of the two involved modules are, we open the space for, and the need for, different variants. In the case of BML, this led to the notion of level-of-detail specifications, and the differentiation between core and proprietary language elements. See Figure 1.
Figure 1. The core BML specification of a behavior can be further refined through greater levels of description, while namespaces can provide general extensions. From [6].
Moreover, it was noticed that feedback from realization to behaviour planning is essential, to know whether the behaviour could be realized as planned, with some modification, or not at all. It was also noted that somewhere an update of the frame of the world is to be taken care of. For instance, in the FearNot! system, the effect of a push may be that the other agent has fallen, or that the self agent got hurt. When such issues related to behaviours were discussed at the BML workshops, a feedback mechanism was elaborated on, and the role of intent planning was also raised.
We believe that it is necessary to identify the cast of roles of modules and factors at the very beginning of the work on FML design. In this short paper we would like to point out factors of the Situation and characteristics of the acting Agent which do have an influence on the behaviour. The major question is how to incorporate these factors into the SAIBA framework; particularly, how they should be distributed among the major processing modules. As a consequence, what are the requirements posed for the FML language?

We address the Situation- and Agent-related issues in the coming two chapters. In Chapter 4, we give concrete examples, with the intention that they will serve as cases for discussion. Finally, we sum up questions and recommendations related to the FML specification.
2   The Agent
In connection with planning communicative function and behaviour, the following characteristics of the acting agent are relevant:
• Perception capabilities: Can the humanoid "see", "hear", maybe sense heat? From the point of view of a highly modular architecture, and for efficiency of implementation, it is relevant how perception is realised: e.g. whether sensing by vision is simulated vision as in [4], or whether the perceived portion of the world the agent is acting in is derived by other, non-vision-based means.
• (Bodily) action capabilities: What are the traditional output modalities of the humanoid: can he talk, move around? What other means (of locomotion, of augmented communication) does the agent have? What is its physical state, which may influence these bodily capabilities, e.g. hands full, or exhausted?
• Knowledge of the world: That some "world knowledge" is necessary for agency is AI common sense. For the sake of social and communicative behaviour, some knowledge of the physical characteristics and social protocols of communication is needed: e.g. from what distance one can be heard, at what distance to stand in front of a person to chat with him, what the social status of the person to be greeted is, etc.
• Agent's identity: The identity of the agent, in terms of age, gender, social status and personality, but also changing factors such as mood and physical or mental state, as well as a personal history and past affairs with, and relationship to, the interlocutor all "can be seen" in what and how people are doing when communicating with each other. Hence these factors must be available to tune the intent and behaviour of humanoids.
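The four groups of characteristics above could be bundled into a single agent profile that both intent planning and behaviour planning can consult. A minimal sketch; all field names and values are illustrative assumptions, not part of any SAIBA proposal:

```python
# Sketch of an agent profile covering the four characteristic groups above.
from dataclasses import dataclass

@dataclass
class AgentProfile:
    perception: list        # perception capabilities, e.g. ["vision", "hearing"]
    action_modalities: list # bodily action capabilities, e.g. ["speech", "face"]
    world_knowledge: dict   # e.g. physical/social protocol facts
    identity: dict          # age, gender, status, personality, mood, history

# Profile of the professor H from the case study in Chapter 4
H = AgentProfile(
    perception=["vision", "hearing"],
    action_modalities=["speech", "face"],   # corpulent: avoids locomotion
    world_knowledge={"hearing_distance_m": 10},
    identity={"personality": "jovial", "build": "corpulent"},
)
```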
3   The Situation
The humanoid is communicating with his real or virtual interlocutor in a real or virtual world situation. The world situation may be characterised as a type (e.g. open public space, restaurant, office, ...). Some general knowledge about actors and activities in these situations may be used, but concrete parameters (e.g. location) of specific participants may be of interest too. The following aspects of the situation are relevant:
• Visibility and audibility between the humanoid and the interlocutor: Where are the addressee and the topic of communication? Is he/she visible? What are the obstacles of the world? How is the time of the day, the lighting? How is the noise? Is a referred object visible for both the humanoid and his addressee?
• Other physical circumstances of the situation: The ground the humanoid is to navigate on, how crowded the environment is to move around in, how "busy" (that is, dynamically changing) the environment is.
• The social aspects of the situation: What is the current location (official/private, closed or open), what is the stage of the event going on (e.g. somebody presenting, a ceremony going on, ...), what is the personal and formal relationship between the conversants?
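The three aspect groups above could likewise be collected into one situation record that the planning modules query. A sketch; every key and value is an illustrative assumption:

```python
# Sketch of a situation record mirroring the three aspect groups above.
situation = {
    "visibility": {"addressee_visible": False, "lighting": "indoor",
                   "noise": "medium", "referred_object_visible": True},
    "physical":   {"ground": "flat", "crowdedness": "high", "busy": True},
    "social":     {"location_type": "restaurant", "event": "dinner",
                   "relationship": "colleague"},
}

def lookup(aspect: str, key: str):
    """Query one parameter of the current situation."""
    return situation[aspect][key]
```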
4   Case studies of greeting
Below we take some examples of greetings under the loop, and identify how some of the aforementioned factors influence the intent and/or behaviour planning stages.

In our example we assume a virtual restaurant with a waiter (W) and guests (G1, G2, ...). Our humanoid of interest, H, is a professor, and he is sitting in his favourite restaurant. He has an appointment with a friend F; he is checking if F is already there, but cannot see him. He notices though that one of the guests, G1, is a colleague he needs to talk to. G1 is sitting with his back to H, engaged in deep conversation with the other person, G2, at his table. H decides to wait for the right moment to approach G1. Meanwhile some new people, G3 and G4, arrive. G3 is an ex-colleague whom H does not like; G4 is a student he knows by face. The waiter W, a good acquaintance of H, shows up to take his order.
In this scene, we expect the following greetings to take place, in analogy to a real-life situation:
1. H is gazing around, even checking some remote corners by standing up, to find and greet F, but as he gets convinced that F is not there, decides to sit down and wait.
2. H sits in a position to be able to see what G1 is doing, as his intention is to greet him when the right moment arrives.
3. The waiter W greets H, in an informal way, but only if he has not already done it earlier.
4. G3 greets H with a smile and a bow, but H avoids eye contact with G3, and pretends that he has not noticed him.
5. G4 greets H politely, and H nods back to his student.
S<.=' &4' %4*34*' E.%4*' .$' A%32?' 23' 7&A3' 0&"3"' .$' ' Q4.*' *.R' 0%+''!2& ,3& (-!'-!1' 673'
$.55.2%4@'&"E30*"'%4$5)3403'27&*'@<33*%4@'2%55'*&+3'E5&03'%4'3&07'0&"3K'
'
1. Personality of H: H is a jovial person, so he usually greets people he knows, and returns greetings, even by unknowns. This is like a reflex: if somebody nods at him, he nods back.
2. Bodily capabilities of H: H is rather corpulent, so he does not like to move around much, but he uses his voice and face.
3. The physical situation: As H is out of sight of G1, H can either shout at him or walk up to him. However, G1 sits too far from him to be addressed by speech, so H checks how he could walk up to him in the rather crowded place.
4. The social aspect of the situation: It is not polite, and not promising, to interrupt a conversation the other person is highly involved in.
5. Relationship between the interlocutors: H does not take the initiative to greet G4, nor W or G3. However, he returns their greetings in different ways: shaking hands with W in a jovial way, nodding back to G4, but hiding himself behind the menu card to avoid gaze contact with G3.
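How these five factors might combine can be sketched as a small decision rule. The rules below merely paraphrase the case study; the predicate names and their ordering are our own illustrative assumptions:

```python
# Sketch: combining the five factors above into a (not to) greet decision.
def plan_greeting(jovial, likes_other, other_engaged, within_speech_range):
    if not likes_other:
        return "avoid gaze"                # factor 5: relationship (G3)
    if not jovial:
        return "no greeting"               # factor 1: personality
    if other_engaged:
        return "wait for the right moment" # factor 4: social situation (G1)
    if not within_speech_range:
        return "walk up, then greet"       # factor 3: physical situation
    return "greet now"
```

With H's parameters, the liked but deeply engaged colleague G1 yields a deferred greeting, while the disliked ex-colleague G3 yields gaze avoidance.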
Where, and on what level(s), to take care of these distinct parameters? One extreme approach is to deal with everything on the cognitive level, and create a communicative intent "greet" of low granularity, with parameters in FML such as:

<fml> greet speaker="PERSON1" addressee="PERSON2"
      speaker_personality="jovial" speaker_mood="lazy"
      distance="close" environment_noise="medium" ... </fml>
One may argue that the above parameters only influence how the intent of greeting is to be realized in terms of bodily behaviour, so they should be taken care of on the behaviour planning level. The behaviour planner then is informed about the greeting intent, with parameters:

<fml> greet speaker="person1" addressee="person2" location="restaurant1" </fml>
Then the Behaviour Planner has access to details on PERSON1 and PERSON2, and their exact location in RESTAURANT1, and has knowledge about circumstances like the noise and the arrangement of tables in RESTAURANT1. This may lead to a complex greeting behaviour, consisting of the behaviour steps wait for turn opportunity, approach PERSON1, wait for gaze contact with PERSON1, and initiate hand shake with PERSON1.
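The expansion of the coarse greet intent into those behaviour steps can be sketched as follows; the step strings come from the text, while the expansion logic and flags are illustrative assumptions:

```python
# Sketch: a behaviour planner expanding a coarse "greet" intent into the
# behaviour steps listed above, conditioned on the situation.
def expand_greet(target: str, has_turn: bool, has_gaze_contact: bool):
    steps = []
    if not has_turn:
        steps.append("wait for turn opportunity")
    steps.append(f"approach {target}")
    if not has_gaze_contact:
        steps.append(f"wait for gaze contact with {target}")
    steps.append(f"initiate hand shake with {target}")
    return steps
```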
5   Summary of challenges
The issues discussed so far can be summarized as a list of challenges: questions to be decided upon a priori, before setting out with the design of FML. They involve the language of FML, the feedback between the modules, and the placement of, and access to, knowledge within or outside the three modules of Figure 1 in [6].
G1 V7&*' *,E3' .$' %4*34*"' &<3' 23' 5..+%4@' &*X' D45,' 0.==)4%0&*%A3?' .<' &5".'
5.0.=.*%A3?'=&4%E)5&*%A3' &4>' .E3<&*%.4&5X'94'&'#&D&+.(&;( "'#,'#9( %"' *.'
C3'>3A35.E3>?'27%07'=&,'C3'3[*34>3>1'''
2. What are the basic, elementary intent units? Do we model routine or reflex-like behaviors based on intent (e.g. getting information about the world by vision)? If yes, we end up with a verbose description of (alternatives of) intents and computationally slow feedback from the Behaviour Realization. If not, where are these "idle intents" dealt with?
3. How to model and maintain the "life-time" of intents, their co-existence, their priorities? An event of the world may make an intent obsolete, or lower its priority.
4. In the SAIBA framework nothing is assumed about the source and form of knowledge concerning the (changing) world the behavior is taking place in. The same is true about the identity (and other characteristics) of the humanoids in the (real or physical) world. However, many aspects of the world do matter, as preconditions or selection criteria for planning an intent. Where, and in what format, to store relevant world information, and how to get access to it?
H1 T.2' %"' *73' 89:;9' $<&=32.<+?' &4>' E&<*%0)5&<5,?' SOP?' <35&*3' *.' &#/,3(
-&+'"#"R,( %3-/"#,-#$3,9?' ")07' &"' FM?' NIX'V7&*' 0&4'23' 53&<4' $<.=' *73"3'
,**1(.,!(#-3J& ;,-& !"'3'& 4'& $3'5& !#& #$!1(-'& ,& 0.#+'& +'K$(+')'-!32& ,3& #/&
3[E<3""%A%*,' .$' SOPX' ' V7&*' "7.)5>' C3' *73' <35&*%.4"7%E' *.' ,4&#"&'(
4%36$B'FbIX'
6. Particularly, how to interpret low-level feedback from the behavior realizer in order to give up an intent and, based on the information, generate another, different one?
6   Recommendations
Below I put forward my ideas on how to deal with some of the issues raised.
1. Let's list the objectives and envisioned merits of using/creating an FML. I think of merits such as:
   a. designing, defining and maybe allowing styled/individual multimodal behavior expressing functions,
   b. analyzing behavior of humanoids on a functional level, in a uniform language.
2. Let's adopt/develop an ontology of functions (intents), with categories like biological functions (e.g. not to get stiff, to gather information), communicative functions with sub-categories such as presenting, or conversing with a single interlocutor or with a varying number of them, and functions related to domains like (verbal) tutoring, or controlling e.g. an orchestra or traffic.
3. A Core FML, with a corresponding functionary (analogous to the gestuary), may be designed for communicative functions. Domain-specific collections of functions may be "plugged in" to enrich behavior.
4. In our ontology let's be conscious about the time and durational aspects of the functions, both when designing the description of functions and when processing them. E.g. expressing a state as long as no explicit "end of state" intent is given (such as expressing an emotional state), versus expressing an intent as long as the goal is not achieved, but with a maximum duration (e.g. getting the floor in conversation). In the latter case, mechanisms to check the fulfillment of the intent, and to provide feedback about it, have to be devised.
5. In the definition of functions, preconditions and consequences, as well as, possibly, decomposition into more refined functions, should be dealt with. For instance, a precondition for "greet from close" may be proximity, with a realization of a nod, or a smile, and/or some verbal greeting, while "greet from a distance" assumes a distant but visible other person. It should be decided where and when these conditions are checked. Similar considerations apply for assumed and expected consequences of a function, to be checked or waited on to become true (e.g. the greeting being returned).
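Recommendation 5 can be sketched as function definitions carrying explicit precondition and consequence predicates. The distance threshold and the field names are assumptions; the realizations and the returned-greeting consequence come from the text:

```python
# Sketch: a function definition with a precondition, candidate realizations,
# and an expected consequence to be waited on (recommendation 5).
GREET_FROM_CLOSE = {
    "precondition": lambda ctx: ctx["distance_m"] < 2.0,  # proximity (assumed threshold)
    "realizations": ["nod", "smile", "verbal greeting"],
    "expected_consequence": lambda ctx: ctx.get("greeting_returned", False),
}

def applicable(function: dict, ctx: dict) -> bool:
    """Check a function's precondition against the current context."""
    return function["precondition"](ctx)

def fulfilled(function: dict, ctx: dict) -> bool:
    """Check whether the expected consequence has become true."""
    return function["expected_consequence"](ctx)
```

Where and when such checks run, in the intent planner or the behaviour planner, is exactly the open question the recommendation raises.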
6. Finally, it may be useful to analyse, formally or informally, how complex functions and their consequences are communicated in mediated worlds. For instance, greeting seems to be, on the one hand, an every-day and highly routine act and, on the other hand, one with a multitude of variants and subtleties. How are greetings expressed and returned e.g. in Second Life? What is the state of the art, as of situated expressivity?
Acknowledgements
This research has been supported by the GATE project, funded by the Netherlands Organization for Scientific Research (NWO) and the Netherlands ICT Research and Innovation Authority (ICT Regie).
References
1. Bevacqua, E., Raouzaiou, A., Peters, C., Caridakis, G., Karpouzis, K., Pelachaud, C., Mancini, M. (2006). Multimodal sensing, interpretation and copying of movements by a virtual agent. In: Proceedings of Perception and Interactive Technologies (PIT'06).
2. Funge, J., Tu, X. and Terzopoulos, D. (1999). Cognitive Modeling: Knowledge, Reasoning and Planning for Intelligent Characters. In: Proc. of SIGGRAPH 99, Los Angeles, CA.
3. Kopp, S., Becker, C., Wachsmuth, I.: The Virtual Human Max - Modeling Embodied Conversation. In: KI 2006 - Demo Presentation, Extended Abstracts, pp. 21-24, 2006.
4. Peters, C. (2006). A Perceptually-Based Theory of Mind Model for Agent Interaction Initiation. International Journal of Humanoid Robotics (IJHR), 3(3), pp. 321-340.
5. Thórisson, K. R. (2007). Avatar Intelligence Infusion: Key Noteworthy Issues. Keynote presentation, 10th International Conference on Computer Graphics and Artificial Intelligence, 3IA 2007, Athens, Greece, May 30-31, 123-134.
6. Vilhjálmsson, H., Cantelmo, N., Cassell, J., Chafai, N., Kipp, M., Kopp, S., Mancini, M., Marsella, S., Marshall, A., Pelachaud, C., Ruttkay, Zs., Thórisson, K., van Welbergen, H., van der Werf, R. (2007): The Behavior Markup Language: Recent Developments and Challenges. In: C. Pelachaud, J.-C. Martin, E. André, G. Chollet, K. Karpouzis, D. Pelé (eds): Intelligent Virtual Agents, Proc. of IVA'07, Paris, LNAI 4722, pp. 99-111. Springer Verlag, Berlin.
7. W3C Incubator Group Report (2007). http://www.w3.org/2005/Incubator/emotion/XGR-emotion/
Applying the SAIBA framework to the Tactical Language and Culture Training System

Prasan Samtani, Andre Valente and W. Lewis Johnson
Alelo, Inc
11965 Venice Blvd, Los Angeles CA 90066
{psamtani,avalente,ljohnson}@tacticallanguage.com
1. INTRODUCTION
The Tactical Language and Culture Training System (TLCTS) helps learners acquire basic communicative skills in a foreign language and culture. The system is broadly divided into two main sections. In the Skill Builder, learners are coached through a set of lessons on language and culture by a virtual tutor. The tutor offers attributional and motivational feedback in order to keep the learner on track [4].

Following the acquisition of skills, learners must proceed to complete missions in a simulated environment populated with virtual characters - the Mission Game. Learners can speak to AI characters in the game through a microphone, and can select appropriate gestures using the mouse. In order to successfully complete missions, the learner must display a mastery of the specific linguistic skills in the target language, as well as a knowledge of the culture. The learner is accompanied by an aide character, who can offer recommendations if the player gets stuck.
Figure 1: TLCTS Mission Game
Three training systems have been built so far using TLCTS: Tactical Iraqi (for Iraqi Arabic), Tactical Pashto (for Pashto, spoken in Afghanistan), and Tactical French. Tactical Iraqi is currently in use by thousands of learners in the US armed
Cite as: Title, Author(s), Proc. of 7th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2008), Padgham, Parkes, Müller and Parsons (eds.), May 12-16, 2008, Estoril, Portugal, pp. XXX-XXX. Copyright © 2008, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
forces as they prepare for deployment overseas. Reports from users suggest that the program is engaging and highly effective. Anecdotal reports indicate that trainees start acquiring basic communication skills very rapidly, and acquire basic functional proficiency in conversational Arabic relevant to their mission in just a few weeks.
The Mission Game is fundamentally a social simulation. Processing and generating multimodal behavior is a fundamental component of the Tactical Language and Culture Training System. The user controls a player character and must communicate with agents in the scenario in order to achieve his/her goals. The agents receive a stimulus from the user and produce an output intent. A key challenge in these types of simulations is to produce the behavior of autonomous non-player characters given abstract specifications of communicative intent. This process has been documented extensively in previous publications [4][5].
In this paper, we describe the framework we adopted for generating multimodal behavior in TLCTS and draw implications for the design of FML. We start by describing the framework we adopted, which closely matches the SAIBA Framework1. Then, we discuss the representation we created for communicative acts in TLCTS. We compare this representation to elements of FML, and propose some modifications to FML as well as the addition of a language to represent context information. Finally, we present our conclusions.
2. BEHAVIOR GENERATION IN TLCTS
Previously, we had simplified the behavior generation problem in TLCTS by (a) treating the inputs and outputs to the agents as symbols and (b) maintaining a strict one-to-one mapping from intent to behavior (see figure 2). As our system was scaling up, this was no longer a feasible solution, as it forced authors to micromanage the behavior. Any input from the user that does not map perfectly to the set of acceptable output symbols produced a generic "What was that?" response from the agents.
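The old one-to-one scheme, including its generic fallback, can be sketched as a plain lookup table; the symbol names are illustrative, but the fallback response is the one described above:

```python
# Sketch of the old TLCTS pipeline: a strict symbol-to-symbol mapping with a
# generic fallback for any unmapped input.
RESPONSES = {
    "greet-symbol": "return-greeting-symbol",
    "ask-directions-symbol": "give-directions-symbol",
}

def respond(input_symbol: str) -> str:
    # Unmapped inputs all collapse to the generic "What was that?" response.
    return RESPONSES.get(input_symbol, "what-was-that")
```

Every new input an author wanted handled required a new table entry, which is the micromanagement problem the new pipeline addresses.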
The process of intent planning in the current-generation version of TLCTS is realized through a state machine. Earlier versions of TLCTS used PsychSim, a social simulation framework developed at USC/ISI [8]; however, its usage was discontinued for runtime performance reasons. The importance of following social norms with respect to the target culture has to be encoded implicitly within the state machine description by the authors. While this approach had
1http://wiki.mindmakers.org/projects:saiba:main/
worked excellently when the system was small, it began to lead to inconsistency as the system grew in size. Other limitations include:
1. It does not lend itself to distributed processing because all processing is done at a single module (the Mission Manager).
2. The input processing is too simple, relying on a mapping process to match a speech act to each input (utterance and/or gesture).
3. Speech acts are currently represented as symbols, which does not easily lend itself to the representation of complex speech acts; e.g., acts that need parameters ("what is the name of this object?").
4. The behaviors of all agents are decided based on a centralized finite state machine, which is computationally efficient but is limited in expressivity and hard to maintain.
5. Conversational outputs are produced manually with bundles of speech recordings and animations, and selected by the finite state machine.
6. It assumes that there is only one player participating in the simulation.
Figure 2: Old pipeline
To solve these limitations, we designed an improved behavior generation process for TLCTS, shown in figure 3. The basic process adopts the SAIBA Framework [6]. The latest user input (in the form of a speech act) is passed to agents that perform intent planning; that is, decide which communicative acts (if any) to perform. The action is specified as a communicative act (usually a speech act) that goes through a behavior generation step that ultimately produces character behavior that is realized through the game engine (currently Unreal Engine).
To generate culturally and contextually appropriate behavior that automatically adapts based on the current social and environmental context, we must encode a significant amount of knowledge about the dialog, the world, and the target culture. Some of these areas are well researched. For example, the maintenance of dialog context is a popular
Figure 3: New output pipeline.
topic amongst researchers. The three knowledge bases (dialog context, cultural context and environmental context) carry separate sets of information. For example, the dialog context contains knowledge such as the current and previous topics of conversation and the level of formality of the dialog. When generating behavior, the translation rules select appropriate behaviors; for example, selecting "marHaba" in an informal Arabic conversation and "as-salaamu 9aleykum" in more formal ones. The cultural ontology specifies appropriate behaviors for an intent in a particular culture; for example, nodding could show approval or disapproval depending on the culture. The corresponding input pipeline also uses these knowledge bases.
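A translation rule of the kind just described can be sketched as a lookup keyed on the culture and the formality level drawn from the dialog context. The table shape is an illustrative assumption; the two Arabic greetings are the ones from the text:

```python
# Sketch: a translation rule selecting a surface greeting from the dialog
# context, as in the Arabic example above.
GREETING_RULES = {
    ("arabic", "informal"): "marHaba",
    ("arabic", "formal"):   "as-salaamu 9aleykum",
}

def select_greeting(culture: str, formality: str) -> str:
    """Pick the surface form of a 'greet' act for the given context."""
    return GREETING_RULES[(culture, formality)]
```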
3. REQUIREMENTS FOR REPRESENTING COMMUNICATIVE ACTS IN TLCTS
Developing a system to model dialog in TLCTS is complex, because communicative acts are used within the system for both generation and interpretation. There are a large number of potential inputs a learner can produce, and we need a system that can effectively manage all of them without undue effort from the authors. On the speech recognition side, we introduced the use of utterance templates, which use a more flexible grammar definition syntax. For more details on utterance templates, see [7].
Using communicative acts that are useful for interpretation places a unique requirement on our system. Our communicative acts need to be more content-rich than they would be if they were simply used for generation of behavior. We identified three main roles for communicative acts within TLCTS:
To specify function: What does the act do? Does it inform, make a request, offer a greeting, accept a proposal?
To modulate function: How polite or forceful was the speaker? What level of redress was used when making a request?
To specify the context under which the act is applicable: Is it directed at a male or a female listener? Is it used between people of equal standing or to someone of higher standing?
In addition to the requirements of authorability and flexibility, our development of systems for multiple target languages and cultures places a new requirement: adaptability. Our plan is to construct libraries of communicative acts that are reusable across different languages. However, each target culture has its own definition of what acceptable behavior is. Therefore, we need to be able to define how different communicative acts are interpreted in different cultures and situations. In order to do this, we set about creating a unified representation of context, further described in Section 5.1.
4. THE TLCTS COMMUNICATIVE ACT ONTOLOGY
Several researchers have worked on creating representations of communicative intent. We initially looked at FrameNet, an online lexical resource based on semantic frames [1]. FrameNet possesses many of the features that we require within a representation language; however, it is significantly larger and more complex than we would like. Since it is used by the natural language community, it tends to contain extraneous linguistic information that we do not require (such as word senses). Its size and complexity are also a barrier to authorability. However, it has been an excellent resource for us as we create our ontology, and we adapted several of its concepts.
We then looked at the paper by David Traum and Elizabeth Hinkelman [9], which presents a typology that identifies the various functions of communicative acts. A summary of this is presented in table 1. We adopted this as the starting point of our ontology.
Turn-taking: These acts model taking and receiving the turn within a conversation (take-turn, release-turn, keep-turn, assign-turn)
Grounding: These acts are used to frame the core speech acts (initiate, continue, ack, repair, req-repair, req-ack, cancel)
Core Speech Acts: These are the traditional types of speech acts (inform, whq, ynq, accept, request, reject, suggest, eval, req-perm, offer, promise)
Argumentation: These are used to build more complex actions out of the core speech acts (elaborate, summarize, clarify, q&a, convince, find-plan)

Table 1: Traum and Hinkelman's typology of speech acts
We found that several categories of core acts in Traum's original typology could be seen as subclasses of one another. In addition, there are acts that were not covered by the ontology - including offering greetings, thanks, support etc. Based on the work done by Feng, Hovy, Kim and Shaw at ISI [3], we added a new class of core speech acts called 'social' acts, which include the aforementioned types. The resulting specification of communicative acts is shown in table 3. The different classes for core acts are organized in a hierarchy, shown in table 2.
4.1 Modulating communicative acts
We mentioned that the structure of communicative acts was divided into three broad categories. The first, describing function, was described in the above section. The second category describes how communicative acts are modulated.
core
  inform
    accept, reject, offer, promise
  request
    request-info
      whq, ynq
    request-action
  eval
    compliment, criticize
  social
    greet, thank

Table 2: Class Hierarchy for core acts
There are many possible models for modulating communicative acts. Our interest is primarily in models of politeness. We adopted a model of politeness based on Brown and Levinson's theory of politeness [2]. The constructs used are the following:
Degree of imposition: The degree of imposition inherent in the act itself, independent of the politeness tactics used. For example, asking someone to lie down and put their hands flat on the ground is inherently more imposing than asking them the time.
Negative face threat: Degree to which the act affects the receiver's negative face (desire to remain autonomous).
Positive face threat: Degree to which the act affects the receiver's positive face (desire to be appreciated/avoid criticism).
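An act annotated with these three modulation values can be sketched as a small record. The field names and the 0-to-1 scale are illustrative assumptions; the two example acts are the ones contrasted above:

```python
# Sketch: a communicative act carrying the three Brown-and-Levinson-style
# modulation values described above (assumed 0.0-1.0 scale).
from dataclasses import dataclass

@dataclass
class CommunicativeAct:
    function: str                 # e.g. "request-action" from the core hierarchy
    degree_of_imposition: float   # inherent in the act itself
    negative_face_threat: float   # threat to the hearer's autonomy
    positive_face_threat: float   # threat to the hearer's desire to be appreciated

hands_on_ground = CommunicativeAct("request-action", 0.9, 0.8, 0.3)
ask_the_time    = CommunicativeAct("request-action", 0.1, 0.1, 0.0)
```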
4.2 Specifying applicability
The final part of the communicative act specification deals with when an act is applicable. Certain communicative acts are only appropriate at certain times of the day (for example, "Good night"). Others are only applicable when the receiver is female ("Yes ma'am"). We need a way to specify the context in which a communicative act is applicable. This is separate from the description of context itself, which describes the actual context in which the conversation takes place. In particular, this feature of being able to specify the applicability of a communicative act is especially useful when there is a mismatch between the actual context and the applicable context of the communicative act - for example, saying "Yes ma'am" to a male commanding officer. Functionally, this act is very similar to "Yes, sir", but it is likely to be interpreted as a serious insult. Therefore, all communicative acts are interpreted based on the current context.
5. MATCHES AND MISMATCHES WITH FML
Since at a high level we are adopting the SAIBA framework, we decided to evaluate whether we could adopt its representation languages, FML and BML. We examined the
Grounding
  initiate: Start the grounding of the conversation; typically includes greeting, pleasantries
  continue: Continue the grounding of the conversation, for example, introducing oneself, asking for information
  ack: Acknowledge an action/suggestion made by the hearer
  repair: Repair ground that was lost by making an inappropriate/offensive statement
  req-repair: Request the receiver to initiate repairing the grounding of the conversation
  req-ack: Prompt the listener for an acknowledgement (for example: "How does that plan sound?")
  cancel: Nullify the impact of a previous action (for example: "Disregard what I said earlier")

Core-Act
  inform: Informs the hearers about some subject (person/place/question)
  accept: Informs the hearer that a particular offer has been accepted
  reject: Informs the hearer that a particular offer has been rejected
  offer: Propose a suggestion/action/deal to the hearer
  promise: A promise to the hearer that a particular action will be performed at a later time
  request: Can be a request for information or an action
  request-info: A request for some information
  whq: A request for specific information about a subject ("What is your name?")
  ynq: A simple yes/no OR true/false type question - does not have to be a yes or no - e.g. "Was he young or old?"
  request-action: A request for some action to be performed ("sit down")
  eval: Offers an evaluation to the hearer
  compliment: A positive evaluation
  criticize: A negative evaluation
  social: An act that only has a social function (and no other)
  greet: Serves to initiate a conversation
  thank: Thank a hearer for something they have done/will do; can also be used sarcastically

Table 3: Core Act Specification
structure of BML, and found it extremely suitable with regards to our needs. While we have not implemented a BML realizer, we use a messaging system to realize behavior, and we perceive that incorporating BML should not be too difficult.
However, we feel there are some fundamental mismatches between our needs for representing function and the current proposal for FML. First, part of the mismatch is related to the lack of context in the SAIBA framework. We feel that the mapping from FML to BML cannot take place without the use of context, and propose that a unified description of context (both cultural and environmental) is absolutely necessary within the SAIBA framework. In addition, we believe that there are some representational problems even at the top level structure of FML, which need to be addressed in order for SAIBA to be a complete framework for a wide variety of interactive social simulations. Below we outline what such a representation of context should contain.
Second, the structure of FML as proposed seems to invert core and auxiliary elements. We believe that at the top level, a unit of function needs to represent a communicative act (as we do in our system). Our understanding is that the closest match to a communicative act in the current FML is the performative element.
We therefore propose that the performative element should be the top level element within an FML block. Our analysis of communicative acts indicates that the remaining elements above performative are either unnecessary or out of place in the structure. For example, the decision of whether to take the turn by force or to wait for the turn to be freed should be made by the agent, and thus it is not an attribute of the communicative function - see [10] for more details. Further, we recommend that the topic element should be moved within the performative element.
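As an illustration of this restructuring (the tag and attribute names below are our own hypothetical choices, not a spec), an FML fragment with the performative at the top level and the topic nested inside it could be built like this:

```python
# Sketch only: constructs a hypothetical FML fragment in which the
# performative is the top-level element and topic is moved inside it.
import xml.etree.ElementTree as ET

fml = ET.Element("fml")
perf = ET.SubElement(fml, "performative", {"type": "inform"})
ET.SubElement(perf, "topic", {"name": "directions"})  # topic nested inside
ET.SubElement(perf, "content").text = "the market is north of here"

print(ET.tostring(fml, encoding="unicode"))
```

The point is structural: everything that characterizes the communicative act hangs off the performative element, rather than the performative being buried among sibling elements.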
5.1 Representing context - a new language?
As mentioned above, a fundamental issue with the original SAIBA framework is that there is no explicit representation of context. In our opinion, the mapping from FML to BML cannot take place without a very detailed representation of context. For example, even an extremely simple task such as exiting a conversation is dependent on the time of day ("Have a great day" vs "Good night"). An alternative to maintaining an explicit context could be to move this information into the FML representation. However, we believe this is against the goal of SAIBA, which is to modularize the generation of communicative behavior into function and realization.
We therefore propose that there should be a formal language explicitly representing context. We tentatively name this new language "Context Markup Language" or CML. CML would be divided broadly into three modules:
Dialog context: Includes the history of what has happened in the dialog, the current topic of conversation, the level of tension, etc. This is updated very frequently.
Environmental context: This includes information about the time of day, current setting (certain settings, like places of worship, can influence how certain communicative functions should be realized), etc. This is updated less frequently (whenever the location changes or at significant points in time, such as noon or dusk).
Cultural context: Provides information on the culturally appropriate way to express certain communicative functions. For example, the palm-over-heart gesture to express sincerity among Iraqis, or the folded hands to express respect in Hindu culture. This can be considered read-only for each culture and does not change. This module would also contain representations of social norms - what is acceptable and unacceptable in a culture, and what are appropriate (or at least commonly acceptable) responses to norm violations.
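The three modules and their very different update rates can be sketched as follows (a hypothetical illustration; every field name and value here is invented, not part of the CML proposal):

```python
# Hypothetical sketch of the three proposed CML modules.
context = {
    "dialog": {        # updated very frequently
        "history": ["greet", "inform"],
        "topic": "directions",
        "tension": "low",
    },
    "environment": {   # updated when location or time-of-day changes
        "time-of-day": "night",
        "setting": "market",
    },
    "culture": {       # effectively read-only for a given culture
        "sincerity-gesture": "palm-over-heart",
    },
}

def realize_exit(ctx):
    # The same communicative function (exiting a conversation) is
    # realized differently depending on the environmental context.
    if ctx["environment"]["time-of-day"] == "night":
        return "Good night"
    return "Have a great day"
```

Keeping these fields out of FML itself is what lets an FML block state only the desired function, with the FML-to-BML mapping consulting the context module at realization time.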
If context is explicitly maintained in a separate module, it is possible for FML to focus solely on representing the desired function of a communicative act. In our view, FML should not encode any information on how this act is to be realized.
We have started a research project to define the detailed structure of CML. Our working assumption is that CML should be based on an ontology of context. Most of the content inside the modules will be propositional - mostly facts and statements. Appendix A provides an example of an ontology representing environmental context, written in KIF syntax and implemented in the knowledge representation system PowerLoom2. We intend to propose a detailed XML structure at a later date.
6. CONCLUSION
The TLCTS Mission Game is a practical, heavily used social simulation that aligns well with the SAIBA framework. However, we found that some features in the original FML proposal did not match our needs. Based on these needs and our experience in social simulations, we recommended some modifications to FML that will make it a better match with our needs and, we believe, the needs of other groups as well. In addition, we proposed that an additional language called CML should be added to SAIBA, unifying the representation of context information as an aid in the translation from FML to BML for a target situation and culture.
7. REFERENCES
[1] Baker, Collin F., Fillmore, Charles J., and Lowe, John B. The Berkeley FrameNet project. In Proceedings of the COLING-ACL, 1998.
[2] P. Brown and S. Levinson. Politeness: Some universals in language usage. Cambridge University Press, 1987.
[3] D. Feng, E. Shaw, J. Kim, and E. Hovy. Learning to detect conversation focus of threaded discussions.
[4] Johnson, W.L., Beal, C., Fowles-Winkler, A., Lauper, U., Marsella, S., Narayanan, S., Papachristou, D., Valente, A., and Vilhjalmsson, H. Tactical Language Training System: An interim report. In Proceedings of Intelligent Tutoring Systems, 2004.
[5] Johnson, W.L., Marsella, S., and Vilhjalmsson, H. The DARWARS Tactical Language Training System. Interservice/Industry Training, Simulation and Education Conference, 2004.
[6] S. Kopp, B. Krenn, S. Marsella, A. N. Marshall, C. Pelachaud, H. Pirker, K. R. Thórisson, and H. H. Vilhjálmsson. Towards a common framework for multimodal generation: The behavior markup language. In Proceedings of Intelligent Virtual Agents 2006, pages 205-217, 2006.
2 See http://www.isi.edu/isd/LOOM/PowerLoom.
[7] J. Meron and W. Johnson. Improving the authoring of foreign language interactive lessons in the Tactical Language Training System. SLaTE Workshop on Speech and Language Technology in Education, 2007.
[8] D. V. Pynadath. PsychSim: Agent-based modeling of social interactions and influence. In ICCM, pages 243-248, 2004.
[9] D. R. Traum and E. A. Hinkelman. Conversation acts in task-oriented spoken dialogue. Computational Intelligence, 8:575-599, 1992.
[10] H. H. Vilhjálmsson, C. Merchant, and P. Samtani. Social puppets: Towards modular social animation for agents and avatars. In D. Schuler, editor, HCI (15), volume 4564 of Lecture Notes in Computer Science, pages 192-201. Springer, 2007.
APPENDIX
A. ENVIRONMENTAL ONTOLOGY IN POWERLOOM

(defmodule "TACTLANG-ENVIRONMENT"
  :includes ("PL-USER"))
(in-module "TACTLANG-ENVIRONMENT")
(defconcept environment)
(defconcept time-of-day
  :documentation "Represents the time of day
   for the environmental context")
(assert (and (time-of-day morning)
             (time-of-day afternoon)
             (time-of-day evening)
             (time-of-day night)))
(assert (closed time-of-day))

(deffunction current-time ((?e environment))
  :-> (?t time-of-day))
(defconcept physical-thing)
(defconcept person ((?t physical-thing)))
(defconcept vector)
(deffunction vector-x ((?v vector)) :-> (?x FLOAT))
(deffunction vector-y ((?v vector)) :-> (?y FLOAT))
(deffunction vector-z ((?v vector)) :-> (?z FLOAT))
(deffunction location ((?t physical-thing))
  :-> (?v vector))
(deffunction geo-distance
    ((?a1 physical-thing) (?a2 physical-thing))
  :-> (?dist FLOAT))
(defconcept place ((?t physical-thing)))
(deffunction current-place ((?x person)) :-> (?p place))
(defconcept place-of-worship ((?p place))
  :documentation "A place of worship may impose
   additional restrictions on what is culturally
   appropriate. The exact restrictions should be
   represented in the cultural ontology")
(assert (and (place-of-worship mosque)
             (place-of-worship temple)
             (place-of-worship church)
             (place-of-worship synagogue)
             (place-of-worship gurdwara)))
(defconcept market ((?p place))
  :documentation "A market may ease restrictions
   on appropriate behavior - for example, those
   on maintaining physical distance or on
   appropriate volume of voice.")
(defconcept weather-conditions)

(defconcept precipitation-conditions)
(assert (and (precipitation-conditions clear)
             (precipitation-conditions rain)
             (precipitation-conditions snow)
             (precipitation-conditions hail)
             (precipitation-conditions fog)))
(defconcept overhead-conditions)
(assert (and (overhead-conditions sunny)
             (overhead-conditions cloudy)
             (overhead-conditions partially-cloudy)))
(deffunction weather-precipitation ((?w weather-conditions))
  :-> (?p precipitation-conditions))

(deffunction weather-overhead ((?w weather-conditions))
  :-> (?o overhead-conditions))

(deffunction weather ((?p place))
  :-> (?w weather-conditions))
A Brief History of Function Representation from Gandalf to SAIBA

Hannes Vilhjálmsson and Kristinn R. Thórisson
Center for Analysis and Design of Intelligent Agents
and School of Computer Science, Reykjavík University, Iceland
{hannes, thorisson}@ru.is
ABSTRACT
The first half of this paper introduces the aim of SAIBA and the functional markup language (FML) from the perspective of the Center for Analysis and Design of Intelligent Agents (CADIA) at Reykjavik University. The second half provides a brief historic overview of the functional representation of communicative intent in a line of communicative humanoids and related systems, starting with Gandalf and leading up to one of the early proposals for FML in the SAIBA framework.
Categories and Subject Descriptors
I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods - frames and scripts, representation languages, representations.
General Terms
Algorithms, Design, Standardization, Languages, Theory.
Keywords
Embodied Conversational Agents, Functional Representation, Multimodal Communication, Human Computer Interaction.
1. INTRODUCTION
As the SAIBA consortium gears up for the second phase of the planned work on representations for multimodal generation, we would like to summarize what we consider some of the key aspects of this work as well as give a view of some of the historical roots that the effort grew out of.

The SAIBA (situation, agent, intention, behavior, animation) effort exists first and foremost for the purpose of increasing the synergy within the research community focused on multimodal communication in robots and virtual humanoids. Computer graphics work has enjoyed enormous success in standardization efforts for what we see as the lowest layer in a stack upon which systems and abstractions layer ever-increasing complexity and, eventually, intelligence. The observation is simple: As more people do research in interactive humanoids the potential for duplication of effort increases. The consortium has addressed the problem of multimodal generation "one abstraction level" above the graphics level; this resulted in the BML effort, which was based on a lot of prior work.
At the BML level we see groupings of primitive moves, what in robotics is called e-moves, into larger, more complex sets of instructions, into what in the robotics world is often referred to as action and which in BML are called behaviors. BML is thus a language for describing events that are supposed to happen. The events are not as simple as graphics commands (otherwise they would not save the developer any time), so they cannot represent the same fine level of detail - and that is precisely the point. They are higher level than e-moves and therefore can be used to program large sets of e-moves in single strokes. However, human behavior is highly complex, and just as basic computer graphics commands are not well-suited to describe complex humanoid behaviors, BML is not convenient for representing long chains of multimodal events - what in the A.I. world are called plans.
Enter FML - functional markup language. The aim with FML is to develop the next level of description language up from BML, one that can describe what should happen in a multimodal agent at what we could call a functional level - representing in essence what the agent's behavior(s) should achieve - its goals.

The SAIBA consortium's approach to this effort is based on one tenet that is extremely important for the success of achieving the stated aims, i.e. increased collaboration, ease of sharing results and actual working systems: a clear separation of representation language and the processes that produce and consume the language. This is, for the most part, practically motivated and is based on a long history and can, in our view, help keep the effort on a prosperous track. A successful separation keeps open the possibility that anyone can create their own planning mechanisms. For this to be possible the future FML cannot and must not put constraints on the kinds of processes that consume and produce it.
Whether this is possible remains to be seen. But since the main research focus in artificial intelligence and communicative humanoids on the topic of multimodal generation are the mechanisms that control and produce the behaviors, this must be a free variable, unconstrained by the languages that the processes work with. The language, FML, will describe the intentions that an agent may have in what it does. This, of course, should be possible to do without saying anything about the mechanisms that are required to manipulate those intentions.
It is important to follow the basic idea behind the SAIBA effort of layers or bands - that is, a given markup language is not only limited in that there are details - lower-level things - that it cannot (and should not) represent, there will also be larger - higher level - things that it should not represent; BML and FML are constrained to bands of operation. These bands are limited by time and scale, that is, the timescales covered by BML are smaller than those covered by FML. Likewise, as we build FML there may be large timescales for which FML will be inappropriate; this, however, remains to be seen.
We will now turn to some of the historical precedents for the current FML efforts.
2. FUNCTION / INTENTION REPRESENTATION IN GANDALF
In the Ymir architecture, on which the communicative humanoid Gandalf was built [1] (Figure 1), a number of ideas were presented that relate to the present effort. The main components relevant to FML include the Action Scheduler (AS), which could receive and execute goals representing both functional and behavioral specifications.
Figure 1: The Gandalf/Ymir agent was capable of real-time face-to-face conversations with human users. Notice Gandalf's gaze responding to the hand gesture's interpreted function - i.e. pointing - within "a human-like delay" of 300-400 ms.
On the perception side, Ymir had a set of processes called Multimodal Descriptors that could aggregate information from Unimodal Perceptors (both these modules worked with real-time perceptual data generated by the behavior of a person); the output of these were "sketches" - descriptions of human behavior - at various levels of detail. Examples of some of the higher-level descriptors are given in Table 1.
Table 1. Example higher-level descriptors in Gandalf

giving-turn
taking-turn
wanting-turn
want-back-channel-feedback
has-turn
addressing-me
greet
greet-happily
Upon the reliable detection of any of these descriptors (and their temporal inter-relationships), Decider modules would fire goals for being achieved through movement or speech in the Gandalf agent. The Action Scheduler would receive these goals and generate the appropriate animation commands (i.e. e-moves). In Ymir the AS is the last stop before ballistic execution of animation. To enable interruptibility, the ability to cancel actions quickly for any degree of freedom, the AS would never commit more than 200 ms at a time to the animation level below. The goals generated by the Deciders could specify the shape/look of an action, e.g. hand-raise-palm-forward, what it should achieve, e.g. greet, or both, e.g. greet-happily. The AS resolved these goals down to the graphics level by selecting between options such as whether to greet with a wink, nodding or waving, etc., down to the level of primitive animation commands. Thus, the AS handled, in one place with a single mechanism, both what we would later refer to as the BML level and FML level.
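The goal resolution performed by the Action Scheduler can be sketched roughly as follows. This is a toy reconstruction, not Ymir's actual code; every option and primitive-move mapping here is invented for illustration:

```python
# Toy sketch of resolving a functional goal (e.g. greet-happily) down
# to primitive animation commands, as an Action Scheduler might.
OPTIONS = {
    "greet": ["wink", "nod", "wave"],  # alternative realizations
}
PRIMITIVES = {
    "wink": ["close-left-eyelid", "open-left-eyelid"],
    "nod": ["tilt-head-down", "tilt-head-up"],
    "wave": ["raise-hand", "oscillate-wrist", "lower-hand"],
}

def resolve(goal, prefer=None):
    # Split a goal like "greet-happily" into function and manner.
    function, _, manner = goal.partition("-")
    choice = prefer or OPTIONS[function][0]  # select among realizations
    moves = list(PRIMITIVES[choice])
    if manner == "happily":
        moves.insert(0, "smile")             # manner colors the behavior
    return moves
```

The single function covers both the functional choice (which realization of greet) and the behavioral expansion (which primitive moves), mirroring the observation that the AS handled both levels in one mechanism.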
3. FRAMES OF FUNCTIONS IN REA
The general approach of Gandalf/Ymir was maintained in a later agent called REA (Figure 2), although REA's architecture replaced shared descriptor blackboards with a fixed messaging pipeline that passed around a multimodal Frame [2].
Figure 2. REA was a real-estate agent capable of multimodal natural language generation and understanding.
A user event generated an input Frame in REA's system that contained a field for observed visual or audible behavior and two fields for functional interpretations of these behaviors: A Propositional interpretation (content related) and an Interactional (process related) interpretation. These interpretations were added by an Understanding Module before a Decision Module would then use the interpretations to create an appropriate response for REA. The response was encoded in an output Frame similar to the input Frame in that it contained the same Propositional and Interactional fields, which were now filled with communicative functions that needed to be realized by the agent. The output Frame was sent through a Generation Module that generated appropriate behaviors, fulfilling the functions by placing a description of those behaviors in a special output field.

Therefore REA's central decision mechanism only operated on a functional representation of the user's input and produced only a functional representation of her communicative intent.
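The Frame pipeline described above can be illustrated with a toy sketch (not REA's actual code; module behavior, field names and values are all invented). The key property shown is that the decision stage touches only the functional fields, never concrete behavior:

```python
# Toy sketch of a REA-style pipeline: Understanding -> Decision -> Generation.
def understanding(frame):
    # Adds a functional (Interactional) interpretation of raw behavior.
    if frame["behavior"] == "user-appears":
        frame["interactional"] = "inviting"
    return frame

def decision(frame):
    # Operates only on functional fields, producing an output Frame
    # filled with functions to realize - no concrete behavior yet.
    out = {"behavior": None, "propositional": None, "interactional": None}
    if frame["interactional"] == "inviting":
        out["interactional"] = "giving-feedback"
        out["propositional"] = "greeting"   # a Ritual speech act
    return out

def generation(frame):
    # Fulfills the functions by filling in a behavior description.
    if frame["propositional"] == "greeting":
        frame["behavior"] = ["wave", 'say "Hello"']
    return frame

reply = generation(decision(understanding(
    {"behavior": "user-appears", "propositional": None, "interactional": None})))
```

Swapping the Generation Module for another realizer would change the behavior field without touching the decision logic - the separation the pipeline was designed for.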
REA's Interactional functions were in part drawn from Gandalf's descriptors and are shown in Table 2.
Table 2. Interactional functions in REA

giving-turn
taking-turn
wanting-turn
keeping-turn
dropping-turn
listening (to a speaker)
wanting-feedback
giving-feedback
expecting (some input)
present (within conversational distance)
inviting (to start a conversation)
leaving (the conversation)
REA's Propositional functions were in the form of Speech Acts, which were for the most part domain dependent, but divided into imperatives, interrogatives and declaratives. There was also a special Ritual category that contained a greeting and a farewell.
4. FML IN SPARK
Influenced by work on REA and the need for something more lightweight, BEAT was built as a tool for generating multimodal co-verbal behavior based on analyzing the text to be spoken [3]. Unlike Gandalf and REA, BEAT only dealt with multimodal generation, not perception. The most comprehensive implementation of BEAT existed as part of the Spark system (Figure 3), which automated avatar behavior in an online virtual environment based on chat messages exchanged by its users [4].
Figure 3. Online avatars automated in Spark based on functional annotation of text exchanged by users.
In this implementation, each chat message got analyzed and annotated in terms of various discourse functions (Figure 4).
<utterance speaker="map1" speaker="person1"><clause>
  <theme><turn type="take">
    <action><new>give</new></action>
    <object>him</object></turn></theme>
  <rheme>
    <turn type="give" target="person2">
      <emphasis type="phrase">
        <reference type="visual" target="map:mine">
          <reference type="textual" source="person3">
            <object id="map:mine">some
              <emphasis type="word">
                <new>gold</new>
              </emphasis>
            </object>
          </reference></reference></emphasis></turn></rheme>
</clause></utterance>

Figure 4. The text "give him some gold" automatically annotated in terms of discourse function in Spark.

The discourse functions annotated by the improved BEAT of the Spark system are shown in Table 3.

Table 3. Discourse functions in the Spark FML
&85".%3/"0&'
&'($)@-<,%&H'-#A%.'A%%,'/"'1+C%B'
.6*'31)@-<,%&H'(/"$#0.'%I*0#$#-+/('/"'K2%&-+/(B'
$1,)@0%I+*#0'1+C%((%&&B'
&/141)@0+(A'-/',"%C+/2&'2--%"#(*%B'
(/141'@(%;'*/(-"+L2-+/('-/'*/(C%"&#-+/(B'
145/*3"3)
.8$&(*3&)
(101(1$.1)@-<,%&H'C+&2#0'/"'-%I-2#0B'
"66'3&(*&1)@%0#L/"#-%'!%#-2"%'-5"/215'+002&-"#-+/(B'
!(8'$2"$!)@-<,%&H'"%K2%&-B'
"#$($!%0.1)&,.(!G,0+/!)#$.!F$!2*--$/!&.),!(0--,')&.8!.,.>$'F*+!
F$#*>&,'! ,.! )#$! '$1$&>&.8! $./! 9*11,'/&.8! ),! $T&()&.8! $2-&'&1*+!
/*)*!,.!%*1$U),U%*1$!/&(1,0'($=!%,'!*!%0++!20+)&2,/*+!/$+&>$'@E!
"#$! %0.1)&,.(! G$'$! /'*G.! %',2! )#$! +&)$'*)0'$! ,.! /&(1,0'($! *./!
1,.>$'(*)&,.! *.*+@(&(H! *./! '$-'$($.)! (,2$! ,%! )#$!2,()! 1,22,.!
$+$2$.)(! )#*)! 8&>$! '&($! ),! 1,.>$'(*)&,.*+! .,.>$'F*+! F$#*>&,'E!!
3,2$! ,%! )#$2! *'$! >(-%"#*-+/(#0! &.! .*)0'$! 9(01#! *(! )0'.=! G#&+$!
,)#$'(! *'$!9"/,/&+-+/(#0! 9(01#! *(! 1,.)'*()=E! !S,G$>$'H! (,2$! *'$!
/&%%&10+)! ),! 1+*((&%@! *11,'/&.8! ),! )#$($! 1*)$8,'&$(H! (01#! *(! )#$!
-',1$((!,%!8',0./&.8H!G#&1#!2*@!'$+@!,.!F,)#E!
5.! 6R4"! )#$! %0.1)&,.! *..,)*)&,.(! G$'$! /,.$! 0(&.8! d;<! )*8(!
-+*1$/! /&'$1)+@! G&)#&.! )#$! *..,)*)$/! )$T)E! ! "#$! )$'2! =2(*-+/('
6#"A2,'M#(12#1%' 9:;<=!G*(!0($/! ),!/$(1'&F$! )#$($! )*8(! &.! )#$!
3-*'7!(@()$2!),!1,.)'*()!)#$2!G&)#!)#$!($)!,%!)*8(!0($/!),!/$(1'&F$!
)#$!(0--,')&.8!N%5#C+/"!96;<=E!!"#&(!.*2&.8!,%!)#$!)G,!/&%%$'$.)!
)*8! ($)(! #*(! F$$.!2*&.)*&.$/! &.! )#$! 34564! %'*2$G,'7H! F0)! )#$!
*1)0*+!)*8!()'01)0'$!#*(!$>,+>$/E!
5. THE EVOLVING FML IN SAIBA
Perhaps one of the largest differences between BEAT tags and SAIBA tags is that the latter breaks free of the strict hierarchical ordering of tags with the introduction of Sync Points [5]. SAIBA XML descriptions can be flat, with ordering constraints provided through synch attributes. This allows partially overlapping tags.
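Why flat descriptions matter can be illustrated with a small sketch (our own, hypothetical representation; the span names and times are invented): two functions can intersect without either containing the other, a relationship a strict tag hierarchy cannot express.

```python
# Sketch: functional spans with explicit start/end times instead of nesting.
spans = {
    "turn1": (0.0, 2.0),
    "emphasis1": (1.5, 2.5),   # begins inside turn1 but ends after it
}

def partially_overlap(a, b):
    # True when the spans intersect but neither contains the other -
    # exactly the case that strict hierarchical nesting forbids.
    (s1, e1), (s2, e2) = spans[a], spans[b]
    intersect = s1 < e2 and s2 < e1
    contains = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    return intersect and not contains
```

With sync-point-style constraints, such a pair is simply two flat elements whose boundaries are ordered relative to each other; with nested tags it would have no well-formed encoding.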
Z#&+$!6R4"!1,0+/!2*7$!)#$!*((02-)&,.!)#*)!)#$!&.-0)!G,0+/!F$!
)#$! )$T)! ),!F$! (-,7$.H!*./! )#$'$%,'$! )#*)!:;<!)*8(!1,0+/!(&2-+@!
F$! -+*1$/! *',0./! *--',-'&*)$! )$T)! $+$2$.)(H! )#$! 34564! :;<!
'$-'$($.)*)&,.!1*.!2*7$!.,!(01#!*((02-)&,.E! !"#$!8$.$'*)&,.!,%!
)#$! )$T)! &)($+%! 2*@! .,)! #*>$! ,110''$/! *)! )#$! )&2$! ,%! %0.1)&,.!
/$(1'&-)&,.E!
"#*)! &(!G#@! )#$! :;<! F'$*7U,0)! 8',0-! *)! )#$!B$@7C*>&7! 34564!
G,'7(#,-! &.! J__e! -',-,($/! ),! /&>&/$! :;<! )*8(! &.),! )G,! ($)(E!!
"#$!%&'()!($)!/$%&.$(!1$')*&.!F*(&1!%0.1)&,.*+!,'!($2*.)&1!8(+-&!)#*)!
*'$! *((,1&*)$/! G&)#! )#$! 1,220.&1*)&>$! $>$.)E! ! "#$! ($1,./! ($)!
&.1+0/$(!O,%"#-+/(&!)#*)!$(($.)&*++@!,-$'*)$!,.!-'$>&,0(+@!/$%&.$/!
0.&)(! ),! *--+@! 1$')*&.! %0.1)&,.*+! $%%$1)(! ,.! )#$2E! ! "#$! &.&)&*++@!
-',-,($/!0.&)(!*'$!+&()$/!&.!"*F+$!KE!
Table 4. First SAIBA FML proposal: Units

participant
turn
topic
performative (speech act)
content (detailed proposition)

These units are ordered here from the widest scope to the smallest scope. That means that the widest scoped FML may contain one or more of the smaller scoped elements. The Operation tags can affect any of these units (and therefore their scope will vary greatly). A preliminary list of the suggested Operation tags is provided in Table 5.

Table 5. First SAIBA FML proposal: Operations
145/*3"3)
.8$&(*3&)
"66'3&(*&"8$)
*001.&)
38."*6)@"%0#-+/(#0'1/#0&B)
.8!$"&"#1)@$%-#:*/1(+-+C%'%P1P')+!!+*20-<'/!',"/*%&&+(1B)
.1(&*"$&7)@,"/)2*%"Q&'*%"-#+(-<'/!'2(+-Q&'-"2-5B)
!
An example of how an FML block could be constructed using these two sets of tags was given in [6]. The example is reproduced here in Figure 5. As the SAIBA effort focused more on BML during the following phase, this early FML proposal has not been fleshed out so far, and is therefore very much work in progress.
<!-- Defining units - first set -->
<participant id="ali" role="speaker"/>
<participant id="trainee" role="addressee"/>
<turn id="turn1" start="take" end="give">
    <topic id="topic1" type="new">
        <performative id="perform1" type="enquiry">
            <content>goal trainee ? here</content>
        </performative>
    </topic>
</turn>

<!-- Operating on units - second set -->
<emphasis type="new">perform1:here</emphasis>
<affect type="fear">perform1:goal</affect>
<social type="maintain_distance">trainee</social>

Figure 5. An example of an FML description that might result in leaning away and speaking "What are you doing here?"
6. CONCLUDING REMARKS
Looking back over these related projects, one thing is striking: The first systems focused on the essential mechanism or process of maintaining real-time dialogue, while the later ones start looking into the presentation of content. There seems to be a relatively good agreement about the process or interactional functions, so perhaps this is a good place to start with a shared specification. It is important that interactional functions continue to be first-class citizens in any FML representation and that they not only be useful for generation of behavior but also for interpretation of behavior.
7. REFERENCES
[1] Thórisson, K. R., 1996. Communicative Humanoids: A Computational Model of Psychosocial Dialogue Skills. Ph.D. Thesis, Massachusetts Institute of Technology, MA.
[2] Cassell, J., Vilhjálmsson, H., Chang, K., Bickmore, T., Campbell, L. and Yan, H., 1999. Requirements for an Architecture for Embodied Conversational Characters. In Computer Animation and Simulation '99 (Eurographics Series). Vienna, Austria: Springer Verlag.
[3] Cassell, J., Vilhjálmsson, H., and Bickmore, T., 2001. BEAT: the Behavior Expression Animation Toolkit. In Proceedings of ACM SIGGRAPH, Los Angeles, Aug. 12-17, p. 477-486.
[4] Vilhjálmsson, H., 2005. Augmenting Online Conversation through Automated Discourse Tagging. In Proceedings of the 6th Annual Minitrack on Persistent Conversation at the 38th Hawaii International Conference on System Sciences, Jan. 3-6, 2005, Hilton Waikoloa Village, Big Island, Hawaii, IEEE.
[5] Kopp, S., Krenn, B., Marsella, S., Marshall, A., Pelachaud, C., Pirker, H., Thórisson, K., Vilhjálmsson, H., 2006. Towards a Common Framework for Multimodal Generation in ECAs: The Behavior Markup Language. In Proceedings of the 6th International Conference on Intelligent Virtual Agents, Aug. 21-23, Marina del Rey, CA.
[6] Vilhjálmsson, H. and Marsella, S., 2005. Social Performance Framework. In Proceedings of the Workshop on Modular Construction of Human-Like Intelligence at the 20th National AAAI Conference on Artificial Intelligence, July 9th, Pittsburgh, PA.