
Analysing emotional interaction through avatars. A generic architecture

Amalia Ortiz, David Oyarzun, Maria del Puy Carretero, Jorge Posada, Maria Teresa Linaza, Nestor Garay-Vitoria

Edutainment and Graphical Conversational Interfaces Department, VICOMTech Research Center, Paseo Mikeletegi 57, 20009 Donostia-San Sebastian, Spain
aortiz AT vicomtech.org

Universidad de la UPV/EHU, Manuel Lardizabal 1, E-20018 Donostia-San Sebastian, Spain
nestor.garay AT ehu.es

Vol. I No. 5 (Mar. 2009). ISSN: 1697-9613 (print) - 1887-3022 (online). www.eminds.hci-rg.com

Abstract. This study proposes the use of avatars to generate non-verbal communication in a multimodal interactive system, mainly based on the affective or emotional aspects of this type of communication. The objective of this study is the use of avatars as an element for generating non-verbal communication, without taking a specific application into account, but rather analysing all their interaction possibilities. Therefore, all the aspects related to emotional multimodal interaction through avatars have been analysed: (i) the main advantages and disadvantages of integrating a multimodal interactive system, (ii) the elements comprising the interactive system (the user, the avatar and the application) and (iii) their characteristics, (iv) the interaction levels that may occur among the elements and (v) the interaction types that have to be taken into account when generating an emotional multimodal interaction through avatars. Once all this information has been analysed, a general architecture is proposed that includes all these types of interaction that can occur in a system which includes avatars. This architecture can serve to formalise the development of emotional applications with and without avatars.

1 Introduction

Human-computer interaction is a highly active multi-disciplinary scientific area, where many years have been spent working on ensuring that the interaction between the user and the machine is more natural and intuitive. This work is very important, as the success or failure of an interactive system depends to a great extent on the user interface. Currently, advances in technologies such as machine vision, 3D graphics or voice technologies are enabling a new approach to interaction known as multimodal interaction, which allows a more natural interaction style. Multimodal interaction provides the user with multiple modes of interfacing with a system, e.g. images, sounds, voice, texts, etc. Within the multimodal interaction field, the most widespread type of communication is verbal. However, when designing a multimodal interactive system, the fact that people can communicate in two ways has to be taken into account: by means of verbal communication, which allows us to communicate what we are thinking using words, and by means of non-verbal communication, which for example allows us to express our emotions.

Apart from the verbal communication languages (written or oral language), considered by the majority of the architectures existing in the state of the art, the multimodal system must interpret and generate the languages relating to non-verbal communication (iconic or corporal language) to obtain a natural multimodal interaction, so that the user can communicate with the system using those languages and obtain a response in the same form. In addition, non-verbal communication must be present in a multimodal interactive system, as it greatly contributes to the message to be transmitted. In fact, according to the Mehrabian study [26], 93% of the message is transmitted using body language, basically consisting of facial expressions and body movements (55%) and the use of the voice (38%), which can even change the meaning of the oral language. In non-verbal communication, information is mainly received and interpreted by two channels: the issuer, consisting of face, eyes, body and voice, and the receptor, consisting of sight, hearing, smell, taste and touch. One of the ways of simulating the issuer of non-verbal communication in an interactive system is based on the use of virtual characters or avatars. An avatar is a virtual character that consists of a physical and a mental part. The physical part can take different forms: it may be 2D or 3D, anthropomorphic or cartoon-like, and it may be a combination of face, eyes, body or voice. The mental part may be intelligent or not. If the avatar is not intelligent, it simply visualises the orders that reach it using its face, its eyes, its body or its voice. If the avatar is intelligent, it is capable of taking decisions. The components required for the avatar to take decisions are usually called agents.
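To make this decomposition concrete, the sketch below models an avatar with a physical part (its representation and communication channels) and an optional mental part (the agent that allows it to take decisions). It only illustrates the concepts defined above; the class and field names are our own and are not defined in this paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Representation(Enum):
    """Possible forms of the avatar's physical part."""
    FACE_2D = "2D face"
    FULL_BODY_3D = "3D anthropomorphic body"
    CARTOON = "cartoon character"


@dataclass
class PhysicalPart:
    """What the avatar shows to the user: face, eyes, body and/or voice."""
    representation: Representation
    has_face: bool = True
    has_eyes: bool = True
    has_body: bool = False
    has_voice: bool = False


@dataclass
class MentalPart:
    """Empty for a non-intelligent avatar; holds an agent when it can take decisions."""
    agent: Optional[object] = None  # placeholder for a decision-taking agent

    @property
    def is_intelligent(self) -> bool:
        return self.agent is not None


@dataclass
class Avatar:
    physical: PhysicalPart
    mental: MentalPart = field(default_factory=MentalPart)

    def visualise(self, order: str) -> str:
        """A non-intelligent avatar simply visualises the orders it receives."""
        return f"avatar shows: {order}"
```

With this reading, an intelligent avatar is simply one whose mental part carries an agent; everything else stays the same.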

This study proposes the use of avatars to generate non-verbal communication in a multimodal interactive system, mainly based on the affective or emotional aspects of this type of communication. We think that, by using the avatar in the interface, we can give a face, body and voice to the system and, as Picard said [37]: "Voice inflection, facial expression, and posture are the physical means by which an emotional state is typically expressed, and are the primary means of communicating human emotion". Section 2 analyses the pros and cons of using avatars in emotional interaction, and all the related aspects are analysed in Section 3. All this information is included in a matrix in Section 4. Based on this matrix, Section 5 sets out a proposal for a generic architecture of a system that allows emotional multimodal interaction through avatars. Section 6 shows several applications and their relation to this architecture. The conclusions of this study are set out in Section 7.

2 Emotional multimodal interaction through avatars

Integrating avatars in different applications has been a highly researched field in recent years. Several authors have performed evaluations involving real users to define the main advantages of interaction through avatars. Some of the key advantages include:

– It facilitates social interaction with the machine. In 1994, Nass et al. [29] performed five experiments that revealed that the individual interactions of computer users are fundamentally social. More recently, Prendinger et al. [38] also concluded that the user expects to obtain the same type of social behaviour. Therefore, they proposed providing the interface with personality aspects and voice synthesis to improve human-machine interaction.

– The user then considers the system to be more reliable and credible. A user needs to believe in an agent's reliability in order to have the confidence to delegate certain tasks to it. There are evaluations that demonstrate that confidence and credibility increase with the personification of the agent, in other words, by giving it a face, eyes, body or voice. If the appearance of the character is also realistic, the agent is seen to be more intelligent and friendly [19].

– The commitment of the user increases. Personifying the agents increases the user's commitment to the application. In learning environments, for example, personifying the virtual tutor has a positive impact on the perception of the students, particularly if there are also emotional responses [18].

– It catches the attention of the user. Hongpaisanwiwat et al. [16] concluded that the avatar is capable of catching the user's attention and that this increases if the avatar is credible, as it generates the illusion of life in the system [39].

– It focuses the user's attention. An avatar can be used to focus the user's attention on points of interest, which is of great importance for learning environments [38].

In spite of these advantages, there is still great controversy about whether the best way to interact with the environment is through mimicking human communication using virtual characters. Many authors have tried to answer this question through evaluations with real users. For example, Koda and Maes [19] carried out a quantitative analysis of subjects' impressions of a personified interface. They concluded that having faces and facial expressions is considered likable and engaging, and that they also require more attention from the user. Walker et al. [50] also argued that the virtual character occupies the users' attention, but in a negative way. They compared subjects who answered questions presented via a text display on a screen with subjects who answered the same questions spoken by a talking face. The subjects who responded to questions presented by the talking face spent more time on the task, made fewer mistakes, and wrote more comments. From these results they concluded that adding human characteristics could make the experience for users worse rather than better, because it required more effort and lowered performance.

Taking the analysed state of the art into account, this study is based on using avatars in multimodal interactive systems as elements that transmit non-verbal communication through their face, eyes, body and voice. Even though the technologies relating to the field of virtual characters, such as body animation and voice synthesis, have been widely researched [12] [2], their use as a necessary element in multimodal interaction has barely been studied. In the majority of the applications that allow interaction with avatars, these have been integrated as an independent module, which provides the user with audio and visual information about the system. This aspect was studied by Not et al. [30], who concluded that the avatar should not be considered as an indivisible modality, but rather as a synergic contribution of different communication channels which, when correctly synchronized, generate global communication. Metaphorically, their group interprets an avatar as an output multimodal interface and they study all its capacities in order to generate a marker language that allows avatars to be easily integrated into any application.

3 Analysis of each element

The objective of this study is to design a generic architecture that includes the necessary interaction aspects in a multimodal system that integrates avatars. In the same way as in the work of Not et al. [30], this architecture also considers the avatar as a multimodal interface, even though in this case the proposed architecture has three elements that can interact at the same level: the user (body and mind), the avatar (virtual body and intelligent agent) and the application (user interface and application logic). Each of these elements may exist in the system in the following form:

– One or several elements of the same type can exist, depending on the number of users, avatars or applications in the system.

– Each type of element has an interface that may or may not be multimodal.


– Each one of the elements of the system has its own characteristics that define its capacities.

In order to design the general architecture that includes all the interaction aspects, it is first of all deemed to be necessary to study the defining elements of the system, their characteristics, the parallelisms existing between each element and the levels of interaction that can occur in the system.

3.1 Characteristics of each element

When designing the architecture of the multimodal interactive system, the influence of the form and functionalities of each of the three elements has to be taken into account. The characteristics and capacities of each element need to be known in detail in order to design a generic architecture. However, given that the characteristics of the user, the avatar and the application are very diverse, it is very difficult to perform a complete classification. Given that this study is focused on multimodal interaction, a classification was performed according to the type of communication that each element can generate or interpret. This analysis showed that each element can be active or passive. On the one hand, an active element is one that can directly interact with the system, either because it has an innate capacity to do so or because the system provides the necessary tools for it to do so. The active element is capable of recognising the events of the interaction, interpreting them and taking decisions. In addition, the active element is capable of making itself known to the system through some of its non-verbal or verbal communication capacities. On the other hand, the passive element does not directly interact with the system, but rather delegates to the system or acts as an intermediary by executing the orders of an active element.
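A minimal sketch of this distinction, under our own naming assumptions: an active element can recognise and interpret interaction events and take its own decisions, while a passive element only executes orders that an active element (or the application designer) has defined for it.

```python
from abc import ABC, abstractmethod


class Element(ABC):
    """Common base for user, avatar and application in the interactive system."""


class ActiveElement(Element, ABC):
    """Can perceive interaction events, interpret them and take decisions."""

    @abstractmethod
    def recognise(self, event: dict) -> dict:
        """Turn a raw interaction event into an interpreted meaning."""

    @abstractmethod
    def decide(self, meaning: dict) -> str:
        """Choose a (possibly emotional) response to the interpreted event."""


class PassiveElement(Element):
    """Does not interact directly: it executes orders coming from an active
    element or pre-recorded by the application designer."""

    def execute(self, order: str) -> str:
        return f"executing pre-defined order: {order}"
```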

3.2 Interaction levels between each element

The majority of the architectures studied focus their research on the interaction between two elements, generally the user and the avatar [45], [27] or [13]. However, in a multimodal interactive system, this study defines three types of elements that interact and which should be synchronized: the user, the avatar and the application. For the architecture considered in this paper, the following interactions are defined: User-User, User-Application, User-Avatar, Avatar-Avatar, Avatar-Application and Application-Application. Depending on the characteristics of the elements, two levels of interaction are established:

1. Direct interaction. It is established when two elements directly act and/or react. This interaction can be performed at two levels. Level 1 is established when one of the elements makes a request and the element with which it interacts reacts, but is not capable of interacting with the first element. On the other hand, level 2 is established when both elements act and react. At this level, nothing is pre-recorded and it can therefore only occur between two active elements.

2. Indirect interaction. One of the elements acts as an intermediary; in other words, it functions as a tool for transmitting the message to a third element with which it desires to interact. A minimal sketch of how these levels could be distinguished is given after the list.
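The following sketch illustrates one possible way of classifying an interaction according to the two levels just defined; the function and its parameters are hypothetical and only reflect our reading of the definitions above.

```python
from enum import Enum


class InteractionLevel(Enum):
    DIRECT_L1 = "direct, level 1"   # one element requests, the other only reacts
    DIRECT_L2 = "direct, level 2"   # both elements act and react; needs two active elements
    INDIRECT = "indirect"           # one element only relays the message to a third element


def classify_interaction(first_is_active: bool, second_is_active: bool,
                         through_intermediary: bool) -> InteractionLevel:
    """Illustrative classification following the levels defined above."""
    if through_intermediary:
        return InteractionLevel.INDIRECT
    if first_is_active and second_is_active:
        # Nothing is pre-recorded: both elements can act and react.
        return InteractionLevel.DIRECT_L2
    return InteractionLevel.DIRECT_L1
```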

3.3 Roles of each element

The roles of each element are defined according to their characteristics and the levels of interaction between them.

1. Avatar. The avatar can play four different roles depending on its characteristics and the characteristics of the element with which it is interacting: interactive (Int), autonomous (Atm), user puppet (UP) or author puppet (AP). The interactive avatar is capable of interacting with the user, with the application and with other avatars, by interpreting each event produced in the interaction and generating the relevant response. The avatar can only play the interactive role when it is an active element and the interaction is level 2 direct or indirect. An autonomous avatar, in turn, is an active avatar that is completely autonomous in its decision taking and that interacts with users or with passive applications through level 1 direct interactions. The user puppet avatar is the one that executes the orders that the user gives it, but it does not have decision-taking capacity, as it is a passive avatar. This role occurs when the interaction is indirect, or when it is level 1 direct and the role of the user with which it is interacting is that of the author. Finally, the author's puppet avatar is the one that executes the orders defined by the author of the application. It does not have decision-taking capacity either, as it is a passive avatar. This role occurs when the interaction is indirect, or when it is level 1 direct and the role of the user with which it is interacting is that of spectator.

2. User. The user can play three types of roles: interactive (Int), author (Auth) or spectator (Spec). On the one hand, the interactive user plays this role when he is capable of taking decisions and the interactive system has the necessary tools to interact with the avatar or with the application. The user may play this role in level 2 direct interactions. On the other hand, the author user is the designer of the interactive system, who predefines the behaviour of the avatar and of the application. The author is an active user that interacts with the avatar and with the application through a level 1 direct interaction. Finally, the spectator user does not interact with other elements, but simply observes what the system designer has predefined or the reactions of the autonomous elements.


3. Application. The application can play three types of roles: interactive (Int), autonomous (Atm) or author puppet (AP). The behaviour of each role is similar to that of the avatar. A sketch of how the avatar's role, for example, could be assigned from these rules follows the list.
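The sketch below shows how the avatar roles described above could be assigned from the avatar's characteristic (active or passive), the interaction level and the role of the user it faces. It follows our reading of the rules in this list; the function and constant names are illustrative, not part of the proposed architecture.

```python
DIRECT_L1, DIRECT_L2, INDIRECT = "direct level 1", "direct level 2", "indirect"


def avatar_role(avatar_is_active: bool, level: str,
                partner_user_role: str = "spectator") -> str:
    """Illustrative assignment of the four avatar roles described in the list above.

    `partner_user_role` is the role played by the user the avatar faces
    ("author", "spectator" or "interactive"); all names here are our own.
    """
    if avatar_is_active:
        # An active avatar is interactive in level-2 direct or indirect interactions...
        if level in (DIRECT_L2, INDIRECT):
            return "interactive (Int)"
        # ...and autonomous when it faces passive elements through a level-1 direct interaction.
        return "autonomous (Atm)"
    # A passive avatar is always somebody's puppet: the user's when the user acts
    # as author, the application author's when the user is a mere spectator.
    return "user puppet (UP)" if partner_user_role == "author" else "author puppet (AP)"


print(avatar_role(avatar_is_active=False, level=DIRECT_L1, partner_user_role="author"))  # user puppet (UP)
```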

4 Types of emotional interaction

The types of emotional interaction that exist in the system should cover the need to interpret and generate emotions for each interaction level. With respect to the interpretation, the user has its own perception, while the avatar and application must have the computer tools that allow them to satisfy this interpretation (such as gesture or voice recognizers). However, when generating emotions, the characteristics of each element (active or passive) are going to imply the use of the following types of interaction, defined in this research as:

– Static emotional interaction (SEI) is defined as the type of interaction needed to generate interaction that involves passive elements. The passive element receives and executes pre-recorded actions, previously defined by the application designer or recorded by the active element at the time of the interaction.

– Dynamic emotional interaction (DEI) is understood to be generated as the result of the decision taking process of the interactive system. In this case, there is no pre-recorded emotion, but rather all the emotional responses are generated as the result of a process to evaluate the interaction events produced in the system.

– Human emotional interaction (HEI) is understood to be generated as the result of the decision taking process of the user.

This last interaction type is generated in the mind of the user, who recognises, interprets and evaluates the received events to generate the emotional response. However, the system will need tools in order to be able to generate the dynamic and static emotional interaction of the avatar and application elements; the sketch below illustrates this division of labour, and the state of the art of the tools required for both system-generated types of interaction is detailed in the following subsections.
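As an illustration of how a system component might route between these three types, the sketch below returns nothing for HEI (the response happens in the user's mind), runs an appraisal process for DEI and looks up pre-recorded behaviour for SEI. All names and the toy data are assumptions made for this example.

```python
from typing import Callable, Dict, Optional


def emotional_response(element_kind: str, is_active: bool, event: str,
                       prerecorded: Dict[str, str],
                       appraise: Callable[[str], str]) -> Optional[str]:
    """Illustrative routing between the three interaction types defined above."""
    if element_kind == "user":
        # HEI: the emotional response is produced in the user's mind,
        # so the system itself generates nothing here.
        return None
    if is_active:
        # DEI: the response is the result of evaluating the interaction event.
        return appraise(event)
    # SEI: the response is looked up among pre-recorded, labelled behaviours.
    return prerecorded.get(event, prerecorded.get("default", ""))


# Toy usage: a passive avatar answering a greeting with a pre-recorded behaviour.
behaviours = {"greeting": "smile and wave", "default": "idle"}
print(emotional_response("avatar", False, "greeting", behaviours, lambda e: "appraised response"))
```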

4.1 Static emotional interaction

Static emotional interaction is generated as a pre-defined reaction in the system. The fact of having to predefine the emotional behaviour implies that the system has tools to define this behaviour. This need has been researched by other authors, who concluded that one of the most efficient ways of providing the system with static emotional qualities is by labelling; in other words, labels that will define the facial gestures, emotions or intensities (in the case of the avatar) that will have to be reproduced at a specific moment, in addition to the text that the system has to reproduce.


The need to label the behaviour led to an analysis of the main existing languages, which was published in [6]. As can be seen in this analysis, the most comprehensive marker languages were VHML [25] and RRL [43], as they include all the parameters relating to emotion, body and facial animation, and the dialogue labelling. In this paper, VHML has been chosen as it is a standard that allows all aspects of static emotional interaction to be defined.
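To make the idea of labelled behaviour concrete, the fragment below shows what a VHML-style annotation of an utterance could look like and how it could be parsed. The tag and attribute names are illustrative only and do not reproduce the exact VHML tag set.

```python
import xml.etree.ElementTree as ET

# Hypothetical VHML-style fragment: each label marks the emotion or gesture the
# avatar should show while reproducing the enclosed text.
labelled_utterance = """
<utterance>
  <emotion type="happy" intensity="60">Welcome to the virtual farmhouse!</emotion>
  <gesture type="nod"/>
  <emotion type="surprise" intensity="40">Did you know it was built in the 16th century?</emotion>
</utterance>
"""

root = ET.fromstring(labelled_utterance)
for node in root:
    text = (node.text or "").strip()
    print(node.tag, node.attrib, text)
```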

4.2 Dynamic emotional interaction

For an interactive system to be capable of generating emotions, it will need to be provided with a module that is automatically capable of generating emotions. The emotions will be the result of a cognitive process that evaluates the events in the environment. There are several emotional models on which this cognitive perspective has been based, such as those developed by Aaron Sloman [46], Lazarus [20], Ortony, Clore and Collins [35] or Roseman [41]. Within this set, the Ortony, Clore and Collins (OCC) and the Roseman models were explicitly designed to be integrated in computers and offer a rule-based mechanism to generate cognitive emotions. In fact, many authors have developed computational systems based on emotional models. For example, the Roseman model was implemented by Velasquez in Cathexis [47]. The OCC model has been extensively implemented; special mention should be made of the models developed by Elliot [10] and by Bates [4]. As limitations were found in both models, other authors have combined them, such as El-Nasr [9] or Bui [5]. In his thesis [3], Bartneck states that the reason why the majority of projects in this area choose the OCC model is that it is the only one that offers a structure of variables that influence the intensity of the emotion.
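The sketch below illustrates the kind of rule-based appraisal such cognitive models rely on: an event is evaluated along a few appraisal dimensions and mapped to an emotion label. It is a toy example loosely inspired by Roseman-style appraisal, not an implementation of the OCC or Roseman model.

```python
from dataclasses import dataclass


@dataclass
class Appraisal:
    """Toy appraisal dimensions (names are ours, loosely inspired by Roseman)."""
    consistent_with_goals: bool   # does the event help or block the agent's goals?
    caused_by_other: bool         # caused by another agent, or by circumstances?
    unexpected: bool              # low prior probability of the event


def appraise(a: Appraisal) -> str:
    """Map the appraisal of an event to a (very small) set of emotion labels."""
    if a.unexpected:
        return "surprise"
    if a.consistent_with_goals:
        return "liking" if a.caused_by_other else "joy"
    return "anger" if a.caused_by_other else "sadness"


# Example: a student answers a question incorrectly on purpose.
print(appraise(Appraisal(consistent_with_goals=False, caused_by_other=True, unexpected=False)))
```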

In a comparison between both models, significant differences were found which mean that the choice between them mainly depends on the level of interaction that is desired in the system. For example, for the level 2 direct interactions, it was concluded that the OCC model takes the standards and preferences of the user into account in its evaluation process, while Roseman only evaluates according to objectives, which means that some emotions relating to attitudes or standards (taste/distaste or anger) cannot be specifically defined. However, there are many other cases, within the set of level 1 direct or indirect interactions, where a model of the user cannot be established; the Roseman implementation may therefore be more appropriate for those cases. In general, in this paper, we have opted for the Roseman model for the following reasons. On the one hand, Table 3 shows that there are a greater number of cases which are better met by the Roseman model. In addition, this model considers the surprise emotion within its 17 emotions, and this emotion is very important, as Ekman [8] considers it one of the six universal basic emotions; the OCC model, however, does not contemplate it. Another motive for opting for the Roseman model is that, as Bartneck concluded, a log function needs to be stored which will help to assess the probability, fulfilment and effort of each event in order to categorize the emotions, an element that is not contemplated in the OCC model.

4.3 Emotional interaction matrix

Table 3 sets out all the aspects analysed in the previous sections (characteristics of each element, interaction levels, roles of each element and types of interaction) needed to satisfy the emotional interaction needs, and visualises them as a symmetric matrix, including the interactions that can exist in the system and the form in which they are produced. In addition, some examples can be observed of published applications that fit well into the matrix.

Table 1 summarises the information given in each cell, and Table 2 lists the meaning of the abbreviations used. Each cell describes the interaction produced between the element situated at the left of its row and the element situated at the top of its column. In each cell the reader can see, first of all, the role that each element plays in that interaction, with the role of the row element given first and the role of the column element second. When an X appears, there is no interaction between both elements. Next, the reader can see the level of interaction between both elements (Direct level 1 or 2, or Indirect). Last, the reader can see the kind of emotional interaction that each element will need, again with the row element first. Each row also lists examples of research projects or commercial applications that implement the kinds of interaction produced between the corresponding elements.

Each cell of the matrix corresponds to the interaction between the element of its row and the element of its column and contains, in this order: the role played by the row element and the role played by the column element; the interaction level between both elements; and the kind of emotional interaction required by the row element and by the column element.

Table 1: Information in each cell of the matrix.

Role: user puppet (UP), interactive (Int), author (Auth), spectator (Spec), autonomous (Atm) or author puppet (AP).

Interaction level: Indirect or Direct (Level 1 or Level 2).

Kind of interaction: Static emotional interaction (SEI), Dynamic emotional interaction (DEI) or Human emotional interaction (HEI).

Table 2: Meaning of the abbreviations.


Active Avatar:
– with Active Avatar: roles Int, Int; Direct L2; kinds DEI, DEI
– with Passive Avatar: roles Int, UP; Indirect; kinds DEI, SEI
– with Active User: roles Int, Int; Direct L2; kinds DEI, HEI
– with Passive User: roles Atm, Spec; No Int
– with Active Application: roles Int, Int; Direct L2; kinds DEI, DEI
– with Passive Application: roles Atm, AP; Direct L1; kinds DEI, SEI
Examples in this row: [34].

Passive Avatar:
– with Active Avatar: roles UP, Int; Indirect; kinds SEI, DEI
– with Passive Avatar: roles UP, UP; Indirect; kinds SEI, SEI
– with Active User: roles UP, Auth; Direct L1; kinds SEI, HEI
– with Passive User: roles AP, Spec; No Int; kinds SEI, HEI
– with Active Application: roles UP, Int; Indirect; kinds SEI, DEI
– with Passive Application: roles AP, AP; Indirect; kinds SEI, SEI
Examples in this row: [28], [11], [17], [36], [44].

Active User:
– with Active Avatar: roles Int, Int; Direct L2; kinds HEI, DEI
– with Passive Avatar: roles Auth/Int, AP/UP; Direct L1; kinds HEI, SEI
– with Active User: no HCI
– with Passive User: no HCI
– with Active Application: roles Int, Int; Direct L2; kinds HEI, DEI
– with Passive Application: roles Auth/Int, AP; Direct L1; kinds HEI, SEI
Examples in this row: [33], [6], [21].

Passive User:
– with Active Avatar: roles Spec, Atm; No Int
– with Passive Avatar: roles Spec, AP; No Int; kinds HEI, SEI
– with Active User: no HCI
– with Passive User: no HCI
– with Active Application: roles Spec, Atm; No Int
– with Passive Application: roles Spec, AP; No Int
Examples in this row: [51], [1].

Active Application:
– with Active Avatar: roles Int, Int; Direct L2; kinds DEI, DEI
– with Passive Avatar: roles Int, UP; Indirect; kinds DEI, SEI
– with Active User: roles Int, Int; Direct L2; kinds DEI, HEI
– with Passive User: roles Atm, Spec; No Int
– with Active Application: roles Int, Int; Direct L2; kinds DEI, DEI
– with Passive Application: roles Atm, AP; Direct L1; kinds DEI, SEI
Examples in this row: [49], [22], [24], [7].

Passive Application:
– with Active Avatar: roles AP, Atm; Direct L1; kinds SEI, DEI
– with Passive Avatar: roles AP, AP; Indirect; kinds SEI, SEI
– with Active User: roles AP, Auth/Int; Direct L1; kinds SEI, HEI
– with Passive User: roles AP, Spec; No Int
– with Active Application: roles AP, Atm; Direct L1; kinds SEI, DEI
– with Passive Application: No Int
Examples in this row: [32], [23], [31], [7].

Table 3: Emotional interaction matrix.

5 Generic architecture

Many authors have been working to achieve emotional interaction between the user and the machine. There are some important architectures, such as the Oz project [40], Affective Reasoner [10], Cathexis [48], FLAME [9], Emile [14], Greta [42], ParleE [5] or the intelligent tutor agent [15]. All of them developed very interesting architectures. However, as the objective of this research is the use of avatars as an element for generating non-verbal communication, without taking a specific application into account, but rather analysing all their interaction possibilities, a generic architecture needs to be designed that allows emotional interaction to be generated in all its forms. In the existing architectures we found the following limitations:

– Most of the architectures focus their work on the interaction between two elements, mainly active ones. They do not take into account the possible interactions between passive elements.

– These architectures focus their research on two types of elements, mainly the user and the avatar, without taking into account that the application is a further interaction element.

– Most of the architectures give the system the capacity of generating dynamic emotional interaction and do not take into account other kinds of interaction.

– In the majority of the architectures found, each element is not considered to have its physical part and its mental part, both of which must actively take part in the interaction.

– These architectures could be considered not to represent conceptually the interaction flow produced in the multimodal system; in other words, the element that generates the interactions, the modules involved in the process and the element that generates the response to that interaction. In addition, none of them is generic enough to allow new elements to be integrated into the system or more types of interaction to be generated.

As previously mentioned, there are three types of elements in the interaction with avatars. The type of communication that is established in the system is a combination of the interaction levels produced between each of them. In order for the integration of the avatar in the application to be natural and the system response to be coherent, all the interfaces making up the system have to be correctly connected and synchronized.

The proposed architecture is set out in Figure 1. Each interaction produced in the system is directly carried out between the elements. The interactions produced between passive elements are represented with broken lines, while the interactions produced by active elements are represented with continuous lines. The three elements involved "live" in the system with their "physical" part and their "mental" part. The physical part of the user is his body, and that of the avatar or of the application is its visual part. Each physical part is represented in the architecture with the name of the element (avatar, user or application), followed by its characteristics (active or passive).

Figure 1: The proposed general architecture.

With respect to the "mental" part, each element contains the necessary modules to recognise and interpret the input data, process it and generate the relevant response.

In the case of the user, the "mental" part consists of his decision taking and interpretation modules, whose capacities depend on the characteristics of the user. The other elements of the system (avatar and application) communicate with the user through their physical part (visualisation), which he recognises through sight (physical part) and interprets through his perception. Once he has recognized and interpreted what the system wishes to communicate (whether through the avatar or the application), the user takes a decision and communicates it to the system through his "physical" part.

In order to understand the data put into the system, each element has a recognition and interpretation module, capable of recognising and interpreting the interaction events from non-verbal or verbal communication.

Once the system has recognised and interpreted the request of the user, of another avatar or of another application, the "mental" part starts to work. Both the active avatar and the active application have an intelligent module, the agent and the application logic respectively, which knows the nature of the system and knows how to react to that request. In addition, each active element has an associated module capable of performing the emotional decision taking process. In the case of the active avatar and the active application, the shared Emotional Response Generator module generates the coherent emotional response to the user. On the other hand, with respect to the "mental" part of the passive elements, the avatar and the application both have an associated module containing pre-recorded emotional events and rules. All the interactions produced between a passive element and an active element, whether through static emotional interaction or dynamic emotional interaction, need recognizers as the only means capable of interpreting the language, whether computational or human, and of indicating the meaning of the produced interaction event to the intelligent modules (decision taking, application logic or agent). It can also be noted in the architecture that the interaction between the agent and the application logic is performed by means of the dynamic emotional interaction modules, even though an emotional response generator module is included. The application logic and the agent decide what must occur in the system; they communicate it to the generator, which sends the contents to be visualised to the physical parts of the avatar and of the environment. Any interaction produced in the system is represented as a closed circuit, as all the interactions produce some type of response.
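To summarise the flow just described, the sketch below wires together, for a single interaction handled by an active avatar, the recognition and interpretation module, the intelligent module (agent), the shared Emotional Response Generator and the physical part that visualises the result. The class and method names are our own shorthand for the blocks of Figure 1, not an implementation of the system.

```python
class Recogniser:
    """Computational recognition and interpretation of interaction events."""
    def interpret(self, raw_event: str) -> dict:
        return {"meaning": raw_event, "modality": "verbal" if raw_event.isalpha() else "non-verbal"}


class Agent:
    """Intelligent ('mental') part of an active avatar: knows how to react to a request."""
    def decide(self, meaning: dict) -> dict:
        return {"action": f"answer:{meaning['meaning']}", "appraisal": "goal-consistent"}


class EmotionalResponseGenerator:
    """Shared module that turns a decision into a coherent emotional response."""
    def generate(self, decision: dict) -> dict:
        emotion = "joy" if decision["appraisal"] == "goal-consistent" else "sadness"
        return {"say": decision["action"], "show": emotion}


class AvatarBody:
    """Physical part: face, eyes, body and voice that visualise the response."""
    def visualise(self, response: dict) -> str:
        return f"[{response['show']}] {response['say']}"


# Closed circuit for one interaction: user event -> recognition -> agent ->
# emotional response generator -> visualisation back to the user.
recogniser, agent, generator, body = Recogniser(), Agent(), EmotionalResponseGenerator(), AvatarBody()
print(body.visualise(generator.generate(agent.decide(recogniser.interpret("hello")))))
```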

6 Application Examples

In this section, we present two applications that serve to prove the generality of the architecture detailed in the previous section. These applications, known as IGARTUBEITI and ELEIN, are capable of producing different emotional interaction types through avatars, which are shown using the interaction matrix.

6.1 The IGARTUBEITI system

This system is aimed at creating a Virtual Reality system that enables a story to be experienced through digital accounts. It is a virtual journey (Figure 2) where the user uses a joystick to make his way through a 16th century farmhouse. Throughout the virtual journey, the user comes across several information points and, when he approaches them, a virtual inhabitant of the farmhouse appears who explains to the user the history of the place in question. In addition to the avatar's explanation, at some of these points of interest, animations take place in the virtual environment that depict the way that people lived at that time. Throughout the explanation, the joystick is deactivated, in such a way that the user must listen to the full explanation before resuming his journey. Table 4 shows the different elements involved in the IGARTUBEITI system, together with their roles, their levels and their types of interaction. In this system, the three elements, user, avatar and application, are involved in their passive form. In addition, the user also participates actively, as the very nature of the IGARTUBEITI system means that a single user plays two different roles. Initially, when the user is navigating through the virtual environment, he is an active user with the interactive role, as he is capable of taking decisions and the system provides him with the necessary tools (a joystick) to interact with the application. However, when the user reaches an information point, the joystick is disconnected and the user becomes a passive user with a spectator role, as he simply observes what the system designer has previously defined using the SEI tools.
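The role switch described above can be made explicit with a small sketch: while the joystick is enabled the user is active and interactive, and when an information point deactivates it the user becomes a passive spectator for whom only pre-recorded (SEI) behaviour is played back. The function and the returned labels are illustrative assumptions, not code from the IGARTUBEITI system.

```python
def user_state(at_information_point: bool) -> dict:
    """Illustrative mapping from the navigation state to the user's role in IGARTUBEITI."""
    if at_information_point:
        # Joystick deactivated: the user only watches what the author pre-defined.
        return {"joystick": "disabled", "user": "passive / spectator",
                "avatar": "author puppet (AP)", "interaction": "SEI"}
    # Free navigation: the user takes decisions and drives the application.
    return {"joystick": "enabled", "user": "active / interactive",
            "application": "author puppet (AP)", "interaction": "HEI request / SEI response (direct, level 1)"}


print(user_state(at_information_point=True))
```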

Figure 3 sets out the components of the generic architecture that are involved in the IGARTUBEITI system. When the active user makes a request to the application with the joystick, the Computational Interpretation and Recognition module searches for the behaviour to be visualised, both by the avatar and by the application, in the databases that contain the behaviour that the designer has previously defined. The user does not interact with the avatar, but rather makes the request to the application. The Computational Interpretation and Recognition module manages the indirect interaction between the passive avatar and the passive application (broken blue bi-directional arrows).

Figure 2: Screenshots of the IGARTUBEITI system.

Figure 3: IGARTUBEITI system components identified within the generic architecture.


Passive Avatar:
– with Passive Avatar: –
– with Active User: –
– with Passive User: roles AP, Spec; No Int; kinds SEI, HEI
– with Passive Application: roles AP, AP; Indirect; kinds SEI, SEI

Active User:
– with Passive Avatar: –
– with Active User: no HCI
– with Passive User: no HCI
– with Passive Application: roles Int, AP; Direct L1; kinds HEI, SEI

Passive User:
– with Passive Avatar: roles Spec, AP; No Int; kinds HEI, SEI
– with Active User: no HCI
– with Passive User: no HCI
– with Passive Application: roles Spec, AP; No Int; kinds HEI, SEI

Passive Application:
– with Passive Avatar: roles AP, AP; Indirect; kinds SEI, SEI
– with Active User: roles AP, Int; Direct L1; kinds SEI, HEI
– with Passive User: roles AP, Spec; No Int
– with Passive Application: No Int

Table 4: Emotional interaction matrix used in the IGARTUBEITI system.

6.2 The ELEIN system

The aim of the ELEIN system is to achieve emotional multimodal interaction and communication in e-Learning environments, which allows educational contents to be expressed in a new communication language on the website. The main interaction element between the student and the environment consists of a User Conversational Interface, in the form of a Three-Dimensional Educational Agent with speech synthesis capacity in real time, fully integrated in the contents of the courses. The developed system is known as ELEIN (E-LEarning as 3D INteractive educational agents). Any course is usually divided into two main parts: the part where the concepts are explained (which requires SEI) and the assessment part by means of questions or exams (which requires DEI). The avatar and user elements in their two forms (active and passive) are used in the ELEIN system. The application in this case only plays the passive role, as it is the tutor agent that takes the DEI-related decisions. The elements, together with their roles, interaction levels and interaction types, can be seen in Table 5.

Figure 5 sets out the components of the generic architecture that are involved in the ELEIN system. The figure is divided into two zones. The yellow zone indicates the modules exclusively involved in the concept explanation phase: the passive avatar there plays the author puppet role, the user is a mere spectator and all the behaviour in this phase is pre-recorded. The modules involved in the evaluation phase, however, are all those in the image, in other words, both the yellow and the white zones. The agent module is the tutor agent, entrusted with generating the dynamic emotional interaction with the active user during the evaluation phase. This module communicates with the emotional response generator in order to predict the emotion of the user and take a relevant action.
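As a sketch of this two-phase behaviour: during the explanation phase everything the avatar does could be looked up from pre-recorded (SEI) behaviour, while during the assessment phase the tutor agent evaluates the student's answers and drives the emotional response (DEI), possibly falling back to pre-recorded actions such as re-explaining a concept. The function, events and data below are illustrative assumptions, not code from ELEIN.

```python
def tutor_behaviour(phase: str, event: str, prerecorded: dict) -> str:
    """Illustrative selection between SEI and DEI in the two ELEIN course phases."""
    if phase == "explanation":
        # SEI: the author pre-recorded the avatar's behaviour for each content item.
        return prerecorded.get(event, prerecorded["default"])
    # Assessment phase (DEI): the tutor agent evaluates the answer and the
    # emotional response generator produces the reaction, possibly falling back
    # to pre-recorded actions such as explaining a previous concept again.
    if event == "correct_answer":
        return "<praise emotion='joy'/>"
    return prerecorded.get("re-explain", "<encourage emotion='concern'/>")


lessons = {"slide_1": "<explain emotion='neutral'/>", "default": "<idle/>",
           "re-explain": "<explain emotion='patience' repeat='true'/>"}
print(tutor_behaviour("assessment", "wrong_answer", lessons))
```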


Figure 4: Screenshot from the e-Learning course of the ELEIN system.

Active Avatar:
– with Active Avatar: –
– with Passive Avatar: –
– with Active User: roles Int, Int; Direct L2; kinds DEI, HEI
– with Passive User: roles Atm, Spec; No Int
– with Passive Application: roles Atm, AP; Direct L1; kinds DEI, SEI

Passive Avatar:
– with Active Avatar: –
– with Passive Avatar: –
– with Active User: roles UP, Auth; Direct L1; kinds SEI, HEI
– with Passive User: roles AP, Spec; No Int; kinds SEI, HEI
– with Passive Application: roles AP, AP; Indirect; kinds SEI, SEI

Active User:
– with Active Avatar: roles Int, Int; Direct L2; kinds HEI, DEI
– with Passive Avatar: roles Auth/Int, AP/UP; Direct L1; kinds HEI, SEI
– with Active User: no HCI
– with Passive User: no HCI
– with Passive Application: roles Auth/Int, AP; Direct L1; kinds HEI, SEI

Passive User:
– with Active Avatar: roles Spec, Atm; No Int
– with Passive Avatar: roles Spec, AP; No Int; kinds HEI, SEI
– with Active User: no HCI
– with Passive User: no HCI
– with Passive Application: roles Spec, AP; No Int

Passive Application:
– with Active Avatar: roles AP, Atm; Direct L1; kinds SEI, DEI
– with Passive Avatar: roles AP, AP; Indirect; kinds SEI, SEI
– with Active User: roles AP, Auth/Int; Direct L1; kinds SEI, HEI
– with Passive User: roles AP, Spec; No Int
– with Passive Application: No Int

Table 5: Emotional interaction matrix of the ELEIN system.

Some of the agent's actions can be pre-recorded (for example, explaining a previous concept again), and for that reason the passive avatar and its pre-recorded behaviour are also involved in this phase.


Figure 5: ELEIN system components identified within the generic architecture.

7 Conclusions

The main objective of this research is to seek a more natural and intuitive human-machine interaction. It was seen in the state of the art that, despite the huge quantity of contributions relating to verbal communication in multimodal interactive systems, the non-verbal aspects have not been studied in as much depth. This research considers the use of avatars as a necessary element to obtain non-verbal communication between the user and the machine.

In order to research the avatars' capacity to generate non-verbal communication, an analysis was performed of the main aspects that influence the communication between the user and the machine when avatars intervene in the multimodal interaction. The first result of this analysis indicates that there are three elements in this type of interactive system whose interaction has to be synchronized. These three elements are the user, the avatar and the application.

A more detailed analysis of these three elements has shown that the characteristics of each of them condition the form of interaction, so that these characteristics define the level of interaction that occurs between the elements and their role in that interaction.

However, the main objective of the research was to consider the possible interaction through non-verbal communication (mainly emotional aspects) that can exist in a system that includes avatars. Therefore, once the characteristics of each element, the interaction levels and the role of each element were established, the components that have to exist in a system to generate emotional interaction were analysed. This analysis showed that three types of interaction are required in the interactive system: SEI (Static Emotional Interaction), DEI (Dynamic Emotional Interaction) and HEI (Human Emotional Interaction).

In order for all this information to be graphically depicted, a matrix was designed where the following was defined for each interaction between two elements: the level of interaction that can occur, the role that each element plays and the types of interaction that have to exist in the system to generate emotional interaction, along with an application case.

The main contribution of this research is the creation of a generic architecture, designed using the interaction matrix, which includes all the necessary modules for emotional multimodal interaction through avatars. The architecture has been applied, and further information is available in the following studies: IGARTUBEITI [23] and ELEIN [33]. A proposed future line of research is to build a modular system that allows all the types of emotional interaction defined in the interaction matrix to be generated.

References

[1] I. Aizpurua, A. Ortiz, D. Oyarzun, I. Arizkuren, J. Posada, A.C. Andres, and I. Iurgel. Adaptation of mesh morphing techniques for avatars used in web applications. In AMDO 2004, pages 26–39, Palma de Mallorca, Spain, September 2004.

[2] A. Aucella, R. Kinkead, A. Wichansky, and C. Shmandt. Voice: technology searching for communication needs. In CHI '87: Proceedings of the SIGCHI/GI conference on Human factors in computing systems and graphics interface, pages 41–44, New York, NY, USA, 1987. ACM Press.

[3] C. Bartneck. eMuu – An Embodied Emotional Character for the Ambient Intelligent Home. PhD thesis, Technische Universiteit Eindhoven, October 2002.

[4] J. Bates. The role of emotion in believable agents. Communications of the ACM, 37(7):122–125, July 1994.

[5] T.D. Bui. Creating emotions and facial expressions for embodied agents. PhD thesis, University of Twente, July 2004.

[6] M.P. Carretero, D. Oyarzun, A. Ortiz, I. Aizpurua, and J. Posada. Virtual characters facial and body animation through the edition and interpretation of mark-up languages. Computers & Graphics, 29(2):189–194, 2005.


[7] K. Ducatel, M. Bogdanowicz, F. Scapolo, J. Leijten, and J-C. Burgelman. Scenarios for ambient intelligence in 2010. Technical report, European Commission - Community Research, 2001.

[8] P. Ekman. Facial expression and emotion. American Psychologist, 48(4):384–392, April 1993.

[9] M. El-Nasr, J. Yen, and T. Ioerger. FLAME: Fuzzy logic adaptive model of emotions. Autonomous Agents and Multi-Agent Systems, 3:219–257, September 2000.

[10] C. Elliot. The Affective Reasoner: A process model of emotions in a multi-agent system. PhD thesis, Northwestern University, Evanston, IL, 1992.

[11] M. Fabri. Emotionally Expressive Avatars for Collaborative Virtual Environments. PhD thesis, Leeds Metropolitan University, November 2006.

[12] T. Giang, R. Mooney, C. Peters, and C. O'Sullivan. Real-time character animation techniques. Technical report, Trinity College Dublin, Computer Science Department, 2000.

[13] M. Gilles and D. Ballin. Affective interactions between expressive characters. In Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics, pages 1589–1594. IEEE, 2005.

[14] J. Gratch. Emile: Marshalling passions in training and education. In M. G. C. Sierra and J. S. Rosenschein, editors, International Conference on Autonomous Agents, pages 325–332, Barcelona, Spain, 2000. ACM Press.

[15] Y. Hernandez, J. Noguez, E. Sucar, and G. Arroyo-Figueroa. Incorporating an affective model to an intelligent tutor for mobile robotics. In Frontiers in Education Conference, 36th Annual, October 2006.

[16] C. Hongpaisanwiwat and M. Lewis. Attentional effect of animated character. In INTERACT, 2003.

[17] Imvu. Imvu 3d avatar chat. http://www.imvu.com/, 2006.

[18] Y. Kim. Pedagogical agents as learning companions: the effects of agent affect and gender on student learning, interest, self-efficacy, and agent persona. PhD thesis, Department of Educational Psychology and Learning Systems, Florida State University, Tallahassee, FL, USA, 2004. Major Professor: Amy L. Baylor.

[19] T. Koda and P. Maes. Agents with faces: The effects of personification of agents. In 5th IEEE International Workshop on Robot and Human Communication, Tsukuba, Japan, November 1996.


[20] R. S. Lazarus, I. R. Averill, and E. M. Opton. Toward a Cognitive Theory of Emotion, chapter Feeling and emotion, pages 207–232. Academic Press, New York, 1970.

[21] M. Lehr, A. Arruti, A. Ortiz, D. Oyarzun, and M. Obach. Speech driven facial animation using HMMs in Basque. In Proceedings of the 9th International Conference, TSD 2006, Brno, Czech Republic, 11-15 September 2006. Text, Speech and Dialogue, Lecture Notes in Artificial Intelligence (Springer).

[22] M.T. Linaza, A. García, A. Susperregui, and C. Lamsfus. Interactive mobile assistants for added-value cultural contents. In Proc. of the 7th International Symposium, 2006.

[23] M.T. Linaza, A. Ortiz, I. Leanizbarrutia, and J. Florez. Collaborative cultural experience - the farm of Igartubeiti. In ARTECH 2004 - 1st Workshop Luso-Galaico de Artes Digitais, Lisboa, Portugal, 12 July 2004.

[24] C. Malerczyk and M. Schnaider. A mixed reality-supported interactive museum exhibit. In University College London, The Institute of Archaeology, 2004.

[25] A. Marriot. VHML: Virtual Human Markup Language. In OZCHI, Fremantle, Western Australia, November 2001.

[26] A. Mehrabian. Communication without words. Psychology Today, 2(4):53–56, September 1968.

[27] X. Mi and J. Chen. Agent-based interaction model for collaborative virtual environments. In Proceedings of the Ninth International Conference on Computer Supported Cooperative Work in Design, pages 401–404. IEEE, 24-26 May 2005.

[28] Midway. Mortal Kombat Armageddon - every ending is a new beginning. http://www.mortal-kombat.org/intro.html, 2006.

[29] C. Nass, J. Steuer, and E.R. Tauber. Computers are social actors. In CHI '94: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 72–78, New York, NY, USA, 1994. ACM Press.

[30] E. Not, K. Balci, F. Pianesi, and M. Zancanaro. Synthetic characters as multichannel interfaces. In ICMI '05: Proceedings of the 7th international conference on Multimodal interfaces, pages 200–207, New York, NY, USA, 2005. ACM Press.

[31] A. Ortiz, M.P. Carretero, D. Oyarzun, J. Yaguas, C. Buiza, M.F. Gonzalez, and I. Etxeberria. Elderly users in ambient intelligence: Does an avatar improve the interaction? In Proceedings of the 9th ERCIM Workshop "User Interfaces For All", Königswinter (Bonn), Germany, 27-28 September 2007.

[32] A. Ortiz, D. Oyarzun, I. Aizpurua, I. Arizkuren, and J. Posada. Three-dimensional whole body of virtual character animation for its behavior in a virtual environment using H-Anim and inverse kinematics. In IEEE Computer Society Press, editor, Institute of Electrical and Electronics Engineers (IEEE), pages 307–310, Los Alamitos, CA, June 2004.

[33] A. Ortiz, D. Oyarzun, M.P. Carretero, and N. Garay-Vitoria. Virtual characters as emotional interaction element in the user interfaces. In Proceedings of AMDO 2006, IV Conference on Articulated Motion and Deformable Objects, Andratx, Mallorca, Spain, 11-14 July 2006.

[34] A. Ortiz, D. Oyarzun, M.P. Carretero, C. Toro, and J. Posada. Avatar behaviours due to the user interaction. IADAT Journal of Advanced Technology on Imaging and Graphics, 1(1):28–31, September 2005.

[35] A. Ortony, G.L. Clore, and A. Collins. The Cognitive Structure of Emotions. Cambridge University Press, Cambridge, England, 1988.

[36] I.S. Pandzic, T.K. Capin, E. Lee, N. Magnenat-Thalmann, and D. Thalmann. Autonomous actors in networked collaborative virtual environments. In Proc. MultiMedia Modeling '98, pages 138–145. IEEE Computer Society Press, 1998.

[37] R. Picard. Affective Computing. The MIT Press, 1997.

[38] H. Prendinger, C. Ma, J. Yingzi, A. Nakasone, and M. Ishizuka. Understanding the effect of life-like interface agents through users' eye movements. In ICMI '05: Proceedings of the 7th international conference on Multimodal interfaces, pages 108–115, New York, NY, USA, 2005. ACM Press.

[39] W. S. Reilly. Believable Social and Emotional Agents. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, May 1996.

[40] W. S. Reilly and J. Bates. Building emotional agents. Technical Report CMU-CS-92-143, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, 1992.

[41] I. Roseman, A. Antoniou, and P. Jose. Appraisal determinants of emotions: Constructing a more accurate and comprehensive theory. Cognition and Emotion, 10, 1996.


[42] F. de Rosis, C. Pelachaud, I. Poggi, V. Carofiglio, and B. De Carolis. From Greta's mind to her face: modelling the dynamics of affective states in a conversational embodied agent. Int. J. Hum.-Comput. Stud., 59(1-2):81–118, 2003.

[43] RRL. The NECA RRL. http://www.ai.univie.ac.at/NECA/RRL/, 2006.

[44] SecondLife. Second Life - your world, your imagination. http://secondlife.com/, 2006.

[45] E. Shaw, W. Lewis Johnson, and R. Ganeshan. Pedagogical agents on the web. In AGENTS '99: Proceedings of the third annual conference on Autonomous Agents, pages 283–290, New York, NY, USA, 1999. ACM Press.

[46] A. Sloman. How many separately evolved emotional beasties live within us? In Emotions in Humans and Artifacts. MIT Press, Cambridge, MA, 2002.

[47] J.D. Velasquez. Modeling emotions and other motivations in synthetic agents. In AAAI Conference, pages 10–15, Providence, 1997.

[48] J.D. Velasquez. Cathexis – a computational model for the generation of emotions and their influence in the behavior of autonomous agents. Master's thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996.

[49] L. Wachowski and A. Wachowski. The Matrix. Warner Bros. film, 31 March 1999.

[50] J.H. Walker, L. Sproull, and R. Subramani. Using a human face in an interface. In Human Factors in Computing Systems, pages 85–91, New York, NY, USA, 1994. ACM Press.

[51] W. Wright. The Sims. http://thesims.ea.com/, 2004.
