
Compelling Experiences in Mixed Reality Interactive Storytelling

Fred Charles, Marc Cavazza, Steven J. Mead

School of Computing, University of Teesside

Middlesbrough, TS1 3BA, UK

[email protected]

Olivier Martin

Laboratoire de Télécommunications et Télédétection, Université catholique de Louvain, 2 place du Levant, 1348 Louvain-la-Neuve,

Belgium

[email protected]

Alok Nandi, Xavier Marichal

Alterface, 10 Avenue Alexander Fleming, 1348 Louvain-la-Neuve,

Belgium

[email protected]

ABSTRACT Entertainment systems promise to be a significant application area for Mixed Reality. Recently, a growing number of Mixed Reality applications have included interaction with virtual actors and storytelling. However, AI-based Interactive Storytelling techniques have not yet been explored in the context of Mixed Reality. In this paper, we describe a fully integrated first prototype based upon the adaptation of an Interactive Storytelling technique to a Mixed Reality system. After a description of the real-time image processing techniques that support the creation of a hybrid environment, we introduce the storytelling technique and the essential modalities of user interaction in the Mixed Reality context. Finally, we illustrate these experiments by discussing examples obtained from the system.

Categories and Subject Descriptors I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence – Intelligent Agents.

General Terms Algorithms, Design, Experimentation, Human Factors, Languages.

Keywords Mixed Reality, Interactive Storytelling, Virtual Actors.

1. INTRODUCTION Although Mixed Reality (MR) techniques show great potential, the key issue for the entertainment industry is making MR technology transparent so that the content can have full effect [25]. One promising application of MR to entertainment has been the development of “interactive theatre” systems involving synthetic actors. Examples of such systems include the Romeo and Juliet interactive play [15], the interactive theatre of Cheok et al. [8], the multiple point-of-view experience for a single narrative [3] and the Transfiction system [13] [17].

However, these systems do not make use of the most recent advances in interactive storytelling technologies, despite the dramatic expansion of the field in recent years. Interactive storytelling systems use Artificial Intelligence (AI) techniques to generate real-time narratives featuring synthetic characters. Unlike early systems in which users interacted with virtual characters, such as ALIVE [11], IMPROV [20] or KidsRoom [1], they are based on a long-term narrative drive and maintain the overall consistency of the plot while taking user interaction into account [5].

Two main paradigms have emerged for interactive narratives. In the “Holodeck™” approach [26], the user is immersed in a virtual environment and acts his own part within a story that evolves around him. In “Interactive TV” approaches, on the other hand, the user is an active spectator influencing the story from a point of view external to the unfolding plot [5]. Immersive Interactive Storytelling systems (inspired by the “Holodeck™”) face a major limitation: the user is part of the story but not part of the medium itself, as a consequence of the first-person mode through which s/he experiences the narrative. In this paper, we present a potential solution to this problem using immersive Mixed Reality, in which the user is immersed in the story but also features as a character in its visual presentation, allowing the user to see himself play an active role in the interactive narrative. In this Mixed Reality Interactive Storytelling approach, the user’s video image is captured in real time and superimposed onto a virtual environment populated by autonomous virtual actors with which s/he interacts. The user in turn watches the composite world projected onto a large-scale display, following a “magic mirror” metaphor.

In the next sections, we describe the rationale behind the chosen narrative context for our Mixed Reality Interactive Storytelling installation. After a brief description of the system’s architecture and the techniques used in its implementation, we discuss the specific modes of interaction and user involvement that are associated with the user’s acting in Mixed Reality Interactive Storytelling.

2. RATIONALE The storytelling scenario supporting our experiments is based on a James Bond adventure, in which the user plays the role of the Professor. The characteristic elements of James Bond stories arise from the skilful intertwining of stereotypes, such as the strong duality between James Bond and the Professor. James Bond stories have narrative properties that make them good candidates for interactive storytelling experiments: for this reason, they were used as a supporting example in Roland Barthes’ foundational work in contemporary narratology [2]. Besides, their strong reliance on narrative stereotypes facilitates the understanding of the actions the user can perform as part of his active role as the villain (whom we shall call the Professor).

We devised a coherent scenario following the pattern of James Bond plots: James Bond is given a task by the secret services, the Professor makes his move and appears to James Bond, James Bond makes his first check on the Professor, and James Bond attempts to overcome the Professor. The basic storyline represents the early encounter between Bond and the Professor. Bond’s objective is to acquire some essential information, which he can find by searching the Professor’s office, obtain from the Professor’s assistant or even, under certain conditions (deception or threat), from the Professor himself. The actions of the user (acting as the Professor) will interfere with Bond’s plan, altering the unfolding of the story. The next section describes the interactive storytelling engine through which this interaction is interpreted.

3. THE MIXED REALITY INSTALLATION Our MR system fulfils the requirements of the intended means of visualisation and interaction in a first-person mode based on a “magic mirror” model [13]. Within our framework, the user’s image is captured in real time by a video camera, extracted from its background and mixed with a 3D graphic model of a virtual stage including the synthetic characters taking part in the story. The resulting image is projected on a large screen facing the user, who sees his own image embedded in the virtual stage with the synthetic actors.

The MR system must both create a visual presentation of the story embedding the user and provide a virtual stage for the interactive story. The latter aspect should support some level of physical interaction between the user’s video avatar and the virtual elements, whether objects or actors, which populate the stage. The graphic component of the Mixed Reality world is based on a game engine, Unreal Tournament 2003™. This engine not only performs graphic rendering and character animation but, most importantly, contains a sophisticated development environment for defining interactions with objects and character behaviours [12]. In addition, it supports the integration of external software, e.g. through socket-based communication (Figure 1). Game engines have become a popular development platform for creating Mixed Reality systems, with examples such as ARQuake [21] and MRQuake [9], as well as for large-scale display installations such as Cave-UT2003 [12].

The mixed environment is constructed through real-time image processing, based on the Transfiction engine [17]. A single (monoscopic) 2D camera facing the user analyses his image in real time by segmenting the user’s contours. The objective behind segmentation is twofold:

- It is intended to extract the image silhouette of the user in order to be able to insert it into the virtual setting on the projection screen.

- Simultaneously, the extracted body silhouette undergoes some analysis in order to be able to recognise and track the behaviour of the user (position, attitude and gestures) and to influence the interactive narrative accordingly.

Figure 1. System architecture.

Figure 2. Overview of the process of object detection.

The video image acquired from the camera is passed to a detection module, which performs segmentation in real time and outputs the segmented video image of the user together with specific recognised points that enable further processing, such as gesture recognition. The detection module uses a 4x4 Walsh-Hadamard transform, calculated on blocks of 4x4 pixels. Sliding this box by 2 pixels allows decisions to be taken on 2x2 pixel blocks. As a result, the module can segment objects and detect their boundaries relatively precisely, while offering some robustness to luminance variations. Figure 2 shows an overview of the change detection process based on the Walsh-Hadamard transform. First, the module calculates the Walsh-Hadamard transform of the background image. It then compares the values of the Walsh-Hadamard transform of the current and background images; when the rate of change is higher than an initially set threshold, the area is marked as foreground. Segmentation results can be corrupted in the presence of shadows, e.g. an object moving between a light source and the background casts a shadow on the background. This can be problematic under variable indoor lighting conditions. We have used invariant colour techniques [23] to remove such shadows.
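
As an illustration of this change-detection scheme, the following Python fragment applies the 4x4 Walsh-Hadamard transform to background and current frames and labels 2x2 blocks as foreground. This is a minimal sketch, not the authors' implementation; the threshold value, function names and per-block recomputation of the background transform are our assumptions.

import numpy as np

# Illustrative sketch only. 4x4 Hadamard matrix (Walsh functions) used as the block transform basis.
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]], dtype=np.float32)

def wh_block(block):
    """2D Walsh-Hadamard transform of a 4x4 greyscale block."""
    return H4 @ block @ H4.T

def change_mask(background, current, threshold=200.0, step=2):
    """Mark 2x2 decision blocks as foreground where the transform of the current
    frame differs from the background transform by more than the threshold.
    Inputs are 2D greyscale arrays of identical shape; threshold is arbitrary here."""
    h, w = background.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h - 4 + 1, step):          # slide the 4x4 box by 2 pixels
        for x in range(0, w - 4 + 1, step):
            diff = wh_block(current[y:y+4, x:x+4].astype(np.float32)) - \
                   wh_block(background[y:y+4, x:x+4].astype(np.float32))
            if np.abs(diff).sum() > threshold:    # rate of change above threshold
                mask[y+1:y+3, x+1:x+3] = True     # decide on the central 2x2 block
            # shadow removal via invariant colour models [23] is omitted here
    return mask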

The resulting video image from the segmentation then has to be composited with the virtual environment image. This process takes place in real time by mixing the video channels captured by a separate computer running a DirectX™-based application. The first stage consists in isolating the user’s image from its background using basic chroma-keying. The remaining stage addresses the problem of occlusion by blending the user’s image with the virtual environment’s image, using empirical depth information provided by the gesture recognition module (the user’s relative distance to the camera) and by the game engine itself for the virtual environment. Figure 4 illustrates the overall processing whereby several video image layers are composited in real time to produce the final image, which is projected onto the screen in front of the user.
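
A simplified view of this compositing and occlusion step might look as follows. It is an illustrative sketch assuming a single scalar depth estimate for the user's silhouette and a per-pixel depth map from the engine; all names are ours.

import numpy as np

def composite(user_rgb, user_mask, user_depth, virtual_rgb, virtual_depth):
    """Blend the segmented user layer with the rendered virtual layer (sketch).
    user_mask marks the user's silhouette (from segmentation/chroma-keying);
    user_depth is one scalar distance estimated for the whole silhouette;
    virtual_depth is a per-pixel depth map from the game engine."""
    out = virtual_rgb.copy()
    # the user occludes virtual pixels only where the silhouette is present
    # and the user is closer to the camera than the virtual surface
    in_front = user_mask & (user_depth < virtual_depth)
    out[in_front] = user_rgb[in_front]
    return out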

Figure 3. The 3D bounding cylinder determines physical interactions in the Unreal Tournament 2003TM engine.

The two system components operate by sharing a normalised co-ordinate system. This shared co-ordinate system makes it possible to position the user in the virtual image and, most importantly, to determine the relations between the real user and the virtual environment. This is achieved by mapping the 2D bounding box produced by the Transfiction engine, which defines the contour of the segmented user character, to a 3D bounding cylinder in the Unreal Tournament 2003™ environment, which represents the position of the user in the virtual world (Figure 3). Relying on the basic mechanisms of the engine, this automatically generates low-level graphical events such as collisions and object interaction. Object interaction is further supported within the virtual environment by mapping the positions of the left-most and right-most markers to a smaller bounding cylinder, which allows for more precise collisions with objects that the user can manipulate within the virtual environment.
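
The 2D-to-3D mapping can be sketched as below. This is illustrative only; the uniform scaling factor and field names are assumptions, not the system's actual co-ordinate conventions.

from dataclasses import dataclass

@dataclass
class Cylinder:
    x: float       # position along the stage's horizontal axis
    y: float       # depth axis (distance from the camera plane)
    radius: float
    height: float

def bbox_to_cylinder(bbox, user_depth, stage_width=1024.0):
    """Map a normalised 2D bounding box (x_min, y_min, x_max, y_max in [0, 1])
    and an estimated distance to the camera onto a bounding cylinder expressed
    in the shared co-ordinate system of the virtual stage (sketch)."""
    x_min, y_min, x_max, y_max = bbox
    centre_x = (x_min + x_max) / 2.0 * stage_width
    radius = (x_max - x_min) / 2.0 * stage_width
    height = (y_max - y_min) * stage_width      # simple uniform scaling, assumed
    return Cylinder(x=centre_x, y=user_depth, radius=radius, height=height)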

The two sub-systems communicate via TCP sockets: the image processing module, running on a separate computer, sends the graphic engine two different types of messages at regular intervals, containing updates on the user’s position as well as any recognised gestures. The position of the user is represented by five points: the highest, lowest, right-most and left-most points and the centre of gravity, computed for both the segmented silhouette and its bounding box. A recognised gesture is transmitted as a code for the gesture (plus, where applicable, e.g. for pointing gestures, a 2D vector indicating the direction of pointing); however, the contextual interpretation of the gesture is carried out within the storytelling system. The time lag introduced by the socket communication is not significant compared with the sampling rate of the system at the gesture recognition level.
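
The message exchange could be sketched as follows. The wire format, field names and JSON encoding are hypothetical, chosen only for illustration; the paper does not specify them.

import json

def send_update(sock, silhouette_points, bbox, gesture=None, pointing=None):
    """Send a position update and, when available, a recognised gesture to the
    storytelling/graphics engine over an already-connected TCP socket (sketch)."""
    top, bottom, left, right, centre = silhouette_points
    messages = [{"type": "position",
                 "top": top, "bottom": bottom, "left": left, "right": right,
                 "centre_of_gravity": centre, "bbox": bbox}]
    if gesture is not None:
        msg = {"type": "gesture", "code": gesture}
        if pointing is not None:          # e.g. for pointing gestures
            msg["direction"] = pointing   # 2D vector (dx, dy)
        messages.append(msg)
    for m in messages:
        sock.sendall((json.dumps(m) + "\n").encode("utf-8"))

# usage (host/port are arbitrary for illustration):
# sock = socket.create_connection(("localhost", 5000))
# send_update(sock, points, bbox, gesture="GREETING", pointing=(0.7, -0.1))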

4. THE INTERACTIVE STORYTELLING ENGINE The interactive storytelling technology used in these experiments was developed in previous work and is described in detail in [5] [6]. We shall thus only give a brief overview of the approach, focusing on those aspects that are most relevant to its Mixed Reality implementation.

Interactive Storytelling consists of the real-time generation of narrative actions that take into account the consequences of user intervention, “re-generating” the story as the environment is modified by the user. This latter point distinguishes Interactive Storytelling from other forms of Mixed Reality systems that include a narrative experience.

Interactive Storytelling systems are based on Artificial Intelligence techniques supporting representations of the plot [26] [27] or the characters’ roles [6], which control the behaviour of virtual actors.

The approach we have developed is centred on the roles played by virtual actors, and we have referred to it as character-based interactive storytelling [5]. Rather than maintaining an explicit model of the plot that controls the unfolding of the story (and explicitly co-ordinates the characters), the storyline is distributed across the actors’ roles, which are defined independently from one another.

In a MR context, where the user himself is playing the part of an additional character, the modular approach brought by character-based interactive storytelling facilitates the interpretation of user actions, whose consequences in the environment are taken into account using the same mechanism that co-ordinates the other characters’ sharing of on-stage resources.

The AI mechanism that supports character behaviour is based on planning technology using Hierarchical Task Networks (HTNs, Figure 5) [18]. These representations describe the role of a character as a plan, expressed as a hierarchical decomposition of tasks into sub-tasks1. For instance, if the task consists in obtaining information, it can be decomposed into several options for gaining access to that information, such as searching files, obtaining it from another character, etc. Each of these tasks can be further decomposed; for instance, in order to obtain information from a character, it is necessary to be in close proximity, establish a relation with it, convince it to hand over the information, etc. The decomposition of tasks in the HTN is carried down to the level of “terminal actions”, which correspond to actions that can be visually performed by the synthetic character in the virtual world. The system thus uses an HTN planner to select in real time the actions to be carried out by each actor. The planner distributes its computations between the various actors, taking advantage of the duration of the actions being performed. Failure of an action is passed back to the planner, which then produces an alternative solution. This is an essential mechanism for Interactive Storytelling, as user intervention will most often cause the current planned action of a given character to fail, leaving the planner to produce an alternative solution that leads the story in new directions. User intervention leads to action failure essentially by altering the pre-conditions or the executability conditions [10] of those actions. Several examples are described in section 6.

1 Formally, HTNs are AND/OR graphs, and the solution plan can be represented as a sub-graph of the HTN.
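
The decomposition and replanning behaviour can be sketched as follows. This is an illustrative Python fragment, not the authors' planner: task names such as obtain_information and search_files are ours (loosely based on the example above), and the sketch omits interleaved execution across actors and undoing of partially executed sub-tasks.

# Minimal HTN-style decomposition sketch for the "obtain information" example.
METHODS = {
    # each task maps to alternative decompositions (OR branches),
    # each decomposition being an ordered list of sub-tasks (AND)
    "obtain_information": [["search_files"],
                           ["approach_character", "establish_relation",
                            "convince_to_hand_over"]],
}

def plan(task, execute):
    """Depth-first decomposition down to terminal actions.
    `execute` runs a terminal action in the world and returns True on success;
    failure makes the planner try the next alternative decomposition."""
    if task not in METHODS:                    # terminal action
        return execute(task)
    for decomposition in METHODS[task]:        # OR: alternative options
        if all(plan(sub, execute) for sub in decomposition):  # AND: sub-tasks
            return True
        # a sub-task failed (e.g. because of user interference): try next option
    return False

# example: an execute() stub where searching the files fails (user intervened),
# so the planner falls back to obtaining the information from a character
done = plan("obtain_information",
            execute=lambda action: action != "search_files")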

Figure 4. Constructing the Mixed Reality environment.

Figure 5: HTN representing a sample of James Bond’s role.

Because roles are assimilated to plans, the characters’ goals act as narrative drives: whatever the events to which it has to react, a virtual actor will resume its initial goal, taking into account the new state of the world resulting from previous interactions.

One of the major difficulties in interactive storytelling systems is having the user understand what kind of actions he is allowed to perform and what form his involvement in the story can take. In most of the systems described so far, the user is led into task-based conversations with virtual actors, which are essentially constitutive of the action [14] [26]. This is due to the fact that, in these systems, the story is visually constituted by a virtual stage featuring the synthetic characters, the user being part of the action but not integrated into the visual medium that displays the story. In an MR implementation, the user himself becomes part of the visual presentation of the narrative, which means that his role cannot be reduced to external intervention, as this would totally disrupt the interactive storytelling concept. Being entirely part of the story, including its visual presentation, the user must thus actually play a part in that story, and hence act within the constraints of a role allocated to him2. However, unlike traditional acting, this role is only defined in terms of the generic actions that can be performed in a given scene’s context, leaving the user free to select which action he will perform at any given stage. This performance constitutes the user’s interaction with the system, which we analyse in the next section.

2 Influencing the characters without playing a part in the story would put the user into a director’s position and give him a double status of director and spectator, similar to the one described for computer games by Bolter and Grusin [4]. However, this status would probably be inconsistent with the mode of user representation we have adopted.

5. USER INTERACTION AND ACTING Following the characteristic elements of James Bond stories, the user should abide by his role as the Professor, playing opposite James Bond. Within this narrative framework, however, the user remains free to choose his own lines, which allows for story variability. Limiting the constraints imposed on the user increases the potential for a positive experience of the narrative.

First, we introduce the elements to which we refer when discussing the modalities of user acting within our MR paradigm. The three main components of acting in our MR installation are:

• The user’s attitude, captured through the recognition of his gestures.

• Spoken utterances while conversing with the virtual actors, through the use of speech recognition and basic isolation of syntactic structures via multi-keyword spotting (Figure 6).

• Physical interaction by the user with objects or characters on the virtual stage. Examples of physical interactions include direct interaction with objects that the user can give to James Bond, which might prevent him from drawing his gun, and direct interaction with virtual actors, such as pushing, slapping, etc.

Figure 6. Different gestures performed by the user.

a) welcome; b) offer to sit down; c) give an object; d) call assistant

In contrast with Holodeck™-like approaches [26], the main character (Bond) is actually a synthetic actor rather than the user, and the storyline is driven by his role. This ensures a spontaneous drive for the story while setting the basis for implicit narrative control. The fact that the user visually takes part in the story presentation obviously affects the modes of user intervention: these have to take the form of traditional interaction between characters. In other words, the user has to act. As a consequence, the mechanisms of his normal acting should serve as a basis for interaction.

The user’s acting is not restricted to gesticulation; it also includes gestures that have a direct impact on virtual objects or virtual actors. In fulfilling the main goals described by his overall role, James Bond may draw his gun on the Professor to threaten him if he does not provide the information requested. In the same way that virtual actors can interact with their environment, the user can physically interact with both objects and virtual actors: he can, for example, pick up a glass, a cigar or a document and offer it to James Bond to prevent him from drawing his gun, or, in response to a threat, push James Bond back, slap his assistant for misbehaviour, or punch James Bond.

One limitation of MR installations is the transfer of objects from the real to the virtual world; we also face difficulties in transferring virtual objects from virtual actors to the user in his real world. Although in our “magic mirror” paradigm it is possible to transfer virtual objects towards the representation of the user (his real video image on the screen), truly realistic object interaction cannot always be achieved.

These fundamental aspects shape the whole interaction; in particular, they determine a specific kind of multi-modal interaction, composed of a spoken utterance and a gesture or body attitude. The latter, being part of the acting, actually constitutes a semiotic gesture whose content is complementary but similar in nature to that of the linguistic input [3]. A set of semiotic gestures has been identified and constitutes a “gesture lexicon”, at this stage comprising up to 15 body attitudes (some of which are shown in Figure 6).

Interactive storytelling has focused its formalisation efforts on narrative control, using the representations and theories of narratology. Yet little has been said about the user’s interventions themselves. While these should obviously be captured by the more generic representations of story or plot, there is still a need to devise specific representations for units of intervention.

In the case of a role involving interactions between several characters, in our case between Bond and the villain, the terminal actions of the HTN representing Bond's role actually correspond to interactions with the villain's character. These interactions, put into the global perspective of the narrative, correspond to narrative functions (such as deception, threat, submission).

Because the user's participation takes the form of multi-modal input (corresponding to his acting), the objective of multi-modal processing is to instantiate the representations corresponding to these narrative actions. However, the strong context provided by the narrative representations can guide the processing of the multi-modal input. This should compensate, at least in part, for the limited performance of multi-modal parsing, especially when it comes to speech recognition.

Figure 7. The EAR™ SDK development environment.

The multi-modal processing system includes a speech understanding component and a gesture processing component, together with a specific module merging the content of both modalities. The speech processing system identifies speech acts that are compatible with the currently active narrative functions. It does so through two-layer processing: in the first instance, the system tries to identify a given speech act from surface clues, through the recognition of certain word patterns, for instance patterns of words directly suggesting threat or denial. If no such patterns can be found, the system attempts to match the semantic categories extracted from the utterance against those characterising the narrative function.
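
A minimal sketch of this two-layer identification is given below; the keyword lists, category names and function name are illustrative, not the system's actual lexicon.

# Illustrative two-layer identification of a speech act (not the authors' code).
SURFACE_PATTERNS = {
    "Denial": ["how would i know", "you must be joking", "never"],
    "Threat": ["you will regret", "i warn you"],
}

SEMANTIC_CATEGORIES = {
    "Denial": {"ignorance", "irony", "refusal"},
    "Compliance": {"location", "agreement"},
}

def identify_speech_act(utterance, categories):
    """First layer: surface word patterns; second layer: match the semantic
    categories extracted from the utterance against those characterising the
    narrative functions currently active."""
    text = utterance.lower()
    for act, patterns in SURFACE_PATTERNS.items():
        if any(p in text for p in patterns):
            return act
    for act, cats in SEMANTIC_CATEGORIES.items():
        if categories & cats:
            return act
    return None

# e.g. identify_speech_act("You must be joking, 007!", set()) -> "Denial"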

The speech recognition component is based on the EAR SDK from BabelTech™, an off-the-shelf system that includes a development environment for defining the lexicon (Figure 7). One advantage is that it can provide robust recognition of the most relevant topics in context, without imposing constraints on the user (such as the use of a specific phraseology).

The requirements of spontaneous speech preclude the use of any input grammar that the user would have to learn. The existence of a strong context, characteristic of sub-language applications, can indeed be taken advantage of, but only in some cases. This has led us to rely on multi-keyword spotting (MKS) as our speech recognition paradigm. We implement MKS within the BabelTech™ system using patterns with optional and “joker” (wildcard) characters, expressed in the same formalism used as a recognition grammar. In addition, syntactic forms corresponding to idiosyncratic expressions likely to be encountered can be added to improve performance.
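
For illustration, MKS patterns with optional parts and wildcard gaps can be approximated with regular expressions; the patterns below are our examples, not BabelTech™'s actual formalism.

import re

# Multi-keyword spotting approximated with regular expressions (sketch).
MKS_PATTERNS = {
    "Denial":  re.compile(r"\bhow\b.*\bwould\b.*\bknow\b|\bjoking\b"),
    "Answer":  re.compile(r"\bleave(s)?\b.*\bfrom\b\s+(?P<place>\w+)"),
}

def spot(utterance):
    """Return the first matching pattern label and any captured slots."""
    text = utterance.lower()
    for label, pattern in MKS_PATTERNS.items():
        m = pattern.search(text)
        if m:
            return label, m.groupdict()
    return None, {}

# spot("It will leave from Nagoya harbour") -> ("Answer", {"place": "nagoya"})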

From the point of view of story events taking place on the virtual stage, we identify two main modes of interaction: speech utterances, and situations taking place between the two characters, which can be assimilated to narrative functions.

At the formalism level, we use HTN planning to represent the virtual actors’ roles, defining the overall set of terminal actions through which an actor can carry out his overall plan. A sub-set of these actions, when performed on the virtual stage, defines direct interactions with the Professor’s character (i.e. the user), which can be either physical or spoken.

The interest of these interactions lies in the richness of spoken utterances, which provide great potential for variability in the story’s evolution while maximising the user’s experience of the narrative.

Our approach is to map spoken utterances, whose content is often implicit, onto well-defined narrative situations. These situations can be described in terms of narrative functions reflecting the duality between James Bond and the Professor.

If we consider a scene between Bond and the Professor, the majority of narrative functions would develop around a central dimension, which is the cooperative/antagonistic relation. If we assume that the final product of the interpretation of the implicit content of utterances can be formalised as a speech act, then we can bias the classification of such speech acts towards those high-level semantic dimensions that can be interpreted in narrative terms.

Figure 8: Sample of spoken utterances from the user, as answers to questions from James Bond.

We studied a corpus of spoken utterances in order to identify the problems of speech processing, and we developed an empirical approach to processing those utterances with the aim of identifying the speech acts: either from their structure, where clues are available from the syntax itself, or from their content, by comparing that content with the semantic dimensions associated with the concepts indexing the narrative functions.

For example, Figure 8 illustrates a sample of spoken utterances from the user given as a set of possible responses to questions from James Bond. For instance, in the utterance “You must be joking, 007!”, we can identify the important keywords as being [you] and especially [joking].

We can represent the semantic information associated with the narrative action in which James Bond requests information from the Professor as:

(Narrative-Action (actor: ?Bond ?Professor)
  (Negotiation (type: confrontational)
    (obj: (information: location))))

And we can attribute similar semantic information to the speech act, based on the contextual narrative information provided (since the answer to the question is not a direct answer):

(Speech-Act (type: Negotiation
    (content: ?un-co-operative))
  (speaker: ?Professor))

This utterance is compatible with the identification of a Denial speech act. The Denial speech act in turn belongs to a category of antagonistic attitudes, and this antagonistic attitude is then interpreted as part of the narrative function/action.

The mapping of speech acts to narrative functions rests on two dimensions: first, the interactions taking place on stage, which include the user’s speech utterances and the virtual actors’ actions performed on the virtual stage; and second, the formalisms considered, which include speech acts and the virtual actors’ roles represented as HTN plans.
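
A compact sketch of this mapping is shown below; the attitude table and function names are illustrative, not the system's actual representations.

# Classifying a speech act along the cooperative/antagonistic dimension and
# mapping it to the narrative function currently at stake (illustrative sketch).
ATTITUDE = {"Denial": "antagonistic", "Threat": "antagonistic",
            "Answer": "cooperative", "Offer": "cooperative"}

def narrative_outcome(speech_act, active_function="request_information"):
    """Interpret a speech act in the context of the active narrative function;
    the result conditions how Bond's role (the HTN plan) is resumed."""
    attitude = ATTITUDE.get(speech_act)
    if attitude == "antagonistic":
        return (active_function, "failed")      # e.g. Bond becomes more pressing
    if attitude == "cooperative":
        return (active_function, "succeeded")   # e.g. Bond's request is satisfied
    return (active_function, "pending")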

6. RESULTS In this section, we illustrate the behaviour of our prototype system on a reduced but realistic example, corresponding to a single scene of an interactive narrative. We present several examples and describe them in terms of the form of the user’s intervention, its consequences for the virtual actors’ behaviour, and the visual scene produced as a result.

This scene features James Bond, the Professor and his assistant. Bond and the assistant are virtual actors, James Bond being the feature character whose role is the main drive for the scene. Both Bond and the assistant react to interactions with the human user: these interactions interrupt their plans and introduce changes in the environment, which have to be taken into account when they (mostly Bond) resume their original plans.

In this context, the actions performed by the user essentially interfere with Bond’s plan, forcing him to devise new ways of achieving his goal. There are several mechanisms for this interference, but they mainly involve a modification of the pre-conditions or executability conditions of narrative/terminal actions. For instance, Bond has to respond to the Professor’s invitation, which, if he is currently searching the filing cabinet, forces him to abandon that task and take a seat at the Professor’s desk.

Figure 9. Example of situations taking place on the virtual stage between James Bond and the Professor.

In this first example, Bond enters the Professor’s office from the door in the bottom left corner, far enough from the Professor’s desk not to be aware of his presence. Bond heads towards the filing cabinet with the intent to search it. The user manifests his presence by welcoming Bond with a greeting gesture (Figure 9). Bond interrupts his search and walks to the Professor’s desk, where he takes a seat. He can now resume his plan, which comprises several options, such as convincing or threatening the Professor, or contacting his assistant.

We then consider the phase of Bond's role where he demands information from the Professor (Figure 5) regarding the shipment he is tracking. The Bond character will assertively ask a question such as: “I need to know where the shipment leaves from.”

The Professor's response can be part of different stories, and in this interactive context will actually affect the further unfolding of the plot.

If the user adopts the traditional role of the villain, he will be un-cooperative and sometimes ironic, replying for instance "How would I know, Mr Bond". As we have seen in previous sections, this kind of utterance can be interpreted as a "Denial" speech act, which is then interpreted as an antagonistic attitude by the narrative engine, prompting Bond to become more pressing in his demands (Figure 10).

However, it is in the nature of Interactive Storytelling that user behaviour can depart from narrative stereotypes, in which case the Professor could even become co-operative, answering Bond's question with “It will leave from Nagoya harbour” (Figure 11). The information contained in this answer can be processed by the system so as to satisfy Bond's request and trigger appropriate actions. At this stage we are not considering the case (actually the most interesting one) where the user lies to the virtual actor.

7. CONCLUSIONS MR offers a new context for Interactive Storytelling, which puts the user in a double actor-spectator position through the “magic mirror” metaphor [13], providing an inverted third-person mode of participation. This is a significant departure from other paradigms of user involvement, such as the pure spectator (with the ability to influence the story) [6] or an actor immersed in first-person mode following a “Holodeck™” paradigm [26]. While the practical implications of this form of involvement are yet to be explored, the same context also brings new perspectives for user interaction, with an emphasis on multimodal interaction.

Figure 10. The Professor replies to a question from James Bond: “How would I know, Mr Bond”.

Figure 11. The Professor provides the information requested by James Bond.

8. REFERENCES

[1] A.F. Bobick, S.S. Intille, J.W. Davis, F. Baird, C.S. Pinhanez, L.W. Campbell, Y.A. Ivanov, A. Schutte, and A. Wilson, “The KidsRoom: A Perceptually-Based Interactive and Immersive Story Environment”, Technical Report 398, MIT Media Lab Perceptual Computing Section, 1996.

[2] R. Barthes, “Introduction à l’Analyse Structurale des Récits” (in French), Communications, 8, 1966, pp. 1-27.

[3] B. MacIntyre, J.D. Bolter, J. Vaughn, B. Hannigan, M. Gandy, E. Moreno, M. Haas, S.H. Kang, D. Krum, and S. Voida, “Three Angry Men: An Augmented-Reality Experiment in Point-of-View Drama”, Proceedings of TIDSE 2003, Darmstadt, Germany, March 24-26, 2003.

[4] J.D. Bolter and R. Grusin, “Remediation: Understanding New Media”, Cambridge, MIT Press, 1998.

[5] M. Cavazza, F. Charles, and S.J. Mead, “Character-based Interactive Storytelling”, IEEE Intelligent Systems, special issue on AI in Interactive Entertainment, 2002, pp. 17-24.

[6] M. Cavazza, F. Charles, and S.J. Mead, “Interacting with Virtual Characters in Interactive Storytelling”, ACM Joint Conference on Autonomous Agents and Multi-Agent Systems, Bologna, Italy, 2002, pp. 318-325.

[7] F. Charles, J.L. Lugrin, M. Cavazza, and S.J. Mead, “Real-Time Camera Control for Interactive Storytelling”, GameOn 2002, London, UK, 2002.

[8] A.D. Cheok, W. Wang, X. Yang, S.J.D. Prince, S.W. Fong, M. Billinghurst, and H. Kato, “Interactive Theatre Experience in Embodied and Wearable Mixed Reality Space”, International Symposium on Mixed and Augmented Reality 2002, Darmstadt, Germany, 2002.

[9] M. Faust, “Mixed Reality Gaming Environment QuakeRunner”, Proceedings of the Second International Conference on Entertainment Computing, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA, 2003, pp. 1-4.

[10] C. Geib, “Intentions in Means-End Planning”, Technical Report MC-CIS-92-73, University of Pennsylvania, 1992.

[11] P. Maes, T. Darrell, B. Blumberg, and A. Pentland, “The ALIVE System: Wireless, Full-Body Interaction with Autonomous Agents”, Multimedia Systems 5(2), 1997, pp. 105-112.

[12] J. Jacobson and Z. Hwang, “Unreal Tournament for Immersive Interactive Theater”, Communications of the ACM, Vol. 45, No. 1, 2002.

[13] X. Marichal and T. Umeda, “Real-Time Segmentation of Video Objects for Mixed-Reality Interactive Applications”, Proceedings of SPIE Visual Communications and Image Processing (VCIP 2003), Lugano, Switzerland, 2003.

[14] M. Mateas and A. Stern, “A Behavior Language for Story-Based Believable Agents”, IEEE Intelligent Systems, special issue on AI in Interactive Entertainment, 2002, pp. 39-47.

[15] M. Mateas, “An Oz-Centric Review of Interactive Drama and Believable Agents”, Technical Report CMU-CS-97-156, Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997.

[16] R. Nakatsu and N. Tosa, “Active Immersion: The Goal of the Communications with Interactive Agents”, Autonomous Agents 2000 Workshop on Achieving Human-Like Behavior in Interactive Animated Agents, Barcelona, Spain, 2000.

[17] A. Nandi and X. Marichal, “Senses of Spaces through Transfiction”, pp. 439-446 in Entertainment Computing: Technologies and Applications (Proceedings of the International Workshop on Entertainment Computing, IWEC 2002), R. Nakatsu and J. Hoshino (eds), Kluwer Academic Publishers, 2003, ISBN 1-4020-7360-7.

[18] D. Nau, Y. Cao, A. Lotem, and H. Muñoz-Avila, “SHOP: Simple Hierarchical Ordered Planner”, Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Stockholm, AAAI Press, 1999, pp. 968-973.

[19] J. Ohya, A. Utsumi, and J. Yamato, “Analyzing Video Sequences of Multiple Humans: Tracking, Posture Estimation and Behavior Recognition”, The Kluwer International Series in Video Computing 3, 2002, pp. 160, ISBN 1-4020-7021-.

[20] K. Perlin and A. Goldberg, “Improv: A System for Scripting Interactive Actors in Virtual Worlds”, Computer Graphics, Vol. 29, No. 3, 1996.

[21] W. Piekarski and B. Thomas, “ARQuake: The Outdoor Augmented Reality Gaming System”, Communications of the ACM, Vol. 45, No. 1, 2002, pp. 36-38.

[22] M. Riedl, C.J. Saretto, and R.M. Young, “Managing Interaction between Users and Agents in a Multiagent Storytelling Environment”, Proceedings of the Second International Conference on Autonomous Agents and Multi-Agent Systems, 2003 (to appear).

[23] E. Salvador, A. Cavallaro, and T. Ebrahimi, “Shadow Identification and Classification Using Invariant Color Models”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001), Salt Lake City, USA, 2001.

[24] N.M. Sgouros, G. Papakonstantinou, and P. Tsanakas, “A Framework for Plot Control in Interactive Story Systems”, Proceedings of AAAI’96, Portland, AAAI Press, 1996.

[25] C.B. Stapleton, C.E. Hughes, J.M. Moshell, P. Micikevicius, and M. Altman, “Applying Mixed Reality to Entertainment”, IEEE Computer 35(12), 2002, pp. 122-124.

[26] W. Swartout, R. Hill, J. Gratch, W.L. Johnson, C. Kyriakakis, C. LaBore, R. Lindheim, S. Marsella, D. Miraglia, B. Moore, J. Morie, J. Rickel, M. Thiebaux, L. Tuch, R. Whitney, and J. Douglas, “Toward the Holodeck: Integrating Graphics, Sound, Character and Story”, Proceedings of the Autonomous Agents 2001 Conference, 2001.

[27] R.M. Young, “Creating Interactive Narrative Structures: The Potential for AI Approaches”, AAAI Spring Symposium on Artificial Intelligence and Interactive Entertainment, AAAI Press, 2000.