User Centred Virtual Actor Technology

Daphne Economou
Sony Broadcast and Professional Research Labs
Jays Close, Viables, Basingstoke, RG22 4SB, England
+44 (0) 1256 483700
[email protected]

William Mitchell
Department of Computing and Mathematics, The Manchester Metropolitan University
Chester Street, M1 5GD, England
+44 (0) 161 247 1493
[email protected]

Steve Pettifer, Jon Cook, James Marsh
Advanced Interfaces Group, The University of Manchester
Oxford Road, M13 9PL, England
+44 (0) 161 275 6259
[email protected]

ABSTRACT
This paper argues that the development of virtual actor technology must be guided by application and end user needs. Two objectives drive the development of the ‘Senet’ project described in this paper: to develop a set of design guidelines for the use of virtual actors in Collaborative Virtual Environments (CVEs) for learning, and to inform the development of the underlying virtual actor technology following a user centred approach. The methodological approach involves the development of prototype virtual learning environments in a series of distinct phases. These are based on an ancient Egyptian game (senet) and are aimed at children at Key Stage Level 2 of the National Curriculum for education in England. Two-dimensional multimedia technology was used to develop robust prototypes, which were then observed in use by children. This study yielded a set of design guidelines, which were then used to guide the implementation of a 3D CVE using the Deva CVE technology.

Keywords
Collaborative virtual environments, pedagogy, evaluation, interaction analysis, design guidelines, technology requirements.

1. INTRODUCTION
This paper describes the Senet project, which has two main aims: firstly, to develop design guidelines for the use of virtual actors in CVEs for learning; secondly, to inform the development of the underlying virtual actor technology. Virtual actors model human figures that move and function in CVEs [45].
They can either represent software agents or real human users [18,26]. Virtual actors are considered to be a key element of communication and interaction in CVEs [8]. One problem with developing virtual actor technology is that it tends to be driven by technological rather than user concerns (section 2). The Senet project was divided into three distinct phases in order to allow detailed user studies of an educational CVE (section 3). In the first two phases, prototypes were developed using more mature multimedia and groupware technologies in order to identify requirements and design factors for an educational CVE (sections 4 and 5). A set of design guidelines is developed, which is then evaluated and extended in the third phase (section 6). In this phase a full CVE has been constructed using the Deva CVE technology and studied to identify design factors concerning the use of virtual actors (section 7).

2. CVEs and pedagogy

2.1 CVES & VIRTUAL ACTORS
CVE systems (in which distributed users are represented by virtual actors and participate in joint activities) have been used mainly to investigate technical issues, such as system architectures to maintain performance [3,6,19] or spatial management and distribution of behaviour in shared worlds [31,44]. The development of the technology has also been driven by the need to improve rendering techniques, in order to support photorealistic VEs and virtual human representations [5]. Human factors and user needs must be addressed in the development of VEs and virtual actors in order for this technology to fulfil its potential [24,38]. In the area of virtual actors, research has focused on: support for conversational interaction; facial expressions and face tracking [25,32]; body language [21]; navigation issues [9,29]; and ‘believability’ in terms of a virtual actor’s knowledge and intelligence [22,41].
However, producing autonomous virtual actors relies not just on physical models of human motion but also on modelling human behaviour. It is important that interfaces be designed explicitly with social interaction in mind [10,30]. Stephanidis [39] argues that the design of ‘new virtualities’ must draw upon the accumulated knowledge and results of established theoretical strands within the social sciences and/or psychology. One area where virtual actor design has been informed by social issues is research on embodied conversational agents (ECAs) [8]. This research aims to develop agents that can exhibit the same properties as humans do in face-to-face conversation. This includes being conversational in terms of emotion, personality and social convention, as well as human-like in the use of virtual actor body language.

2.2 PEDAGOGICAL AGENTS
Recent years have seen a growing amount of research on the use of pedagogical agents [22]. Animated pedagogical agents are particularly important because of their ability to communicate nonverbally using gestures, gaze, facial expressions, and locomotion. Students perceive such agents as being very helpful, credible and entertaining [27]. One such example is Steve (Soar Training Expert for Virtual Environments), which focuses on multimodal behaviour generation and multimodal input [33]. Steve is an animated agent that can collaborate with human students in virtual worlds, with


the objective of helping them to learn how to perform physical, procedural tasks. Steve supports nonverbal signals to regulate the flow of conversation and turn taking in mixed-initiative dialogue, but does not support other nonverbal signals that are closely tied to spoken dialogue. Another example is Gandalf, a solar system expert that helps users to learn about the planets [42]. Gandalf can travel to the planets, zoom in and out, and start and stop the planets from moving in their orbits. Communication with Gandalf relies on upper body movement tracking, eye tracking and speech recognition. A more advanced example is Rea (an extension of the Ymir system that Gandalf was built on) [7], which explores the use of a vision system for tracking the user’s nonverbal communication. The agents presented so far focus mainly on incorporating nonverbal interaction, sophisticated control and motion of figures, and system architecture. Only a comparatively small amount of research has focused on the social issues in the use of pedagogical agents [27,43].

3. Research methodology
Current CVE and virtual actor technology is immature and not particularly robust. This has meant that many of the applications developed so far have been of a prototypical nature and used by restricted user groups. Roussos has called for much more exploratory work to be carried out, in which novel learning applications are built and informally evaluated [34]. The Senet project follows this exploratory approach, with user studies being carried out as observations rather than evaluations. To determine the requirements for the use of virtual actors in a CVE it is necessary to study a ‘real world’ situation. Only in such a situation do seemingly trivial problems arise that in reality may determine the success or failure of a system [20]. In order to study an authentic learning activity, the research is based around the work of Manchester Museum's Education Service [28]. One particular strength of the Museum is its collection of everyday-life ancient Egyptian artefacts from the town of Kahun. It was decided to develop a learning resource based on one of the Kahun artefacts. The resource would be aimed at Key Stage Level 2 (9-11 years old) of the National Curriculum for education in England. The collaborative nature of the resource supports the role of collaborative learning in the National Curriculum as well as the UK Government's National Grid for Learning (NGfL) [13]. The particular Kahun artefact chosen for the CVE is senet, an ancient Egyptian board game for two players. Players take turns to throw a die. The object of the game is to “bear off” your 10 pieces first. The game allows both co-operation (in learning) and competition (in trying to win) to be studied. One problem in studying CVEs is the wide range of factors to be considered. Kaur has identified 46 properties relating to usability in VEs [24]. The number of factors increases dramatically when considering the communication and collaboration issues introduced in CVEs.
Additionally, when the population of a CVE for learning increases, issues of control and the structure of the learning process emerge [22]. This makes it difficult to isolate which design decisions are responsible for the overall effectiveness of the environment, or to identify the interplay between various factors (e.g. the effects that usability issues have on pedagogic issues). To overcome these problems, the development of the senet application was structured into three distinct phases [16]. The phases differ in three main ways:

• population, the number of users participating in the activity
• 2D/3D, use of a two-dimensional environment simplifies issues relating to navigation and the way in which objects are manipulated
• external/internal interaction, whether user interactions take place face-to-face or via the computer.
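Although the paper does not give the full rule set, the game at the heart of all three phases, as described above (two players, alternating die throws, the first to “bear off” all 10 pieces winning), can be sketched as a minimal turn-cycle model. The 30-square board and six-sided die below are assumptions for illustration, not taken from the museum's senet rules:

```python
import random

BOARD_SQUARES = 30          # assumption: classic senet board length; the paper does not specify
PIECES_PER_PLAYER = 10      # from the rules described above

class SenetGame:
    """Minimal model of the two-player senet turn cycle (illustrative only)."""

    def __init__(self):
        # piece positions per player; None means the piece has been borne off
        self.positions = {p: [0] * PIECES_PER_PLAYER for p in ("A", "B")}
        self.turn = "A"

    def throw_die(self):
        return random.randint(1, 6)     # assumption: an ordinary six-sided die

    def move(self, player, piece, throw):
        if player != self.turn:
            raise ValueError("not this player's turn")
        pos = self.positions[player][piece]
        if pos is None:
            raise ValueError("piece already borne off")
        new_pos = pos + throw
        # a piece moved beyond the last square is borne off
        self.positions[player][piece] = None if new_pos >= BOARD_SQUARES else new_pos
        self.turn = "B" if player == "A" else "A"   # turns alternate

    def winner(self):
        for p, pieces in self.positions.items():
            if all(pos is None for pos in pieces):
                return p    # first player to bear off all 10 pieces wins
        return None
```

Even this toy model makes the study's dual character visible: the shared `turn` field is where co-operation and competition meet, and it is exactly turn handling that dominates the findings of the later phases.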

Each phase addressed a subset of the range of factors in CVEs and formed a particular situation to be studied. In the first two phases of the project a “low-tech prototyping” approach was adopted. In the first phase a single display groupware prototype was constructed using multimedia technology to study issues relating to playing the senet game. Interaction between users was external to the prototype. In the second phase a conventional groupware prototype was used to study issues involving interaction and communication internal to the system between remotely located users whilst playing the game. The user studies in the first two phases were used to identify a set of application requirements, from which a set of design guidelines was derived. These design guidelines guided the implementation of the third phase using 3D CVE technology. This application was subsequently evaluated and the design guidelines further refined. This iterative approach provided several benefits:

• it provided the means of managing the complexity of factors by dealing with a manageable set of factors in each study (e.g. 2D/3D and population)
• it allowed the results of each study to inform the development of subsequent prototype applications
• in this way it allowed technology requirements to be progressively identified

The use of more robust technologies allowed essential features of the situation (interactivity and social communication) to be studied with real users in a way not possible with the current immature and inaccessible CVE technology. In summary, the method consists of the following stages:

• identification of application requirements via “low-tech prototyping” (first phase)
• identification of application requirements via “low-tech prototyping” (second phase)
• development of design guidelines
• design and implementation of the application using the Deva CVE technology (third phase)
• study of the Deva application in use

4. First phase
The first phase prototype (see Figure 1) was an example of single display groupware [40], where interactions take place face-to-face in the real world, external to the virtual environment. The prototypes were observed in use by the general public during an open week at Manchester Museum. Observations of school children were also conducted under more controlled conditions. The studies in this phase aimed to understand and identify the factors involved in a real world game playing situation:

• the types of interactions that occur between the users and the game environment
• the communication between users (content and modes)
• the roles that the users adopt in a game playing situation
• controls over communication and the game playing activity

The purpose of this study was primarily exploratory in nature. It gathered a rich set of qualitative information and identified usability issues surrounding the prototype, as well as technical issues, such as interface decisions, that informed the design of subsequent studies. The studies also gathered requirements related to experimental settings and conditions for organising controllable provision for future studies. Two findings stood out:

• the rich range of interactivity and social communication that needs to be supported in CVLEs
• the importance of the expert being aware of, and able to control, even such a seemingly well structured activity as game playing.

The findings from this phase were subsequently incorporated into the later phases, which are the main focus of this paper. Further details of the first phase are available elsewhere [14].

Figure 1 First phase prototype

5. Second phase
5.1 SECOND PHASE PROTOTYPES
The second phase prototypes were developed so that the communication tools available for internal interaction emulated those that would be available in a full CVE. What was of interest was firstly how these tools were used and secondly how external interactions had to be used to supplement them. The second phase prototypes consisted of a multi-user groupware environment, where the users were remotely located and communicated via NetMeeting (see Figures 2, 3). Participants communicated with each other using a range of communication tools: chat boxes (for typed communication), a hand (for pointing at things in the game) and a whiteboard (for other communication, e.g. drawing). The communication tools were designed to emulate the type of tools typical of CVEs. Three different prototypes were developed and observed:

• a 2-D senet prototype where the dialogue was external to the game environment and the rules of the game appear as a permanent source of information on the wall (P1) (see Figure 2). This prototype supports two users, a child and an expert (played by a researcher), located in different rooms. Participants communicated by typing text in a NetMeeting chat box, which was external to the game window.
• a 2-D senet prototype where the dialogue was internal to the game environment, in “speech bubbles” associated with individual users’ virtual actors (P2). This prototype supports two users. A palette allows the users to select their own representation.
• a 2-D senet prototype with dialogue internal to the game environment (P3) (see Figure 3). This differs from the second prototype in that the population increases.
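The speech-bubble design of P2 and P3 — dialogue attached to an individual's virtual actor inside the game window, and revealed as it is typed — can be sketched roughly as follows. The class and method names are hypothetical, not taken from the prototypes' implementation:

```python
class Actor:
    """A user's 2-D representation with an attached speech bubble (as in P2/P3)."""

    def __init__(self, name):
        self.name = name
        self.bubble = ""        # text shown above the actor, inside the game window

    def type_char(self, ch):
        # text is revealed character by character as it is composed, so other
        # participants can see that a turn is in progress (contrast with P1,
        # where a message only appeared once the return key was pressed)
        self.bubble += ch

    def commit(self, transcript):
        # finished utterances are appended to a shared session transcript,
        # which users were observed scrolling back over
        transcript.append((self.name, self.bubble))
        self.bubble = ""

# illustrative use: an expert composes and commits one utterance
transcript = []
expert = Actor("expert")
for ch in "Is this move correct?":
    expert.type_char(ch)
expert.commit(transcript)
```

The design choice encoded here — exposing the partially composed `bubble` to everyone rather than buffering it — is what later surfaces as DG11.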

The second phase was driven by factors outlined in the literature and those identified in the first phase. It focused on understanding the effect of the internalisation of interactions and communication via the system tools, identifying new factors arising and the way these affect the users’ behaviour. The focus of the second phase was:

• interaction, with the CVE and objects in it
• communication, including: turn taking; the communication content in different stages of a session; the communication modes for delivering a topic; and the efficiency of the tools the system provided
• pedagogy, pedagogical tactics for delivering various topics
• appearance, how the users were represented in the CVE
• awareness, users’ perceptions of each other’s status of activity, intention of action, and association of actions with users

Figure 2 2-D senet prototype, dialogue external to the game environment, which supports two users at a time (P1)

Figure 3 2-D senet prototype, dialogue internal to the game environment with increased population (P3)

5.2 SECOND PHASE USER STUDIES
The second phase studies took place at a local school over three school days. The subjects were twelve-year-old children (year 7). Twenty-two children (11 pairs) participated in the studies. Two rooms were used. One contained a researcher playing the role of the ‘expert’; the second contained one or two children working on individual computers, accompanied by a second researcher (the helper). In the first and second studies (using P1 and P2) only one child used the environment, the second child accompanying the expert and adopting the role of an observer. In the third study (using P3) both children used the environment. Questionnaires were used to find out background information about the children. Each session lasted approximately 45 minutes. Basic instructions about the system were given to the children at the start of the session and they were instructed to ask the expert for support. The expert and the child were both videotaped. The text typed in the chat boxes was automatically saved to a file. Each session was followed up by an interview with the children about their experiences, which lasted approximately ten minutes and was tape recorded.

5.3 A METHOD FOR STUDYING SECOND PHASE INTERACTIONS
One option for studying user interactions in the second phase was to use a method such as ethnomethodology or conversation analysis. These methods contribute to understanding how social actions and activities are produced and how the activities of others are recognised [1,4,37]. However, these methods are fine-grained and exhaustive, and lead to a vast amount of rich qualitative data that makes generalising about design factors difficult. Linguistically based methods, such as Discourse Analysis [12], tend to be too narrowly focused on issues surrounding the dialogue itself. The method chosen would have to be capable of managing a large amount of disparate data (two video tapes, audio, observation notes, and files containing the text dialogues). One candidate is Interaction Analysis, which studies human activities such as talk, non-verbal interaction and the use of artefacts and technologies [23]. Interaction Analysis formed the basis of the method developed for use in the Senet project. It consists of 7 main steps: (i) data collection, (ii) transcription, (iii) chunking of the transcription, (iv) creation of a grid, (v) application of the grid, (vi) analysis at the session level, (vii) derivation of design guidelines. Further details are available elsewhere [15]. The main way in which the method differs from Interaction Analysis is the use of a grid. The grid generates quantitative information out of qualitative data, which allows a more concise set of findings for deriving design guidelines (DG). The grid is formed of analytic categories based on the studies carried out in the first phase, a preliminary analysis of selective transcriptions of the second phase (to minimise the possibility of ignoring factors that had not arisen in the previous situations) and analysis of existing CVEs [3,5,17]. The analytic categories identified included: physical activity (physical movements of the user, such as head movement, facial expression, position of the body, and movements of the rest of the body), communication activity (the modes of communication used: text, pointing, speech, body language), turn taking (how turn boundaries were marked and the interruption mechanisms users employed), external intervention (when complete breakdown occurs and real world intervention is needed) and pedagogy.
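Step (v), the application of the grid, amounts to turning coded transcript chunks into counts per analytic category. The sketch below is an illustration only; the function name and category labels are hypothetical, the labels being abbreviations of the categories listed above:

```python
from collections import Counter

# abbreviated forms of the analytic categories described above (hypothetical labels)
CATEGORIES = {"physical", "communication", "turn_taking",
              "external_intervention", "pedagogy"}

def apply_grid(chunks):
    """Step (v): tally coded transcript chunks into a quantitative grid.

    Each chunk is a (speaker, category) pair produced by steps (ii)-(iv);
    the result counts occurrences of each category per speaker, giving the
    concise quantitative view from which design guidelines can be derived.
    """
    grid = {}
    for speaker, category in chunks:
        if category not in CATEGORIES:
            raise ValueError(f"unknown analytic category: {category}")
        grid.setdefault(speaker, Counter())[category] += 1
    return grid

# a toy fragment of coded session data (invented, for illustration)
chunks = [("expert", "pedagogy"), ("child", "turn_taking"),
          ("expert", "communication"), ("expert", "pedagogy")]
grid = apply_grid(chunks)
```

The point of the grid is visible even in this toy: the qualitative richness of a session collapses into per-category frequencies that can be compared across sessions and prototypes.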

5.4 ISSUES ARISING
The major problem encountered was in turn taking, namely when to recognise the start or end of a turn. One complicating factor was that in NetMeeting the mouse pointer is shared: only one user at a time can have control, and to gain control a user has to click their mouse. Claiming a turn thus involved simply taking control of the mouse pointer. Once the user had claimed control of the mouse pointer, its position gave a clue as to the user's intended activity (e.g. if the user had moved the pointer to their chat box then it could be assumed that they were about to type in some dialogue). Cues for the activity included: text appearing in a chat box; movement of the hand pointer; or manipulation of an object. However, it did become more difficult for the expert to be aware of who was in control as the population increased and various activities occurred simultaneously. Interruptions were a problem, especially in the first of the prototypes. The shared chat box required a user to press the return key before their contribution was displayed. This often resulted in one participant interrupting and taking control of the mouse (and thus the turn) because they were not aware that the other participant was perhaps thinking about and typing a message. The lag in the system added to these problems. This was particularly problematic when a lot of information needed to be communicated via a multi-part turn (e.g. when the expert was trying to explain a rule). This became less of a problem in the second and third prototypes, in which the chat boxes were associated with individual actors. In these cases text appeared as it was being typed, thus revealing to other participants the user's current activity. Also, in later sessions an informal protocol arose whereby a participant placed the mouse pointer in a particular area of the screen to signify availability and thus offer the turn to the other participant. Related to turn taking is the issue of control. Occasions arose where the children clicked the mouse constantly to take a turn, not allowing the expert to explain or intervene. In extreme cases this required external intervention from the helper. External intervention was also required when the child faced problems that the expert was not aware of (e.g. the child not being able to respond because they did not know to click on a chat box to activate it). External intervention was not only on the part of the helper, though. Cases occurred where the observer child accompanying the expert intervened and told the expert to wait while the other child tried to move a piece. A variety of different teaching styles were employed, such as the reflection method (part of cognitive apprenticeship theory). For example, instead of giving direct feedback to the children about their actions, the expert asked one child to confirm that the other child's action was correct, e.g. “Is this move correct?”, “What would you do if you were asked the same question?”. Such methods demand more open-ended responses. It was sometimes difficult for the expert to gauge the progress of these responses due to system lag. Text-based communication is important for pedagogical purposes, as literacy is a high profile skill in education. The chat boxes were satisfactory for explaining short pieces of information. However, there were problems when longer pieces of information had to be communicated, due to system lag. One solution was to represent the text directly in the environment (e.g. rules were displayed on the wall). This also had the benefit of providing a resource that the children could return to. The children and expert were both observed using the chat box as a “transcript” to keep track of previous actions and dialogues (e.g. scrolling back over an explanation to access it again). This has important implications for pedagogy. The biggest problem faced by the expert was the number of times they were literally not in control of the environment. Whether it was due to lag or to the child’s lack of awareness of the expert’s intended action, it is not a situation that a teacher would want to see arise. This indicates that one vital feature will be the ability to “freeze” the situation and take control, and in some cases even rewind “virtual time”. Whether or not such apparently “unnatural” interventions in a teaching environment are appropriate, and exactly when such effects might be required in a 3D CVE, is a matter for future investigation.
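One plausible way to realise a “freeze” and “rewind virtual time” facility of the kind suggested above is to treat the session as an ordered event log that the teacher can halt and truncate. The sketch below is speculative and all names in it are hypothetical; the paper deliberately leaves this design open:

```python
class Session:
    """Event-logged session state allowing a teacher to freeze and rewind.

    A speculative sketch only: the paper raises freeze/rewind as a
    requirement without specifying a mechanism.
    """

    def __init__(self):
        self.events = []        # complete ordered log of actions in the session
        self.frozen = False

    def record(self, event):
        if self.frozen:
            # while frozen, student actions are rejected and the teacher
            # has exclusive control of the environment
            raise RuntimeError("session is frozen by the teacher")
        self.events.append(event)

    def freeze(self):
        self.frozen = True

    def unfreeze(self):
        self.frozen = False

    def rewind(self, n):
        """Discard the last n events, i.e. wind 'virtual time' back."""
        del self.events[len(self.events) - n:]

# illustrative use: a disputed move is undone by the teacher
s = Session()
s.record("child throws 4")
s.record("child moves piece two squares")   # mismatch the expert wants to undo
s.freeze()
s.unfreeze()
s.rewind(1)
```

Replaying a truncated log against an initial state would then reconstruct the environment at the earlier moment; whether such an intervention is pedagogically appropriate is, as the text notes, an open question.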

6. Development of design guidelines
Twenty-two design guidelines have been produced. They attempt to capture how a collaborative learning application should be designed using CVE technology. The guidelines have been divided into three sections:

• the environment
• virtual actors
• teacher actor

These are outlined below, followed by a brief discussion that justifies their derivation based on the studies using the single display groupware and the conventional groupware applications.

6.1 ENVIRONMENT
DG1 to DG4 address issues regarding general tools that CVEs for learning should provide.

DG1: users need to have simultaneous control in the CVE. A major problem in the conventional groupware application was turn taking, particularly one user interrupting another. This was because the technology allowed only a single user to be in control of the pointer (and thus in overall control) at a time. Simultaneous control would reduce the need for interruptions.

DG2: a history of the communication activity needs to be kept in the CVE. Children were observed scrolling back over chat boxes in order to examine previous communications. Confusion over the rules of the game was occasionally resolved “locally”, that is, without reference to the teacher, by looking for confirmation of a rule stated previously. The chat boxes were effectively used as logs of the session. A history tool would: allow users to catch up on things they might have missed when their attention was focussed elsewhere; allow monitoring of a student’s progress; and help users who join a session late to catch up.

DG3: a history of the physical activity needs to be kept in the CVE. It was difficult for children and the expert to keep track of a sequence of actions (e.g. what number was thrown on the die and how many places a piece had been moved). In some cases the observing child had to point out to the expert what she had missed, or the expert had to ask a child to repeat their move.

DG4: a permanent information resource related to the educational activity should be available in the CVE. Observations showed that the teacher directed children to such a source of information (e.g. the rules on the wall in Figure 2) at the beginning of the session and when they encountered problems. There were also cases where the children referred to the rules without being directed.
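DG2 and DG3 together suggest a single time-stamped history service covering both communication and physical activity. A minimal sketch, with hypothetical names (the guidelines do not prescribe an implementation):

```python
class History:
    """Session history supporting DG2 (communication) and DG3 (physical activity).

    Entries are (time, actor, kind, detail) tuples; a late joiner can replay
    everything since a given time, and a teacher can filter by student.
    """

    def __init__(self):
        self._entries = []

    def log(self, t, actor, kind, detail):
        assert kind in ("say", "act")   # communication vs physical activity
        self._entries.append((t, actor, kind, detail))

    def since(self, t):
        """Catch-up view for a user who joined (or looked away) at time t."""
        return [e for e in self._entries if e[0] >= t]

    def by_actor(self, actor):
        """Monitor one student's progress (DG2's monitoring use)."""
        return [e for e in self._entries if e[1] == actor]

# illustrative use: the die/move mismatch of DG3 becomes visible in the log
h = History()
h.log(0, "expert", "say", "Read the rules on the wall")
h.log(5, "child1", "act", "throws die: 3")
h.log(9, "child1", "act", "moves piece two squares")
```

Keeping both kinds of entry in one ordered log is what lets the die throw and the subsequent move be checked against each other, which is exactly the tracking difficulty DG3 reports.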

6.2 VIRTUAL ACTORS DG5 to DG19 address issues regarding the virtual actors’ features in CVEs for learning. DG5: the virtual actors should be aesthetically pleasing. Children come with high expectations from their exposure to the high performance computer graphics available in most computer games. They were very positive about the appearance of the actors during interviews in the second phase. DG6: a virtual actor should convey the user’s presence and identity. Presence in the environment can be distinguished from physical location [3,36]. The actors in the conventional groupware prototype were static so physical location was not an issue. Children were able to establish their own identity by choosing a representation and identifying themselves via dialogue. Children also established identity by the “marking” of territory (e.g. when a child captured an opponent’s piece it was moved to the side of their virtual actor) (see Figure 3). DG7: a virtual actor should convey the user’s role in the CVE (e.g. child, expert). Following on from DG6, different representations were used for children and for the expert. DG8: a virtual actor should convey the user’s viewpoint. The viewpoint is the area in space to which the user is attending. The ability to perceive other’s viewpoints is critical for supporting interaction [3,36]. Being aware of other users scrolling chat boxes provided a valuable cue as to what they were focusing on. DG9: a virtual actor should reveal the users’ actionpoint. The action point is the area in space that the user is manipulating [3].

The actionpoint was indicated by the cursor in the second phase prototypes (e.g. which particular piece was being moved).

DG10: a virtual actor should be easily associated with its communication. In P1 children had to shift their attention between the main environment window and the text windows to follow dialogues (see Figure 2). This disrupted their engagement with the environment. In P2 and P3, each virtual actor had an associated chat box placed inside the environment, above the actor. This helped to reduce disruption (see Figure 3).

DG11: a virtual actor should convey the user's communication as it is being composed. One turn taking problem in P1 was that users were interrupted while composing a chat message (other users would only see the message once it had been finished). Having text appear as it was being composed (in P2, P3) led to a reduction in these interruptions.

DG12: a virtual actor should convey that the user is in the process of an activity. Following on from DG11 is the need to be able to perceive others' actions as well. When children were reading the rules they would sometimes use the mouse pointer to scan the lines; this provided an important cue to the expert.

DG13: a virtual actor should convey the user's intention to take a turn. The conventional groupware prototype did not provide an explicit way of claiming a turn. This led to several problems with interruptions or failure to take turns.

DG14: a virtual actor should convey the user's offering of a turn. Interruptions also occurred due to a user not knowing when another user's turn had been completed. This was overcome to some extent by the development of informal turn taking protocols (positioning of the pointer in a particular area of the screen to indicate turn completion).

DG15: the speaker needs to be identified even when their virtual actor is out of other users' viewpoints.
In the second prototype, chat boxes were physically associated with actors. As actors were static, it was not possible for a listener to move out of view of the speaker. With a 3D environment this becomes more of an issue.

DG16: a virtual actor should convey the user's intention to take a turn even when not in other users' viewpoints. This follows on from DG15. Cases of the helper interrupting the children by talking directly to them while standing behind them emphasised the need for interruption mechanisms even where users are not visible to others. Such mechanisms will also be useful when the population in a CVE increases.

DG17: a virtual actor should convey the user's offering of a turn even when out of other users' viewpoints. This follows on from DG16.

DG18: private communication should be supported. When children were co-located in the same room, external communication (such as whispering) provided an effective means of private communication to answer each other's queries and decide upon turn taking. Similar communication internal to the system had the effect of "disrupting" other activities. An efficient way of supporting private communication channels internal to the environment needs to be provided.

DG19: a virtual actor should show when the user is involved in private communication. Private communication was accompanied by nonverbal behaviour (e.g. a child leaning out from the side of their computer to see the child they were whispering to). This also follows on from DG12.

6.3 TEACHER VIRTUAL ACTORS

DG20 to DG22 address issues regarding a knowledgeable user's behaviour and controls over other users in CVEs for learning.

DG20: the teacher should have control over an individual user's viewpoint. Observations showed that one of the most important interactions was the teacher directing and focusing the children's attention on an on-going activity (e.g. a demonstration), or a certain part of the screen (e.g. a certain part of the rules). This is also supported in the literature [42].

DG21: the teacher should be able to take control of the session. The biggest problem faced by the expert in the second phase was the number of times she was literally not in control in the environment. Whether this was due to lag or to a child's lack of awareness of the expert's intended action, it is not a situation that a teacher would want to see arise. The expert needs to be able to keep order. In the second phase study, this was done by asking the children to stop activities. In some cases this was difficult if one of the children had control of the pointer (and thus control of the interaction). A more direct means of taking control (e.g. freezing user actions) is needed.

DG22: the teacher should be aware of and have control over private communication between children. Following on from DG19, the expert needs to be able to monitor private communication to make sure the children are "on task" or to decide whether an intervention is necessary.

7. THIRD PHASE

The design guidelines identified in the previous two phases guided the design of the third phase prototype. The technology used is the Deva 3D CVE tool developed by Manchester University [31]. User studies of this environment allow the design factors identified previously to be tested. The results of these studies will in turn be used to guide the development of the underlying Deva technology.

7.1 THIRD PHASE PROTOTYPE

In the Deva prototype users are remotely located and interaction is internal to the CVE. Unlike the NetMeeting prototype there is no shared pointer; users can type or move objects independently of each other (this satisfied DG1). The environment contains a 3D representation of the artefacts related to the game (see Figure 4). Each user is represented by their own virtual actor that reveals the user's role (this satisfied DG6, DG7). The virtual actors are fully functional, which means that a user can:

- walk in the CVE, indicating the user's position in the space
- point (in which case there is a "laser pointer" from the hand)
- select an object close to them by positioning the actor's hand so that it touches the object
- select and move an object far from them by pointing (in which case a "laser pointer" from the hand indicates selection)

These satisfied DG9. Users communicate by typing into a dialogue box. Second phase results indicated the importance of the text being embedded within the context of the CVE. Communication was therefore text rendered in 2D on the near drawing plane of the virtual environment, positioned so as to appear above the "speaking" actor's head as soon as the user starts typing (without activating a text box) and merged with the 3D components of the CVE. This technique has the benefit of allowing several sentences of text to be rendered clearly on the screen independent of the actual position of the user's actor in the virtual world. The text moves with the position of the actor, but is not subject to aliasing or depth buffering problems (this satisfied DG10, DG11).

It would not be possible to display in this way the comparatively large amounts of text accumulated during the exchanges of the game. This historical text was instead displayed in a "transcript" window outside the game environment (this satisfied DG2). This was on the basis that when a child requires access to the transcript to resolve an issue, their engagement with the current activities in the environment is inevitably interrupted to some extent. Although this is a hypothesis that remains to be confirmed by the user studies, the second phase studies lend a degree of confidence to the design decision.

One problem is being aware of an actor who is outside the field of view. When such an actor "talks", an empty dialogue box appears at the left or right edge of the screen, depending on their location in the CVE. The dialogue appears in the transcript window when the return key is pressed (this satisfied DG15).

As in the second phase, the problem of presenting a large set of rules was solved by displaying them on the wall (this satisfied DG4). The participant can obtain information about the game by using the dialogue box to ask the expert questions, or by reading the rules that appear on one of the walls of the environment. The expert can respond by typing text in the dialogue box, by pointing to rules on the wall, or by demonstrating something in the environment.

Some of the design guidelines could not be followed, because larger changes to the Deva technology were required to support their implementation. In such cases stronger evidence was needed of the necessity of developing the technology in a particular direction. Examples include DG13 and DG14, concerning turn taking. DG5 could not be followed because actors implemented in Deva support only static texture maps as faces, limiting the level of detail and the possibilities for delivering facial expressions or body language. Although the system already includes sophisticated radiosity rendering software, enabling aesthetically pleasing and realistic lighting models to be used, attempts to make actors more realistic led to a greater rendering load and slower performance. A current focus is on developing the Deva actors so that they more closely match the actors as envisaged by the designers.
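The "laser pointer" selection listed earlier could be implemented as a simple ray pick from the actor's hand. The following sketch is a generic illustration of the idea, not the actual Deva implementation; objects are approximated by bounding spheres, and the ray direction is assumed to be normalised:

```python
import math

def pick(ray_origin, ray_dir, objects):
    """Return the nearest object hit by the actor's pointing ray.

    `objects` is a list of (name, sphere_centre, sphere_radius) tuples;
    all names here are hypothetical, not part of the Deva API.
    """
    best, best_t = None, math.inf
    for name, centre, radius in objects:
        oc = [c - o for c, o in zip(centre, ray_origin)]
        t = sum(a * b for a, b in zip(oc, ray_dir))  # distance along the ray
        if t < 0:
            continue  # object is behind the hand
        closest = [o + t * d for o, d in zip(ray_origin, ray_dir)]
        dist2 = sum((c - p) ** 2 for c, p in zip(centre, closest))
        if dist2 <= radius ** 2 and t < best_t:
            best, best_t = name, t
    return best

# A piece one unit in front of the hand is selected; the die behind it is not.
hit = pick((0, 0, 0), (0, 0, -1), [("piece", (0, 0, -1), 0.2),
                                   ("die",   (0, 0,  2), 0.2)])
```

Picking the nearest hit along the ray matches the intuition that the laser pointer selects the first object it touches.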

Figure 4 The senet prototype in Deva

7.2 USER STUDIES

A school visit for the third phase was arranged to the laboratories of the Advanced Interfaces Group at the University of Manchester, because the CVE technology is not currently portable enough to be taken into the school environment. Twelve children (six pairs), aged twelve, participated in the studies. Three rooms were used: one contained a researcher playing the role of the expert, and the other two rooms each contained a child accompanied by a second researcher (the helper). The children were videotaped individually. Each session lasted 45 minutes, followed by 10 minutes of discussion with the children, which was tape-recorded.

A monitor close to the expert displayed the children's individual fields of view (see Figure 5). The 'think-aloud' method was used to record the expert's thoughts while playing with the children. This was done in order to study more closely the problems of the expert's lack of awareness of the child's situation identified in the second phase studies.

Figure 5 The physical set up of the third phase study

7.3 RESULTS

The data gathered is being analysed using the method outlined previously (see section 5.3). Some of the major findings are reported below.

7.3.1 AWARENESS OF OTHERS

A virtual actor was successful in conveying information about the user's viewpoint, actionpoint, and the activity they are in the process of doing (e.g. typing, reading the rules, navigating) (DG8, DG9, DG12). The fact that the application was a 3D environment increased the chances of events occurring outside of a user's viewpoint (DG15, DG16, DG17). This might be because something was blocking a user's view (e.g. someone else's actor), or because the user was physically located far away from the event. When an actor who is out of viewpoint speaks, a text box appears at the left or right edge of the listener's screen depending on the speaker's location relative to the listener. A prompt is also provided to indicate who is talking (e.g. "the user's name is talking"). This was very useful as it increased the user's awareness of the ongoing dialogue. This also suggested another design guideline: a user should be aware of other users' actions, as well as their communication, outside their viewpoint. A large number of activities taking place outside the field of view could, however, lead to an overcomplicated display. This raises deeper issues over what "grain" of action would need to generate prompts.
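The choice of screen edge for an out-of-view speaker could be made by projecting the speaker's offset onto the listener's "right" vector. This is a minimal sketch of the idea under that assumption, not the Deva implementation:

```python
def offscreen_side(listener_pos, listener_right, speaker_pos):
    """Pick the screen edge for an out-of-view speaker's text box.

    A positive component along the listener's right vector places the box
    at the right edge, a negative one at the left. Names are hypothetical.
    """
    offset = [s - l for s, l in zip(speaker_pos, listener_pos)]
    side = sum(a * b for a, b in zip(offset, listener_right))
    return "right" if side >= 0 else "left"

# A speaker standing off to the listener's left (listener at the origin,
# right vector along +x) produces a box at the left edge.
side = offscreen_side((0, 0, 0), (1, 0, 0), (-3, 0, 1))
```

The same sign test could drive the "who is talking" prompt, answering in part the "grain" question: only speech events, not every action, would need an edge indicator.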

7.3.2 TURN TAKING

Turn taking, one of the major problems in earlier studies, appeared to be much easier in Deva. One reason was that DG1 was satisfied (users had simultaneous control and did not have to share a pointer). Turn taking was also smoother because DG8, DG9 and DG10 were followed. The virtual actor conveyed a lot of information about the user's viewpoint, actionpoint, and the activity they are in the process of doing (e.g. typing, reading the rules, navigating), which increased others' awareness of the user's current activity and communication.

The position of the virtual actor also conveyed information about the user's intention (see DG12). For example, a virtual actor moving close to the board usually meant that the user was going to interact with the objects on it. As another example, a child would turn their actor to the expert after completing their turn to seek feedback. There were no explicit mechanisms built in for turn taking; the only way was to negotiate by typing text messages. The interruption mechanism the expert used was to move within the children's viewpoint and send a message. Children rarely interrupted; when they did, it was either to remind their co-players to take a turn, or to ask the expert to repeat something.

7.3.3 COMMUNICATION

Satisfying DG10 (presenting a user's communication as a text box above their virtual actor) helped make explicit who the speaker was. One problem occurred when virtual actors were positioned close to each other: one text box would obscure another. In such cases the users resorted to the transcript window (see DG2). Another problem was that the text remained in a text box until the user next typed something. This was confusing because it appeared as if the speaker was currently referring to something that had in fact been referred to much earlier. This indicates that some means is needed of making a text box disappear after a certain period of time.

The text boxes also had a benefit in focusing the viewpoint. To follow the ongoing dialogue, a user would have to turn to see someone else's text box. They thus also saw the actions that the speaker might be doing whilst talking (e.g. the teacher demonstrating something as she talked).

Interviews with the children after the sessions highlighted that although textual communication was sufficient, at points it was tiring, and audio communication would be preferable. One child's comment (typical of many) was that "if the application was not playing this type of a game but shooting, he would not bother talking because by the time he would have to type a message he would be shot". This shows that while text based communication was suitable for this specific application (for educational and literacy reasons) it is not necessarily so for other types of applications.

In some cases the expert had to spend extensive time explaining things to one of the children. This was frustrating for the other child, as borne out by comments such as "not again" when the expert asked the child to repeat an action. This again emphasised the need for private communication channels (DG18). This will be an issue for further investigation, especially when the population in the environment increases.
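The suggested remedy of making a text box disappear after a period of time might be sketched as follows; the class and the timeout value are illustrative assumptions to be tuned in user studies, not part of the prototype:

```python
class SpeechBubble:
    """Text box above an actor that expires after a fixed display time.

    Stale text in the study confused readers into thinking the speaker
    was still referring to it; expiry addresses that. Times are in seconds
    and are passed in explicitly so the behaviour is easy to test.
    """
    TIMEOUT = 8.0  # assumed value; would need tuning against real dialogue

    def __init__(self):
        self._text = ""
        self._shown_at = None

    def say(self, text, now):
        self._text, self._shown_at = text, now

    def visible_text(self, now):
        """Return the bubble's text, or '' once the timeout has elapsed."""
        if self._shown_at is None or now - self._shown_at > self.TIMEOUT:
            return ""
        return self._text

bubble = SpeechBubble()
bubble.say("Your turn!", now=0.0)
```

A refinement would scale the timeout with message length, since longer utterances take longer to read.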

7.3.4 PEDAGOGICAL ISSUES

Pedagogical issues divided into three main categories:

- support for educational resources
- support for various styles of teaching/learning
- support for practical issues of keeping order and managing the learning situation

Following DG4 and providing the rules as a permanent source of information in the CVE was very useful. Children would walk to the wall and read the rules either after being directed by the expert or on their own initiative. However, Deva supported this via a texture map, and the resource was static. This suggested a further technology requirement of incorporating multimedia display techniques in the CVE.

The CVE supported a variety of different teaching styles. The instructional style was supported by permanently displaying the rules on the wall. Children could also learn to play the game by observing others playing [2,35]. In some cases a style based on the cognitive apprenticeship approach [11] was used, the expert gradually removing support as the children became familiar with the game.

Practical management was eased by providing support for the expert to be aware of the activities of the children (DG8, DG20). At a simple level this was done by displaying the children's viewpoints on a separate monitor next to the expert (Post-it notes were used to associate the viewpoints with particular children) (see Figure 5). This was found to be very helpful for the expert in terms of deciding when and how to support the children. For example, when the expert saw that a child had a poor view of the board, she was able to ask the child to move closer to the board to see better. The expert also made use of this tool when trying to attract a child's attention: she would position her actor so that it was visible to the child (i.e. within their viewpoint) before speaking to them.

At another level the expert needs to be aware of past actions of the children. This could be for assessment purposes or simply to see what move the child had made last. This suggests that the Deva technology needs to be changed to provide mechanisms for managing "virtual time", as DG3 suggests. This would allow past events to be viewed.

There was no explicit mechanism for the expert to take control of the session (DG21). In general this was not a problem, but in some extreme cases children took advantage of this (e.g. repeatedly rolling the dice until getting a suitable score for capturing a piece, or moving pieces off the board without being permitted to do so). This highlighted the need for implementing mechanisms to support DG21. A virtual time mechanism would perhaps allow the events in a session to be "frozen" in order to restore order.
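A freezing mechanism of the kind DG21 calls for might look like the following sketch; the class and role names are hypothetical, not an existing Deva feature:

```python
class SessionControl:
    """Lets the teacher 'freeze' a session to restore order (DG21 sketch).

    While frozen, actions from children are rejected; the teacher's own
    actions always go through so she can demonstrate or tidy the board.
    """
    def __init__(self):
        self.frozen = False

    def freeze(self):
        self.frozen = True

    def resume(self):
        self.frozen = False

    def try_action(self, role, action):
        """Return True if the action is allowed to proceed."""
        if self.frozen and role != "teacher":
            return False
        return True

control = SessionControl()
control.freeze()
allowed = control.try_action("child", "roll dice")  # rejected while frozen
```

Combined with a virtual time log, freezing could be extended to rewinding: rejecting new events while replaying recorded ones.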

8. CONCLUSION

There are several directions for future work. A particular type of educational situation was chosen for investigation. This meant that certain issues took priority (e.g. the use of text rather than audio communication). Also, the population used was limited to a maximum of three users. Increasing the population will have an effect on viewpoints and on the need for private channels of communication.

The phased approach allowed the large number of factors involved to be handled in a manageable way. The first two phases focused on the requirements of the learning application. The third phase focused on particular issues concerning the move to a 3D environment (e.g. the problems of location and viewpoints and their impact on user awareness).

The use of a set of design guidelines made the project more focussed. The design guidelines captured in a more formalised way the application requirements obtained in the first two phases. They provided a systematic way of expressing these requirements during the implementation of the CVE version of the application in the third phase. They also provided a way of focusing the studies throughout the project. Technology requirements can be derived by analysing where design guidelines could not be followed.

Paying careful attention to the problem area and application requirements revealed areas of concern for designing for "real-world" problems. For example, this included the importance of text rather than audio for communication because of various educational and literacy requirements. While the educational situation investigated is quite specific in terms of age group and curriculum, we believe that the guidelines are generalisable to some degree. What became apparent over the course of the project was that the key pedagogical issue was not which particular theory of learning had to be supported. Rather, the emphasis became the practical issues of keeping track of and controlling children in an educational situation. These practical issues underlie most educational situations. From that point of view, we believe that the design guidelines can be generalised not only to other educational situations but also to more general situations where such control is desirable.

9. ACKNOWLEDGEMENTS

Thanks to the State Scholarships Foundation of Greece for funding Daphne Economou's Ph.D., the MMU Manchester Multimedia Centre for use of their facilities, Claremont Road Primary School and Knutsford High School, and the Advanced Interfaces Group at Manchester University, especially Sylvain Daubrenet and Simon Gibson, for their co-operation.

10. REFERENCES

[1] Atkinson, J.M., and Heritage, J. Structures of Social Action: Studies in Conversation Analysis. Cambridge University Press, 1984.

[2] Bandura, A. Social Learning Theory. General Learning Press 1971.

[3] Benford, S., Bowers, J., Fahlén, L., Greenhalgh, C., and Snowdon, D. User Embodiment in Collaborative Virtual Environments. In: Proceedings of CHI’95 (Denver CO, May 1995), ACM Press, 242-249.

[4] Boden, D., and Zimmerman, D.H. Talk and Social Structure: Studies in Ethnomethodology and Conversation Analysis, Polity, 1991.

[5] Capin, T.K., Pandzic, I.S., Chauvineau, E., Thalmann, N.M., and Thalmann, D. Modeling and Animation of Virtual Humans. COVEN D4.1 1996, CEC Deliverable Number: AC040-GEN-MIR-DS-P-041.b0.

[6] Carlsson, C., and Hagsand, O. DIVE- A Platform for Multiuser Virtual Environments. Computers and Graphics 17, 6, 1993, 663-669.

[7] Cassell, J., Bickmore, T., Campbell, L., and Vilhjálmsson, H. Human conversation as a System Framework: Designing Embodied Conversational Agents. In: Embodied Conversational Agents, Cassell, J., Sullivan, J., Prevost, S., and Churchill, E. (eds.), MIT Press, 2000, 29-63.

[8] Cassell, J., Sullivan, J., Prevost, S., and Churchill, E. Embodied Conversational Agents. MIT Press 2000.

[9] Cavazza, M. A Conversational Agent for Interactive Television. In: Proceedings of the Workshop on Intelligent Virtual Agents VA’99 (Salford, September 1999), 57-66.

[10] Churchill, E.F., Cook, L., Hodgson, P., Prevost, S., and Sullivan, J.W. May I help you: Designing Embodied Conversational Agent Allies. In: Embodied Conversational Agents, Cassell, J., Sullivan, J., Prevost, S., and Churchill, E. (eds.), MIT Press, 2000, 64-94.

[11] Collins, A., Brown, J.S., and Newman, S.E. Cognitive apprenticeship: teaching the crafts of reading, writing and mathematics. In: Knowing, Learning, and Instruction: Essays in Honor of Robert Glaser, Resnick, L.B. (ed.), Hillsdale, 1989, 453-494.

[12] Coulthard, M., Montgomery, M., and Brazil, D. Developing a Description of Spoken Discourse. In: Studies in Discourse Analysis, Coulthard, M., and Montgomery, M. (eds.), Routledge, 1981, 1-50.

[13] DfEE Open for Learning, Open for Business: The Government’s National Grid for Learning Challenge. Department of Education and Employment, U.K., November 1998, URL: http://www.dfee.gov.uk/grid.

[14] Economou, D., Mitchell, W.L., and Boyle, T. Requirements Elicitation for Virtual Actors in Collaborative Learning Environments. Computers & Education 34, 3-4, 2000, 225-239.

[15] Economou, D., Mitchell, W.L., and Boyle, T. Towards a user-centred method for studying CVLEs. In: Proceedings of ERCIM WG UI4ALL joint Workshop with i3 Spring Days 2000 on Interactive Learning Environments for Children (Athens, March 2000).

[16] Economou, D., Mitchell, W.L., and Boyle, T. A Phased Approach to Developing a Set of Requirements for the Use of Virtual Actors in Shared Virtual Learning Environments. In: Proceedings of ED-MEDIA & ED-TELECOM’98 (Freiburg, June 1998), AACE Press, 1620-1621.

[17] EPFL, Geneva, IIS, TNO, Lancaster, UCL, Nottingham, SICS, Division, Thomson, KPN. Guidelines for building CVE applications. COVEN D2.6 1997. van Liempd, G. (ed.), CEC Deliverable Number: AC040-KPN-RESEARCH-DS-P-026.b1 http://chinin.thomson-csf.fr/projects/coven/.

[18] Granieri, J.P., and Badler, N.I. Simulating Humans in Virtual Reality. In: Virtual Reality Applications. Earnshaw, R.A., Vince, J.A. and Jones, H. (eds.). Academic Press, 1995, 253-269.

[19] Greenhalgh, C., and Benford, S. MASSIVE: A Collaborative Virtual Environment for Tele-conferencing. ACM Transactions on Computer-Human Interaction (TOCHI) 2, 3, 1995, 239-261.

[20] Gunton, T. (ed.) Information Systems Practice: The Complete Guide. NCC Blackwell, Manchester, 1993.

[21] Guye-Vuilleme, A., Capin, T.K., Pandzic, I.S., Thalmann, N.M., and Thalmann, D. Nonverbal Communication Interface for Collaborative Virtual Environments. In: Proceedings of Collaborative Virtual Environments CVE’98 (Manchester, June 1998), 105-112.

[22] Johnson, W.L., Stiles, R., and Munro, A. Integrating Pedagogical Agents into Virtual Environments. Presence 7, 6, 1998, 523-546.

[23] Jordan, B., and Henderson, A. Interaction Analysis: Foundations and Practice. The Journal of Learning Sciences 4, 1, 1995, 39-103.

[24] Kaur, K. Designing virtual environments for usability. PhD Thesis, City University, 1998.

[25] Koda, T., and Maes, P. Agents with Faces: The Effects of Personification of Agents. In: Proceedings of HCI'96 (London, August 1996), The British HCI Group, 98-103.

[26] Laurel, B. Interface agents: Metaphor with character. In: The Art of Human-Computer Interface Design. Laurel B. (ed.), Addison-Wesley, 1990, 355-365.

[27] Lester, J.C., Towns, S.G., Callaway, C.B., Voerman, J.L., and FitzGerald, P.J. Deictic and Emotional Communication in Animated Pedagogical Agents. In: Embodied Conversational Agents, Cassell, J., Sullivan, J., Prevost, S., and Churchill, E. (eds.), MIT Press, 2000, 123-154.

[28] Mitchell, W.L. Moving the museum onto the Internet: The use of virtual environments in education about ancient Egypt. In: Virtual Worlds on the Internet, Vince, J.A., and Earnshaw, R.A. (eds.), IEEE Computer Society Press, 1999, 263-278.

[29] Monsieurs, P., Coninx, K., and Flerackers, E. Collision Avoidance and Map Construction Using Synthetic Vision. In: Proceedings of the Workshop on Intelligent Virtual Agents VA’99 (Salford, September 1999), 33-46.

[30] Nass, C., Steuer, J., and Tauber, E.R. Computers are social actors. In: Proceedings of CHI’94 (Boston, April 1994), ACM Press, 72-78.

[31] Pettifer, S. An operating environment for large scale virtual reality. PhD Thesis, The University of Manchester, 1999.

[32] Poggi, I., and Pelachaud, C. Performative Facial Expressions in Animated Faces. In: Embodied Conversational Agents, Cassell, J., Sullivan, J., Prevost, S., and Churchill, E. (eds.), MIT Press, 2000, 155-188.

[33] Rickel, J., and Johnson, W.L. Task-Oriented Collaboration with Embodied Agents in Virtual Worlds. In: Embodied Conversational Agents, Cassell, J., Sullivan, J., Prevost, S., and Churchill, E. (eds.), MIT Press, 2000, 95-154.

[34] Roussos, M., Johnson, A., Moher, T., Leigh, J., Vasilakis, C., and Barnes, C. Learning and Building Together in an Immersive Virtual World. Presence 8, 3, 1999, 247-263.

[35] Salomon, G. Interaction of Media, Cognition, and Learning. Jossey-Bass 1979.

[36] Shu, L., and Flowers, W. Teledesign: groupware user experiments in three-dimensional computer-aided design. Collaborative Computing 1, 1, 1994, 1-14.

[37] Silverman, D. Qualitative Research: theory, method and practice, Silverman, D. (ed.), Sage, 1997.

[38] Stanney, K.M., Mourant, R.R., and Kennedy, R.S. Human Factors Issues in Virtual Environments: A Review of the Literature. Presence 7, 4, 1998, 327-351.

[39] Stephanidis, C., Salvendy, G., Akoumianakis, D., Arnold, A., Bevan, N., Dardailler, D., Emiliani, P.L., Iakovidis, I., Jenkins, P., Karshmer, A., Korn, P., Marcus, A., Murphy, H., Oppermann, C., Stary, C., Tamura, H., Tscheligi, M., Ueda, H., Weber, G., and Ziegler, J. Towards an Information Society for All: HCI challenges and R&D recommendations. International Journal of Human-Computer Interaction 11, 1, 1999, 1-28.

[40] Stewart, J., Bederson, B.B., and Druin, A. Single Display Groupware: A Model for Co-operative Collaboration. In: Proceedings of CHI’99 (Pittsburgh, May 1999), ACM Press, 286 -293.

[41] Terzopoulos, D., Tu, X., and Grzeszczuk, R. Artificial fishes: Autonomous locomotion, perception, behavior, and learning in a simulated physical world. Artificial Life 1, 4, 1994, 327-351.

[42] Thórisson, K.R. Communicative Humanoids: A Computational Model of Psychosocial Dialogue Skills. Ph.D. Thesis, MIT, 1996.

[43] van Mülken, S., André, E., and Müller, J. The Persona Effect: How Substantial Is It? In: Proceedings of HCI'98 (Sheffield, September 1998), 53-66.

[44] West, A.J., Howard, T.L.J., Hubbold, R.J., Murta, A.D., Snowdon, D.N., and Butler, D.A. A Generic Virtual Reality Interface for Real Applications. In: Virtual Reality Systems, Earnshaw, R.A., Gigante, M.A., and Jones, H. (eds.), Academic Press, 1993, 213-236.

[45] Zeltzer, D., and Johnson, M.B. Virtual Actors and Virtual Environments. In: Interacting with Virtual Environments. MacDonald, L. & Vince, J.A. (eds.), John Wiley & Sons Ltd., 1994, 229-255.