Vocal Interaction in Collocated Cooperative Design

Alistair Jones∗, Atman Kendira†, Claude Moulin∗, Jean-Paul A. Barthès∗, Dominique Lenne∗, and Thierry Gidel†

∗UMR CNRS Heudiasyc, Université de Technologie de Compiègne, Compiègne, France

Email: {alistair.jones,claude.moulin,barthes,dominique.lenne}@utc.fr
†Laboratoire COSTECH

Université de Technologie de Compiègne, Compiègne, France
Email: {atman.kendira,thierry.gidel}@utc.fr

Abstract—Using vocal interfaces in complex applications leads to more intuitive interactions. At UTC we have built a system including a large graphics table and peripheral devices for supporting preliminary cooperative design using multimodal interaction. The paper details the architecture and implementation of vocal interfaces, using two different multi-agent platforms, including a platform of cognitive agents.

Index Terms—Multi-agent systems, cognitive agents, multimodal interaction, vocal interfaces.

I. INTRODUCTION

Our research group at the UTC is interested in using large interactive surfaces for supporting cooperative preliminary design. We have initiated the TATIN-PIC project (French acronym for Tactile Interactive Tabletop - Intelligent Platform for Design) to design such an environment and study such an approach. The project integrates two interactive surfaces: a tabletop prototype we have previously constructed and an interactive board, as shown in Fig. 1, as well as a number of peripheral devices such as tablets or earphones for vocal interaction (see Kendira et al. [12] for details).

Multi-surface work environments allow capitalizing upon different surfaces (vertical, horizontal, private), using different kinds of collaboration (side-by-side, face-to-face) and promoting group awareness or maintaining users' privacy [3], [4]. We have targeted various phases of the preliminary design of engineering projects, e.g. brainstorming, risk assessment, task organization, causal analysis, etc. Such phases are group creativity exercises designed to promote collaboration and communication around a predefined problem. The goal is to thoroughly explore as much of the solution space as possible to uncover innovative solutions and sidestep hidden risks at the beginning of an engineering project.

Figure 1. A brainstorming session around our interactive tabletop and interactive board

We believe that voice-activated intelligent agents can provide valuable assistance to augment the creativity of the users in this work environment. First, they can alleviate certain deficiencies inherent in the ergonomics of such a system. For example, we have observed that users have difficulty using onscreen keyboards to enter text on the tabletop system, and intelligent agents can use speech-to-text technology to help alleviate this problem. Second, intelligent agents can provide multimodal support for the user interface. Multitouch interfaces often suffer from excessive clutter of onscreen widgets, as documented in [5]. This is partially because buttons must be larger for accurate touch input and options cannot be as easily hidden inside contextual menus, as they traditionally are in mouse-driven WIMP interfaces. Intelligent agents can provide multimodal support such as context switching to simplify multitouch interaction, as detailed in [8]. Finally, performing preliminary design often requires accessing data stored in previous projects or knowledge bases. Intelligent agents can use ontologies to provide access to such data and make suggestions to the participants to complement their preliminary design work.

In order to better understand these problems, we will first give a brief overview of our interactive work environment. Next, we will provide a detailed description of the multi-agent middleware implemented in our system, which includes personal assistant agents and staff agents built using two different toolkits. Then, we will proceed with an in-depth presentation of the personal assistant agents' dialog engine, ontology management and task library. Our main contribution in this paper is to provide a detailed description of the internal operations of our personal assistant agent for computer-supported cooperative work. We conclude with a discussion of the advantages and disadvantages of such a system, as well as possible avenues for our future work.

II. USER INTERFACE AND VOCAL INTERACTION

In this section, we present the interface of the TATIN-PIC platform with a team brainstorming scenario on the interactive tabletop and examples of vocal interaction options on the table.

A. Brainstorming Scenario

Figure 2. Part of the brainstorming interface showing Post-it notes, virtual keyboard and command menu.

To introduce vocal interaction, let us consider the brainstorming phase of the preliminary design process, during which participants generate ideas and write them on virtual Post-it® notes, as one can see in Fig. 2. Before the brainstorming session starts, each participant has a circular menu which allows them to open up a virtual keyboard to create and modify virtual Post-it notes. These notes can be grouped using a lasso gesture.

At the start of the brainstorming session, a central question or problem is presented to the group by a moderator. In the stages of a traditional brainstorming, first, the participants individually generate as many ideas as possible toward finding a solution to this problem, placing each one on its own Post-it note. Next, participants each present their ideas to the team, where duplicate ideas are deleted and additional ideas can be created. Finally, the team collaboratively organizes these ideas into groups and sub-groups and adds labels to these groups. For more examples of the brainstorming process within interactive environments see Tse et al. [15], Hilliges et al. [7], Geyer et al. [6] or Clayphan et al. [2].

One of the leading measures of a successful brainstorming session is the number of Post-it notes participants are able to produce during the idea generation step. If users are inhibited from creating Post-it notes with ease and speed, whether for social or technical reasons, this is referred to as production blocking [7]. Moreover, the final stage of the brainstorming session, where participants must collaboratively group Post-it notes and label them, is done in an affinity-diagram style of interaction, an aspect highlighted in [2]. This means Post-it notes are often grouped, ungrouped, labeled, and relabeled several times before a satisfactory solution is found by the group. These are some of our primary motivations for adding vocal interaction and multimodal interaction via personal assistant agents.

B. Examples of Requests

This section gives initial examples of participants' requests, to illustrate user interaction with the personal assistant agents. During the brainstorming phase, voice interaction can be used to create a new Post-it note, or dictate the content of a note (usually a short phrase), or, combined with graphical gestures, gather notes together and give the resulting group a label. Each participant is equipped with an earphone and communicates vocally with a personal assistant agent (PA). The speech is processed by a speech-to-text engine, and the resulting string of text is what our interactive agent acts upon. The participant also has the option of displaying a window on the interactive tabletop which shows the result of the speech-to-text algorithm, to give the user a better understanding of what the agent is hearing. The agent responds to the participant using text-to-speech software which is played directly into the earphone of the user. Once the user issues a vocal request, the agent can ask the user for more information concerning the request or for clarifications. We start with a simple example, then address more complex multimodal interaction.

Category 1: Simple Vocal Request. An example of an interaction in this first category is creating a blank Post-it note. The participant will say something like: "create a new Post-it." She will see the text of her request displayed on the tabletop, and the agent will create a new Post-it by her menu. The personal assistant agent will respond with an affirmation that the Post-it was created, and will supply the ID of the Post-it note.

Category 2: Compound Vocal Request. In this case the utterance would be, for example: "Please create a Post-it entitled market survey." Processing in that case is similar to the previous case, with the difference that the PA will retrieve the phrase "market survey" and make it the content of the Post-it note.

Category 3: Multimodal Request. One of the simplest multimodal interactions is to select a Post-it note with a gesture and either modify its content or delete it. The input in this case will be a touch event on a Post-it note, and the vocal request "delete it." It requires that the selection process provide the note's internal id to the PA, in order to let the PA act on the relevant object. The problem in that case is more complex, because of possible ambiguities due, for example, to several notes being selected on the table by different participants at the same time.

The next section gives an overview of the system architecture, and describes how the personal assistant agents interact with the rest of the multi-agent architecture.

III. ARCHITECTURE OF THE TATIN-PIC SYSTEM

The table has a common surface controlled by a set of agents, and a number of peripheral small computers (mini PCs) are attached to each user (Fig. 3). Each mini PC contains a set of agents, including a personal assistant agent (PA), and the vocal I/O software adapted to a particular participant.

Figure 3. The TATIN-PIC architecture showing a multi-agent platform for handling the graphics and the application and another multi-agent platform for handling the user interface. The agents in blue are built using the JADE toolkit, and the agents in green are built using the OMAS toolkit.

Figure 4. Multi-agent system for handling the TATIN tabletop and interactive board

The TATIN interactive tabletop and interactive board are controlled by a multi-agent system, as shown in Fig. 4. One of the agents controls the table, one controls the interactive board, each phase of the design process is controlled by a phase agent, and each circular menu of the users is controlled by a corresponding workbench agent. All such agents are implemented using the JADE multi-agent platform [9]. The JADE toolkit is written in Java, and allows us to tightly integrate our agents with our user interface, also built using a Java toolkit, MT4j [10]. The JADE toolkit also provides some inter-device communication. For example, the moderator of a brainstorming session might want to display the result of the brainstorming session (the categorization of Post-it notes) on the interactive board, to provide an overview. In this case, the moderator will press a button on his circular menu, causing the Table Agent to request that the brainstorming Phase Agent send its model to the Whiteboard Agent.

In order to provide multimodal interaction, each user has an earphone (vocal I/O channel) linked by Bluetooth to a small mini PC housed inside the interactive tabletop. It is responsible for running the speech-to-text (Dragon NaturallySpeaking) and text-to-speech (Microsoft Speech SDK) modules, and contains a set of agents devoted to the participant. One of the agents is their personal assistant agent (PA), in charge of running the dialogs. The other agents, called staff agents, are more specialized (Fig. 5). The role of the PA is to handle communications with the participant. The PA has a very superficial knowledge of the application, following the digital butler approach promoted by Negroponte [13]. In addition, the agents on the side of the user currently have a more cognitive bias and are implemented on the OMAS platform [1], which specializes in rapid prototyping for artificial intelligence and user interaction. The two multi-agent platforms communicate through gateways, also called transfer agents or postmen.

Figure 5. Multi-agent system for handling the user interface, showing personal assistants and staff on each user's mini PC and a service agent and transfer agent to access the table multi-agent platform.

IV. PROCESSING VOCAL REQUESTS

The recognition mechanism is based on a two-step process: (i) the PA searches the input to select a possible task to do; and (ii) if a task is found, the dialog associated with that task is triggered and, depending on the result, a message is sent to the right staff agent in order to execute the task. Then, the staff agent processes the task and sends a message to the corresponding workbench agent to implement the result on the table. Once the result appears on the table, some information is returned to the PA, signaling the end of the action.
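To make the flow concrete, the following is a minimal Common Lisp sketch of this two-step loop. All names used here (handle-vocal-input, select-task, run-dialog, task-dialog, task-staff-agent, pa-task-library, send-message, speak) are illustrative placeholders, not the actual OMAS API.

;; Minimal sketch of the two-step handling of a vocal request.
;; All names below are illustrative, not the actual OMAS functions.
(defun handle-vocal-input (pa input)
  ;; Step 1: pick a task from the task library using the input string.
  (let ((task (select-task (pa-task-library pa) input)))
    (if (null task)
        (speak pa "Sorry, I did not understand your request.")
        ;; Step 2: run the dialog attached to the task, then forward the
        ;; collected arguments to the staff agent in charge of the task.
        (let ((args (run-dialog pa (task-dialog task) input)))
          (when args
            (send-message pa (task-staff-agent task) :request args))))))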

This assumes that the possible tasks are limited in number and organized into a library of tasks, and that each task has an associated dialog used for acquiring the information needed to launch the task. It also assumes that there is no problem sending messages from one platform to the other. The approach requires an ontology that is used partly for recognizing the task to do and partly for specifying the information transferred between agents. One also needs structures to represent the tasks and to model the dialogs. Details of our implementation for processing vocal requests are described below.

A. Implementation

1) Dialogs: Dialogs are defined as conversation graphs, namely transition networks. A node of the network represents a state of the conversation and, according to the input from the user, a transition occurs to another state until some action state is reached or the request is abandoned. At each node, rules are executed on a base of facts containing the information deduced so far, as shown in Fig. 6.

Figure 6. Principle of a Conversation Graph. The fact base (facts) is moved along as the graph is traversed as a result of transitions.
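As an illustration of this principle (and not of the OMAS dialog engine itself), a conversation graph can be traversed along the following lines in Common Lisp; the classify-input function and the rule functions are assumed to exist and are purely hypothetical.

;; Illustrative traversal of a conversation graph. Each state carries rules
;; that enrich the fact base and a transition table keyed by the outcome of
;; analyzing the user input. Names are hypothetical.
(defstruct dlg-state name rules transitions)   ; transitions: alist (label . next-state)

(defun run-conversation (state facts get-input)
  (loop while state
        do (dolist (rule (dlg-state-rules state))   ; rules deduce new facts
             (setf facts (funcall rule facts)))
           (let ((label (classify-input (funcall get-input) facts)))
             (setf state (cdr (assoc label (dlg-state-transitions state))))))
  facts)                                            ; final fact base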

Fig. 7 represents the different states of the (sub-)dialog for creating a Post-it note. The different states are created by using a defstate instruction.

Figure 7. Conversation graph implementing the sub-dialog for creating a Post-it note.

The following expression specifies the "create" state:

(defstate crpi-create
  (:label "ask TATIN to create post it")
  (:explanation "We send a message to create post it")
  (:send-message
    :to :JEAN-PAUL-POSTIT :self PA_JEAN-PAUL-PA
    :action :create
    :args `(((:data ("type" "postit")
                    ("value" ,(get-fact :text))
                    ("creation date" ,(moss::get-current-date))
                    ("author" "JEAN-PAUL")))))
  (:transitions
    (:on-failure :target crpi-sorry)
    (:otherwise
      ;; record new post it data onto the salient features queue
      :exec (sf-put PA_JEAN-PAUL-PA (get-fact :raw-input))
      :print-answer #'print-postit-id
      :success)))

Here a message is sent to the POSTIT staff agent asking it to create a new Post-it note, adding some meta information, and specifying transitions in case the action fails or in case it is a success. Note that in case of success, the text of the Post-it note is added to the salient-features stack and will be available in case it is needed at a later stage.

2) Ontology: An ontology is necessary to model the tasks and the domain, and to structure various knowledge bases. We use a representation language called MOSS [1]. MOSS is a complex frame-based representation language, allowing one to describe concepts, individuals, properties, classless objects, default values, virtual concepts or properties. It includes an object-oriented language, a query system, multilingual facilities and much more, described in the online documentation. We only present here the features used for communication. MOSS is centered on the concept of property and adopts a descriptive (typicality) rather than prescriptive approach, meaning that defaults are privileged and individuals may have properties that are not recorded in the corresponding concept. Reasoning is done via a query mechanism.

a) Concepts: Concepts are represented as objects with properties (attributes and relations). For example, the concept of a group of Post-it notes can be represented as follows:

(defconcept "Group"
  (:att "label" (:unique) (:entry))
  (:rel "children" (:to "post it")))

b) Relations: Relations are binary. They link two objects, e.g. a group and a post it.

c) Attributes: Attributes have associated values. All attributes have multiple values that may be restricted by various types of constraints. An attribute can be inverted, providing direct access to associated objects (it may point at several objects sharing this attribute). Thus, a value becomes an index.

The MOSS formalism is used to represent ontologies and the corresponding knowledge bases.

3) Task Library: The set of possible actions is modeled and gathered into a library of tasks. Tasks are defined as individuals of the concept of task. For example:

(deftask "create post it"
  :doc "Task for creating a post it"
  :performative :request
  :dialog create-postit-conversation
  :indexes
    ("add" .3 "post it" .6 "new" .3 "create" .3
     "delete" -1 "kill" -1 "archive" -1))

The :indexes parameter specifies a list of linguistic cues. Each phrase (cue) has a weight between −1 and 1. A value of −1 means that the task may not be selected if the input contains the corresponding phrase. A value of 1 means that the task certainly applies if the input contains the corresponding phrase. The :performative parameter specifies whether the task applies to a request or to an assertion. The :dialog parameter points to the representation of a conversation graph when interacting via the PA interface.
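As an illustration of the same syntax, a companion task for the deletion request discussed later might be declared as follows; this declaration and its weights are hypothetical and only meant to show how cues and vetoes are combined.

(deftask "delete post it"                        ; hypothetical example
  :doc "Task for deleting a post it"
  :performative :request
  :dialog delete-postit-conversation
  :indexes
    ("delete" .6 "remove" .4 "post it" .4
     "create" -1 "new" -1))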

4) Selecting the Task: Using the task library, selecting an action (a task) is done as follows:

1) The user tells something to her PA, like "create a post it";

2) For each task in the library the PA checks the sentence for phrases specified in the index pattern describing the task, and computes a score by using a MYCIN-like formula (i.e. if 2 cues a and b are present, the combined score is computed by the formula a + b − ab); for example, using the user input of step one and the cues for the task described previously, the two phrases "create" and "post it" will give a score of (0.3 + 0.6 − 0.3 × 0.6) = 0.72 for the task;

3) tasks are then ordered by decreasing scores;

4) the task with the highest score is selected, and all the tasks with a score under a specific threshold are discarded.

Indices use terms from the ontology (with some linguistic additions when needed) and currently weights are assigned manually.

Finally, when using vocal input, the sentence received by the PA may be garbled, in which case a grammatical analysis is not very useful.
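The scoring step can be sketched as follows in Common Lisp; this is an illustration under the assumptions that cues are stored as (phrase . weight) pairs, that task-cues is an accessor of our own, and that the threshold value is arbitrary. It is not the actual OMAS code.

;; Sketch of task selection. A negative weight vetoes the task; positive
;; weights are combined with the MYCIN-like formula a + b - ab.
(defun score-task (cues input)
  (let ((score 0))
    (dolist (cue cues score)
      (when (search (car cue) input :test #'char-equal)   ; cue present in input?
        (let ((w (cdr cue)))
          (if (minusp w)
              (return-from score-task nil)                 ; task vetoed
              (setf score (- (+ score w) (* score w)))))))))

(defun select-task (library input &key (threshold 0.5))    ; threshold is arbitrary
  (let ((best nil) (best-score threshold))
    (dolist (task library best)
      (let ((s (score-task (task-cues task) input)))       ; task-cues is hypothetical
        (when (and s (> s best-score))
          (setf best task best-score s))))))

With the input "create a post it" and the cues of the "create post it" task above, score-task returns 0.3 + 0.6 − 0.18 = 0.72, matching the worked example in step 2.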

5) Sending Information to the Staff Agents: When a PA has determined what task should be executed, it sends a message to the corresponding staff or service agent with the user request. The content language is fairly simple. In our example:

:data "create a new post it":language :EN

If the PA wants to use the returned information, for example to build an individual using the response and keep it in its memory, it can add a pattern. For example, using the pattern

:pattern "post it" ("title")("id")

allows building a representation of a post it, separating the id from the content.

When no pattern is specified, the returned answer is a literal (string) and has no structure.

a) Agent Communication Language: The agent communication language (ACL) is different in each platform and the transfer agents (gateways) have to restructure the messages from and to the other platform. The exchange format follows the JSON grammar and syntax [11]. A message is considered as a JSON object whose properties are in accordance with the FIPA standard. The properties are: sender, receivers, profile of receivers, creation date, performative, language, IP address and port of the sender host, identifier of the message, task identifier and content. The performatives, however, need not be very complex and one can get away with a simple set of basic performatives. The main performatives used during exchanges are cancel, answer, error (for failure and not understood), inform and request.
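To fix ideas, a message envelope carrying the properties listed above might look like the following JSON object; the field names and all values are illustrative, since the paper does not give the exact spelling of the keys.

{
  "sender"           : "JEAN-PAUL-PA",
  "receivers"        : ["JEAN-PAUL-POSTIT"],
  "receiver-profile" : "staff",
  "date"             : "2012-03-15T10:42:00",
  "performative"     : "request",
  "language"         : "EN",
  "host"             : "192.168.1.10",
  "port"             : 3000,
  "message-id"       : "msg-0042",
  "task-id"          : "create post it",
  "content"          : { "action" : "create",
                         "args"   : { "category" : "postit",
                                      "content"  : "market survey" } }
}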

b) Agent Content Language: The message content is also a JSON object whose main properties are:

• action: a symbol corresponding to a skill or a behavior
• args: list of parameters useful when the agent performs the action
• contents: JSON structure corresponding to an answer
• error-contents: explanation when an error occurs

The following example shows the content of a message requesting the creation of a group of Post-it notes and the addition of some Post-it notes to this group.

{
  "action" : "create",
  "args"   : { "category" : "group",
               "content"  : "Structure",
               "ref"      : ["p3-2-1-1"] }
}

Parameters depend on the nature of the action requested. The content argument is the text to be inserted in the Post-it note and the ref argument is the list of Post-it notes to be added to the newly created group.

6) Micro-contexts and Salient Features: When the interaction is a multimodal interaction, one has to take care of possible ambiguities in the selection of virtual objects on the table. Concerning the brainstorming phase, it was decided that the notes being created belong to their creator. Thus, whenever a Post-it note is selected, the corresponding Workbench agent sends a message to the right PA. The PA can then construct a stack of salient features that will be used for de-referencing pronouns in particular. A micro-context can then be assigned to the stack during the execution of a specific action, limiting the scope of the references kept on the stack. Functions were developed to this effect.
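A minimal Common Lisp sketch of such a stack, with an optional micro-context restricting which entries are visible, could look like the following; the function names are hypothetical and differ from the sf-put routine shown earlier.

;; Sketch of a salient-features stack with micro-context scoping.
;; Entries are (context . item) pairs, most recent first.
(defvar *salient-features* nil)
(defvar *micro-context* nil)        ; when bound, limits visible entries

(defun sf-push (item &optional (context *micro-context*))
  (push (cons context item) *salient-features*))

(defun sf-pop ()
  "Pop the most recent entry visible in the current micro-context,
   e.g. to resolve a pronoun such as \"that\"."
  (let ((entry (find-if (lambda (e)
                          (or (null *micro-context*)
                              (eql (car e) *micro-context*)))
                        *salient-features*)))
    (when entry
      (setf *salient-features*
            (remove entry *salient-features* :count 1 :test #'eq))
      (cdr entry))))

A micro-context would then be installed around the execution of a specific action, e.g. by binding *micro-context* during that action, so that only selections made in that scope are candidates for de-referencing.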

B. Examples of Vocal Input

Here, using the classification of vocal input introduced in Section II (Simple Vocal Request, Compound Vocal Request, Multimodal Request), we give examples of interaction.

Example 1: Creating a Post-it Note. Assume that a participant wants to create a new Post-it note. She will say, for example, "create a new Post-it" into her earphone. Her associated mini PC, containing the speech-to-text recognition software, will translate the utterance into a list of words, perhaps with some recognition errors, and send the resulting string to her PA. The PA will first determine what task from the task library is intended using n-gram analysis and recognize that the participant wants to create a Post-it note. It will then send the sentence to a specialized staff agent in charge of handling all operations related to Post-it notes, which will in turn send a message to the Workbench Agent associated with the specific participant. The Workbench Agent will create the Post-it note and send the assigned note's internal id back to the PA. The PA will tell the participant through the earphone that the action has been completed. The Post-it note will be created on the table next to her circular menu.

Example 2: Creating a Post-it Note with Content. In this case the utterance would be, for example: "please create a Post-it entitled market survey" or "give me a new Post-it with content market survey." Processing in that case is similar to the previous case, with the difference that the PA knows how to retrieve the phrase "market survey" and transfer it to the staff agent to make it the content of the Post-it note.

Example 3: Modifying the Content of a Post-it Note. One of the simplest examples of multimodal interaction is to select a Post-it note with a gesture and then modify its content with a vocal command. This is an example of sequential, complementary multimodal fusion. First, the user taps the Post-it note to select it. This touch event on the tabletop surface is associated with the user of the closest circular menu. The Workbench Agent of this user's circular menu sends an inform message to the user's PA. The PA stores the event in its salient feature queue. Next, the user says "delete that." The speech-to-text module on the user's mini PC translates the utterance to text, which is fed into the PA. After the n-gram analysis, the PA issues its delete task, and once the PA detects the referential pronoun "that", it immediately pops the latest entry off its queue. The PA is now ready to send a request to the Table Agent to delete the Post-it note associated with the last entry of the queue. A delicate point in this chain of events occurs when the user selects the item. As we explained above, we use a distance heuristic to associate the event with a user. There are times when a Post-it note could be on the line between two users' circular menus. We have used visual feedback to help resolve the problem. Once a note is selected, it changes color to match the color of the user's circular menu, thereby confirming exactly who has made the selection. This possible ambiguity in associating events with users must be especially considered by interface designers.

V. DISCUSSION

The architecture described in the previous sections has some advantages. Each multi-agent system has its own role. The JADE system helps to manage the activities on the tabletop and the interactive board. The OMAS system manages the activities which directly assist the participants. Both multi-agent systems may exploit the information system that capitalizes on the projects done with the multi-touch table. Internally, they use their own structures and dispatch the messages. One of the leading contributions of this architecture is that it shows how intelligent agents can be integrated directly into interactive work environments using multi-agent systems.

We have to consider messages exchanged inside one agent system and messages exchanged between both systems. Inside the JADE system, messages respect the corresponding format, but sometimes the core of a message must contain the model of a phase. For example, it is necessary to serialize it when a phase agent sends its model to the interactive board agent. The model is first transformed into a serializable object and then into a JSON string that is inserted in the message.

The main drawback of the architecture is the need to define an exchange message format for the communication between the multi-agent systems. All the actions that a workbench agent performs, either under the direction of the table, or to inform a PA of an event, or for answering a PA request, have to be described in detail and the exchange format has to define all parameters. However, a PA never directly controls an action to be done on the table. There are different possibilities: a participant may ask a PA to send a message to the workbench in order to perform an action; a PA may suggest to the participant some action to do using the voice recognition system and, if the participant accepts, it sends the command to the workbench agent.

Complex questions can be asked of a PA in natural language (or near natural language), depending on the strength of the agent's analyzer. Using n-gram analysis to dispatch tasks allows for easier prototyping and accommodates errors made by the speech-to-text engine, though it becomes difficult to handle complex requests. For now our analyzer is limited to the study of short sentences and a PA does not yet have learning abilities like some meeting assistants have. As an answer, a PA can suggest that the participant perform an action using a toolkit available on the table. For example, a participant can ask a PA: "Post-it notes of projects about microlight" or "Post-it notes about microlight" and the PA can answer with the reference of the project that most closely matches the request. The participant can then open, with the PDF reader tool, the PDF file that is automatically created when a project is saved.

During the brainstorming phase, some of the actions may combine vocal interaction and gestures. For example, grouping elements can be done by saying "group this, this, and that, and call the group: procurement." The terms this and that correspond to gestures indicating which items must be put together. The combination of giving a command with graphics parameters corresponds to what has been done for years in CAD systems. A user either issues a vocal command and then selects items, or selects items and then issues a command applicable to the selected items. However, during the dialog, the PA needs to recover the identity of the selected items in order to construct a command containing the right parameters (here the identity of the items to group). Thus, the table must send the identity of the selected items to the PA. In a single-user environment, it is rather easy: all the selected items are inserted into a salient features queue and the process running the dialog retrieves them from there. In a multi-user multi-touch environment this is more complex, since everybody may be selecting items on the same surface at the same time, which introduces ambiguity as to which identity to forward to which PA, knowing that there is a single agent handling the graphics surface.

Achieving multimodal fusion of user events is one of several reasons for which tabletop designers encounter the problem of identifying which user is interacting where. Though tabletop designers try to work around this, the problem must be addressed when trying to produce highly synchronized multi-user multimodal interaction.

VI. FUTURE WORK

We think that the proposed architecture has a strong potential for including more complex features. In particular, it is easy to add agents to the system, even dynamically. Service agents can watch what the PA or staff agents are doing and propose new actions through the audio channel. Outside resources from databases or from the web can be triggered from the content of audio exchanges and the result can be proposed following a presentation policy.

Our system is being deployed and we still have to smooth out some problems, e.g. those linked to a noisy environment. Such problems have been encountered in other projects, in particular by Tur et al. [16]. Currently the system has just been integrated and we are only now turning to extensive testing.

VII. ACKNOWLEDGMENTS

The authors thank the Regional Council of Picardy and the European Union for their financial support of the TATIN-PIC project. Europe is a partner of the region Picardy through FEDER. The content of the paper is the sole responsibility of the authors and does not reflect either regional or European policy.

REFERENCES

[1] Barthès, J.-P.A.: OMAS - A Flexible Multi-Agent Environment for CSCWD. Future Generation Computer Systems, 27(1), 78-87 (2011).

[2] Clayphan, A., Collins, A., Ackad, C., Kummerfeld, B., Kay, J.: Firestorm: a brainstorming application for collaborative group work at tabletops. Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces, pp. 162-171. ACM, New York, NY, USA (2011).

[3] Mandryk, R.L., Scott, S.D., Inkpen, K.: Display Factors Influencing Co-located Collaboration. In Conference Supplement to ACM CSCW'02 Computer Supported Cooperative Work, pp. 137-138 (2002).

[4] Everitt, K., Shen, C., Ryall, K., Forlines, C.: MultiSpace: enabling electronic document micro-mobility in table-centric, multi-device environments. First IEEE International Workshop on Horizontal Interactive Human-Computer Systems (TableTop 2006), 8 pp. (2006).

[5] Hartmann, B., Morris, M.R., Cassanego, A.: Reducing Clutter on Tabletop Groupware Systems with Tangible Drawers. Adjunct Proceedings of UbiComp 2006 (2006).

[6] Geyer, F., Pfeil, U., Budzinski, J., Höchtl, A., Reiterer, H.: AffinityTable - A Hybrid Surface for Supporting Affinity Diagramming. Human-Computer Interaction - INTERACT 2011, 6948, 477-484 (2011).

[7] Hilliges, O., Terrenghi, L., Boring, S., Kim, D., Richter, H., Butz, A.: Designing for collaborative creative problem solving. Proceedings of the 6th ACM SIGCHI Conference on Creativity & Cognition (C&C '07), p. 137 (2007).

[8] Schnelle-Walka, D., Döweling, S.: Speech Augmented Multitouch Interaction Patterns. Proceedings of the European Conference on Pattern Languages of Programs (2011).

[9] JADE: http://jade.tilab.com

[10] Laufs, U., Ruff, C., Zibuschka, J.: MT4j - A Cross-platform Multi-touch Development Framework (2010).

[11] JSON: http://www.json.org

[12] Kendira, A., Gidel, T., Jones, A., Lenne, D., Barthès, J.-P., Moulin, C.: Conducting Preliminary Design around an Interactive Tabletop. Proceedings of the 18th International Conference on Engineering Design (ICED11), pp. 366-376 (2011).

[13] Negroponte, N.: Agents: From direct manipulation to delegation. In: Bradshaw, J.M. (ed.) Software Agents, pp. 57-66. MIT Press, Cambridge (1997).

[14] Paraiso, E.C., Barthès, J.-P.A.: An Intelligent Speech Interface for Personal Assistants in R&D Projects. Expert Systems with Applications, 31, 673-683 (2006).

[15] Tse, E., Greenberg, S., Shen, C., Forlines, C., Kodama, R.: Exploring true multi-user multimodal interaction over a digital table. Proceedings of the 7th ACM Conference on Designing Interactive Systems (DIS '08), pp. 109-118. Cape Town, South Africa (2008).

[16] Tur, G., Stolcke, A., Voss, L., Peters, S., Hakkani-Tür, D., Dowding, J., Favre, B., Fernández, R., Frampton, M., Frandsen, M., Frederickson, C., Graciarena, M., Kintzing, D., Leveque, K., Mason, S., Niekrasz, J., Purver, M., Riedhammer, K., Shriberg, E., Tien, J., Vergyri, D., Yang, F.: The CALO Meeting Assistant System. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1601-1611 (2010).