Autobriefer: A system for authoring narrated briefings


O. Stock and M. Zancanaro (eds.) Multimodal Intelligent Information Presentation, Dordrecht: Kluwer, 2005.


ELISABETH ANDRÉ, KRISTIAN CONCEPCION, INDERJEET MANI, AND LINDA VAN GUILDER

AUTOBRIEFER: A SYSTEM FOR AUTHORING NARRATED BRIEFINGS

We describe a system called AutoBriefer which automatically generates multimedia briefings from high-level outlines. The author of the briefing interacts with the system to specify some of the briefing content. In order to be scalable across domains, AutoBriefer provides a graphical user interface (GUI) that allows the user to create or edit domain knowledge for use in selecting some of the briefing content. The system then uses declarative presentation planning strategies to synthesize a narrated multimedia briefing in various presentation formats. The narration employs synthesized audio as well as, optionally, an agent embodying the narrator.

1. INTRODUCTION

Document production is an important function in many organizations. In addition to products such as technical papers, instruction manuals, system documentation, courseware, etc., briefings are a very common type of multimedia document product. Since so many people spend so much time producing briefings, often under serious time constraints, any method of reducing the time spent on briefing production could yield great gains in productivity.

Our work forms part of a larger DARPA-funded project aimed at improving analysis and decision-making in crisis situations by providing tools that allow analysts to collaborate to develop structured arguments in support of particular conclusions, and to help predict likely future scenarios. These arguments, along with background evidence, are packaged together as briefings to high-level decision-makers. In leveraging automatic methods to generate briefings, our approach needs to allow the analyst to take on as much of the briefing authoring as she wants to (e.g., it may take time for her to adapt to or trust the machine, or she may want the machine to present just part of the briefing). The analyst's organization usually will instantiate one of several templates dictating the high-level structure of a briefing; for example, a briefing may always have to begin with an executive summary. The methods also need to be relatively easily adapted to different domains, given that the subject matter of crises is somewhat unpredictable.

These task requirements force us to focus on a flexible approach to briefing generation. The system must allow the user to take control as much as she wants in the specification of the content and format, as well as in editing the final product. It must also be able to help the user as needed by exploiting any redundancy in the briefing content, including stereotypical structure (dictated in part by the business rules of the organization) as well as material from previous briefings, databases, etc. In both these cases, the system should be able to populate an outline of the briefing. In cases where the user wants to be freed from specifying the low-level details of the animation, narration, and multimedia synchronization of the briefing, the system should be able to plan and synthesize the actual presentation, which the user can then post-edit.

As an example, consider a user putting together a briefing about a particular individual, say Victor Polay, a leader of Peru’s Tupac Amaru movement. A person preparing this briefing by hand (using, say, Microsoft PowerPoint) would need to plan the outline, and assemble the various textual and multimedia elements that constitute the core content. This latter activity could involve searching, doing background research, collecting existing material, drafting new text, etc. The person must also design or select a template for the overall briefing, as well as the layout of individual slides, and then produce each slide individually, realizing the content of each slide by embedding or linking the media elements on it, specifying the animation strategies, and then rehearsing and refining the briefing. Finally, the person has to present the briefing in front of the intended audience(s).

In this paper, we report on a system called AutoBriefer where the author of the briefing interacts with a system via a graphical user interface (GUI) to specify some of the briefing content. As before, the user can select or create an outline. However, the user doesn’t have to produce each slide, realizing the media elements on each. Instead, the AutoBriefer GUI allows the user to select and associate the media elements directly with the outline. In doing so, the user may specify the temporal layout of the content. Moreover, instead of using pre-existing media elements, the GUI also allows the user to direct personalized information agents to assemble media elements from existing (possibly distributed) databases or text collections. The user can then leave the rest of the briefing preparation to the system, by asking AutoBriefer to generate a PowerPoint briefing. In doing so, the user allows the system to plan how many slides are needed, and to plan and realize the media elements on each slide, synchronizing them spatially and temporally. Further, the user doesn’t need to present the briefing; the system itself can generate a spoken narrative for the briefing, synchronizing the narrative elements it introduces with the other media elements, producing an animated briefing as a result.

One challenge in automating this process involves automatically selecting and assembling media elements; here AutoBriefer exploits new technologies for automated summarization. In order to generate a narrative automatically, the system needs knowledge about the domain objects; here, AutoBriefer provides scalability across domains by allowing the user to specify or edit domain knowledge in the form of meta-information attributes associated with media elements. This ability to specify and edit domain knowledge in turn addresses an important bottleneck in current multimedia presentation strategies, which are difficult to adapt across domains. AutoBriefer leverages the ability to edit domain knowledge along with declarative presentation planning strategies to synthesize a narrated multimedia briefing in various presentation formats.

2. APPROACH

The design of AutoBriefer is different from most multimedia presentation planners in the way the problem is decomposed. An easy-to-use GUI is used to specify high-level structural and temporal constraints for arranging domain objects in a presentation. Information associated with these domain objects is used to generate a narrative outline. This narrative outline is then fed as data, along with a high-level communicative goal, to a presentation planning component (PPP). The planner chooses a sequence of plan operators to realize the goals associated with the outline. The plan operators constitute a declarative representation of presentation strategies for deciding on narrator animation and on spatial and temporal synchronization.

The system architecture is shown in Figure 1. The user interacts with the system via a GUI using a simple point-and-click interface. She may load in a domain model of existing domain objects, for possible editing, or she may create new domain objects using information agents.

Figure 1. System Architecture

[Figure: block diagram. The user works with the Briefing Generator user interface (Briefing GUI), which draws on domain knowledge and information agents and emits an XML narrative outline; an XML2Prop converter produces a propositional narrative outline for the Narrative Planner and Layout Generator, which generate SMIL or PowerPoint output for playback in a multimedia player or PowerPoint.]

As shown in Figure 2, the user arranges the objects along two dimensions. The first, structural dimension, allows objects to be nested under others to create a hierarchical structure. This is shown in the left frame, which uses a directory browser interaction metaphor. The user here has chosen to specify the goal of retrieving an existing media object, namely a picture of “Polay”. The user has also specified goals for the creation of a summary of events and a biography; the execution of these goals, of type ‘create’, involves the use of information agents. Finally, the user has also included a particular audio file as a coda to end the briefing.

The second, temporal dimension, allows objects to be ordered temporally. This is shown in the right frame: objects from the left frame can be dropped into numbered temporal slots in the right frame (the system lays them out, by default, based on the outline).


Figure 2. Briefing GUI

Each object can be edited by filling in meta-information in the form of values for slots (Figure 3). Here the user has typed in a caption, and has required that the system generate the running text.

Figure 3. Editing an Object


Once a subset of objects in the domain has been selected and arranged appropriately, the user may choose a style sheet for the current briefing, and then ask the system to create the briefing, in either PowerPoint (preferred, since it has a wide user base) or SMIL (Synchronized Multimedia Integration Language), an XML-based W3C standard (SMIL 2002) that can be played by standard multimedia players, such as RealPlayer (REAL 2002) and GRiNS (GRINS 2002).

The narrative outline is represented in XML as a tree whose structure represents the rhetorical structure of the briefing. An example is shown in Figure 4, which was generated automatically as a result of GUI interaction. (Here, for readability, we show a schematic representation of the narrative outline rather than the raw XML.) Each node has a label, which offers a brief textual description of the node. Each leaf node has an associated goal, which, when realized, provides content for that node. Content-level goals are of two kinds: retrieve goals, which retrieve pre-existing media objects of a particular type (text, audio, image, video) satisfying some description, and create goals, which create new media objects of these types using personalized information agents for search, summarization, etc. In addition to content-level goals, the narrative outline contains narrative-level goals, which serve to introduce descriptions of content at other nodes, including segues to content nodes, and running text and captions describing media objects at content nodes.
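
To make the structure concrete, the following Python sketch shows the data this description implies. The classes and names are ours, not AutoBriefer's (which stores the outline as XML); it is only a rendering of the node/goal vocabulary above.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Goal:
    kind: str   # 'retrieve', 'create', or a narrative kind such as 'segue'
    spec: str   # e.g. a media file path, or an agent command line

@dataclass
class OutlineNode:
    label: str                     # brief textual description of the node
    goal: Optional[Goal] = None    # leaf nodes carry a content-level goal
    children: List["OutlineNode"] = field(default_factory=list)

# A leaf corresponding to one node of Figure 4:
photo = OutlineNode("A File Photo of Victor Polay",
                    Goal("retrieve", r"D:\rawdata/polay.jpg"))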

Ordering relations reflecting temporal and spatial layout are defined on nodes in the tree. Two coarse-grained relations, seq for precedence, and par for simultaneity, are used to specify a temporal ordering on the nodes in the tree.

As shown in Figure 1, once the user of the GUI asks the system to create the briefing, the GUI passes off the narrative outline to a converter which first applies an XML parser to the outline to check for syntactic correctness. The validated outline is then transformed into propositions that can be processed by the Narrative Planner and the Layout Generator. The Narrative Planner takes the input narrative outline and works out a coherent presentation by realizing the corresponding retrieve and create goals. In particular, it fills out some content by invoking specific components for the acquisition and creation of multimedia material. It also makes decisions when needed about output media type. To improve the coherence of the briefing, the narrative planner may introduce narrative-level goals which refer to meta-information on the content of a briefing. In the case of create goals, meta-information may be easily provided by the narrative planner. For retrieve goals, this information is usually supplied by the user. In either case, narrative elements are generated from the meta-information by using shallow text generation methods (canned text). Finally, the Layout Generator generates the temporal and spatial layout of the presentation.
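
As a rough illustration of the conversion step, the Python sketch below parses an outline and flattens it to propositions. The XML schema here is invented (the paper shows only a schematic outline, not the raw XML), and the propositional form merely mirrors constraint patterns such as (Label ?id ?label) that appear in the plan operators later in this chapter.

import xml.etree.ElementTree as ET

def outline_to_propositions(xml_text: str):
    """Parse (and thereby syntax-check) the outline, then flatten it."""
    root = ET.fromstring(xml_text)   # raises ParseError if not well-formed
    props = []

    def walk(elem, parent_id):
        node_id = elem.get("id")
        props.append(("Label", node_id, elem.get("label")))
        props.append(("Parent", node_id, parent_id))
        if elem.get("goal"):         # e.g. "retrieve ..." or "create ..."
            props.append(("Goal", node_id, elem.get("goal")))
        for child in elem.findall("node"):
            walk(child, node_id)

    walk(root, "root")
    return props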

Both the narrative planner and the layout generator are based on the PPP multimedia presentation planning tool (André and Rist 1996). The PPP tool combines two components: an HTN-planner that realizes the Narrative Planner and a component for spatial and temporal reasoning that is used in the Layout Generator. While in the original approach of André and Rist a presentation is planned on the basis of a domain knowledge base, the starting point for the approach presented here is the narrative outline as provided by the user, which is then refined and synchronized.


Figure 4. Narrative Outline

Peru Action Brief
  1 Preamble
  2 Situation Assessment
    2.1 Chronology of Events
      2.1.2 Latest Document Summary
            create("summarize-generic compression .10/peru/p32.txt")
    2.2 Biographies
      2.2.1 Profile of Victor Polay
        2.2.1.1 A File Photo of Victor Polay
                retrieve("D:\rawdata/polay.jpg")
        2.2.1.2 Profile of Victor Polay
                create("summarize -bio -length 350 -span multi -person Victor Polay -out table/peru/*")
  3 Coda
    "This briefing has assessed aspects of the situation in Peru. Overall, crisis appears to be worsening."

<seq>
  <par>1</par>
  <par>2.1.2</par>
  <par>2.2.1.1 2.2.1.2</par>
  <par>3</par>
</seq>

3. INTELLIGENT MULTIMEDIA PRESENTATION PLANNING

The author of a briefing may choose to flesh out as much of the narrative as desired. For instance, she may indicate exactly how a certain image should be labeled, but is not required to do so. Likewise, she may specify certain narrative goals, or leave it entirely to the Narrative Planner component to introduce them. If domain objects should appear in a certain order in the final presentation, she may also provide the desired temporal ordering relations; the complete temporal layout, however, need not be specified by the user. In addition to the narrative outline, the user may input a number of presentation constraints, such as the degree of detail, the style of presentation (e.g. agent-based versus audio presentation) or the output format, e.g., SMIL or PowerPoint. The task of the presentation planner is then to work out a presentation based on the user’s input. This includes the realization of the create, retrieve and narrative goals as well as the determination of the spatial and temporal layout. The result of the planning process is a document in the required format.


3.1 Basic Technology

To accomplish the tasks sketched above, we rely on our earlier work on presentation planning. The essential idea of this approach is to formalize, as operators of a planning system, the action sequences for composing multimedia material and for designing scripts that present this material to the user. The header (i.e., effect) of a planning operator refers to a complex communicative goal (e.g. to provide a biography of a certain person), whereas the expressions in the body of the operator indicate which acts have to be executed in order to achieve this goal (e.g. to show a picture of that person).

The input goal to the presentation planner is a complex presentation goal, e.g. to provide a briefing. To accomplish this goal, the planner looks for operators whose headers subsume it. If such an operator is found, all expressions in the body of the operator will be set up as new subgoals. During the decomposition process, a presentation is built up by collecting and expanding the nodes of the user’s narrative outline. To accomplish this task, we start from an empty output file that is continuously augmented with SMIL expressions or expressions in an XML-based markup language that serves as an intermediate format for the PowerPoint slides to be generated. The planning process terminates when all subgoals have been expanded to elementary execution acts or have already been worked out by the user. Elementary execution acts include retrieve or create goals as well as commands, such as SAddString, that augment the output file with expressions in the required language.
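
The loop below is our own schematic Python rendering of this decomposition, not PPP's implementation: an operator whose header subsumes the goal contributes its body expressions as subgoals, and elementary acts append directly to the output file (modeled here as a list of strings).

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Operator:
    matches: Callable[[str], bool]    # does the operator's header subsume the goal?
    body: Callable[[str], List[str]]  # body expressions to achieve the goal

def expand(goal: str, operators: List[Operator], output: List[str]) -> None:
    if goal.startswith("SAddString "):            # elementary execution act
        output.append(goal[len("SAddString "):])  # augment the output file
        return
    op = next(op for op in operators if op.matches(goal))
    for subgoal in op.body(goal):                 # body becomes new subgoals
        expand(subgoal, operators, output)

# Toy run: a 'briefing' goal decomposes into two elementary acts.
ops = [Operator(lambda g: g == "briefing",
                lambda g: ["SAddString <smil>", "SAddString </smil>"])]
out: List[str] = []
expand("briefing", ops, out)                      # out == ['<smil>', '</smil>']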

3.2 Narrative Planning

As mentioned above, the system can construct a narrative to accompany the briefing. Narrative goals are generated to cover captions, running text, and segues. The captions and running text are realized from information provided in the narrative outline. In the case of create goals, the running text and captions may be easily generated by the system, as a by-product of creating a media object. In the case of retrieve goals, the objects retrieved may not have any meta-information, in which case a default caption and running text are generated. For example, if a retrieve goal associated with a picture lacked a user-supplied caption, the Narrative Planner would use the narrative outline as the basis for constructing a caption. This flexibility allows the system to fill in what the user left unspecified. Clearly, a system's explanatory narrative will be enhanced by the availability of rich meta-information.

The segues are realized by the system. For example, an object with a label slot “A biography of Victor Polay” could result in a generated segue “Here is a biography of Victor Polay”. The Narrative Planner, when providing content for narrative nodes, uses a variety of canned text patterns. For the above example, the pattern would be “Here is ?label”, where ?label is a variable whose value is the label of the current node.
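
A one-line Python sketch of the pattern instantiation (the ?label syntax is the paper's; the helper function is ours):

def realize_segue(pattern: str, label: str) -> str:
    # Substitute the node's label for the ?label variable in the pattern.
    return pattern.replace("?label", label)

realize_segue("Here is ?label", "a biography of Victor Polay")
# -> 'Here is a biography of Victor Polay'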

Segue goals are generated automatically by the system and integrated into the narrative outline. We always introduce a segue node at the beginning of the presentation (called a preamble node), which provides a segue covering the root of the narrative outline tree. A segue node is also produced at the end (called a coda). (Both preamble and coda can of course be specified by the author if desired.)

For introducing intervening segue nodes, we use the following algorithm based on the distance between nodes and the height in the narrative outline tree. We traverse the non-narrative leaves of the tree in their temporal order, evaluating each pair of adjacent nodes A and B, where A precedes B temporally. A segue is introduced between nodes A and B if either (a) the maximum of the two distances from A and B to their least common ancestor is greater than 3 nodes, or (b) the sum of the two distances from A and B to the least common ancestor is greater than 4 nodes. This is less intrusive than introducing segues at random or between every pair of successive nodes, and appears to perform better than introducing a segue at each depth of the tree.
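
The following Python sketch transcribes this heuristic. It is our reading of the algorithm; in particular, the distance measure (counting steps up to the least common ancestor) is one plausible interpretation of "distance in nodes".

def ancestors(node, parent):
    # Chain from the node up to the root, following parent links.
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def needs_segue(a, b, parent):
    pa, pb = ancestors(a, parent), ancestors(b, parent)
    lca = next(n for n in pa if n in set(pb))   # least common ancestor
    da, db = pa.index(lca), pb.index(lca)       # distances from a and b to the LCA
    return max(da, db) > 3 or da + db > 4

def segue_points(leaves, parent):
    # 'leaves' are the non-narrative leaves in temporal order; a segue is
    # introduced between each adjacent pair that satisfies the criterion.
    return [(a, b) for a, b in zip(leaves, leaves[1:]) if needs_segue(a, b, parent)]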

In case an animated presenter is used, the narrative planner also decides on the non-verbal communicative means to be employed. In particular, we make use of facial expressions and body gestures to convey the discourse structure of the presentation and the character’s emotions; see also (Poggi et al., this volume) and (Theune et al., this volume). Since a deliberate use of communicative means requires detailed knowledge about the content of the presentation, it is hard to determine appropriate non-verbal gestures for the presentation of content associated with retrieve goals and create goals (which rely heavily on pre-authored multimedia material). The use of non-verbal gestures for the realization of narrative goals essentially depends on their type. For instance, segues that relate to a change in topic are accompanied by a shift in posture; see also (Cassell et al. 2001).

Strategies for expanding the narrative content are coded by means of plan operators. In the following, we present a simple plan operator which may be used to work out a narrative goal, namely to present a new PowerPoint slide. The body includes a communicative gesture as well as a canned text pattern which is instantiated with the value for ?label and uttered by the character. It should be obvious how this operator could be used to realize the segues for several of the nodes in the narrative outline in Figure 4.

(define-plan-operator
  :header (A0 (PresentPersona ?id ?SlideNumber))
  :constraints (Label ?id ?label)
  :inferiors ((A1 (SAddPPCode ("PLAY OneHand;;")))
              (A2 (SAddPPCode ("SAY Here is a " ?label ";;")))))

The presentation operators also specify which media type is used to realize a certain goal. By default, we attempt to realize narrative goals as speech, using rules on the length of the text, and on whether it can be truncated to less than 250 bytes, to decide when to use text-to-speech. Captions are always realized as text (i.e., they have a text realization and a possible audio realization).
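
As a sketch, the default rule might be rendered as below; the 250-byte bound is from the text, while the exact shape of the test is our assumption.

def choose_medium(text: str, truncatable_under_250: bool) -> str:
    # Prefer speech for narrative goals; fall back to text when the material
    # is too long to speak and cannot be truncated to under 250 bytes.
    if len(text.encode("utf-8")) < 250 or truncatable_under_250:
        return "speech"
    return "text"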


3.3 Layout Generation

As described above, we are able to generate both SMIL and PowerPoint presentations. The generation of PowerPoint presentations starts from an XML-based markup language that represents both the contents of a PowerPoint slide and the accompanying comments and gestures by the presenter. The XML representation is then converted into PowerPoint presentations making use of Bridge2Alpha from IBM alphaWorks, a tool that allows Java programs to communicate with ActiveX objects. While the material of a presentation appears on the slide pages, suggestions regarding its presentation are included in the note pages. Essentially, these suggestions are related to AutoBriefer’s narrative goals. The spatial layout of the slides is given by the corresponding PowerPoint style sheet and can be directly inferred from the hierarchical structure as indicated by the human author.

When generating SMIL presentations, we also allow for dynamic media, such as video and audio, all of which have to be displayed in a spatially and temporally coordinated manner and need to be synchronized with the agents’ communicative gestures. For instance, a character may point to a certain object in a video while it is being played. In order to relieve the user of the burden of completely working out a spatial and temporal layout, the following extensions of the presentation planner (André and Rist 1996) have become necessary:

• The formulation of temporal and spatial constraints in the plan operators. We distinguish between metric and qualitative constraints. Metric temporal and spatial constraints appear as metric (in-)equalities, e.g. (5 ≤ Bottom VideoRegion – Top VideoRegion). Qualitative temporal constraints are represented in an “Allen-style” fashion, which allows for the specification of thirteen temporal relationships between two named intervals, e.g. (SpeakInterval (During) PointInterval). Qualitative spatial constraints are represented by a set of topological relations, such as LeftOf, CenterHor or TopAlign. Note that the plan operators include general temporal constraints that should be satisfied for a certain document type, while the temporal constraints the user specifies with the GUI refer to a particular document instance.

• The development of a mechanism for designing a spatial and temporal layout. We collect all spatial and temporal constraints during the presentation planning process, combine them with the temporal constraints specified by the user for the document to be generated, and use the incremental constraint solving toolkit Cassowary (Borning et al. 1997) to determine a consistent spatial and temporal layout, which is then represented as a SMIL document. In case of a conflict between the constraints specified by the GUI user and the constraints specified by the author of plan operators, the user’s constraints are given preference, but a warning is produced (see the sketch below).
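
The bookkeeping behind this conflict rule can be sketched in Python as follows. This is illustrative only: the real system hands the pooled constraints to Cassowary, and the conflict test is left abstract here.

import warnings

def merge_constraints(operator_cs, user_cs, conflicts):
    # 'conflicts(a, b)' should return True if constraints a and b cannot both
    # hold. User constraints are always kept; a clashing operator constraint
    # is dropped with a warning, as described above.
    merged = list(user_cs)
    for c in operator_cs:
        if any(conflicts(c, u) for u in user_cs):
            warnings.warn("operator constraint %r overridden by user constraint" % (c,))
        else:
            merged.append(c)
    return merged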

In the following, we present a plan operator which may be used to synchronize text, spoken voice and video. The body of the plan operator contains a complex action (PresentVideo) that requires further expansion, and two elementary actions (SAddSpeech and SAddSmilCode). SAddSmilCode extends the SMIL output file with an animation that visualizes the talking. SAddSpeech causes the Festival speech synthesizer (Taylor et al. 1998) to generate an audio file for ?comment, which is included in the SMIL output file as well.

The title should be visible while the comment is spoken and the video is playing. In particular, the verbal comment starts one time unit after the display of the title, which precedes the video. After the video has played, the title remains visible for one more time unit. The spatial constraints specify where the regions for the title (A1) and the video sequence (A3) should be located with respect to the region corresponding to the superior presentation segment (A0).

(define-plan-operator
  :header (A0 (PresentSegment ?segment))
  :constraints (*and* (BELP (Title ?segment ?title))
                      (BELP (Comment ?segment ?comment))
                      (BELP (Video ?segment ?video)))
  :inferiors (A1 (SAddSmilCode (?title)))
             (A2 (SAddSpeech Festival ?comment))
             (A3 (PresentVideo (?video)))
  :temporal (1 <= Begin A2 - Begin A1 <= 1)
            (A2 (Before) A3)
            (1 <= End A1 - End A3 <= 1)
  :spatial (CenterHor A1)
           (AlignTop A1)
           (CenterHor A3)
           (1 <= Bottom A1 - Top A3 <= 1))

Note that the plan operators do not necessarily represent a complete specification of the temporal and spatial layout of the resulting presentation, but only include specific design constraints. For instance, the plan operator above does not specify how long the title should be displayed, but only that it should start one time unit earlier and end one time unit later than the video. The advantage of our approach is that a consistent schedule is generated automatically on the basis of the constraints specified by the author of the plan operators and the user of the GUI shown in Figure 2.

4. EXAMPLE OUTPUT

In the following, we will present two different multimedia briefings that may be generated (among many other variations) from one and the same narrative outline: a PowerPoint slide show that makes use of an animated character and a SMIL-based briefing that just employs synthesized audio.

We start from the narrative outline discussed in Section 2. It includes two create goals, involving agents that carry out summarization functions: a multi-document summarizer (Mani and Bloedorn 1999) and a biographical summarizer (Schiffman et al. 2001).


Figure 5 shows the XML-representation of the first slide of the PowerPoint presentation, which was created automatically by the Narrative Planner.

Figure 5. An Introductory Slide Represented by the Narrative Planner in XML

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE Presentation SYSTEM "presentation.dtd">
<Presentation>
  <Slide>
    <Title>Peru Action Brief</Title>
    <Content>
      <S>Preamble</S>
      <S>Situation Assessment
        <S>Chronology of Events</S>
        <S>Biographies</S>
      </S>
      <S>Coda</S>
    </Content>
    <Persona>
      SHOW kaufmann, Kaufmann4.31.acs, 1,69;;
      PLAY Tie;;
      PLAY Normal;;
      PLAY OneHand;;
      SAY In this briefing I will go over the situation assessment.;;
      PLAY Normal;;
      SAY This will cover an overview of the chronology of events and a biography of Victor Polay.;;
      PLAY SpeakGiveTurn;;
      SLIDE 2=SAY Next slide please.
    </Persona>
  </Slide>
  ...
</Presentation>

The narrative goals include both natural language utterances and the corresponding gestures to illustrate the contents of the corresponding slide. The final PowerPoint presentation generated is shown in Figure 6. Here we show screen dumps of the six PowerPoint segments produced, which also include a virtual speaker. To enable this style of presentation, we make use of Microsoft Narrator, a tool that allows for the enhancement of PowerPoint slides by Microsoft agents that narrate and automatically advance through each slide.


Figure 6. PowerPoint Presentation

In this briefing I will go over the situation assessment. This will cover an overview of the chronology of events and a biography of Victor Polay.

Here is the latest document summary.

Here is an overview of the chronology of events.

Next, a biography of Victor Polay.


Victor Polay, also known as Comandante Rolando, is the Tupac Amaru founder, a Peruvian guerrilla commander, a former rebel leader, and the Tupac Amaru rebels' top leader. He studied in both France and Spain. His wife is Rosa Polay and his mother is Otilia Campos de Polay. His associates include Alan Garcia.

This briefing has assessed aspects of the situation in Peru. Overall, the crisis appears to be worsening.

The same narrative outline can be used to plan a presentation in SMIL. The corresponding presentation generated in SMIL is shown in Figure 7.

Figure 7. SMIL Presentation

In this briefing I will go over the situation assessment. This will cover an overview of the chronology of events and a biography of Victor Polay.

Here is an overview of the chronology of events.


Here is the latest document summary.

Victor Polay, also known as Comandante Rolando, is the Tupac Amaru founder, a Peruvian guerrilla commander, a former rebel leader, and the Tupac Amaru rebels' top leader. He studied in both France and Spain. His wife is Rosa Polay and his mother is Otilia Campos de Polay. His associates include Alan Garcia.

Next, a biography of Victor Polay.

This briefing has assessed aspects of the situation in Peru. Overall, the crisis appears to be worsening.

5. RELATED WORK

There is a fair amount of work on automatic authoring of multimedia presentations. (Wahlster et al. 1993), (Dalal et al. 1996) and (André and Rist 1996) aim at the fully automated generation of documents. While these approaches make use of their own display format, more recent work has adopted SMIL as a useful basis for multimedia presentations; see, e.g., (Geurts et al. 2001) and (Bes et al. 2001). Our work is distinguished in several ways from the approaches listed above. First of all, our work is not restricted to a single presentation format. Second, AutoBriefer allows for flexibly shifting the responsibility between the presentation system and the human author. For instance, it is up to the user to determine to what extent to flesh out the narrative outline. Third, our approach explicitly distinguishes between content-level and narrative-level goals. Our captions, which are very short, rely on canned text based on node labels in the initial narrative outline, or are based on shallow meta-information generated by the summarization agents along with the created media object. (Mittal et al. 1995) describe a variety of strategies for the generation of longer, more explanatory captions, some of which may be exploited in our work by deepening the level of meta-information. The distinction between content-level and narrative-level goals has proven useful, especially with respect to the generation of presentations that make use of synthesized agents to convey this meta-information. Finally, the approaches listed above concentrate on the presentation planning task and don’t exploit methods for multimedia document analysis.

(Piwek et al., this volume) investigate how the same content may be expressed by different kinds of multimodal presentation, ranging from plain texts and illustrated documents to dialogues between two animated agents. However, these functions are not yet included in a single system, as in AutoBriefer, which is able to produce a large variety of presentation variants from one and the same outline.

In its ability to leverage automatic summarization agents, our work should be clearly distinguished from work that attempts to format a summary (from an XML representation) into something akin to a PowerPoint briefing, e.g., (Nagao and Hasida 1998). Our work, by contrast, is focused on using summarization in generating briefings from an abstract outline.

6. CONCLUSION

Overall, the advantage of our approach lies in the fact that we are able to generate a large variety of briefings starting from the same narrative outline generated as a result of GUI interaction. In addition, the approach allows for a flexible adaptation of the degree of automation. The generated PowerPoint document may be used in two different ways. In the simplest case, a human speaker presents the slides herself to an audience, treating the information on the note pages as suggestions on how to deliver the material. Another possibility is the integration of one or several synthesized agents as virtual presenters (André and Rist 2000).

As mentioned at the outset, one of the goals of AutoBriefer is to save time in briefing production. In the future, we hope to evaluate the time savings by means of an experiment that would compare the amount of time naïve and expert users need to produce a variety of briefings using AutoBriefer instead of authoring directly in PowerPoint or SMIL. We expect the savings to be smaller relative to direct authoring in PowerPoint than relative to SMIL, since PowerPoint already provides a convenient interface to the user. Nevertheless, a benefit for the authoring of PowerPoint slides is expected, especially in situations where users have to produce a larger variety of PowerPoint documents for different settings of generation parameters. In addition, PowerPoint of course cannot be used to generate a presentation based on a rough outline provided by the user.

AutoBriefer has only scratched the surface of what might be possible in terms of automated presentation of briefings. In the future, more use of natural language generation would allow for a greater degree of flexibility in the generation of narrated briefings. To provide for more persuasive briefings, it would also be useful to allow the human author to specify, in the GUI interaction, the emotional and rhetorical gestures a character should convey, since this information is hard to determine automatically.

ACKNOWLEDGMENTS

The work was funded by the BMBF under contract number 01 IW 001 and DARPA’s GENOA research program, under contract number DAA-B07-99-C-C201. We would like to thank Peter Rist for designing and animating the virtual presenter. Part of this work has been conducted by the first author within the DFKI project MIAU. Thanks to Hamish Cunningham from Sheffield University for providing us with the GATE system which made it possible to install a complete version of Autobriefer at DFKI without having to rely on the commercial information extraction software used within GENOA.

REFERENCES

André, E. and Rist, T. (1996). Coping with Temporal Constraints in Multimedia Presentation Planning. Proceedings of AAAI '96, Vol. 1, 142-147. Menlo Park/Cambridge/London: AAAI Press/The MIT Press.

André, E., Rist, T., van Mulken, S., Klesen, M., and Baldes, S. (2000). The Automated Design of Believable Dialogues for Animated Presentation Teams. In Cassell, J., Sullivan, J., Prevost, S., and Churchill, E. (eds.), Embodied Conversational Agents, 220-255. Cambridge, MA: MIT Press.

André, E. and Rist, T. (2001). Controlling the Behavior of Animated Presentation Agents in the Interface: Scripting versus Instructing. AI Magazine, 22(4):53-66.

Bes, F., Jourdan, M., and Khantache, F. (2001). A Generic Architecture for Automated Construction of Multimedia Presentations. The 8th International Conference on Multimedia Modeling, Amsterdam, 229-246.

Borning, A., Marriott, K., Stuckey, P., and Xiao, Y. (1997). Linear Arithmetic Constraints for User Interface Applications. Proceedings of the 1997 ACM Symposium on User Interface Software and Technology, 87-96.

Cassell, J., Nakano, Y., Bickmore, T., Sidner, C., and Rich, C. (2001). Non-verbal Cues for Discourse Structure. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, 106-115.

Dalal, M., Feiner, S., McKeown, K., Pan, S., Zhou, M., Hollerer, T., Shaw, J., Feng, Y., and Fromer, J. (1996). Negotiation for Automated Generation of Temporal Multimedia Presentations. Proceedings of ACM Multimedia '96, 55-64.

Geurts, J., van Ossenbruggen, J., and Hardman, L. (2001). Application-Specific Constraints for Multimedia Presentation Generation. The 8th International Conference on Multimedia Modeling, Amsterdam, 247-266.

GRINS (2002). The GRiNS player, Oratrix. www.oratrix.com


Mani, I. and Bloedorn, E. (1999). Summarizing Similarities and Differences Among Related Documents. Information Retrieval 1(1), 35-67.

Mani, I., Gates, B., and Bloedorn, E. (1999) Improving Summaries by Revising Them. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, MD, 558-565.

Mittal, V., Roth, S., Moore, J., Mattis, J., and Carenini, G. (1995) Generating Explanatory Captions for Information Graphics. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI'95), 1276-1283.

Nagao, K. and Hasida, K. (1998). Automatic Text Summarization Based on the Global Document Annotation. Proceedings of COLING '98, Montreal, 917-921.

Piwek, P., Power, R., Scott, D., and van Deemter, K. (2003). Generating Multimedia Presentations: From Plain Text to Screenplay, this volume.

Poggi, I., Pelachaud, C., de Rosis, F., De Carolis, B., and Carofiglio, V. (2003). Greta. A Believable Intelligent Embodied Conversational Agent, this volume.

REAL (2002). www.real.com

Schiffman, B., Mani, I., and Concepcion, K. (2001). Producing Biographical Summaries: Combining Linguistic Knowledge with Corpus Statistics. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL'2001), 450-457.

SMIL (2002). Synchronized Multimedia Integration Language. www.w3.org/AudioVideo/

Taylor, P., Black, A., and Caley, R. (1998). The Architecture of the Festival Speech Synthesis System. Proceedings of the Third ESCA Workshop on Speech Synthesis, Jenolan Caves, Australia, 147-151.

Theune, M., Heylen, D., and Nijholt, A. (2003). Generating Embodied Information Presentations, this volume.

Wahlster, W., André, E., Finkler, W., Profitlich, H.-J., and Rist, T. (1993). Plan-Based Integration of Natural Language and Graphics Generation. Artificial Intelligence, 63:387-427.