Informally Prototyping Multimodal, Multidevice User Interfaces
By
Anoop Kumar Sinha
B.S. (Stanford University) 1996
M.S. (University of California, Berkeley) 1997
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Engineering-Electrical Engineering and Computer Sciences
in the
GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY
Committee in charge:
Professor James A. Landay, Chair
Professor John C. Canny
Professor Robert E. Cole
Fall 2003
The dissertation of Anoop Kumar Sinha is approved:
Chair Date
Date
Date
University of California, Berkeley
Fall 2003
Abstract
Informally Prototyping Multimodal, Multidevice User Interfaces
by
Anoop Kumar Sinha
Doctor of Philosophy in Engineering-Electrical Engineering and Computer Sciences
University of California, Berkeley
Professor James A. Landay, Chair
Increasingly, it is important to look at the end-user’s tool of the future not as
a solitary PC, but as a diverse set of devices, ranging from laptops to PDAs to tablet
computers. Some of these devices do not have a keyboard and mouse, and thus
multimodal interaction techniques, such as pen input and speech input, will be
required to interface with them. Interaction designers are beginning to be faced with
the challenge of creating interfaces that target this style of interaction. Our study of
their interface design practice uncovered a lack of processes and tools to help
them.
This dissertation covers the motivation, design, and development of
CrossWeaver, a tool for helping these designers prototype multimodal, multidevice
user interfaces. This tool embodies the informal prototyping paradigm, leaving
design representations in an informal, sketched form and creating a working
prototype from these sketches. Informal prototypes created with CrossWeaver can
run across multiple standalone devices simultaneously, processing multimodal input
from each one. CrossWeaver captures all of the user interaction when running a test
of a prototype. This input log can quickly be viewed for the details of the users’
multimodal interaction, and it can be replayed across all participating devices, giving
the designer information to help him or her iterate the interface design.
Our evaluation of CrossWeaver with professional designers has shown that
we have created an effective tool for early creative design of multimodal,
multidevice user interfaces. CrossWeaver dovetails with existing design processes
and can assist in a number of current design challenges.
Table of Contents
Table of Contents ____________________________________________________ ii
List of Figures_______________________________________________________ v
List of Tables ______________________________________________________ xii
Acknowledgments _________________________________________________ xiii
1 Introduction ____________________________________________________ 1
1.1 Research Goals ______________________________________________ 2
1.2 Drawbacks of Current Methods _________________________________ 3
1.3 Advantages of the Proposed Method _____________________________ 4
1.4 Programming by Illustration____________________________________ 5
1.5 Thesis Statement_____________________________________________ 5
1.6 Design Range _______________________________________________ 6
1.7 Contributions ______________________________________________ 11
1.8 Dissertation Outline _________________________________________ 12
2 Related Work __________________________________________________ 14
2.1 Commercial Prototyping Tools ________________________________ 14
2.2 Informal User Interfaces______________________________________ 17
2.3 Multimodal User Interfaces ___________________________________ 21
2.4 Programming by Demonstration _______________________________ 24
3 Field Studies ___________________________________________________ 27
3.1 Interaction Designers ________________________________________ 27
3.2 Game Designers ____________________________________________ 30
3.3 Movie Designers____________________________________________ 32
3.4 Implications from Field Studies ________________________________ 34
4 Multimodal Theater _____________________________________________ 36
4.1 Multimodal Photo Album_____________________________________ 37
4.2 Multimodal In-Car Navigation System __________________________ 40
4.3 Design Implications for Multimodal Design Tools _________________ 42
5 Design Evolution of CrossWeaver __________________________________ 44
6 First Interactive Prototype of CrossWeaver ___________________________ 56
6.1 Defining Interaction in CrossWeaver____________________________ 58
6.2 Matching the Designers’ Mental Models _________________________ 67
6.3 Informal Evaluation of the First Interactive Prototype_______________ 69
6.4 Design Implications _________________________________________ 72
7 CrossWeaver’s Final Implementation _______________________________ 74
7.1 CrossWeaver Final Implementation Definitions ___________________ 75
7.2 Design Mode ______________________________________________ 76
7.3 Test Mode_________________________________________________ 86
7.4 Analysis Mode _____________________________________________ 89
7.5 Final Prototype Summary_____________________________________ 92
8 Evaluation_____________________________________________________ 94
8.1 Recruitment _______________________________________________ 94
8.2 Experimental Set-Up ________________________________________ 96
8.3 Pre-Test Questionnaire _______________________________________ 96
8.4 Training __________________________________________________ 97
8.5 Protocol___________________________________________________ 97
8.6 Tasks_____________________________________________________ 98
8.7 Post-Test Questionnaire ______________________________________ 99
8.8 Designers’ Profiles __________________________________________ 99
8.9 Designers’ Case Studies _____________________________________ 103
8.10 Designers’ Post-Test Survey Results ___________________________ 115
8.11 Designers’ Post-Test Questionnaire Results _____________________ 119
8.12 Design Issues _____________________________________________ 122
8.13 Evaluation Summary _______________________________________ 124
9 Implementation and Limitations___________________________________ 126
9.1 Implementation Details _____________________________________ 126
9.2 Limitations _______________________________________________ 130
10 Future Work and Conclusions __________________________________ 133
10.1 Future Work ______________________________________________ 133
10.2 Contributions _____________________________________________ 135
10.3 Conclusions ______________________________________________ 137
Appendix A. Evaluation Materials ____________________________________ 139
A.1 Demo Script_________________________________________________ 139
A.2 Participant Background Questionnaire ____________________________ 143
A.3 User Interface Designer Pre-Test Questionnaire _____________________ 144
A.4 User Interface Designer Post-Test Survey__________________________ 145
A.5 User Interface Designer Post-Test Questionnaire ____________________ 147
A.6 Consent Form _______________________________________________ 148
A.7 Participant’s Designs Built Using CrossWeaver During User Testing ____ 149
References _______________________________________________________ 167
List of Figures
Figure 1-1. Four motivating applications from the academic literature. ...................10
Figure 3-1. Storyboard showing a multimodal map navigation system for an in-car
dash.....................................................................................................................28
Figure 3-2. Artifacts produced by game designers include bubble diagrams and
characteristic storyboards. ..................................................................................30
Figure 3-3. Movie designers have a formal storyboarding process encompassing
annotations for camera, character, and director instructions. .............................33
Figure 4-1. Multimodal Photo Album Room simulation. .........................................38
Figure 4-2. User testing a Multimodal in-car navigation system (left) with a Wizard
of Oz using the Speech Command Center (right). .............................................40
Figure 5-1. Design evolution of CrossWeaver. .........................................................44
Figure 5-2. Early design sketch for CrossWeaver’s interface for creating
representations of multimodal input. ..................................................................45
Figure 5-3. Early design sketch for CrossWeaver’s interface showing a potential
way of representing “Operations”. .....................................................................46
Figure 5-4. Early design sketch for CrossWeaver’s interface showing the
representation of a “Selection Region” in which a command will be active. ....47
Figure 5-5. The first interactive prototype of CrossWeaver.....................................48
Figure 5-6. The design mode in the Collaborative CrossWeaver Prototype. ............50
Figure 5-7. The test mode browsers in the Collaborative CrossWeaver Prototype. .51
Figure 5-8. A design sketch of CrossWeaver before the final implementation. .......52
Figure 5-9. The Design Mode of CrossWeaver in the final implementation. ...........53
Figure 5-10. The Test Mode of CrossWeaver in the final implementation...............54
Figure 5-11. The Analysis Mode of CrossWeaver in the final implementation. ......55
Figure 6-1. The first prototype of CrossWeaver shows two scenes in a multimodal
application and a transition, representing various input modes, between them.
Transitions are allowed to occur in this design when the user types ‘n’, writes
‘n’, or says ‘next’................................................................................................56
Figure 6-2. A participant executes the application in Figure 6-1 in the multimodal
browser. The browser shows the starting scene (a); the participant then draws
the gesture “n” on the scene (b); the browser then transitions to the next pane in
the multimodal storyboard shown in (c).............................................................57
Figure 6-3. The first CrossWeaver prototype shows thumbnails representing
reusable operations (top), scenes that have imported images (left and right), a
scene that targets screen and audio output (left) and a scene that targets PDA
output (right).......................................................................................................59
Figure 6-4. The available output devices for scenes: screen output (default), audio,
PDA, and printer (future work). These icons are dragged onto scenes to specify
cross device output. ............................................................................................60
Figure 6-5. The available input modes for transitions in the first CrossWeaver
prototype: mouse gesture, keyboard, pen gesture, speech input, and phone
keypad input. These are dragged onto transition areas to specify multimodal
interaction. ..........................................................................................................61
Figure 6-6. (a) The transition specifies a keyboard press ‘n’ to move to the next
scene or a gesture ‘n’ via pen input or a speech command ‘next’. (b) With the
two bottom elements grouped together, the transition represents either a
keyboard press ‘n’ by itself or the pen gesture ‘n’ and the speech command
‘next’ together synergistically. ...........................................................................62
Figure 6-7. The designer designates a storyboard sequence as a specific operation by
stamping it with the appropriate operation primitive icon. There are six basic
primitives that can be used, from top to bottom: defining adding an object,
defining deletion, defining a specific color change, defining a view change
(zoom in, zoom out, rotate, or translation), defining an animation path, or
defining a two point selection (as in a calculate distance command).................63
Figure 6-8. The designer designates a pushpin as a reusable component by creating a
storyboard scene and dragging the “+/add object” icon on to it. A pushpin can
then be added in any scene by selecting a location and using any of the input
modes specified (e.g., pressing ‘p’ on keyboard or saying ‘pin’ or drawing the
‘p’ gesture)..........................................................................................................64
Figure 6-9. The designer designates coloring blue as a reusable operation by
creating a storyboard scene and dragging the “define color” icon onto it. A
selected object in a scene can be colored blue in any scene by any of the input
modes specified (e.g., clicking on the object and pressing ‘b’ on the keyboard or
saying ‘make blue’). ...........................................................................................65
Figure 6-10. The designer defines zooming in and out as separate operations by
drawing examples of growing and shrinking with any example shape and
stamping the view operation icon onto them. These operations are triggered in
the browser by the input operations in the transition between them (e.g.,
pressing ‘z’ on the keyboard, gesturing ‘z’ with the pen, or saying ‘zoom in’
and pressing ‘o’ on the keyboard, gesturing ‘o’, or saying ‘zoom out’). ...........66
Figure 6-11. The user can click anywhere and say “add pin” to add the reusable
pushpin component, as defined in Figure 6-8, at the clicked point. ...................67
Figure 6-12. The user triggers the make blue color operation, as defined in Figure
6-9, by selecting any object and saying “make blue”.........................................68
Figure 7-1. The CrossWeaver design mode’s left pane contains the storyboard,
which is made up of scenes and input transitions. The right pane contains the
drawing area for the currently selected scene.....................................................75
Figure 7-2. A scene in the storyboard contains (a) a thumbnail of the drawing, (b)
device targets and text to speech audio output, (c) input transitions showing the
natural inputs necessary to move from scene to scene, including mouse click,
keyboard gesture, pen gesture, and speech input, and (d) a number identifying
the scene and a title.............................................................................................78
Figure 7-3. Here we specify an input region, a dashed green area in the pane (the
circles), in which linked gesture commands must happen to follow the
transitions. ..........................................................................................................80
Figure 7-4. CrossWeaver’s comic strip view shows the storyboard in rows. Arrows
can be drawn (as shown) or can be turned off....................................................81
Figure 7-5. Grouping two scenes and creating an “Operation.” Based on the
difference between the scenes, this operation is inferred as the addition of a
building to a scene, triggered by any of the three input modes in the transition
joining the two scenes. .......................................................................................83
Figure 7-6. A global transition points back to the scene from which it started. The
third input panel specifies that a keyboard press of ‘h’ or a gesture of ‘h’ or a
spoken ‘home’ on any scene will take the system back to this starting scene of
the storyboard. ....................................................................................................85
Figure 7-7. A bitmap image of a map has been inserted into the scene. A typed text
label has also been added. These images co-exist with strokes, providing
combined formal and informal elements in the scene. Images can be imported
from the designer’s past work or an image repository. ......................................87
Figure 7-8. Clicking on the “Run Test…” button in design mode brings up the Test
Mode Browser, which accepts mouse, keyboard, and pen input. (Top) In the
first sequence, the end user gestures ‘s’ on the scene and the scene moves to the
appropriate scene in the storyboard. (Bottom) In the second sequence, the user
accesses the ‘add building’ operation, adding buildings to the scene. This is
occurring in the standalone browser running on device PDA #0, as identified by
the ID in the title bar of the window. Pen recognition and speech recognition
results come into the browser from separate participating agents......................88
Figure 7-9. CrossWeaver’s Analysis Mode shows a running timeline of all of the
scenes that were shown across all of the devices. It also displays the interaction
that triggered changes in the state machine. The red outline represents the
current state in the replay routine. Pressing the play button steps through the
timeline step-by-step replaying the scenes and inputs across devices. The
timestamp shows the clock time of the machine running when the input was
made. ..................................................................................................................91
Figure 8-1. Diagram of the Experimental Setup. ......................................................97
Figure 8-2. Participant #4’s storyboard for Task #1................................................105
Figure 8-3. Participant #4’s storyboard for Task #2................................................106
Figure 8-4. Participant #1’s storyboard for Task #1................................................109
Figure 8-5. Participant #1’s storyboard for Task #2................................................111
Figure 8-6. Participant #9’s storyboard for Task #1................................................112
Figure 8-7. Participant #9’s storyboard for Task #2................................................115
Figure 8-8. Rating of CrossWeaver’s functional capability. ...................................117
Figure 8-9. Rating of CrossWeaver’s ease of use. ..................................................118
Figure 8-10. Rating of CrossWeaver’s understandability. ......................................119
Figure 9-1. CrossWeaver Architecture....................................................................126
Figure A-1. Participant #1, Task #1 ........................................................................149
Figure A-2. Participant #1, Task #2 ........................................................................150
Figure A-3. Participant #2, Task #1 ........................................................................151
Figure A-4. Participant #2, Task #2 ........................................................................152
Figure A-5. Participant #3, Task #1 ........................................................................153
Figure A-6. Participant #3, Task #2 ........................................................................154
Figure A-7. Participant #4, Task #1 ........................................................................155
Figure A-8. Participant #4, Task #2 ........................................................................156
Figure A-9. Participant #5, Task #1 ........................................................................157
Figure A-10. Participant #5, Task #2. .....................................................................158
Figure A-11. Participant #6, Task #1 ......................................................................159
Figure A-12. Participant #6, Task #2 ......................................................................160
Figure A-13. Participant #7, Task #1 ......................................................................161
Figure A-14. Participant #7, Task #2 ......................................................................162
Figure A-15. Participant #8, Task #1 ......................................................................163
Figure A-16. Participant #8, Task #2 ......................................................................164
Figure A-17. Participant #9, Task #1 ......................................................................165
Figure A-18. Participant #9, Task #2 ......................................................................166
List of Tables
Table 1-1. What is Multimodal, Multidevice? ............................................................7
Table 1-2. Motivating applications for multimodal, multidevice interface design. ....9
Table 4-1. Multimodal Theater Simulations .............................................................36
Table 8-1. Participant designers’ backgrounds in the final CrossWeaver user tests
..........................................................................................................................102
Table 8-2. Results of the Post Test Survey given to the participants. .....................116
Table 8-3. Selected participant’s general comments from the post-test questionnaire.
..........................................................................................................................122
Acknowledgments
Primary thanks go to Prof. James Landay, the head advisor for this thesis and
the biggest proponent of Informal Prototyping as a paradigm in User Interface
Design. Prof. Landay’s insights and constructive comments have improved this
work in many ways.
The other two readers of this dissertation, Prof. John Canny and Prof. Bob
Cole (Haas), have been inspirational teachers and advisors throughout my years in the
Ph.D. program. Prof. Jen Mankoff and Prof. Ken Goldberg,
members of my thesis proposal committee, have provided creative and patient
feedback on this work.
Great thanks go to the entire Group for User Interface Research (GUIR) at
UC Berkeley, who have been like brothers and sisters in my years in graduate
school. Jimmy, Jason, and Scott have been there from the very beginning for
collaboration, discussion, debate, and friendship. Jimmy has been an especially
valuable collaborator, writing some of the early CrossWeaver code and offering
many great insights along the way. Xiaodong, Hesham, Yang,
Richard, Sarah, Mark, Wai-ling, Katie, and Francis have been collaborators,
officemates, and friends whom I appreciate greatly. Corey, Amit, Gloria, Alan, and
others have been great undergraduate collaborators during the course of this
research. Prof. Marti Hearst, Prof. Anind Dey, and Dr. Rashmi Sinha have always
given extremely valuable feedback and instruction during talks about this work.
The Diva Group in the EECS Department was instrumental in the final
implementation of CrossWeaver. Michael, Heloise, John, and Steve all deserve
special thanks. Michael in particular has been a constant collaborator and colleague
from the beginning of graduate school, and I am fortunate to have benefited from
that excellent interaction while I was completing this Ph.D.
I spent one summer at SRI with a pioneering group that did work in
interactive multimodal applications. Luc Julia and Christine Halverson were
instrumental in supporting this research.
Special thanks to all of the participants in the design interviews and
evaluation. This group was extremely interesting and valuable to work with during
the course of this Ph.D.
Last but not least, thanks go to my family and friends, who traveled
with me through this journey. Mom, Dad, Gita, and Anoop P. are always my
supporters and my fans, and have believed in me more than I have believed in myself
at times. I unfortunately cannot name all friends, but Steve, Sam, Emilie, Ameet,
Lela, Hemanth, and Amir have given support and seen me grow through many years.
My wife Aparna is a wonderful life partner whom I cherish greatly.
Chapter 1
1 Introduction
Increasingly, it is important to look at the end-user’s tool of the future not as a
solitary PC, but as a diverse set of smart, cooperating devices, ranging from laptops
to PDAs to cell phones to web tablets. Some of these devices do not have a keyboard
and mouse, and thus techniques such as pen input or speech input are required to
interface with them. Applications that utilize pen, speech, and other natural input
modes are called multimodal applications (Oviatt, Cohen et al. 2000). At present,
various networking infrastructures such as Bluetooth (Bluetooth 2003) and IEEE
802.11 (IEEE 2003) are being built to enable users to collaborate with each other or
run applications that span more than one device per end-user. Applications that span
across devices are what we term multidevice applications. Together, applications
that use natural input modes and span across devices are what we term multimodal,
multidevice applications.
Many interaction designers are already faced with the challenge of
developing interfaces for multimodal, multidevice applications (Sinha and Landay
2002). The enormous increase in the number of mobile devices in use, such as
cellular phones, palm-sized devices, and in-car navigation systems, has precipitated
this (InStat 2003). Many mobile devices lack screen real estate and keyboards,
and thus push these designers towards pen and speech interfaces. At
present, there are few techniques and tools to help these designers prototype
multimodal, multidevice interfaces (Sinha and Landay 2002).
This dissertation covers the motivation, design, development, and evaluation
of CrossWeaver, a tool that allows designers to informally prototype multimodal,
multidevice interfaces. CrossWeaver embodies the informal prototyping paradigm
(Landay and Myers 2001), leaving design representations in an informal, sketched
form and creating a working prototype from these sketches. CrossWeaver allows the
designer to quickly specify device-appropriate user interfaces that use pen, speech,
mouse, or keyboard. It supports immediate testing of those interfaces with end-
users, using the desired interaction modes on the proper devices. It also allows the
designers to collect and analyze that interaction to inform their iterative designs.
1.1 Research Goals
The primary goal of this research is to show that informal prototyping
principles can be used to enable designers to creatively design multimodal,
multidevice interfaces. To date, informal prototyping tools have been created for
graphical user interfaces in SILK (Landay and Myers 1995; Landay 1996), web
interfaces in DENIM (Lin, Newman et al. 2000), speech user interfaces in SUEDE
(Klemmer, Sinha et al. 2001; Sinha, Klemmer et al. 2002), and multimedia interfaces
in DEMAIS (Bailey, Konstan et al. 2001; Bailey 2002). Multimodal, multidevice
interfaces are a new domain for informal prototyping and one for which few formal
tools even exist. This dissertation covers the development of an informal tool for
multimodal, multidevice prototyping.
1.2 Drawbacks of Current Methods
Multimodal, multidevice systems are difficult to prototype. This difficulty is
due to the complexity of the hardware and software used for multimodal design
experimentation combined with the lack of established techniques. A typical
multimodal interface today is built with a complex application programming
interface (API) that involves asynchronous processing of recognition
results, as in the Tablet PC (Microsoft 2003d), or with a distributed agent-based
architecture with messages that are defined to enable and process communication, as
in the Adaptive Agent Architecture (Cohen, Johnston et al. 1997) and Open Agent
Architecture (Moran, Cheyer et al. 1998). In both cases, multimodal interface
development involves brittle recognizers that are difficult to integrate into an
application. These recognizers require formal specification of input grammars and
lengthy logic to process and interpret results.
At present there are few tools for non-programmers to utilize recognizers and
build multimodal systems, making it difficult for non-programmer designers to
explore and envision different designs. We address this problem by building a
multimodal, multidevice user interface prototyping tool based on sketching of
interface storyboards.
1.3 Advantages of the Proposed Method
Sketching storyboards as a computer input technique builds upon designers’
experience with sketching on paper in the early stages of interface design (Landay
1996). By relying on sketching as the input technique, CrossWeaver allows non-
programmer designers to use our system.
We enable our designers to utilize multimodal recognition and multidevice
output by abstracting the underlying software, recognition systems, and hardware
required. CrossWeaver is based on the Open Agent Architecture, one of the agent
architectures that is commonly used for implementing multimodal interfaces (Moran,
Cheyer et al. 1998), but we hide this from the designers that use the system. We also
offer abstractions for speech and pen recognition systems and shield the designer
from the details of the recognizers. The recognizers in the system can easily be
substituted with Wizard of Oz operators (Kelley 1984), people who perform the
recognition and enter the results via a separate computer interface.
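This substitution works because the prototyping tool depends only on a recognizer abstraction, not on any particular engine. The sketch below illustrates the idea with hypothetical names (`Recognizer`, `WizardOfOzRecognizer`, and so on); it is not CrossWeaver's actual API:

```python
# A minimal sketch of a recognizer abstraction. The prototype's event loop
# sees only the Recognizer interface, so an automatic engine and a human
# "Wizard of Oz" operator are interchangeable.
from abc import ABC, abstractmethod

class Recognizer(ABC):
    """Maps raw input (audio, ink, ...) to a symbolic command string, or None."""
    @abstractmethod
    def recognize(self, raw_input):
        ...

class AutomaticSpeechRecognizer(Recognizer):
    def __init__(self, vocabulary):
        self.vocabulary = vocabulary  # e.g., {"next", "back", "zoom in"}

    def recognize(self, raw_input):
        # Stand-in for a real speech engine: accept only in-vocabulary words.
        word = raw_input.strip().lower()
        return word if word in self.vocabulary else None

class WizardOfOzRecognizer(Recognizer):
    """A human operator supplies the interpretation; in practice this would
    arrive over the network from a separate wizard interface."""
    def __init__(self, ask_wizard):
        self.ask_wizard = ask_wizard  # callable standing in for the wizard's UI

    def recognize(self, raw_input):
        return self.ask_wizard(raw_input)

def handle_input(recognizer, raw_input):
    # The prototype is unaware of which recognizer is plugged in.
    command = recognizer.recognize(raw_input)
    return command if command is not None else "<unrecognized>"
```

For example, `handle_input(AutomaticSpeechRecognizer({"next"}), "Next")` and `handle_input(WizardOfOzRecognizer(lambda raw: "next"), audio_clip)` both yield `"next"`; swapping implementations requires no change to the prototype itself.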
Dividing the early informal prototyping process into three phases – design,
test, and analysis – has proved successful in past informal prototyping tools
(Klemmer, Sinha et al. 2001). We have adopted this three-phase process in
CrossWeaver. Putting all three phases into the same tool simplifies the early stage
design process and enhances the designer’s ability to iterate his or her designs.
1.4 Programming by Illustration
Defining a working prototype via a set of example sketches is a process that
we term Programming by Illustration, in which there is enough information in the
sketches, the sequencing, and the designer’s annotations to create a working
prototype (Sinha and Landay 2001). Programming by Illustration is particularly
useful for user interface design, where sketching is already a key part of
the design process. Similar to many Programming by Demonstration (PBD)
techniques (Cypher 1993), Programming by Illustration relies on a specific set of
examples to represent an application. In contrast to many PBD techniques,
Programming by Illustration is an informal visual specification. It more closely
matches the informal visual language style used by user interface designers (Wagner
1990; Landay and Myers 2001). Sketched storyboarding in Programming by
Illustration takes the place of more formal template matching used in most
Programming by Demonstration techniques. In sketched storyboarding, the full
interface is specified in the scenes and transitions that are drawn in the design.
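Concretely, a sketched storyboard can be read as a simple state machine: each scene is a state, and a transition fires on any one of the alternative input events it lists. The following minimal sketch (with hypothetical names, not CrossWeaver's internal representation) encodes a two-scene design that advances when the user types 'n', gestures 'n', or says 'next':

```python
# Hypothetical sketch: a storyboard as a state machine. Scenes are states;
# each event is a (mode, value) pair, and any listed event follows the
# transition to the next scene.
storyboard = {
    "scene1": {
        ("keyboard", "n"): "scene2",
        ("pen_gesture", "n"): "scene2",
        ("speech", "next"): "scene2",
    },
    "scene2": {},  # final scene in this two-scene design
}

def run(storyboard, start, events):
    """Replay a sequence of input events, returning the scenes visited."""
    scene, visited = start, [start]
    for event in events:
        next_scene = storyboard[scene].get(event)
        if next_scene is not None:  # unmatched input leaves the scene unchanged
            scene = next_scene
        visited.append(scene)
    return visited
```

For example, `run(storyboard, "scene1", [("speech", "next")])` visits `["scene1", "scene2"]`, while an unrecognized event such as `("keyboard", "x")` simply leaves the current scene in place, mirroring how a prototype ignores inputs that match no transition.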
1.5 Thesis Statement
CrossWeaver, an informal prototyping tool, allows interface designers to
build multimodal, multidevice user interface prototypes, test those prototypes with
end-users, and collect valuable feedback informing iterative multimodal, multidevice
design.
1.6 Design Range
Our working definition of multimodal, multidevice applications, presented
below, delineates the kinds of applications for which our approach is useful. We
draw motivation from past research efforts in multimodal and multidevice interfaces.
1.6.1 What is Multimodal, Multidevice?
Multimodal systems have been viewed as an attractive area for human-
computer interaction research since Bolt’s seminal “Put That There” (Bolt 1980) for
positioning objects on a large screen using speech and pointing. The promise of
multimodal interaction has been and continues to be more natural and efficient
human-computer interaction (Cohen, Johnston et al. 1998). The multimodal design
space is growing in popularity due to the increasing accuracy of perceptual input
systems (e.g., speech recognition, handwriting recognition, vision recognition, etc.)
and the increasing ubiquity of heterogeneous computing devices (e.g., cellular
telephones, handheld devices, laptops, and whiteboard computers).
In the HCI academic community, multidevice has typically referred to the use
of a variety of devices for computing tasks (Rekimoto 1997). In industry, the term
multimodality has instead started to include the use of multiple devices for
computing tasks, and interfaces that scale or morph across these different devices
(W3C 2000). In this dissertation, multidevice is introduced as the term referring to
applications that might span multiple devices simultaneously, such as in a
collaborative application or in an application that has multiple devices for a single
user. Multimodal is a term used specifically to refer to natural input modes. In
contrast, multimedia refers to systems that incorporate multiple output modes (e.g.,
visual and audio). Although multimodal and multidevice are terms still open to
formal definition, we embrace both multiple input modalities and multidevice
interaction and propose the following working definitions:
Table 1-1. What is Multimodal, Multidevice?

Multimodal: Communication with computing systems using perceptual input modalities such as speech, pen, gesture, vision, brain wave recognition, etc., either fused (multiple modes at once) or un-fused (possibly used in place of one another). Human-computer interaction generated from our “normal” sensory interaction with the world, not based on restrictive keyboard and mouse input.

Multidevice: Applications that span heterogeneous devices cooperatively, either with multiple users, as in a collaborative application, or with one user using multiple devices simultaneously.

Multimedia: Communication from the computing system using perceptual output modes, such as visuals and audio.
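The notion of fused input can be illustrated with a small sketch: two unimodal events, a spoken phrase and a pen point, are merged into one command when they arrive close together in time. This is an invented illustration in the spirit of “Put That There”; the function names and the 1.5-second window are assumptions, not part of any system described here:

```python
# Hypothetical fusion of speech + pen into one multimodal command:
# merge two unimodal events that arrive within a small time window
# (the window length here is an arbitrary illustrative choice).

FUSION_WINDOW = 1.5  # seconds

def fuse(speech_event, pen_event):
    """speech_event: (time, phrase); pen_event: (time, (x, y))."""
    t_speech, phrase = speech_event
    t_pen, point = pen_event
    if abs(t_speech - t_pen) <= FUSION_WINDOW:
        return {"command": phrase, "location": point}  # one fused command
    return None  # too far apart in time: treat as separate, un-fused inputs

print(fuse((10.0, "put that there"), (10.4, (120, 80))))
# prints {'command': 'put that there', 'location': (120, 80)}
```

Un-fused use of the same modalities corresponds to the `None` branch: each event is handled on its own, with one modality possibly substituting for another.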
Multidevice is a broad category, ranging from handhelds to tablets to laptops
to whiteboard computers. In this dissertation, we restrict our attention to popular
existing devices, including handhelds, laptops, tablet computers, desktops, and
whiteboard computers.
Because multimodal interaction is defined so broadly, we must pick specific modalities to consider at the outset. Commercial systems currently support
speech recognition (IBM 2003; Nuance 2003) and handwriting/pen gesture
recognition (Paragraph 1999; Microsoft 2003d). These two input modes are
tractable while still being potentially rich. Thus, in this dissertation we focus on
speech and pen as input modalities. Multimedia has been explored in detail over the
last 20 years in both research (Bailey 2002; SIGMM 2003) and commercial systems
(Adobe 2003a; Macromedia 2003a). It is not a research focus of this dissertation.
1.6.2 Where is Multimodal, Multidevice useful?
Table 1-2 lists some of the motivating applications in the multimodal,
multidevice interface design space. When implemented with input mechanisms that
are different from traditional keyboard and mouse, these applications often benefit
from improved efficiency of use, immersion, realism, and natural communication
with the end-user.
Various multimodal and multidevice prototypes have already been built in
academic research and have demonstrated the benefits shown in Table 1-2. Four of
these motivating applications are shown in Figure 1-1. The Infowiz kiosk (Cheyer,
Julia et al. 1998) involves speech control of a kiosk touch screen. This provides two
methods of navigation input in one interface. The CARS navigation system (Julia
and Cheyer 1999) combines augmented reality glasses, pointing, and speech
commands to allow a user to identify buildings and landmarks while driving.
Quickset (Cohen, Johnston et al. 1997) is a map navigation system that uses speech
and pointing for military planning. The eClassroom (Abowd, Atkeson et al. 1996;
Abowd 1999) at Georgia Tech links whiteboard computers, desktop computers, and
laptops in the classroom into a classroom instruction system. This is an example of a
multimodal and multidevice application. Each of these applications required a
significant investment of time and equipment to implement. Even the earliest prototypes of these interfaces were only demonstrable after significant development by expert programmers.

Table 1-2. Motivating applications for multimodal, multidevice interface design.

Existing Examples | Novel Input | Human Concern
Map and Drawing Applications | Pointing, Drawing, and Speech | Fluidity, Speed
Games and Virtual Reality | Hand/feet Coordination | Controlling Characters, Immersion
Flight Simulation | Hand/feet Coordination, Display Panel | Practicing on Realistic Situations, Military Preparation
Computer Mediated Collaboration, Meetings, Classrooms | Gesture, Avatars, Voice, Video | Communication
1.6.3 What can CrossWeaver prototype?
CrossWeaver has been designed to prototype a wide variety of applications
similar to the motivating applications shown above. As introduced in Chapter 5, it
accomplishes this by providing the user with a generic storyboarding model, suitable
for creating scenes of an application in many different domains.
Figure 1-1. Four motivating applications from the academic literature: the CARS navigation assistant from SRI (Julia et al. 1999), the InfoWiz kiosk from SRI (Cheyer et al. 1999), QuickSet from OGI (Cohen et al. 1999), and the eClassroom from Georgia Tech (Abowd et al. 1999).

Certain applications require more functionality than a storyboard-based model can provide, such as map navigation or drawing tools, query-response systems, and interactive multi-person whiteboards. CrossWeaver has within it basic facilities for
prototyping features of map and drawing tools, such as zooming, panning, and
adding objects. CrossWeaver does not have within it general functionality for other
domains. Extensions can be made to CrossWeaver for other domains as will be
discussed in the future work section.
1.7 Contributions
CrossWeaver extends Informal Prototyping to the multimodal, multidevice
domain, giving non-programmer designers one of the first tools that they can use to
explore this design space. Specific contributions include:
Concepts and Techniques:
Extending Wizard of Oz simulation to multimodal, multidevice interface
design.
Extending the methodology of informal prototyping with electronic sketching
to the multimodal, multidevice interface domain.
Introducing a compact way of representing multimodal input and output.
Creating a storyboarding scheme that represents a testable prototype of a
multimodal, multidevice application.
Enabling the testing of an informal prototype that spans multiple devices and
uses multiple input recognizers simultaneously.
Capturing the execution of a multimodal, multidevice prototype across
multiple devices with multiple input recognizers.
Artifacts:
The first tool that can be used for the earliest phases of multimodal,
multidevice user interface design experimentation.
Implementation of the three phases (design, test, and analysis) of the early-
stage, iterative design process for multimodal, multidevice interface
designers.
Experimental Results:
A survey of professional interface designers with an interest in multimodal,
multidevice interface design showing that they presently use ad hoc
techniques to approach the multimodal, multidevice design space.
An evaluation that shows that such designers believe that CrossWeaver will
enable them to better explore the new design space of multimodal,
multidevice interface design and will help them with their designs.
1.8 Dissertation Outline
The rest of this dissertation covers the design, development, and evaluation of
CrossWeaver. The dissertation begins with a review of the related work in the
different research areas that impact this work. It continues with a description of the
background field studies of designers interested in multimodal, multidevice design.
The next chapter describes techniques and guidelines that we developed and
experimented with to assist designers in multimodal, multidevice design using
traditional paper prototyping. From the lessons learned in the field studies and in the
paper prototyping techniques, we discuss the design evolution of CrossWeaver,
including the key problems that we faced in the design process. We describe in
detail the first interactive prototype that we built of CrossWeaver, including an
informal evaluation of that system. We then describe the final implementation of
CrossWeaver. We next cover an evaluation of the final CrossWeaver tool with nine
professional interaction designers, and report on their experiences using
CrossWeaver in the user test. We conclude with a discussion of the future work for
this research and a review of the contributions of this research.
2 Related Work
Bill Buxton’s refrain in his invited plenary at the Computer Human
Interaction Conference in 1997 was “Don’t take the GUI as given” (Buxton 1997).
Many researchers, including our research group, have heeded this call. CrossWeaver
adds to existing research in informal user interfaces (Landay and Myers 2001),
multimodal user interfaces (Oviatt, Cohen et al. 2000), and Programming by
Demonstration (Cypher 1993). It combines lessons and approaches from these
different areas towards a new approach for building user interface prototypes that go
beyond GUI interfaces.
2.1 Commercial Prototyping Tools
The professional designers that we worked with in the development of
CrossWeaver (see Chapter 3) were all familiar with the technique of paper
prototyping (Wagner 1990; Rettig 1994) in which drawings on paper represent
scenes of an interface. The designer, or one or two assistants, can play the role of the computer with those drawings and show them in the proper sequence in front of an end-user who is testing the simulated interface. Sketched drawings can evolve into formal storyboards, which resemble comic strip sequences showing the temporal order of a set of user actions in an interface (McCloud 1993; Modugno 1995).
Even though all of the designers were comfortable with sketching on paper,
some preferred to use electronic tools to assist in storage, versioning, and
collaboration (Sinha and Landay 2002). All of the designers used some professional
drawing or prototyping tool, usually soon after drawing a few rough sketches on
paper. Some of these prototyping tools have become very sophisticated and have
underlying programming environments to allow interface simulation and movement.
Others are used simply as drawing tools, without any capability of creating or
simulating active applications.
The most popular tool used by the designers that we worked with was
Microsoft PowerPoint (Microsoft 2003b). PowerPoint was used by designers for
organizing prototype screenshots, creating wireframe storyboard scenes, organizing
design requirements, keeping design notes, and making presentations to
management. It is an extremely versatile tool for the designers and can fit in many
steps in the design process.
The next most popular tools were Adobe Photoshop (Adobe 2003c) and
Adobe Illustrator (Adobe 2003b), both used by designers as drawing tools.
Photoshop is particularly adept at taking existing images or screenshots and splitting,
merging, or making modifications to them. Illustrator is used by designers for
drawings of screens, wireframes, or creating scenes of a storyboard. Both of these
tools create artifacts that are commonly printed or exported to other programs, such
as Microsoft PowerPoint, before being shown.
Macromedia Director (Macromedia 2003b) was used by many of our
designers to create interactive prototypes. Director enables sophisticated multimedia
output using the metaphors of actors, scenes, and storyboards. The actors are given
behaviors in Director with built-in templates or via a programming language, Lingo.
Director is a fairly advanced tool and is not as well suited for early stage prototyping
as pen and paper, as it requires considerable expertise in computer use and
programming, which many designers do not have (Landay 1996).
These tools and others like them (e.g., Macromedia Freehand for storyboarding and drawing (Macromedia 2003d), Microsoft Visio for workflow diagrams (Microsoft 2003e), Macromedia Fireworks for graphics manipulation (Macromedia 2003c), PaintShopPro for drawing (Software 2003), and Solidworks 3D CAD software for mechanical design (Solidworks 2003)) are popular with the interface designers that we worked with, but none has yet approached the problem of multimodal input using speech or pen recognition systems. The designers that we talked to incorporated different attributes of these tools into their own processes to build
prototypes and explore the design space. Our approach is positioned between pen
and paper and more sophisticated computer tools, enabling designers to carry out
informal prototyping for multimodal, multidevice interfaces in a quick, self-
contained fashion.
2.2 Informal User Interfaces
The Informal User Interface approach has been shown to successfully support
user interface designers’ early stage work practice (Landay and Myers 1995; Lin,
Newman et al. 2000; Klemmer, Sinha et al. 2001; Landay and Myers 2001;
Newman, Lin et al. 2003). In the informal user interface approach, designers work with the natural forms of input (sketches, audio, and other sensory input) and transform them into more formal representations only gradually, if at all. Specifically,
CrossWeaver starts with designer-sketched screen shots, usually sketched with a
tablet on the computer.
Electronic sketching traces its roots to Sutherland’s original Sketchpad
(Sutherland 1963) which pioneered the use of a stylus (in Sketchpad’s case an
electronic light-pen) to draw on one of the first graphical displays. Stylus-based
graphical drawing and the potential benefits of pen-based computer interfaces have
been studied in various research efforts since then (Negroponte and Taggart 1971;
Negroponte 1973; Wolf, Rhyne et al. 1989; Brocklehurst 1991) and also in some
pioneering commercial efforts (Wang 1988; GO 1992; Microsoft 1992; Apple 1993).
(A comprehensive survey of the history of pen interfaces in research can be found in
(Landay 1996; Long 2001) and in commercial projects in (Bricklin 2003).)
More recently, electronic sketching has been applied to the exploratory
design process for designers in various domains. For example, Gross’s Electronic
Cocktail Napkin supports freeform electronic sketching for architectural design
(Gross 1994; Gross and Do 1996). Alvarado’s ASSIST allows mechanical designers
to sketch diagrams which are then interpreted as working mechanical systems (Davis
2002). Hammond’s Tahuti (Hammond and Davis 2002) and Damm’s Knight tool
(Damm, Hansen et al. 2000) have incorporated electronic sketching in the design
process for software UML diagrams. Li’s SketchPoint enables quick electronic
sketching of informal presentations (Li, Landay et al. 2003). Kramer’s Translucent
Patches supports conceptual design involving freeform sketches organized as layers
(Kramer 1994). Through tests and field studies, these projects and an expanding list
of others have shown the potential benefits of electronic sketching in the design
process. One claim that has become evident from these projects is that tools supporting informal design, i.e., those leaving designs in sketched, unrecognized, and un-beautified form, may elicit more comments than finished-looking prototypes (Wong 1992; Hong, Li et al. 2001). Leaving designs in unrecognized form is the approach that CrossWeaver takes.
SILK (Landay and Myers 1995; Landay 1996) is a tool that was created for
sketching graphical interfaces. SILK introduced the idea of sketching user interfaces
and leaving them in unrecognized form when testing. SILK also introduced a
storyboarding paradigm for this type of design (Landay and Myers 1996).
CrossWeaver has extended SILK’s storyboard to multimodal commands and
introduced the new concept of multidevice prototyping as an extension to the work
that SILK pioneered.
DENIM (Lin, Newman et al. 2000; Lin, Thomsen et al. 2002; Newman, Lin
et al. 2003) is an informal prototyping tool for web design. DENIM has sketched
pages and transitions as its basic elements. DENIM uses an infinite sheet layout,
arrows, and semantic zooming for web page layout and linking. In DENIM,
transitions are based on mouse events of different types (e.g., left or right click).
DENIM runs its designed web pages as a single state machine in its integrated
browser or in a standard standalone web browser. CrossWeaver has also adopted
sketched scenes and transitions. CrossWeaver uses a linear storyboard instead of an
infinite sheet to match the multimodal designers’ preference for thinking in terms of
short linear examples. CrossWeaver adds gesture transitions and speech transitions
to promote multimodal experimentation.
The design, test, analysis paradigm for informal prototyping was introduced
in SUEDE (Klemmer, Sinha et al. 2001; Sinha, Klemmer et al. 2002), an informal
prototyping tool for Wizard of Oz design of speech interfaces. SUEDE’s design
mode creates a flowchart of the speech interface in which transitions are speech
commands. The test mode of SUEDE maintains one active state, the currently
spoken prompt. SUEDE analysis mode captures the full execution of a user test. A
CrossWeaver design is also a flowchart. In CrossWeaver, transitions can be mouse
clicks, keyboard input, pen gestures, or speech input. In CrossWeaver test mode,
however, the storyboard maintains a separate state per device and executes logic in
the storyboard across multiple devices. CrossWeaver’s analysis mode also captures
a full user test across those devices.
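The contrast between a single active state and per-device state can be sketched as follows (hypothetical code, not SUEDE's or CrossWeaver's implementation; the scene and event names are invented): a shared storyboard is run with one independent current scene per device, and every input is logged for later analysis.

```python
# Hypothetical sketch: a shared storyboard (scene -> {event: next scene})
# run with one independent state per device, so a speech command on the
# PDA advances the PDA's scene without moving the whiteboard's scene.

storyboard = {
    "intro":  {"speech:next": "map"},
    "map":    {"gesture:circle": "zoomed"},
    "zoomed": {},
}

class MultiDeviceTest:
    def __init__(self, devices):
        self.state = {d: "intro" for d in devices}  # separate state per device
        self.log = []                               # full capture of the user test

    def handle(self, device, event):
        scene = self.state[device]
        nxt = storyboard[scene].get(event, scene)   # unknown input: stay put
        self.state[device] = nxt
        self.log.append((device, event, nxt))       # recorded for analysis mode

test = MultiDeviceTest(["pda", "whiteboard"])
test.handle("pda", "speech:next")
print(test.state)  # prints {'pda': 'map', 'whiteboard': 'intro'}
```

A SUEDE-style tool would instead keep a single `state` variable, which is sufficient for one speech channel but cannot model two devices at different points in the storyboard.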
CrossWeaver’s analysis mode is influenced by the Designer’s Outpost history
mechanism (Klemmer, Thomsen et al. 2002), which captures the history of the
creation of an information architecture by web designers. CrossWeaver’s focus is on
capturing the log of the test of an interactive multimodal, multidevice application,
versus a history of commands in a design tool.
DEMAIS (Bailey, Konstan et al. 2001; Bailey 2002) is an informal tool for
multimedia authoring. It includes the concept of joined formal and informal
representations, in which audio and video clips co-exist with sketched
representations, and claims the informal representations have specific benefits
(Bailey and Konstan 2003). It also includes the concept of rich transitions between
scenes in its storyboard based on mouse events. As a multimedia tool, DEMAIS is
focused on the design of rich, diverse output. CrossWeaver’s focus is on natural
input for an interactive application. The two tools are quite complementary.
Among informal user interface techniques, Wizard of Oz (Kelley 1984) is
used for simulation when recognizers are not available or not convenient to use
(Gould and Lewis 1985). In a Wizard of Oz study, a human simulates the
recognition system, as a substitute for a real speech recognizer. Wizard of Oz has a
long history in the prototyping of speech applications (Dahlbäck, Jönsson et al.
1993). Yankelovich made use of Wizard of Oz simulations in the design of the
Office Monitor application (Yankelovich and McLain 1996) and recommends them
generally in her work on Designing SpeechActs (Yankelovich, Levow et al. 1995;
Yankelovich and Lai 1998). Wizard of Oz simulation has also been used in
multimodal interface design and execution, which we cover in more detail in Section
2.3. CrossWeaver specifically supports Wizard of Oz techniques on the computer,
enabling the Wizard to participate as a speech or pen recognizer.
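In code terms, Wizard of Oz simulation substitutes a human for the recognizer behind a common interface. The following is a minimal invented sketch, not any real system's API; the class names and the wizard-input callback are assumptions made for illustration:

```python
# Hypothetical sketch of Wizard of Oz substitution: the application asks
# a "recognizer" for the user's command, and a human wizard can stand in
# for a real speech engine behind the same interface.

class SpeechRecognizer:
    def recognize(self, audio):
        raise NotImplementedError  # a real recognition engine would go here

class WizardOfOz(SpeechRecognizer):
    def __init__(self, wizard_input):
        self.wizard_input = wizard_input  # e.g., keystrokes typed by the wizard

    def recognize(self, audio):
        # The human wizard listens to the end-user and supplies the result.
        return self.wizard_input(audio)

recognizer = WizardOfOz(lambda audio: "take me to Cory Hall")
print(recognizer.recognize(b"...raw audio..."))  # prints "take me to Cory Hall"
```

Because the application only depends on the `recognize` interface, a real recognizer can later replace the wizard without changing the prototype.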
2.3 Multimodal User Interfaces
Past multimodal research, starting with Bolt’s “Put That There” pointing-and-speaking application (Bolt 1980), has laid a fair amount of groundwork that we build
upon. Advances in recognition technologies (Paragraph 1999; IBM 2003; Nuance
2003) and also in hardware devices (Microsoft 2003d) are making multimodal
research more accessible and fruitful (Oviatt and Cohen 2003). Recent studies on
QuickSet in the map domain (Cohen, Johnston et al. 1997) and MultiPoint in the
presentation domain (Sinha, Shilman et al. 2001) have shown that end-users tend to be interested in and attracted to multimodal user interfaces when asked about their preference for multimodal versus graphical user interface interaction. (A
comprehensive review of multimodal interface history can be found in (Oviatt
2003).)
Nigay and Coutaz outlined a design space for multimodal user interfaces
(Nigay and Coutaz 1993) that pointed out the possible use of concurrent and fused
user input modalities, where input recognition could be used one after another or
simultaneously. This design space focused on the possible technical combinations of
multimodal input modalities and how they might be used in an application.
CrossWeaver supports exploration of applications that lend themselves to
multimodal interaction, though not all of Nigay’s design space is fully supported.
Instead, CrossWeaver can support prototyping of any application that can fit into its
storyboarding model (see Section 7.2).
Dahlback gave guidelines for Wizard of Oz simulation (Dahlbäck, Jönsson et
al. 1993), with a focus on speech user interfaces. Wizard of Oz has also been viewed
as a successful technique for multimodal simulation in many experiments. Mignot
studied the potential future form of multimodal commands using Wizard of Oz
techniques (Mignot, Valot et al. 1993). Salber and Coutaz developed the NEIMO
platform (Salber and Coutaz 1993) for multimodal user interface simulation together
with tools that enable multimodal interaction and logging of user interactions. Nigay
and Coutaz formalized a computer architecture for multimodal interface
implementation (Nigay and Coutaz 1995). Cohen (Cohen, Johnston et al. 1997) and
Cheyer and Julia (Cheyer, Julia et al. 1998) have built systems with similar
capabilities for multimodal interface implementation either with computer-based
recognizers or with Wizard of Oz simulation. In those systems, the Wizard
participates as a recognizer or as a remote controller of the application. However, all
of these Wizard of Oz systems require a pre-existing application, programmed in the specific application environment. CrossWeaver enables Wizard of Oz operators to participate as recognizers in the execution of an interface specified only by a
storyboard. Application programming that integrates real recognizers is not
required.
The seminal platform for interactive multimodal application design is the
QuickSet system (Cohen, Johnston et al. 1997) for implementing multimodal
applications built using the Adaptive Agent Architecture (AAA) (OGI 2003).
QuickSet is a programming platform that has been used to create multimodal
applications for map and military planning and mobile domains (Cohen, Johnston et
al. 1998; McGee and Cohen 2001; McGee, Cohen et al. 2002). It includes all of the
capabilities for creating multimodal, multidevice applications, and the applications
created with QuickSet could be considered target applications that could be designed
in CrossWeaver. Prototyping by non-programmers has not been the target of
QuickSet.
STAMP (Clow and Oviatt 1998), a multimodal logging tool accompanying
QuickSet, has been used to capture detailed information about multimodal user input.
These input logs can be analyzed to understand multimodal ordering, preferences,
and statistics. The CrossWeaver analysis display is designed only to give the
information that is of relevance to designers at the very first stage of multimodal,
multidevice design, specifically the attempted commands and subsequent scenes
displayed. Hence CrossWeaver shows much less information than the STAMP tool,
but it shows the information that we found most interesting to non-programmer
designers.
An additional set of multimodal applications (Moran, Cheyer et al. 1998;
Ionescu and Julia 2000) have been built using the Open Agent Architecture (OAA)
from SRI (SRI 2003), which is the predecessor to QuickSet. Because of our need for
only basic multimodal features and based on our research collaboration with SRI,
CrossWeaver is built on top of OAA, using its distributed recognition agent
capabilities and its ability to manage the standalone CrossWeaver browsers.
Multimodal application design with OAA also has not previously been available to
non-programmer designers.
2.4 Programming by Demonstration
Specifying an application from example sketches, described later in this
dissertation, takes its inspiration from Programming by Demonstration techniques
(Halbert 1984; Cypher 1993; Smith, Cypher et al. 2000), those that build applications
from a set of examples. CrossWeaver takes the approach of using example scenes
and storyboards to create a testable application, but since it focuses only on informal
prototyping, CrossWeaver does not incorporate learning algorithms or underlying
model-based architectures that are common in Programming by Demonstration
systems.
Among past Programming by Demonstration systems, CrossWeaver is most
similar to the work done in Kurlander’s Chimera system (Kurlander 1993),
Lieberman’s Mondrian (Lieberman 1993), and Modugno’s Pursuit (Cypher 1993).
In Chimera and Mondrian, individual graphical editing operations are represented by
storyboard sequences. The primary difference between our approach and Chimera is
the sequence of creating the examples. Rather than demonstrating examples of the
program to build editable macros, the examples that the designer draws become the
application. Additionally, our approach’s focus is not on speeding repetitive
operations for an existing application, but on prototyping a new system from a set of
illustrations. In these two ways it is similar to Mondrian. In Pursuit, a visual
language represents trainable actions performed in an operating system shell. Our
sequences also represent actions, namely the sequence of storyboard scenes to be
displayed. Our approach differs from all three systems in its target of multimodal
user interfaces and its informal sketch-based form.
CrossWeaver takes inspiration from Myers’ Peridot (Myers 1990), one of the
earliest systems for creating user interface components by demonstration. Peridot
targeted graphical user interface components and translated example actions into
textual programs. In CrossWeaver, the full information required for the program is
in the visual sketches themselves. Additionally, CrossWeaver is designed as a
multimodal application prototyping tool. Many of Peridot’s goals, such as creating a
visual programming system that enables users to easily create a working application,
are the same as the goals for CrossWeaver.
3 Field Studies
We interviewed 12 professional designers with an interest in multimodal user
interfaces in a field study similar to those conducted for other informal prototyping
tools (Landay 1996; Bailey 2002; Newman, Lin et al. 2003). Each of the
interviewed designers uses sketching during the first stage of his or her design
process to conceptualize user interface ideas. Most use some type of informal
storyboarding to string together the sketches into more complex behaviors.
Three of these designers were professional interaction designers targeting
applications for PDA’s, phones, or in-car navigation systems. Four of the designers
were game designers, a category of designers that has used alternative input devices
such as joysticks since the early Pong games. Five of the designers were animators
and movie designers, who specify multimodal interaction through their storyboards,
even though they are not necessarily designing an interactive application. We review
the techniques and artifacts from each category of designers below.
3.1 Interaction Designers
The three professional interaction designers that we talked to all had
backgrounds in graphical user interface design. They were assigned by their
companies to work on projects for non-personal-computer devices in the last two or
three years. They considered these new projects for PDA’s and speech interfaces
more challenging than the graphical user interface projects, and they changed some
of their design processes to address these new concerns.
The three designers all used some form of “Sketch-and-Show,” a term one of
them invented. They would sketch scenes of the proposed application and the
transitions among scenes and show them to other designers and others in the office.
On these sketches, they would add arrows and other informal annotations to show
different interaction techniques, such as pen gestures or speech commands.
One designer, who worked on a speech-based car navigation system, showed
us sketches with speech balloons systematically added to visual scenes to represent
combined visual and speech input and output (see Figure 3-1). These side-by-side
visualizations addressed the challenge of combined audio and visual output modalities.

Figure 3-1. Storyboard showing a multimodal map navigation system for an in-car dash. Each scene pairs speech input (e.g., “Computer, where am I?”; “Take me to Cory Hall”) with visual output and speech output (e.g., “You are at Soda Hall, Berkeley”; “Turn left”; “Cory Hall on right”).
For each of the designers, the sketched designs were difficult to evaluate.
The designer of the speech-based car navigation system had to wait until the system
was nearly complete before being able to see and hear what she had designed.
Building adequate prototypes to get comments from others was a common
challenge in the design process among the designers that we talked to. Typically, the
prototyping process for these interaction designers quickly jumps from sketches to
coded prototypes after a few iterations with paper designs.
One of the PDA designers that we talked to mentioned a prototyping process
that involved building screen representations in a tool such as Adobe Photoshop and
then stringing together those representations with hotspots in a tool such as Adobe
Acrobat or in HTML. These representations were adequate for walk-throughs, but
not for running prototypes, as they did not simulate pen-based interaction.
The car navigation system designer was unable to find any tool adequate for
simulating the in-car navigation system. The hardware system was being designed in
parallel to the interaction design and was unavailable for prototyping. The
multimodal interaction design for this system needed to be very good before
implementation, because it would be nearly impossible to make changes once the
hardware was fully developed.
The interaction designers were particularly challenged when targeting new
hardware devices. Not only must the interaction be worked out well in advance of
the hardware being built, but the devices themselves typically pose many
implementation challenges, making prototyping on actual hardware extremely
difficult.
3.2 Game Designers
The game designers that we talked to were primarily designing applications
for role-playing or action games in which there is a character traveling around an
animated world with a set of tasks, usually involving shooting. They were primarily
targeting the most popular video game consoles and were expecting input that used
joysticks, buttons, or special input devices based on the specific game system.
Figure 3-2. Artifacts produced by game designers include bubble diagrams and characteristic storyboards.
The game designers that we interviewed emphasized that the plot of the game
being designed was much more important than any of the interaction design.
Typically, the script for the game is written; among our interviewees, scripts ranged
from a few dozen pages to 150 pages. The game scripts were thought of as movie
scripts, except that they were translated into an interactive application with
characters and a set.
The storyboards sketched in game design generally outline the behaviors and
features of a particular character. The “characteristic storyboard” sketches out
different views of a character or a vehicle or a weapon, and lists the item’s behaviors
together with the view (see Figure 3-2). Additionally, most role-playing games
involve some sort of map, representing the levels or the virtual geography that the
end-user needs to negotiate. Some of the designers that we talked to drew sketches
representing physical maps; others represented virtual geography with bubble
diagrams (see Figure 3-2).
For the game designers, the input modes were determined by the target game
console, typically some sort of joystick or control pad. Consequently, the designers
were trying to fit their designs into available input capabilities. The designers
considered these non-keyboard and mouse capabilities more immersive than desktop
interaction for their games.
The game designers that we talked to were also experts in image editing tools
such as Adobe Photoshop. This allowed them to create screenshots and image
representations close to the actual look of the final application. However, they were
unable to use these graphic representations for simulation in any of the available
tools. They had to rely on techniques similar to what they used when working with
the sketches to imagine the interaction.
Game designers typically work with expert programmers who pick up the
implementation when the story for the game is finalized. The game programmers
utilize a game engine that assists with graphics rendering, texture mapping, and
interaction with the input devices. These sophisticated game engines speed the
development process and allow refinement of the game as it is being created.
Typically the design evolves during implementation, with new levels being created
by the game designers in parallel to the programmers implementing the graphics.
3.3 Movie Designers
Movie designers specify elements similar to those specified by interaction
and game interface designers (i.e., characters, actions, and behaviors). In particular,
they need to think about characters interacting with props, which is akin to device
interaction, and human-human dialog, which is akin to multimodal interaction.
Storyboarding is a formal process in movie design (Thomas 1981; McCloud
1993). The movie designers that we talked to were quite comfortable representing
multimodal interaction in their storyboards. From our interviews, we saw three
different styles of annotations: annotations for camera instruction, annotations for
character instructions, and annotations for director instructions (see Figure 3-3).
Typically there is a storyboard for every scene in the movie, because the
storyboard is used to guide the filming. Some designers mentioned that while the
script was the central artifact determining the movie plot and story, the movie
storyboards were the central artifact guiding the filming of the movie.
Movie storyboards have a remarkable ability to convey visual, verbal, and
non-visual information to guide filming. A large part of this effect is accomplished
with captions or annotations, which tie the storyboards back to the script, specify the
expression of the actors, and describe the movement of the actors within the scene.
Figure 3-3. Movie designers have a formal storyboarding process encompassing annotations for camera, character, and director instructions.
Most of the movie designers that we talked to learned how to storyboard from
classes in film school or from other formal instruction. The look of the storyboards
that we saw was similar among the different movie designers that we interviewed.
Typically the storyboard is the artifact necessary to move to pre-production
and then filming. A good storyboard documents each shot in a movie with
significant detail. Much like in storyboarding for interaction design, changes to the
storyboard are best made at the drawing stage, rather than when the filming has
already begun.
3.4 Implications from Field Studies
From the field studies we learned that informal storyboarding was common
among multimodal designers. We also learned that the designers were comfortable
with special symbols representing different modalities; this was a fairly natural
concept for them akin to speech bubbles in comic strips (McCloud 1993). We also
learned that designers used annotations in all of their storyboards, representing
sequences, behaviors, and actions.
The most important visual elements of multimodal storyboarding were
arrows and text. Arrows were used to represent sequences, types of transitions,
movement, and actions. Since they were used in so many ways, the context in which
they were used was critical to understanding their meaning. Text was used for
descriptions, dialog, commands, scenes, and other explanations necessary in the
storyboard. Generally the position of the text was enough to convey the context.
Each of the designers we talked to felt that the early stage processes for
multimodal user interface designs were more challenging than the corresponding
processes for two-dimensional graphical user interface design. In graphical user
interface design, paper prototyping is a well known technique (Rettig 1994). There
are no well-known similar techniques in the multimodal application domain. In
Chapter 4, we explore one such technique that we have developed for multimodal,
multidevice interface design.
4 Multimodal Theater
To address some of the designers’ prototyping challenges (see Chapter 3) and
to explore techniques that might be adapted to interactive tools, we have made
efforts to extend some of the techniques of paper prototyping that are familiar to
interaction designers (Rettig 1994) to multimodal simulation using paper, other
physical materials, and Wizard of Oz participation (Chandler, Lo et al. 2002).
We call these experiments “Multimodal Theater” since they involve a
participant or participants, a set of actions specified in a script, a variety of props,
and a cast of Wizards of Oz in the actual simulation. In Multimodal Theater, the
script defines the behavior of the application. The props are paper sketches of screen
shots or other physical representations of devices. The Wizards of Oz simulate the
application based on the details in the script while the participant runs the
application. The applications that we prototyped and simulated include:
Table 4-1. Multimodal Theater Simulations

1) A voice-activated digital desk application for organizing photos (see Figure 4-1)
2) A mobile assistant to be used in an automobile (see Figure 4-2)
3) A collaborative document editing application
4) A technology-enhanced classroom with a digital whiteboard and a Personal Digital Assistant (PDA) for the professor, as well as collaborative PDAs for the students
5) A handwriting recognition application
6) A dictation application
7) An application for creating animations
8) A voice-activated MP3 player
Through these simulations we can quickly explore a variety of multimodal
interface commands. We can stick specifically to the application script or we can
allow the Wizards to improvise. With this flexibility, we have been able to capture
users’ preferred commands, the commands that should be added to the design, and
some of the required error handling behavior of the application. These simulations
also inform our design of computer support for interactive simulation of multimodal
applications, since we can see where paper techniques fail to be adequate for
multimodal simulation.
4.1 Multimodal Photo Album
For example, in the multimodal photo album application (see Figure 4-1), our
scripted design included the use of a simulated whiteboard computer, a simulated
table computer, and multimodal and unimodal commands for moving and arranging
photos. We tested this application with three participants recruited from the
computer science department. In our test, we gave the participants a brief
introduction to the simulation and the Wizard of Oz methodology that we would be
using. We provided the participant with a written set of instructions, which detailed
the task and available commands in the interface. The task was very basic for this
application: create a photo album of your vacation using the available photos.
We had designed the application to behave like a photo album organizer with
multimodal commands. For example, a participant could point to a space on the
table computer and say “make a page.” This would create a photo album page.
They could then go to the whiteboard computer and point to photos and say “move
this to the album” to add photos to the album page on the table computer. Since we
were using Wizard of Oz simulation, the wizards had wide flexibility in the
commands that they would accept. For instance, “make a page” could be triggered
Figure 4-1. Multimodal Photo Album Room simulation.
by “create a page,” “new page,” or any other phrase with similar meaning. Moving
the photo from whiteboard to table could be performed with a point-and-speak
command, like “move this photo,” or it could be performed with a gesture of
dragging the photo to a special area on the whiteboard computer.
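The Wizards' tolerance for phrase variants can be sketched as a simple synonym table. The sketch below is illustrative only: the command names and phrase lists are ours, and a fixed table can only approximate the Wizards' ability to accept "any other phrase with similar meaning."

```python
# Hypothetical sketch of the flexibility the Wizards exercised by hand:
# many surface phrases map onto one application command. The phrase
# lists are illustrative, not the study's actual vocabulary.

COMMAND_SYNONYMS = {
    "make_page": {"make a page", "create a page", "new page"},
    "move_photo": {"move this to the album", "move this photo"},
}

def interpret(utterance):
    """Return the command a Wizard would act on, or None if unscripted."""
    phrase = utterance.strip().lower()
    for command, variants in COMMAND_SYNONYMS.items():
        if phrase in variants:
            return command
    return None
```

A recognition-based tool would feed a speech recognizer's output into something like `interpret`; the human Wizards, by contrast, could generalize beyond any fixed table.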
In our tests, each participant was able to successfully complete the task.
Since the task was open-ended by nature, the participants took different amounts
of time and used different commands to create pages and arrange the photos. A
given individual tended to choose a specific style, either unimodal or multimodal, for
most of the commands that they used. One of the participants tended not to use any
speech commands, and would only point and drag. Another participant most
commonly used point-and-speak whenever it was possible. The last participant had
elements of both styles.
We assigned two Wizards and one observer to simulate this application. One
Wizard was responsible for the table computer and one Wizard was responsible for
the whiteboard computer. When the participant would come to that computer, the
Wizard in front of it would respond to the participant’s commands based on the
script. Occasionally this involved the Wizard informing the user that a certain
speech command or other command was not possible. This was done in a consistent
manner based on the specification for feedback errors in the script. We developed
enough expertise in multimodal simulation involving speech and pointing commands
to recognize multimodal commands and react to them appropriately as a pair of
Wizards. For this, Wizard rehearsal and training was important, because a
multimodal command might involve the cooperation of two Wizards.
In the multimodal photo album, we showed the ability to create and specify
multimodal simulations that immerse the end-users. We measured successful
immersion through interview questions after each simulation. Participants typically
said that they felt like they were “interacting with a computer system” and not
interacting with the Wizards involved in running the application.
4.2 Multimodal In-Car Navigation System
Based on the desire to simulate speech commands and feedback, we
developed a speech control application, called the Speech Command Center, which
Figure 4-2. User testing a Multimodal in-car navigation system (left) with a Wizard of Oz using the Speech Command Center (right).
we provided to one of the Wizards in a scenario simulating speech-based in-car
navigation (see Figure 4-2). In the in-car navigation system, the participant sits in
the driver seat with a headset. The Wizard sits in the right seat with a computer that
has the Speech Command Center on it. As the participant makes requests via speech
commands the Wizard responds using the Speech Command Center. This Speech
Command Center is a Microsoft Excel spreadsheet that the designer pre-creates. The
spreadsheet contains links that trigger playing the various speech commands. It also
has the facility to record what the participant is saying and possibly use it in
follow-up commands. Running on a laptop, the Speech Command Center is a portable
Wizard of Oz tool that allows the Wizard to do a fairly sophisticated speech-based
simulation.
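The actual Speech Command Center was an Excel spreadsheet; as a rough functional sketch of the same idea, its core is a table mapping Wizard menu choices to pre-recorded clips. The labels below echo the Figure 3-1 navigation prompts, but the keys and clip paths are invented for illustration.

```python
# Rough functional sketch of the Speech Command Center concept. The real
# tool was a Microsoft Excel spreadsheet whose links played pre-recorded
# clips; this stand-in models the same table as a numbered menu. Labels
# and clip paths are invented for illustration.

RESPONSES = {
    1: ("You are at Soda Hall, Berkeley", "clips/soda_hall.wav"),
    2: ("Turn left", "clips/turn_left.wav"),
    3: ("Cory Hall on right", "clips/cory_hall.wav"),
}

def render_menu(responses):
    """Format the menu the Wizard scans while the participant speaks."""
    return "\n".join(f"[{key}] {label}"
                     for key, (label, _clip) in sorted(responses.items()))

def select_clip(responses, key):
    """Return the clip file to play for a menu choice, or None if unknown."""
    entry = responses.get(key)
    return entry[1] if entry else None
```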
We tested the in-car navigation system with two participants. We had the
participants drive around the local area and use the system as they would a car
stereo with speech capabilities, listening to news reports or getting other pieces
of information. Neither participant had trouble using the system. The
Wizard needed to respond quickly to the participants' commands to make the
simulation seem realistic. This was done successfully for both participants:
because the voice commands were pre-recorded and played on the computer, they felt
immersed in the application, as though they were interacting with a computer
rather than talking to the Wizard.
4.3 Design Implications for Multimodal Design Tools
Based on informal observation, we have noticed that the Wizards’ actions
become more predictable with a formal script, and end users sense that predictability.
In general, we have found that the most successful simulations are ones in which the
application is fully scripted rather than improvised. For a design tool, this means
that the designer can fully script the interface, rather than leaving many details
unfinished.
We have found that we can build a prototype for a given application in a few
hours, and refine that prototype quickly. Happily, a large fraction of the time is
spent actually thinking about and refining the design of the application rather than
working with physical materials to put together the simulation. A design tool should
not take significantly more time; otherwise, the incentive to use it instead of
the paper-based simulation decreases.
The Wizards had a complicated task in Multimodal Theater. They had
commands to memorize, devices to manage, and props to utilize. In a design tool,
having the computer handle command responses reduces the time
required to train the Wizards. A tool can perform a similar simulation more
consistently without the participation of the Wizards.
However, using Wizards was valuable for collecting a broad set of
commands that participants wanted to use in the simulation. The designer would not
get that same breadth of commands in a design tool that relied on computer-based
recognition. Thus, Multimodal Theater could be used to get early ideas about
possible multimodal commands and behaviors, even if a recognition-based design
tool exists.
5 Design Evolution of CrossWeaver
The study of multimodal interface designers (see Chapter 3) and experiments
with Multimodal Theater (see Chapter 4) have given us insight into the design process
for multimodal applications. This design process is similar in its steps to the design
process for graphical user interfaces (Landay 1996) or web interfaces (Newman, Lin
et al. 2003), involving sketching and storyboarding at the early stages. The visual
form of the multimodal designers’ storyboards is unique in its use of symbols and
annotations to represent different input modalities. We used this insight into the
Figure 5-1. Design evolution of CrossWeaver.
sketching representation of multimodal storyboards and used it in our design of
CrossWeaver.
The stages in the design evolution of CrossWeaver are shown in Figure 5-1.
Each of the stages grappled with the issue of representation of input modalities and
output modalities. Each stage also tried to further develop the storyboarding
paradigm in CrossWeaver. The key challenge with the storyboard visual form
involves incorporating the input and output representations in a space-efficient way.
A multimodal storyboard that has many scenes and input transitions can get cluttered
Figure 5-2. Early design sketch for CrossWeaver’s interface for creating representations of multimodal input.
given the variety of input and output modes that need to be labeled.
The paper prototype and early design sketches introduced an iconic
representation for input and output modalities (see Figure 5-2). The figure shows
icons that represent mouse, keyboard, pen gesture, speech, and other inputs on the
left side of the screen. The column to the right of that shows output modalities, such
as output to a screen, audio speaker or printer. Both screen and audio output were
Figure 5-3. Early design sketch for CrossWeaver’s interface showing a potential way of representing “Operations”.
retained through all of the design iterations of CrossWeaver. The screen concept
expanded to multidevice output in later designs.
The storyboard in Figure 5-2 shows a scene-by-scene storyboard layout that
uses annotated arrows with input modalities to connect the scenes. This layout
assumes an “infinite sheet” metaphor so that storyboard panes can exist anywhere in
the screen area and can be linked to any other screen.
To enable more sophisticated functionality in a limited space, the design
sketches also incorporate the concept of grouping scenes in the storyboard together
Figure 5-4. Early design sketch for CrossWeaver’s interface showing the representation of a “Selection Region” in which a command will be active.
into operations, shown as a thumbnail at the bottom of Figure 5-2 for the operation
“zoom in” and shown in detail in Figure 5-3 for the operation “calculate distance.”
Reusable operations allow the incorporation of higher-level concepts into the
application, such as zooming, panning, or adding objects. The operations concept
was a primary thrust of the first interactive prototype, which is described in Chapter
6.
In Figure 5-4, dashed lines represent areas in which a gesture command will
be active, which we call selection regions. In a map, a selection region could be the
Figure 5-5. The first interactive prototype of CrossWeaver.
region of a country so that an operation would not be active in other countries or in
ocean regions.
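The selection-region concept amounts to a hit test: a gesture command dispatches only when it begins inside a region's bounds. The sketch below uses axis-aligned rectangles as a simplification (the dashed regions in the design sketches were freeform), and the names are ours.

```python
# Minimal sketch of the "selection region" concept: a gesture command is
# active only inside a region's bounds. Axis-aligned rectangles are a
# simplification; the dashed regions in the design sketches were freeform.

class SelectionRegion:
    def __init__(self, name, x, y, width, height):
        self.name = name
        self.bounds = (x, y, x + width, y + height)

    def contains(self, px, py):
        """Hit test: is the point inside this region?"""
        x0, y0, x1, y1 = self.bounds
        return x0 <= px <= x1 and y0 <= py <= y1

def dispatch_gesture(regions, px, py, command):
    """Fire the command in the first region containing the gesture start."""
    for region in regions:
        if region.contains(px, py):
            return (region.name, command)
    return None  # gesture began outside every region: command inactive
```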
Each of the concepts in these early sketches has been retained in some
fashion throughout the later designs of CrossWeaver. The first interactive prototype
of CrossWeaver was built soon after these sketches were made (see Figure 5-5). The
functions of this early prototype and an informal evaluation of it are described in
Chapter 6 of this dissertation. This prototype included built-in operations that
enabled the design of map and drawing applications.
The first interactive prototype of CrossWeaver targeted two types of devices,
PC’s and PDA’s, but there were no operations to support collaborative applications.
To support collaborative applications, we built the Collaborative CrossWeaver
Prototype, which added collaboration functionality in the form of icon stamps that
had semantic meaning (see Figure 5-6). Iconic stamps on scenes represented the
designation of slides, the definition of a broadcast area for those slides, shared ink,
and a collaborative survey tool. Running a storyboard built with these stamps would
cause the slides to be displayed in the broadcast area on the screen browser, which
would be running on a projected whiteboard computer. The slides would also be
displayed in a miniature version on the standalone PDA’s used by the participants in
the meeting or classroom (see Figure 5-7).
The design in Figure 5-6 includes a set of slides (Global Slides) and a
multimodal command (Global Transition) representing the way for a user to move
from slide to slide. The Screen Layout is a scene that is divided into three different
areas: a broadcast area, a shared ink area, and a survey. These areas are labeled by
placing the corresponding stamp in the desired area. Upon running the application,
the screen layout appears on the Screen Browser, shown in Figure 5-7. The slides
show up as miniatures on the PDA Browsers that are participating in the test, also
shown in Figure 5-7. Each PDA Browser has a shared ink area, which accepts ink
strokes. The screen browser has a shared ink area, which displays the aggregate ink
from all of the participating devices. Each PDA browser also has a survey area
Figure 5-6. The design mode in the Collaborative CrossWeaver Prototype.
which accepts numbers as inputs. The screen has an area displaying a dynamic bar
graph of the aggregate responses that were entered in the PDA survey area.
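The survey behavior reduces to tallying identical numeric answers across the participating devices. A hedged sketch of that aggregation, with a text rendering standing in for the dynamic bar graph:

```python
# Hedged sketch of the collaborative survey path: each PDA submits a
# number, and the screen browser shows an aggregate bar graph. The text
# bars stand in for the dynamic graph; coercing to int mirrors the PDA
# survey area accepting only numbers.

from collections import Counter

def aggregate(responses):
    """Tally identical survey answers across the participating PDAs."""
    return Counter(int(r) for r in responses)

def render_bars(counts):
    """One text bar per distinct answer, in ascending answer order."""
    return "\n".join(f"{answer}: {'#' * n}"
                     for answer, n in sorted(counts.items()))
```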
Even though the functions that we built into this interactive prototype are
appropriate for a variety of collaborative applications, a test of this system showed
that it was very complex to run. We tested this system informally during a
presentation in a course focused on Computer Supported Cooperative Work (CSCW)
(Landay 2001). The design of the interface was very similar to the one shown in
Figure 5-6. The slides were originally created in Microsoft PowerPoint and then
imported into Collaborative CrossWeaver, which was a multi-step process since
Collaborative CrossWeaver could only import images. The multimodal transition
was not particularly useful in this scenario, since we had ready access to the
keyboard and needed to use the mouse for pen gesture input. We did use the voice
commands to transition among slides, but voice commands were unreliable due to
Figure 5-7. The test mode browsers in the Collaborative CrossWeaver Prototype.
ambient noise and the correspondingly poor performance of the speech recognizer.
The PDA Browsers were not particularly attractive to the participants in the test.
They commented that the miniature version of the slide was not necessary since it
was displayed in full form in the Screen Browser. Furthermore, the shared ink area
and survey area quickly lost their novelty among the participants.
The Collaborative CrossWeaver prototype indicated that our design and
implementation were not adequate and also that adding too much domain
functionality into this prototyping tool might be counter-productive. Collaborative
Figure 5-8. A design sketch of CrossWeaver before the final implementation.
CrossWeaver increased the complexity beyond what would be comfortable for a
non-programmer designer, the target user of this tool.
After implementing Collaborative CrossWeaver, we drew a new set of design
sketches for the final implementation of the tool (see Figure 5-8 for one of those
sketches). Based on feedback from the initial implementation, we made the
storyboard scheme linear and changed the representation of the input and output
modes. In these sketches, operations are represented as a short sequence in the
storyboard instead of a thumbnail. The final implementation of CrossWeaver, which
we more fully describe in Chapter 7, does not include built-in collaborative
Figure 5-9. The Design Mode of CrossWeaver in the final implementation.
functions, due to the complexity of their use that we encountered during testing.
However, interfaces for collaborative applications can be simulated using storyboard
scenarios in the final CrossWeaver tool.
By design, the final CrossWeaver tool is primarily a storyboarding tool. The
Design Mode (see Figure 5-9) is focused on storyboard creation and management,
where links among storyboard scenes are annotated with multimodal input
transitions. Output devices, including audio output, can also be added to each scene.
The Test Mode browser (see Figure 5-10) can run on multiple devices in parallel.
And Analysis Mode (see Figure 5-11) captures the display and interaction across all
of the devices participating in an application for later data analysis. The full details
of this final prototype are given in Chapter 7.
Figure 5-10. The Test Mode of CrossWeaver in the final implementation.
The design evolution of CrossWeaver eventually converged on a prototype
that solves the issues of input and output modality representation, keeps the
storyboard model understandable and simple, and gives the designer flexibility in the
types of interfaces that can be created.
Figure 5-11. The Analysis Mode of CrossWeaver in the final implementation.
6 First Interactive Prototype of CrossWeaver
This chapter describes the first interactive prototype of CrossWeaver, which
was the first substantial implementation performed in the design process. This
prototype was ultimately tested with users, and the design ideas as well as the
feedback gained from those end users heavily influenced the final form of
CrossWeaver described in Chapter 7.
In the first interactive prototype of CrossWeaver, a user interface design
consists of several scenes, sketched out by the designer (see Figure 6-1). The scenes
Figure 6-1. The first prototype of CrossWeaver shows two scenes in a multimodal application and a transition, representing various input modes, between them. Transitions are allowed to occur in this design when the user types ‘n’, writes ‘n’, or says ‘next’.
show the important visual changes in the user interface that occur in response to end-
user input. The scenes can also incorporate other output modalities, such as speech.
Transitions between scenes, caused by end-user input, are represented by arrows
joining icons representing the input modes that the scene handles.
Sequences of scenes and transitions form a multimodal storyboard, which can
be tested with an associated, standalone multimodal browser (see Figure 6-2). The
multimodal browser displays scenes as output using visual displays and audio output.
It is connected to user interface agents using the Open Agent Architecture (Moran,
Cheyer et al. 1998), which enables the browser to respond to pen gestures, speech
commands, keyboard input, or any other input mechanism for which there is an
associated input agent. The multimodal browser runs on multiple platforms,
including Windows CE handhelds (e.g., the Compaq iPaq Pocket PC) and
whiteboard computers (e.g., the SmartBoard).
(a) (b) (c)
Figure 6-2. A participant executes the application in Figure 6-1 in the multimodal browser. The browser shows the starting scene (a); the participant then draws the gesture “n” on the scene (b); the browser then transitions to the next pane in the multimodal storyboard shown in (c).
With this architecture, we can also optionally substitute humans as Wizard of
Oz recognition agents (Kelley 1984). For instance, we have consoles for the Wizard
to simulate speech and pen recognition from any networked computer. Wizard of Oz
is an appropriate testing method when a computer recognizer is inconvenient to
use, as in our first round of testing, or when no recognizer is available.
A Wizard serves only as a recognizer, and does not participate in deciding the flow
of the scenes executed when browsing.
6.1 Defining Interaction in CrossWeaver
The interaction in CrossWeaver is meant to encourage experimentation with
different user interface ideas. In CrossWeaver, designers can quickly change the
gestures, keystrokes, or speech commands used in their application design using a
visual interface; they do not have to modify formal grammars or code to experiment
with different interaction ideas.
Unlike many traditional visual programming approaches, most of which
require formal specification or complex state transition diagrams (Burnett and
McIntyre 1995), CrossWeaver enables the designer to use a more intuitive visual
form based on drawing example sketches, a paradigm we call defining operations.
This is especially helpful when the designer defines reusable operations, described
later in this chapter.
6.1.1 Defining Scenes
Scenes are sketched using a mouse or a pen tablet. In this early stage of
prototyping, pen tablet-based sketches encourage fluid interaction and keep the
representations in an informal form (Gross and Do 1996; Landay and Myers 2001).
The informal form of the design ensures that designers and end-users focus on the
interaction and not on the fit-and-finish (Wong 1992).
CrossWeaver also allows import of multiple types of media into each scene.
For instance, images can be imported into a scene using the camera icon from the
Figure 6-3. The first CrossWeaver prototype shows thumbnails representing reusable operations (top), scenes that have imported images (left and right), a scene that targets screen and audio output (left) and a scene that targets PDA output (right).
tool panel (see Figure 6-3 top left). Sequences of images can be imported using the
movie icon to create a rough flip-book style movie, keeping with the informal, fluid
nature of the tool. Both of these import tools can directly utilize physical capture
devices, such as digital cameras and scanners. In the two scenes shown in Figure
6-3, we have imported images of a map.
In CrossWeaver, scenes can specifically target different output devices types,
which are shown in Figure 6-4. In Figure 6-3, we have attached audio output, the
phrase “Welcome to Minnesota,” to the first scene. When the audio icon is stamped
on the scene, a text box appears below the scene representing text to be played via a
text-to-speech agent simultaneously with the visual output. The second scene
targets a PDA, represented by the PDA icon at the bottom of the scene. That scene
will be visually displayed in the PDA version of the multimodal browser. This list of
output modes is also extensible; it could be expanded to printer, pager, or phone
screen output.
Figure 6-4. The available output devices for scenes: screen output (default), audio, PDA, and printer (future work). These icons are dragged onto scenes to specify cross device output.
6.1.2 Defining Transitions
The various input modes available in this first prototype are shown in the
extensible input tool panel at the bottom of the CrossWeaver design screen (see
Figure 6-5). If new input modes become available, they can be added to this area.
For instance, one could imagine adding camera/vision input.
When used to define interaction, each input mode has an associated modifier,
representing the value needed for that particular input mode to cause a scene
transition. The example in Figure 6-6(a) shows how we have stamped three possible
input modes, each with different modifiers, to define the possible input transitions.
The multimodal browser responds to input of these various types and matches them
with the multimodal storyboard to determine what scene to present next.
By specifying transitions in this way, it is possible to accept input modes that
are independent or fused (Nigay and Coutaz 1993). If the inputs are fused (see
Figure 6-6b), CrossWeaver uses a simple slotting system to wait for recognized
inputs and joins them if the designated inputs arrive within a certain pre-specified
Figure 6-5. The available input modes for transitions in the first CrossWeaver prototype: mouse gesture, keyboard, pen gesture, speech input, and phone keypad input. These are dragged onto transition areas to specify multimodal interaction.
time period; for testing purposes we used two seconds. In the future, we may allow
the designer to set the time interval and other parameters related to fusion, though at
this early stage in the design, designers might not be interested in setting these
parameters. Ultimately, taking advantage of fused input modes enhances recognition
performance using mutual disambiguation (Oviatt 1999).
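The slotting scheme described above can be sketched as follows. This is an illustrative reconstruction, not CrossWeaver's actual code: the `FusionSlot` class and its data shapes are hypothetical, while the two-second window matches the value we used in testing.

```python
from dataclasses import dataclass, field

FUSION_WINDOW = 2.0  # seconds; the pre-specified interval used in testing


@dataclass
class FusionSlot:
    """Collects recognized inputs and fires when all required modes arrive in time."""
    required: frozenset                           # e.g. {("pen", "w"), ("speech", "west")}
    arrivals: dict = field(default_factory=dict)  # (mode, value) -> timestamp

    def feed(self, mode, value, timestamp):
        """Record one recognized input; return True when the fused command completes."""
        if (mode, value) not in self.required:
            return False
        self.arrivals[(mode, value)] = timestamp
        # Discard any earlier arrival that falls outside the fusion window.
        self.arrivals = {k: t for k, t in self.arrivals.items()
                         if timestamp - t <= FUSION_WINDOW}
        # Fire only when every required input is present within the window.
        return set(self.arrivals) == set(self.required)
```

With this sketch, a pen gesture 'w' followed 1.5 seconds later by the spoken command 'west' completes the fused command, while a 3-second gap lets the first input expire.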
6.1.3 Defining Operations
Specifying scene-to-scene transitions for each and every interaction would
quickly lead to visual spaghetti. This is a common problem with visual languages
(Chang, Ichikawa et al. 1986; Chang 1987). Not only is complete transition
(a) (b)
Figure 6-6. (a) The transition specifies a keyboard press ‘n’ to move to the next scene or a gesture ‘n’ via pen input or a speech command ‘next’. (b) With the two bottom elements grouped together, the transition represents either a keyboard press ‘n’ by itself or the pen gesture ‘n’ and the speech command ‘next’ together synergistically.
specification not practical, but as previously described, transitions do not allow
various commands to happen at any time, in an unspecified order.
To address the scalability problem and to enable parallel commands,
CrossWeaver supports designer-defined operations, essentially storyboards with
specific meaning that can execute at any time, built from a set of existing operation
primitives (see Figure 6-7). For example, CrossWeaver allows the designer to create
a reusable pushpin that can be added anywhere in the scene (see Figure 6-8). A full
CrossWeaver design contains a set of operations as thumbnailed storyboard
Figure 6-7. The designer designates a storyboard sequence as a specific operation by stamping it with the appropriate operation primitive icon. There are six basic primitives, from top to bottom: defining the addition of an object, defining deletion, defining a specific color change, defining a view change (zoom in, zoom out, rotate, or translate), defining an animation path, and defining a two-point selection (as in a calculate-distance command).
sequences, shown at the top of Figure 6-3. These behaviors are available to the end-
user at any time when executing in the multimodal browser.
In the first interactive prototype, the designer can implement operations using
a small set of flexible operation primitives most appropriate for map and
drawing style applications: adding objects, deleting objects, changing colors of
objects, changing the view of a scene, and specifying an animation path (see Figure
6-7). CrossWeaver knows the meanings, or semantics, of these built-in primitives.
Figure 6-8 illustrates the designer designating a pushpin as a reusable
component by drawing it on a blank screen and stamping the “+/add object” icon on
Figure 6-8. The designer designates a pushpin as a reusable component by creating a storyboard scene and dragging the “+/add object” icon on to it. A pushpin can then be added in any scene by selecting a location and using any of the input modes specified (e.g., pressing ‘p’ on keyboard or saying ‘pin’ or drawing the ‘p’ gesture).
the scene. The key press ‘p’ or speech command ‘pin,’ or proper pen gesture will
trigger the addition of a pushpin in the current multimodal browser screen at the
currently selected point (see Figure 6-11).
In Figure 6-9, we have used a specific example of coloring a shape to define
the color blue operation, triggered by pressing the ‘b’ key or by speaking ‘make
blue’. By attaching the “define color” primitive to the scene we know that this
particular sequence represents the globally available blue coloring operation in the
designer’s application. Figure 6-12 shows the blue coloring operation in action.
Figure 6-9. The designer designates coloring blue as a reusable operation by creating a storyboard scene and dragging the “define color” icon onto it. A selected object in a scene can be colored blue in any scene by any of the input modes specified (e.g., clicking on the object and pressing ‘b’ on the keyboard or saying ‘make blue’).
In Figure 6-10, we have defined two additional operations. The top operation,
composed of before and after scenes and a transition between them, gives the
example of zooming in on an object. The bottom operation gives the corresponding
example for zooming out. After the “view change” operation icon is stamped on the
scenes, the system infers with a simple algorithm that these sequences become zoom
in and zoom out operations in the multimodal browser, triggered by the designated
input modes. The rectangles used in these scenes are strictly examples; they could be
any example shapes that the designer draws. The semantic meaning, zooming of the
scene, is carried out by the system during browsing. Additional view changes that
Figure 6-10. The designer defines zooming in and out as separate operations by drawing examples of growing and shrinking with any example shape and stamping the view operation icon onto them. These operations are triggered in the browser by the input operations in the transition between them (e.g., pressing ‘z’ on the keyboard, gesturing ‘z’ with the pen, or saying ‘zoom in’ and pressing ‘o’ on the keyboard, gesturing ‘o’, or saying ‘zoom out’).
can be defined by illustrative examples are scene translation and rotation.
Allowing the designer to define addition and deletion of objects, coloring,
view changes and other operations enables him or her to prototype and test a
multimodal application with map or drawing functionality using CrossWeaver. We
believe other application domains can be similarly divided into primitive operations.
We saw evidence of this in our field studies in which the designers would show us
short sequences of sketches which represented the basic operations in the interface
that they were designing.
6.2 Matching the Designers’ Mental Models
CrossWeaver’s style of sketching maps well to the mental models of the user
interface designers that we interviewed (see Chapter 3). They often add annotations
to their sketches to give a semantic meaning to the sketch. For instance, game
designers draw a character, meaning for the character to be a reusable object. They
Figure 6-11. The user can click anywhere and say “add pin” to add the reusable pushpin component, as defined in Figure 6-8, at the clicked point.
might even draw three or four different styles of that character and experiment with
using each of them in a background scene. In CrossWeaver, the equivalent
annotation for designating a drawing as a reusable component is using the “+/add
object” operator.
Designers also make sketches to represent example operations in the
applications that they are designing. In envisioning complex ideas, we have found
that some designers sketch a sequence representing an operation and then say the
sequence applies to many scenes. For instance, for defining “calculate distance,” a
designer would draw the selection of two locations, show their connection, and
display the distance between them. Then the designer would point out that the
operation applies to any two locations. This was the motivation for CrossWeaver’s
operators.
Figure 6-12. The user triggers the make blue color operation, as defined in Figure 6-9, by selecting any object and saying “make blue”.
6.3 Informal Evaluation of the First Interactive Prototype
To gauge the understandability of operations and the usability of the first
interactive prototype, we evaluated CrossWeaver in an informal test with 10 people,
five advanced HCI computer science graduate students with significant programming
ability and five advanced university students without significant programming
ability.
Each participant was placed in front of a desktop computer (1 GHz Pentium
4, 512 MB RAM) with a Wacom pen tablet, a mouse, and a regular keyboard to use
for the CrossWeaver input. We gave each of the participants a brief verbal tutorial in
CrossWeaver’s operation and showed them screen shots of different steps in building
a map-based multimodal application, much like the screen shots in this dissertation.
We then asked them to build their own multimodal map editing application in 30
minutes and were available for questions as they were using the system. The
screenshots that we showed to them specified a generic map and the adding pushpins
operation. We asked the participants to pick their own domain, such as trip planning,
finding directions, or browsing a map. For the purposes of this test, we simulated
computer speech recognition using a Wizard of Oz application.
All of the participants, with programming experience or not, said that they
understood the CrossWeaver paradigm of stamping sketches to turn them into
operations within five minutes of starting to use the tool. Each participant was able
to successfully create three or more reusable operations and test them out in the
multimodal browser. These operations included defining “add mountain,” “color
yellow,” “delete,” “rotate 45 degrees,” and “zoom in.”
Different participants supported different styles of multimodal interaction in
their application. One participant exclusively defined speech commands saying that
if speech was available, she did not see why anyone would want to use the keyboard.
In contrast, another participant defined interaction using only keyboard
commands. One participant was immediately attracted to speech output phrases for
some of the scenes that he created. He also attached the PDA stamp to some of the
output scenes, but in this test, we did not specifically test the PDA platform.
When asked, the five participants without significant programming
experience all commented that they did not feel that using CrossWeaver was
anything like their concept of programming; they did not need to learn complex
syntax or decipher cryptic text. Some mentioned that using CrossWeaver felt like
drawing, others mentioned that it felt like storyboarding, others said that it was like
making a comic strip.
The participants with programming experience, in contrast, mentioned that
defining operations reminded them somewhat of programming. Two of these
participants felt a bit frustrated about not being able to build certain functions, such
as filtering, that they wanted to include in their designs. One participant asked if
operations were extensible. Another asked about scoping the various defined
operations to specific scenes. These concerns appeared to come out of experienced
programmers’ mental models. One programmer, however, specifically mentioned
that, versus programming, the visual ideas in the CrossWeaver interface were very
easy to grasp.
When asked to rate the understandability of the Programming by Illustration
paradigm on a 0 (not understandable) to 10 (simplest possible) scale, the participants
without programming experience rated the paradigm easier to understand
(Mean=8.4, Variance=1.3) than those with programming experience (Mean=6.8,
Variance=2.2). This difference is statistically significant (t(8)=1.91, p=0.046),
suggesting that those without programming experience will likely react more
favorably to CrossWeaver than experienced programmers who might choose to
implement these interfaces with formal programming tools. This is a positive result,
since CrossWeaver is targeted at non-programmer designers.
There was no statistically significant difference between programmers’ and
non-programmers’ ratings of the overall usability of CrossWeaver. Four participants
mentioned that some of their interaction problems were due to inexperience with the
Wacom tablet. For some participants, there were interruptions caused by a problem
in the agent infrastructure. Many participants said that with greater time and use of
CrossWeaver, they would be able to create a more sophisticated application.
The participants mentioned that the tutorial was essential for their
understanding of the operations in CrossWeaver. That they were able to learn from a
single example, however, shows that the operations concept is straightforward. In
Chapter 8, we describe an evaluation of the final implementation of CrossWeaver
with professional designers.
6.4 Design Implications
From the user test of the first interactive prototype, we learned that a stylus-
only interface did not work particularly well. Many of the critical incidents were
related to poor resolution of the stylus on the screen and the difficulty in selecting
menu items. One of the users even asked for the mouse to avoid using the stylus. In
the final CrossWeaver implementation, we chose to target a system that could use
mouse or stylus equally well and one that also used the keyboard.
Many users also commented that they thought the interface had unwieldy
management of scenes and arrows. They found zooming unnecessary and difficult.
This led us to re-think the scheme for the CrossWeaver storyboard. In the final
version, we chose to make the storyboard linear, to emphasize scene-by-scene
drawing, and to make CrossWeaver more focused on storyboarding.
Users also wanted a more functional “copy and paste” in the final version.
They also wanted to use the import of outside drawings. We added these features
and others as general usability enhancements in the final CrossWeaver design.
This interactive prototype is not suited to widget or web-based user
interfaces. Since our focus is multimodal interface design and other tools are better
suited for informal prototyping of GUI (Landay and Myers 1995) and web interfaces
(Lin, Newman et al. 2000), we chose not to include any concept of widgets in the
final implementation. The concept of hotspots did seem like a useful addition, so
that CrossWeaver could simulate the addition of multimodal functionality to specific
regions, and we chose to add that in the final interface.
Multimodal user interfaces might in the future be most beneficial for use on
small devices, which is why the initial prototype of CrossWeaver included multi-
device support. However, the initial support of two types of devices was lacking in
that it did not allow the designer to differentiate among specific devices. In the final
implementation, we have enhanced multiple device support to target specific
devices.
The first implementation of CrossWeaver was suited for basic map and
drawing-style applications, those that start with a background canvas scene and then
add, delete, move, and color objects, and change views. This style of application has
been explored in the multimodal user interface community since Bolt’s Put That
There (Bolt 1980). Though it does not support the full capabilities of many of these
pioneering and existing multimodal map and drawing-based systems (Oviatt 1996;
Moran, Cheyer et al. 1998), this first prototype was novel in its ability to allow the
designer to quickly change the multimodal input interaction used to control the
designed application. This domain proved interesting and rich, and the final
implementation of CrossWeaver retains its focus on that domain.
7 CrossWeaver’s Final Implementation
Our original study of multimodal, multidevice user interface design practice
uncovered the lack of processes and tools for experimenting with multimodal,
multidevice interfaces (Sinha and Landay 2002). We also learned how traditional
multimodal platforms have required significant programming expertise to deal with
natural input recognition (Oviatt, Cohen et al. 2000). Our design goal since the start
of our research has been to remove programming as a requirement for multimodal
interface design through the use of visual prototyping of simple multimodal
interfaces (Sinha and Landay 2001).
This final version of CrossWeaver builds on our past experiments with
multimodal storyboarding, introducing a linear, example-based storyboard style that
also enables prototyping of multidevice interfaces. Designers can now create
multimodal and multidevice interfaces, which can use unimodal natural input or
multimodal input and can span multiple devices simultaneously. This allows
interaction designers to conceptualize user interface scenarios that they were
previously unable to create on the computer.
Our final version of the tool also formalizes the informal prototyping process,
supporting distinct design, test, and analysis phases, as in SUEDE (Klemmer, Sinha
et al. 2001; Sinha, Klemmer et al. 2002). What follows in this chapter is a
description of the designer’s process using our tool in each of those phases. In
Chapter 8, we describe a user study of this version of the tool with professional
interaction designers.
7.1 CrossWeaver Final Implementation Definitions
In the design phase (Section 7.2), the designer creates the storyboard, the
artifact that describes the prototype, which includes sketches and input
specifications. In the test phase (Section 7.3), the designer can execute the
Figure 7-1. The CrossWeaver design mode’s left pane contains the storyboard, which is made up of scenes and input transitions. The right pane contains the drawing area for the currently selected scene.
prototype. Execution involves running the prototype with end-users, collecting both
quantitative and qualitative data about how the interface performed. The analysis
phase (Section 7.4) occurs after a test. The analysis data contains a log of all of the
user interaction – the scenes that were displayed across all participating devices and
the user input that was received. To aid analysis, CrossWeaver allows the designer
to review the user test data and replay a user test later, across all of the participating
devices.
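The kind of log that supports this cross-device replay can be sketched as a time-stamped event stream. The `TestLog` class below is hypothetical, not CrossWeaver's actual logging format; it only illustrates recording scene displays and user inputs per device and re-issuing them in time order.

```python
import time


class TestLog:
    """Records scene displays and user inputs across devices for later replay."""

    def __init__(self):
        self.events = []  # (timestamp, device_id, kind, payload)

    def record(self, device_id, kind, payload, timestamp=None):
        """Log one event, e.g. a scene shown on a device or an input received."""
        ts = timestamp if timestamp is not None else time.time()
        self.events.append((ts, device_id, kind, payload))

    def replay(self, show):
        """Re-issue all events in time order; `show` is a per-event callback."""
        for ts, device_id, kind, payload in sorted(self.events):
            show(device_id, kind, payload)
```

Because every event carries its device identifier, a replay can drive all of the participating devices, mirroring the analysis phase described above.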
CrossWeaver uses sketches that are displayed in an unrecognized and
unbeautified form. Rough sketches have been suggested as a better way to elicit
feedback about interaction in the early stages of design, rather than merely
obtaining comments about fit-and-finish (Wagner 1990; Wong 1992; Landay and
Myers 1995).
Comments about interaction and high level structure are what designers seek in the
first stages of design.
7.2 Design Mode
The CrossWeaver design mode is shown in Figure 7-1. The design being
created is the Bay Area Map, an example inspired by the various multimodal map
applications that have been developed over time (Oviatt 1996). This version shows a
simple hand-drawn map representing the San Francisco Bay Area. Different
multimodal transitions allow users to navigate to different locations on the map. The
scenes that show the resulting directions can be displayed on different devices.
Definitions
A storyboard is a collection of scenes and transitions. A scene is a pane in
the storyboard. An input transition, or simply a transition, is the input mode and
parameter value that triggers switching from one scene to another. An output target
or output device is the label of the device onto which the scene will be shown,
typically labeled by an identification number. In the main design screen, the
storyboard is linear, though it also supports branching. Transitions connect the first
scene to scenes much lower in the storyboard in Figure 7-1. The linear form
encourages the designer to think in terms of short, step-by-step examples, which we
have found in our studies to be quite common in the design process (see Chapter 3),
while branching gives the flexibility to connect different scenarios.
Drawing Area
The right pane in Design Mode is the drawing area (see Figure 7-1). In
“Draw” mode, the designer can use a pen-based input device or a mouse to add
strokes in the drawing area. The toolbar palette has buttons that can change the
stroke color, fill color, and line width, represented in the fourth, fifth, and sixth
buttons, respectively. In “Gesture” mode, strokes can be selected, moved, and
erased. Circling a stroke selects it and scribbling on top of a stroke deletes it.
The left pane contains the storyboard. The currently highlighted scene is also
shown in the drawing area. The yellow transitions to the right of each scene show
the possible inputs that go from scene to scene when testing the prototype (see
Figure 7-2c).
Input Transitions
Each transition can specify four different input modes: mouse click at the top,
keyboard press below that, then pen gesture, and speech input at the bottom.
Beneath each icon is an input area to specify
the recognition parameter. For instance, the first transition in Figure 7-2c specifies
‘n’ as a keyboard input as well as a pen gesture input. (For the purposes of our
examples, we are using a letter recognizer and all gesture recognition parameters are
letters.)
Figure 7-2. A scene in the storyboard contains (a) a thumbnail of the drawing, (b) device targets and text to speech audio output, (c) input transitions showing the natural inputs necessary to move from scene to scene, including mouse click, keyboard gesture, pen gesture, and speech input, and (d) a number identifying the scene and a title.
By default, the set of inputs specified in a vertical transition represents a
logical “or”. If one of the vertical transitions matches, then the arrow connected at
the bottom of the transition will be followed. In Figure 7-2, this means a pen gesture
‘n’ or a keyboard press ‘n’ will lead to a transition from the scene labeled 2. Also, a
multimodal command of pen gesture ‘w’ and spoken input ‘west’ will lead to a
transition from scene labeled 2. The arrow and number at the top of the transition
specify the next scene to show if the input is matched. For example, in Figure 7-2,
each transition will go to scene 3.
The second vertical transition is a multimodal command, specified by ‘mm’
for multimodal in the mouse input area. It specifies that the gesture ‘w’ and the
spoken command ‘west’ must both occur within two seconds of each other for the
transition to proceed. None of the designers we interviewed mentioned a need to
experiment with strategies for processing fused natural input commands. Thus, we
have only incorporated a simple strategy for multimodal input fusion in the tool. A
future version of the tool could easily add strategies for specifying fused input
(Oviatt 1999).
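The matching behavior described above, logical "or" among the alternatives in a vertical transition, with an 'mm' group requiring all of its parts within the fusion window, can be sketched roughly as follows. The data shapes and function names here are hypothetical, not CrossWeaver's internals.

```python
# A transition from the scene labeled 2 to scene 3, mirroring Figure 7-2c.
# Each alternative is a single (mode, value) pair, or, for a fused "mm"
# command, a tuple of pairs that must all occur within the fusion window.
transition = {
    "target_scene": 3,
    "alternatives": [
        ("key", "n"),                        # keyboard press 'n'
        ("pen", "n"),                        # pen gesture 'n'
        (("pen", "w"), ("speech", "west")),  # fused multimodal command
    ],
}


def matches(alternative, received):
    """`received` is the set of (mode, value) inputs seen in the fusion window."""
    if isinstance(alternative[0], tuple):       # fused command: all parts needed
        return all(part in received for part in alternative)
    return alternative in received              # single input: logical "or"


def next_scene(transition, received):
    """Return the target scene if any alternative matches, else None."""
    for alt in transition["alternatives"]:
        if matches(alt, received):
            return transition["target_scene"]
    return None
```

Under this sketch, a lone pen gesture 'w' does not trigger the transition, but 'w' together with the spoken command 'west' does, as does a plain keyboard press 'n'.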
Input Regions
If a designer wants to specify that an input command can only occur in a
certain region of a scene, he switches to the Region tool shown in the toolbar of
Figure 7-1. With the region tool, he can draw on areas of the scene. As shown in
Figure 7-3, the region tool specifies input regions, dashed green areas in the pane, in
which linked gesture commands must happen to trigger the transitions. This is
analogous to the concept of a web hotspot -- a multimodal command must happen in
a specific region for it to be interpreted.
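A minimal sketch of the hotspot idea follows. CrossWeaver's regions are drawn freehand; for brevity this illustration assumes axis-aligned rectangles, and both function names are hypothetical.

```python
def in_region(point, region):
    """Axis-aligned rectangle hit test; region = (x1, y1, x2, y2)."""
    x, y = point
    x1, y1, x2, y2 = region
    return x1 <= x <= x2 and y1 <= y <= y2


def gesture_triggers(gesture_point, transition_region):
    """A gesture counts toward a transition only if it lands in its region.

    A transition with no linked region accepts gestures anywhere on the scene.
    """
    if transition_region is None:
        return True
    return in_region(gesture_point, transition_region)
```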
Processing Recognition
We use an agent-based architecture for processing recognition input (Moran,
Cheyer et al. 1998). In our examples, pen gesture recognition is done by a
commercial letter recognizer (Paragraph 1999) or optionally by Wizard of Oz. A
different recognizer could be used to return named gestures such as ‘up’, ‘down’,
‘copy’, or ‘paste.’
In our example, the speech command ‘down’ might be an individual speech
command or it might be a keyword returned by a keyword spotting speech
recognition agent. We presently use a speech recognition agent written using
Microsoft’s Speech Recognition Engine (Microsoft 2003c).
Figure 7-3. Here we specify an input region, a dashed green area in the pane (the circles), in which linked gesture commands must happen to follow the transitions.
Output Devices
The output devices for each scene are specified in the bottom panel
underneath the thumbnail (see Figure 7-2b). The number next to the screen icon
represents the PC screen identifiers that would show this scene (i.e., PC Device #0).
The PDA icon is next to the PDA identifiers (i.e., PDA #0, PDA #1). A “-1” for
device number means no devices of that type are targeted. The sound icon is next
the text-to-speech audio that is played when this scene is shown.
Each of the devices specified needs to be running a standalone version of the
test browser, which is further described with Test Mode in Section 7.3. The audio is
Figure 7-4. CrossWeaver’s comic strip view shows the storyboard in rows. Arrows can be drawn (as shown) or can be turned off.
played when the scene is shown via a text-to-speech agent. In Figure 7-2, this
specific scene is broadcast to all three devices at the same time. With changes in the
device identifiers, it is also possible to specify output to only a single device.
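The device-targeting scheme, including the "-1" convention for "no device of this type," might be sketched as follows; the encoding is a hypothetical mirror of the panel shown in Figure 7-2b, not CrossWeaver's actual representation.

```python
def target_devices(scene_targets):
    """Expand a scene's device spec into the set of browsers to notify.

    `scene_targets` maps a device type ("pc" or "pda") to a list of
    identifiers; an identifier of -1 means no device of that type.
    """
    devices = set()
    for dev_type, ids in scene_targets.items():
        for i in ids:
            if i != -1:
                devices.add((dev_type, i))
    return devices
```

A scene broadcast to PC #0 and PDAs #0 and #1 thus expands to three browser instances, while a spec of all -1 identifiers targets none.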
Arrows and Comic Strip View
The green arrows in the storyboard show connections between scenes and
transitions. In Figure 7-1, we see that the different scenes are laid out linearly, one
on top of another in the storyboard view. Branching is fully supported by arrows
that connect to non-adjacent scenes as in Figure 7-1. An alternative view of the
storyboard, called the “Comic Strip View,” can be brought up in another window
(see Figure 7-4). This is similar to the storyboard representation in SILK (Landay
1996).
Operations
A designer can group together two scenes and create an Operation (see
Figure 7-5). An operation is like a production rule for CrossWeaver. In the example
shown in Figure 7-5, the tool determines the difference between the two scenes; here
the difference is the addition of an object, a building. During test mode, the
transitions in the operation can be used to add the object to an arbitrary
scene; the operation applies globally.
CrossWeaver interprets operations by looking at the difference between the
two scenes and assigning a meaning to that difference. In this version, CrossWeaver
understands adding an object, change of color, zooming in and out, moving objects,
and deleting an object. These are the set of operations that are most useful in a map
or drawing application. The specific parameters for each operation, e.g., location to
move to, are inferred from the drawn scenes.
The first scene must be copied and pasted into the second scene and then
modified for CrossWeaver to make the correct inference about the difference
between the two scenes. Each stroke in the scene has an associated identifier.
CrossWeaver looks at the changes to the strokes in its inference algorithm.
Figure 7-5. Grouping two scenes and creating an “Operation.” Based on the difference between the scenes, this operation is inferred as the addition of a building to a scene, triggered by any of the three input modes in the transition joining the two scenes.
If the second scene has more strokes than the first scene, CrossWeaver treats this as an
addition, and uses the difference of strokes between the scenes as the added object.
If CrossWeaver sees a color difference of one of the strokes, then it interprets the
difference as coloring. If CrossWeaver calculates that the bounding box of the
strokes in the first scene has grown or shrunk, while the number of strokes has
remained the same, then CrossWeaver interprets this as a zoom operation. Likewise,
if the bounding box has stayed within 5% of the original size and has just shifted,
then CrossWeaver interprets the operation as a move of an object. If CrossWeaver
counts that the second scene has fewer strokes than the first, then CrossWeaver
interprets this as a deletion.
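The inference rules above can be summarized in a short sketch. This is an illustration of the decision logic as described, not CrossWeaver's actual code; the method name, parameters, and the reduction of a scene to stroke counts and bounding boxes are assumptions.

```java
// Illustrative sketch of difference-based operation inference (names assumed).
class OperationInference {
    static String infer(int strokes1, int strokes2,
                        boolean colorChanged,
                        double bbox1Area, double bbox2Area,
                        boolean bboxShifted) {
        if (strokes2 > strokes1) return "ADD";       // extra strokes form the added object
        if (strokes2 < strokes1) return "DELETE";    // missing strokes were deleted
        if (colorChanged) return "COLOR";            // same strokes, new color
        double ratio = bbox2Area / bbox1Area;
        if (ratio > 1.05 || ratio < 0.95) return "ZOOM"; // bounding box grew or shrank
        if (bboxShifted) return "MOVE";              // size within 5%, position changed
        return "NONE";
    }
}
```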
This method of specifying operations focuses the designer on experimenting
with different input commands that trigger the interaction. Any input transition in an
operation will be valid for triggering the operation. The designer can modify the
input transitions to quickly try different input modes that trigger an operation.
Operations were easily understood by the designers we spoke with, who
described them as potentially useful. But in the first few trials, the designers were more
concerned with using different storyboard scenes individually rather than
parameterized operations (see Chapter 8). Thus, for designers, operations will be a
more advanced feature that is learned over time.
Global Transition
A transition that points back to its originating scene is a global transition,
which can be activated from any other scene. This provides a method for a tester to
jump to a specific scene from any other place in the storyboard. In Figure 7-6, the
keyboard press ‘h’ or the gesture ‘h’ or the spoken input ‘home’ will transition to
this first scene in the storyboard.
Imported Images and Text Labels
To allow the designer to quickly reuse past artwork or previous designs,
CrossWeaver allows the insertion of images into the scenes (see Figure 7-7). The
designer can also insert typed text labels. These images and labels co-exist with
drawn strokes. This combination of formal and informal representations is
potentially quite powerful. It allows the designer to reuse elements that might
already have been created in another tool, such as a background design or a template.
Figure 7-6. A global transition points back to the scene from which it started. The third input panel specifies that a keyboard press of ‘h’ or a gesture of ‘h’ or a spoken ‘home’ on any scene will take the system back to this starting scene of the storyboard.
The designer can also quickly change the elements that are more in flux using drawn
strokes. Imported text labels can also be used for labeling images, as shown in the
example in Figure 7-7.
7.3 Test Mode
Once the storyboard is complete, the designer can execute the application by
clicking on the “Run Test…” button. A built-in multimodal browser will be started,
corresponding to PC device #0.
Architecture of Test Execution
The browser accepts mouse button presses, keyboard presses, pen gestures,
and speech input from a tester. Gesture recognition and speech recognition are
performed in separate recognition agents participating with the test browser in the
execution of the application.
In the storyboard, each scene specifies target output devices. A standalone
version of the multimodal browser, written in Java, can be started separately on the
devices that are participating in the test (see the bottom of Figure 7-8). This
standalone version works just like the built-in browser, displaying scenes and
accepting mouse, keyboard, pen and speech input. The standalone browsers and the
recognition engines are all joined together as agents using the Open Agent
Architecture (SRI 2003).
The state management of the test is controlled by CrossWeaver in Test Mode
running on the main PC. CrossWeaver tracks the current scene being displayed on
each device and also translates an input action into the next scenes to be shown. Any
input is tagged by a specific device ID. Communication among test browsers is done
via messages passed by the Open Agent Architecture (Moran, Cheyer et al. 1998)
that include the specific device ID. Thus, CrossWeaver is managing a state machine
based on the storyboard with multiple active states. Each active state corresponds to
a device participating in the test.
Figure 7-7. A bitmap image of a map has been inserted into the scene. A typed text label has also been added. These images co-exist with strokes, providing combined formal and informal elements in the scene. Images can be imported from the designer’s past work or an image repository.
Two or more standalone device browsers can share the same device ID. This
provides a shared scene configuration, where the device screens are kept in sync. A
shared screen configuration would be used for a broadcast presentation tool, for
example. Race conditions on input would be handled by the Open Agent
Architecture (Moran, Cheyer et al. 1998), and would generally resolve with the first
input received determining the next state.
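The multi-active-state machine described above can be sketched as follows. The class and method names are assumptions for illustration, not CrossWeaver's actual code: one current scene is tracked per device ID, input arrives tagged with the originating device's ID, and browsers that share an ID stay in sync because they read the same entry.

```java
// Minimal sketch of state tracking across devices (names assumed).
import java.util.HashMap;
import java.util.Map;

class TestStateMachine {
    // Device ID -> current scene; several browsers may share one device ID.
    private final Map<Integer, String> activeScene = new HashMap<>();

    void register(int deviceId, String startScene) {
        activeScene.put(deviceId, startScene);
    }

    // An input arrives tagged with the device ID it came from. If it matches
    // a transition out of that device's current scene, only that device's
    // active state advances; other devices keep their current scenes.
    void handleInput(int deviceId, String input, Map<String, String> transitions) {
        String current = activeScene.get(deviceId);
        String next = transitions.get(current + ":" + input);
        if (next != null) activeScene.put(deviceId, next);
    }

    String sceneOn(int deviceId) {
        return activeScene.get(deviceId);
    }
}
```

In the shared-screen configuration, two browsers registered under the same ID would both display whatever scene that single map entry holds.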
Figure 7-8. Clicking on the “Run Test…” button in design mode brings up the Test Mode Browser, which accepts mouse, keyboard, and pen input. (Top) In the first sequence, the end user gestures ‘s’ on the scene and the scene moves to the appropriate scene in the storyboard. (Bottom) In the second sequence, the user accesses the ‘add building’ operation, adding buildings to the scene. This is occurring in the standalone browser running on device PDA #0, as identified by the ID in the title bar of the window. Pen recognition and speech recognition results come into the browser from separate participating agents.
End-User Experience running a Test
Multiple users can participate in a test simultaneously, as in a Computer
Supported Cooperative Work (CSCW) application with one device per user. Each
test device will run a standalone browser.
The end-users running a test are not aware of the full storyboard that the
designer has created. Each user is focused only on the scene in front of him and on
his input. The designer can help the end-users discover commands by adding audio
instructions or using text or handwritten prompts and instructions in each scene.
7.4 Analysis Mode
After running a test, clicking on “Analyze” in the top of the Test Mode
browser (see Figure 7-8) brings up a timeline log of the most recent test (see Figure
7-9). Each row in the display corresponds to a different participating device in the
execution of the test. Any device that was identified in design mode will show up in
the analysis view.
In Figure 7-9, from left to right we see a time-stamped display of the scenes
that were shown across each device and the corresponding input transition that led to
a change in the state of the test execution. The first transition from the left on the top
row shows that the user on PDA #0 drew ‘h’ with the pen input. If an input
transition happened on another device, it would show in the row for that device.
Some rows have a scene following a blank area, such as the third row
corresponding to PDA #1. Blank grid spaces mean that no changes were made on
that specific device during a transition on another device.
This view allows the designer to examine the inputs attempted by the test
participants even if they did not succeed in triggering a transition, such as the
second input transition in row one in Figure 7-9. All input attempts are captured in
this log. The designer can use the analysis display to see if the user switched input
modes or strategies to attempt to trigger the next scene. These repair actions are
especially important in multimodal applications.
Viewing Multimodal Input
Analysis mode will also show the building of a fused multimodal input. If an
input can possibly match a multimodal command, the system waits for other inputs
within a specified time period. If those other inputs happen, they are added to the
display as a multimodal input transition, with multiple input mode slots filled in.
The end transition looks like the multimodal transition as specified in Figure 7-2.
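The time-window fusion just described might be sketched as follows. The class, method, and mode labels are assumptions for illustration, not CrossWeaver's actual code: an input that could be part of a multimodal command is held for a short window, and if a complementary input from another mode arrives within it, the two are fused into one multimodal transition.

```java
// Hedged sketch of time-window multimodal fusion (names assumed).
class FusionBuffer {
    private String pendingMode;   // input waiting for a partner, or null
    private long pendingTimeMs;
    private final long windowMs;  // how long to wait for the other mode

    FusionBuffer(long windowMs) { this.windowMs = windowMs; }

    // Returns a fused "modeA+modeB" command when a second, different-mode
    // input arrives within the window; otherwise starts (or restarts) the
    // wait and returns null.
    String offer(String mode, long timeMs) {
        if (pendingMode != null && timeMs - pendingTimeMs <= windowMs
                && !mode.equals(pendingMode)) {
            String fused = pendingMode + "+" + mode;
            pendingMode = null;
            return fused;
        }
        pendingMode = mode;
        pendingTimeMs = timeMs;
        return null;
    }
}
```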
Replay
From within the Analysis Mode display, a designer can replay the entire log
of execution across all of the devices by pressing the play button in the toolbar. This
replay allows the designer to see the user test again, in action, across all of the
devices. Replay is valuable for finding critical incidents or more subtle design flaws
long after the user test is finished.
Returning Full Circle to the Design
The analysis view purposely looks similar to the original design view: it is
linear for each device and has the same scenes and transition layout.
Analysis files can be saved individually and loaded into separate windows,
allowing the designer to compare two or more different participants side-by-side. By
having the analysis data collected in the design tool itself, designers can quickly
review a number of user tests and refine the design.
Figure 7-9. CrossWeaver’s Analysis Mode shows a running timeline of all of the scenes that were shown across all of the devices. It also displays the interaction that triggered changes in the state machine. The red outline represents the current state in the replay routine. Pressing the play button steps through the timeline step-by-step replaying the scenes and inputs across devices. The timestamp shows the clock time of the machine running when the input was made.
An analysis view does not have to correspond to the current scenes in the
storyboard. For example, analysis views of different user tests can be displayed even
after the storyboard itself has changed. (There is presently no facility in
CrossWeaver to assist with labeling different versions of the storyboard.) Keeping
all design, test, and analysis data together in the same tool facilitates rapid iterative
design.
7.5 Final Prototype Summary
CrossWeaver gives a non-programmer designer the ability to conceptualize,
design, test, and analyze a multimodal, multidevice informal prototype. Such a
designer can also quickly create a user interface that spans multiple devices simultaneously, and
test that interface with end-users. Interaction designers tasked with creating an
application in this new domain can quickly create a prototype that incorporates pen
gestures, speech input, and multimodal commands.
The design, test, and analysis methodology in CrossWeaver dovetails with
the existing design process and provides the capabilities for rapid multimodal design
in one tool. CrossWeaver’s analysis mode helps capture the spirit of each end-user
test, helping designers decide on the multimodal vocabulary that they will use in
their final application. Rapid design experimentation with CrossWeaver will help
improve a design and help extend the proliferation of multimodal, multidevice
interfaces.
Chapter 8
94
8 Evaluation
We evaluated the final version of CrossWeaver with nine professional
interaction designers working in a range of companies, from design consultancies to
software companies, in the San Francisco Bay Area. Testing with professional
interaction designers brought us full circle from the original field studies for
CrossWeaver as described in Chapter 3. We wanted to test how well we met the
interests of this audience in the final version of the tool.
8.1 Recruitment
These interaction designers were recruited via e-mail lists maintained by the
Group for User Interface Research at UC Berkeley (GUIR 2003). Four of the nine
designers had previously participated in an interview or user test for a different
GUIR project. Two of the participants had participated in the formative interviews
for CrossWeaver. Testing with them let us gauge how well the ideas in
CrossWeaver matched their responses in those formative interviews.
The interaction designers were verbally surveyed about their past experience
with multimodal and multidevice user interfaces, such as speech interface design
experience or design for handheld devices. Prior experience creating interfaces in
these domains was not required for participation. Two
designers did have significant experience building multimodal applications. Five of
the interface designers had some experience creating applications for small devices.
The two remaining designers were primarily designers for web and graphical user
interfaces and did not have experience with multimodal or multidevice UIs. None of
the designers worked primarily in speech interfaces. A more detailed profile of the
designers is given in Section 8.8.
The designers were from the following companies:
• BravoBrava!, an educational and game software start-up that is based
in Union City, California and is building applications that use pen and
speech input with rich multimedia output
• IDEO, a product, industrial, and interaction design firm based in Palo
Alto, California
• Roxio, a multimedia hardware and software company based in San
Jose, California
• Adobe, a multimedia software tools company based in San Jose,
California
• PeopleSoft, an enterprise software company based in Pleasanton,
California
• Google, a search engine company based in Mountain View, California
Two designers were tested at BravoBrava!, IDEO, and Google, while one designer
was tested at each of the other three companies.
8.2 Experimental Set-Up
Each user tested the CrossWeaver system on a Toshiba Portege 3500 Tablet
PC (Toshiba 2003), with 256MB RAM and a 1.0GHz Mobile Pentium III, in laptop
configuration (see Figure 8-1). The Tablet PC was connected via a null Ethernet
cable to a Fujitsu laptop (Fujitsu 2003), which served as a second device in the test
mode. In addition to having a built-in stylus for the screen, the Tablet PC also had a
mouse attached to it, and the designer was allowed to use either the mouse or the
stylus. The Tablet PC and the laptop were placed on a table in a conference room,
with the Tablet PC in front of the designer and the laptop to the side of it. The
observer (myself) sat in a seat next to the designer. The observer recorded a log of
the user test using handwritten notes. The observer also periodically saved the
current state of the participant’s design onto the disk.
Tests for the designers at BravoBrava!, IDEO, and Google were performed at
the offices of those companies. Tests for Roxio and PeopleSoft were performed in
the homes of the designers. Tests for Adobe were performed in a conference room
in the Computer Science division’s building at UC Berkeley.
8.3 Pre-Test Questionnaire
Each designer was asked to fill out a paper survey about their background as
a designer, and each was interviewed about their current and past design work and
their role on their current team (see Appendix A.2 and A.3). They were asked in the
survey about what prototyping and drawing tools they had used in their past work
and about any likes and dislikes that they had for those tools.
8.4 Training
Each designer was given the same 10-minute demonstration of
CrossWeaver’s features as a training tutorial. This tutorial stepped through the
basics of defining input and output modes in the CrossWeaver system and showed all
three modes of the tool: Design, Test, and Analysis. The demonstration was done
with tutorial examples, and not any specific scenario (see Appendix A.1).
8.5 Protocol
Figure 8-1. Diagram of the Experimental Setup (figure labels: PC #0, PDA #0, Participant, Observer).
Since this was the first time the tool was introduced to each user, we chose to
use a modified think-aloud protocol for this user study, where the user was also
permitted to ask questions about the system if they did not remember how to perform
a certain task or wanted assistance. Since the number of requests for help varied
from participant to participant, time on task and time on help were not
captured. Critical incidents and a usage log were captured. Assistance was provided
as requested, and was done to explain the use of different features or to remind the
user about the tutorial from the training demonstration if they got stuck.
The three metrics that we focused on in this user study were:
understandability, usability, and capability. We measured this through a
combination of observations about what problems the participant had in
accomplishing the tasks and also in a post-test questionnaire given to the
participants.
8.6 Tasks
Each designer was asked to perform two tasks (see Appendix A.1). The first
task involved creating a multimodal web site, where pages could be accessed via pen
gestures or speech commands in addition to mouse clicks. This task tested the ability
of the designers to use and understand the multimodal input capabilities of
CrossWeaver, by having them create a storyboard that accesses the underlying
recognizers. The designers were allowed to pick what inputs and outputs they used
for the scenario that they implemented. For seven of the designers, this was the first
time that they themselves had created an application that used a speech
recognizer.
The second task was to create a remote control, as in a multimedia
entertainment center, where the Tablet PC controlled what was displayed on the
laptop. This task focused on the multidevice output capabilities of CrossWeaver,
and emphasized the fact that scenarios could be used to create applications that span
devices simultaneously. The designers were asked to use the Tablet PC as the
remote control and the laptop as an output device, simulating a TV or radio.
8.7 Post-Test Questionnaire
Each participant was asked to fill out a paper survey (see Appendix A.4) and
written questionnaire (see Appendix A.5) giving their ratings of different aspects of
using the system. The survey and questionnaire measured their subjective reactions
to the system. They were also briefly interviewed to elicit their comments about the
system and their suggestions for feedback.
8.8 Designers’ Profiles
The profiles of the designers that participated in the evaluation are shown in
Table 8-1. Demographically, all but one of the designers that we interviewed were
male, a simple reflection of which designers responded to our initial inquiry. The
designers had between 3 and 15 years of professional design experience, with an
average of about 6 years. This was an experienced group of interface designers with
a broad range of completed design projects. All had an interest in multimodal
interface design.
Three of the designers considered themselves as having significant
programming experience and said that they did programming in their daily jobs.
Four of the others said they had minimal programming experience, and mainly
counted their use of HTML as programming experience. The two others considered
themselves non-programmers, and did not claim knowledge of any particular
programming environment.
Only two of the participants had significant exposure to speech and pen
interfaces. These two were directly involved in multimodal application design and
implementation in their jobs. Most of the other designers had some experience using
speech interfaces in telephone directory assistance domains and using pen interfaces
on Palm PDA’s or tablet computers.
The most popular programming language was HTML followed by Visual
Basic. Two of the designers were focused on web design and delivered HTML as
part of their design output. The other designers either used HTML intermittently for
prototyping or for organizing prototypes. Visual Basic was the next most popular
tool. Only two of the designers used Visual Basic frequently. The other designers
had used Visual Basic in the past but not on a daily basis. One of the designers was
an expert in using Director, including the use of the Director programming facilities
through the Lingo programming language (Macromedia 2003b).
The most popular tools used for drawing were Microsoft PowerPoint, Adobe
Photoshop, and Adobe Illustrator. The only designers who did not use Photoshop or
Illustrator were the two programmers. The tool of preference varied significantly
from designer to designer. One designer said that he uses Macromedia Fireworks for
almost all design tasks that he is doing, even though it is targeted to web image
creation. He finds a way to fit Fireworks into his other design problems. This was a
common observation from the users who had specific preferences among drawing
tools. They would typically stick with one tool, learn it well, and fit it into their
design process, even if it was not the most appropriate tool for some of their design
tasks.
Table 8-1. Participant designers’ backgrounds in the final CrossWeaver user tests
CrossWeaver Background Survey Results
Designer ID#                         1    2    3    4    5    6    7    8    9    Avg.
Gender                               M    M    M    M    M    M    M    F    M
Age (years)                          30   28   28   33   40   26   27   25   28   29
College major: Fine Arts; Mechanical Engineering; Computer Science; Computer
Engineering; Humanities; Information Systems; Cognitive Science; Information
Systems; Computer Graphics
Years as a designer                  5    3    4    11   15   3    5    4    5    6
Years using computers                10   22   8    13   20   20   12   15   20   16
Years programming (criteria varied; some counted HTML only): 6 (HTML); 22
(Basic, C, CAD); 8 (C, Java); 7 (C, Java); 20 (Lingo); 1 (HTML); 5 (HTML);
8 (HTML); 20 (C, Java); average 10
Have you used speech interfaces? When? Phone directory; directory assistance;
yes, last 3 years; all sorts, 3 years; airline reservations; insurance
application; airline directory; no; experiments on Mac
Have you used pen interfaces? When? Palm PDA; Palm PDA; yes, last 3 years;
all sorts, 3 years; old CAD systems; Tablet PC; Palm PDA; yes, drawing
program; yes, experimental
Have you used a multimodal interface? When? No; no; yes, last 3 years; all
sorts, 3 years; no; no; no; no; no
Tools used (totals): HTML 7; Visual Basic 5; Director 2; Hypercard 1; Tcl/Tk 1;
other: PDF Seq, Visual C#, Visual C++, Foamcore, After Effects
Drawing/painting tools used (totals): PowerPoint 9; Photoshop 8; Illustrator 7;
Freehand 4; Visio 4; MacDraw 3; MacPaint 3; Corel Draw 2; Paintbrush 2;
Fireworks 2; Solidworks 2; CAD 2; Canvas 0; Color It! 0; Cricket Draw 0;
SuperPaint 0; Xfig 0; other: Cinema 4D, Paint Shop Pro
8.9 Designers’ Case Studies
Below we profile the experience of three of the participants in our evaluation.
These three case studies represent the range of designers that participated in the tests
and the experiences that they had. We describe how these participants reacted to
CrossWeaver and what they accomplished in the evaluation.
8.9.1 Analysis of Participant #4 Usage Log
The first participant described here (Participant #4) is tasked in his day-to-day
job with designing multimodal user interfaces. One of his projects is a
multimodal home, for which he is designing a multimodal user interface to control the
TV, stereo, and other appliances. Another project involves creating an educational
training tool for students learning a second language. He is experienced with pen
interfaces, speech interfaces, and multimodal interfaces, and is both a designer of the
system and an implementer.
In his company, he works on his designs in small teams of two or three. The
team works together on all stages of the project, from the first stage whiteboarding of
the design to the actual coding. In his case, the design process has essentially only
those two steps. The design ideas, specification, and capabilities are brainstormed on
a whiteboard. The details from the whiteboard turn into the specification, which is
often written up as a document. The document is used to build the application in the
programming environment.
One part of this designer’s overall philosophy is “Two is better than one,”
meaning two alternative unimodal commands are better than one. He said fused
commands are used on a case-by-case basis. There are often alternative unimodal
commands for the same operation.
The programming is done with tools such as Microsoft’s Visual C# or Visual
Basic, which can access libraries that utilize speech, pen, and multimodal
recognition. The first coding is done in this environment, and typically the same
code base is iterated on and added to as design refinements are made. Milestone
versions of the application are installed in the final environment and tested with
users; this last step is a laborious process and is done only occasionally.
Task #1
Participant #4 began the multimodal web page task by taking some time to
think about what to draw. He began scene #1 by sketching text in the web page
using the tablet stylus, creating a navigation menu that highlighted that this is the top
level page in his hierarchy (see Figure 8-2). In his case, up to this point he had been
spending more time (~3 minutes) thinking about what he would draw than actually
drawing it (~1 minute).
After creating scene #2 (see the left storyboard area of Figure 8-2), he first
added a speech command, “log-in,” to connect the first two scenes. He asked about
branching to the other three scenes from the top-level navigation scene, and upon
being reminded of the way to do this, he immediately creates the tree-structured
navigation for his speech-controlled web browser. By the end of his storyboard, the
“log-in” speech command also triggers a transition from scene #5 to scene #4 (see Figure
8-2).
In his first test, he assumes that the first page in the storyboard is the starting
scene. In CrossWeaver’s case, the currently selected scene is the start of the test,
which allows the designer to start from any scene in the storyboard. He quickly
selects the home page scene that he has drawn and then runs his first test successfully
in the test browser.
Figure 8-2. Participant #4’s storyboard for Task #1.
Participant #4 is experienced in multimodal interface design and begins to
briefly experiment with fused input, combining speech commands and gesture input.
Figure 8-3. Participant #4’s storyboard for Task #2.
For example, he added the pen gesture “l” to his speech command “log-in.” After
modifying his hierarchy and running tests with fused input successfully (3 minutes),
he is done with the first task and asks to move to the second.
Task #2
To avoid needing to redraw, Participant #4 starts with the storyboard that he
has created in the last task. He states that his objective is to modify the storyboard so
that the target scenes are displayed on the second device.
At this point, he spends some time building his own conceptual model of
what he is going to do (~4 minutes). He asks for a reminder about specifying
devices, and after being given it, he changes the specific output device of scene #1,
#3, and #4 to only PC #0, the Tablet PC, and changes scene #2 to PDA #1, the laptop
(see Figure 8-3). He also modified other scenes. While doing this, he wanted to add
hotspot regions to the navigation page, and he went through some trial and error
in defining the regions (see Figure 8-3). After understanding the different ways to
define regions, he ran a test successfully.
This designer has designed multidevice scenarios before in his work for a
home entertainment control system. At this point, he requested speech control to be
available across all devices instead of just Device #0, which is an enhancement that
has been added to CrossWeaver since this test.
Participant #4 said that he saw the potential use of CrossWeaver anywhere
speech and pen are used in his designs. He thought it had the potential to very
quickly show the preliminary form of the interface that he is designing. He saw
immediate application to his digital-home design project. He also
saw the limits of the tool for helping him with an ESL (English as a Second
Language) project that relies heavily on dictation recognition, because there is no
facility in CrossWeaver to process input streams from a dictation recognizer.
8.9.2 Analysis of Participant #1 Usage Log
Participant #1 is a professional interaction and product designer who came
from a fine arts background and has no formal education in design. He describes his
job as building prototypes and feels very facile with the tools that he uses to build
those prototypes. He most frequently creates HTML pages for prototyping and has
experience with HTML coding, but uses no other programming languages. Among
prototyping, drawing and painting tools, he has used Director, Adobe Photoshop,
Adobe Illustrator and video simulations. He makes paper sketches often and
sometimes converts those sketches to representations in CAD tools if it is required in
his client work. He also makes paper storyboards, both for illustration and for
demonstrating behaviors.
Task #1
Participant #1 began the first task (see Figure 8-4) by experimenting with the
stylus on the tablet computer. He had used a stylus before on PDAs and an attached
tablet to make drawings on his computer, but had not used a tablet computer before.
His initial scenes in the drawings were almost exclusively words; he wrote
instructions for a user explaining the interaction that would trigger behaviors in the
execution mode. He drew and erased a few different scenes. His storyboard only
shows the two remaining scenes at the end of his test (see Figure 8-4).
Figure 8-4. Participant #1’s storyboard for Task #1.
As a professional designer, he was very quick to make requests about
additional features in CrossWeaver’s Design interface, including handles and visual
enhancements that would help access the underlying behaviors and storyboard state.
The visual enhancements that he requested were inspired by the handles that he had
seen in his professional tools, such as Photoshop and Director.
He quickly understood the way to build a branch structure in the storyboard.
During his trial he made the request for a logical “NOT,” a transition that would
happen if the recognition in a test did not match a transition.
Task #2
In his second task (see Figure 8-5), Participant #1 began by spending a few
minutes puzzling over the concept of a multidevice application. He had never
designed such an application and questioned the usefulness of multidevice as a style.
He also wanted to minimize the device specification area, since he thought he would
be using this tool primarily to target a single device scenario.
Nevertheless, he quickly drew an abstract scenario of a house with some lines
underneath it. He had those lines act as hyperlinks on the primary device, the Tablet
PC, to create the picture of a house on the laptop. He created the scenario and
quickly tested it.
At the end of the second task, this designer got excited about the possibility
of using CrossWeaver in some of his product design scenarios. He sketched a
storyboard quickly on paper with the picture of a toaster and toast in it, and described
how he often uses two dimensional mock-ups to represent three dimensional
products. The two dimensional mock-ups that he creates in HTML often have
hotspot areas on top of active elements in the 3D object, such as the lever of the
toaster. He said he would like to use CrossWeaver for the toaster scenario and some
other product design scenarios that he was working on.

Figure 8-5. Participant #1’s storyboard for Task #2.
8.9.3 Analysis of Participant #9 Usage Log
Participant #9 was both a designer and a programmer who was primarily
working on web page design. His training was in Computer Graphics, and most of
his design training was self-taught. He had a background in a number of computer
languages and drawing tools and considered himself an expert and daily user of
Macromedia Homesite, a web page creation tool. He frequently created paper
mock-ups of the web page designs that he was working on, and mentioned that one
of his specific problems with paper was losing it and keeping track of versions.

Figure 8-6. Participant #9’s storyboard for Task #1.
Task #1
Participant #9 started using the tablet stylus and created a page with two links
(see Figure 8-6). Before creating the links, the participant asked for a reminder
about defining hotspot areas. The links he created were done on regions of the page
that were otherwise blank; in other words, he did not create the link text before
creating the link; he just created a blank link region. This participant found that he
had to write more slowly than he expected on the tablet because of the speed of the
tablet hardware.
He created the other pages as boxes with a small amount of text in them,
hooked up the links and then tested his small design. He then proceeded to
experiment with different color strokes and image and text label import. In total he
spent seventeen minutes completing the task and experimenting with the other
functions.
After the test, he said he was impressed with the speech and gesture input
functionality, but he wanted the system to export to HTML. All of the mock-ups he
does at present are shown in HTML.
Task #2
Participant #9 created a new site for the remote control scenario (see Figure
8-7). He created a storyboard that showed a stoplight-like controller for the Tablet
PC changing the text that is shown on the remote device, the laptop. At this point,
the participant was more comfortable with the basic way of using this tool, and he
started making some requests about features he would like to see. He suggested
some sort of different visual to show the global transitions. He suggested a few
changes to the visuals such as collapsing the arrows of adjacent transitions. He also
asked for full screen edit mode, which is available in the tool, and suggested that
could be the default view and that the storyboard view could be separate from it.
His main concern at the end of the test was on the scalability of the
storyboard model and visuals as shown. He said that he liked the potential of the
tool to help him with scene management and organizing the mock-ups of his web
pages. For his web work, he wanted a tool that would more fully support a sitemap
view, export to HTML, and focus on web site design as the target domain.
Figure 8-7. Participant #9’s storyboard for Task #2.

8.10 Designers’ Post-Test Survey Results

Each interface designer was given a survey (see Appendix A.4) and a
questionnaire (described in Section 8.11) at the end of the user test. The survey
results are shown below:
Table 8-2. Results of the Post Test Survey given to the participants.
CrossWeaver Post-Survey Results                        Participant #
                                                        1   2   3   4   5   6   7   8   9   Avg  Std Dev
Ranking Questions:
1) How functional did you find this prototype?
   (1=not functional, 10=functional)                    8   7   7   8   7   8   8   9   9   7.9  0.8
2) Did you find yourself making a lot of errors
   with this tool? (1=no, 10=yes)                       2   5   3   3   4   8   7   2   6   4.4  2.2
3) Did you find it easy to correct the errors
   that you made? (1=no, 10=yes)                        6  10   7   9   2   8   7   9   8   7.3  2.3
4) Did you find yourself making a lot of steps
   to accomplish each task? (1=no, 10=yes)              2   3   3   2   3   6   7   2   4   3.6  1.8
5) Ease of use of the prototype as given.
   (1=hard, 10=easy)                                    7   7   8  10   4   5   7   9   8   7.2  1.9
6) Quickness with which you could accomplish the
   tasks you had to complete.
   (1=very poor, 10=excellent)                          8   9   7   9   7   5   7   9   7   7.6  1.3
7) How natural (understandable) were the commands
   that you used? (1=very poor, 10=excellent)           7   9   8   8   9   7   6   9   7   7.8  1.1
Recognition performance:
8) What are your comments about the performance of
   the speech recognizer?
   (1=very poor, 10=excellent)                        n/a n/a   3   9   4   3  10   9   9   6.7  3.0
9) What are your comments about the performance of
   the handwriting recognizer?
   (1=very poor, 10=excellent)                          6  10   4   8   6   8   5  10   8   7.2  2.1
The survey was designed to elicit metrics of functional capability (Question
#1), ease of use (Question #5), and understandability (Question #7) from each of the
designers. The information on other freeform survey questions collected data about
the quality of the prototype and in what areas the participants had specific problems.
The responses to these questions are discussed in Section 8.12.
Functional capability (see Figure 8-8) was rated an average 7.9 out of 10 (SD
0.8), which showed that the designers believed that they were able to create
applications that utilize multimodal and multidevice interaction using this tool.
Ease of use (see Figure 8-9) was rated average 7.2 out of 10 (SD 1.9).
Participants #5 and #6 had several critical incidents related to erasing and selection
during their user test, which may have contributed to them rating the tool lower.
This is also reflected in their responses to question #2, in which they said they found
themselves making a lot of errors with this tool. These two participants use a single
tool in their daily design work, and they both asked a few times during the test for
features in CrossWeaver that had their origin in their tool of preference.

Figure 8-8. Rating of CrossWeaver’s functional capability.
Understandability (see Figure 8-10) was rated 7.8 out of 10 (SD 1.1). (The
question asked in the survey was, “How natural were the commands that you used?”
This was clarified to each participant verbally as “How understandable were
CrossWeaver’s commands?”) After the initial training and experimentation, many of
the designers said that they felt prepared to try out the tool on their day-to-day design
tasks. Further details about understandability were elicited in the freeform survey
questions discussed later in Section 8.11.
Given that these results were measured in a subjective survey, the positive
ratings that the participants gave should be viewed with some skepticism: in survey
measurement, negative numbers are more likely to be accurate indicators of problem
areas than positive numbers are indicators of the complete absence of problems.

Figure 8-9. Rating of CrossWeaver’s ease of use.
8.11 Designers’ Post-Test Questionnaire Results
Each interface designer was also given a post-test, freeform questionnaire in
written form at the end of his or her test (see Appendix A.5). When asked about
whether they would use CrossWeaver to design interfaces, seven of the nine
participants wrote “Yes” outright. Two of the participants wrote “Yes” with the
caveat that they would use it if they were tasked with designing interfaces that use
pen and speech input. Those two participants were specifically attracted to the
ability to incorporate recognizers into their interface designs, but they said that they
would prefer to use their everyday tool for their interface design work, given the
time and training that they have invested in it.

Figure 8-10. Rating of CrossWeaver’s understandability.
Based on the responses to the freeform questions, the method of building
storyboards was uniformly understandable. Most of the comments were about
improving some of the interaction techniques used in the tool, such as an erase and
selection capability that did not use gestures. This capability exists in the tool
already, but is accessible only through selection with a keyboard modifier key.
Half of the participants requested selection to be the default behavior of a mouse
click. The second most common comment was that inserting images and labels
should be made easier, since many of the designers believed that they would be
importing images from the disk into their designs at least part of the time. One
participant said that he would not likely use this tool for multi-device scenarios and
thus wanted a way to turn off the device target area in the storyboard. No other
participant had specific requests for changes in any of the basic top-level concepts in
the tool.
Based on the answers to the freeform questions, the participants found the
method of running the prototypes very understandable. Nearly all found that the
browser metaphor made sense. Three of the participants said that the idea of
browsers across devices with IDs as targets made sense to them, but only after the
tutorial demonstration. In their normal course of work, they are used to working
with screens on only a single device.
Using operations was not specifically part of any of the tasks, though it was
demonstrated in the tutorial. We prompted four of the participants who had
experimented with operations to answer the freeform question about the method of
creating “Operations.” These participants said that they were not immediately sure if
it was a useful technique in this tool. They primarily envisioned this tool as useful
for managing their storyboards, and did not see themselves creating reusable objects
or other drawing behaviors in their designs. One participant did write that he thought
primitive operations would be useful. He gave the example of operations that trigger
the same behaviors for different entertainment appliances, such as an on-off button,
volume control, and channel changing. He wanted to see the functions of a universal
remote control as primitive operations. Based on what he wrote, his request would
likely be fulfilled by allowing subcomponent scenes to exist in the CrossWeaver
tool, where the same sequence in the storyboard can be reused in other storyboard
sequences to trigger behaviors in other scenes.
Some of the general comments added to the end of the questionnaire are
shown in Table 8-3. These comments illustrate a positive response to the tool from
the participants. In the oral request for additional comments, all of the participants
asked about the availability of the tool for further experimentation.
Table 8-3. Selected participants’ general comments from the post-test questionnaire.
Cool! Has GREAT potential as a tool for real world use!
Not too many bugs!
Looks good - I'd like to try out on real stuff
Fun
Very cool
Impressive work!
8.12 Design Issues
Questions #2, #3, #4, and #6 in the Post Test Survey (see Table 8-2) were
meant to specifically generate comments about the quality of the CrossWeaver
implementation and find areas that the participants had specific problems with during
the tests. Based on the results of those questions, we could see that two participants
(#6 and #9), who had significant interaction design errors as they were working with
the tool, gave rankings of 8 out of 10 and 6 out of 10 to question #2 (where 10 means
many errors and 1 means few). They specifically had problems with selecting and
erasing strokes. Both of those functions were accessed in the tool by switching to
the “Gesture” mode in the tool and performing a selection gesture or an erase
gesture. These two participants had to retry the gestures multiple times before
getting them to work.
These participants also said that the errors were easy to correct, based on
their rankings 8 out of 10 and 8 out of 10 for question #3, about how easy the errors
were to correct. Other participants picked up the specific gestures fairly quickly
after the tutorial and explanation, but nearly all asked for clarification. Based on this
feedback, the method for erase and select has been changed in a later version of
CrossWeaver. Instead of using gestures, CrossWeaver uses click-select as the
default and accepts the delete key for erasing the selected object.
All of the designers thought that the user test capture in Analysis Mode was a
valuable feature in CrossWeaver and noted that it was not presently incorporated in
any of the tools that they use. Some designers wanted to see the user test capture
extended to capture think-aloud audio comments of the users during the test. This
could be added in a future version.
CrossWeaver’s Design Mode layout was inspired by Microsoft PowerPoint
(Microsoft 2003b). PowerPoint was listed as the most common drawing tool used by
the designers that we talked to. Even so, six of the designers mentioned that they
wanted to see CrossWeaver behave like their favorite tool (e.g., Director, Fireworks,
Homesite, Photoshop, or Illustrator). Two designers who saw themselves as tied to a
single tool (Director or Fireworks) gave rankings of 4 out of 10 and 5 out of 10 for
“Ease of use” for CrossWeaver (see Table 8-2). Choosing PowerPoint as the design
model was intended to make CrossWeaver more inviting for the participants in the
test. Adding features that are common to the other tools mentioned would increase
CrossWeaver’s appeal to certain designers.
Questions #4 and #6 related to how quickly the designers could accomplish their
design tasks in CrossWeaver. In general the designers found that CrossWeaver was
quick to use.
Questions #8 and #9 asked about the performance of the recognizers in the tests.
The designers’ ratings varied depending on how the recognizers performed during
their specific tests. In no case did recognizer performance interfere with
CrossWeaver’s use.
8.13 Evaluation Summary
Our evaluation of CrossWeaver tested the initial reaction of nine professional
designers to the capabilities of CrossWeaver. These professional designers, who
work or will work on multimodal and multidevice interfaces, represented the target
users we envisioned when initially planning CrossWeaver. We designed two
evaluation tasks to cover both multimodal and multidevice interfaces. We utilized a
modified talk-aloud protocol since this was the first time these designers were using
this tool.
Based on the survey and questionnaire responses after the tests, it appears
that the two participants who were presently involved in the design of multimodal
interfaces were most excited and enthusiastic about CrossWeaver. Their typical
process takes them from whiteboard directly into programming to allow them to
utilize recognizers and multimodal behaviors in their interface designs.
CrossWeaver enables them to use recognition in the early stages of their designs and
lets them try out scenarios with participants long before they finish programming.
The two participants who were primarily web designers found CrossWeaver
useful, but specifically asked for features that would be useful for them in their work,
such as support for HTML output. These are features that are part of the DENIM
tool (Lin, Newman et al. 2000; Newman, Lin et al. 2003), which was geared
specifically towards web site designers. These two designers were familiar with
DENIM and thought it was more appropriate for their day-to-day work than
CrossWeaver.
All of the participants said that they thought this tool opened up a new design
space versus the design space that they think about in tools that they presently use.
This appears to be based on having speech and pen recognition and multidevice
features integrated into the tool, which are unavailable in the other tools. Three of
the designers said the tool was appealing because they could envision using it to
design accessible interfaces for disabled persons.
Chapter 9
126
9 Implementation and Limitations
This chapter covers the implementation details and limitations of the final
version of CrossWeaver.
9.1 Implementation Details
CrossWeaver is built as a Java application using JDK1.4.1. The core
sketching capabilities in CrossWeaver and the scene models are extended from Diva
(Reekie, Shilman et al. 1998), an infrastructure for building sketch-based
applications. The standalone browser is built in Java as well, using only capabilities
compatible with Java on mobile devices.
Figure 9-1. CrossWeaver Architecture. The diagram shows three layers: an
Application Layer containing CrossWeaver’s Design Mode, Test Mode, and Analysis
Mode and the built-in browser; a Design Layer containing the informal prototype;
and an Infrastructure Layer containing the Open Agent Architecture [SRI], the
Recognition Agents (Speech Agent, Pen Agent), the Output Agents (Visual Agent,
Audio Agent), the standalone browsers, the Wizard of Oz (WOz) tools, and Diva
[Shilman].

The first interactive prototype of CrossWeaver (see Chapter 6) used the
SATIN infrastructure for building sketch-based applications (Hong and Landay
2000). SATIN provides basic stroke management capabilities and the ability to
zoom in and out when viewing the storyboard. Since we changed from an infinite-
sheet layout in the preliminary version of CrossWeaver to a linear storyboard layout
in the final version, we had no need for zooming. Thus, Diva was a more suitable
choice than SATIN for the underlying infrastructure in the final implementation.
CrossWeaver’s multimodal and multidevice capabilities are integrated
through SRI’s Open Agent Architecture (OAA) (SRI 2003), and off-the-shelf agents
for handwriting recognition and speech recognition are used to communicate with
the CrossWeaver system.
The architecture for CrossWeaver’s final implementation is shown in Figure
9-1. CrossWeaver’s application layer is the control center for the Open Agent
Architecture. During testing, CrossWeaver’s application layer communicates with
the Recognition Input Agents and Output Agents via OAA. The Recognition Agents
take natural input, either audio input from a microphone or a pen gesture represented
as an OAA string message from the browser, and return a recognition result. The
recognition agents and the output agents can run on any machine. Each output agent
is labeled with a device identifier, which facilitates OAA’s management of multiple
devices.
The CrossWeaver application itself is composed of three parts, Design Mode,
Test Mode, and Analysis Mode as described in Chapter 7. The prototype built in
Design Mode in CrossWeaver runs in the built-in browser in Test Mode and also
runs in the standalone browsers that have been activated. The standalone browsers
are implemented using the Output Agents. The application layer communicates with
the output and input agents.
CrossWeaver can be used to build a prototype that runs cooperatively across
desktop and handheld systems. Such an application responds appropriately to pen
gestures or key presses from the handheld device. Currently, speech input can be run
in an agent that specifies the device identifier. Hence, speech recognition can run on
a full-powered PC and still affect the low-powered handheld device.
Speech recognition agents have been written using Microsoft’s Speech SDK
5.0 (Microsoft 2003c), IBM ViaVoice (IBM 2003), and Nuance’s Voice Recognition
system (Nuance 2003). Text-to-speech agents have been written using Microsoft’s
Text to Speech Engine (Microsoft 2003c) and IBM’s Text to Speech Engine (IBM
2003). An agent wrapped around Paragraph’s handwriting recognition system
(Paragraph 1999) performs pen recognition. We have also wrapped a basic gesture
recognizer that comes in the Diva system as a pen recognition agent (Reekie,
Shilman et al. 1998).
We have created tools so that any of these recognition agents can be
simulated by a Wizard participating on a separate networked terminal. These tools
are standalone graphical applications that allow a Wizard to type in recognition
results and send them as OAA messages to the running CrossWeaver application.
This architecture makes the set of recognition systems easily extensible. A user need
only wrap a recognizer with the Open Agent Architecture to have it participate in the
CrossWeaver prototype’s execution.
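The agent abstraction described above means that a working recognizer and a human Wizard are interchangeable behind the same interface. The sketch below illustrates only that idea; `RecognitionAgent`, `ToyRecognizer`, and `WizardAgent` are invented names for this example, not CrossWeaver classes.

```java
// Sketch: any recognizer, or a human Wizard typing results, can stand
// behind the same agent interface. Names here are hypothetical.
public class RecognitionAgents {

    // Whatever produces a recognition result from raw input.
    public interface RecognitionAgent {
        String recognize(String rawInput);
    }

    // A toy "automatic" recognizer: maps a few known utterances.
    public static class ToyRecognizer implements RecognitionAgent {
        public String recognize(String rawInput) {
            if (rawInput.contains("next")) return "NEXT_SCENE";
            if (rawInput.contains("back")) return "PREVIOUS_SCENE";
            return "UNRECOGNIZED";
        }
    }

    // A Wizard of Oz agent: a human types the result; the running
    // prototype cannot tell the difference from a real recognizer.
    public static class WizardAgent implements RecognitionAgent {
        private final String typedResult;
        public WizardAgent(String typedResult) { this.typedResult = typedResult; }
        public String recognize(String rawInput) { return typedResult; }
    }

    public static void main(String[] args) {
        RecognitionAgent auto = new ToyRecognizer();
        RecognitionAgent wizard = new WizardAgent("NEXT_SCENE");
        System.out.println(auto.recognize("go to next page"));  // NEXT_SCENE
        System.out.println(wizard.recognize("(anything)"));     // NEXT_SCENE
    }
}
```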
The standalone browsers can be written for different platforms using a similar
method. We have written standalone browsers using Java JDK 1.4 (Sun 2002) and
Java JDK 1.1.7 (Sun 1999). These can run on standard PCs or Macintoshes. The
JDK 1.1.7 version is specifically targeted to the Microsoft Pocket PC (Microsoft
2003a) and compatible handheld devices. The JDK 1.1.7 browser for the Microsoft
Pocket PC uses a proxy agent. The proxy agent is an OAA agent that runs on a PC
in the environment and receives messages targeted to the Pocket PC from
CrossWeaver. The proxy agent forwards the messages to the Pocket PC browser via
a lightweight socket mechanism. This lightweight communication allows the Pocket
PC browser to run more efficiently than it would if we tried to run the standalone
OAA browser agent directly.
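The proxy’s forwarding step can be illustrated with plain Java sockets. This loopback demonstration assumes a simple framing of one text line per message; it is a sketch of the mechanism, not the actual CrossWeaver proxy code.

```java
import java.io.*;
import java.net.*;

// Sketch of the proxy idea: an agent on a PC receives a message and
// forwards it over a lightweight socket to the handheld's browser.
// This is a loopback demonstration with assumed one-line-per-message framing.
public class ProxyForwardDemo {

    // Forward one message to host:port over a plain socket.
    public static void forward(String host, int port, String message) throws IOException {
        try (Socket s = new Socket(host, port);
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            out.println(message);
        }
    }

    // Stand-in for the Pocket PC browser: accept one connection, read one line.
    public static String receiveOne(ServerSocket server) throws IOException {
        try (Socket s = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        final ServerSocket server = new ServerSocket(0);  // ephemeral port
        final String[] received = new String[1];
        Thread browser = new Thread(() -> {
            try { received[0] = receiveOne(server); } catch (IOException ignored) {}
        });
        browser.start();
        forward("localhost", server.getLocalPort(), "show scene 2 on device pda-1");
        browser.join();
        server.close();
        System.out.println("browser received: " + received[0]);
    }
}
```

Avoiding a full OAA agent on the handheld and speaking this minimal protocol instead is what lets the Pocket PC browser run efficiently.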
By its nature, the CrossWeaver Design Mode enforces a specific usage
paradigm. Extending it requires source code modifications. The area where this is
easiest is in adding the interpretation of new operations. In other areas, such as
adding different input modalities to the visual interface or adding different categories
of target devices, possible changes are significantly more limited. We do not expect
non-programmer designers to modify the basic interaction of the tool and expect
them to instead spend time concentrating on their designs. Modifications to the basic
design of CrossWeaver will have to be done as extensions and upgrades.
9.2 Limitations
Because of these implementation details, CrossWeaver requires multiple steps
for installation and distribution. Most of the installation steps are related to the use
of the Open Agent Architecture and the set-up of OAA and the recognition agents.
To use CrossWeaver with recognition agents requires individual set-up of each
recognizer. For example, Microsoft’s Speech Recognizer and Text-to-Speech engine
each require a separate download. The agents that access each of
these also require separate installation. Without the Open Agent Architecture
installed, CrossWeaver does allow testing of an interface with solely mouse and
keyboard input.
Based on the user test results and the details of the implementation, it has
become clear that CrossWeaver will require brief training or a demonstration video
for every participant who is going to use it. This is not an unexpected requirement,
since the original design target for CrossWeaver did not include learning without
training and the concepts represented in the tool are complex. Nearly all of the
designers volunteered that they felt comfortable with the tool at the end of their first
session using it. They felt they could now go ahead and try to use it for their design
tasks. Some of the advanced features in CrossWeaver, however, such as definable
operations, are not fully self-disclosing. The CrossWeaver tutorial video at
http://guir.berkeley.edu/crossweaver/ (CrossWeaver 2003) includes a walk-through
of all of the features for this reason.
Two participants in the user test were particularly concerned about scalability
of the storyboard and arrows after their user test. They foresaw how their designs
might get cluttered with excess arrows. CrossWeaver can turn off the display of
arrows as an alternative view of the storyboard, but their comments highlight
the fact that CrossWeaver’s storyboard has a bias towards being linear. The linear
bias originates from CrossWeaver’s focus on helping illustrate scenarios. Each short
linear sequence is a scenario. Different scenarios can be connected in the tool via
transitions that jump from one scenario to another scenario in a different part of the
storyboard. Multiple scenarios in the storyboard can ultimately add up to a full
application interface. However, a full application interface might have scalability
problems in CrossWeaver’s Design Mode if it is not divided into scenarios.
The concept of Operations was added into CrossWeaver to allow it to more
closely simulate a full application. Right now, only a few drawing domain
operations are in CrossWeaver, but they can be used to simulate the behaviors that
are in many map and drawing applications. CrossWeaver is not suited for
applications in other domains, unless those applications can be simulated using
storyboards alone.
As one specific example, CrossWeaver’s Operations would have to be
extended to simulate data-driven applications. There is no facility in CrossWeaver to
look-up or utilize data. CrossWeaver cannot simulate interfaces that rely on
animation, since CrossWeaver’s scenes are static and do not show moving objects.
CrossWeaver would also have to be extended to cover graphical user interfaces,
since there is no concept of widget in CrossWeaver. SILK (Landay 1996; Landay
and Myers 2001) is a more appropriate tool for GUIs. Likewise, DENIM (Lin,
Newman et al. 2000; Newman, Lin et al. 2003) is a more appropriate tool for web
design since it has a site-map view as its storyboard, and DEMAIS (Bailey, Konstan
et al. 2001; Bailey 2002) is a more appropriate tool for multimedia interfaces, since it
has incorporated multimedia import and synchronization.
The designers who experimented with Operations thought initially that some
of the behaviors that Operations enabled could be simulated using storyboards. To
them, the operations concept was “interesting” and “potentially useful” but was not
something that was immediately necessary for any of the designs that they wanted to
test. Greater use of CrossWeaver might change their opinion and might change their
requests for specific operations to assist them in their design tasks.
Chapter 10
133
10 Future Work and Conclusions
This chapter covers potential future work for CrossWeaver, a review of the
contributions of this dissertation, and a statement of conclusions.
10.1 Future Work
Various refinements can be made to CrossWeaver, including
additional recognition engines, drawing capabilities, and visual refinements that
would make it more appealing to designers. For example, an aggregate view could
be added to the Analysis Mode to help manage large numbers of user tests. Two
other valuable specific directions for CrossWeaver to take are enhanced multimodal
command design and enhanced sketch interpretation capabilities.
For multimodal command design, CrossWeaver presently only includes
simple algorithms for multimodal fusion. CrossWeaver could incorporate a visual
interface for adding multimodal parameters and disambiguating inputs (Oviatt 1999).
To assist with the design of multimodal fusion algorithms, CrossWeaver’s analysis
mode could also capture and display timing data on received inputs.
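As one concrete baseline, the simplest fusion algorithm treats two inputs in different modes as a single multimodal command when their timestamps fall within a fixed window. The sketch below illustrates only that idea; the window length and class names are assumptions for this example, not CrossWeaver’s actual parameters.

```java
// Sketch (not CrossWeaver's actual algorithm) of simple time-window
// multimodal fusion: two inputs in different modes become one
// multimodal command if their timestamps are close enough.
public class TimeWindowFusion {

    public static final long WINDOW_MS = 2000;  // assumed fusion window

    public static class InputEvent {
        final String mode;     // e.g., "speech" or "pen"
        final String content;  // recognition result
        final long timeMs;     // arrival time
        public InputEvent(String mode, String content, long timeMs) {
            this.mode = mode; this.content = content; this.timeMs = timeMs;
        }
    }

    // Fuse two events if they are in different modes and close in time;
    // otherwise report them as separate unimodal commands.
    public static String fuse(InputEvent a, InputEvent b) {
        boolean differentModes = !a.mode.equals(b.mode);
        boolean closeInTime = Math.abs(a.timeMs - b.timeMs) <= WINDOW_MS;
        if (differentModes && closeInTime) {
            return "multimodal: " + a.content + " + " + b.content;
        }
        return "unimodal: " + a.content + " | " + b.content;
    }

    public static void main(String[] args) {
        InputEvent speech = new InputEvent("speech", "move house", 1000);
        InputEvent pen = new InputEvent("pen", "point(120,80)", 1800);
        System.out.println(fuse(speech, pen));  // within the window: fused
    }
}
```

Captured timing data from Analysis Mode would let a designer tune a window like this empirically rather than guess at it.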
For sketch interpretation, CrossWeaver’s concept of an Operation can be
enhanced to include more capabilities in other domains based on sketch
interpretation. The existing mechanism is extensible, and applications in different
domains might have different requirements for baseline Operations. For example,
shared ink might be an Operation in the classroom presentation domain.
Other application types that might be supported in CrossWeaver include
context-aware applications and collaborative applications. CrossWeaver currently
does not have internal support for these styles of applications.
For context-aware applications, one possibility is to enable multiple device
representations for a particular scene (e.g., visual large screen, visual small screen,
speech only) and use visual conditionals to determine which scene to display. This
resembles the component approach in DENIM (Lin, Thomsen et al. 2002; Newman,
Lin et al. 2003). We can choose the right scene to display based on the available
device, the user identity, location, or other pieces of actual or simulated context
information. Context conditionals might be visually represented similar to input
modes: a context type and an attached modifier parameter could determine when to
respond to the context information.
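The proposed context conditional might look like the following sketch, in which a scene keeps several representations keyed by a context attribute and the current context value selects among them. All names here are hypothetical, since this feature is future work, not part of the implemented tool.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a context conditional: one logical scene holds
// several device representations, and a conditional on a context
// attribute picks which one to display. Names are made up for the example.
public class ContextConditionalScene {

    // Representations of one logical scene, keyed by context value,
    // e.g., screen class -> sketch to display.
    private final Map<String, String> representations = new HashMap<>();
    private final String contextAttribute;  // e.g., "screenClass"

    public ContextConditionalScene(String contextAttribute) {
        this.contextAttribute = contextAttribute;
    }

    public void addRepresentation(String contextValue, String sceneName) {
        representations.put(contextValue, sceneName);
    }

    // Choose a representation given actual (or simulated) context,
    // falling back to a "default" representation when nothing matches.
    public String choose(Map<String, String> context) {
        String value = context.get(contextAttribute);
        String scene = representations.get(value);
        return scene != null ? scene : representations.get("default");
    }

    public static void main(String[] args) {
        ContextConditionalScene scene = new ContextConditionalScene("screenClass");
        scene.addRepresentation("large", "map-overview-sketch");
        scene.addRepresentation("small", "map-zoomed-sketch");
        scene.addRepresentation("default", "speech-only-prompt");

        Map<String, String> ctx = new HashMap<>();
        ctx.put("screenClass", "small");
        System.out.println(scene.choose(ctx));  // map-zoomed-sketch
    }
}
```

The same lookup could key on user identity or location; simulated context values would let a designer test the conditional before real sensors exist.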
For collaborative applications, one idea is to add the ability, via additional
icons representing operations, to illustrate and execute collaboration functionality,
such as sharing strokes, sending drawn items among different devices, and archiving
items (as was illustrated in an earlier CrossWeaver prototype described in Chapter
5). Collaborative operations will increase the sophistication of the categories of
multimodal applications that CrossWeaver can support.
For general storyboard management, CrossWeaver can be extended to use
not just a Wizard of Oz for recognition, but also a Wizard of Oz for determining
which storyboard scene is shown next. This would more closely resemble what is
presently performed in paper prototyping (Rettig 1994), where the Wizard actually
changes the scenes that the participant sees based on the input. CrossWeaver’s
Wizard would click on the appropriate next scene to display. This would eliminate
the need for the transitions to be fully specified before execution. CrossWeaver
would still be able to capture the user inputs and the time log of the interaction.
With this log information, the transitions might be inferred from a set of tests.
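Such inference could be as simple as counting, over the captured logs, which scene the Wizard showed after each (scene, input) pair and proposing the most frequent successor as a transition. The sketch below is purely illustrative of that idea; the classes and names are invented, and this is not an implemented CrossWeaver feature.

```java
import java.util.*;

// Sketch of inferring transitions from Wizard-driven test logs: each
// logged step records which scene was shown, what input arrived, and
// which scene the Wizard displayed next; frequent moves become
// candidate transitions. Purely illustrative.
public class TransitionInference {

    public static class Step {
        final String scene, input, nextScene;
        public Step(String scene, String input, String nextScene) {
            this.scene = scene; this.input = input; this.nextScene = nextScene;
        }
    }

    // Count how often each (scene, input) pair led to each next scene,
    // and propose the most frequent next scene as the inferred transition.
    public static Map<String, String> infer(List<Step> log) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Step s : log) {
            String key = s.scene + " | " + s.input;
            counts.computeIfAbsent(key, k -> new HashMap<>())
                  .merge(s.nextScene, 1, Integer::sum);
        }
        Map<String, String> transitions = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            String best = null;
            int bestCount = -1;
            for (Map.Entry<String, Integer> t : e.getValue().entrySet()) {
                if (t.getValue() > bestCount) {
                    best = t.getKey();
                    bestCount = t.getValue();
                }
            }
            transitions.put(e.getKey(), best);
        }
        return transitions;
    }

    public static void main(String[] args) {
        List<Step> log = Arrays.asList(
                new Step("scene1", "speech:next", "scene2"),
                new Step("scene2", "pen:tap", "scene3"),
                new Step("scene1", "speech:next", "scene2"));
        System.out.println(infer(log));
    }
}
```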
10.2 Contributions
CrossWeaver has extended the methodology of informal prototyping to the
multimodal, multidevice interface domain. CrossWeaver has retained the techniques
of electronic sketching, executing in sketched form, and incorporating design, test,
and analysis in one tool (Landay 1996; Lin, Newman et al. 2000; Klemmer, Sinha et
al. 2001; Newman, Lin et al. 2003). It has also introduced a compact way of
representing multimodal input and output and a storyboard scheme that represents a
testable prototype of a multimodal, multidevice application.
These techniques have been formalized in this dissertation under the term
Programming by Illustration. Programming by Illustration is a powerful technique
that also builds upon and complements the pioneering work in programming by
demonstration and other styles of end-user programming. Using this technique,
CrossWeaver enables a designer to build a functional multimodal interface prototype
where the interface is itself specified from the designer’s sketches. This work
includes the necessary tools to execute that interface across multiple devices that
may use multiple input recognizers simultaneously.
CrossWeaver integrates a wide set of different input and output modes into a
single prototyping environment and provides a flexible abstraction on top of an
underlying multimodal application infrastructure. By supporting sketching and
Wizard of Oz-based recognition techniques, CrossWeaver can be used to rapidly
create rough prototypes in the early stages of design. Additionally, support for
image import and for working computer-based recognizers allows CrossWeaver to ease
the path from these early designs to more finished prototypes.
CrossWeaver introduces the capture of a multimodal, multidevice prototype
execution across multiple devices with multiple input recognizers. This technique is
intended to assist designers in analyzing a user test of their interface prototypes and
in iterating on those designs.
CrossWeaver is the first tool enabling non-programmer UI designers to
prototype multimodal, multidevice applications. Previously, multimodal, multidevice
interface designs were inaccessible to this category of designers, since complex
recognition systems, programming tools, and computing environments were
required. Designing an interface with CrossWeaver is accessible to this previously
unsupported group.
10.3 Conclusions
This dissertation states that CrossWeaver, an informal prototyping tool,
allows interface designers to build multimodal, multidevice user interface prototypes,
test those prototypes with end-users, and collect valuable feedback informing
iterative multimodal, multidevice design.
Based on the state of current research (see Chapter 2) and our background
interviews (see Chapter 3), we have learned that multimodal, multidevice interface
design is still a new interface design domain with few established techniques or
tools. As shown in this dissertation, we have created a prototyping technique (see
Chapter 4) and iteratively created a tool, CrossWeaver (see Chapters 5, 6, and 7), that
helps designers address the challenge of designing multimodal, multidevice
interfaces. Based on the interest expressed by the designers who participated in the
final evaluation (see Chapter 8), we have shown that the tool appeals to
professional designers, our target audience.
From the investigations performed in this work, we have learned that
multimodal, multidevice interface design is an area of increasing interest and
importance to designers, even if their current area of activity is in a different domain.
For multimodal, multidevice interfaces to become more popular, tools that non-
programmers can use are needed. Furthermore, as multimodal applications are not
yet easy to envision, tools for informal prototyping and experimentation are clearly
needed. CrossWeaver has taken the initial step, enabling user interface designers to
experiment with multimodal, multidevice interface design.
Appendix A. Evaluation Materials
A.1 Demo Script
Greetings. My name is Anoop Sinha and I am a graduate student in the Group
for User Interface Research at UC Berkeley. I am testing CrossWeaver, a system for
designing multimodal, multidevice user interfaces. Multimodal interfaces are those
that might use mouse, keyboard, pen, or speech commands as input to control the
interaction. Multidevice interfaces might target handheld devices, laptop computer
screens, or PC computer screens, potentially simultaneously, such as in a meeting
room application.
Thank you for agreeing to perform this user test. Now I would like to ask you
some questions about your experience as an interaction designer and ask you to fill
out this background questionnaire.
CrossWeaver is a storyboarding tool for informally prototyping multimodal,
multidevice user interfaces.
This is its Design Mode layout, similar to PowerPoint, with thumbnails of
storyboard scenes here on the left and the drawing area on the right. In the drawing
area you can add strokes by drawing with the Tablet PC pen or the mouse while Draw
Mode is selected. To erase a stroke, you switch to Selection Mode and draw a
scribble gesture over it. You can also select strokes and erase them
by pressing the Delete key. In Region Mode, you define input regions which behave
like hot spots in the system.
To the right of the thumbnail, you see the input transitions. These signify how
the system will transition from one storyboard scene to another. There are four
different input modes, mouse, keyboard, pen gesture, and speech input. You fill in
the parameter underneath each input mode to signify the expected input. Input
transitions lined up next to each other are alternatives; they are executed
logically as Or's. Parameters in the same input transition are expected to be
fused, meaning they must occur within two seconds of each other; they are
executed logically as And's.
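The Or and And transition semantics just described can be sketched in code; the event format, function names, and timestamps below are illustrative assumptions rather than CrossWeaver's actual implementation:

```python
FUSION_WINDOW = 2.0  # seconds; the two-second fusion window described above

def transition_fires(required_inputs, events):
    """And: every required (mode, value) occurs, all within the fusion window."""
    times = []
    for mode, value in required_inputs:
        matches = [t for (m, v, t) in events if m == mode and v == value]
        if not matches:
            return False
        times.append(min(matches))  # earliest occurrence of each required input
    return max(times) - min(times) <= FUSION_WINDOW

def next_scene(alternatives, events):
    """Or: the first alternative whose inputs fused successfully wins."""
    for required_inputs, target in alternatives:
        if transition_fires(required_inputs, events):
            return target
    return None

# Events are (mode, value, timestamp) tuples received while one scene is shown.
events = [("speech", "delete", 10.0), ("pen", "scribble", 11.2)]
alternatives = [
    ([("speech", "delete"), ("pen", "scribble")], "scene2"),  # fused pair (And)
    ([("mouse", "click")], "scene3"),                         # alternative (Or)
]
```

Here the speech and pen events arrive 1.2 seconds apart, so the fused transition fires and scene2 is shown; had they arrived more than two seconds apart, neither alternative would match.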
Underneath each thumbnail is the place to specify outputs. From left to right,
you have the PC screens that will show this screen, the PDA screens that will show
this screen, and the text-to-speech audio output that will be played simultaneously
with showing this screen. You can specify multiple devices by putting a comma
between the different devices.
CrossWeaver’s second mode is the Test Mode, which executes the storyboard as
a working informal prototype. The default browser that starts in the CrossWeaver
view corresponds to PC #0. Standalone versions of the browser can be started
individually on different machines. The system will process mouse, keyboard, pen
gesture, and speech input from each machine simultaneously and execute the
storyboard as it has been specified.
After a user test, you can look at the third mode in the tool, the Analysis
Mode. This shows a timeline sequence of all of the scenes that were displayed on
the different devices, as well as the input that was received to trigger each scene
change. This timeline can be replayed by clicking on the Play button. The system
will show the screens again across all of the input devices in the order they were
shown in the original test.
For this user test, I will give you two tasks.
Task #1. Universally Accessible Web Site
The first task is to create a multimodal web site of a few pages where the
transitions between pages are triggered by multimodal commands, such as pen
gestures or speech input. You can pick content for the web site from any domain
you are familiar with or something that you might be doing in a project at work.
You can consider this the scenario of creating a Universally Accessible Web Site, where
the users might not have the ability to use a mouse to click on links.
Task #1 gets performed here.
Task #2. Remote Control
The second task is to use the Tablet PC (PC #0) as a remote controller of the
Fujitsu laptop (PC #1). You can consider this the scenario of creating a remote
control for a home control system, where one device controls the other screens.
Task #2 gets performed here.
Thank you for performing these two tasks. Now I would like to ask you some
questions about your experience with CrossWeaver and ask you to fill out this post
questionnaire.
A.2 Participant Background Questionnaire
ID _____
Gender: Female / Male
Age: _____
Status: Undergraduate: ______ year   Graduate: ______ year
Major: _____________________
How long have you been using computers? ______________
How long have you been programming computers? In what capacity? ______________
Have you used speech interfaces? When? ______________
Have you used pen interfaces? When? ______________
Have you used a multimodal interface? When? ______________

Place a check mark next to all of the following tools which you have used:
Director   Visual Basic   HyperCard   HTML   Tcl/Tk
Other: _________________________
None of the above

Place a check mark next to all of the drawing/painting tools which you have used:
Canvas   Color It!   Corel Draw   Cricket Draw   Freehand   Illustrator   MacDraw
MacPaint   Paintbrush   Photoshop   PowerPoint   SuperPaint   Visio   Xfig
Other: _________________________
None of the above
A.3 User Interface Designer Pre-Test Questionnaire
ID _____

General
Describe what you do. Are you primarily devoted to designing?
How long have you been a designer?
What is your background in terms of education and experience?

Projects
How long does the design process typically take?
How many people are typically involved?
Describe how the design process works. Does it differ much from project to project?

Phases
What are the distinct phases of the project?
Do you, personally, concentrate or specialize in a particular phase?
How do you communicate or preserve design ideas from one phase to another?

Tools
What tools, software or otherwise, do you use during the design process?
What do you like/dislike about the tools you use?

Sketching/Storyboarding
Do you draw sketches or make storyboards on paper during the design process?
How do you use the sketches?
Do you convert the sketches to another medium at some point?
Do you use an electronic tablet?
A.4 User Interface Designer Post-Test Survey
ID _____

Ranking Questions:

How functional did you find this prototype?
Not at all        Neutral        Extremely
1  2  3  4  5  6  7  8  9  10

Did you find yourself making a lot of errors with this tool?
Not at all        Neutral        Extremely
1  2  3  4  5  6  7  8  9  10

Did you find it easy to correct the errors that you made?
Not at all        Neutral        Extremely
1  2  3  4  5  6  7  8  9  10

Did you find yourself making a lot of steps to accomplish each task?
Not at all        Neutral        Extremely
1  2  3  4  5  6  7  8  9  10

Ease of use of the prototype as given:
Not at all        Neutral        Extremely
1  2  3  4  5  6  7  8  9  10

Quickness with which you could accomplish the tasks you had to complete:
Not at all        Neutral        Extremely
1  2  3  4  5  6  7  8  9  10
How natural were the commands that you used?
Not at all        Neutral        Extremely
1  2  3  4  5  6  7  8  9  10

Recognition performance:

What are your comments about the performance of the speech recognizer?
Worst        Neutral        Best
1  2  3  4  5  6  7  8  9  10

What are your comments about the performance of the handwriting recognizer?
Worst        Neutral        Best
1  2  3  4  5  6  7  8  9  10
A.5 User Interface Designer Post-Test Questionnaire
Freeform Questions:
1. Would you use CrossWeaver to design user interfaces? Why or why not?
2. Does the method of building the storyboards make sense to you? ________
If so, what did you like about it?
If not, how was it confusing, and how can it be improved?
3. Does the method of running the prototypes make sense to you? ________
If so, what did you like about it?
If not, how was it confusing, and how can it be improved?
4. Do you believe the primitive operation metaphor is useful for designing user interfaces? Why or why not?
5. If you have any other comments about CrossWeaver, please write them below.
A.6 Consent Form
COLLEGE OF ENGINEERING
DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
BERKELEY, CALIFORNIA 94720-1776
527 SODA HALL

Consent form for participation in research relating to Multimodal, Multidevice Prototyping Using Programming by Illustration.

This research is being conducted by Mr. Anoop Sinha for Professor James Landay in the Department of Computer Science at the University of California, Berkeley. The aim of the research is to investigate sketching prototyping tools for designing multimodal, multidevice user interfaces. The main goal is to test the usability of a tool that we have designed and built.

The information in this study will be collected by questionnaires and by software recording the interactions of participants with the tool. Additionally, in some situations interviews, observations, and video recording will be used as well. Participants will be notified of all information collection methods used during the time they are involved with the study.

Participation in this experiment is voluntary. Participants may withdraw themselves and their data at any time without fear of consequences. Any concerns about the experiment may be discussed with the researcher, Mr. Anoop Sinha, or with Professor James Landay.

Participant confidentiality will be provided by storing data so that it will only be identifiable by participant number. The video recordings of the session will not be released without the consent of the participant. No identifying information about the participants will be available to anyone except the researchers and their supervisors.

There are no foreseeable risks to you from participating in this research. There is no direct benefit to you. However, we hope that the research will benefit society by improving the techniques used to build user interfaces. There will be no costs to you, other than your time and any personal transportation costs.

Participation involves filling out a pre-questionnaire, performing some tasks on the internet (such as finding a news story or retrieving maps and driving directions), and filling out a post-questionnaire. As a participant, you will receive $15 compensation.

I hereby acknowledge that I have been given an opportunity to ask questions about the nature of the experiment and my participation in it. I give my consent to have data collected on my behavior and opinions in relation to this experiment. I understand I may withdraw my permission and data at any time. I am 18 years or older. If I have any questions, I can contact the primary researcher, Mr. Anoop Sinha, via email at aks@cs.berkeley.edu or phone at (510) 642-3437.

Signed _______________________________________________
Name (please print) _______________________________________________
Date _______________________________________________

If you have any questions about your rights or treatment as a participant in this research project, please contact the Committee for Protection of Human Subjects at (510) 642-7461.
A.7 Participant’s Designs Built Using CrossWeaver During User Testing
Figure A-1. Participant #1, Task #1
References
Abowd, G. D. (1999). "Classroom 2000: An Experiment with the Instrumentation of a Living Educational Environment." IBM Systems Journal, Special issue on Pervasive Computing 38(4): 508-530.
Abowd, G. D., C. G. Atkeson, et al. (1996). Teaching and Learning as Multimedia Authoring: The Classroom 2000 Project. Multimedia '96: 187-198.
Adobe (2003a). Adobe Corporation http://www.adobe.com/.
Adobe (2003b). Adobe Illustrator http://www.adobe.com/products/illustrator/main.html.
Adobe (2003c). Adobe Photoshop http://www.adobe.com/products/photoshop/main.html.
Apple (1993). Apple Corporation Newton Personal Digital Assistant.
Bailey, B. P. (2002). A Behavior-Sketching Tool for Early Multimedia Design, Ph.D. Thesis. Minneapolis, MN, Computer Science Department, University of Minnesota: 235.
Bailey, B. P. and J. A. Konstan (2003). Are Informal Tools Better? Comparing DEMAIS, Pencil and Paper, and Authorware for Early Multimedia Design. Proceedings of the ACM Conference on Human Factors in Computing Systems. Ft. Lauderdale, FL: 313-320.
Bailey, B. P., J. A. Konstan, et al. (2001). DEMAIS: Designing Multimedia Applications with Interactive Storyboards. Proceedings of the 9th ACM International Conference on Multimedia. Ottawa, Ontario, Canada: 241-250.
Bluetooth (2003). https://www.bluetooth.org/.
Bolt, R. A. (1980). "Put-that-There: Voice and Gesture at the Graphics Interface." Computer Graphics 14(3): 262-270.
Bricklin, D. (2003). About Tablet Computing Old and New http://www.bricklin.com/tabletcomputing.htm.
Brocklehurst, E. R. (1991). "The NPL Electronic Paper Project." International Journal of Man-Machine Studies 34(1): 69-95.
Burnett, M. and D. McIntyre (1995). "Visual Programming." Computer 28(3): 14-16.
Buxton, W. (1997). Out From Behind the Glass and the Outside-In Squeeze. ACM CHI '97 Conference on Human Factors in Computing Systems.
Chandler, C., G. Lo, et al. (2002). "Multimodal Theater: Extending Paper Prototyping to Multimodal Applications." Extended Abstracts of Conference on Human Factors in Computing Systems: CHI 2002 2: 874-875.
Chang, S.-K. (1987). "Visual Languages: A Tutorial and Survey." IEEE Software 4(1): 29-39.
Chang, S.-K., T. Ichikawa, et al., Eds. (1986). Visual Languages. New York, Plenum Press.
Cheyer, A., L. Julia, et al. (1998). A Unified Framework for Constructing Multimodal Experiments and Applications. Proceedings of Cooperative Multimodal Communication '98. Tilburg (The Netherlands): 63-69.
Clow, J. and S. L. Oviatt (1998). STAMP: an automated tool for analysis of multimodal system performance. Proceedings of the International Conference on Spoken Language Processing. Sydney, Australia: 277-280.
Cohen, P. R., M. Johnston, et al. (1997). QuickSet: Multimodal Interaction for Distributed Applications. Proceedings of ACM Multimedia 97. Seattle, WA, USA, ACM, New York, NY, USA: 31-40.
Cohen, P. R., M. Johnston, et al. (1998). The Efficiency of Multimodal Interaction: A Case Study. International Conference on Spoken Language Processing, ICSLP'98. Australia. 2: 249-252.
CrossWeaver (2003). http://guir.berkeley.edu/crossweaver/.
Cypher, A., Ed. (1993). Watch What I Do: Programming by Demonstration. Cambridge, MA, MIT Press.
Dahlbäck, N., A. Jönsson, et al. (1993). Wizard of Oz Studies - Why and How. Intelligent User Interfaces '93: 193-200.
Damm, C. H., K. M. Hansen, et al. (2000). Tool Support for Cooperative Design: Gesture Based Modeling on an Electronic Whiteboard. Proceedings of CHI 2000, ACM Conference on Human Factors in Computing Systems. The Hague, The Netherlands: 518 - 525.
Davis, R. (2002). Sketch Understanding in Design: Overview of Work at the MIT AI Lab. 2002 AAAI Spring Symposium. Stanford, CA: 24-31.
Fujitsu (2003). Fujitsu Laptop, http://www.computers.us.fujitsu.com/index.shtml.
GO (1992). Go Corporation PenPoint Operating System. Reading, MA, Addison-Wesley.
Gould, J. D. and C. Lewis (1985). "Designing for Usability: Key Principles and What Designers Think." Communications of the ACM 28(3): 300-311.
Gross, M. D. (1994). Recognizing and Interpreting Diagrams in Design. The ACM Conference on Advanced Visual Interfaces '94. Bari, Italy: 89-94.
Gross, M. D. and E. Y.-L. Do (1996). Ambiguous Intentions: A Paper-like Interface for Creative Design. ACM Symposium on User Interface Software and Technology. Seattle, WA: 183-192.
GUIR (2003). Group for User Interface Research, http://guir.berkeley.edu.
Halbert, D. C. (1984). Programming by Example, Ph.D. Thesis. Berkeley, CA, Computer Science Division, EECS Department, University of California.
Hammond, T. and R. Davis (2002). Tahuti: A Geometrical Sketch Recognition System for UML Class Diagrams. 2002 AAAI Spring Symposium on Sketch Understanding. Stanford, CA: 59-68.
Hong, J. I. and J. A. Landay (2000). "SATIN: A Toolkit for Informal Ink-based Applications." UIST 2000, ACM Symposium on User Interface and Software Technology, CHI Letters 2(2): 63-72.
Hong, J. I., F. C. Li, et al. (2001). End-User Perceptions of Formal and Informal Representations of Web Sites. Proceedings of Extended Abstracts of Human Factors in Computing Systems: CHI 2001. Seattle, WA: 385-386.
IBM (2003). ViaVoice http://www.ibm.com/software/voice/viavoice/.
IEEE (2003). http://grouper.ieee.org/groups/802/11/.
InStat (2003). Event Horizon: Two Billion Mobile Subscribers by 2007: 2003 Subscriber Forecast. http://www.instat.com/abstract.asp?id=98&SKU=IN0301117GW.
Ionescu, A. and L. Julia (2000). EMCE: A Multimodal Environment Augmenting Conferencing Experiences. FAUIC'2000. Canberra (Australia).
Julia, L. and A. Cheyer (1999). Multimedia Augmented Tutoring Environment for Travel CARS. http://www.ai.sri.com/oaa/chic/projects/docs/TravelMATE.pdf.
Kelley, J. F. (1984). "An Iterative Design Methodology for User-Friendly Natural Language Office Information Applications." ACM Transactions on Office Information Systems 2(1): 26-41.
Klemmer, S. R., A. K. Sinha, et al. (2001). "SUEDE: A Wizard of Oz Prototyping Tool for Speech User Interfaces." CHI Letters, The 13th Annual ACM Symposium on User Interface Software and Technology: UIST 2000 2(2): 1-10.
Klemmer, S. R., M. Thomsen, et al. (2002). "Where Do Web Sites Come From? Capturing and Interacting with Design History." CHI Letters, Human Factors in Computing Systems: CHI2002 4(1): 1-10.
Kramer, A. (1994). Translucent Patches – Dissolving Windows. ACM Symposium on User Interface Software and Technology. Marina del Rey, CA: 121-130.
Kurlander, D. (1993). Graphical Editing by Example, Ph.D. Thesis. New York, Computer Science Department, Columbia University.
Landay, J. (2001). Computer Supported Cooperative Work (CS294-2). http://guir.berkeley.edu/courses/cscw/fall2001/.
Landay, J. A. (1996). Interactive Sketching for the Early Stages of User Interface Design, Ph.D. Thesis. Pittsburgh, PA, Computer Science Department, Carnegie Mellon University: 242.
Landay, J. A. and B. A. Myers (1995). Interactive Sketching for the Early Stages of User Interface Design. Human Factors in Computing Systems: CHI ’95. Denver, CO: 43-50.
Landay, J. A. and B. A. Myers (1996). Sketching Storyboards to Illustrate Interface Behavior. Human Factors in Computing Systems: CHI '96 Conference Companion, Vancouver, Canada.
Landay, J. A. and B. A. Myers (2001). "Sketching Interfaces: Toward More Human Interface Design." IEEE Computer 34(3): 56-64.
Li, Y., J. A. Landay, et al. (2003). Sketching Informal Presentations. Fifth ACM International Conference on Multimodal Interfaces: ICMI-PUI 2003. Vancouver, B.C., Canada: 234 - 241.
Lieberman, H. (1993). Mondrian: A Teachable Graphical Editor. Watch What I Do: Programming by Demonstration. A. Cypher, Ed. Cambridge, MA, MIT Press.
Lin, J., M. W. Newman, et al. (2000). "DENIM: Finding a Tighter Fit Between Tools and Practice for Web Site Design." CHI 2000, Human Factors in Computing Systems, CHI Letters 2(1): 510-517.
Lin, J., M. Thomsen, et al. (2002). "A Visual Language for Sketching Large and Complex Interactive Designs." CHI Letters: Human Factors in Computing Systems, CHI 2002 4(1): 307-314.
Long, A. C. (2001). Quill: A Gesture Design Tool for Pen-based User Interfaces, Ph.D. Thesis. Berkeley, CA, Computer Science Department, University of California, Berkeley: 307.
Macromedia (2003a). http://www.macromedia.com/.
Macromedia (2003b). Macromedia Director http://www.macromedia.com/software/director/.
Macromedia (2003c). Macromedia Fireworks http://www.macromedia.com/software/fireworks/.
Macromedia (2003d). Macromedia FreeHand http://www.macromedia.com/software/freehand/.
McCloud, S. (1993). Understanding Comics. New York, NY, HarperCollins.
McGee, D. R. and P. R. Cohen (2001). Creating Tangible Interfaces by Augmenting Physical Objects with Multimodal Language. Proceedings of the 6th International Conference on Intelligent User Interfaces. Santa Fe, New Mexico: 113-119.
McGee, D. R., P. R. Cohen, et al. (2002). "Comparing paper and tangible multimodal tools." CHI Letters: Human Factors in Computing Systems: CHI 2002 1(1): 407-414.
Microsoft (1992). Microsoft Windows for Pen Computing.
Microsoft (2003a). Microsoft Pocket PC, http://www.pocketpc.com/.
Microsoft (2003b). Microsoft PowerPoint http://www.microsoft.com/office/powerpoint/default.asp.
Microsoft (2003c). Microsoft Speech SDK http://www.microsoft.com/speech/.
Microsoft (2003d). Microsoft Tablet PC SDK http://www.tabletpcdeveloper.com/.
Microsoft (2003e). Microsoft Visio http://www.microsoft.com/office/visio/.
Mignot, C., C. Valot, et al. (1993). An Experimental Study of Future 'Natural' Multimodal Human-Computer Interaction. Proceedings of ACM INTERCHI'93 Conference on Human Factors in Computing Systems -- Adjunct Proceedings: 67-68.
Modugno, F. (1995). Extending End-User Programming in a Visual Shell with Programming by Demonstration and Graphical Language Techniques, Ph.D. Thesis. Pittsburgh, PA, Computer Science Department, Carnegie Mellon University: 308.
Moran, L. B., A. J. Cheyer, et al. (1998). "Multimodal User Interfaces in the Open Agent Architecture." Knowledge-Based Systems 10(5): 295-304.
Myers, B. A. (1990). "Creating user interfaces using programming by example, visual programming, and constraints." ACM Transactions on Programming Languages and Systems 12(2): 143-177.
Negroponte, N. (1973). Recent Advances in Sketch Recognition. 1973 National Computer Conference and Exposition. New York, AFIPS Press. 42: 663-675.
Negroponte, N. and J. Taggart (1971). HUNCH – An Experiment in Sketch Recognition. Computer Graphics. W. Giloi. Berlin.
Newman, M. W., J. Lin, et al. (2003). "DENIM: An Informal Web Site Design Tool Inspired by Observations of Practice." Human-Computer Interaction 18(3): 259-324.
Nigay, L. and J. Coutaz (1993). A Design Space for Multimodal Systems: Concurrent Processing and Data Fusion. Proceedings of ACM INTERCHI'93 Conference on Human Factors in Computing Systems: 172-178.
Nigay, L. and J. Coutaz (1995). A Generic Platform for Addressing the Multimodal Challenge. Proceedings of ACM CHI'95 Conference on Human Factors in Computing Systems. 1: 98-105.
Nuance (2003). Nuance Speech Recognizer http://www.nuance.com/.
OGI (2003). Oregon Graduate Institute, Adaptive Agent Architecture, http://chef.cse.ogi.edu/AAA/.
Oviatt, S. and P. Cohen (2003). "Multimodal Interfaces That Process What Comes Naturally." Communications of the ACM 43(9): 45-51.
Oviatt, S., P. Cohen, et al. (2000). "Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions." Human-Computer Interaction 15(4): 263-322.
Oviatt, S. L. (1996). Multimodal Interfaces for Dynamic Interactive Maps. Proceedings of Conference on Human Factors in Computing Systems: CHI '96: 95-102.
Oviatt, S. L. (1999). Mutual Disambiguation of Recognition Errors in a Multimodal Architecture. Proceedings of CHI 1999: 576-583.
Oviatt, S. L. (2003). Multimodal interfaces. The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications. J. Jacko and A. Sears. Mahwah, NJ, Lawrence Erlbaum Associates: 286-304.
Paragraph (1999). Calligrapher Handwriting Recognizer http://www.phatware.com/calligrapher/index.html.
Reekie, J., M. Shilman, et al. (1998). Diva, a Software Infrastructure for Visualizing and Interacting with Dynamic Information Spaces. http://www.gigascale.org/diva/.
Rekimoto, J. (1997). Pick-and-Drop: A Direct Manipulation Technique for Multiple Computer Environments. Proceedings of UIST'97: 31-39.
Rettig, M. (1994). "Prototyping for Tiny Fingers." Communications of the ACM 37(4): 21-27.
Salber, D. and J. Coutaz (1993). A Wizard of Oz Platform for the Study of Multimodal Systems. Proceedings of ACM INTERCHI'93 Conference on Human Factors in Computing Systems -- Adjunct Proceedings: 95-96.
SIGMM (2003). ACM Multimedia SIG http://www.acm.org/sigmm/.
Sinha, A., M. Shilman, et al. (2001). MultiPoint: A Case Study of Multimodal Interaction for Building Presentations. Proceedings of CHI2001: Extended Abstracts: 431-432.
Sinha, A. K., S. R. Klemmer, et al. (2002). "Embarking on Spoken Language NL Interface Design." International Journal of Speech Technology: Special Issue on Natural Language Interfaces 5(2): 159-169.
Sinha, A. K. and J. A. Landay (2001). Visually Prototyping Perceptual User Interfaces Through Multimodal Storyboarding. IEEE Workshop on Perceptive User Interfaces: PUI 2001. Orlando, FL: 101-104.
Sinha, A. K. and J. A. Landay (2002). Embarking on Multimodal Interface Design. International Conference on Multimodal Interfaces. Pittsburgh, PA: 355-360.
Smith, D. C., A. Cypher, et al. (2000). "Programming by Example: Novice Programming Comes of Age." Communications of the ACM 43(3): 75-81.
Jasc Software (2003). Paint Shop Pro http://www.jasc.com/products/paintshoppro/.
Solidworks (2003). Solidworks 3D CAD Software http://www.solidworks.com/.
SRI (2003). Open Agent Architecture http://www.openagent.com/.
Sun (1999). Java JDK 1.1.7, http://java.sun.com/products/jdk/1.1/.
Sun (2002). Java JDK 1.4, http://java.sun.com/j2se/1.4/.
Sutherland, I. E. (1963). SketchPad: A Man-Machine Graphical Communication System. AFIPS Spring Joint Computer Conference. 23: 329-346.
Thomas, F. and O. Johnston (1981). Disney Animation: The Illusion of Life. New York, Abbeville Press.
Toshiba (2003). Toshiba Portege 3500 Tablet PC, http://www.tabletpc.toshiba.com/.
W3C (2000). Multimodal Requirements for Voice Markup Languages http://www.w3.org/TR/multimodal-reqs.
Wagner, A. (1990). Prototyping: A Day in the Life of an Interface Designer. The Art of Human-Computer Interface Design. B. Laurel. Reading, MA, Addison-Wesley: 79-84.
Wang (1988). Wang Corporation Freestyle tablet computer.
Wolf, C. G., J. R. Rhyne, et al. (1989). The Paper-Like Interface. Proceedings of the Third International Conference on Human-Computer Interaction: 494-501.
Wong, Y. Y. (1992). Rough and Ready Prototypes: Lessons From Graphic Design. Human Factors in Computing Systems. Monterey, CA: 83-84.
Yankelovich, N. and J. Lai (1998). Designing Speech User Interfaces. Proceedings of ACM CHI 98 Conference on Human Factors in Computing Systems (Summary). 2: 131-132.
Yankelovich, N., G.-A. Levow, et al. (1995). Designing SpeechActs: Issues in Speech User Interfaces. Proceedings of ACM CHI'95 Conference on Human Factors in Computing Systems. 1: 369-376.
Yankelovich, N. and C. D. McLain (1996). Office Monitor. Proceedings of ACM CHI 96 Conference on Human Factors in Computing Systems. 2: 173-174.