
VERBAL CONSTRAINT MANAGEMENT FOR SHAPE CONCEPTUALIZATION

Jouke Verlinden
Faculty of Design, Engineering, and Production
Delft University of Technology
[email protected]

Imre Horváth, György Kuczogi
Faculty of Design, Engineering, and Production
Delft University of Technology
{i.horvath,g.kuczogi}@io.tudelft.nl

ABSTRACT

Studies on natural communication in shape modeling show that both direct manipulation of geometry and the expression of constraints are essential. Current solutions on constraint-based modeling do not support the verbal management of constraints.

This paper presents a scheme to recognize and manage geometry-related constraints verbally. The execution and interpretation of constraint management actions is done based on a constraint library and a focus stack. As there are implicit and explicit relationships between the constraints and the geometry, special attention is given to propagate changes among them.

A 2D Java implementation was made that supports a limited set of the proposed functionality. The proposed scheme seems capable of dealing with dynamism in geometric constraints, a shifting focus in discourse, and the possibility to use multimodal input. Its use is not limited to single-user situations. In collaborative design, similar notions in the verbal and multimodal dialogue apply.

KEYWORDS

Natural input, design support, geometric constraints, computational linguistics.

1. INTRODUCTION

New approaches in geometric modeling and interface technology significantly improve the way of communicating. As Chu et al. (1997) already demonstrated, combinations of speech and gesture interfaces empower the designer in expressing what he wants. Studies on natural communication in shape modeling (see Figure 1 and section 1.1) show that both direct manipulation of the geometry and the expression of constraints are essential. Characteristics of expressing constraints include:

• Geometric constraints are not just cumulatively added – often existing ones are addressed and altered, while the removal of constraints also occurs in the course of a design session.

• The designer’s utterances have a certain focus (a particular element, the global shape, a constraint). For interpreting these utterances, the discourse state (state of the dialogue) should be considered.

• Although the driver of expressing constraints is speech, other modalities are often used to present additional information. Multimodal expressions should be considered.

Current solutions on constraint-based modeling do not support the verbal management of constraints.

Figure 1. Motion sequence from clay-based Wizard-of-Oz Experiment (Verlinden et al., 2001)


This paper presents a scheme to recognize and manage geometry-related constraints verbally. It enables constraint management in a natural dialogue, which includes speech input. A prototype has been implemented to explore some of the issues.

1.1. Natural interfaces and constraints

We assume that natural modeling systems, which employ multiple modalities and new model representations, allow a better fit between the modeling task and mental concepts. Examples of such capabilities are found in traditional sketching and human-to-human communication on design. In the Wizard-of-Oz experiments described in Verlinden et al. (2001), a natural design system was prototyped by a human 'clayperson'. The designer gestured and spoke while the clayperson modeled the geometry with clay. Subjects were asked to model both existing shapes (bird's nest, puppet, phone horn, water cooker) and a design assignment (a drinking package for teenagers). The utterances (both voice and gesture) were captured and analyzed. It was found that many modeling actions were actually specifications of shape relationships, which were altered later. Figure 1 shows some stills of one session, in which the subject tries to model a phone horn by creating a cylinder and fitting two caps at the ends.

In a natural interface, the target is not to disambiguate all vagueness; rather, the most appropriate expression of the designer's beliefs regarding the models should be captured. Constraints seem an excellent representation of such notions: they demarcate the design space by specifying limitations and dependencies between design variables. In this research, geometric constraints are the main focus of discussion.

A large variety of constraint satisfaction techniques have been used in geometric modeling (Dohmen, 1995). In general, these constraints can be specified as a set of equations, involving a number of variables that are related to others through functions. Constraint resolution algorithms can be employed to find correct values for the involved variables. In the case of conflicting constraints, an overconstrained situation occurs and user intervention will be necessary to harmonize the constraints. When multiple values are possible for the variables, the designer can change these in the design system directly. However, a careful selection of the constraint treatment is crucial. Three types of vagueness can be identified in the specification of constraints, namely 1) the variables, 2) the types of constraint, and 3) additional descriptors. The identification of which element in the modeling scene is constrained is subject to changes in the geometry. In design sessions, constraint types will be exchanged for others. The parameters governing the constrained elements might become more precise or wider, depending on their importance.
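To make this equation view concrete, the following minimal Java sketch (illustrative names, not taken from the paper's prototype) represents constraints as predicates over geometric variables and reports the violated ones, so that the user can intervene when the situation is overconstrained:

```java
import java.util.*;

/** A geometric variable, e.g. the x-coordinate of a control point. */
class Variable {
    final String name;
    double value;
    Variable(String name, double value) { this.name = name; this.value = value; }
}

/** A constraint is an equation/inequality over a set of variables. */
interface Constraint {
    boolean isSatisfied();
    List<Variable> variables();
}

/** Example: a <= b (e.g. "left edge of A lies left of left edge of B"). */
class LessOrEqual implements Constraint {
    final Variable a, b;
    LessOrEqual(Variable a, Variable b) { this.a = a; this.b = b; }
    public boolean isSatisfied() { return a.value <= b.value; }
    public List<Variable> variables() { return List.of(a, b); }
}

class ConstraintPool {
    final List<Constraint> constraints = new ArrayList<>();

    /** Returns the violated constraints; if resolution cannot repair them,
        the model is overconstrained and the user must intervene. */
    List<Constraint> violated() {
        List<Constraint> out = new ArrayList<>();
        for (Constraint c : constraints) if (!c.isSatisfied()) out.add(c);
        return out;
    }
}
```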

2. BACKGROUNDS AND RELATED RESEARCH

2.1. Multimodal input processing

In contrast to keyboard and mouse input, multimodal systems use devices that are inaccurate and depend strongly on the user's characteristics. In order to accommodate these inputs, special algorithms are used to process and interpret the data. The flow of recognizing multimodal input is shown in figure 2.

In the literature, the following techniques are described. Feature abstraction is an Artificial Intelligence-based statistical process that converts data to a parameterized data point, e.g. intonation or pitch in speech recognition. Two known principles are used here: the Hidden Markov Model and the layered Neural Network (sometimes a combination of the two). With pattern matching techniques, these feature vectors are converted into multiple interpretations, called 'phrases' for speech and 'gestlets' for gesture recognition.

Figure 2. Generic algorithms of two-handed gesture and speech recognition (Koons et al., 1993): raw hand and speech data are converted by feature abstraction into feature vectors, which parsers/classifiers turn into gestlets and phrases, and finally into gesture and phrase ('utterance') hypotheses.

One of the key functionalities of a multimodal system is the merging of different input devices. This 'fusion' process converts input data into multimodal events that are then propagated through the system. This can happen anywhere from the low (signal) level to the semantic level; the latter is preferred when speech is included (Benoit et al., to appear). A typical example is (Oviatt, 1999), employing typed feature structures. Lower-level fusion schemes include Partial Action Frames (Vo and Waibel, 1997), Melting Pots (Coutaz and Nigay, 1995), and Time Delay Neural Networks (Waibel et al., 1995).

Often, the user's utterances do not fully specify the information necessary to execute a modeling action. Reasons for this are: 1) misunderstanding, in which the recognition engines fail to extract the information, and 2) tacit knowledge, not explicitly communicated by the user. Furthermore, the user often uses qualitative terms to specify the value of an attribute – a subdialogue might be necessary to pinpoint a desired (exact) instance. In the following sections, the problems of missing operands and the interpretation of vague values are explored.
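As an illustration of the semantic-level fusion mentioned above, the following Java sketch merges two partial interpretation frames from speech and gesture that fall within an assumed co-occurrence window. It is a simplified stand-in for schemes like typed feature structures or melting pots; all names and the 2-second window are assumptions, not taken from those works:

```java
import java.util.*;

/** A partial interpretation produced by one recognizer (speech or gesture). */
class PartialFrame {
    final String action;              // e.g. "move" (null if not recognized)
    final Map<String, Object> slots;  // e.g. target=..., location=...
    final long timestampMillis;
    PartialFrame(String action, Map<String, Object> slots, long t) {
        this.action = action; this.slots = new HashMap<>(slots); this.timestampMillis = t;
    }
}

class Fusion {
    static final long WINDOW_MS = 2000;  // assumed co-occurrence window

    /** Semantic-level fusion: merge two partial frames when they fall in the
        same time window, letting one modality fill the other's empty slots. */
    static Optional<PartialFrame> fuse(PartialFrame speech, PartialFrame gesture) {
        if (Math.abs(speech.timestampMillis - gesture.timestampMillis) > WINDOW_MS)
            return Optional.empty();
        Map<String, Object> merged = new HashMap<>(gesture.slots);
        merged.putAll(speech.slots);   // speech takes precedence on conflicts
        String action = speech.action != null ? speech.action : gesture.action;
        return Optional.of(new PartialFrame(action, merged, speech.timestampMillis));
    }
}
```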

Although many techniques exist that deal with multimodal input, they typically lack the application-dependent knowledge that might be employed to resolve incomplete input. Geometric design includes a large number of conventions and much common-sense reasoning to interpret verbal information into spatial constructs, which could be included for this purpose. Furthermore, most multimodal systems only recognize a limited set of operations, whereas a wide variety of constructs is used to express constraints. In complex applications like modeling, there are dependencies among the actions. This complexity requires better support, dealing with semantics and dialogue management.

This research expands verbal and multimodal interpretation to constraints. It is part of the architecture described in Kuczogi et al. (2000), which focuses on a generic architecture to deal with natural interaction during shape conceptualization. Although in its present operation this module only interprets verbal input, extensions are planned to support 3D position and gestures.

2.2. Constraint based modeling and spatial reasoning

The main focus of this paper is the capturing of geometric constraints. For this, reasoning with spatial constructs is crucial, including topological (e.g. distance, touching, inside) and morphological (e.g. angular or positional alignment) constructs. This might be extended to others (including kinematic and behavioral constructs). For interpreting communication on constraints, commonsense reasoning needs to cover spatial constructs. In surveying knowledge on spatial reasoning and geometric reasoning, the following classification seems valid: A) interpretation of geometry to a symbolic representation, B) symbolic representation schemes, C) spatial reasoning, D) generation (or adaptation) of geometry based on the symbolic representation (see Figure 3).

The transition from geometry to symbolic representation has quite a large body of knowledge in computer vision and in (dis)assembly analysis (e.g. Wilson and Latombe, 1994). Although the symbolic representation and reasoning are intimately bound, the representation merely focuses on the accuracy of the model, while reasoning deals with its application. In the literature, only a small number of representations are presented; most of them are based on Allen's Interval Algebra (1984). Table 1 summarizes basic theories for this purpose: schemes for topology, distance, and shape. For orientation, standard notations include qualifications in orthogonal directions (front-back, top-bottom, left-right). The frame of reference defines the context: intrinsic, extrinsic, and deictic frames exist (Hernandez, 1994).

Figure 3. Relationships between spatial reasoning and geometry: A) interpretation of geometry, B) symbolic representation, C) reasoning, D) generation of geometry.

The generation of geometry based on symbolic constructs is found in the field of declarative modeling. Desmontils (2000) demonstrates natural language support in scene generation. The method uses fuzzy sets to translate linguistic constructs to values.
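As a hypothetical Java sketch of this idea, a qualitative distance term is given a triangular fuzzy membership function and defuzzified to a concrete value; the membership shapes and scale factors below are illustrative assumptions, not taken from Desmontils (2000):

```java
/** Maps a qualitative distance term to a numeric value via triangular
    fuzzy sets, in the spirit of Desmontils (2000). */
class FuzzyDistance {
    // Triangular membership: peaks at 'peak', zero outside [lo, hi].
    static double membership(double x, double lo, double peak, double hi) {
        if (x <= lo || x >= hi) return 0.0;
        return x < peak ? (x - lo) / (peak - lo) : (hi - x) / (hi - peak);
    }

    /** Defuzzify "near"/"far" (relative to object size) to a concrete gap. */
    static double interpret(String term, double objectSize) {
        switch (term) {
            case "near": return 0.2 * objectSize;  // peak of an assumed "near" set
            case "far":  return 2.0 * objectSize;  // peak of an assumed "far" set
            default:     return 1.0 * objectSize;
        }
    }

    public static void main(String[] args) {
        System.out.println(membership(0.3, 0.0, 0.2, 1.0)); // "near"-ness of 0.3
        System.out.println(interpret("near", 50.0));        // suggested gap: 10.0
    }
}
```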

2.3. Computational Linguistics and Conceptual Design

Computational linguistics has successfully studied linguistic communication between two (or more) humans and extended computerized natural language processing to cope with the semantics and pragmatics of verbal communication. In order to understand the latter, this field distinguishes the notions of intention, desire, and belief (Grosz and Sidner, 1990), see Table 2. These notions describe the key drivers in communicating about actions between humans. According to the literature, intentions are based on desires, beliefs, and prior intentions. Beliefs, desires, and intentions cannot be directly observed. However, they can be employed to model the mental states of human beings or be implemented in software agents. Guessing/pattern-recognizing algorithms are used to interpret human communications, after which the assistant can initiate a dialogue about the intentions and desires. This engagement is often called 'mixed initiative', denoting the capacity of the system to be responsive to a particular situation and to take the initiative to resolve problems – or to teach/give advice – when necessary (Horvitz, 1999).

Table 2. The BDI structure used in computational linguistics.

Belief: Personal beliefs that are considered in reasoning. Some beliefs are more certain than others (e.g. "flat-out" beliefs are considered true).

Desire: A goal, which can be short-term or long-term. A collection of desires exists concurrently.

Intention: Actions, part of a recipe or plan (as in AI planning) to reach a goal. One "intends to" perform these actions.

Table 1. Summarization of spatial reasoning theories.

Topology (between objects):
• Hernandez (1994), based on Egenhofer (1989): Disjoint (d), Tangent (t), Overlaps (o), Included-at-border (i@b), Included (i), Contains-at-border (c@b), Contains (c), Equal (=).
• Damski & Gero (1998): Disjoint, Touch (face, vertex, edge), Overlap, Inside, Contain, Equal.
• Mukerjee (1998): Contact formations (CF) and CF maps – captures the movement envelope of two touching faces.

Distance/size:
• Coenen et al. (1998): tesseral mapping from n-dimensional to 1-dimensional spaces.
• Interval algebra (adapted by Mukerjee and Joe (1990)): min/max multiplication factor ("integer-multiple vector operator") (n1, n2): n1*b < a < n2*b.
• Partial ordering of sizes/distances, e.g. Mavrovouniotis and Stephanopoulos (1988).

Shape:
• Dickinson (1991): cross-section shape, axis shape, cross-section sweep.
• Högg and Schwarzer (1991): 3D objects are made by a spanning surface between a top and a bottom shape.
• Agrawal et al. (1995): QMAT – vague, Voronoi-based shape descriptions (Medial Axis Transform).
• Potential models (Mukerjee, 1998).


The collaborative discourse theory includes mental attitudes, plans, and beliefs – some are shared, some might be individual. These are then used to describe and understand complex decision-making conversations. This has resulted in a number of implementations, of which the COLLAGEN collaborative agent platform is well known (Rich and Sidner, 1997). The notions of computational linguistics seem worthwhile to apply to the combination of multimodal dialogue and geometric modeling. In this case, the main focus is the geometric shape of an artifact (although this represents only a small part of a product). In translating the basic concepts from discourse analysis to the modeling domain, we can use the mapping of Table 3.

Table 3. BDI in modeling.

Belief: translates to design constraints (examples: relationships, properties).

Desire: translates to the goal model, i.e. the shape (examples: entities, attributes).

Intention: translates to a plan of modeling actions (example: macros).

In this context, desires typically represent goal states of the shape. Desire is easily translated into the goal model, a shape that represents a certain configuration. This includes entities and their attributes. A desire is the driver to create plans. As in the general field of communications, a multitude of desires might exist in designing. Intentions refer to intended modeling actions. Some modeling actions directly represent the alteration of a property of the model; others might embody more complex transformations or changes in the modeling context. Beliefs relate to the knowledge that someone uses while devising strategies to reach a desired state. In the generic description of modeling, this knowledge captures the relationships and the properties that might exist between the entities, identical to the definition of constraints mentioned earlier. The spatial reasoning constructs of the previous section can express design constraints where geometric modeling is concerned.

Resolution of constraint values and alterations of the constraints can only be interpreted when desires and intentions are considered. This constitutes the pragmatics of the communication, in particular the dialogue and turn taking protocols. Computational linguistics will be used in the solution presented in the following sections.

3. SYSTEMATIC SOLUTION

As stated in the introduction, a framework that employs discourse processing is necessary to support incomplete and vague input during modeling. Focusing on the treatment of constraint-based discourse, the following functionality is required:

1. Checking input for constraint-related utterances. The user’s input might contain one or several utterances that deal with constraints. This function will overlap with the treatment of natural language for modeling. Furthermore, as the envisioned system captures multiple modalities, gestural and other input should also be considered. This is especially the case for resolving the parameters within constraints.

2. Resolving entities, variables, and constraint types (both automatic and interactive). In gathering the information to execute a constraint-management action, the arguments might be unrecognizable, or even non-existing in the present input. Various reasoning methods could be used for resolution, and when necessary the designer should be consulted.

3. Managing constraint pool. Based on the dialogue, the collection of active constraints is kept in a constraint pool, which is accessible for external modules. The pool represents the currently valid constraints, which will be applied to the geometric model and represent the basic ‘beliefs’ of the designer. When overconstrained situations occur, the constraint pool manager needs to resolve these (with user intervention).

4. Applying constraints to the model. The constraint pool is consulted whenever geometry is altered. Special algorithms are required, as the performance should not slow down the interaction. Furthermore, the modeling engine might use alternative representations of the shape, which might complicate the application of constraints.

5. Presenting constraints to the user. Although the user might only attend to a limited set of constraints concurrently, an overview of all active constraints is necessary. This allows explicit identification and modification of the relationships that exist between the objects.

The required functionality leads to the architecture displayed in Figure 4.

Figure 4. Overall architecture of the modeling system: Natural Language Processing, Constraint Discourse Management, Modeling Engine, Constraint Maintenance, and Constraint Presentation modules, built around the Constraint Pool, Discourse State, and Constraint Library.

The Natural Language Processing module transforms the user's input signals into commands for the subsequent engines. This module has been extensively researched by Kuczogi et al. (2000). An important characteristic of the natural language processor is the use of a dynamic grammar/lexicon, which adapts to the current context. The particular constraint-spotting functions are described in section 3.1. The Modeling Engine manages and presents the geometric model. In this research project, the engine will be based on a vague representation, allowing variance in the model (Rusák et al., 2000). The representation is based on vague point clouds, which can be combined to specify a shape. Constraints will be established between symbolic (possibly not directly mappable) regions. Evaluation functions are required to map these properties to the geometry.

The Constraint Maintenance module bridges the constraint pool and the Modeling Engine. This process is responsible for applying the constraints to the geometric model, both when constraints alter and when the geometry is directly manipulated. Although its primary goal is the enforcement of the existing constraints, it might contain assistance in eliciting constraint management actions from direct manipulation – this will be revisited in the future research section.

The Constraint Discourse Manager is responsible for the interpretation and execution of constraint-related actions. It uses a Constraint Library to update the dynamic grammar of the Natural Language Processing module, it keeps track of the conversation, and stores the constraints in the constraint pool. The constraint library links user utterances to constraint types and their management actions. It also specifies the entities and attributes that might be constrained (see section 3.2). To keep track of the dialogue on constraints, the Discourse State stores the interaction history. This history determines the current focus (which might be recursive) and is traversed when actions and arguments are resolved. The focus can be explicitly changed, as computational linguistics theories show (‘Let’s stop talking about this’, ‘to come back to the previous issue ...’).

Constraint presentation displays the current constraints. In its primitive state, this is a simple list with some management functions. More advanced visualizations include constraint networks, integrated geometric and constraint renderings, and haptic or tactile feedback. The presentation module can also be used to display the discourse state. Design assistance as described earlier might also be based on this module.

The following sections will present some of the details.

3.1. Constraint spotting

Four types of discourse management actions have been identified:

• Addition of new constraints ("Put legs on top of", "cherry is on the tree"). These are typically recognized by combinations of verbs and indications of a location.

• Change of an existing constraint (“less”, “more”). Based on the discourse focus, an utterance might be recognized as an alteration of a constraint. Each constraint type can have its own modification actions and own lexicon to interpret these from the utterances. This is specified in the constraint library (next section).

• Removal of a constraint ("undo", "forget"). A constraint might be discarded, identified by a reference in the utterance or by the current discourse state.

• Discourse focus alteration (“Okay”). As mentioned in the previous section, some utterances influence the current focus of discussion.

Constraint spotting is performed by the Natural Language Processor (Kuczogi et al., 2000). In short, it uses a thesaurus to match phrases with actions. The thesaurus contains linguistic patterns, which are dynamically updated based on the current state. In its operation, it first extracts the action type from the user utterance. When the action is determined, the system generates an action instance. This data structure is similar to melting pots (Nigay and Coutaz, 1997) and typed features (Oviatt, 1999). It includes an operand list, which specifies the type, the level of necessity, and possibly the operand syntax (the assumed order of communication units in the utterance). This information is used to fill the operand list of an action instance as much as possible, by detecting values and references to existing entities. When an action instance is complete, i.e. when all mandatory operands are filled, it is forwarded to the Modeling Engine or the Constraint Discourse Manager.
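The following Java sketch (hypothetical names, not the prototype's code) illustrates such an action instance, with an operand list and the completeness test that gates forwarding:

```java
import java.util.*;

/** One operand slot of a constraint-management action. */
class Operand {
    final String name;        // e.g. "source", "target", "distance"
    final boolean mandatory;  // level of necessity
    Object value;             // filled from the utterance, or left null
    Operand(String name, boolean mandatory) { this.name = name; this.mandatory = mandatory; }
}

/** An action instance, comparable to a melting pot / typed feature structure:
    it collects operands until all mandatory ones are filled. */
class ActionInstance {
    final String actionType;  // e.g. "add-LeftOf-constraint"
    final List<Operand> operands = new ArrayList<>();
    ActionInstance(String actionType) { this.actionType = actionType; }

    /** Fill the first empty operand slot with a matching name. */
    void fill(String name, Object value) {
        for (Operand o : operands)
            if (o.name.equals(name) && o.value == null) { o.value = value; return; }
    }

    /** Complete when every mandatory operand has a value; only then is the
        instance forwarded to the Modeling Engine or Discourse Manager. */
    boolean isComplete() {
        for (Operand o : operands)
            if (o.mandatory && o.value == null) return false;
        return true;
    }
}
```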

Based on the discourse state, the Constraint Discourse Manager dynamically alters the thesaurus of the Natural Language Processor. This is especially important to interpret alterations of existing constraints.

3.2. Resolution of arguments

To resolve incomplete input, a number of information sources will be consulted, including the constraint library, the discourse state, and the existing geometry. The exact mechanisms will be determined later, yet they will include a considerable amount of spatial and geometric reasoning, as mentioned in section 2.2.

The Constraint Library, an object-oriented database, relates constraint management actions to constraint types, entities, and attributes. The basic model is shown in Figure 5. The constraint type includes a number of recognition rules for use in the thesaurus of the Natural Language Processor. The constraint type can be related to others to establish a hierarchy. This hierarchic structure allows further specification or relaxation of constraints. For example, an overlapping constraint might be altered to a touching one, or an alignment constraint might be further refined to a left-alignment.

Each constraint type has its set of management actions, called constraint actions. Again, these include recognition rules to spot them in the user's input. The constraint type hierarchy also establishes inheritance of actions. The entity types and attributes in the constraint library are used during natural language processing, each providing a number of recognition rules and possibly interactive selection/picking methods. A constraint relates a number of entity-attributes, each of which plays a (possibly parameterized) role. An explicit representation of the role enables the expression of n-ary constraints, while the role class itself might also include procedures to resolve missing information (e.g. object picking might differ for source and target in a binary constraint).

Figure 5. Basic library structure (UML notation): Constraint and Constraint Action classes (each with recognition utterances and a parent link for the hierarchy), connected through Roles to Entity Types and Attributes.
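A minimal Java sketch of this type hierarchy, assuming keyword lists in place of the richer linguistic recognition rules described above; action lookup walks up the parent chain, which yields the inheritance of constraint actions:

```java
import java.util.*;

/** A node in the constraint-type hierarchy of the library. */
class ConstraintType {
    final String name;
    final ConstraintType parent;                         // null for the root
    final List<String> recognitionKeywords;
    final Map<String, String> actions = new HashMap<>(); // keyword -> action

    ConstraintType(String name, ConstraintType parent, String... keywords) {
        this.name = name; this.parent = parent;
        this.recognitionKeywords = List.of(keywords);
    }

    /** Actions are inherited along the hierarchy (e.g. every adjacency
        subtype understands "more"/"less" defined on its parent). */
    String resolveAction(String keyword) {
        for (ConstraintType t = this; t != null; t = t.parent) {
            String a = t.actions.get(keyword);
            if (a != null) return a;
        }
        return null;
    }
}

class LibraryDemo {
    public static void main(String[] args) {
        ConstraintType constraint = new ConstraintType("Constraint", null);
        constraint.actions.put("undo", "remove");
        ConstraintType adjacency = new ConstraintType("Adjacency", constraint);
        adjacency.actions.put("more", "increase-min-distance");
        ConstraintType leftOf = new ConstraintType("LeftOf", adjacency, "left of");
        System.out.println(leftOf.resolveAction("more")); // inherited from Adjacency
        System.out.println(leftOf.resolveAction("undo")); // inherited from the root
    }
}
```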

3.3. Discourse state

The current discourse focus is an important aspect in interpreting the user utterances and executing the constraint management actions. For example, when a 'remove' action is given, which constraint (or group of constraints) should be deleted? In computational linguistics, the notion of a focus stack is often used, allowing a recursive dialogue in which new topics are 'pushed' and 'popped' from the stack. A similar approach will be used here, in which the stack refers to constraints.

The focus stack can be explicitly addressed by the user (see section 3.1) or indirectly affected by constraint management (i.e. when a constraint is deleted, its occurrences should also be withdrawn from the stack).
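A minimal Java sketch of such a focus stack, including the withdrawal of a deleted constraint from all stack positions (names are illustrative, not taken from the prototype):

```java
import java.util.*;

/** Discourse focus stack over constraints: new topics are pushed, closed
    topics popped, and a deleted constraint is withdrawn from every
    position in the stack. */
class FocusStack {
    private final Deque<Object> stack = new ArrayDeque<>();

    void push(Object constraint) { stack.push(constraint); }  // new focus
    Object pop()                 { return stack.poll(); }     // topic closed
    Object currentFocus()        { return stack.peek(); }

    /** Called by constraint management when a constraint is deleted. */
    void withdraw(Object constraint) { stack.removeIf(c -> c == constraint); }
}
```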

Additional information might be necessary to capture the discourse state. Certainly, the action history will play a role in retrieving the operands, as these might be reused in future interpretation actions.

3.4. Integration with modeling engine

Although a great number of constraint solving algorithms exist (Dohmen, 1995), the identification of the variables in the geometry adds complexity. For example, the constraint 'the cat is on the mat' includes the variables 'bottom of cat' and 'top of mat', which are then constrained by an equation. In general, these variables are mapped to a set of vertices through a spatial reasoning function SR(object, variable), in which the variable identifies the referred feature. A constraint then contains at least two of those functions and a set of additional descriptors.

As was said in the backgrounds section, intrinsic, extrinsic, and deictic frames of reference can be applied while mapping the features:

• Intrinsic - when the orientation is given by some inherent property of the reference object. This might be the direction of motion or use, the side containing perceptual apparatus, the side characteristically oriented towards the observer, or the symmetry of the object.

• Extrinsic - external factors that impose a frame of reference to the object. This includes the accessibility of the reference object, its motion, other objects in its vicinity and earth gravitation.

• Deictic - when the orientation is defined by a point within the scene from which the reference object is seen.

In the example above, 'bottom of cat' can be resolved by applying an intrinsic frame, while 'top of mat' can be resolved by applying an extrinsic frame (the face that points upward in world coordinates).
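The sketch below (with hypothetical Shape and BoundingBox types standing in for the real model) shows how SR(object, variable) might dispatch on the frame of reference; 'the cat is on the mat' then reduces to equating sr(cat, "bottom", INTRINSIC) with sr(mat, "top", EXTRINSIC):

```java
/** Frame of reference used when mapping a verbal feature to geometry. */
enum Frame { INTRINSIC, EXTRINSIC, DEICTIC }

class BoundingBox { double minY, maxY; }

class Shape {
    BoundingBox worldBounds;  // axis-aligned, world coordinates
    double intrinsicBottomY;  // defined by the object's canonical pose
}

class SpatialReasoner {
    /** SR(object, "bottom"|"top") -> a y-coordinate, per frame of reference. */
    static double sr(Shape object, String feature, Frame frame) {
        switch (frame) {
            case INTRINSIC: // property inherent to the object (the cat's bottom)
                return object.intrinsicBottomY;
            case EXTRINSIC: // imposed by gravity/world axes (the mat's top face)
                return feature.equals("top") ? object.worldBounds.maxY
                                             : object.worldBounds.minY;
            default:        // DEICTIC would need the observer's viewpoint
                throw new UnsupportedOperationException("needs a viewpoint");
        }
    }
}
```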

Figure 6. Main screen of the implementation, with the drawing canvas on the right and natural language input at the bottom left.


To maintain the constraint during geometric manipulation, the spatial reasoning function is required to be bi-directional. This ensures the interpretation of the constraint when objects are repositioned. Extended treatment is necessary when the geometry itself is altered, which results in a re-evaluation of the mapping function – in some cases, the feature cannot be retrieved after the alteration. User intervention is then required to refine the existing constraints. As geometric constraints relate two (or more) regions of objects, constraint maintenance requires n-to-m mappings. Simplification schemes are required to cope with this combinatorial explosion.

4. IMPLEMENTATION

A first prototype was implemented in Java, accessible at (Verlinden, 2002). The main focus was to test a simple constraint discourse manager that supported natural language processing, discourse tracking, and operand resolution. It is based on an existing constraint drawing application (Noth, 2001), a 2D drawing package in which alignment and adjacency constraints can be introduced. The built-in constraint satisfaction module, Cassowary (Badros and Borning, 1998), allows dynamic constraint management, optimized for performance and bi-directional constraints. A snapshot of the adapted application is shown in Figure 6.

In its original operation, constraints are added by selecting a number of source objects, activating a constraint button, and then selecting one target object. The rendered scene allows direct manipulation of the control points of the objects. The constraints are visualized by colored lines between the objects and are enforced while the objects are moved (or resized) in the scene.

The extensions consist of a textual (command-based) interface, located at the bottom left of the screen. This supports simple natural language input and subdialogues if not all arguments are understood. For this, a rudimentary constraint library was created containing adjacency constraints (see Figure 7). All entries contain some keywords that are used in interpreting the user's input. For example, the LeftOf constraint can deal with the utterances "more" and "less" (which modify the minimum distance in the adjacency). Similarly, the Below constraint supports "higher", "lower", "more", and "less".

Figure 7. Part of the implemented constraint library (management actions in italics): Constraint (remove) with subtype Adjacency Constraint (more, less), which in turn has subtypes LeftOf, RightOf, Below (lower, higher), and Above (higher, lower).
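The following sketch is a hypothetical reconstruction (not the prototype's actual code) of how "more"/"less" could adjust the minimum distance of an adjacency constraint before it is re-submitted to the underlying solver:

```java
/** How "more"/"less" could modify an adjacency constraint's minimum
    distance; the step size and structure are assumptions. */
class LeftOfConstraint {
    double minDistance;            // enforced as: source.right + minDistance <= target.left
    static final double STEP = 10; // assumed increment per "more"/"less"

    void onUtterance(String word) {
        if (word.equals("more"))      minDistance += STEP;
        else if (word.equals("less")) minDistance = Math.max(0, minDistance - STEP);
        // After updating, the corresponding linear inequality would be removed
        // and re-added in the underlying solver (Cassowary in the prototype)
        // so that the new minimum distance is enforced.
    }
}
```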

A rudimentary grammar allows the identification of sources and targets. The discourse manager then tries to map these to the current scene, first by tracing all entities referred to by the user in the text. When this fails, the manager will try to use the currently selected object. If no such object exists, the manager will engage a subdialogue, asking the user to provide the missing information by selecting the elements in the canvas.
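A Java sketch of this three-step resolution order, with the scene entities and the canvas prompt reduced to placeholders (all names are illustrative):

```java
import java.util.*;

/** Resolution order for a missing operand: 1) entities referred to in the
    text, 2) the current selection, 3) a subdialogue with the user. */
class OperandResolver {
    List<String> sceneEntities;  // names of entities in the drawing
    String selectedEntity;       // currently selected object, or null

    String resolve(String utterance) {
        for (String e : sceneEntities)  // 1) trace references in the text
            if (utterance.contains(e)) return e;
        if (selectedEntity != null)     // 2) fall back to current selection
            return selectedEntity;
        return askUserToPick();         // 3) engage a subdialogue
    }

    String askUserToPick() {
        // In the prototype, the user is prompted to click an element on the canvas.
        throw new UnsupportedOperationException("prompt user to select on canvas");
    }
}
```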

The implementation also captures the discourse state. New constraints are added to a focus stack. When natural language input is analyzed, this stack is used to interpret possible constraint actions. The constraint discourse manager traverses it until an action has been recognized.

Table 4. Use-case description of an example dialogue.

S0. Initial state: 3 lines are drawn, 1 constraint exists (a Below constraint), which is also on top of the focus stack.

S1. User input: utterance "create a circle". System response: creation mode is set on, type "circle"; the cursor is set to creation mode.

S2. User input: clicks two points in the drawing canvas with the mouse. System response: a circle is created with the specified center and radius.

S3. User input: utterance "The circle is left of the line". System response: the circle object is selected as source; select-target mode is set on, type LeftOf constraint; the LeftOf constraint is added to the focus stack; the user is prompted to select a destination (there are 3 lines to select from).

S4. User input: clicks a line object in the drawing canvas. System response: the alignment constraint is added to the pool.

S5. User input: utterance "more". System response: interpreted as a modification of the LeftOf constraint, increasing the minimum distance between the objects.

S6. User input: utterance "higher". System response: interpreted as a modification of the Below constraint, increasing its distance.

For example, when first a Below and then a LeftOf constraint are added, the utterance "more" will be executed in the context of the LeftOf constraint, while the utterance "higher" will be forwarded to the Below constraint (see Table 4).
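The traversal just described can be sketched as follows; the keyword sets mirror Figure 7, but the class names and structure are illustrative:

```java
import java.util.*;

/** Traversing the focus stack to interpret "more"/"higher": the topmost
    constraint whose type understands the keyword receives the action. */
class DiscourseManager {
    static class FocusedConstraint {
        final String type;
        final Set<String> keywords;
        FocusedConstraint(String type, String... kw) {
            this.type = type; this.keywords = Set.of(kw);
        }
    }

    final Deque<FocusedConstraint> focusStack = new ArrayDeque<>();

    /** Walk from the top of the stack down until a constraint accepts the word. */
    Optional<FocusedConstraint> interpret(String word) {
        for (FocusedConstraint c : focusStack)
            if (c.keywords.contains(word)) return Optional.of(c);
        return Optional.empty();
    }

    public static void main(String[] args) {
        DiscourseManager dm = new DiscourseManager();
        dm.focusStack.push(new FocusedConstraint("Below", "higher", "lower", "more", "less"));
        dm.focusStack.push(new FocusedConstraint("LeftOf", "more", "less"));
        System.out.println(dm.interpret("more").get().type);   // LeftOf (topmost match)
        System.out.println(dm.interpret("higher").get().type); // Below
    }
}
```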


Some support has been given to change the contents of the focus, e.g. a constraint can be removed from the stack by including the keyword “okay” in an utterance.

The presentation of the constraints is done in a separate window (see Figure 8). This list displays both the constraint focus stack and the complete constraint pool. Some interactive functions are supported: constraints can be deleted, and by double-clicking a constraint in the constraint pool, it is added to the focus stack.

Figure 8. Discourse status, displaying both the current focus (left) and the list of all constraints.

5. CONCLUSIONS AND FUTURE WORK

5.1. Conclusions

A scheme to recognize and manage geometry-related constraints with natural language was proposed. The execution and interpretation of constraint management actions is done based on a constraint library and a focus stack. As there are implicit and explicit relationships between constraints and geometry, special attention is given to propagating changes among them.

A 2D Java implementation was made that partly supports the proposed functionality. Specifically, it explored the discourse status and the constraint library. Within its limitations, it demonstrates the use of the focus stack and the added value of using verbal input and direct manipulation simultaneously.

The proposed scheme seems capable of dealing with the issues mentioned in the introduction, namely dynamism in geometric constraints, a shifting focus in discourse, and the possibility to use multimodal input. Of course, its use is not limited to single-user situations. In collaborative design, similar notions in the verbal and multimodal dialogue apply. The theories that were included from computational linguistics offer a wide range of techniques to extend this system with shared goal setting and initiative management, for example SharedPlans (Grosz and Krauss, 1990).

In theory, the discourse management and constraint interpretation can be applied to a wider variety of design constraints. However, the extension of spatial reasoning to kinematic/material/ergonomics aspects is complicated. A useful paradigm could be the Physically Coupled Pairs concept, presented in (Jambak, 2002).

5.2. Future work

In the short term, the proposed verbal constraint manager will be integrated with the speech interface and 3D vague representation engine described in (Kuczogi et al., 2002) and (Rusák et al., 2000), using Open Inventor and C++ as the core implementation platform. The future implementation will have to deal with 3D geometric constraints and requires treatment of the identification of attributes, as mentioned in section 3.4. In expanding this geometric constraint management, a human-centered design approach will be combined with software development research. A number of experiments will be used to evaluate and validate the proposed framework.

The long-term goal of this research project is not just the support of natural language in design, but the augmentation of design systems with suggestive assistance (Igarashi and Hughes, 2001). A good example of assistance with constraints can be found in the CoDraw system (Gross, 1992). There, constraints are suggested and re-evaluated when objects are manipulated. For example, when one object is moved inside another, a 'containment' constraint might be suggested; or when an object is moved that has an 'adjacent' constraint to another entity, it might be suggested 1) to drop this constraint, 2) to resize the other object, or 3) to move the other object. In order to pursue this type of assistance, the situated knowledge paradigm will be used (Clancey, 1997).

ACKNOWLEDGEMENTS This research has been performed as part of the Integrated Concept Advancement (ICA) project, at the Delft University of Technology.

REFERENCES

Agrawal, R.B., Mukerjee, A., Deb, K. (1995) "Modeling of Inexact 2-3 Shapes using Real-coded Genetic Algorithms", in Proceedings of the Symposium on Genetic Algorithms, 1995, Dehradun, India, pp. 41-49.

Allen, J.F. (1984) "Towards a general theory of action and time", Artificial Intelligence 23, pp. 123-154.

Badros, G.J., Borning, A. (1998) "The Cassowary Linear Arithmetic Constraint Solving Algorithm: Interface and Implementation", Technical Report UW-CSE-98-06-04 , University of Washington, Computer Science and Engineering.

Benoit, C., Martin, J., Pelachaud, C., Schomaker, L. and Suhm, B. (to appear) "Audio-visual and Multimodal Speech Systems", draft of chapter on Multimodal Systems (EAGLES Project).

Chu, C.-C., Dani, T.H. & Gadh, R. (1997) “Multisensory Interface for a Virtual Reality Based Computer Aided Design System”, Computer-Aided Design, 29 (10), 709-725.

Clancey, W.J. (1997) “Situated cognition: on human knowledge and computer representations”, Cambridge University Press, ISBN 0-521-44400-4.

Coenen, F.P., Beattie, B., Bench-Capon, T.J.M., Diaz, B.M. and Shave, M.J.R. (1998) "Spatio-Temporal Reasoning Using A Multi-Dimensional Tesseral Representation", Proceedings ECAI'98, John Wiley & Sons, pp. 140-144.

Coutaz, J. and Nigay, L. (1995) "A generic platform to address the multimodal challenge", International Conference on Computer-Human Interaction (CHI'95), Denver, pp. 172-178.

Damski, J., Gero, J. (1998) "Object Representation and Reasoning Using Halfspaces and Logic", in J.S. Gero and F. Sudweeks (eds.), Artificial Intelligence in Design, Kluwer, pp. 102-126.

Desmontils, E. (2000) "Expressing constraint satisfaction problems in declarative modeling using natural language and fuzzy sets", Computers & Graphics, 24, pp. 555-568.

Dickinson, S. (1991) "The recovery and recognition of three-dimensional objects using part-based aspect matching", Technical Report CAR-TR-572, Center for Automation Research, University of Maryland.

Dohmen, M. (1995) "A survey of constraint satisfaction techniques", Computers & Graphics, vol. 19(6), pp. 831-845.

Egenhofer, M.J. (1989) "A formal definition of binary topological relationships", in W. Litwin and H.-J. Schek (eds.), Proceedings of the Third International Conference on Foundations of Data Organization and Algorithms (FODO), Lecture Notes in Computer Science 367, Springer-Verlag (NY), pp. 457-472.

Gross, M.D. (1992) "Graphical constraints in CoDraw", in Proceedings of the 1992 IEEE Workshop on Visual Languages, pp. 81-87.

Grosz, B., Sidner, C. (1990) "Plans for discourse", in Intentions in Communication, Cohen, P., Morgan, J., Pollack, M. (eds.), Bradford Books, pp. 417-444.

Hernández, D. (1994) "Qualitative Representation of Spatial Knowledge", ISBN 3-540-58058-1, Springer-Verlag (Berlin).

Högg, S., Schwarzer, I. (1991) "Composition of Spatial Relations", Forschungsberichte Künstliche Intelligenz, Institut für Informatik, Technische Universität München, Number FKI-163-91, December 1991.

Jambak, M.I., Horváth, I. (2002) "Towards a more Comprehensive Artifactual Modeling in Conceptual Design", to appear in Proceedings of Design 2002, Croatia.

Horváth, I., Rusák, Z., Vergeest, J.S.M., Kuczogi, G. (2000) "Vague modeling for conceptual design", in Proceedings of TMCE 2000, Delft, pp. 131-144.

Horvitz, E. (1999) "Principles of Mixed-Initiative User Interfaces", Proceedings of CHI'99, Pittsburgh (USA), pp. 159-166.

Igarashi, T., Hughes, J.F. (2001) “A Suggestive Interface for 3D Drawing”, in Proceedings of UIST’2001.

Koons, D.B., Sparrell, C.J., Thorisson, K.R. (1993) "Integrating simultaneous input from speech, gaze, and hand gestures", in M. Maybury (ed.), Intelligent Multimedia Interfaces, Morgan Kaufmann, pp. 257-275.

Kuczogi, Gy., Horváth, I., Rusák, Z., Verlinden, J., Jansson, J., Vergeest, J.S.M. (2000) "Extracting Procedural And Contextual Elements Of Verbal Communication To Instruct A Shape Conceptualization System", in Proceedings of EDA 2000, Florida.

Kuczogi, Gy., Horváth, I., Rusák, Z. (2002) "Verbal interface for vague discrete shape modeler", In Proceedings of TMCE 2002, pp. 663-674.

Mavrovouniotis, M. and Stephanopoulos, G. (1988) "Formal order-of-magnitude reasoning in process engineering", Computers and Chemical Engineering, 12, pp. 867-881.

Mukerjee, A., and Joe, G. (1990) "A qualitative model for space", Proceedings of AAAI-90, July 29-Aug 3, Boston, pp. 721-727.

Mukerjee, A. (1998) ”Neat vs scruffy: A survey of computational models for spatial expressions” In Olivier, P., and Gapp, K. P., eds., Computational Representation and Processing of Spatial Expressions.

Noth, M. (2001), http://www.cs.washington.edu/research/constraints/cda/run.html

Oviatt, S. (1999) "Mutual Disambiguation of Recognition Errors in a Multimodal Architecture", Proceedings of ACM CHI 1999, pp. 576-583.

Rich, C., Sidner, C. (1997) "COLLAGEN: When agents collaborate with people", in Proceedings of the International Conference on Autonomous Agents (Agents'97).

Rusák, Z., Horváth, I., Vergeest, J.S.M., Kuczogi, G., Jansson, J. (2000) "Discrete Domain Representation for Shape Conceptualization", Proceedings of EDA 2000, pp. 228-233.

Verlinden, J., Wiegers, T., Vogelaar, T., Horváth, I., Vergeest, J.S.M. (2001) “Exploring Conceptual Design Using A Clay-Based Wizard Of Oz Technique”, Proceedings of HMS’01, Kassel.

Verlinden, J. (2002), http://dutoce.io.tudelft.nl/~jouke/constraints

Vo, M.T. and Waibel, A. (1997) "Modeling and Interpreting Multimodal Inputs: A Semantic Approach", Technical Report CMU-CS-97-192, Carnegie Mellon University, December 1997.

Waibel, A., Vo, M., Duchnowski, P., Manke, S. (1995) "Multimodal Interfaces", Artificial Intelligence Review, Vol. 10, No. 3-4.

Wilson, R., Latombe, J. (1994) "Geometric Reasoning About Mechanical Assembly", Artificial Intelligence, Vol 71 (2), pp. 379-396.