Evolving Interface Designs to Minimize User Task Times as Simulated in a Cognitive Architecture

Jean-Claude Golovine, John McCall, Patrik O’Brian Holt

Abstract—We present a novel approach to User Interface optimization. A Genetic Algorithm (GA) is used to evolve an interface layout to minimize user task times. Solutions are evaluated using the cognitive architecture Adaptive Control of Thought - Rational (ACT-R) to simulate the human cognition and motor action required to complete the task. A development environment, TOISE, has been created to integrate the GAlib toolkit with user task definition and ACT-R.

Our approach is tested on a classic design problem, the telephone keypad, in comparison with a Local Search (LS). Solutions produced are also compared to the classic Bell telephone keypad.

Our results show that both GA and LS evolve competitive solutions to the problem, with GA significantly outperforming LS. One of the solutions produced by GA approximates the classic Bell keypad. We conclude that the combination of Evolutionary Computation (EC) and cognitive modeling may offer a competitive and cost-effective alternative to interface design approaches driven by human evaluation.

I. INTRODUCTION

Over the last decade, researchers have begun to introduce cognitive models into the design and evaluation process to predict user performance when interacting with a specific User Interface (UI) architecture [1][2]. This essentially means that, early in the design process, cognitive models are used to predict user performance, and the resulting evaluation data is used, for instance, to guide changes to an interface layout.

This paper introduces an approach using a genetic algorithm (GA) to evolve a UI to maximize user performance as evaluated by the cognitive architecture Adaptive Control of Thought - Rational (ACT-R). The GA creates potential UI layouts and passes them to ACT-R to calculate their fitness. The process takes place in a purpose-built UI design environment called the Toolkit for Optimization of Interface Systems through Evolution (TOISE).

The novelty of this combination is that the GA uses ACT-R as a human simulator. Conversely, the GA replaces an expensive and incremental human-driven design approach with a relatively rapid and inexpensive meta-heuristic search of a much wider design space. For evaluation purposes, we compare the GA with a local search algorithm and assess the performance of both approaches on a classic UI layout problem.

Jean-Claude Golovine is a research student in the IDEAS Research Institute at The Robert Gordon University, St Andrews Street, Aberdeen, AB25 1HG, Scotland, UK. (Phone: + 44 (0)1224 262575; e-mail: [email protected]).

Patrik O’Brian Holt is Professor of Computing in the IDEAS Research Institute at The Robert Gordon University, St Andrews Street, Aberdeen, AB25 1HG, Scotland, UK. (Phone: + 44 (0)1224 262708; fax: + 44(0)1224 262727; e-mail: [email protected]).

John McCall is Professor of Computing in the IDEAS Research Institute at The Robert Gordon University, St Andrews Street, Aberdeen, AB25 1HG, Scotland, UK. (Phone: + 44 (0)1224 262780; fax: + 44(0)1224 262727; e-mail: [email protected]).

In 1960, Bell published a technical journal article on the possible development of the pushbutton telephone keypad [3]. This extensive and pioneering study investigated the design of such apparatus and how the layout influenced user speed and accuracy in keying telephone numbers. Amongst other findings, this research gave the industry what we consider today to be the standard telephone keypad layout, shown in Figure 1. Bell found this to be the best key arrangement in terms of minimizing keying time.

Fig. 1 Bell telephone keypad

The objective of this paper is to demonstrate that a GA using ACT-R is capable of finding competitive solutions to the telephone keypad layout problem, in order to establish the potential of GA + ACT-R as a cost-effective technique for application to UI design more generally.

The rest of the paper is structured as follows. In Section 2, we introduce the UI design problem and present the TOISE system, along with an overview of the ACT-R cognitive architecture. We describe the GA in Section 3. Our experiments are detailed in Section 4 and the results in Section 5. Section 6 contains our conclusions and some directions for further research. We finish the paper with Section 7, where we further discuss some of the results and consequences that came to light during this experimental work.

II. BACKGROUND

A. Overview

Nichols discusses the usability issues that arise in designing interfaces for appliances such as computers, Personal Digital Assistants (PDA) or mobile phones. Personal Universal Controllers (PUC) are becoming increasingly used in our daily life, and devices such as TV remote controls are replacing the controls of the actual devices. Design effort to maximize usability for such devices is most often carried out on cheap flexible platforms such as a PDA simulation [4]. St. Amant, Horton and Ritter researched ways to evaluate cell phone menu interaction using GOMS and ACT-R [5]. Raynal and Vigouroux used a GA to optimize keyboard layouts using Fitts’s law (1) as part of their cost function [6].

The massive global market in mobile phones [7] has stimulated research into new ways to assess their usability. Vanja Kljajevic states in the book “Cognitive Models as Usability Testing Tools” that “From a general ergonomic point of view, usability of a product is its fitness for purpose”.

In this paper, we define the fitness of an interface in terms of latencies, i.e. the time it takes a user to carry out a task. This concept has been the essence of much research in the area of UI usability [8]. All of this research translates user interactions into cognitive models to assess usability in terms of user task timings. Though it was not the principal focus, much of this research has used the mobile phone as a test application. Given the recent and rapid development of touch technology in phones, layouts that are customized to the needs of a particular user have become possible. In particular, it is feasible to design telephone keypad layouts based on the numbers most frequently dialed by a particular user.

Fig. 2 The user actions recorded using the TOISE recorder.

B. TOISE

ACT-R runs in native Lisp code and is notoriously hard to interoperate with from other platforms. Our TOISE development system is an IDE that we developed to facilitate evo-cognitive interface design. It comprises: an event recorder module that compiles, as an XML model, both a description of a Java interface and the user interactions with that interface; a cognitive module, which passes user tasks to ACT-R for evaluation and includes some ACT-R model debugging facilities; and a GA module based on the well-established toolkit GAlib [9]. Figure 6 shows a test application running with TOISE in the background. TOISE provides an environment where the user can change the interface components, define user tasks, specify GA parameters and configuration options, and run experiments.

Once the test application is launched, the required user tasks are carried out by interacting with the components, and these interactions are converted into a task list by the task recorder (see Figure 2). In other words, the task list is created by actually carrying out the tasks.

One of the major issues with interfacing to ACT-R is its notoriously slow processing speed due to Lisp. This is further compounded when it is used in a system involving a GA that requires many evaluations, each of which involves a series of ACT-R calls. For this reason, a selectable option allows us to switch from actually using ACT-R to using a simulation of the macros that return the ACT-R timings. The simulation was tested over 100,000 random simulations involving a range of user actions and produces the same task timings as ACT-R (to 1 ms accuracy).

When loading a task model, two sets of records are produced. First, the test application interface is interpreted as a series of shapes with properties such as name, position and size often deduced by going down the chain of Java class inheritance. Second, a record of all user interactions is generated and cross-referenced with the actual physical layout of the interface. These records give a complete representation of the test application and the user interactions.

The TOISE system displays in real-time a scaled symbolic representation of the layout that is being optimized and stores in files the best solutions, statistics and screenshots of best found layouts if this option is selected.

C. ADAPTIVE CONTROL OF THOUGHT - RATIONAL (ACT-R)

ACT-R [10] is a software simulation system, which represents a model of human cognitive architecture based on production systems. ACT-R developed from original research on memory, learning and problem solving carried out by Allen Newell and others who later went on to develop the State Operator And Result (SOAR) architecture [11][12].

The basic ACT-R system has been enhanced through the addition of several modules such as the visual and motor modules in ACT-R/PM. Although ACT-R is mainly used as a research platform in cognitive psychology and cognitive science, it has more recently found promising applications in Human Computer Interaction (HCI) research. It is now increasingly used beyond laboratory tasks and is emerging in a wide variety of complex, dynamic usage domains.

The human operation of complex systems in difficult environments, such as control of remote vehicles [13], can be explored through simulation if accurate human performance models can be developed to represent human cognitive load as well as motor and visual functions. The ACT-R architecture provides such a simulated environment and gives a UI designer the means to identify and test whether or not a complex system meets operator needs. Important application areas include: the use of ACT-R to predict user behavior, e.g. in flight and driving simulations [14]; gaming environments [15]; and tele-operation of mining vehicles and systems [16]. In user interface design, ACT-R is used for: testing and automation of human performance evaluation [17]; assessing user interactions such as gestures [18]; tracking eye movements [19]; and evaluating mobile device user interaction [5].

This increase in research also paves the way to further enhance the use of ACT-R in the real world, and we see the emergence of systems that automate the creation of ACT-R models. Such systems include GOMS [20], KLM [21], ACT-Simple [8] and G2A2 [22][9].

The task timings returned by ACT-R for a mouse move action have two components: cognition and predicted motor time. Cognition is the time to (re)focus attention on mouse targets. Motor time is based on Fitts’s law:

$$T = b \log_2\left(\frac{D}{W} + 0.5\right) \qquad (1)$$

where
T = the time of the movement in seconds
b = parameter based on the type of motor action
D = the distance to the target
W = the width of the target

Therefore, using ACT-R adds the cognition load to the predicted motor time.

ACT-R is integrated in platforms such as CogTool, which simulates human interaction by predicting skilled user time. It sends a series of KLM commands to an ACT-R model and retrieves latencies for a scenario of actions. CogTool is often used to benchmark UIs and thus helps developers in early design and assessment, or in comparing different UI designs [23].
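As an illustration of the motor-time component of (1), the following Python sketch assumes the Fitts’s law form T = b log2(D/W + 0.5), which is the form commonly associated with ACT-R's motor module. The function names, the default coefficient b = 0.1 and the clamping of negative values to zero are assumptions made for this example and are not taken from TOISE or ACT-R.

```python
import math


def fitts_motor_time(distance, width, b=0.1):
    """Predicted movement time in seconds, T = b * log2(D / W + 0.5).

    b=0.1 is an illustrative coefficient (assumption), not a value
    reported in the paper.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    raw = b * math.log2(distance / width + 0.5)
    # Assumption: very short moves would give a negative value, so clamp at 0.
    return max(0.0, raw)


def mouse_move_time(distance, width, cognition_time, b=0.1):
    """Total predicted time: cognition (attention shift) plus motor time."""
    return cognition_time + fitts_motor_time(distance, width, b)
```

For example, `mouse_move_time(35, 10, cognition_time=0.05)` adds a hypothetical 50 ms attention shift to the motor time for a 35-pixel move onto a 10-pixel-wide target.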

III. INTERFACE DESIGN OPTIMISATION

A. GENETIC ALGORITHM (GA)

1) Methodology

In this section a definition of the user interface design problem is provided, along with a description of GA and Local Search methods.

2) Solution representation

A general 2-dimensional user interface design can be represented in terms of a set of rectangular components within an overall bounding box. To complete a particular task, the user will, in general, focus on a sequence of interface components as well as carrying out motor actions (mouse-move, mouse-click, drag and drop actions etc). Interface components are given location coordinates X, Y and can have any dimensions within the overall bounding space.

Much work has been done on the optimization of layouts of diverse types using genetic algorithms. A number of approaches to solution representation have emerged from this research. These include: Tree, Nodes, Slices, Arrays {1..N}, Hybrids and Binary [6][24][25][26][27][28][29][30].

In this work, the interface component locations are given as integers by the user action recorder. However, the GA internally uses a paired floating-point representation for those locations. We assume that interface component dimensions have been previously specified in the design process. For a problem consisting of n components for a specific interface, each solution is therefore an array of length 2n (one X gene and one Y gene per component).
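The representation described above can be sketched in Python as follows. This is a minimal illustration, assuming the genes are stored as [x1, y1, ..., xn, yn]; the helper names `encode` and `decode` and the use of rounding to recover integer pixel positions are assumptions for this example, not part of TOISE or GAlib.

```python
from typing import List, Tuple


def encode(positions: List[Tuple[float, float]]) -> List[float]:
    """Flatten [(x1, y1), ..., (xn, yn)] into the gene array [x1, y1, ..., xn, yn]."""
    genes: List[float] = []
    for x, y in positions:
        genes.extend((float(x), float(y)))
    return genes


def decode(genes: List[float]) -> List[Tuple[int, int]]:
    """Recover integer pixel positions from a length-2n gene array."""
    if len(genes) % 2 != 0:
        raise ValueError("chromosome must hold an even number of genes")
    return [(round(genes[i]), round(genes[i + 1]))
            for i in range(0, len(genes), 2)]
```

Keeping each X gene next to its Y gene means that component i always occupies positions 2i and 2i + 1 of the array.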

3) Solution evaluation

A solution is evaluated as

(2)

Here A(X) is the task timing in seconds given by ACT-R, O(X) is a penalty for overlapping components, proportional to the area of overlap, and B(X) is a fixed user-defined penalty for exceeding the bounding box. The penalty coefficient weights can be controlled through the TOISE interface.
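The three terms A(X), O(X) and B(X) can be illustrated with a short Python sketch. It assumes that the terms are simply added, that the fixed bounding-box penalty is applied once per solution, and that ACT-R is replaced by a caller-supplied `task_time` function; all names and default weights here are hypothetical and chosen only for the example.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return dx * dy if dx > 0 and dy > 0 else 0.0


def inside(r: Rect, box: Rect) -> bool:
    """True if rectangle r lies entirely within the bounding box."""
    return (r[0] >= box[0] and r[1] >= box[1]
            and r[0] + r[2] <= box[0] + box[2]
            and r[1] + r[3] <= box[1] + box[3])


def evaluate(genes: Sequence[float],
             sizes: Sequence[Tuple[float, float]],
             box: Rect,
             task_time: Callable[[List[Rect]], float],
             overlap_weight: float = 1.0,
             bounds_penalty: float = 100.0) -> float:
    """Lower is better: task time plus overlap and bounding-box penalties."""
    rects = [(genes[2 * i], genes[2 * i + 1], w, h)
             for i, (w, h) in enumerate(sizes)]
    # O(X): proportional to the total pairwise overlap area.
    o = overlap_weight * sum(overlap_area(a, b)
                             for a, b in combinations(rects, 2))
    # B(X): fixed penalty if any component leaves the bounding box (assumption:
    # applied once, not once per offending component).
    b = bounds_penalty if any(not inside(r, box) for r in rects) else 0.0
    # A(X): stand-in for the ACT-R task timing.
    return task_time(rects) + o + b
```

A solution with no overlaps that fits inside the bounding box is then scored purely by its task timing.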

4) Mutation Genetic Operators

There are several mutation operators, the main purpose of which is to provide a variety of ways to make minor adjustments that either move or swap components and repair minor flaws such as overlapping. These operators were found empirically to be more effective in combination than any single mutation operator acting alone.

The operators used are:
• XYSwap
• Swap Component
• Flip
• Random Nudging
• One Pixel Nudging
• Constant Nudging
• Random Block
• One Pixel Block Shift
• Constant Block Nudging

XYSwap swaps the X and Y genes of a selected component. Swap Component swaps the pairs of XY genes between two selected components. Flip reflects the position of a component either vertically or horizontally around the centre of the bounding box. Random Nudging adjusts the XY genes of a selected component to random values within the bounding box. One Pixel Nudging alters the XY genes of a randomly selected component by one pixel. Constant Nudging is similar to One Pixel Nudging but alters the genes by a random value within a range set as a user parameter.
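The single-component operators described above can be sketched in Python as follows. This is an illustration only: it assumes the gene layout [x1, y1, ..., xn, yn], a bounding box with its origin at (0, 0), that Flip mirrors the position gene without accounting for component size, and that the nudging operators change both the X and the Y gene. The function names are hypothetical and do not correspond to GAlib or TOISE identifiers.

```python
import random


def xy_swap(genes, i):
    """XYSwap: exchange the X and Y genes of component i."""
    genes[2 * i], genes[2 * i + 1] = genes[2 * i + 1], genes[2 * i]


def swap_component(genes, i, j):
    """Swap Component: exchange the (X, Y) pairs of components i and j."""
    xi, yi = genes[2 * i], genes[2 * i + 1]
    genes[2 * i], genes[2 * i + 1] = genes[2 * j], genes[2 * j + 1]
    genes[2 * j], genes[2 * j + 1] = xi, yi


def flip(genes, i, box_width, box_height, horizontal=True):
    """Flip: mirror component i about the centre of the bounding box."""
    if horizontal:
        genes[2 * i] = box_width - genes[2 * i]
    else:
        genes[2 * i + 1] = box_height - genes[2 * i + 1]


def random_nudge(genes, i, box_width, box_height, rng):
    """Random Nudging: move component i to a random point in the box."""
    genes[2 * i] = rng.uniform(0, box_width)
    genes[2 * i + 1] = rng.uniform(0, box_height)


def one_pixel_nudge(genes, i, rng):
    """One Pixel Nudging: shift both genes of component i by one pixel."""
    genes[2 * i] += rng.choice((-1, 1))
    genes[2 * i + 1] += rng.choice((-1, 1))


def constant_nudge(genes, i, max_step, rng):
    """Constant Nudging: shift by a random amount within +/- max_step."""
    genes[2 * i] += rng.uniform(-max_step, max_step)
    genes[2 * i + 1] += rng.uniform(-max_step, max_step)
```

Each operator modifies the gene list in place; passing a `random.Random` instance as `rng` keeps runs reproducible.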

5) The mutation process

The mutation process begins with random selection of the components to be mutated and the mutation operators to be applied. This is depicted in Figure 3. A number of genes are selected at random from the chromosome and enter the mutation process. The mutation operators are treated in the same way: some of them are selected at random to mutate the selected genes. In the example shown, the mutation process therefore applies the XYSwap, One Pixel Block Shift and Flip operators to the gene pairs [x3, y3], [x1, y1] and [x4, y4]. The order in which the genes are mutated is also randomly chosen, as is the order of the operators. The mutation process can therefore be assumed to be completely random.
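A minimal Python sketch of this selection step is given below. It assumes that every chosen operator is applied to every chosen component, which is one possible reading of the description; the two toy operators and the function name `mutate` are hypothetical and serve only to make the example runnable.

```python
import random


def swap_xy(genes, i, rng):
    """Toy operator: exchange the X and Y genes of component i."""
    genes[2 * i], genes[2 * i + 1] = genes[2 * i + 1], genes[2 * i]


def nudge_one_pixel(genes, i, rng):
    """Toy operator: shift both genes of component i by one pixel."""
    genes[2 * i] += rng.choice((-1, 1))
    genes[2 * i + 1] += rng.choice((-1, 1))


def mutate(genes, operators, rng):
    """Return a mutated copy of genes.

    A random non-empty subset of components and a random non-empty subset
    of operators are drawn; rng.sample also randomizes their order.
    """
    mutated = list(genes)
    n = len(mutated) // 2
    components = rng.sample(range(n), rng.randint(1, n))
    chosen = rng.sample(operators, rng.randint(1, len(operators)))
    for op in chosen:
        for i in components:
            op(mutated, i, rng)
    return mutated
```

A call such as `mutate(genes, [swap_xy, nudge_one_pixel], random.Random(42))` leaves the parent unchanged and returns the mutated child.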

6) Crossover

It was decided to use a 2-point crossover approach. The mutation process can be overly destructive, e.g. when all 9 mutation operators are selected to operate on the entire set of chromosome genes, so it was felt that a uniform crossover operator, itself destructive [31], might destroy too many good individuals. A less destructive operator, 2-point crossover, was therefore chosen, and this proved to be effective.
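Two-point crossover on the length-2n gene array can be sketched in Python as below. As an assumption made for this illustration, the cut points are placed on component boundaries so that each (X, Y) pair stays together; the paper does not state where the cut points fall, and the function name is hypothetical rather than a GAlib identifier.

```python
import random


def two_point_crossover(mum, dad, rng):
    """Exchange the gene segment between two cut points of two parents."""
    if len(mum) != len(dad) or len(mum) % 2 != 0:
        raise ValueError("parents must be equal-length arrays of (x, y) pairs")
    n = len(mum) // 2                       # number of components
    a, b = sorted(rng.sample(range(n + 1), 2))
    lo, hi = 2 * a, 2 * b                   # cut on component boundaries
    child1 = mum[:lo] + dad[lo:hi] + mum[hi:]
    child2 = dad[:lo] + mum[lo:hi] + dad[hi:]
    return child1, child2
```

Because only one contiguous segment is exchanged, components that are adjacent in the array tend to be inherited together, which is why this operator disturbs a parent less than uniform crossover does.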

Fig. 3. Process for genes and mutation operators

B. Interface Repair Mechanism

A repair mechanism was introduced to reposition components that overlap or that do not lie entirely within the interface boundary. For the latter, we simply re-position the components on the edge of the space, whilst for the former we use the system illustrated in Figure 4. As the process is not infallible, solutions that remain infeasible are penalized as described in formula (2).
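The boundary part of the repair, re-positioning a component on the edge of the space, amounts to clamping its coordinates. The Python sketch below illustrates only that step, assuming a bounding box with its origin at (0, 0) and known component sizes; the overlap repair of Figure 4 is not reproduced, and the function name is hypothetical.

```python
def clamp_to_box(genes, sizes, box_width, box_height):
    """Return a copy of genes with every component moved back inside the box."""
    repaired = list(genes)
    for i, (w, h) in enumerate(sizes):
        # Keep the top-left corner within [0, box - size] on each axis.
        repaired[2 * i] = min(max(repaired[2 * i], 0), box_width - w)
        repaired[2 * i + 1] = min(max(repaired[2 * i + 1], 0), box_height - h)
    return repaired
```

Components that are already inside the box are left where they are.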

Fig. 4. Example of mending mechanism for overlapping

C. Local Search Algorithm (LS)

The LS algorithm used in this paper was configured in GAlib from the GA by using a population size of 1 individual only, always applying mutation as above and never applying crossover. The LS algorithm is fully described in Figure 5.

LS inherits the mutation algorithms from the GA and is therefore well suited to locally optimizing solutions. Differences in performance between LS and GA can therefore be attributed directly to the population-based GA approach, incorporating crossover.

Select the only individual
Save original individual
Mutate individual
Evaluate individual
If individual score after mutation > original saved individual score
    Restore individual to saved one
Endif

Fig. 5. LS Algorithm
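The steps of Figure 5 can be written as a short Python loop. This is a sketch under stated assumptions: the score is minimized, the step is repeated for a fixed number of iterations, and the toy mutation and cost functions stand in for the TOISE operators and the ACT-R timing. None of the names below come from GAlib.

```python
import random


def local_search(individual, mutate, evaluate, iterations, rng):
    """Mutate a single individual repeatedly, keeping a change unless it is worse."""
    best = list(individual)                   # save original individual
    best_score = evaluate(best)
    for _ in range(iterations):
        candidate = mutate(list(best), rng)   # mutate a copy
        score = evaluate(candidate)           # evaluate individual
        if score > best_score:
            continue                          # restore: discard the mutation
        best, best_score = candidate, score
    return best, best_score


def toy_mutate(genes, rng):
    """Toy mutation (assumption): shift one randomly chosen gene by one pixel."""
    i = rng.randrange(len(genes))
    genes[i] += rng.choice((-1, 1))
    return genes


def toy_cost(genes):
    """Toy cost (assumption): total distance of all genes from zero."""
    return sum(abs(v) for v in genes)
```

For instance, `local_search([5, -3, 8], toy_mutate, toy_cost, 200, random.Random(0))` never returns a score worse than the starting one, because a mutation that increases the score is always discarded.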

IV. EXPERIMENTS

A series of experiments has been conducted to compare how well LS and GA perform on this problem. Solution quality is measured as the task timing given by ACT-R for a set of n telephone numbers to be dialed in sequence. These numbers were chosen at random from the local directory (Yellow Pages 2006/07, Aberdeen, Scotland, UK). The same series of numbers is used for all tests. Solutions can be directly compared with the Bell keypad layout.

Fig. 6. Test application with TOISE system in the background

ACT-R timings for the Bell layout are 151.737 seconds when n = 10 numbers are dialed and 1487.441 seconds when n = 100 numbers are dialed. Mutation and crossover rate parameters were chosen by trial and error and set to those that gave the best solution on 100 numbers for GA. The empirically determined best GA parameters are displayed in Table 1.

TABLE 1
Best GA parameters

Phone Numbers   Crossover   Mutation
10              0.8         0.003
100             0.8         0.003

We conducted two experiments. In the first, LS and GA were terminated as soon as they produced a solution with timings equal to or faster than those achieved by ACT-R using the Bell keypad. In the second, both algorithms were run for 5000 and for 10000 generations to determine the best task timings that could be realized. Each experiment was run 100 times for both n = 10 and n = 100 telephone numbers. In all experiments, the bounding box was set to be sufficiently large to avoid constraining the layout to conform to a limited set of solutions (see Figure 6).

V. RESULTS

Run length distributions for the first experiment (both GA and LS) are shown in Figure 7. We observe that GA is much faster at finding good quality solutions than LS.

Fig. 7. LS versus GA – as good as Bell layout

Table 2 gives the statistical summary. We can see that, over 100 runs, both the mean and the variability of the results are significantly smaller (99% confidence) for the GA, which considerably outperforms LS in this experiment. Both, however, had a 100% success rate in finding solutions as good as the Bell layout.

TABLE 2
GA versus LS – as good as Bell layout

Algorithm            GA            LS
Mean # Evaluations   4224.690      200588.430
STD                  1786.694      325956.007
P(T<=t)              2.91039E-08

Results for the second experiment for 5000 and 10000 generations when n = 10 phone numbers are dialed are shown in Table 3, which shows clearly that the GA outperforms LS.

Moreover, the results also indicate that reducing the run length from 10000 to 5000 generations makes little difference to the GA’s capacity to find better layouts across all of its runs, which is not the case for LS.

TABLE 3
GA versus LS task timings – 10 numbers

Generations            5000                     10000
Algorithm              GA          LS           GA          LS
Mean Best Timing (s)   148.67052   150.93261    148.59699   150.43482
STD (s)                0.293442    0.832239     0.210154    0.703498
P(T<=t)                3.436E-55                8.324E-49

We observe similar results for the second experiment with n = 100 numbers dialed. These are presented in Table 4.

TABLE 4
GA versus LS task timing – 100 numbers

Generations            5000                     10000
Algorithm              GA          LS           GA          LS
Mean Best Timing (s)   1465.2946   1481.2325    1464.8686   1477.3188
STD (s)                0.125253    7.351423     0.098938    5.58621
P(T<=t)                4.957E-40                5.081E-41

Overall, the results show that both LS and GA retain a similar trend when the number of phone numbers dialed, i.e. the number of constraints, is scaled up. An increase in the number of generations does not seem to result in major changes in performance for GA, but it does for LS. Two of the best keypad layouts we discovered using a series of 10 numbers are shown in Figure 8.

Fig. 8. Optimized telephone keypad layouts

The solution shown in Figure 8(a) is notable for the similarity of its shape to the standard solution developed in the classic Bell experiments mentioned in the introduction. The ordering of the numbers reflects the set of numbers used. In particular, three of the ten numbers contain the Aberdeen area code (01224), which can be efficiently dialed using the central column of the layout.

The solution shown in Figure 8(b) is the best found, with a task timing of 147.769 seconds. The area code 01224 and the other area codes contained within the numbers, such as 01358 and 01343, can also be efficiently dialed using this layout.

In order to validate our best solutions, we created a project within CogTool (see Section IIC) for the layouts shown in Figure 8 and compared them to the Bell layout displayed in Figure 6 using the same 10 numbers that were used in our experiments.

As mentioned in Section IIC, CogTool models a skilled user and so task timings are faster for all three layouts than measured in our simulations. Table 5 shows the time taken in CogTool to dial each telephone number for each layout and the total time for the entire task. This provides independent verification that the optimized layouts shown in Figure 8 give better task timings.

TABLE 5 CogTool Task Timing (seconds)

Table 6 shows the mean time to dial a single number in CogTool for each layout, taken over the set of 10 numbers. A t-test showed a significant improvement in timings for the optimized layouts with 99% confidence. These findings are in line with Bell’s results: the differences are small (as often observed with human data) but statistically significant, and they directly validate our own results.

TABLE 6
CogTool validation results

                           Bell       Fig.8(a)   Fig.8(b)
Mean Time (s)              14.832     14.6369    14.5986
STD (s)                    0.312991   0.244809   0.289399
P(T<=t) Bell vs Fig.8(a)   3.557E-05
P(T<=t) Bell vs Fig.8(b)   1.866E-06
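The comparison behind Table 6 can be re-derived from the per-number CogTool timings reported for Table 5. The Python sketch below computes a paired t statistic using only the standard library. The text says only that a t-test was used, so the choice of a paired test (the same ten numbers are dialed on each layout) is an assumption, and the function name is hypothetical.

```python
import math
import statistics

# Per-number CogTool timings in seconds, copied from Table 5.
bell = [14.367, 15.235, 15.116, 14.599, 14.749,
        15.075, 14.906, 15.157, 14.701, 14.415]
fig8a = [14.290, 14.901, 14.934, 14.501, 14.574,
         14.841, 14.689, 14.863, 14.486, 14.290]
fig8b = [14.219, 15.030, 14.875, 14.435, 14.422,
         14.752, 14.591, 14.952, 14.460, 14.250]


def paired_t(a, b):
    """Paired t statistic for two equal-length samples (n - 1 degrees of freedom)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("samples must have the same length, at least 2")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample standard deviation
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


t_a = paired_t(bell, fig8a)   # Bell versus Figure 8(a)
t_b = paired_t(bell, fig8b)   # Bell versus Figure 8(b)
```

A positive statistic means the Bell layout is slower on average; converting it to a p-value requires the t distribution with 9 degrees of freedom, which a statistics package can supply.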

Fig. 9. GA versus LS timing comparison based on 100 runs using 5000 and 10000 generations per run

VI. CONCLUSION

The overall aim of this research is to explore the use of evolutionary algorithms for general interface design using cognitive modeling as a surrogate for human testing. To this end, we have developed a general testing framework (TOISE) allowing the specification of user tasks and interface layout components and constraints. The framework uses the cognitive modeling architecture ACT-R to evaluate interface designs and a GA as a wrapper for evolutionary design.

Our objective in this paper was to provide a baseline evaluation of the evo-cognitive design approach using a well-studied interface design problem from the HCI community. The telephone keypad design problem was selected and optimal layouts were sought for the user tasks of dialing a sequence of 10 and 100 telephone numbers. A local search algorithm and a genetic algorithm were developed with operators specialized for interface component layout. These algorithms were compared for performance with optimized parameter settings. Both LS and GA are capable of finding keypad layouts equal to or better than the Bell layout. GA significantly outperforms LS, obtaining on average a higher quality of solution in fewer evaluations and with less variability in run time. The performance of LS deteriorates as task complexity increases, suggesting that LS will not scale well to interface design problems involving higher task complexity. This is shown in Figure 9, where 2 sets of experiments were run, i.e. 5000 and 10000 generations.

It is very encouraging that some of the runs recovered designs that very closely resemble the Bell telephone keypad. This indicates that evo-cognitive design is capable of reproducing some of the results of an expensive real-world human user trial. However, telephone keypad design is a relatively simple problem involving only one type of action, and the practical application of the “novel” designs presented here is limited.

Our results suggest that evo-cognitive design may offer a competitive and cost-effective alternative to interface design approaches driven by human evaluation.

Further research is ongoing using more complex interfaces and user tasks and will require human evaluation of interface designs produced through TOISE to fully assess the utility of this approach.

VII. DISCUSSION

The main purpose of this paper was to discover whether our EA approach was capable of reproducing the results achieved by Bell Laboratories in optimizing the telephone keypad layout. Although this layout is now ubiquitous, and so seems obvious, its discovery was a non-trivial human design exercise. The Bell experiments evaluated a total of 16 different designs, used almost 200 test participants and took several man-years of effort to discover the optimum layout. Our approach was able to evaluate millions of designs per run in a matter of hours and reliably produced competitive or better designs. In particular, the original Bell design was recovered on some runs, although with a labeling optimized to the numbers used. Moreover, some runs of our experiments yielded the significantly better layout shown in Figure 8(b), and this EA-generated solution possibly deserves to be evaluated in a human trial.

Small differences in keying times separate the best designs found. These can be experientially significant in terms of user cognitive load over a long period of time. Indeed, such small differences between designs were also observed in the original Bell experiments and held to be important.

Another experimental goal was to establish whether an EA would consistently deliver better solutions than a simpler local search. Figure 9 shows that the GA delivers better timing results using fewer generations compared to local search. In addition, the local search algorithm was inconsistent in finding good solutions and was prone to being trapped in local minima.

TABLE 5
CogTool Task Timing (seconds)

Number              Bell      Fig.8(a)   Fig.8(b)
1                   14.367    14.290     14.219
2                   15.235    14.901     15.030
3                   15.116    14.934     14.875
4                   14.599    14.501     14.435
5                   14.749    14.574     14.422
6                   15.075    14.841     14.752
7                   14.906    14.689     14.591
8                   15.157    14.863     14.952
9                   14.701    14.486     14.460
10                  14.415    14.290     14.250
Total Task Timing   148.32    146.369    145.986

This initial work has demonstrated that evo-cognitive design using the TOISE framework is a promising approach to Human Interface Design. TOISE offers a platform that can facilitate experiments on a wide range of complex GUI designs involving the full range of user interactions. Future work will explore a wider range of interface design problems and task complexity.

VIII. REFERENCES

[1] Fleetwood, M., Lebiere, C., Archer, R., Mui, R., Gosakan, M., (2007) Putting the Brain in the Box For Human-System Interface Evaluation, Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Human Performance Modeling, pp. 1165-1169

[2] Ritter, F., Young, R., (2001) Embodied models as simulated users: Introduction to this special issue on using cognitive models to improve interface design. In International Journal of Human-Computer Studies

[3] Deininger, R., (1960) Human Factors Engineering Studies of the Design and Use of Pushbutton Telephone Sets, Bell System Technical Journal, Volume 39, Number 4, Jul 1960

[4] Jeffrey Nichols. "Automatically Generating High-Quality User Interfaces for Appliances," Thesis Proposal, April 14, 2004.

[5] St. Amant, R., T. E. Horton, and F. E. Ritter (2004). Model-based evaluation of cell phone menu interaction. In CHI '04: Proceedings of the SIGCHI conference on Human factors in computing systems, New York, NY, USA, pp. 343-350. ACM.

[6] Mathieu Raynal and Nadine Vigouroux. Genetic algorithm to generate optimized soft keyboard. In CHI '05: CHI '05 extended abstracts on Human factors in computing systems, pages 1729-1732, New York, NY, USA, 2005. ACM Press.

[7] Measuring the Information Society - The ICT Development Index (2009). International Telecommunication Union. Retrieved January 27, 2010, from the World Wide Web: http://www.itu.int/ITU-D/ict/publications/idi/2009/material/IDI2009_w5.pdf

[8] Dario D. Salvucci, Frank J. Lee, Simple cognitive modeling in a complex cognitive architecture, Proceedings of the SIGCHI conference on Human factors in computing systems, April 05-10, 2003, Ft. Lauderdale, Florida, USA

[9] GAlib: A C++ Library of Genetic Algorithm Components, http://lancet.mit.edu/ga

[10] Anderson, J. R. & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Erlbaum.

[11] John E. Laird, Allen Newell, Paul S. Rosenbloom, SOAR: an architecture for general intelligence, Artificial Intelligence, v.33 n.1, p.1-64, Sept. 1987

[12] Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

[13] Gluck, K. A., Ball, J. T., Krusmark, M. A., Rodgers, S. M., & Purtee, M. D. (2003). A computational process model of basic aircraft maneuvering. In F. Detje, D. Doerner, & H. Schaub (Eds.), Proceedings of the Fifth International Conference on Cognitive Modeling (pp. 117-122). Bamberg, Germany: Universitäts-Verlag Bamberg.

[14] John, B. E., Salvucci, D. D., Centgraf, P., and Prevas, K. (2004). Integrating models and tools in the context of driving and in-vehicle devices. In Proceedings of the 6th International Conference on Cognitive Modeling. Lawrence Erlbaum Associates. Mahwah, NJ, 130-135.

[15] Ritter, F. E., Kukreja, U., & St. Amant, R. (2007). Including a model of visual processing with a cognitive architecture to model a simple teleoperation task. Journal of Cognitive Engineering and Decision Making, 1(2), 121-147.

[16] D. W. Hainsworth, Teleoperation User Interfaces for Mining Robotics, Autonomous Robots, v.11 n.1, p.19-28, July 2001

[17] Byrne, M. D., Wood, S. D., Sukaviriya, P., Foley, J. D., & Kieras, D. E. (1994). Automating interface evaluation. In Proceedings of the CHI '94 Conference on Human Factors in Computer Systems. 232-237. New York, NY: ACM.

[18] Allan Christian Long, Jr., James A. Landay, Lawrence A. Rowe, Implications for a gesture design tool, Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit, p40-47, May 15-20, 1999, Pittsburgh, Pennsylvania, United States

[19] Dario D. Salvucci, John R. Anderson, Intelligent gaze-added interfaces, Proceedings of the SIGCHI conference on Human factors in computing systems, p.273-280, April 01-06, 2000, The Hague, The Netherlands

[20] Stuart Card, Thomas P. Moran and Allen Newell. The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, 1983.

[21] John, B. E. and Kieras, D. E. 1994 The GOMS Family of Analysis Techniques: Tools for Design and Evaluation. Technical Report. UMI Order Number: CS-94-181., Carnegie Mellon University.

[22] St. Amant, R. and F. E. Ritter (2004). Automated GOMS-to-ACT-R model generation. In Proceedings of the Sixth International Conference on Cognitive Modeling (ICCM). LEA, Mahwah, NJ, pp. 28-34.

[23] John, B. E. and D. D. Salvucci (2005). Multipurpose prototypes for assessing user interfaces in pervasive computing systems. Pervasive Computing, IEEE 4 (4), 27-34.

[24] Toshiyuki Masui. Graphic object layout with interactive genetic algorithms. In Proceedings of the 1992 IEEE Workshop on Visual Languages, pp. 74-80. IEEE Computer Society Press, September 1992.

[25] Terushige Honiden (2004), Tree Structure Modeling and Genetic Algorithm-based Approach to Unequal-area Facility Layout Problem, International Journal of Industrial Engineering & Management System, Vol. 3, No. 2, pp. 123-128.

[26] A. Oliver, N. Monmarché and G. Venturini (2002). Interactive design of web sites with a genetic algorithm. Proceedings of the IADIS International Conference WWW/Internet, pages 355-362, Lisbon, Portugal, November 13-15

[27] Petrovic, P. Solving LEGO brick layout problem using Evolutionary Algorithms. Norsk Informatikkonferanse NIK'2001.

[28] Jagielsky, R., Gero, J.R. A Genetic Programming Approach to the space Layout Planning Problem, CAAD Futures, Vol. 875 (1997), pp.875-884

[29] Balakrishnan and Cheng, Genetic search and the dynamic layout problem: An improved algorithm. Computers and Operations Research. V 27 i6. 587-593. 2000.

[30] Klanac, Alan; Jelovica, Jasmin; Niemeläinen, Matias; Domagallo, Stanislaw; Remes, Heikki; Romanoff, Jani, Structural Omni-Optimization of a Tanker. 7th International Conference on Computer and IT Applications in the Maritime Industries (COMPIT`08), Liège, 21-23 April, 2008. 537-550.

[31] Michalewicz, Z.,"Genetic Algorithms + Data Structures = Evolution Programs", 3rd edition, Springer-Verlag, Berlin, 1996