
University of Wollongong Thesis Collection

Towards holistic human-computer interaction evaluation research and practice: development and validation of the distributed usability evaluation method

Lejla Vrazalic, University of Wollongong

Vrazalic, Lejla, Towards holistic human-computer interaction evaluation research and practice: development and validation of the distributed usability evaluation method, PhD thesis, School of Economics and Information Systems, University of Wollongong, 2004. http://ro.uow.edu.au/theses/534

This paper is posted at Research Online.

http://ro.uow.edu.au/theses/534

Towards Holistic Human-Computer Interaction Evaluation

Research and Practice:

Development and Validation of the

Distributed Usability Evaluation Method

A thesis submitted in fulfilment of the

requirements for the award of the degree

DOCTOR OF PHILOSOPHY

from

THE UNIVERSITY OF WOLLONGONG

by

LEJLA VRAZALIC

BCom (Honours Class I)

School of Economics and Information Systems

2004


Acknowledgements

Najveće je slovo što se samo sluti, Najdublje je ono što u nama ćuti.

Greatest is the thought that is just a hint, Deepest is that which lies silently within us.

Mehmedalija Mak Dizdar, Bosnian Poet (1917-1971)

Although it may have felt like the loneliest journey of all at times, I am indebted to the following people for their selfless support, encouragement and wisdom along the way:

My mentor, colleague and friend, Associate Professor Peter Hyland for his guidance, his criticisms and the knowledge he imparted over many, many coffees at Picasso. He has taught me a lot about research.

My colleagues in Information Systems, the Decision Systems Laboratory and the Faculty of Commerce for their interest and assurance that the journey does have an end at times when it seemed like no end was in sight.

Associate Professor Helen Hasan, Dr Edward Gould and Dr Irina Verenikina for introducing me to Activity Theory, a powerful framework that appears to be deceptively simple at first glance. Five years later, it still hasn’t ceased to amaze me.

My friends in Australia and in Dubai for motivating me by asking: “Are you done yet?” at every step of the way.

My parents, Veda and Munib Vražalić, my brother, Mahir Vražalić, and my grandparents, Nafija and Osman Alić. Even though they were thousands of miles away, I felt their love and support with me at all times.

My partner, Diniz da Rocha, for being there from the beginning to the end, and for being himself at the times that I needed it the most.

Nije nauka zec. Neće pobjeći. Research is not a rabbit. It won’t run away.

Anonymous, Bosnian Adage


Abstract

The traditional notion of usability localises usability at the system interface and does not take into account the context in which people use computer systems to do their everyday work. Recently, arguments have been made by a number of researchers that this notion of usability is outdated and inadequate because it fails to address the use-situation. Proposals have been put forward to extend our current thinking about usability to include the usefulness of systems. The usefulness of systems is manifested in the use-situation because usefulness cannot be understood outside the context in which the system is employed to perform real-life activities. To extend the traditional notion of usability to include the usefulness of systems, Spinuzzi (1999) introduced the notion of distributed usability, which views usability as a property of humans’ interaction with a system, rather than a property of the system itself. Spinuzzi (ibid) argues that, instead of being localised at the system interface, usability is distributed across an entire activity that a human engages in using the system. This view of usability has significant implications for our current usability evaluation methods (UEMs), which are focused primarily on assessing the traditional usability of systems. As a result of this focus, the UEMs suffer from a number of problems and limitations, raising questions about their validity and reliability. This thesis aims to develop and validate a UEM based on distributed usability. The UEM has been named the Distributed Usability Evaluation Method (DUEM). It consists of four phases and is focused on assessing the distributed usability or usefulness of computer systems. Distributed usability is operationalised through the principles of Cultural Historical Activity Theory. Activity Theory is a powerful clarifying tool (Nardi, 1996b) for understanding and explaining human activities in context and, as a result, a suitable underlying framework for a UEM that aims to assess systems in this context. The validation of DUEM indicates that it overcomes most of the problems associated with current UEMs; however, in the process of doing so, it suffers from its own set of limitations.


Thesis Certification

CERTIFICATION

I, Lejla Vrazalic, declare that this thesis, submitted in fulfilment of the requirements for the award of Doctor of Philosophy, in the School of Economics and Information Systems, University of Wollongong, is wholly my own work unless otherwise referenced or acknowledged. The document has not been submitted for qualifications at any other academic institution.

Lejla Vrazalic

25 June, 2004


Table of Contents

Acknowledgements....................................................................................... i

Abstract ........................................................................................................ ii

Thesis Certification ....................................................................................iii

List of Tables .............................................................................................. ix

List of Figures.............................................................................................. x

List of Publications................................................................................... xiv

Chapter 1 - Introduction ............................................................................ 1

1.1 Introduction .......... 1
1.2 Background .......... 2
1.3 Research Goals .......... 6
1.4 Research Methodology .......... 7
1.5 Significance of the Thesis .......... 9
1.6 Structure of the Thesis .......... 9
1.7 Definitions of Key Terms .......... 11

Chapter 2 - Literature Review................................................................. 12

2.1 Introduction .......... 12
2.2 Traditional Usability .......... 14
2.3 Distributed Usability .......... 18
2.4 Designing for Usability .......... 20
2.5 Evaluation .......... 22
2.6 Usability Evaluation Methods .......... 25
2.7 Taxonomies of Usability Evaluation Methods .......... 26

   2.7.1 Whitefield et al’s Framework for Evaluation .......... 27
   2.7.2 Fitzpatrick’s Strategies for Usability Evaluation .......... 28
   2.7.3 Howard and Murray’s Classes .......... 30
   2.7.4 Wixon and Wilson’s Conceptual Dimensions .......... 31
   2.7.5 Dix et al’s Classification .......... 32
   2.7.6 Preece et al’s Evaluation Paradigms .......... 33
   2.7.7 UEM Taxonomies: A Commentary .......... 33

2.8 An Overview of Different Usability Evaluation Methods .......... 35
   2.8.1 Expert-Based UEMs .......... 36
      2.8.1.1 Resource-Constrained Methods .......... 37
      2.8.1.2 Usability-Expert Reviews .......... 40
      2.8.1.3 Cognitive Walkthroughs .......... 43
      2.8.1.4 Heuristic Walkthroughs .......... 47
      2.8.1.5 Group Design Reviews .......... 48
   2.8.2 Model-Based UEMs .......... 51
      2.8.2.1 The Keystroke Level Model .......... 52
      2.8.2.2 Model Mismatch Analysis (MMA) Method .......... 54
   2.8.3 User-Based UEMs .......... 55
      2.8.3.1 Formal Experiments .......... 56
      2.8.3.2 Field Studies .......... 57
      2.8.3.3 Usability (User) Testing .......... 58

2.9 The Problem with Usability Problems .......... 76
   2.9.1 Defining Usability Problems .......... 78
   2.9.2 Validating Usability Problems .......... 79
   2.9.3 Rating Usability Problems .......... 80

2.10 UEM Challenges .......... 83
   2.10.1 UEM Challenge 1 [UEMC-1]: UEMs are system focused and technology driven .......... 83
   2.10.2 UEM Challenge 2 [UEMC-2]: Lack of understanding how users’ goals and motives are formed .......... 84
   2.10.3 UEM Challenge 3 [UEMC-3]: Lack of user involvement in the design of the evaluation and analysis of the evaluation results .......... 85
   2.10.4 UEM Challenge 4 [UEMC-4]: Limited understanding of users’ knowledge .......... 86
   2.10.5 UEM Challenge 5 [UEMC-5]: Lack of means for including contextual factors in an evaluation .......... 87
   2.10.6 UEM Challenge 6 [UEMC-6]: Lack of understanding how the system and system use co-evolve over time .......... 88
   2.10.7 UEM Challenge 7 [UEMC-7]: No common vocabulary for describing evaluation processes and defining evaluation outcomes .......... 89
   2.10.8 UEM Challenge 8 [UEMC-8]: Lack of a theoretical framework .......... 89
   2.10.9 Summary of UEM Challenges .......... 90

2.11 Information Processing .......... 91
2.12 Cultural Historical Activity Theory .......... 94
   2.12.1 Activity Theory – A Historical Perspective .......... 95
   2.12.2 Basic Principles of Activity Theory .......... 105
   2.12.3 Activity Theory in Human-Computer Interaction .......... 114
      2.12.3.1 First Steps “Through the Interface” .......... 115
      2.12.3.2 Formalizing Activity Theory in HCI: “Context and Consciousness” .......... 117
      2.12.3.3 Towards an Activity Theory Based HCI Method: The Activity Checklist .......... 119
      2.12.3.4 Latest Activity Theory Developments in HCI: Crystallization and the AODM .......... 120
   2.12.4 Benefits of Activity Theory .......... 123
   2.12.5 Activity Theory Limitations .......... 124
   2.12.6 Activity Theory Principles Used in this Thesis .......... 125
2.13 Conclusion .......... 129

Chapter 3 - Research Methodology....................................................... 131

3.1 Introduction .......... 131
3.2 Research Goals .......... 132
3.3 HCI and Information Systems .......... 132
3.4 March and Smith’s Research Framework .......... 134
   3.4.1 Theoretical Background .......... 135
   3.4.2 Design Science Artifacts .......... 136
   3.4.3 Two Dimensional Research Framework .......... 139
3.5 Research Framework Justification .......... 142
3.6 Research Methodology .......... 145
   3.6.1 Stage I: Building the Distributed Usability Evaluation Method .......... 146
   3.6.2 Stage II: Validating the Distributed Usability Evaluation Method .......... 151
      3.6.2.1 Comparing Usability Evaluation Methods .......... 152
      3.6.2.2 Validation of the Distributed Usability Evaluation Method .......... 158
3.7 Conclusion .......... 162

Chapter 4 - Stage I: Method Building................................................... 166

4.1 Introduction .......... 166
4.2 Traditional Usability Testing .......... 167
   4.2.1 Traditional Usability Testing Method Components .......... 168
   4.2.2 Project Background .......... 175
   4.2.3 Usability Testing of Proposed New Design .......... 178
      4.2.3.1 Method Component 1: “Define Usability Testing Plan” .......... 178
      4.2.3.2 Method Component 2: “Select and Recruit Representative Users” .......... 188
      4.2.3.3 Method Component 3: “Prepare Test Materials” .......... 191
      4.2.3.4 Method Component 4: “Usability Test” .......... 192
      4.2.3.5 Method Component 5: “Analysis and Reporting of Results” .......... 193
   4.2.4 Empirical Limitations of Traditional Usability Testing .......... 196
      4.2.4.1 User-Related Limitations .......... 196
      4.2.4.2 Process-Related Limitations .......... 202
      4.2.4.3 Facilitator-Related Limitations .......... 214
      4.2.4.4 Summary of Traditional Usability Testing Limitations .......... 216
4.3 Distributed Usability Evaluation Method (DUEM) .......... 218
   4.3.1 DUEM - An Overview .......... 218
   4.3.2 Phase 1 – Selecting and Recruiting Users .......... 220
   4.3.3 Phase 2 – Understanding Users’ Activities .......... 225
   4.3.4 Phase 3 – Evaluating the System in relation to Users’ Activities .......... 240
      4.3.4.1 Sub-phase 3.1 – Defining the Evaluation Plan .......... 240
      4.3.4.2 Sub-phase 3.2 – Preparing the Evaluation Resources and Materials .......... 245
      4.3.4.3 Sub-phase 3.3 – Testing the System .......... 248
   4.3.5 Phase 4 – Analysing and Interpreting Results .......... 249
   4.3.6 Summary of Stage I of Research Methodology .......... 266
4.4 DUEM Vs. Traditional Usability Testing .......... 267
4.5 Potential Benefits and Limitations of DUEM .......... 272
   4.5.1 Potential Benefits of DUEM .......... 272
   4.5.2 Potential Limitations of DUEM .......... 275
4.6 Conclusion .......... 276

Chapter 5 - Stage II: Method Validation.............................................. 279

5.1 Introduction .......... 279
5.2 The Application of DUEM .......... 282
   5.2.1 DUEM Phase 1: Selecting and Recruiting Users .......... 285
   5.2.2 DUEM Phase 2: Understanding Users’ Activities .......... 290
      5.2.2.1 Activity 1: “Enrolling in subjects/courses” .......... 295
      5.2.2.2 Activity 2: “Taking subjects/Doing research” .......... 299
      5.2.2.3 Activity 3: “Doing assignments/Studying for and taking tests and exams/Writing a thesis” .......... 303
      5.2.2.4 Activity 4: “Socialising” .......... 309
      5.2.2.5 Summary of DUEM Phase 2: Understanding User Activities .......... 311
   5.2.3 DUEM Phase 3: Evaluating the System in relation to Users’ Activities .......... 312
      5.2.3.1 Sub-Phase 3.1: Defining the Evaluation Plan .......... 312
      5.2.3.2 Sub-Phase 3.2: Preparing the Evaluation Resources and Materials .......... 314
      5.2.3.3 Sub-Phase 3.3: Testing the System .......... 315
   5.2.4 DUEM Phase 4: Analysing and Interpreting Results .......... 316
      5.2.4.1 Breakdowns in Activity 1: “Enrolling in subjects/courses” .......... 317
      5.2.4.2 Breakdowns in Activity 2: “Taking subjects/Doing research” .......... 322
      5.2.4.3 Breakdowns in Activity 3: “Doing assignments/Studying for and taking tests and exams/Writing a Thesis” .......... 325
      5.2.4.4 Breakdowns in Activity 4: “Socialising” .......... 329
      5.2.4.5 Summary of DUEM Phase 4: Analysing and Interpreting Results .......... 332
5.3 DUEM Claims .......... 333
   5.3.1 [DUEM-1]: DUEM is user focused and user activity driven .......... 333
   5.3.2 [DUEM-2]: DUEM provides a means of understanding users’ goals and motives .......... 334
   5.3.3 [DUEM-3]: DUEM involves users directly in the design of the evaluation and in the analysis of the evaluation results .......... 336
   5.3.4 [DUEM-4]: DUEM provides an understanding of users’ knowledge .......... 338
   5.3.5 [DUEM-5]: DUEM provides a framework for including contextual factors in an evaluation .......... 339
   5.3.6 [DUEM-6]: DUEM provides a means of understanding how the system and system use co-evolve over time .......... 341
   5.3.7 [DUEM-7]: DUEM offers a common vocabulary for describing evaluation processes and defining evaluation outcomes .......... 343
   5.3.8 [DUEM-8]: DUEM is a theory informed method .......... 344
5.4 Actual Benefits of DUEM .......... 346
5.5 Actual Limitations of DUEM .......... 348
5.6 Conclusion .......... 349

Chapter 6 - Conclusion ........................................................................... 351

6.1 Introduction .......... 351
6.2 Summary of Key Issues .......... 351
6.3 Limitations of Research Study .......... 354
6.4 Personal Reflection .......... 355
6.5 Future Research Directions .......... 357
6.6 Conclusion .......... 358

References ................................................................................................ 360

Appendix A - Task Scenarios................................................................. 375

Appendix B - Pre-Test Questionnaire .......... 378
Appendix C - Post-Test Questionnaire .......... 382
Appendix D - Recruitment Poster – Traditional Usability Testing .......... 386
Appendix E - Information Sheet .......... 387
Appendix F - Consent Form .......... 390

Appendix G - Profile of Participants - Usability Testing .................... 391

Appendix H - Quantitative Data............................................................ 396

Appendix I - Post-Test Questionnaire Data Summary .......... 400
Appendix J - Usability Problems – Traditional Usability Testing .......... 403
Appendix K - DUEM Recruitment Poster .......... 405
Appendix L - Distributed Usability Problems .......... 406


List of Tables

Table No Title Page No

Table 1.1 List of key terms and their definitions (in alphabetical order) 11

Table 2.1 Conceptual mapping of Activity Theory principles to UEM Challenges 128

Table 3.1 Metrics or criteria used to evaluate different artifacts (March & Smith, 1995) 140

Table 3.2 Summary of comparative UEM studies 153

Table 3.3 Problems by validity types across comparative UEM studies (Gray and Salzman, 1998) 156

Table 3.4 Eight UEM Challenges (from Section 2.10) 159

Table 3.5 Eight claims about DUEM 160

Table 3.6 Eight claims about DUEM and corresponding questions for validating DUEM 161

Table 4.1 Traditional usability testing method components 171

Table 4.2 Quantitative and qualitative data collected 188

Table 4.3 Generic profile of current student population 189

Table 4.4 Profile of usability testing participants 193

Table 4.5 Summary of quantitative data collected for each task 194

Table 4.6 Distributed Usability Evaluation (DUEM) method components 268

Table 4.7 Key differences between traditional usability testing and DUEM 272

Table 5.1 Summary of student participants and their backgrounds 289

Table 5.2 Adapted questions for Phase 2 used to collect data about users’ activities 291


List of Figures

Figure No Title Page No

Figure 1.1 The relationship between usability, utility and functionality 3

Figure 1.2 March and Smith’s (1995) research framework 8

Figure 2.1 Structure of Chapter 2 13

Figure 2.2 A model of the attributes of system acceptability (Nielsen, 1993) 16

Figure 2.3 The Star lifecycle (Hartson & Hix, 1989; adapted from Preece et al, 2002) 22

Figure 2.4 A model of three evaluation dimensions 25

Figure 2.5 Framework for evaluation (Whitefield et al, 1991) 28

Figure 2.6 Strategies for usability evaluation and associated UEMs (Fitzpatrick, 1999) 29

Figure 2.7 Classification of usability evaluation techniques (Dix et al, 1998) 32

Figure 2.8 Core evaluation paradigms (Preece et al, 2002) 33

Figure 2.9 UEM categories and UEMs to be discussed 35

Figure 2.10 Revised usability heuristics (Nielsen, 1994a) 38

Figure 2.11 HOMERUN heuristics for commercial web sites (Nielsen, 2000) 40

Figure 2.12 Generic user profile (adapted from Rubin, 1994) 61

Figure 2.13 A typical usability laboratory (Rubin, 1994, p.56) 64

Figure 2.14 A sample task scenario for setting up a printer (Rubin, 1994) 66

Figure 2.15 Sample performance measures used in usability testing 67

Figure 2.16 Triangulating usability test data from different sources (adapted from Dumas & Redish, 1993) 70

Figure 2.17 Usability problem detection matrix (adapted from Gray & Salzman, 1998) 80

Figure 2.18 A five-point rating scale for the severity of usability problems (Nielsen, 1994a, p.49) 81

Figure 2.19 Estimating the severity of usability problems using a combination of orthogonal scales (Nielsen, 1993) 81


Figure 2.20 The information processing model (Card et al, 1983) 92

Figure 2.21 Changes in HCI direction (Bannon, 1991) 93

Figure 2.22 Unmediated vs. mediated behaviour (Vygotsky, 1978, pp. 39-40) 96

Figure 2.23 Current conceptualisation of tool mediation in Activity Theory literature 97

Figure 2.24 Hierarchical structure of an activity (Leont’ev, 1978) 98

Figure 2.25 The activity hierarchy – An example 101

Figure 2.26 The activity system (Engeström, 1987) 102

Figure 2.27 Four levels of contradictions in a network of activity systems (Engeström, 1999) 105

Figure 2.28 The computer tool as an extension of the internal plane of actions (Kaptelinin, 1996, p. 52) 112

Figure 2.29 A checklist for HCI analysis through focus shifts and breakdowns (Bødker, 1996, pp. 168-169) 117

Figure 2.30 Activity Notation (Mwanza, 2001, p. 345) 122

Figure 3.1 Relationship of HCI to other computer-related disciplines (Preece et al, 2002) 133

Figure 3.2 Relationships between constructs, models, methods and instantiations (based on March & Smith, 1995) 138

Figure 3.3 Two-dimensional research framework (March & Smith, 1995) 139

Figure 3.4 A multimethodological framework for Information Systems research (Nunamaker et al, 1991, p. 94) 144

Figure 3.5 Situating the thesis in March and Smith’s research framework 145

Figure 3.6 Structure of a method component (Goldkuhl et al, 1998) 148

Figure 3.7 Method Theory (Goldkuhl et al, 1998) 149

Figure 3.8 Research Methodology 164

Figure 4.1 Sequence of five method components that make up traditional usability testing 169

Figure 4.2 Initial analysis of traditional usability testing 174

Figure 4.3 The University’s World Wide Web site prior to re-development 176

Figure 4.4 The high-fidelity prototype used in the evaluation 176


Figure 4.5 The Current Students pages of the proposed new design 177

Figure 4.6 Categories of current students at the University 180

Figure 4.7 Actual categories of current student applied in the usability testing 181

Figure 4.8 Sample tasks developed for usability testing 183

Figure 4.9 Web page with information about transport options 184

Figure 4.10 Layout of the usability laboratory 185

Figure 4.11 Equipment configuration in the usability laboratory 186

Figure 4.12 Library web page used by participants to find referencing information 200

Figure 4.13 UniLearning web page containing referencing information 201

Figure 4.14 Search results using the term: “Special Consideration” 207

Figure 4.15 Search results using the term: “supplementary test” 207

Figure 4.16 Users in a DUEM evaluation (based on Engeström, 1987) 222

Figure 4.17 Hierarchical structure of an activity (Leont’ev, 1978) 232

Figure 4.18 Breakdown in report producing activity 250

Figure 4.19 Contradictions directly related to the system (i.e. tool) 256

Figure 4.20 The French students’ activity 257

Figure 4.21 The Scandinavian students’ activity 258

Figure 4.22 Contradictions in the Scandinavian students’ activity 259

Figure 4.23 Contradictions in the investigation activity 261

Figure 4.24 Distributed Usability Evaluation Method (DUEM) 264

Figure 4.25 Analysis of DUEM (based on Goldkuhl et al, 1998) 271

Figure 4.26 System view differences between traditional usability testing and DUEM 274

Figure 5.1 Top level and lower level pages on the University’s web site 284

Figure 5.2 A current student’s central activity 286

Figure 5.3 Hierarchical structure of Activity 1: Enrolling in subjects/courses 297

Figure 5.4 Enrolling in subjects/courses activity 298


Figure 5.5 Hierarchical structure of Activity 2: Taking subjects/Doing research 301

Figure 5.6 Taking subjects/Doing research activity 302

Figure 5.7 Hierarchical structure of Activity 3: Doing assignments/Studying for and taking tests and exams/Writing a thesis 307

Figure 5.8 Doing assignments/Studying for and taking tests and exams/Writing a thesis activity 308

Figure 5.9 Hierarchical structure of Activity 4: Socialising 310

Figure 5.10 Socialising activity 311

Figure 5.11 Contradictions in Activity 1: Enrolling in subjects/courses 321

Figure 5.12 Contradictions in Activity 2: Taking subjects/Doing research 324

Figure 5.13 Contradictions in Activity 3: Doing assignments/studying for and taking tests and exams/Writing a Thesis 328

Figure 5.14 Events Calendar 330

Figure 5.15 Contradictions in Activity 4: Socialising 331


List of Publications

Note: Publications relevant to this thesis are marked with an asterisk (*)

Edited Books

* Hasan H., Gould, E., Larkin, P. & Vrazalic, L. (2001) Information Systems and Activity Theory: Volume 2 Theory and Practice, University of Wollongong Press.

Refereed Book Chapters

MacGregor, R.C., Vrazalic, L., Bunker, D., Carlsson, S. & Magnusson, M. (2004) A Comparison of factors pertaining to both the adoption and non-adoption of electronic commerce in formally networked and non-networked regional SMEs: A study of Swedish small businesses, to appear in Corbitt, B. & AL-Qirim, N. (Eds) eBusiness, eGovernment & Small and Medium-Sized Enterprises: Opportunities and Challenges, IDEA Group Publishing, pp.206-243.

MacGregor, R.C., Vrazalic, L., Bunker, D., Carlsson, S. & Magnusson, M. (accepted) Barriers to Electronic Commerce Adoption in Small Businesses in Regional Sweden, Encyclopedia of Information Science and Technology, Volume I-III, IDEA Group Publishing, in press.

* Vrazalic, L. (2003) Evaluating Usability in Context: An Activity Theory Based Method. In Hasan, H., Verenikina, I. & Gould, E. (Eds) Activity Theory and Information Systems: Volume 3, Expanding the Horizon, University of Wollongong Press, pp.171-192.

* Hasan, H., Verenikina, I. & Vrazalic, L. (2003) Technology as the Object and the Tool of the Learning Activity. In Hasan H., Gould, E., & Verenikina, I. (Eds) Information Systems and Activity Theory: Volume 3 Expanding the Horizon, University of Wollongong, pp 15-30.

* Vrazalic, L. (2001) Techniques to Analyse Power and Political Issues in IS Development. In Hasan H., Gould, E., Larkin, P. & Vrazalic, L. (Eds) Information Systems and Activity Theory: Volume 2 Theory and Practice, University of Wollongong Press, pp 39-54.

Refereed Journal Publications

MacGregor, R.C. & Vrazalic, L. (2004) The Effects of Strategic Alliance Membership on the Disadvantages of Electronic Commerce Adoption: A Comparative Study of Swedish and Australian Small Businesses, Journal of Global Information Management, in press.

* Vrazalic, L. (2004) Evaluating Distributed Usability: The Role of User Interfaces in an Activity System, Australasian Journal of Information Systems, Special Issue 2003/2004, pp. 26-39.

Vrazalic, L., MacGregor, R.C., Bunker, D., Carlsson, S. & Magnusson, M. (2002) Electronic Commerce and Market Focus: Some Findings from a Study of Swedish Small to Medium Enterprises, Australian Journal of Information Systems, Vol 10 No 1, pp 110-119.

MacGregor, R.C., Vrazalic, L., Carlsson, S., Bunker, D. & Magnusson, M. (2002) The Impact of Business Size and Business Type on Small Business Investment in Electronic Commerce: A Study of Swedish Small Business, Australian Journal of Information Systems, Vol 9, No 2, pp 31-39.


Refereed Conference Publications

* Vrazalic, L. (2004) Distributing Usability: The Implications for Usability Testing, Constructing the Infrastructure for the Knowledge Economy: Information Systems Development Theory and Practice: Proceedings of the 12th International Conference on Information Systems Development Methods & Tools Theory & Practice, Kluwer Publishing, in press.

* Vrazalic, L. (2004) Extending Usability: The Implications for Human Computer Interaction Methods, to appear in Application of Activity Theory to Education, Information Systems and Business: Proceedings of the International Society for Cultural and Activity Research (ISCAR) Regional Conference, July 12-13.

Vrazalic, L., Hyland, P. & Unni, A. (2004) Regional Community Portals in Australia: Analysing the Current State of Play Using the S3 Model, Proceedings of the International Association for Development of the Information Society (IADIS) Web Based Communities Conference, Lisbon, March 24-26.

MacGregor, R., Vrazalic, L., Hasan, H. & Ditsa, G. (2004) Re-Examining IT Investments: Measuring Long-Term Payback in the E-Commerce Era, Proceedings of the Information Resources Management Association (IRMA) Conference.

MacGregor, R., Vrazalic, L., Schafe, L., Bunker, D., Carlsson, S. & Magnusson, M. (2003) The Adoption and Use of EDI and Non-EDI E-Commerce in Regional SMEs: A Comparison of Driving Forces, Benefits and Disadvantages of Adoption, Proceedings of the International Business Information Management Conference IBIM’03, December, Cairo.

Vrazalic, L., Hyland, P., MacGregor, R. & Connery, A. (2003) Regional Community Portals, Proceedings of the 5th International Information Technology in Regional Areas Conference, 15-17 December, Caloundra.

Vrazalic, L., Stern, D., MacGregor, R., Carlsson, S. & Magnusson, M. (2003) Grouping Barriers to E-commerce Adoption in SMEs: Findings from a Swedish Study, Proceedings of the Australian Conference on Information Systems, 26-28 November, Perth.

MacGregor, R., Vrazalic, L., Bunker, D., Carlsson, S. & Magnusson, M. (2003) The Role of Enterprise Wide Business Systems in the Adoption and Use of E-Commerce in SMEs: A Study of Swedish SMEs, Proceedings of the Australian Conference on Information Systems, 26-28 November, Perth.

* Vrazalic, L. (2003) Website Usability in Context: An Activity Theory Based Usability Testing Method, Proceedings of Transformational Tools for 21st Century Minds National Conference, July 27-29, Gold Coast, Knowledge Creation Press, pp 41-47.

Vrazalic, L. & Hyland, P. (2003) WWW Sites vs. Applications Software: The DAIS Model, CHINZ: Proceedings of the Fourth annual ACM SIGCHI conference on Computer-Human Interaction, July 3-4, Dunedin, New Zealand.

* Vrazalic, L. & Hyland, P. (2002) Towards an Activity Scenario Based Methodology for Usability Testing of Websites, Proceedings of HF 2002 Human Factors Conference, November 25-27, Melbourne.

* Vrazalic, L. (2002) Designing for Diversity: Towards an Activity Theory based Methodology, Proceedings of the Sixth World Multiconference on Systemics, Cybernetics and Informatics, July 14-18, Orlando, pp 414-419.

Vrazalic, L., MacGregor, R.C., Bunker, D., Carlsson, S. & Magnusson, M. (2002) Electronic Commerce Investment and Adoption in Small to Medium Enterprises: Findings from a Swedish Study, Proceedings of the 9th European Conference on Information Technology Evaluation, July 15-16, Université Dauphine, Paris, pp 435-441.


Vrazalic, L., MacGregor, R.C., Bunker, D., Carlsson, S. & Magnusson, M. (2002) Electronic Commerce and Market Focus: Some Findings from a Study of Swedish Small to Medium Enterprises, Proceedings of SMEs in a Global Economy: Sustaining SME Innovation, Competitiveness and Development in the Global Economy, July 12-13, University of Wollongong, Australia.

* Vrazalic, L. (2001) Interfaces for Diverse Cultures: An Activity Theory Approach, Proceedings of INTERACT ’01: IFIP TC.13 International Conference on Human-Computer Interaction, July 9-13, Tokyo, pp 660-664.

Vrazalic, L. & Hasan, H. (2001) Activity Theory Usability Laboratory, Proceedings of INTERACT ’01: IFIP TC.13 International Conference on Human Computer-Interaction, July 9-13, Tokyo, pp 881-882.

* Vrazalic, L. & Gould, E. (2001) Towards an Activity-Based Usability Evaluation Methodology, Proceedings of the 24th Information Systems Research Seminar in Scandinavia, Ulvik, Norway.

* Hasan, H. & Vrazalic, L. (2001) Reconciling Communities of Practice: Lessons from User-Centred Design, Proceedings of OZCHI 2001: Usability and Usefulness for Knowledge Economies, November 20-22, Perth, pp 57-62.

Vrazalic, L. & Gould, E. (2000) Power and Politics: The Changing Status of IS Departments in Organisations, Proceedings of the 23rd Information Systems Research Seminar in Scandinavia “Doing IT Together” (Volume 1), Lingatan, Sweden, August 12-15, pp 741-750.

Non-Refereed Conference Publications

* Vrazalic, L. (2002) Integrating Cultural Diversity into the User Interface: An Activity Theory Approach, Dealing with Diversity, Fifth Congress of the International Society for Cultural Research and Activity Theory, 18-22 June, Vrije Universiteit, Amsterdam, pp. 177-178. (Extended Abstract)


Chapter 1

Introduction

1.1 Introduction

This thesis is situated in the field of Human-Computer Interaction (HCI), and the area of usability evaluation methods in particular. It aims to improve and add to our knowledge about usability evaluation methods. Although research has been carried out in this area for decades, Hartson et al (2001) argue that the area is still in its infancy both as a research topic and an applied body of knowledge. Furthermore, it is an area that is currently faced with a number of challenges, as studies, such as those carried out by Molich et al (1998; 1999), raise questions about the reliability and validity of mainstream usability evaluation methods. This thesis aims to make a contribution to overcoming some of these challenges in the form of a theory informed usability evaluation method that will be of use to HCI researchers and practitioners.

This chapter provides an overview of the thesis. The chapter begins by presenting a background to the research problem. This is followed by a description of the research goals and the research methodology that will be used to achieve the goals. A brief discussion of the significance of the thesis ensues. The thesis structure and the contents of each chapter in this thesis are then described in some detail. Finally, definitions of key terms used in the thesis are provided. For the purposes of the following discussion the terms “thesis” and “research study” will be used interchangeably.


1.2 Background

A user interface is the visible part of a computer system that allows the user to access, communicate with and use the system effectively. The design of the user interface is central to the way the underlying system is perceived and used because “to most users, the interface is the system” (Hartson, 1998, p. 104) and even if the system itself is robust, functional and reliable, poor interface design can not only cause users to make errors, but can sometimes have catastrophic and fatal consequences. This became apparent in the aftermath of disasters such as Three Mile Island in 1979 (Norman, 1988), the Challenger space shuttle in 1986 (Tufte, 1997) and the failure of the London Ambulance Service Computer Aided Despatch System in October 1992 (Finkelstein & Dowell, 1996). The cause of all these disasters was found to be poor user interface design. Due to our dependence on computer systems in the information age, the risk of errors and disasters caused by users not being able to operate a poorly designed interface has increased. However, by designing high quality interfaces we can significantly reduce this risk.

According to Sutcliffe (1995, p. 226), there is no single measure of the quality of an interface. This poses difficulties in trying to determine what constitutes a high quality interface. Nevertheless, it is possible to define an interface in terms of its utility and usability. Utility refers to how well an interface helps a user perform a task. Closely tied to the underlying functionality of the system (i.e. what functions the system provides), utility represents the task fit between what a system does and what a user wants to do with it. In essence, utility defines the usefulness of the system because a useful system is one that provides those functions that a user requires to perform tasks and achieve goals. Usability refers to the ease of operating the interface. It represents the effort required by a user to handle a system through the interface (ibid). A usable system has an interface that is easy and simple to use. The utility and usability of an interface are inextricably linked concepts. Usability has a direct effect on the perceived utility of a system. If an interface is poorly designed and difficult to operate, the utility of the system itself will be perceived as being ineffective. From a user’s point of view, therefore, good usability at the interface is linked to effective system utility, and effective utility permits a user to access and use the full range of functions available in a system. This relationship between a system’s usability, utility and functions is depicted in Figure 1.1. In a high quality interface, all three are seamlessly integrated so that the functions of the system match the utility, and good usability allows a user to fully realise the available utility.

Figure 1.1 The relationship between usability, utility and functionality

Although usability and utility (usefulness) are closely related, Human-Computer Interaction (HCI) evaluation methods are primarily aimed at assessing and measuring the usability of systems. These methods are generally known as usability evaluation methods (UEMs) and can be used at any stage of a system design and development process (Hartson et al, 2001). UEMs define usability as a property of the system being evaluated in terms of high-level criteria such as system effectiveness, system learnability and system flexibility (Shackel, 1986) or attributes such as system efficiency and system memorability (Nielsen, 1993). These criteria and attributes imply that usability is localised within the system, or at the system interface. This is the conventional HCI view of usability and has been termed “traditional usability” in this thesis.

Traditional usability is system-focused and, as such, does not take into account the context or use-situation in which users employ a system to perform real-life activities. As a result, traditional usability has been criticised by some researchers (Spinuzzi, 1999; Thomas & Macredie, 2002; Nardi & O’Day, 1999) as being inadequate because it fails to include the “tightly woven interdependencies between [systems] and their contexts of use” (Nardi & O’Day, 1999, p.27). The context of use is critical when evaluating a system. Nardi and O’Day (1999, p. 30) elaborate further by stating:

“Some design problems originate in a larger context – the social, organisational, or political setting in which a [system] is used. We consider this larger context to be a legitimate focus of attention when we evaluate how technology works in a given setting. Evaluation should not be limited to […] issues such as whether menu items are easy to find or recognise, though these fine-grained questions must also be addressed. We would like to move beyond the human-machine dyad, expanding our perspective to include the network of relationships, values, and motivations involved in technology use.”

In the above quote, Nardi and O’Day (1999) imply the need to evaluate not just the usability of systems, but also the usefulness of systems. However, in order to do this, it is necessary to include the context or use-situation in the evaluation process, because the usefulness of a system cannot be understood outside the context in which the system is used. Current UEMs are focused only on the usability of systems because they are based on the traditional notion of usability, and as such, they do not have the means to account for or describe the context in the evaluation process. This is a key limitation of current UEMs.

To overcome the key limitation of current UEMs, it is first necessary to re-visit and re-define the notion of usability on which the UEMs are based. To this end, Spinuzzi (1999) has proposed that the traditional view of usability needs to be extended to include the activity network in which a system is used. Spinuzzi (1999) terms this “distributed usability”. Instead of being a property of the system itself, distributed usability implies that usability is a property of the users’ interaction with the system. This interaction is manifested in the users’ activities, which involve the use of the system, and other tools, to achieve a goal or outcome. Furthermore, since most activities are carried out in a social and collaborative context, the users’ interaction with the system is also affected by this context. Therefore, to re-define the notion of usability, the users’ activities and the context in which they occur must also be taken into account.

In current UEMs, traditional usability is constructed in relation to the users, their tasks, and the system itself. This can be seen in the UEM that is considered to be the de-facto standard for evaluation – laboratory-based usability testing (Landauer, 1995). Laboratory-based usability testing involves observing users complete a set of pre-defined, short-term tasks which have been set by evaluators, using the system being evaluated. Users are not involved in setting the tasks and the evaluation itself is carried out in a laboratory setting, which is removed from the users’ social context. This enables evaluators to focus on specific aspects of the system that need to be assessed, while eliminating any external influences or factors that may interfere with this process.

However, in reality, users employ systems to perform real-life activities which take place in their social context. These activities are distinctly different to the pre-defined, short-term tasks that are used in laboratory-based usability testing. Thomas and Kellogg (1989) refer to the difference between these tasks and users’ real-life activities as “ecological gaps”. If these gaps are to be bridged, it is necessary to re-develop and improve current UEMs so that systems are evaluated in relation to users’ real-life activities. The first step towards this is to base the UEMs on distributed usability, rather than traditional usability.

Distributed usability, in itself, is insufficient to improve current UEMs. A suitable framework for operationalising this notion must be employed. Cultural Historical Activity Theory (or Activity Theory as it is widely known) has emerged as a multi-disciplinary theoretical framework for studying human behaviour and practices as developmental processes (Kuutti, 1996). Activity Theory is a “theory that takes its starting point in human activity as the basic component in purposeful human work” (Bødker, 1991a). The basic tenets or principles of Activity Theory offer a powerful clarifying tool (Nardi, 1996b) for understanding and explaining human activities in context. Amongst these principles are the following:

• human activities as the basic unit of analysis;

• the mediating role of tools in human activities; and

• the social context of human activities.

Activity Theory, therefore, provides a suitable framework for operationalising distributed usability and achieving the research goals of this thesis, which are described next.
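As a minimal illustrative sketch, and not a formalisation belonging to the thesis itself, these principles might be represented in code roughly as follows. All class, field and example names are assumptions; the example instance loosely anticipates the “Enrolling in subjects/courses” activity analysed in Chapter 5.

# Illustrative sketch only (not the thesis' method): a minimal representation of
# an activity as characterised by the Activity Theory principles listed above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Operation:
    """A routine, largely automatic behaviour shaped by the tools and conditions at hand."""
    description: str

@dataclass
class Action:
    """A conscious, goal-directed step within an activity; realised through operations."""
    goal: str
    operations: List[Operation] = field(default_factory=list)

@dataclass
class Activity:
    """The basic unit of analysis: a subject working on an object, mediated by tools,
    within a social context, towards an outcome."""
    subject: str            # who carries out the activity
    object: str             # what the activity is directed at
    outcome: str            # the intended result of the activity
    tools: List[str]        # mediating artefacts, including the system under evaluation
    community: List[str]    # the social context in which the activity takes place
    actions: List[Action] = field(default_factory=list)

# Hypothetical example: a student using the university web site to enrol.
enrolling = Activity(
    subject="current student",
    object="enrolment in subjects/courses",
    outcome="an approved program of study",
    tools=["university web site", "enrolment forms"],
    community=["faculty staff", "other students"],
    actions=[
        Action(
            goal="find enrolment information",
            operations=[Operation("navigate to the Current Students pages")],
        )
    ],
)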

1.3 Research Goals

The primary goal of this research is to re-develop an existing UEM, based on distributed usability and informed by Activity Theory, in order to overcome the limitations associated with current UEMs (which are discussed in detail in Chapter 2), and assess both the usability and usefulness of systems.

The UEM selected to be re-developed is laboratory-based usability testing. This UEM has been selected for two reasons. Firstly, usability testing is, arguably, the most widely used UEM in both research and practice. Secondly, it involves users directly in the evaluation process. Since user involvement is a key factor in system acceptance and the cornerstone of user-centred design (Preece et al, 1994; Norman, 1996; Bannon, 1991), it is important to ensure user participation in usability evaluation. The re-development of laboratory-based usability testing will result in what has been named the Distributed Usability Evaluation Method (DUEM). DUEM is intended to be an evaluation method that assesses the distributed usability of systems in the context of purposeful human activities.
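As a preview of the method developed in Chapter 4, the four DUEM phases (and the sub-phases of Phase 3) can be read as a simple sequential workflow. The outline below is an illustrative sketch only; the phase and sub-phase names are taken from the thesis structure, while the code itself is an assumption rather than the method's specification.

# Illustrative outline of the DUEM phases (names from the thesis; the structure
# and function below are an assumed sketch, not the method's specification).
DUEM_PHASES = {
    "Phase 1": "Selecting and Recruiting Users",
    "Phase 2": "Understanding Users' Activities",
    "Phase 3": "Evaluating the System in relation to Users' Activities",
    "Phase 4": "Analysing and Interpreting Results",
}

PHASE_3_SUBPHASES = {
    "Sub-phase 3.1": "Defining the Evaluation Plan",
    "Sub-phase 3.2": "Preparing the Evaluation Resources and Materials",
    "Sub-phase 3.3": "Testing the System",
}

def outline_duem(system: str) -> None:
    """Print the DUEM phases in order for a given system under evaluation."""
    for phase, name in DUEM_PHASES.items():
        print(f"{phase}: {name} [system: {system}]")
        if phase == "Phase 3":
            for sub, sub_name in PHASE_3_SUBPHASES.items():
                print(f"  {sub}: {sub_name}")

outline_duem("university web site")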


A secondary goal of this research is to apply DUEM in practice and, in doing so, validate the method to determine whether it overcomes the limitations associated with current UEMs, as described above. The following section describes how these goals will be achieved.

1.4 Research Methodology

The two-dimensional framework proposed by March and Smith (1995) has been selected as the underlying structure for the research methodology used in this thesis. The framework is located in the domain of design science. Design science is a form of applied research which involves developing artifacts for human use that can be employed to solve real-world problems (Järvinen, 1999). The term was coined by Simon (1969) to describe research that results in the creation of artifacts or things that serve human purposes. March and Smith (1995) propose that four types of artifacts can be developed: constructs, models, methods and instantiations. This research is concerned with methods (and usability evaluation methods, specifically). To develop artifacts, design science proposes two key activities: building and evaluation. Building involves actually constructing an artifact, while evaluation determines how well the artifact works when used in practice. The purpose of the evaluation activity in design science is simply to determine whether improvements have been made. How and why an artifact performed or failed are questions that concern natural science (or basic research). These questions are answered through theorising and justifying activities.

Based on the four types of research outputs (artifacts) and the four types of research activities (from design and natural science), March and Smith (1995) derive a two-dimensional framework with sixteen cells which is shown in Figure 1.2. The figure also indicates the cells in which this thesis is situated.


Figure 1.2 March and Smith’s (1995) research framework
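Read as data, the framework is the cross-product of the four research outputs and the four research activities described above. The sketch below is an illustrative enumeration of the sixteen cells rather than a reproduction of Figure 1.2; the marked cells follow the statement that this research builds and then evaluates a method.

# Illustrative enumeration of March and Smith's (1995) 4 x 4 framework.
# The outputs and activities are taken from the text; everything else is a sketch.
OUTPUTS = ["constructs", "models", "methods", "instantiations"]
ACTIVITIES = ["build", "evaluate", "theorize", "justify"]

# The sixteen cells of the framework.
cells = [(output, activity) for output in OUTPUTS for activity in ACTIVITIES]

# Cells occupied by this thesis: Stage I builds DUEM, Stage II evaluates it.
thesis_cells = {("methods", "build"), ("methods", "evaluate")}

for output, activity in cells:
    marker = "*" if (output, activity) in thesis_cells else " "
    print(f"[{marker}] output={output:<14} activity={activity}")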

Based on March and Smith’s (1995) framework, a two-stage research methodology will be used to achieve the research goals:

1. Stage I: Method building (to develop DUEM)

2. Stage II: Method validation (to evaluate DUEM).

However, the authors do not provide a specific set of steps to follow when building and evaluating methods, stating that these activities are complex and poorly understood. Therefore, Goldkuhl et al’s (1998) Method Theory has been selected as the approach for building DUEM. Method Theory provides a structure for creating new methods that go beyond the original methods (Goldkuhl et al, 1998, p. 116). Once DUEM has been built, it will be validated by being applied in practice to evaluate a system. The data collected from the application of DUEM will be used as evidence to confirm or refute a number of claims made about DUEM. This approach for validating a method was used by Mwanza (2002), and represents what Iivari (2003) refers to as “ideational” evaluation. Ideational evaluation demonstrates that an artifact includes novel ideas and addresses significant theoretical or practical problems with existing artifacts. In this instance, the ideational evaluation aims to demonstrate that DUEM contains novel ideas and addresses problems with existing UEMs, and usability testing in particular.


1.5 Significance of the Thesis

The outcome of this research study is a usability evaluation method that is based on an extended

notion of usability and on Activity Theory principles – the Distributed Usability Evaluation

Method (DUEM). This method aims to overcome a number of limitations and problems

associated with current UEMs. If the validation of DUEM demonstrates that this has been

achieved, DUEM will make a significant contribution to existing usability evaluation methods

and practices in a number of ways. Firstly, DUEM is one of the few UEMs that will be

grounded in a theoretical framework (namely, Activity Theory). Secondly, DUEM represents an

operationalisation of distributed usability and Activity Theory principles. To date, there have

only been a handful of attempts to operationalise Activity Theory in HCI. Thirdly, DUEM aims

to be a highly flexible and practical UEM that provides evaluators with options to design the

evaluation to suit their needs and constraints, while at the same time involving users in a direct

and significant way. Finally, DUEM is significant because it enables evaluators to assess the

usefulness of systems in relation to users’ activities. Useful systems are “technologies [that] are

carefully integrated into existing habits and practices” (Nardi & O’Day, 1999, p. 50). DUEM

attempts to assess whether these technologies are genuinely integrated into the habits and

practices of real users and their activities.

1.6 Structure of the Thesis

This thesis consists of six chapters. Chapter 1 has presented a broad overview of the research

study including the background to the study, the research goals and methodology, and the

significance of the study.

Chapter 2 presents a review of the relevant literature. The notions of traditional and distributed

usability (which have been raised above) are discussed in further detail. This is followed by a


description of the most widely used UEMs, including the benefits and limitations of each UEM.

Based on these descriptions, a set of eight specific challenges faced by the HCI community in

relation to UEMs is identified and presented. The UEM developed in this thesis aims to

overcome these challenges. The second part of Chapter 2 provides a detailed historical overview

of Activity Theory, its key principles and its application in HCI. Finally, the Activity Theory

principles which will be used to inform the development of DUEM are mapped to the eight

UEM challenges in order to demonstrate how each principle will contribute to overcoming the

challenges.

Chapter 3 describes the research methodology that will be used to develop and evaluate

DUEM. It begins with a discussion of March and Smith’s (1995) two-dimensional design

science research framework, which is used as the basis for defining a research methodology. A

two-stage research methodology is proposed: method building and method validation. Each of

the stages is described in detail and the steps undertaken in each stage are presented.

Chapter 4 is based on Stage I of the research methodology: method building. It follows the

series of steps that make up this stage of the research methodology to describe how DUEM was

built. The final part of Chapter 4 provides a comparison of DUEM and traditional usability

testing, as well as the potential benefits and limitations of DUEM.

Chapter 5 is based on Stage II of the research methodology and provides a description of the

steps taken to evaluate and validate DUEM. Following the validation process, the actual

benefits and limitations of DUEM are presented.

Chapter 6 draws together the key contributions and conclusions of the study, identifies the

limitations of the study and provides suggestions for future research arising from the study.


1.7 Definitions of Key Terms

For the purposes of this study, the terms below have been defined in Table 1.1 and listed in

alphabetical order.

Table 1.1 List of key terms and their definitions (in alphabetical order)

Actions: In Activity Theory, the conscious, short-term individual processes that are goal-oriented and used to translate an activity into reality (Leont'ev, 1978).

Breakdown: In Activity Theory, the manifestation of a conceptualisation.

Commercial online learning system: A commercial web-based system used by lecturers at the University to provide subject materials to students, engage students in interactive learning activities and provide assessment results.

Conceptualisation: In Activity Theory, the movement of operations to the level of actions or activities if the conditions in which the operations take place change.

Contradiction: In Activity Theory, a misfit within the elements of an activity, between the elements of an activity or between different activities (Engeström, 1999).

Evaluator(s): Any individual (or individuals) taking part in a system evaluation.

Object: In Activity Theory, the 'ultimate' motive of an activity towards which it is directed. Activities are distinguished by their objects.

Online student enrolment system: Web-based system used by University students to enrol in subjects and manage their enrolment. Available on the University's web site.

Operations: In Activity Theory, the means by which actions are executed. Operations are suppressed to the unconscious level.

Student Administration Services: Unit at the University that manages student enrolments and handles all administrative matters related to students.

Student Management System: Web-based system used by administrative and academic staff at the University to manage student marks and results.

Tool mediation: Activity Theory principle which states that a tool is an instrument that serves as the facilitator or conductor of a person's activity.

Traditional usability testing: Laboratory-based usability testing as described by Rubin (1994) and Dumas and Redish (1993).

Usability Evaluation Method (UEM): A method or technique which is used to conduct formative evaluation at any stage of the design and development process (Hartson et al, 2001).

For the purposes of this study, the following terms will be used interchangeably:

1. System, interface, computer and technology

2. Web-based and online

3. User(s) and participant(s) [in an evaluation]


Chapter 2

Literature Review

2.1 Introduction

The purpose of evaluation methods in Human-Computer Interaction (HCI) is to evaluate the

usability of interfaces. This chapter presents a review of the literature and previous research on

evaluation methods in HCI. To fully understand the role of these methods and establish their

effectiveness, it is necessary to examine their object of study – usability – more closely. The

literature review will begin by discussing and ‘unpacking’ the notion of usability as it has been

traditionally applied in HCI. This will be followed by an emerging view of usability that is

termed “distributed usability”. The role of evaluation in designing for usability will then be

briefly presented, prior to describing evaluation and usability evaluation methods (UEMs) in

general.

Considering the breadth and depth of the field, a large number of UEMs are available to both

researchers and practitioners. In order to make sense of all the different UEMs, several UEM

taxonomies are described. This is followed by a brief description of the most widely used

UEMs, which provides an insight into the benefits and limitations of each. Based on these

descriptions, eight UEM challenges that exist in the area and need to be overcome are identified.

Finally, Cultural Historical Activity Theory (or Activity Theory as it is widely known) is

proposed as a theoretical framework that will be used to inform the design and development of a

UEM aimed at overcoming the identified challenges. A diagrammatic representation of the

structure of this chapter is shown in Figure 2.1.


Figure 2.1 Structure of Chapter 2

For the purposes of the discussion, the term World Wide Web site will be abbreviated to web

site.


2.2 Traditional Usability

“Many developers dream of an algorithm giving an exact measure of the usability of a product.

Ideally, one could take the source code of a program, run it through an analysis program giving

a single number in return: Usability = 4.67”

(SINTEF Group, 2001)

Human-Computer Interaction (HCI), as a discipline, is “concerned with the design, evaluation

and implementation of interactive computing systems for human use and with the study of

major phenomena surrounding them” (ACM SIGCHI, 1992, p. 6). The notion of usability is

fundamental to HCI. Usability is an abstract concept intended to encompass both the

design and evaluation of computer systems. It is the glue that binds the entire systems design

and development process together. To some, usability is a science. To others, it is an art form.

As such, usability is a concept that does not lend itself to a precise and clear-cut definition. It

has been stated previously that usability refers to the ease of operating a system interface. This

statement may appear to be an oversimplification considering the plethora of other, seemingly

more comprehensive, definitions of usability that exist. However, most of these definitions are

based around the central notion of “ease of use” (Miller, 1971, cited in Shackel, 1986). For

example, the Institute of Electrical and Electronics Engineers’ Standard Computer Dictionary

defines usability as “the ease with which a user can learn to operate, prepare inputs for, and

interpret outputs of a system or component” (IEEE, 1990). The notion of ease of use is, in itself,

somewhat vague, because what is easy to use for some may not be so easy for others.

Consequently, some authors and researchers have defined usability in terms of multiple high-

level criteria, attributes and principles.

Shackel (1986) proposes that usability can be specified and measured numerically in terms of

four high-level operational criteria:


a) Effectiveness: a required level of performance by a percentage of specific users

within a range of usage environments;

b) Learnability: a pre-defined time period from the start of user training and based on

a specified amount of training;

c) Flexibility: the levels of adaptation and variability in possible tasks; and

d) Attitude: user satisfaction levels after continued use.
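To illustrate how such a numeric specification might be recorded in practice, the following minimal Python sketch captures Shackel's four criteria as fields of a simple data structure; the field names and target values are invented for illustration and are not drawn from Shackel (1986).

```python
# A hypothetical sketch of a numeric usability specification structured around
# Shackel's (1986) four criteria. All field names and target values below are
# invented for illustration only.
from dataclasses import dataclass

@dataclass
class UsabilitySpecification:
    effectiveness_pct_users: float      # % of specified users reaching the required performance
    learnability_training_hours: float  # training time allowed to reach that performance
    flexibility_task_variants: int      # number of task variations the system should support
    attitude_min_satisfaction: float    # minimum satisfaction rating after continued use (1-5)

spec = UsabilitySpecification(
    effectiveness_pct_users=90.0,
    learnability_training_hours=2.0,
    flexibility_task_variants=5,
    attitude_min_satisfaction=4.0,
)
print(spec)
```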

Nielsen (1993) views usability as a narrow concern when compared to the issue of system

acceptability, and models usability as an attribute of system acceptability. However, like

Shackel (1986), Nielsen argues that usability itself can be further broken down into five

defining attributes:

1. Learnability: the system should be easy to learn;

2. Efficiency: the system should be efficient so that high levels of productivity are

possible;

3. Memorability: the system should be easy to remember and not require re-learning;

4. Errors: the system should have a low error rate and enable quick recovery after errors;

and

5. Satisfaction: the system should be pleasant to use.

Nielsen’s conceptualisation of usability is shown in Figure 2.2. Although there is some overlap

between Shackel’s (1986) and Nielsen’s (1993) views, they are two distinct definitions of what

is purported to be the same concept.


Figure 2.2 A model of the attributes of system acceptability (Nielsen, 1993)

Preece et al (2002) add safety to their list of defining criteria and attributes, describing it as

protecting users from dangerous and undesirable situations.

In contrast to these high-level criteria and attributes, Norman (1988) conceptualises usability in

terms of design principles based on a combination of psychological theory and everyday user

experiences. The aim of these principles is to help designers make improvements to the system

and explain different aspects of their design to the various stakeholders (Thimbleby, 1990).

While numerous principles have been applied in HCI for this purpose, the most well-known

have been proposed by Norman (1988) and include:

1. Visibility: making the system functions visible so that each function corresponds with a

control;

2. Mapping: a direct, natural relationship between the controls and their functions;

3. Affordance: perceived and actual properties of an object that determine how it can be

used;

4. Constraints: limiting the behaviours and possible operations on an object; and

5. Feedback: sending information back to the user about what action has been done.
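As a toy illustration only, the following Python sketch shows how three of these principles (visibility, constraints and feedback) might surface in even a minimal command-line dialogue; the menu options are invented and the example is not drawn from Norman (1988).

```python
# A toy sketch (not from Norman, 1988) showing visibility, constraints and
# feedback in a minimal command-line dialogue. The menu options are invented.
ACTIONS = {"1": "save document", "2": "print document", "3": "close document"}

def choose_action() -> str:
    # Visibility: every available function is shown together with its control.
    for key, action in ACTIONS.items():
        print(f"[{key}] {action}")
    choice = input("Select an option: ").strip()
    # Constraints: input outside the listed controls is rejected.
    while choice not in ACTIONS:
        choice = input(f"Please enter one of {sorted(ACTIONS)}: ").strip()
    # Feedback: report back which action has been carried out.
    print(f"Done: {ACTIONS[choice]}.")
    return ACTIONS[choice]

if __name__ == "__main__":
    choose_action()
```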


The general conclusion from the previous discussion of definitions that characterise usability in

terms of criteria, attributes or principles is that usability is located within the system itself. For

example, usability can be defined in terms of the ‘effectiveness’ of a system, the ‘memorability’

of a system, the system ‘constraints’, etc. This view is supported by the notion that usability can

be thought of purely as an attribute of the entire package that makes up a system (Dumas &

Redish, 1993). However, the ISO 9241-11 international standard defines usability as “the extent

to which a product can be used by specified users to achieve specified goals with effectiveness,

efficiency and satisfaction in a specified context of use” (1998). This definition highlights three

other elements, in addition to the system itself, which are external to the system, but are

important in describing usability. These are “specified users”, “specified goals” and a “specified

context”.
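As an illustration of how the three ISO 9241-11 measures might be operationalised for a specified user group, goal and context, the following minimal Python sketch computes effectiveness, efficiency and satisfaction from hypothetical task observations; the record fields, sample values and formulas are illustrative conventions rather than part of the standard.

```python
# An illustrative sketch of operationalising the ISO 9241-11 measures for one
# specified user group, goal and context. The record fields, sample values and
# formulas are assumptions made for this example, not part of the standard.
from statistics import mean

observations = [  # one record per specified user attempting the specified goal
    {"completed": True,  "time_s": 95,  "satisfaction": 4},   # satisfaction on a 1-5 scale
    {"completed": True,  "time_s": 120, "satisfaction": 5},
    {"completed": False, "time_s": 300, "satisfaction": 2},
]

effectiveness = sum(o["completed"] for o in observations) / len(observations)  # completion rate
efficiency = mean(o["time_s"] for o in observations if o["completed"])         # mean time to success
satisfaction = mean(o["satisfaction"] for o in observations)                   # mean rating

print(f"effectiveness={effectiveness:.2f}, efficiency={efficiency:.0f}s, satisfaction={satisfaction:.1f}")
```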

The ISO 9241-11 definition of usability implies that every system is situated in a context that

consists of users trying to achieve goals. The inclusion of the context in this definition is

important because, to be usable, systems must fit into the fabric of everyday life (Beyer &

Holtzblatt, 1998). Since contexts, users and goals are varied and evolve over time, the usability

of a system itself must be continually re-constructed and re-generated in response to shifting

contexts, user diversity and changes in users’ goals. For example, the usability of a safety

critical system in a nuclear power plant is defined in different terms compared to the usability of

a university’s web site. Furthermore, the usability of the safety critical system at the Three Mile

Island nuclear power plant may have been re-defined following the accident in 1979. Likewise,

the usability of the university’s web site might be defined differently if attracting international

students becomes an important goal of the university. Therefore, the meaning of usability is

constructed depending on the users, their tasks and goals, the system and the context or

environment in which it is used. Traditional definitions of usability such as those proposed by

Shackel (1986), Nielsen (1993) and Norman (1988) do not account for the elements that are

external to the system (the users, their tasks and goals, and the context). Those definitions that


do, such as the ISO 9241-11 definition, do not explain the relationships between all the different

elements.

Many researchers (Thomas & Macredie, 2002; Spinuzzi, 1999; Nardi & O’Day, 1999; Beyer &

Holtzblatt, 1998; Karat, 1997; Nardi, 1996a; Engeström & Middleton, 1996; Hutchins, 1995;

Bevan & Macleod, 1994; Kling & Iacono, 1989) are increasingly critical of these traditional

definitions of usability for a number of reasons. Traditional definitions are system-focused and

technology-oriented. They fail to take into account different types of contexts (social, cultural,

physical) and they do not explain the relationship between users, systems and contexts in actual

use-situations. They also fail to incorporate the evolutionary aspect into usability as a

constructed notion that changes with shifts in the context, users and goals. Thomas and

Macredie (2002) go so far as to argue that the traditional conceptions of usability are ill-suited,

unwieldy, meaningless and unable to handle the new “digital consumer” (p. 70). An alternative

view of usability has been proposed in response to the limitations of the traditional view. It has

been termed distributed usability and will be discussed in the following section.

2.3 Distributed Usability

Spinuzzi (1999) argues that the traditional view of usability, described in the previous section, is

inadequate because it localises usability as an attribute or quality of a single system and

disregards the relationships between the system and contextual factors such as the interaction

between humans, the use of tools other than the system, and the actual work practices of users.

In Spinuzzi’s (ibid) view, usability is distributed across an activity network or system which is

composed of the assorted and interrelated genres, practices, uses and goals of any given activity.

An activity network represents a unit of analysis that takes into account individual users,

working with others, as part of a larger activity. This definition implies that usability is a

property of the interaction with a system in a collaborative context, rather than the system itself.

Thinking about usability in this way provides a more complete and holistic view of the system


and its interface as defined in terms of the users, their goals and surrounding environment. It

leads us to consider solutions that we might not otherwise have considered. For example, in his study,

Spinuzzi found that breakdowns in the users’ interaction with the system studied were not

caused just by the size of the mouse pointer or even the levels of user training, but could be

attributed to ‘deeper discoordinations’ between the interface and other genres (e.g. computer

screens (with menus, dialogue boxes); reports; maps; handwritten notes; software manuals; etc.)

relating to the users’ goals and the context of use. Such an interpretation of the system’s

usability is only possible because the entire activity network and the role of the system in this

interconnected network are analysed. It would not have been possible if only localised attributes

of the interface, such as ease of use, learnability and memorability, were examined.

Spinuzzi (ibid) advocates the study of on-screen (i.e. computer based) and off-screen (i.e. non-

computer based) genres, or typified forms, and their relationships in the context of a system of

interrelated tools and activities. Nardi and O’Day (1999) view this system or arrangement of

tools as belonging to an information ecology. They define an information ecology as a “system

of people, practices, values, and technologies in a particular local environment” (p. 49) which

focuses on human activities served by technology, rather than the technology itself. An

information ecology is fundamental to distributed usability because Spinuzzi (ibid) views

usability as being distributed across the ecology or activity network. The tools that make up the

information ecology are said to mediate the users’ activities. The mediating role of the tools

suggests that the users’ activities are facilitated by intermediate entities (i.e. the tools).

Where traditional views of usability usually relate only to a system consisting of hardware and

its associated software, distributed usability is constructed in relation to the relationships

between four elements: the users, their tasks, the environment and the system. These elements

are connected through a series of mediating relationships. This implies that they cannot be

understood individually because together they make up an information ecology that is greater

than the sum of its parts.


Distributed usability is constructed and situated in an ecology consisting of people, their

activities (which are mediated by the system and other types of tools), and the social, physical

and historical context of these activities.

The distribution of usability across this ecology has significant implications for both the design

and evaluation of systems. Clearly, the design process could not proceed without taking into

account and devoting substantial resources to understanding the larger context of user activities,

and incorporating this context into the design process. The implications for system evaluation,

however, are even more considerable in light of existing evaluation methods, which are based

on the traditional view of usability and therefore primarily focus on the system and its localised

attributes in isolation. Distributed usability extends this view to include the system utility or

usefulness in relation to users’ activities in a specific context because it requires a task fit

between the users and those activities. In light of this, existing evaluation methods need to be

re-considered, because in their current form, they may not be able to assess distributed usability.

However, the role of evaluation in designing for usability will be discussed first.

2.4 Designing for Usability

The term “designing for usability” was coined by Shackel (1986) to describe a design approach

aimed at incorporating human factors into the design and development process. According to

Shackel (ibid) designing for usability involves addressing five fundamental features. At least

three of those (user centred design, experimental design and iterative design) were proposed

earlier by Gould and Lewis (1983). The fundamental features are:

1. User centred design: focusing on users and their tasks from the beginning of the design

process;


2. Participative design: involving users in the design process as members of the design

team;

3. Experimental design: conducting formal usability evaluations of pilots, simulations

and full prototypes;

4. Iterative design: repeating the “design, test, re-design” cycle until results meet the

usability specification;

5. User supportive design: providing training, manuals, references and help materials to

users.

The involvement of users at all times is fundamental to designing for usability. Shackel’s (ibid)

approach suggests that this involvement extends to carrying out usability evaluation with users

iteratively during several stages of the design and development process. The implication is that

evaluation drives the entire process because incorporating evaluation at every stage (pilots,

simulations and full prototypes) will determine the direction of the development process.

Whiteside, Bennett and Holtzblatt (1988) coined the term "usability engineering" to represent a

similar approach to system design and development. Several authors have made contributions to

the area since then, including Butler (1996), Wixon and Wilson (1997) and, most notably,

Nielsen (1993) in his book titled “Usability Engineering”. According to Nielsen (1993),

usability engineering is a set of activities that takes place throughout a product lifecycle aimed

at specifying, assessing and improving the usability of a system. One of the key activities is the

empirical testing of prototypes, suggesting that continuous evaluation using prototypes is central

to the usability engineering approach.

The pivotal role of evaluation is even more apparent in the Star development lifecycle,

developed by Hartson and Hix (1989) and based on empirical research about the practices of

real interface designers. The Star lifecycle does not prescribe a specific ordering of design and

development activities, unlike most other lifecycle models. Instead, a design project can begin

with any one of six activities shown in Figure 2.3 (adapted from Preece et al, 2002, p. 193).


However, as Figure 2.3 clearly indicates, the evaluation activity is at the very core of the Star

lifecycle. The double-sided arrows indicate that evaluation is undertaken following the

completion of any one of the other activities shown, and the results of the evaluation are fed

back into the activity for further iteration.

Figure 2.3 The Star lifecycle (Hartson & Hix, 1989; adapted from Preece et al, 2002)

Having established the critical role of evaluation in the systems design and development process

and, consequently, intimated the importance of evaluation methods, it is now appropriate to

examine the notion of evaluation and evaluation methods more closely.

2.5 Evaluation

The preceding section demonstrated that evaluation is the cornerstone of designing for usability.

Evaluation paves the way for initiating a new design, steers the design process at different

stages and brings to a close the system development life cycle. Preece et al (2002) define

evaluation as “the process of systematically collecting data that informs us about what it is like

for a particular user or group of users to use a product for a particular task in a certain type of


environment” (p. 317). This definition implies both the goals and the role of evaluation. The

fundamental goals of evaluation are to assess the usability of a system (Rosenbaum, 1989) and

identify problems with the design so that improvements can be made (Dumas & Redish, 1993).

Where the assessment of usability is done by gathering performance data about a system in

order to determine whether it meets predefined criteria (Cordes, 2001), this type of evaluation is

referred to as comparative evaluation because the performance of the system is being compared

to a set of standards. The role of evaluation is to inform the system design and development

process. It is context and system dependent, and may encompass several of the above goals

(Scriven, 1967).

It is possible to distinguish between different types of evaluation based on three factors: the

motive or purpose of the evaluation, the types of measures used and the time in the system

design and development lifecycle that the evaluation takes place. Depending on the motive or

purpose, evaluation can be categorised as diagnostic or benchmarking. Diagnostic evaluation,

as the name suggests, involves identifying and diagnosing problems and errors with the system.

In contrast, the purpose of benchmarking is to compare the system against a set of standards or

usability goals that are based on market analysis (Sutcliffe, 1995).

Evaluation can be objective or subjective, according to the types of measures used (ibid).

Objective evaluation requires data collection by empirical means. For example, conducting a

formal survey of the system performance. Subjective evaluation is based on personal

judgements and opinions. For example, interviewing users about their attitude towards the

system.

Finally, depending on the time in the lifecycle that the evaluation is carried out, it can be

categorised as formative or summative (Scriven, 1967). Formative evaluation is carried out

during the system design and development process in order to reveal deficiencies and make

gradual improvements to the different versions or prototypes of the design and ensure that they


meet specific standards and criteria at an intermediate stage of the design lifecycle. Scriven

(1967) also refers to this type of evaluation as process research. Summative evaluation, on the

other hand, is done after the system design and development process is complete to assess the

final design. Summative evaluation can either be absolute (assessing the final system on its own

merit) or comparative (assessing the final system in relation to a set of standards or similar

existing systems) and is often regarded as the more rigorous form of evaluation involving

formal experiments (Hartson et al, 2001). The results of summative evaluation may be used as

the basis for initiating a new system design after a certain period of time.

The motive or purpose of the evaluation (diagnostic vs. benchmarking), the types of measures

used (objective vs. subjective) and the time in the system design and development lifecycle that

the evaluation is carried out (formative vs. summative) can be thought of and modelled as three

dimensions of evaluation. Figure 2.4 shows a graphical representation of the three dimensions.

Different evaluation methods and techniques can be situated in the eight three-dimensional

quadrants, depending on the purpose of the evaluation, the measures used and the time that the

evaluation takes place. Since an evaluation can incorporate any number of evaluation methods

and these methods can generally be applied in different ways, the proposed model can be used

to identify which evaluation methods fit into each of the eight quadrants. Evaluators can then

select a target quadrant (or quadrants) based on the evaluation purpose, the required measures

and the time in the lifecycle, and decide which evaluation methods in the target quadrant would

suit their needs.


Figure 2.4 A model of three evaluation dimensions
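The following minimal Python sketch illustrates how the model might be used to shortlist methods for a target quadrant; the example methods and their quadrant assignments are assumptions made purely for illustration.

```python
# An illustrative sketch of using the three evaluation dimensions to shortlist
# candidate methods for a target quadrant. The example methods and their
# quadrant assignments are assumptions made purely for illustration.
methods = [
    {"name": "usability testing",    "purpose": "diagnostic",   "measures": "objective",  "timing": "formative"},
    {"name": "user interviews",      "purpose": "diagnostic",   "measures": "subjective", "timing": "formative"},
    {"name": "benchmark experiment", "purpose": "benchmarking", "measures": "objective",  "timing": "summative"},
]

def in_quadrant(method, purpose, measures, timing):
    """Return True if the method sits in the requested quadrant of the model."""
    return (method["purpose"], method["measures"], method["timing"]) == (purpose, measures, timing)

shortlist = [m["name"] for m in methods
             if in_quadrant(m, "diagnostic", "objective", "formative")]
print(shortlist)  # ['usability testing']
```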

The previous discussion has provided some background into evaluation in general. The

following section will focus specifically on usability evaluation methods.

2.6 Usability Evaluation Methods

Methods used to evaluate the usability of computer systems have been of special interest to HCI

researchers and practitioners, and consequently various methods have been developed over the

years to assess the extent to which a system's actual performance conforms to its desired performance (Long & Whitefield, 1986, cited in Whitefield et al, 1991). According to Hartson

et al (2001) a usability evaluation method (UEM) refers to a method or technique which is used

to conduct formative evaluation at any stage of the design and development process. Summative

studies are generally excluded from this definition. Gray and Salzman (1998) characterise

UEMs as being analytic or empirical. Analytic UEMs are concerned with the effects of the

intrinsic features of an interface on usability and include various techniques such as heuristic


evaluation (Nielsen and Molich, 1990), cognitive walkthroughs (Lewis et al, 1990), GOMS

(Card et al, 1983), formal usability inspections (Kahn & Prail, 1994) and expert reviews

(Bradford, 1994). These methods are usually used to predict a system’s behaviour (Whitefield et

al, 1991). Empirical UEMs refer to methods which involve experimental techniques, and are

generally known as user testing methods or usability testing. Despite the large number of UEMs

developed over the last two decades, Hartson et al (2001) argue that the area is still in its

infancy as both a research topic and an applied body of knowledge because standard UEMs have not been agreed upon by researchers or practitioners. This problem has been

exacerbated by the changing definitions of UEMs (Gray and Salzman, 1998) and the lack of

standards in applying UEMs due to the dissimilarity of design entities and processes (Whitefield

et al, 1991). As a result, it has become increasingly difficult to develop a single taxonomy of

UEMs that researchers and practitioners alike could use to identify, compare and choose

suitable methods. Instead, several different taxonomies of UEMs have emerged, based on

varying criteria.

2.7 Taxonomies of Usability Evaluation Methods

A formal taxonomy of UEMs is beneficial not only for selecting an appropriate UEM to meet specific evaluation goals at different stages of the lifecycle, but also for comparing the effectiveness of various UEMs. Numerous researchers (Whitefield et al, 1991; Dix et al, 1998;

Reiterer & Oppermann, 1993) have suggested that the compilation of such a taxonomy poses a

series of difficulties due to the diverse nature of UEMs. Fitzpatrick (1999) also suggests that it

is due to a lack of universally accepted names of generic UEMs. Most UEMs have evolved over

time through use and their definitions have changed. However, these historical changes have

never been formally documented in research. Furthermore, there is ongoing confusion in the

literature in the usage of terms such as method, technique, model, tool and framework when

applied to usability evaluation (Fitzpatrick & Dix, 1999). Nevertheless, several classifications of

UEMs have been put forward (Whitefield et al, 1991; Fitzpatrick, 1999; Howard & Murray,


1987; Wixon & Wilson, 1997; Dix et al, 1998; Preece et al, 2002), although most suffer from a

lack of specific criteria used and do not necessarily include the multiplicity of methods

available. A brief overview of these classifications will be presented next to demonstrate this

claim.

2.7.1 Whitefield et al's Framework for Evaluation

Whitefield et al (1991) use the term ‘framework for evaluation’ to refer to their classification of

UEMs. The framework for evaluation is based on a need, identified by the authors, to improve

the evaluation process in practice. This improvement can be achieved “by clarifying what can

be done towards which goals and how it can be done” (p. 70). They argue that this type of

grouping includes a full range of methods and illuminates the differences between them.

The framework for evaluation is based on the presence of the user and the computer (the two

components that make up a system), which are categorised as being real or representational. The

presence of a real computer implies the existence of an implemented system, prototype or

simulation, while symbolic mental representations (such as notational models and

specifications) are representational computer systems. The presence of real users means being

able to involve actual users in the evaluation. This is in contrast to representational users, which

are explicit or implicit descriptions or models of users. Based on these criteria, four classes of

UEMs are derived and positioned in a two-by-two matrix, shown in Figure 2.5. The classes are

Analytic methods, User reports, Specialist reports and Observational methods.


Figure 2.5 Framework for evaluation (Whitefield et al, 1991)

The authors argue that this framework allows the identification of possible evaluation

statements, methods, and areas of system performance and user behaviour that could be

examined (Whitefield et al, 1991). However, they concede that it is difficult to show how the

framework would actually lead to improvements in practice, as intended.

2.7.2 Fitzpatrick’s Strategies for Usability Evaluation

Fitzpatrick (1999) suggests that the main problem associated with Whitefield et al's (1991) framework is that it limits methods to mutually exclusive quadrants, when in fact they can be

used in different situations. For example, user reports are confined to situations where

representational computers and real users are available. In reality, however, it is desirable to

make use of reports with real computers as well. He proposes that strategies should be used as

the basis for understanding usability evaluation and that these strategies employ various

evaluation methods. His contribution is a two-by-two matrix of strategies for evaluating


usability. When selecting the most appropriate strategy, it is necessary to take into account the

stage in the lifecycle when the evaluation takes place and the desirability of employing multiple

UEMs.

Using an inverted version of Whitefield et al's (1991) matrix as the basis for his proposed

taxonomy (to better represent the low/high measure associated with two-by-two matrix

diagrams), Fitzpatrick (1999) identifies four strategies, shown in Figure 2.6. As the design

lifecycle progresses from analysis to implementation, a suitable corresponding strategy can be

selected. For example, when a high fidelity prototype of the computer is available and actual

users can be found to carry out tasks using the prototype, the Real World strategy can be

adopted. This strategy in turn determines the appropriate UEMs that can be employed. In the

previous example of the high fidelity prototype, this suggests the use of methods such as

observation, questionnaires, interviews, etc.

Figure 2.6 Strategies for usability evaluation and associated UEMs (Fitzpatrick, 1999)

In essence, Fitzpatrick’s (1999) taxonomy is important at the conceptual level because it makes

a distinction between usability strategies and methods and therefore overcomes the problem


associated with Whitefield et al’s (1991) classification. However, the bases of the proposed

strategy matrix are not explicitly defined other than stating that the selection of a strategy

depends on the stage in the lifecycle when the evaluation takes place.

2.7.3 Howard and Murray’s Classes

Howard and Murray (1987) identify five classes of explicit evaluation. Explicit evaluation

involves the systematic collection and analysis of data using what they term evaluation

“techniques”, instead of UEMs. (This adds substance to Fitzpatrick’s (1999) claim about the

confusing usage of terminology in HCI evaluation.) While the authors confess to the lack of

criteria used as the basis of their taxonomy, they suggest that the classes should not be treated as

being mutually exclusive. They also emphasise that the taxonomy does not make any provisions

for subjective assessments, such as expert opinions, that may be undertaken during the design

process. The five classes are described as follows:

1. Expert based: An expert employs domain knowledge and scientific principles to

evaluate the system. For example, heuristic evaluation or a cognitive walkthrough.

2. Theory based: Models of the users and the system are employed to map relationships

between the formal representations of the user and the system. For example, GOMS

analysis (Card et al, 1983) and production system analysis (Kieras & Polson, 1985).

3. Subject based: Evaluation involving four components – the system, the task, the

subject and the metric. For example, evaluation in formal laboratory conditions.

4. User based: A personal evaluation of a system by its users. The authors do not provide

an example of this type of evaluation.

5. Market based: The final evaluation of a system based on its market performance. For

example, benchmarking against similar systems available on the market.

Apart from the lack of criteria on which Howard and Murray's (1987) taxonomy is based, there

are several other problematic issues, most notably the unclear distinction between subject and


user evaluation, and the overlap between different classes of evaluation (for example, market

based evaluation can be done by experts or users). Finally, there is no explicit indication of

methods for conducting market-based evaluation.

2.7.4 Wixon and Wilson’s Conceptual Dimensions

Wixon and Wilson (1997) suggest that UEMs can be classified along five conceptual

dimensions, with each dimension being, in theory, logically independent. Together the

dimensions create a multidimensional space. These dimensions are:

1. Formative vs. summative methods: The difference between these types of evaluation

methods has been described previously in Section 2.5.

2. Discovery (qualitative) vs. decision (quantitative) methods: Discovery methods

explore how users work, think or behave, while decision methods are used to choose

between designs.

3. Formalised vs. informal methods: Formalised methods have been described formally

in the literature, while others are practised informally.

4. Users involved vs. no user involvement: Some methods involve users, while others are

reliant on experts and designers to carry out the evaluation.

5. Complete vs. component methods: Complete methods consist of a series of steps,

while component methods represent only one part of the entire process.

Wixon and Wilson (1997) argue that any UEM can be mapped along any range of these five

dimensions depending on how it is used. For example, heuristic evaluation is a summative,

discovery method which has been formalised, doesn’t involve users and is complete. However,

it can be used as a decision method in specific circumstances. Wixon and Wilson (1997) view

this framework, consisting of five dimensions, as being important because of the hybrid uses of

UEMs. They make the claim that placing different UEMs within such a framework is more


important than being able to compare them directly, although they do not provide a supporting

argument for this claim.

2.7.5 Dix et al’s Classification

Dix et al (1998) identify eight factors to distinguish between various evaluation techniques. (It

should be noted that Dix et al (1998) use the terms “method” and “technique” interchangeably).

Those factors include: the stage of the design lifecycle, the style of evaluation (laboratory vs.

field studies), the level of subjectivity or objectivity of the UEM, the types of measures

provided (qualitative vs. quantitative), the information provided by the UEM, the immediacy of

response, the level of intrusiveness of the UEM and the resources required. Based on these

factors, Dix et al (1998) propose a classification consisting of analytic, experimental and query,

and observational evaluation techniques and place each UEM into one of these three

categories. This is shown in Figure 2.7. Dix et al’s (1998) classification “is intended as a rough

guide only” (p. 440). It is not clear how the authors derived the three categories based on the

eight factors identified. There also appears to be a mismatch between some of the techniques

and the category they have been placed into. For example, laboratory experiments belong in the

experimental as well as the observational categories.

Figure 2.7 Classification of usability evaluation techniques (Dix et al, 1998)


2.7.6 Preece et al's Evaluation Paradigms

Preece et al (2002) argue that, due to the loose terminology in the field, UEMs should be

organised according to four core evaluation paradigms, each identified by a set of beliefs and

practices. Like Dix et al (1998), Preece et al (2002) do not make a distinction between methods

and techniques. Instead, each technique or method is associated with a particular paradigm. In

contrast to other taxonomies, it is interesting to note that usability testing is singled out as a

paradigm in its own right, rather than a UEM or technique. Like most of the other taxonomies

discussed, however, Preece et al’s (2002) taxonomy suffers from a lack of formal criteria on

which it is based. It is shown in Figure 2.8.

Figure 2.8 Core evaluation paradigms (Preece et al, 2002)

2.7.7 UEM Taxonomies: A Commentary

Based on the previous discussion, a number of observations regarding existing taxonomies can

be made. Most UEM taxonomies are based on arbitrary classification criteria, usually related to

the stage in the lifecycle when the evaluation method is used. Other UEM taxonomies are not

based on any criteria and appear to have been proposed on an ad hoc basis. The end result is a disorderly

and questionable set of taxonomies that place the same UEM in different categories. For


example, heuristic evaluation is classified as analytic by Dix et al (1998), predictive by Preece

et al (2002) and specialist by Whitefield et al (1991). This situation has created a great deal of

confusion in HCI research and practice communities when it comes to choosing the most

appropriate or the best UEM to use. This defeats one of the purposes of developing a taxonomy,

which is to enable comparisons between different methods on a set of criteria. None of the

taxonomies presented above reveal any information that would enable such a comparison to

take place.

Furthermore, there appears to be an inconsistent use of the terminology in different taxonomies.

The words ‘framework’, ‘strategies’ and ‘paradigms’ are all used to refer to a classification

scheme, while the terms ‘method’ and ‘technique’ are used interchangeably within the same

taxonomy. The lack of standard definitions that can be referred to has resulted in anomalies such

as usability testing being referred to as a technique, method and paradigm in different

taxonomies.

Finally, several of the taxonomies above are clearly based on the traditional notion of usability

because they are concerned primarily with classifying UEMs in relation to the system, ignoring

the larger context in which it is situated, and the users’ tasks and goals. For example, Whitefield

et al’s (1991) framework and Fitzpatrick’s (1999) strategies are based on the presence of a real

or representational computer. None of the taxonomies specifically refer to the users’ activities or

the context as one of the underlying criteria. These are both highly relevant criteria because they

highlight the need for taking into account the users’ needs and the system context when

choosing the most appropriate UEM.

The different types of UEMs available to and commonly used by researchers and practitioners

will now be discussed in some detail.


2.8 An Overview of Different Usability Evaluation Methods

Describing every single UEM and UEM variation is beyond the scope of this thesis because of

the large number of UEMs available. This section will, therefore, focus only on those UEMs

commonly used in the systems design and development process. For the purposes of this

discussion only, the different UEMs will be classified depending on whether users are involved

in the evaluation process. Those UEMs that do not involve users will be categorised further

depending on whether they are carried out by experts or purely theoretical in nature. This results

in three final categories of UEMs: expert-based, model-based and user-based. The reasons for

selecting these three categories are twofold. The categories are widely known and accepted as

convention in the HCI community, and they do not result in any overlapping UEMs. Each UEM

discussed can be allocated to a single category, thus facilitating the discussion process by

avoiding unnecessary complexity. A diagrammatic representation of the three categories and the

UEMs that will be presented under each category is shown in Figure 2.9.

Figure 2.9 UEM categories and UEMs to be discussed

The purpose of the following discussion is to highlight some of the challenges associated with

current UEMs and ‘set the stage’ for the development of a UEM that aims to overcome these

challenges. In the ensuing sections, each UEM will first be described briefly and then its

strengths and weaknesses will be presented. It should be noted here that although it is possible


to delineate between the different UEMs and describe the distinguishing features of each one, in

practice, they are often applied in combination. For example, expert-based UEMs may be used

in conjunction with usability testing. This approach is highly beneficial because some of the

UEMs complement each other well.

2.8.1 Expert-Based UEMs

Expert-based UEMs, as the name implies, involve an expert or team of experts who inspect the

system interface and predict the types of problems users would have interacting with the system

(Preece et al, 2002). Sometimes referred to as usability walkthroughs (Karat et al, 1992) but

better known as usability inspection methods, these UEMs emerged in the early 1990s in

response to the length of time, high costs, and complexity associated with user-based UEMs

(Desurvire, 1994; Nielsen, 1994a). According to Nielsen (1994a), expert-based UEMs are

cheap, fast, easy to use and can be carried out at any stage of the development lifecycle, as long as a

prototype of the interface exists. This is because usability inspections do not require the

evaluator to actually use the system. Furthermore, usability specialists, software developers, end

users or other types of professionals who are knowledgeable about user interfaces can carry out

the evaluation. The underlying premise of these types of UEMs is the reliance on the judgement

of experts or usability inspectors to provide evaluative feedback (Nielsen, 1994a). Due to their

subjective nature, usability inspection methods are non-empirical (Virzi, 1997) and some

authors (Bailey, 1993; Tullis, 1993) have questioned their reliability following studies

indicating that experts experienced problems when trying to predict human performance.

There are a number of different expert-based UEMs in use. Virzi (1997) suggests that they vary

on three defining dimensions:

1. The level and type of usability expertise of the evaluators: whether the evaluators are

formally trained in usability or whether they are non-experts in the usability domain;


2. The number of evaluators in a single session: whether a single expert or a team of

experts is conducting the evaluation; and

3. The goals of the evaluation: whether the inspection aims to find general usability

problems, problems associated with ease of learning or assessing compliance with

standards.

Based on these dimensions, Virzi (1997) proposes four loose groupings of expert-based

methods: resource-constrained methods, usability-expert reviews, cognitive walkthroughs and

group design reviews. These groupings will be used as the basis for the discussion to follow

about various expert-based UEMs that have been developed over the years. A special type of

UEM known as Heuristic Walkthrough will be addressed in a separate section because it lies

across two of the groupings (resource-constrained methods and cognitive walkthroughs).

2.8.1.1 Resource-Constrained Methods

Resource-constrained methods are employed in situations where usability expertise is not

available or is too expensive to bring in. Various resource-constrained methods have been

developed to allow non-experts to carry out usability inspections. The most well-known of these

is heuristic evaluation.

Heuristic evaluation is a discount usability engineering method (Nielsen, 1989; 1993), one of a

family of methods proposed by Nielsen to provide evaluators with an inexpensive and fast way

of assessing the usability of an interface. Nielsen's argument for proposing discount usability

engineering methods was that simpler methods have a better chance of being used even though

they may not be the best methods to use, implying that some usability evaluation is better than

none (Nielsen, 1993). Heuristic evaluation involves a small group of evaluators assessing the

interface to determine its compliance with a set of recognised usability principles known as

‘heuristics’ (Nielsen, 1994a, p. 26). Originally proposed by Nielsen and Molich (1990),

heuristic evaluation has arguably become the most widely used and recognised UEM owing to


its apparent simplicity and low cost. Nielsen (1994a) estimates the benefit/cost ratio of heuristic

evaluation to be 48, implying that the benefits of heuristic evaluation are 48 times greater than

the costs involved.

Heuristic evaluation is a two-stage process. In the first stage, three to five evaluators

individually review the interface based on a set of pre-defined heuristics. Heuristics are high-

level principles that describe the desirable attributes that usable interfaces should possess.

Nielsen and Molich (1990) initially proposed a list of ten heuristics. These were later revised by

Nielsen following a factor analysis of 249 usability problems (Nielsen, 1994a). This revised set

of usability heuristics is shown in Figure 2.10. Nielsen (ibid) recommends several passes

through the interface so that evaluators can first get a feel for the system and then focus on

specific elements of the interface on successive passes.

Figure 2.10 Revised usability heuristics (Nielsen, 1994a)

In the second stage, once all the individual evaluators have completed their review, they are allowed to communicate and aggregate the findings. Withholding communication until this point ensures an impartial evaluation by each individual evaluator. The outcome of a heuristic

evaluation is a list of usability problems, referring to the heuristics that were violated in each

case (ibid). Although heuristic evaluation can be used at any stage of the design process,

Nielsen and Phillips (1993) found that it performed better when evaluators had

access to running versions of a system.
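As an illustration of the second (aggregation) stage, the following minimal Python sketch consolidates individual evaluators' findings into a single list of problems, recording the violated heuristic and the number of evaluators reporting each; the problem descriptions, heuristic labels and data structure are hypothetical and not part of Nielsen's description of the method.

```python
# An illustrative sketch of the aggregation stage of a heuristic evaluation:
# individual evaluators' findings are consolidated into a single problem list.
# The problems and heuristic labels below are hypothetical examples.
from collections import defaultdict

findings = [  # (evaluator, problem description, violated heuristic)
    ("E1", "no undo after deleting a record", "user control and freedom"),
    ("E2", "no undo after deleting a record", "user control and freedom"),
    ("E3", "error message shows an internal code only", "help users recover from errors"),
]

aggregated = defaultdict(lambda: {"heuristic": None, "reported_by": set()})
for evaluator, problem, heuristic in findings:
    aggregated[problem]["heuristic"] = heuristic
    aggregated[problem]["reported_by"].add(evaluator)

for problem, info in aggregated.items():
    print(f"{problem} [{info['heuristic']}] reported by {len(info['reported_by'])} evaluator(s)")
```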


An independent study by Jeffries et al (1991), which compared four different UEMs, concluded

that heuristic evaluation produced the best results compared to the other UEMs because it found

the highest number of problems, including most of the serious problems, at the lowest cost.

Even though heuristic evaluation has been shown to be useful at finding major and minor

problems, twice as many minor, low-priority problems are found in absolute numbers (Nielsen,

1992; Jeffries et al, 1991). Due to this, Kantner and Rosenbaum (1997) recommend heuristic

evaluation for an interface that has already been evaluated iteratively and requires only minor

revisions.

The advantages of heuristic evaluation appear to be manifold, including: being able to use the

UEM at any point during the design and development process because it does not require a

working system, the availability of immediate and fast feedback (Kantner & Rosenbaum, 1997),

the ability to explain usability problems in relation to a set of design principles, and the lack of

interpretation required and fewer practical and ethical problems because users are not involved.

Jeffries et al (1991) have also found that heuristic evaluation identified serious problems at low

cost making it a valuable UEM when time and resources are scarce. This is supported by

Nielsen’s (1993) claim that even non-experts can perform a heuristic evaluation.

However, studies have found that usability specialists were better at applying heuristic

evaluation than non-experts (Nielsen, 1992; Jeffries et al, 1991; Kantner & Rosenbaum, 1997)

making it dependent on the skills of the evaluator. Sutcliffe (2002) found high levels of variance

between evaluators involved in a heuristic evaluation, while Bailey (2001) and Sears (1997)

discovered that experts were reporting problems that didn’t actually exist. This is consistent

with Nielsen’s (1994a; 1993) own admission that heuristic evaluation is not a systematic and

comprehensive method for finding and fixing all the problems with a system interface. As a

discount UEM, heuristic evaluation only addresses those problems that are specified in the

heuristics. Preece et al (2002) suggest that to make heuristic evaluation effective, it is necessary


to involve several trained evaluators whose expertise is known. Most importantly, though, the

fact that users are not involved in the evaluation is a significant disadvantage because no

primary data is collected about actual use of the system.

Several authors have proposed changes or extensions to heuristic evaluation. For example,

Sutcliffe (2002) suggests new heuristics to cover motivational aspects of design. Kurosu et al

(1997), on the other hand, recommend separating the heuristics into several sub-categories and

splitting the evaluation session so that each session focuses on one sub-category. Kurosu et al

(1997) argue that this is useful because it may be difficult for evaluators to maintain all ten of

the heuristics actively in working memory. The authors named this method the structured

heuristic evaluation method (sHEM). Finally, Nielsen (2000) himself has suggested

modifications to the existing heuristics based on new types of systems. An example is the list of

“HOMERUN” heuristics for the evaluation of commercial web sites, shown in Figure 2.11.

Figure 2.11 HOMERUN heuristics for commercial web sites (Nielsen, 2000)

2.8.1.2 Usability-Expert Reviews

Usability-expert reviews involve usability experts acting as surrogates for users during the

evaluation process (Virzi, 1997). These reviews are based on the premise that usability experts

will be able to identify problems at the interface because of their prior experience and

specialised knowledge. The difference between usability-expert reviews and heuristic evaluation

is that the former are usually performed by individual experts working alone. In some instances,


the experts may apply a set of guidelines to the interface. This technique is known as a guideline

review.

A number of different guidelines have been developed both commercially and in the public

domain for conducting guideline reviews. Some organisations such as IBM, Apple and

Microsoft have developed their own design and evaluation guidelines, customised to their

systems and organisation. These are known as “house style guides”. Others, such as Nielsen’s

“Top Ten Guidelines for Homepage Usability” (2002), are guidelines available to any

organisation or individual through public forums such as handbooks, and professional or

academic articles. Most guidelines have originated from practical experience or psychological

theory (Preece et al, 1994). Apart from their origin, guidelines can be classified depending on

their level of granularity (high-level general guidelines or specific guidelines about individual

interface elements), the type of system they are intended for (e.g. Shneiderman’s (1992) form-

fill design guidelines or Brooks’ (1988) guidelines for 3D interfaces) and their nature

(prescriptive design rules or descriptive design principles). A well-known handbook of

guidelines titled “Guidelines for Designing User Interface Software” was published by Smith

and Mosier (1986). In addition to the guidelines themselves, Smith and Mosier (ibid) included

examples of how each guideline could be used, what the exceptions were and which

psychological data set a guideline was derived from (Preece et al, 1994).

Interface standards are a special type of guidelines developed by governing organisations and

bodies such as the International Organisation for Standardisation (ISO), the National Institute of

Standards and Technology (NIST) and the British Standards Institution (BSI), to ensure that

software complies with minimum safety and operating requirements. Standards are guidelines

which have become formalised because they have been recognised as important and relevant

(Preece et al, 1994). For example, the ISO 9000 is concerned with general software quality

standards, while the ISO 9241 provides standards relevant to HCI design issues. Standards can

also be developed and promoted by professional associations (such as the British Computer


Society), while some are promoted as industry or de-facto standards (for example, the common

use of graphical user interfaces). The use of standards for the purposes of an expert review is

known as a standards inspection. An example of such an inspection can be found in Oppermann

and Reiterer (1997).

The use of guidelines and standards in usability-expert reviews is beneficial because it enables

experts to focus the evaluation process according to a generally accepted set of principles or

rules. These UEMs are cheap to use and require few facilities (Jordan, 1998). Jeffries et al

(1991) also found that guideline reviews identified general and recurring usability problems.

However, guideline documents may often contain more than a thousand guidelines, diluting

their benefits as they become difficult and cumbersome to use (Lund, 1997; Blatt & Knutson,

1994). As a result, individuals often develop their own sets of guidelines. Furthermore, the

usefulness of high-level guidelines (such as "Allow input flexibility") is debatable as they are

too general (Blatt & Knutson, 1994) and require interpretation by the expert, which introduces

an element of bias into the review process.

Jeffries et al (1991) found that guideline reviews missed severe problems at the interface.

Chrusch (2000) also points out that guidelines only address a fraction of the usability problems.

This is because guidelines do not provide data that leads to any measures of usability (Jordan,

1998). There are situations when guidelines must be traded off against resource or other

constraints, or against each other (Preece et al, 1994). For example, designing a menu with an

adequate level of detail conflicts with the desire for a fast response time (Preece et al, 1994).

Finally, Polson et al (1992) suggest that guidelines rapidly become outdated and, therefore,

ineffective. Potter et al (1990, cited in Blatt & Knutson, 1994) have found this to be the case

with standards as well, after discovering usability problems with an interface even when it did

not violate standards.


2.8.1.3 Cognitive Walkthroughs

A cognitive walkthrough is a structured, expert-based UEM which assesses how easy it is to

learn to use an interface by exploration. Lewis et al (1990) and Polson et al (1992) developed

cognitive walkthroughs based on their CE+ theory of exploratory learning of ‘Walk Up and Use

Systems’ (Wharton et al, 1992). ‘Walk Up and Use Systems’ include systems such as ATMs or

information kiosks in shopping malls, airports, etc. These types of systems are based on the

notion of learning by doing and the cognitive walkthrough method assesses whether this notion

has been addressed in the design of the interface (ibid). Unlike other types of walkthroughs (e.g.

requirements and code walkthroughs (Yourdon, 1989); usability walkthroughs (Bias, 1991);

heuristic walkthroughs (Sears, 1997)), the cognitive walkthrough focuses on the users' internal

cognitive activities which include the users’ goals and knowledge while performing specific

tasks (Wharton et al, 1992). The cognitive walkthrough method has been designed to be used in

an iterative fashion, early in the design and development lifecycle (although the interface must

be mature enough to allow the evaluator to determine correct action sequences), either by an

individual evaluator or a group of evaluators (Wharton et al, 1992; Lewis & Wharton, 1997).

The cognitive walkthrough involves two phases, a preparation phase and an analysis phase

(Wharton et al, 1994). During the preparation phase, the input conditions for the walkthrough

are determined. This includes identifying:

• the user population (“Who are the users of the system?”)

• the tasks (“What tasks will be analysed?”)

• the action sequences (“What is the correct action sequence for each task?”)

• the interface (“How is the interface defined?”).

The analysis phase involves working through each action of every task identified during the first

phase and justifying why users are expected to choose that particular action, based on the

evaluators’ understanding of the users’ goals and knowledge (Wharton et al, 1994). To assist in

this process, tasks are evaluated by completing a set of forms which contain a list of four


questions to be asked about the interface (Wharton et al, 1992). These questions include the

following (Wharton et al, 1994; Lewis & Wharton, 1997; Preece et al, 2002):

• Will the user know what to do to achieve the task?

• Will the user notice that the correct action is available?

• Will the user associate and interpret the response from the action correctly?

• If the correct action is performed, will the user see that progress is being made?

The responses to these questions and any underlying assumptions are used to derive a success or

failure story for each action. If all the answers to the questions above are “yes”, this constitutes

a success story. A negative answer implies a failure story (Lewis & Wharton, 1997). The

purpose of this process is twofold: to find mismatches between the users’ and designers’

conceptualisations of a task, and to identify interface design errors that could interfere with

learning by exploration (Wharton et al, 1994). Following the analysis phase, the interface is

modified to eliminate the problems identified.
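To make the mechanics of the analysis phase concrete, the following minimal sketch (in Python) records an evaluator's answers to the four questions for a single action and derives a success or failure story from them. The class and field names (ActionRecord, failed_questions, etc.) and the example task are hypothetical illustrations, not part of the published method.

from dataclasses import dataclass

# The four walkthrough questions, paraphrased from Wharton et al (1994).
QUESTIONS = (
    "Will the user know what to do to achieve the task?",
    "Will the user notice that the correct action is available?",
    "Will the user associate and interpret the response from the action correctly?",
    "If the correct action is performed, will the user see that progress is being made?",
)

@dataclass
class ActionRecord:
    task: str
    action: str
    answers: tuple          # one True/False entry per question above
    assumptions: str = ""   # notes justifying the answers

    def failed_questions(self):
        # Any "no" answer turns the action into a failure story.
        return [q for q, yes in zip(QUESTIONS, self.answers) if not yes]

    def is_success_story(self) -> bool:
        return not self.failed_questions()

record = ActionRecord(
    task="Withdraw cash",
    action="Select 'Withdrawal' from the main menu",
    answers=(True, True, False, True),
    assumptions="The menu label may not match the user's vocabulary",
)
print("Success story" if record.is_success_story() else "Failure story")
print(record.failed_questions())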

There are several characteristics which distinguish the cognitive walkthrough from other expert-

based UEMs (Lewis & Wharton, 1997). A cognitive walkthrough focuses on specific user tasks,

instead of the design and attributes of the interface. Rather than trying to predict how a user will

use the system, the cognitive walkthrough method assesses the likelihood that a user will follow

the correct sequence of actions to complete a task. In doing this, it attempts to identify the

reasons why a user may encounter problems by tracing the mental processes of the user. Also,

since the cognitive walkthrough method is based on a cognitive theory, it requires more

specialised knowledge than other expert-based UEMs (Wharton et al, 1992; Lindgaard, 1994).

This was demonstrated by the failure of the first version of a cognitive walkthrough

implemented with a group of untrained analysts who had difficulty understanding the

terminology of cognitive science and distinguishing goals from actions (Wharton et al, 1994).

John and Packer (1995) did conclude, however, that the method is learnable.


Although it does not directly involve users, the cognitive walkthrough method focuses on user

tasks, unlike other expert-based UEMs. This allows evaluators to predict how specific tasks

should be performed and whether the system design actually supports these tasks (Lewis &

Wharton, 1997). By concentrating on the tasks rather than user behaviour, the cognitive

walkthrough method enables evaluators to critique an action sequence and determine whether

their expectations about the sequence are reasonable (ibid). This also helps define the users’

goals and assumptions (Jeffries et al, 1991). Furthermore, the method attempts to identify the

reasons behind the problems a user may experience (Lewis & Wharton, 1997). Studies

comparing the cognitive walkthrough method to user testing have shown that the cognitive

walkthrough method identifies around 40% of the problems revealed by user testing (Lewis et al,

1990; Jeffries et al, 1991; Cuomo & Bowen, 1994) and takes less effort than user testing

(Jeffries et al, 1991; Karat, 1994). However, in comparison with other UEMs, the cognitive

walkthrough has not fared so well.

According to Jeffries et al (1991), heuristic evaluation finds more problems than a cognitive

walkthrough and requires less effort. Studies have also shown that the cognitive walkthrough

method is tedious and time consuming (Jeffries et al, 1991; Desurvire et al, 1992; Preece et al,

2002), lacks a task definition methodology (Jeffries et al, 1991) and is concerned with low level

details (Jeffries et al, 1991; Wharton et al, 1992). Originally intended for ‘Walk Up and Use

Systems’, this problem is compounded when the method is applied to more complex interfaces.

Wharton et al (1994) found that the cognitive walkthrough had to be refined and augmented in

different ways in order to be effective in industrial settings. Furthermore, the method only

focuses on a single attribute: ease of learning. Due to this narrow focus, Karat et al (1992) found

that the walkthrough failed to identify a significant number of relatively severe usability

problems. This finding is supported by Desurvire et al (1992). Cuomo and Bowen (1994)

showed that the cognitive walkthrough did not rate highly on the identification of recurring

problems compared to other UEMs, and identified more specific, rather than general, problems

(Jeffries et al, 1991). Difficulties have also been experienced during the process of applying the


cognitive walkthrough method. John and Packer (1995) criticised its lack of guidance about

how to pick tasks that are representative, while Ereback and Hook (1994) stated that richer

descriptions of users were required.

In an attempt to overcome the above limitations of the method, several researchers have

proposed refinements to various aspects of the cognitive walkthrough. The automated

walkthrough developed by Rieman et al (1991) tried to make cognitive walkthroughs less time

consuming and tedious by using an Apple HyperCard stack to prompt the evaluators with

relevant questions only and provide a space for recording the results. The automated version

enabled evaluators to answer most questions with a single click of the mouse or a brief text

entry and maintain a dynamic description of the users’ current and active goals (Rieman et al,

1991). However, no formal evaluations of the automated version have been carried out.

Spencer (2000) adapted the cognitive walkthrough method to make it useful in the interactive

development environment of a large software company. He did this by reducing the number of

questions and limiting discussions between evaluators after each question. This reduced the

overall time it took to complete the evaluation to 2.5 hours, but had negative effects on the level

of detail available for analysis. In contrast, Lavery and Cockton (1997) extended the number of

questions to include the feedback provided by the system. They suggested that in addition to the

four core questions about each action sequence, the following questions also be addressed:

• Will the user perceive the feedback?

• Will the user understand the feedback?

• Will the user see that progress is being made towards solution of their task in relation to

their main goal and current subgoals?

Finally, Blackmon et al (2002) recently proposed a cognitive walkthrough for the web (CWW)

which evaluates how well web sites support users’ navigation and information search tasks. The

model underlying CWW, CoLiDeS (Comprehension-based Linked model of Deliberate Search), is based on Latent Semantic Analysis. It aims to objectively estimate the degree of semantic similarity between generalised representative user goal statements and the heading and link text on a web page (ibid). Although the authors report successful outcomes, CWW is still

relatively new and has not been comprehensively tested and validated as yet.

2.8.1.4 Heuristic Walkthroughs

A special type of expert-based UEM that lies across two of Virzi's (1997) categories

(resource-constrained methods and cognitive walkthroughs) is the heuristic walkthrough. Sears

(1997) proposed the heuristic walkthrough method to combine the benefits of heuristic

evaluation and the cognitive walkthrough, arguing that the UEM provides more structure than a

heuristic evaluation, but is less rigid than a cognitive walkthrough. The heuristic walkthrough is

a two-pass process directed by a prioritised list of user tasks, Nielsen's (1994a) usability heuristics and a set of "thought-focusing" questions (Sears, 1997). In Pass One, the evaluators explore

the list of tasks in any order using the "thought-focusing" questions as a guide. During Pass Two, having gained a task-oriented introduction to the system in the first pass, evaluators use the list of usability heuristics to explore aspects of the system and look for usability problems.

In both passes, any problems are documented and assigned severity ratings. Initially the two

passes are done individually. This is followed by a meeting of the evaluators, at which they

agree on a single rating for each problem found.

Sears (1997) claims that the heuristic walkthrough is effective and easy to learn and apply. In a

controlled study, he was able to show that heuristic walkthroughs are more thorough than

cognitive walkthroughs because they found more problems, and more valid than heuristic

evaluations because they found fewer false positives (i.e. issues that were not actually usability

problems). It appears that some benefits have been derived from combining the features of a

heuristic evaluation and a cognitive walkthrough. The task-focused aspects of a cognitive


walkthrough are merged with the flexible free-form evaluation of a heuristic evaluation which

allows evaluators to find a variety of problems, while avoiding false positives (Sears, 1997).

However, combining the best of the cognitive walkthrough method and the best of the heuristic

evaluation method does not necessarily imply that all the problems associated with these two

methods have been eliminated. The heuristics may still require an experienced evaluator and do

not enable a comprehensive evaluation of the interface as a whole. The cognitive walkthrough

has been simplified to incorporate only the task-oriented aspects into the heuristic walkthrough; however, this also means that the UEM does not benefit from a full analysis of the user's mental

processes. Finally, Sears’ (1997) own study showed that most serious problems in an interface

would be found only if four or five evaluators used any of the three UEMs (heuristic

walkthrough, cognitive walkthrough and heuristic evaluation).

2.8.1.5 Group Design Reviews

Group design reviews can be distinguished from resource-constrained methods and usability-

expert reviews by the number of evaluators involved. Unlike the latter two categories which

involve evaluators working independently, group design reviews, as the name implies, involve

multiple evaluators working together in a single session (Virzi, 1997). The size and composition

of the group can vary depending on the system, however the goal of the evaluation process

remains the identification of potential usability problems. Two well-known UEMs which fall

into this category are both variants of the walkthrough method. They are the cognitive

jogthrough and the pluralistic walkthrough.


Cognitive Jogthroughs

Rowley and Rhoades (1992) developed a less time consuming version of the cognitive

walkthrough and aptly called it a jogthrough. The difference between a walkthrough and a

jogthrough lies in the allocation of formal roles of evaluator, presenter, moderator and recorder

to participants in the process. These participants then meet together and undertake the

evaluation as a group. This group evaluation session is recorded on a videotape and a software

package is used to log relevant events in real time during the process. The video recording and

log are then synchronized. The benefits of this are twofold: all the comments made during the session are recorded, not just the key decisions, and the evaluation proceeds more quickly and efficiently, with three times as many problems being found in the same amount of time (Lewis & Wharton, 1997; Mack & Nielsen, 1994).

Pluralistic Walkthroughs

Although similar in purpose to a cognitive walkthrough, the pluralistic walkthrough differs in

that it is not concerned with the mental processes of users (Karat & Bennett, 1991), it does not

utilise guiding questions and it “combines” three types of participants in a single walkthrough:

real users, developers and usability experts (Bias, 1994). All of the participants are asked to

assume the role of the user, depending on the target user group. The pluralistic walkthrough

begins by creating a scenario in the form of a series of hard copy screens for each user task that

is to be examined. Each scenario represents a single linear path through the interface. The

participants are then presented with these scenarios and a set of instructions, and briefed about

the system. The evaluation then proceeds in the following manner:

1. Each participant is asked to write down the sequence of actions they would take to

complete a specified task. The participant must specify how he/she would move from

one screen to the next. This step is done individually.


2. Once each participant has written down the sequence of actions, the walkthrough

administrator announces the “right” answer (Bias, 1994). The participants then

articulate their responses and discuss possible usability problems. The users present

their responses first to avoid any influence by the other participants. The usability

experts follow with their results and the developers offer their comments last.

3. This process is then repeated with the next task, and continues until all the scenarios

have been assessed. After each task, participants are asked to complete a short usability

questionnaire.

This process reveals the five defining characteristics of a pluralistic walkthrough: involving

three types of participants in a single walkthrough; presenting hard-copy panels (screens) in the

same order in which they would appear online; asking all the participants to assume the role of

the target user; writing down the sequences of actions in as much detail as possible; and

following a specific order during the discussion of the results (Bias, 1994).

According to Bias (1994) the pluralistic walkthrough method yields increased efficiency and

utility by generating new and valuable results. Although Bias and Mayhew (1994) claim that a

cost-benefit analysis of the pluralistic walkthrough method is likely to be positive, this would

only apply to those tasks which are actually selected for the evaluation. Another benefit of a

pluralistic walkthrough is the availability of early performance and satisfaction data because it

relies on low fidelity instruments (hard-copies of screens), which are accessible early in the

design lifecycle. With its strong focus on users’ tasks, this UEM appears to work well in

multidisciplinary team environments because it provides developers with immediate feedback

directly from the users (Bias, 1994; Preece et al, 2002).

However, the pluralistic walkthrough suffers from several limitations. It does not have a strong

theoretical basis but relies on four “convergent thrusts” in usability: early and iterative testing,

participatory design, co-ordinated empathies and the integration of evaluation and design (Bias,


1994). The nature of the process itself gives rise to a number of limitations. The entire interface

cannot be evaluated because it is not feasible to simulate all the possible actions on hard-copy, and the process becomes even more problematic if there is more than one "right answer", i.e. more than one correct sequence of actions to complete a task. This contrasts sharply with the cognitive walkthrough method, which is based on the theory of learning by exploration. Furthermore, there

is no provision made for errors and error recovery because the emphasis is on the correct

sequence of actions. To counter these limitations, Nielsen (1993) suggests that the pluralistic

walkthrough is best suited to the evaluation of traditional text-based interfaces which have pre-

defined sequences of screens for each sub-task. Finally, Bias (1994) also confirms that the

pluralistic walkthrough progresses as slowly as the slowest of the participants and could

potentially lead to behavioural changes if the users become less critical because the developers

are present.

2.8.2 Model-Based UEMs

Model-based UEMs, as the name suggests, include methods that use predictive models to

provide measures of user performance on specific tasks with a particular interface (Preece et al,

2002; Virzi, 1997). Since they usually only require a specification of the system’s functionality

and a list of proposed user tasks, model-based UEMs are particularly useful in situations where

it is not possible to do any user testing (Preece et al, 2002; Preece et al, 1994). However, model-

based UEMs do not identify usability problems. Instead they are used to find answers to

specific, low-level questions. For example, the keystroke level model (discussed in Section

2.8.2.1) is used to estimate the time it takes to perform a task. This data can then be used to

calculate the training time and guide the training documentation. It also identifies those stages

of the task that take the longest to complete or result in the highest number of errors (Olson &

Olson, 1990).


Preece et al (1994) classify models as either single-layer (i.e. having a flat representation) or

multiple-layer. The latter are complex in nature and beyond the scope of this thesis. The most

well-known of the former is the keystroke level model developed by Card et al (1980). The

keystroke level model is considered to be a derivative of the Goals, Operators, Methods and

Selection Rules (GOMS) family of models as described by Card et al (1983; 1986). However,

Virzi (1997) argues that GOMS is not an empirical usability inspection method because it does

not involve subjects and does not lead directly to the identification of usability problems.

Nevertheless, according to Gray and Salzman’s (1998) definition of UEMs, it is concerned with

usability evaluation. Therefore, the keystroke level model will be discussed below.

Another model-based UEM is the Model-Mismatch Analysis (MMA) method developed by

Sutcliffe et al (2000). Based on Norman’s (1986) theory of action model and the walkthrough

methodology (Lewis et al, 1990), it is used to determine the causes of usability problems. The

MMA method will be described in Section 2.8.2.2.

2.8.2.1 The Keystroke Level Model

The keystroke level model is a single layer model which deals with short user tasks (usually

single commands) and has relatively simple user operations embedded in the sequence of the

task (Preece et al, 1994). It is the simplest of the GOMS family of models (John & Kieras,

1996). The model is used to calculate task performance times for experienced users which

provides the evaluator with an idea of the minimum expected performance times (ibid). The

model contends that the total time required to execute a task (TExecute) can be described in terms

of the following four physical-motor operators:

• Keystroking (TK) – the time it takes to press a single key or button;

• Pointing (TP) – the time it takes to point to a target on a display with a mouse or another

device;

• Homing (TH) – the time it takes to home the hands on the keyboard;

- 53 -

• Drawing (TD) – the time it takes to draw a line using a mouse.

Two other operators that are used to describe TExecute are the time it takes to mentally prepare to

complete an action (TM) and the time it takes for the system to respond (TR). The total execution

time is the sum of the time for each of the above operators. This is represented as follows:

TExecute = TK + TP + TH + TD + TM + TR

Card et al (1980) analysed the findings of empirical research to determine approximate standard

times for the above operators. However, since then, further research by Olson and Olson (1990)

has refined the operators, while other researchers have customised them in applying the model

in practice (Haunold & Kuhn, 1994).
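As an illustration of how the formula is applied, the short sketch below predicts the execution time for a hypothetical task using approximate operator times commonly cited from Card et al (1980); the task decomposition, the assumed system response time and the 0.20 s keystroke value (an average skilled typist) are assumptions made for this example only.

# Approximate operator times in seconds (indicative values only).
T_K = 0.20   # keystroke (average skilled typist)
T_P = 1.10   # pointing at a target with a mouse
T_H = 0.40   # homing the hands between keyboard and mouse
T_M = 1.35   # mental preparation
T_R = 0.50   # assumed system response time for this example

# Hypothetical task: mentally prepare, point to a text field, home to the
# keyboard, prepare again, type a five-character code and press Enter.
operators = [T_M, T_P, T_H, T_M] + [T_K] * 6 + [T_R]

t_execute = sum(operators)
print(f"Predicted execution time: {t_execute:.2f} s")   # approximately 5.90 s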

The main benefits of the keystroke level model lie in its simplicity (Card et al, 1980) and the

numerical data collected which enables evaluators to compare alternative designs and assess the

most efficient ones (Preece et al, 2002). However, a major limitation of the keystroke model is

that it assumes all human information processing activity to be contained in the above six

primitive operators (John & Kieras, 1996). It does not take into consideration any other user

characteristics, including their previous knowledge, and it fails to account for contextual factors

which may affect the execution time. It is also restricted to a single aspect of performance – the

time it takes to complete a task (Card et al, 1980). This alone is not a reliable indicator of the

usability of the interface, nor does it identify the problems at the interface. According to Card et

al (1980), in order to apply the method, the evaluator must be an expert, the task must be a

routine one and the performance itself must be error free because the model makes no

provisions for errors.


2.8.2.2 Model Mismatch Analysis (MMA) Method

The MMA evaluation method was proposed by Sutcliffe et al (2000) to provide guidance for

actually diagnosing the underlying causes of user errors. The method is an extension of the

walkthrough evaluation method based on Hollnagel’s (1993) concept of observed problems and

causes of human errors. It consists of two phases. The first phase involves analysing

observations of user-system interaction in order to find the causes of usability problems (or

“genotypes”) based on surface manifestations of failure (“or phenotypes”). To do this, Sutcliffe

et al (2000) have developed taxonomies of observed usability problems and their causes. To

begin this phase, the authors propose analysing observed critical incidents and breakdowns, as

per Monk et al’s (1993) definition. In the MMA method, these incidents and breakdowns have

been elaborated into the following five categories of phenotype contexts:

1. Start of a task or sub-task;

2. Action selected or initiated;

3. During action execution;

4. Action completed;

5. Action completed, task not complete.

Whenever users encounter problems in these contexts, notes are made and supported by video

recordings. Once this has been completed, and the observed user problems have been

categorised by phenotype context, heuristics are used to map the phenotypes to causes or causal

explanations for the usability problems. The authors have developed a taxonomy of genotype

causes for this purpose. This taxonomy is based on Norman’s (1986) theory of action model.

Norman (1986) developed the model for the purpose of “understanding the fundamental

principles behind human action and performance” (p. 32) and, although it is not an evaluation

model per se, it allows evaluators to explore users’ actions as a precursor to identifying usability

problems. For example, Cuomo (1994) applied Norman’s model to assess the usability of

graphical, direct-manipulation style interfaces.


In the second phase of the MMA, users are asked to verbalise the steps they would “expect to

perform manually and then recall the evaluation task from memory, without looking at the

system” (Sutcliffe et al, 2000, p. 44). The data collected is used to elicit a generic user model of

the task which is then compared to a model of the system by walking through the task sequence.

This is referred to as a mismatch analysis and used to determine how well the system model

supports the users’ task. In addition, Sutcliffe et al (2000) use data collected during the second

phase of the MMA to examine how much of the system model the users have learned.

The results of applying the MMA method to the evaluation of two systems by the authors

(Sutcliffe et al, 2000) indicate that the method is easy to use, even by non-HCI experts with

minimal training. There is no doubt that the MMA method also provides evaluators with a

useful tool for understanding, classifying and mapping observed usability problems

(phenotypes) to their causes (genotypes). However, the same study showed that the second

phase of the MMA method was time-consuming and problematic because it required judgement

to derive a generic user model for the mismatch analysis. Breakdowns in the first phase arose

when the same usability problem was attributed to two different causes, bringing into question

the reliability of the method. Finally, the MMA method is purely a diagnostic tool: it does not actually identify usability problems, but only analyses problems that have been detected by other UEMs.

2.8.3 User-Based UEMs

As the name suggests, user-based methods directly involve users in the evaluation process,

although their role is limited. Instead of experts or models being used to predict user

performance and behaviour, user-based UEMs are based on observing users interact directly

with the system. User-based UEMs are distinguished on the basis of the location where this

observation usually takes place. Field studies are user-based methods that are situated in the

users’ natural environment. However, they are not as common as user-based methods that are


applied in controlled environments such as laboratories. They include formal experiments and a

less rigid version of formal experiments generally referred to as usability or user testing. It is the

latter that is considered to be the de-facto standard for usability evaluation (Landauer, 1995).

2.8.3.1 Formal Experiments

A formal experiment is an empirical UEM used during the evaluation process where evaluators

wish to test a specific hypothesis in order to predict the relationship between two or more

variables. This type of UEM is most useful in situations where the users’ low-level motor,

perceptual and cognitive activities are being studied (Rubin, 1994). In formal experiments, a

specific hypothesis is formulated and tested by manipulating the independent variable(s) to

determine the effects on the dependent variable. This is often done in a controlled environment

such as a laboratory. Formal experiments are used mostly to compare which of two designs is

better on a specific attribute or dimension. For example, evaluators can propose a hypothesis

such as: “A user will select the incorrect menu item more often if the menu is located on the left

hand side of the screen instead of on top of the screen”. The independent variable being tested is

the location of the menu on the screen, while the dependent variable is the number of times the

incorrect menu item is selected. Using formal experiments implies that the evaluators must be

knowledgeable about the scientific experiment method, including sampling and random

selection of participants, the allocation of participants to conditions and familiarity with

statistical techniques to analyse the predominantly quantitative data collected by this UEM.
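To illustrate the kind of statistical analysis such an experiment requires, the sketch below compares the two hypothesised menu placements using hypothetical error counts and an independent-samples t-test from SciPy; the data, the sample size and the conventional 0.05 significance threshold are illustrative choices rather than prescriptions from the sources cited.

# Between-subject comparison: does menu placement affect the number of
# incorrect menu selections? (Hypothetical data for illustration only.)
from scipy import stats

errors_left_menu = [5, 7, 6, 8, 9, 5, 7, 6]   # menu on the left of the screen
errors_top_menu  = [3, 4, 2, 5, 3, 4, 2, 3]   # menu at the top of the screen

# Independent-samples t-test on the dependent variable (error count).
t_stat, p_value = stats.ttest_ind(errors_left_menu, errors_top_menu)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Menu placement appears to affect selection errors.")
else:
    print("No statistically significant difference detected.")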

Despite having the advantage of being able to derive causal conclusions about specific

variables, formal experiments are not practical for making design decisions because they are

resource intensive and complex to carry out. As a result, this UEM is not widely used in

practice. The focus on a specific feature of a system interface implies that the results of formal

experiments are too low-level to support the design process. The evaluators must have a

sufficient and representative sample of users readily available and be able to control a number


of independent variables. Also, the quantitative data collected from formal experiments is not

rich enough, in that it only indicates the extent of a cause and effect relationship between two

variables. It does not support evaluators in determining how to fix problems and make

improvements to the design (Rubin, 1994), nor does it necessarily result in usable and

competitive products (Good, 1989). Good (ibid) argues that this is because the goals are not

grounded in customers’ real needs and experiences.

2.8.3.2 Field Studies

Unlike the formal experiment, a field study is undertaken in the users’ natural environment,

which is why this type of UEM is also known as contextual field research. A field study usually

takes place towards the end of the design lifecycle when a high fidelity prototype or near

complete version of the system is available. However, this also implies that it is often too late to

make changes to the system (Rubin, 1994) and the outcomes of the evaluation are more useful

as input into the next version of the system. While formal experiments tend to focus on specific

and narrow features of the interface, a field study emphasises broader usability issues (Good,

1989). Evaluators can employ a variety of ethnographic techniques to collect data during a field

study, including observation, interviews and surveys. This is usually done while real users are

doing real tasks over a longer period of time.

Field studies are beneficial primarily because the user is observed interacting with the system in

a realistic context. Furthermore, the interaction usually involves the user engaging in a real task

using the system. It may also involve several users co-operating to complete a task which

enables evaluators to assess the system in the context of interpersonal communication. This

provides evaluators with a richer and more detailed understanding of the system in a context of

use. Finally, some usability issues emerge only after the prolonged use of a system. Since field

studies tend to be more long term than formal experiments, evaluators are able to observe users’

ongoing experiences with a system (Good, 1989).


However, despite their benefits, field studies suffer from a number of limitations. One of the

main problems associated with field studies is the lack of frameworks for data collection and

analysis. Due to the fact that field studies are carried out in the natural environment, the nature

of the data collected is highly unstructured and descriptive. Recording this type of data is

problematic, which minimises its effectiveness in the design process (Rubin, 1994). The users’

natural environment is also more complex than a laboratory, making it difficult to control.

Unlike a laboratory, it is dynamic (Preece et al, 2002) and plagued by high levels of noise and

interruptions (Dix et al, 1998). The complexity of the environment has implications for the

reliability of the data collected because the evaluator must ensure that only the relevant data is

being recorded. This is a challenging task because of the high volumes of data the evaluator

must deal with and filter. In such a situation, it is easy to overlook important information and events. It also introduces a potential bias into the field study because judgements about which data are relevant rest with the evaluator. Finally, despite the natural surroundings, a field study

can also be somewhat intrusive since an evaluator is present. To avoid this, some field studies

employ unobtrusive methods such as software logging, an automated way of recording the

users’ interaction with the system.

2.8.3.3 Usability (User) Testing

Although used loosely by researchers and practitioners alike, the term usability testing refers to

“a process that employs participants who are representative of the target population to evaluate

the degree to which a product meets specific usability criteria” (Rubin, 1994, p. 25). Sometimes

referred to as user testing (although this term is less favoured because it has negative

connotations of testing users instead of systems), usability testing emerged from the classical

experimental methodology (ibid) and is distinguished from expert- and model-based UEMs by

the direct involvement of users. This makes it the most fundamental, mainstream UEM

(Nielsen, 1994b; Hartson et al, 2001) and the de-facto standard for system evaluation (Landauer,


1995), widely adopted and used by researchers and practitioners worldwide (Holleran, 1991).

Usually carried out in the controlled environment of a usability laboratory but less scientific

than formal experiments, the goal of usability testing is to collect quantitative and qualitative

data about an interface so that any identified problems can be fixed. However, the objectives,

scope and scale of usability tests vary greatly. Rubin (1994) identifies four types of usability

tests: exploratory, assessment, validation and comparison tests. Exploratory tests are used to

evaluate the effectiveness of preliminary design concepts early in the design lifecycle;

assessment tests are the most commonly used and evaluate the usability of lower-level

operations and aspects of an interface; validation tests are used to determine how a system

compares to a pre-defined usability standard or benchmark; while comparison tests can be used

in conjunction with any other previous three types to compare two or more alternative designs.

Regardless of the type of usability test being conducted, the usability testing method follows the

same commonly accepted methodology described by various authors and practical usability

testing handbooks (Rubin, 1994; Dumas & Redish, 1993; Nielsen, 1993; Wixon & Wilson,

1997; Monk et al, 1993). The basic steps in the usability testing process will be described in the

next section. Since the system users participate in a usability test, the terms “user(s)” and

“participant(s)” will be used interchangeably in the describing the usability testing process.

Usability Testing Process

The usability testing process begins with a definition of the problem statement or the test

objectives. Despite having roots in the formal experiment method, usability testing does not

define hypotheses due to the diverse organisational and human constraints involved and the

qualitative nature of usability testing goals (Rubin, 1994). Defining the test objectives is part of

the larger process of developing a test plan which acts as a blueprint for the test. A usability test

plan includes the following elements (Rubin, 1994):

• A statement of purpose;

• A problem statement or list of test objectives;


• A profile of the users;

• A detailed description of the method (i.e. how the test will be carried out);

• A list of tasks that the participants will perform during the test;

• A description of the test environment and the equipment used;

• The role of the test monitor or facilitator (optional);

• An overview of the evaluation measures (the data to be collected);

• A summary of the sections to appear in the final test report.

At this stage, specific roles can also be assigned to the evaluators involved in a formal usability

test, including facilitator, data logger, timer, video recording operator, product and/or technical

expert and test observer (Rubin, 1994).

The second step involves selecting and recruiting a sample of representative user participants.

Depending on the circumstances, it is possible to do a random selection. However, at other

times representative users may not be readily available. Nevertheless, this is the most critical

element of the testing process (Rubin, 1994). This step begins by profiling the target population

(Dumas & Redish, 1993). Rubin (1994) proposes using the characteristics shown in Figure 2.12

(adapted to system evaluation) to derive a generic user profile with the caveat that the make-up

of the profile will depend on the system being evaluated. Dumas and Redish (1993) suggest a

similar list of characteristics.


Figure 2.12 Generic user profile (adapted from Rubin, 1994)

The other aspect of selecting participants is choosing the sample size, or the number of

participants to test. According to Rubin (1994) this number will depend on several factors,

including the degree of confidence in the results that is required, the available resources, the

availability of participants, the duration of the test session and the time required to prepare for

the test. Rubin (1994) recommends a minimum of four to five participants, while Dumas and

Redish (1993) claim that a usability test typically includes six to twelve participants. Nielsen

and Landauer (1993) proposed a mathematical model for finding usability problems depending

on the number of participants involved. They propose that the number of usability problems that

have been found at least once by i participants is

Found(i) = N(1 – (1 – λ)^i)


where N is the total number of problems in the interface, and λ is the probability of finding the

average usability problem with a single, average participant. In their study, the mean number of

problems (N) was 41 and the mean probability of finding any problem with a single participant

(λ) was 31%. Accordingly, a sample of four to five participants is sufficient to find 80% of the

usability problems if the likelihood of the problem detection ranges from 30% to 40% (Virzi,

1992; Nielsen, 1994a). Involving additional participants results in fewer and fewer new

usability problems being found (Virzi, 1992; Lewis, 1994). More recently however, Woolrych

and Cockton (2001) have disputed the Nielsen and Landauer (1993) formula and provided

evidence to show that the formula “can fail spectacularly to calculate the required number of

test users for a realistic web-based test” and is suitable only for detecting simple problems (p.

105). They argue that this arises because of the “unwarranted” assumptions made by Nielsen

and Landauer (1993) regarding individual differences in problem discovery.
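For illustration, the short computation below evaluates the Nielsen and Landauer (1993) formula using the values they report (N = 41, λ = 0.31); the percentages that result are simply the arithmetic consequence of those figures and show the diminishing returns from adding participants.

# Problem-discovery curve: Found(i) = N * (1 - (1 - lambda)**i)
N = 41      # mean number of problems in the interfaces studied
lam = 0.31  # mean probability of one participant finding a given problem

for i in (1, 3, 5, 10, 15):
    found = N * (1 - (1 - lam) ** i)
    print(f"{i:2d} participants: {found:4.1f} problems ({found / N:.0%} of the total)")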

It is also necessary to divide the user sample into subgroups depending on the differences

between participants’ characteristics (Nielsen, 1993). This can be done based on the

participants’ role or their computer experience. In order to define the subgroups, each

characteristic must be clearly identified and quantified (Dumas & Redish, 1993). For example,

participants who have been using a computer for less than three months would be categorised as

novices.

Participants for a usability test can be recruited from a variety of sources, including internally

(from the organisation), through employment agencies, market research firms, networking and

newspaper advertisements (Rubin, 1994; Dumas & Redish, 1993). An additional consideration

is user compensation. It is conventional to compensate participants for their time and this

usually takes the form of a nominal payment, a gift voucher/certificate or a small token such as

a T-shirt.


The most labour intensive step in usability testing is the preparation of the test materials (Rubin,

1994; Dumas & Redish, 1993). This involves developing a screening questionnaire, an

orientation script for the participants, a background questionnaire, the data collection

instruments, a nondisclosure agreement and video consent form, a pre-test questionnaire, a set

of task scenarios, any required training materials, a post-test questionnaire and a debriefing

guide. These materials will be described subsequently in the context of the usability test.

(Questionnaires are a standard and established technique for collecting data from users and are

used widely in combination with different UEMs, including usability testing, therefore they will

not be discussed separately.)

The actual usability test usually takes place in a specially designed laboratory. However, this is

not a prerequisite for a usability test (Dumas & Redish, 1993). There are a large number of

laboratory configurations and set-ups, depending on the degree of formality (Wixon & Wilson,

1997), the resources available and the location of the participants. A more informal usability test

may involve the evaluator being present in the same room with the participant and using a

simple video camera to record the participants’ actions. A classic usability laboratory set up is

shown in Figure 2.13 (Rubin, 1994). It includes standard equipment such as video cameras, a

one-way mirror and specialised hardware and software for automated logging and monitoring

the use of the system. However, there exists a wide range of usability laboratory layouts in

practice. An entire issue of the journal Behaviour & Information Technology edited by Nielsen

(1994b) was devoted to usability laboratories, providing an insight into the complex operations

and configurations of laboratories in thirteen different organisations including IBM, SunSoft,

Philips, Microsoft, etc. Portable or travelling laboratories are also used in mobile usability

testing. Furthermore, usability laboratories are not only used for physically conducting tests, but

also for remote usability testing which involves monitoring system usage anywhere in the world

using network hardware and software as described by Hong et al (2001), Hartson et al (1996)

and Hammontree et al (1994).


Figure 2.13 A typical usability laboratory (Rubin, 1994, p.56)

If the objective of the usability test is to compare two or more systems, prior to the test,

participants are involved in one of two ways depending on whether between-subject or within-

subject testing is being carried out (Nielsen, 1993). Between-subject testing is the simplest form

whereby different participants are used for the different systems. Using this approach, each

participant only takes part in one test session. This may be problematic owing to individual

variability between participants (ibid). A within-subject approach involves all of the users

testing all of the systems. While this overcomes the problem of individual variability, exposing

users to one system first means that they cannot be considered novices for the purposes of

testing the second system.
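The sketch below illustrates the two allocation approaches for a hypothetical comparison of two systems. The random assignment and the simple alternation of presentation order used here are illustrative devices only, not procedures prescribed by Nielsen (1993).

import random

participants = [f"P{i}" for i in range(1, 9)]
systems = ["System A", "System B"]

# Between-subject: each participant tests exactly one system.
random.shuffle(participants)
half = len(participants) // 2
between = {systems[0]: participants[:half], systems[1]: participants[half:]}

# Within-subject: every participant tests both systems; alternating the
# presentation order is one simple way of limiting carry-over effects.
within = {
    p: systems if i % 2 == 0 else list(reversed(systems))
    for i, p in enumerate(participants)
}

print(between)
print(within)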

To initiate the usability testing process, a preliminary screening questionnaire designed to select

appropriate participants who match the user profile is administered. This is usually done at the

first point of contact with a participant in order to determine his/her suitability before the

usability test. Once a potential participant has been screened, he/she is invited to the usability


laboratory where the actual testing process begins. An orientation script describing the test

objectives and what will occur during the test session is read to the participant. Following this,

the participant is required to complete a nondisclosure agreement and a video recording consent

form for ethical reasons. A background questionnaire is then presented to the participant to

collect data about his/her background and prior experience. A pre-test questionnaire may also be used to

assess the participant’s first impressions about the system before actual use of the system

(Rubin, 1994). The participant is then given a set of task scenarios to perform using the system.

Task scenarios are developed by the system evaluators based on their understanding of what

users’ jobs are and how they do the tasks that are part of the jobs (Dumas & Redish, 1993). Task

scenarios are intended to be representations of real work that the participants would engage in

using the system. Widely touted as the champions of scenario-based methodologies for system

development and the use of scenarios in the design and evaluation of systems, John Carroll and

Mary Beth Rosson (Carroll 2000a, 2000b, 2000c, 1995a; Carroll & Rosson, 1992; Rosson &

Carroll, 2002) define scenarios as informal narrative descriptions of human activities which

highlight the users’ goals and what they are trying to do with the system. Scenarios are intended

to be concrete yet flexible enough to be easily revised as the need arises, and written at multiple

levels from many different perspectives and for many purposes (Carroll, 2000c). Carroll (ibid)

argues that by using scenarios throughout the design and development lifecycle, instead of

documents such as functional specifications, designers are in a better position to understand and

build systems that support real human activities. He promotes the use of scenarios at all stages

of the development process, including the requirements specification, the prototype

development and the evaluation. The benefits of this approach are manifold. Scenarios help

designers reflect on the design process, evoke empathy for the users, raise questions for

designers to address, afford multiple views of an interaction and promote a work orientation,

rather than a system orientation (Carroll, 2000c). For some real-life examples of scenario use, see Bødker (2000).


Rubin (1994) suggests that task scenarios in a usability test should include the following

information:

• The end result that participants are trying to achieve;

• The motives of the participants;

• Actual data and names (rather than general information);

• The state of the system when the task is initiated;

• Screen and document outputs that the participants will see while performing the

task.

A sample task scenario proposed by Rubin (1994) is shown in Figure 2.14.

Figure 2.14 A sample task scenario for setting up a printer (Rubin, 1994)

Dumas and Redish (1993) suggest that a good scenario is short, written in the users’ language,

unambiguous, provides sufficient information to complete the task and is directly linked to the

goals of the usability test. A scenario should not indicate to the participant how the task should

be carried out, only what must be achieved. Wixon and Wilson (1997) distinguish between two

types of tasks: result-based and process-based. Result-based tasks require users to replicate a

result (for example, providing participants with a drawing which they must replicate using the

drawing software being tested). Process-based tasks provide step-by-step instructions about

what is required. At times, prerequisite training may be necessary before a user can begin to

complete the task. This occurs when it is necessary for the participant to have a prescribed

minimum level of expertise to use the system or when an advanced feature of the system is

being tested.


While completing the task scenarios, a participant’s performance is recorded using video and

specialised recording, monitoring and logging equipment for analysis at a later stage. However,

it is important to determine at the outset which performance measures will be collected and

analysed. The measures can be quantitative and qualitative in nature and can include (but are not

limited to) any of the measures shown in Figure 2.15 (Rubin, 1994; Nielsen, 1993; Dumas &

Redish, 1993).

Quantitative performance measures:

• Time to complete a task
• Time spent navigating menus
• Time to recover from an error
• Time spent reading vs. working
• Number of errors
• Number of incorrect menu choices
• Number of times turned to manual
• Percentage of tasks completed successfully
• Frequency of accessing help facilities

Qualitative performance measures:

• Comments from users
• Explanations from users
• Descriptions of system usage
• Observations of frustration
• Responses to questions

Figure 2.15 Sample performance measures used in usability testing

Performance data can be collected using fully automated data loggers or online data entry by the

test facilitator. Some usability laboratories even have highly sophisticated equipment which

allows evaluators to track a participant’s eye movements (Goldberg et al, 2002).
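To illustrate how some of the quantitative measures listed in Figure 2.15 might be derived, the sketch below computes the time to complete a task, the number of errors and the number of help accesses from a hypothetical timestamped event log; the log format and event names are assumptions made for this example.

# Timestamps are in seconds from the start of the test session.
events = [
    (0.0,  "task_start"),
    (12.4, "error"),
    (20.1, "help_opened"),
    (35.8, "error"),
    (61.2, "task_complete"),
]

start = next(t for t, e in events if e == "task_start")
end = next(t for t, e in events if e == "task_complete")

measures = {
    "time_to_complete_s": end - start,
    "number_of_errors": sum(1 for _, e in events if e == "error"),
    "help_accesses": sum(1 for _, e in events if e == "help_opened"),
}
print(measures)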

While doing the task scenarios, participants may also be asked to “think aloud”. The Think

Aloud Protocol is a technique developed by Ericsson and Simon (1984) that can be used in

combination with other UEMs as well. It requires participants to think aloud or verbalize their

thoughts while they are performing tasks. This externalisation of the thought processes enables

evaluators to explore users’ problem solving strategies. Touted as the “single most valuable

usability engineering method” by Nielsen (1993), the Think Aloud Protocol generates a wealth

of qualitative data from the users which specifically relates to the interaction with a system at a

certain point in time. This is more useful than asking the users to rationalise their actions in


retrospect (Nielsen, 1993). The main problem with the Think Aloud Protocol is its

unnaturalness because it is difficult to verbalize thoughts and perform actions at the same time

(Rubin, 1994). Participants often fall into silences while concentrating on the task at hand and

obtrusive prompting is required from the facilitator. To overcome this problem some authors

(Hackman & Biers, 1992; O’Malley et al, 1984) have suggested the use of teams of participants,

making it easier to verbalize cognitive processes through conversations between the

participants. This technique is commonly referred to as co-discovery. However, Nielsen (1993)

and Rubin (1994) also argue that thinking aloud slows down the participant’s performance and

alters their problem solving behaviour. An alternative method for eliciting user comments

during a usability test is the Question Asking Protocol (Kato, 1986), which differs from the Think Aloud Protocol in that users are asked probing questions while performing the task. This technique is also known as Active Intervention (Dumas & Redish, 1993).

Following the completion of the task scenarios, the participant is asked to complete a post-test

or satisfaction questionnaire in order to collect preference data about the system. This is

subjective data about the users’ opinions and feelings which assists evaluators to understand the

system’s strengths and weaknesses (Rubin, 1994). Usually a Likert type rating scale is used to

answer questions or rank statements about the perceived ease of use, ease of learning, usefulness

and helpfulness of the system (Dumas & Redish, 1993), although specialised questionnaires,

such as the Website Analysis and MeasureMent Inventory (WAMMI) and the Software Usability Measurement Inventory (SUMI), have been developed. The SUMI questionnaire measures affect (or the

participants’ emotional response to the system), efficiency (the degree to which the system

enables the participant to complete a task in a timely and economical manner), learnability (the

ease of learning to use the system), helpfulness (the extent to which the system communicates in

a helpful way to resolve difficulties) and control (whether the system responds consistently and

whether its operations can be internalised by the participant). Other questionnaires available to

evaluators include the Questionnaire for User Interface Satisfaction (Chin et al, 1988), the

- 69 -

Computer System Usability Questionnaire (Lewis, 1995) and the Purdue Usability Testing

Questionnaire (Lin et al, 1997).
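As an illustration of how preference data from such questionnaires is typically reduced to subscale scores, the sketch below averages 1-5 Likert responses over the items belonging to each of the five SUMI-style subscales named above. The item-to-subscale mapping and the scoring procedure are invented for illustration; SUMI itself uses its own standardised item set and scoring method.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical mapping of questionnaire item numbers to subscales; the real SUMI
# item allocation and scoring procedure are not reproduced here.
SUBSCALE_ITEMS: Dict[str, List[int]] = {
    "affect":       [1, 6, 11],
    "efficiency":   [2, 7, 12],
    "learnability": [3, 8, 13],
    "helpfulness":  [4, 9, 14],
    "control":      [5, 10, 15],
}


def subscale_scores(responses: Dict[int, int]) -> Dict[str, float]:
    """Average the 1-5 Likert responses belonging to each subscale.

    `responses` maps an item number to the participant's rating.
    Items left unanswered are simply ignored.
    """
    scores = {}
    for subscale, items in SUBSCALE_ITEMS.items():
        answered = [responses[i] for i in items if i in responses]
        scores[subscale] = mean(answered) if answered else float("nan")
    return scores


# Example: one participant's (fictitious) ratings
ratings = {1: 4, 2: 3, 3: 5, 4: 2, 5: 4, 6: 4, 7: 2, 8: 5, 9: 3, 10: 4,
           11: 5, 12: 3, 13: 4, 14: 3, 15: 4}
print(subscale_scores(ratings))
```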

It is also appropriate to use the interview technique to solicit satisfaction and preference data

from participants. This can be done as part of the debriefing session at the end of the usability

test which involves a review of the test with the participant.

It is customary to conduct a full pilot usability test prior to going through the above process with

actual participants so that any “bugs” in the process can be eliminated (Dumas & Redish, 1993;

Nielsen, 1993). Depending on various organisational, system and human factors, the usability

testing process can take anywhere from one to twelve weeks to conduct (Dumas & Redish,

1994).

The final step in the usability testing method involves compiling and analysing the findings, and

making recommendations for improvements to the design based on the results. Quantitative

performance and preference data from questionnaires and logs is usually summarised and

analysed using descriptive or inferential statistics, while qualitative interview and think aloud

data is synthesised into meaningful categories and transcribed. Data can also be summarised

according to groups (for example, the data for novice users is summarised separately to the data

for expert users). The video recording is also analysed to identify errors that caused usability

problems at the interface. These problems must then be organised by scope and ranked in order

of severity (Dumas & Redish, 1993). This enables the evaluators to prioritise problems,

consider appropriate solutions and make recommendations in the final report which is

subsequently prepared. Due to the vast amount of data collected during a usability test, it is

possible to identify usability problems by triangulating data from several sources, as shown in

Figure 2.16. However, the notion of a usability problem in itself is a problematic one and will

be dealt with separately in Section 2.9.

- 70 -

Figure 2.16 Triangulating usability test data from different sources (adapted from Dumas & Redish, 1993)
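As a simple illustration of the quantitative side of this analysis step, the sketch below summarises hypothetical performance records separately for novice and expert user groups using descriptive statistics, as described above. The record fields and values are invented for illustration only.

```python
from statistics import mean, stdev
from typing import Dict, List

# Each record is one participant's result for one task; the field names are
# illustrative, not taken from any particular logging tool.
results = [
    {"participant": "P01", "group": "novice", "task_time": 184.0, "errors": 3, "completed": True},
    {"participant": "P02", "group": "novice", "task_time": 240.0, "errors": 5, "completed": False},
    {"participant": "P03", "group": "expert", "task_time": 95.0,  "errors": 1, "completed": True},
    {"participant": "P04", "group": "expert", "task_time": 110.0, "errors": 0, "completed": True},
]


def summarise_by_group(records: List[Dict]) -> Dict[str, Dict[str, float]]:
    """Descriptive statistics per user group, as used in the analysis step."""
    summary = {}
    for group in {r["group"] for r in records}:
        rows = [r for r in records if r["group"] == group]
        times = [r["task_time"] for r in rows]
        summary[group] = {
            "mean_task_time": mean(times),
            "sd_task_time": stdev(times) if len(times) > 1 else 0.0,
            "mean_errors": mean(r["errors"] for r in rows),
            "completion_rate": sum(r["completed"] for r in rows) / len(rows),
        }
    return summary


print(summarise_by_group(results))
```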

The usability testing process described above can be thought of as a “textbook” or traditional

usability testing method. There are variations in the method when it is applied in real world

evaluation projects by different organisations or teams, depending on their needs and

constraints. Some examples of the variations in the textbook method can be found in Sazegari

(1994), Wiklund (1994), Dayton et al (1994), Brooks (1994), Zirkler and Ballman (1994) and

Szczur (1994). It is important to note, however, that the deviations are minor and found in the

detail rather than the core steps in the usability testing process. The widespread use of usability

testing in research and industry has led to high levels of confidence in the results of this UEM

(Hartson et al, 2001). It is important to examine whether this confidence is justified by

discussing the benefits and disadvantages of usability testing.

Benefits of Traditional Usability Testing

Usability testing directly involves users in the evaluation process, which is a significant benefit

because primary data about the system usage and the interface can be collected. Primary data is

richer, more meaningful and expressive than data collected from experts, models and formal

experiments. This has led several authors (Hartson et al, 2001; Brooks, 1994; Pejtersen &

Rasmussen, 1997) to assert that evaluators have high levels of confidence in the results of

- 71 -

usability testing. This view is supported by studies (Hartson et al, 2001; Karat, 1997; Nielsen,

1992; Virzi, 1992; Jeffries et al, 1991; Bailey et al, 1992; Karat et al, 1992; Desurvire et al,

1992), which show that usability testing produces high quality results including uncovering

more usability problems, more high severity usability problems and more unique usability

problems than other UEMs. According to Rubin (1994, p. 27), usability testing is “an almost

infallible indicator of potential problems”. As such it is more effective than other UEMs

(Brooks, 1994; Pejtersen & Rasmussen, 1997) and more valuable for making specific design

decisions (Kantner & Rosenbaum, 1997).

Usability testing is usually carried out in the controlled environment of a laboratory using

specialised logging equipment designed to collect objective and quantifiable data which can be

analysed using statistical methods to measure user performance. Evaluators are able to control

different aspects of the testing process allowing them to evaluate specific usability features of

the system in detail. The use of video recording also assists in the data analysis stage because

evaluators can re-play and re-examine the tapes. This is especially beneficial in relation to field

studies which do not have the benefit of a controlled environment. Furthermore, it has been

argued that observing participants during a usability test often results in attitude changes

towards users because of the insights they provide (Dumas & Redish, 1993).

Although usability testing is a costly UEM to conduct, it does appear to be cost effective when

considered on a cost-per-problem uncovered basis (Dumas & Redish, 1993). Nielsen’s (1993)

calculations show that in a usability test with three users, the projected benefits were $413,000

compared to the costs of $6,000, suggesting a 69:1 benefit to cost ratio.
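The quoted ratio follows directly from the two figures given by Nielsen (1993):

$$\frac{\text{projected benefits}}{\text{cost of the test}} = \frac{\$413{,}000}{\$6{,}000} \approx 69$$

that is, roughly 69 dollars of projected benefit for every dollar spent on the test.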

Clearly, the benefits of usability testing are significant, especially when compared to other

UEMs that do not involve users. However, despite this, the traditional usability testing method

suffers from numerous problems and limitations.

- 72 -

Limitations of Traditional Usability Testing

Traditional usability testing has a number of limitations and problems that may occur at all

stages of the testing process described previously. These have been identified by various

research studies as well as practitioners’ insights and will be highlighted and examined next.

Wixon and Wilson (1997) have drawn attention to the problems associated with setting the

usability testing goals during the initial stages of usability testing. They claim that usability

testing goals may be too ambitious or too many in number. This in turn will create the need for

more elaborate and lengthy test procedures, as well as generating more data to analyse. This is

highly undesirable, considering that human factors experts spend on average 33.2 hours per user

when conducting usability testing (Jeffries et al, 1991). Increasing the complexity of the testing

process also increases the risk of failure (Wixon & Wilson, 1997). Conversely, arguments have

been put forward that usability testing cannot be used to evaluate every aspect of the system

(ibid), making the process deficient in collecting data about the whole system. Therefore, there

appears to be a trade-off between the complexity of the test objectives and the need to test the

system as a whole.

Holleran (1991) identified a number of methodological pitfalls in the usability testing process

associated with sampling. He states that our confidence in the generalisability of the usability

testing results increases with the number of participants who are representative of the target

system users. While the former issue has been addressed by the work of Nielsen and Landauer

(1993) and Virzi (1992), Holleran’s (1991) concern lies in the degree to which the participants

are representative of typical users. This issue is problematic for two reasons. Firstly, participants

may only be as representative as the evaluators’ ability to understand and categorise the target

user population (Rubin, 1994). Secondly, it may be difficult to identify, access and recruit

representative users for any number of reasons (e.g. remote location, unavailability,

unwillingness to participate, etc.).

- 73 -

Generating the scenarios to be used in the usability tests is another area of contention owing to

the complexity of the task. Scenarios are intended to reflect the activities that typical users

perform and their associated goals. However, in practice, the scenarios are designed to reflect

“specific aspects of the system that are representative of its capabilities” (Fath et al, 1994).

implication is that, rather than evaluating whether the system supports the users’ tasks, usability

testing is concerned with the evaluation of specific functions of the system that have been

implemented, regardless of the usefulness or relevance of these functions. Generally, scenarios

for usability testing are generated by the developers (Hartson et al, 2001) and/or evaluators,

which introduces an inherent bias into the evaluation. Jacobsen et al (1998) and Hertzum and

Jacobsen (2001) have termed this bias the ‘evaluator effect’. The presence of the evaluator

effect means that different evaluators who are testing the same system will detect substantially

different sets of usability problems because the use of their judgement is required. Hertzum and

Jacobsen (2001) claim that the evaluator effect “persists across differences in the system

domain, system complexity, prototype fidelity, evaluator experience, problem severity, and with

respect to detection of usability problems as well as assessments of problem severity” (p. 439).

Their findings show that, on average, the agreement between any two evaluators can range from 5% to 65%, which is an astonishing discrepancy. The CUE studies by Molich et al (1998) and

Molich et al (1999), which will be described in Section 2.9, have found even more divergent

levels of agreement.
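To make the notion of any-two evaluator agreement concrete, the following sketch computes a Jaccard-style overlap between each pair of evaluators' problem sets and averages it across all pairs. It is an illustrative implementation only, assuming each evaluator's outcome can be reduced to a set of problem identifiers; it is not claimed to reproduce Hertzum and Jacobsen's exact analysis procedure.

```python
from itertools import combinations
from typing import Dict, Set


def any_two_agreement(problem_sets: Dict[str, Set[str]]) -> float:
    """Average, over all evaluator pairs, of |Pi intersect Pj| / |Pi union Pj|."""
    ratios = []
    for (_, a), (_, b) in combinations(problem_sets.items(), 2):
        union = a | b
        if union:  # avoid division by zero when both evaluators report nothing
            ratios.append(len(a & b) / len(union))
    return sum(ratios) / len(ratios) if ratios else 0.0


# Fictitious example: three evaluators reporting partially overlapping problem sets
evaluators = {
    "E1": {"UP-01", "UP-02", "UP-05"},
    "E2": {"UP-02", "UP-03", "UP-05"},
    "E3": {"UP-04", "UP-05"},
}
print(f"any-two agreement: {any_two_agreement(evaluators):.0%}")
```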

The involvement of evaluators and/or developers in the task scenario generation process also

often results in the wrong terminology being used to describe users’ tasks. It is critically

important that scenarios are written in the language of the users (Fath et al, 1994) because the

way in which the users interpret the scenarios will ultimately have an effect on the results of the

usability testing. If, for example, a user is unable to understand the requirements of the task, this

indicates that the task is not representative and that the user may not be able to complete the task

because of interpretation difficulties rather than problems with the system.

- 74 -

The controlled environment and equipment of a usability laboratory, where most usability

testing takes place, is designed to enable evaluators to evaluate a system as objectively as

possible and generate quantitative data. However, despite the seeming objectivity enabled by the

testing environment, there are several subjective factors involved at crucial stages of the testing

process, including: the setting of usability goals, the generation of task scenarios and the

interpretation and analysis of results, all of which are carried out by expert evaluators. Also,

while the quantitative performance data collected during a usability test may be useful in

general terms, it may not be a reflection of actual performance because contextual influences

have not been factored into the data. The use of descriptive and inferential statistics does not

necessarily provide insights into whether a system actually works. Measures of statistical

significance used in analysing quantitative data are simply a measure of probability that the

results did not occur due to chance (Rubin, 1994). This type of ‘micro-level’ analysis does not

actually prove that the system is usable, or more importantly, useful.

A major limitation of a laboratory-based usability test is the unnaturalness and artificiality of the

environment (Hartson et al, 2001; Wilson & Wixon, 1997; Dix et al, 1998; Rubin, 1994). The

lack of contextual factors and unrealistic setting reduce the ecological validity of the evaluation

process. According to Thomas and Kellogg (1989), ecological validity refers to how close a

testing situation is to the real world. They identify four ecological gaps in laboratory-based

usability testing: work-context gap, user gap, task gap and artifact gap. The mismatch between

the users’ real context and a test context is referred to as a work-context gap. The work-context

gap does not only include differences in the physical context, but the job context and social and

cultural contexts of user activities. Rubin (1994) also points out that any form of usability

testing (in a laboratory or in the field) depicts only the situation of usage and not the situation

itself. Naturally, this will have an effect on the test findings because the situation in this instance

represents the context of the usage and the two are inextricably linked. Rubin (1994) has

suggested creating laboratory spaces which resemble the users’ real context, however, even with

- 75 -

these improvements, usability testing is not a perfect indicator of field performance (Nielsen &

Phillips, 1993). One of the reasons for this lies in the participants’ motives. Holleran (1991)

argues that participants in a usability test will persevere in doing tasks which they are unlikely

to do in their own context out of willingness to comply with the evaluators’ requirements.

Thomas and Kellogg (1989) have called this the user gap. It is also more widely known as the

Hawthorne effect (Mayo, 1933) and is amplified by the presence and use of video cameras in a

usability laboratory to record testing sessions.

The artificiality of the laboratory is also conducive to the use of brief and clearly defined task

scenarios which are usually completed within a specific time period. This is in stark contrast to

the ill-defined and ongoing activities that users actually engage in. This mismatch is termed the

task gap by Thomas and Kellogg (1989). Finally, the artifact gap refers to the differences

between short-term system usage during a test and long-term usage in the real world which

evolves over time. Some usability problems may only emerge after prolonged system usage

which is clearly not possible in the context of a laboratory.

Holleran (1991) also questions the validity and reliability of the data collected during usability

testing. Validity refers to whether the evaluators are actually measuring what they intend to

measure. Considering the problems associated with deriving representative task scenarios, the

validity of usability testing results remains a controversial issue. Furthermore, it is not always

possible to collect quantitative data or data for which suitable statistical measurements are

available (Holleran, 1991). Reliability is the extent to which the data produced in one usability

test will match the data produced in another if the testing is replicated under the same

conditions. The CUE studies (Molich et al, 1998; Molich et al, 1999) have shown the reliability

of usability tests to be quite low.

Usability testing is also plagued by logistical problems, such as participants failing to show up

at the appointed time (Wilson & Wixon, 1997), scheduling convenient times for the testing, and

- 76 -

problems with obtaining and maintaining specialised equipment. As such, usability testing is the

most expensive and time consuming UEM (Hartson et al, 2001; Jeffries et al, 1991; Kantner &

Rosenbaum, 1997). It also requires a working prototype and specialised expertise to conduct

(Jeffries et al, 1991) because, even though users take part in the process, their role is restricted.

Usability testing is strongly controlled by the evaluators at every stage (Mayhew, 1999) and

driven by the system (Sweeney et al, 1993). The users are reduced to being passive participants

who are controlled, observed, recorded and surveyed in order to collect performance data. The

evaluators decide what, how, when and where to evaluate and the users have no input into the

design of the evaluation or the interpretation of the results.

One of the most critical disadvantages of usability testing is that, like most other UEMs, it has

no theoretical basis in HCI. Loosely based on the formal experiment, the method emerged from

practice and has since been widely adopted and applied. Different approaches to usability

testing are used and then compared to find out which approach works better and why, without

any theoretical framework to allow this type of analysis. Holleran (1991) refers to this as

“dustbin empiricism”. Dustbin empiricism also raises serious questions about the results or

outcomes of usability testing – usability problems. The concept of a usability problem is in itself

notoriously abstract and difficult to define, as the following section will demonstrate.

2.9 The Problem with Usability Problems

Although we believe that it is possible to recognise a usability problem when we see one, there

is no reliable and widely accepted formal method for identifying and defining a usability

problem. In fact, the definition of a usability problem can be seen as the “holy grail” of HCI.

The term “usability problem” is commonly used to refer to any difficulties or trouble a user may

have while using the system, or any faults in the system which cause a breakdown in the

interaction. However, there are no explicit criteria defining when such a difficulty or fault

- 77 -

constitutes a usability problem (Hertzum & Jacobsen, 2001) and so any problem reported is

deemed to be a usability problem.

Different UEMs identify different usability problems and no one single UEM can be relied on to

uncover every possible usability problem. Ideally, different UEMs should identify similar

usability problems, however the reality indicates the opposite (Gray & Salzman, 1998). It is

quite possible for the same UEM to produce different outcomes when applied to the same

system several times. To illustrate this anomaly, Molich et al (1998) and Molich et al (1999)

conducted the Comparative Usability Evaluation (CUE) studies to determine whether

professional usability testing laboratories worldwide would detect the same usability problems

in two commercial systems. The results of the CUE studies were startling. There was not a single

usability problem that was reported by every usability laboratory. The differences in the rate of

problem detection were substantial, with 91% of the usability problems reported only once (i.e.

by one laboratory) in the first study, and 79% in the second study. The results of the two studies

indicate an astonishingly small overlap in the usability problems reported. Although

the number of problems reported is not a reliable indicator of the existence of real usability

problems, the CUE studies highlight the fact that a single UEM cannot identify the same

usability problems even after being replicated by nine professional usability testing laboratories

on the same interface.

Without a clear understanding of what constitutes a usability problem our ability to detect

usability problems is reduced and the integrity of the entire evaluation process is undermined. It

also poses a difficulty when comparing the results of different UEMs because there is no way of

knowing which outcomes are legitimate usability problems and which UEMs produce better

results. This situation is evident in the previous section that made mention of various UEM

comparison studies, none of which have reported the same (or even similar) results. However,

there have been various attempts in HCI to define, validate and rate the severity of usability

problems. These will be discussed briefly.

- 78 -

2.9.1 Defining Usability Problems

Hartson et al (2001) suggest that a usability problem “is real if it is a predictor of a problem that

users will encounter in real work-context usage and that will have an impact on usability” (p.

383). This definition excludes problems with a trivial impact, as well as those that occur in situations that the user would not encounter. Hartson et al’s (2001) definition implies

that, for a usability problem to be real, it must occur in the real work context of real users. Since

it is not always possible to study actual users in their actual work environment, the

characterization of a real usability problem remains elusive. However, Hartson et al’s (2001)

definition does have significant implications because it suggests that usability problems occur in

a context. Our current UEM methods, as described previously, do not provide the means to

identify usability problems in a context (with the exception of field studies which suffer from

other limitations listed in Section 2.8.3.2).

In their study of evaluator effect, Jacobsen et al (1998) used a set of nine problem detection

criteria to define a usability problem. Participants in the study were asked to specify three

properties for each usability problem detected: a free-form description of the problem, evidence

of the problem, and one of the nine criteria on the basis of which the problem was identified. These criteria are listed below (where a criterion was specific to the system under study, it has been generalised here), followed by a brief sketch of how such a problem record might be represented:

1. the user stating a goal and not being able to achieve it within three minutes;

2. the user giving up;

3. the user stating a goal and trying three or four different actions in order to achieve it;

4. the user performing a task that is different to the task specified by an evaluator;

5. the user expressing surprise;

6. the user making a negative statement or declaring a problem;

7. the user suggesting a design alternative;

- 79 -

8. the system failing;

9. the evaluator generalising problems into a new problem.
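As foreshadowed above, the following is a purely illustrative sketch of such a problem record, with the nine generalised criteria as an enumeration and the three reported properties as fields. All names and example values are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class DetectionCriterion(Enum):
    """The nine (generalised) triggers, numbered as in the list above."""
    GOAL_NOT_ACHIEVED_IN_TIME = 1
    USER_GIVES_UP = 2
    REPEATED_UNSUCCESSFUL_ACTIONS = 3
    TASK_DIVERGES_FROM_SCENARIO = 4
    USER_EXPRESSES_SURPRISE = 5
    NEGATIVE_STATEMENT = 6
    DESIGN_ALTERNATIVE_SUGGESTED = 7
    SYSTEM_FAILURE = 8
    EVALUATOR_GENERALISATION = 9


@dataclass
class UsabilityProblemReport:
    """The three properties evaluators were asked to record for each problem."""
    description: str               # free-form description of the problem
    evidence: str                  # e.g. a timestamp or excerpt from the video log
    criterion: DetectionCriterion  # which of the nine triggers was used


report = UsabilityProblemReport(
    description="Participant could not locate the export function",
    evidence="Video 2, 00:14:32-00:17:45",
    criterion=DetectionCriterion.GOAL_NOT_ACHIEVED_IN_TIME,
)
```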

The approach used by Jacobsen et al (1998) has merit in that it allows evaluators to indicate a

user or system event that triggers the recognition of a usability problem. However, the nine

criteria listed above are not exhaustive because they are user, system and task dependent. It is

quite possible for some usability problems to remain undetected if they cannot be linked to the nine criteria above. The heavy reliance on user events as triggers means that those usability problems which the user remains unaware of will not be discovered. For example, Spool et al

(1999) found that users may not be aware that they are in the wrong place while using web sites.

Also, the application of the criteria is a subjective affair, requiring the exercise of the

evaluator’s personal judgement. Furthermore, the nine criteria do not indicate whether the

usability problem is a real one (i.e. one that is likely to occur in the user’s real environment), or

exists just during the evaluation process. Thus the issue of context is raised once again.

2.9.2 Validating Usability Problems

Several attempts have been made to provide a validation framework for usability problems

generated by UEMs. The most well-known of these is a detection matrix proposed by Gray and

Salzman (1998) and shown in Figure 2.17. The matrix is based on two factors: whether a UEM

claims that A is a usability problem and whether A actually is a problem. If A is claimed to be a

problem, and it does actually exist as a problem, this is called a hit. If A is claimed to be a

problem, but it is not an actual problem, then this is termed a false alarm. Alternatively, if a

UEM claims that A is not a problem, but in truth it is, then this is known as a miss. Finally, if A

is not a problem and the UEM does not claim that it is, then this instance is labelled a correct

rejection.

- 80 -

Figure 2.17 Usability problem detection matrix (adapted from Gray & Salzman, 1998)
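The logic of the matrix can be expressed in a few lines, as in the sketch below, which sorts a set of candidate problems into the four categories. Note that the sketch presupposes access to the set of problems that "actually" exist, which, as discussed next, is precisely what evaluators do not have; the function and variable names are illustrative assumptions.

```python
from typing import Dict, Set


def classify_claims(claimed: Set[str], actual: Set[str],
                    considered: Set[str]) -> Dict[str, Set[str]]:
    """Sort candidate problems into Gray and Salzman's four detection categories.

    `claimed`    - problems the UEM reports
    `actual`     - problems that "really" exist (assumed knowable here for illustration)
    `considered` - the full set of candidate problems under discussion
    """
    return {
        "hit":               claimed & actual,
        "false_alarm":       claimed - actual,
        "miss":              (considered - claimed) & actual,
        "correct_rejection": (considered - claimed) - actual,
    }


candidates = {"A", "B", "C", "D"}
print(classify_claims(claimed={"A", "B"}, actual={"A", "C"}, considered=candidates))
# hit: {A}, false_alarm: {B}, miss: {C}, correct_rejection: {D}
```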

The problem with this approach lies in its implication that we are able to judge correctly the

absolute existence of a usability problem. Gray and Salzman (1998) state that this is somewhat

misleading because we do not “have access to truth as the final arbiter of usability problems” (p.

239). In fact, this particular approach does not shed any light on the definition of a usability

problem. Not only does it base itself on the claim made by a particular UEM that a usability

problem exists (which in itself is contentious because it is unclear whether a particular issue has

been identified correctly as a usability problem in the first place), but it also suggests that the

validity of this claim can be determined by checking if it conforms to an absolute external

reality (i.e. the truth). Since there is no way of objectively measuring absolute external reality,

the usability problem detection matrix is of limited usefulness.

2.9.3 Rating Usability Problems

Common sense dictates that in situations where a problem arises, the problem is first evaluated

to determine how severe it is and based on this, an appropriate course of action is taken. This

applies equally to situations where a usability problem emerges. Nielsen (1994a) views the

severity of a usability problem as the combination of three factors:

1. The frequency of the problem: how often the problem occurs (commonly or rarely);

- 81 -

2. The impact of the problem: how easily users can overcome the problem (easy or difficult);

3. The persistence of the problem: whether it is a one-off problem that can be resolved or a recurring problem that will bother users continuously.

Nielsen (1994a) suggests that the severity rating of a usability problem should be assessed by

asking the evaluators to complete a survey or questionnaire about each usability problem

following the evaluation. To assist in this process, a five-point rating scale is proposed by

Nielsen (ibid) and shown in Figure 2.18. The reason for completing the survey after the

evaluation is that the evaluators are focused on actually finding usability problems during

the evaluation. It also enables the evaluators to determine the severity rating of all the usability

problems after they have been identified and in relation to each other.

Figure 2.18 A five-point rating scale for the severity of usability problems (Nielsen, 1994a, p. 49)

Nielsen (1993) also proposed a combination of orthogonal scales to rate the severity of a

usability problem. This approach, shown in Figure 2.19 below, was based on two dimensions:

1. The proportion of users experiencing the problem (few or many users);

2. The impact of the problem on the users who are experiencing it (small or large).

Figure 2.19 Estimating the severity of usability problems using a combination of orthogonal scales (Nielsen, 1993)
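Purely as an illustration of how the two orthogonal dimensions might be combined into a coarse severity estimate, consider the sketch below. The cut-off for "many users" and the category labels are assumptions made here and do not reproduce Nielsen's published five-point scale.

```python
def estimate_severity(proportion_affected: float, impact: str) -> str:
    """Coarse severity estimate from the two orthogonal dimensions.

    `proportion_affected` - fraction of test users who experienced the problem (0-1)
    `impact`              - "small" or "large", as judged for the affected users
    """
    many_users = proportion_affected >= 0.5  # illustrative cut-off, not Nielsen's
    if impact == "large":
        return "major usability problem" if many_users else "moderate usability problem"
    return "moderate usability problem" if many_users else "minor usability problem"


# Example: 3 of 8 participants hit the problem, but its impact on them was large
print(estimate_severity(proportion_affected=3 / 8, impact="large"))
```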

- 82 -

Nielsen (1994a) found that the reliability of the severity ratings from a single evaluator was very

low, indicating that involving several evaluators in the rating process produces more dependable results. Even though Nielsen’s severity rating scale has been widely used in HCI

studies, it is flawed because it relies on the evaluators’ subjective experience and opinion to

judge how severe a particular usability problem is. A method which involves users is a more

appropriate means of assessing usability problem severity because users are able to indicate

first-hand how severely a usability problem affects them.

Clearly, the notion of what constitutes a usability problem is confusing and ambiguous. Not

only is there no clear-cut and consistent definition of what a usability problem is, but there is no

reliable method or technique of validating usability problems and rating their severity. Until

these issues are resolved, it remains virtually impossible to determine whether the outcome of a

UEM is a set of real and valid usability problems, and, consequently, to compare the outcomes

of different UEMs. The identification and rating of usability problems is, at best, a subjective

endeavour undertaken by the evaluator. It can be argued that the vagueness surrounding the

definition of usability problems is a symptom of the ambiguity of the concept of usability itself.

Only when an unambiguous and structured definition of usability itself is proposed and

formalised in research and practice will it be possible to identify real usability problems because

a clear understanding of the basic concept (usability) itself will result in improved usability

evaluation outcomes.

So far, this chapter has identified a number of issues in usability that remain unresolved or

problematic. Amongst these are the limitations of current conceptualisations of usability

(described in Section 2.2), the fragmented nature of UEMs and the limitations of each UEM

(described in Sections 2.7 and 2.8) and the problematic nature of defining usability problems

(Section 2.9). It is clear that the field of usability evaluation suffers from fragmentation, and that

our mainstream UEMs are plagued by serious flaws and deficiencies. All of these problems

represent challenges faced by the HCI community that need to be overcome if the integrity of

- 83 -

the discipline is to be maintained. The aim of this thesis is to contribute towards this endeavour.

As a first step an alternative conceptualisation of usability as being distributed has already been

proposed in Section 2.3. This alternative conceptualisation may contribute towards resolving

some of the challenges faced by the HCI community which can be directly attributed to the

traditional view of usability (as discussed in Section 2.2). The traditional view of usability

focuses only on the system, and not the system in use. Therefore, the second step involves

identifying specific UEM challenges and proposing the means to overcome them. This will be

done in the following section.

2.10 UEM Challenges

A UEM challenge is defined as a specific limitation associated with usability evaluation

methods in general. Following an examination of the weaknesses associated with different

UEMs discussed in previous sections, a list of UEM challenges has been derived. These

challenges are listed and described in the following sections. Each challenge applies to most

UEMs; however, where a challenge is specific to a particular UEM, this will be clarified.

2.10.1 UEM Challenge 1 [UEMC-1]: UEMs are system focused and technology driven

Most UEMs focus on the properties and functions of the system, rather than emphasising the

use of the system in a real world context. An understanding of the system from a technical

perspective is insufficient because it does not reveal the complexities of actually using the

system to achieve real goals. Understanding the use-situation is critical because it evolves over

time as new ways of using the interface emerge. Existing UEMs ‘divorce’ the system from the

use-situation in different ways. Expert-based UEMs do not usually involve users, cutting off the

most critical stakeholder group for the sake of a faster and cheaper evaluation process. Nielsen’s

(1994a) heuristics are arguably the most commonly used expert-based UEM because of their

simplicity and ease of use. However, they are also clearly system focused, requiring evaluators

- 84 -

to determine things like “the visibility of system status”, “consistency and standards” of the

system, and “aesthetic and minimalist design” of the system. Other heuristics are designed from

the system point of view. For example, “Does the system help users recognize, diagnose and

recover from errors?” focuses on errors that can be attributed directly to the system (e.g. if the

system crashes). It does not take into consideration errors or problems indirectly caused by the

system (e.g. the way in which the system formats a sales data summary report may affect how

useful the report is to sales managers).

While the user is involved in user-based UEMs (traditional usability testing being the most

common), the purpose of the evaluation remains to assess the functions of the system. As Fath

et al (1994) point out, scenarios in a usability test are reflective of those aspects of the system

that are representative of its capabilities. The starting point for designing the evaluation is the

system itself. Designers and evaluators jointly determine what the system can do (i.e. the

functions of the system) and then develop scenarios to test how well the system does what it can

do. Therefore, one of the most significant limitations of our UEMs is that they are not designed

to evaluate whether the system is actually useful and supports the users’ real-life activities.

To overcome this challenge, the HCI community needs a UEM that will:

• Be user-driven, instead of system-driven.

• Be focused on the usability and usefulness of the system, and not on the system itself.

• Provide a framework for analysing purposeful user activities that the system supports.

2.10.2 UEM Challenge 2 [UEMC-2]: Lack of understanding how users’ goals and motives are formed

Even when employing user-based UEMs which involve users directly, the goals and motives of

the users are not real. For example, during usability testing, participants are given a set of tasks

or scenarios to complete. In essence, the participants are given a pre-determined goal and a

- 85 -

motive to complete the tasks. This implies that the process of user goal and motive formation is

omitted in usability evaluation. Yet this is one of the most critical aspects of actually using a

system in real life. By giving the user an artificially prepared goal and motive to carry out a task

using the system, the entire evaluation process is undermined because it is not representative of

an actual use-situation.

To overcome this challenge, the HCI community needs a UEM that will:

• Reflect the actual goals and motives of users.

• Incorporate actual user motives and goals into the evaluation process.

• Assess usability in relation to the users’ motives and goals to determine the usefulness

of a system.

2.10.3 UEM Challenge 3 [UEMC-3]: Lack of user involvement in the design of the evaluation and analysis of the evaluation results

The user does not actively participate in the planning and design of any UEM. Instead, expert

and model-based UEMs attempt to predict and quantify user performance, while user-based

UEMs are only focused on observing the user carry out pre-defined tasks in relation to the

functions of the system. Instead of being at the core of the evaluation process, the user either

exists implicitly as a set of assumptions in expert-based UEMs or as a controllable subject in

user-based UEMs. This approach distorts the importance of the user because it does not

emphasise the fact that it is the user who actually has to use the system in real life and therefore

should have a central role in designing the evaluation. Furthermore, the results of an evaluation

are analysed and interpreted by the evaluators with no input from the users. This introduces a

significant bias into the evaluation because the users’ perspective in interpreting the results is

omitted.

- 86 -

To overcome this challenge, the HCI community needs a UEM that will:

• Place the user at the centre of the evaluation process.

• Provide a framework for involving users in the evaluation process in a well-managed

way.

• Provide a means for users and evaluators to collaborate effectively.

2.10.4 UEM Challenge 4 [UEMC-4]: Limited understanding of users’ knowledge

Current UEMs conceptualise users’ knowledge of the system in terms of a single continuum

ranging from novices to experts. This continuum is flawed because it is only in relation to the

users’ knowledge about the system. Users are categorised as novices or experts depending on

how much prior experience they have had with the system. This prior experience is usually

measured in terms of the length and frequency of system use. What is missing from this one-

dimensional view is an insight into the users’ knowledge of the activity that the system

supports. For example, a personal accounting system (e.g. Quicken) supports the activity of

preparing annual tax return statements. If a user is only superficially familiar with the activity of

preparing a tax return statement, this will affect the way that he/she perceives the personal

accounting system and perhaps lead to the perception that the system is difficult to use and

ineffective. Clearly, a user’s knowledge of the underlying activity or domain will have an

impact on the way the system is conceived and used, and therefore, needs to be considered when

conceptualising users’ knowledge of a system.

To overcome this challenge, the HCI community needs a UEM that will:

• Take into account the users’ knowledge of the activity that the system supports.

• Provide a framework for capturing this knowledge.

- 87 -

• Assess the usefulness of the system in relation to the users’ knowledge of the activity

that the system supports.

2.10.5 UEM Challenge 5 [UEMC-5]: Lack of means for including contextual factors in an evaluation

Currently, there is no means for effectively understanding and analysing contextual factors in an

evaluation. Systems are used in social, cultural and physical contexts. They do not exist in

isolation nor are they removed from these contexts. Thomas and Kellogg (1989) refer to this as

a work-context ecological gap between a test situation and the real world. Placing users in a

usability laboratory completely segregates them from their natural context or environment.

Providing the user with a task scenario to complete in a laboratory setting is akin to giving

someone a map with no landmarks or street names and telling them to use the map to get from

A to B. Although conducted in the users’ natural environment, field studies also suffer from a

limitation: the lack of HCI frameworks to analyse data collected from these types of studies.

The social context is particularly important because people rarely act alone. Human activities

are social and collaborative. A typical social context consists of various stakeholders that make

up the larger community of which the system is a part. For example, a community that might

use a travel booking system would consist of travel agents (users), travellers (customers),

airlines, hotels and tour operators (suppliers), as well as the system designers, IT support staff,

etc. Their interaction, activities and the outcomes of those activities all have an impact on the

way in which the system is used. It is necessary to incorporate the social aspect of users’

activities into an evaluation if meaningful and effective results are to be obtained.

To overcome this challenge, the HCI community needs a UEM that will:

• Identify all the different stakeholders in the community and their activities.

- 88 -

• Reflect the social nature of system use, and the collaboration between all the

stakeholders in system use.

2.10.6 UEM Challenge 6 [UEMC-6]: Lack of understanding how the system and system use co-evolve over time

Current UEMs evaluate the system and how it is used at a single point in time. This is usually

done by setting brief and clearly defined task scenarios for users to complete in a pre-specified

time frame. This approach has a number of problems. Firstly, it does not take into account how

the system has evolved, nor does it provide an insight into the historical development of the use-

situation itself. User activities are not static. They change over time in a non-linear and ill-

defined way as new systems or procedures are introduced. It is important to take this evolution

into account because current user activities contain elements or remnants of previous use-

situations. UEMs in their current form, ‘freeze’ the system and the use-situation while the

evaluation takes place, yet in reality user activities are in a constant state of development and

the way in which a system is used changes as these activities develop.

To overcome this challenge, the HCI community needs a UEM that will:

• Explain how systems and user activities develop and co-evolve over time.

• Identify remnants of previous system use in current use-situations, explain why they are

still there and their implications.

• Evaluate the usefulness of systems in relation to ongoing and changing activities over a

prolonged period of time.

- 89 -

2.10.7 UEM Challenge 7 [UEMC-7]: No common vocabulary for describing evaluation processes and defining evaluation outcomes

The previous discussion has already highlighted the differences and problems in labelling UEM

processes and outcomes. Terms such as “goals”, “tasks”, and “techniques” are used to mean

different things in different evaluation methods. Similarly, and even more disturbingly, there is

no single standard for defining the outcome of evaluation – a usability problem. None of the

UEMs described in Section 2.8 provide a clear conceptualisation of what constitutes a usability

problem, nor do they provide any advice to evaluators on how to identify real usability

problems. This is not only an issue of UEM integrity, but also a source of confusion when trying

to compare UEMs and their results in order to determine which ones are more effective.

Furthermore, it is a source of misunderstandings in communicating UEM outcomes, leading to

inconsistencies in distinguishing real usability problems and “false alarms”.

To overcome this challenge, the HCI community needs a UEM that will:

• Use consistent terminology to describe usability evaluation processes and outcomes at

all stages of the evaluation.

• Have an unambiguous definition of what constitutes a usability problem to enable

evaluators to clearly identify and rate usability problems.

2.10.8 UEM Challenge 8 [UEMC-8]: Lack of a theoretical framework

“It is generally accepted that the lack of an adequate theory of human-computer interaction

(HCI) is one of the most important reasons that progress in the field of HCI is relatively modest,

compared with the rate of technological development” (Kaptelinin, 1996, p. 103). By extension,

developing an adequate theory of UEMs is a prerequisite for advancing the field. Most current

UEMs are not derived from or underpinned by a theoretical framework against which systems

- 90 -

can be evaluated. The huge gap between HCI research and practice (Kuutti, 1996) can be

directly attributed to this problem. HCI practitioners have been slow to take up methods and

tools generated by HCI research primarily because they are not based on a theory that reflects

the way in which real users use real systems. The need for a fundamental theory is real and

immediate.

To overcome this challenge, the HCI community needs a UEM that will:

• Be based on a theoretical framework that enables evaluators to explain and analyse how

systems are used in real user activities.

• Allow evaluators to design evaluation processes that reflect actual system use.

2.10.9 Summary of UEM Challenges

As mentioned previously, the challenges described above can be directly attributed to the

traditional view of usability (as discussed in Section 2.2) which most UEMs rely on. Most of the

mainstream UEMs were developed in the 1980s and 1990s and only variations on existing

UEMs have been developed more recently (for example, automated versions of UEMs as

described by Tiedtke et al (2002) and Ivory and Hearst (2001)). This implies that the UEMs are

rooted in (and therefore measure) the traditional view of usability, which localises usability as an attribute or quality of the interface. The purpose of most UEMs is simply to measure this

attribute. For example, the use of a laboratory for usability testing allows evaluators to focus

only on the system and eliminate any external or contextual variables that would interfere with

the assessment of the system performance. Similarly, usability testing requires a working

prototype of the system so that task scenarios are generated after the functions of the system are

known, introducing a bias into the process because the tasks are based on the system functions,

- 91 -

rather than user activities. Instead of assessing whether the system does what the users want it to

do, current UEMs simply evaluate how well the system does what it has been built to do.

The key to overcoming the challenges identified above lies in expanding our horizons and

thinking about useful, rather than simply usable systems (Nardi, 1996b). Thinking about useful

systems implies that systems are designed to serve a purpose or to support user activities. The

HCI community needs a reliable and robust user-centred UEM that will evaluate how well those

activities are supported by the system in relation to the context in which they occur. The first

and second steps towards this UEM have been described above by defining an alternative notion

of usability (distributed usability) and identifying eight specific UEM challenges that need to be

addressed. The third step requires a theoretical framework to form the basis for such a UEM,

and at the same time to overcome the last challenge [UEMC-8]. Such a framework would have

to place the user and the user’s activities in context, rather than placing the system itself at the

centre of the evaluation process. Kuutti (1996) suggests that Cultural Historical Activity Theory

can provide this theoretical framework and also potentially address the ubiquitous fragmentation

of the HCI field. Before describing the basic principles of Cultural Historical Activity Theory,

however, it is important to situate it in relation to the existing HCI paradigm – Information

Processing.

2.11 Information Processing

Traditionally, HCI has based itself as a discipline in the domain of cognitive psychology, and

more specifically the information processing branch of this domain. The information processing

model (shown in Figure 2.20) is based on the notion of a symmetry between the internal

processes and functions of humans and computers. It equates the human brain to an information

processing device which transforms input stimuli collected by our senses into meaningful

symbols and appropriate output reactions. The model has been widely used in HCI as a means

of conceptualising users and user behaviours and maintained its stronghold on HCI methods

- 92 -

since Card et al (1983) first published their seminal work “The Psychology of Human Computer

Interaction”. However, critics of the information processing model (see for example: Winograd

& Flores, 1986; Whiteside & Wixon, 1987; Suchman, 1987; Ehn, 1988; Bannon, 1990; Bødker,

1991a; Greenbaum & Kyng, 1991; Kuutti, 1991; Kaptelinin, 1992; Draper, 1992) have argued

that the ideal of cognitive science and the use of experimental apparatus of laboratory oriented

classical psychology has been unable to truly penetrate the human side of human-computer

interaction (Kuutti, 1996) because humans do not process information in the same way that

computers do.

Figure 2.20 The information processing model (Card et al, 1983)

The information processing model reduces an individual to a set of cognitive processes that

include perception, attention, memory and knowledge representation. Based on assumptions

about these cognitive processes predictions are made about human behaviour. However, the

explanatory power of this model is low because it does not take into account the context in

which human behaviour occurs, therefore implying that human cognition is situated purely

- 93 -

inside the head. Critics have argued that this is not the case and that cognition can only be

understood in a larger social and cultural world (Lave, 1988). As a result, cognitive psychology

has proven to be of limited value to HCI. This is evident from the low adoption rate and the lack

of relevance of HCI research outcomes in actual practice. Landauer (1991) has been particularly

scathing in his censure, stating that “nothing remotely resembling what one would hope for as a

basis for HCI, nothing with substantial generality, power, and detail at the required level of

cognition has ever materialized” in a hundred years of the study of human behaviour.

It was Bannon (1991) who first formalised and recommended a new direction for HCI research

to bridge the gap between theory, experiment, system design and work settings. The elements of

this new direction are listed and briefly described in Figure 2.21.

Figure 2.21 Changes in HCI direction (Bannon, 1991)

The extent to which Bannon’s (1991) recommendations have been operationalised over a

decade later in current HCI research and practice remains questionable. However, the

recommendations can be viewed as an initial move towards a paradigm shift in HCI. If such a

shift were to take place, a whole new theoretical basis for HCI or a fundamental expansion of

- 94 -

traditional cognitive psychology would be required. Soviet based Cultural Historical Activity

Theory has emerged as a potential solution owing to its broad principles which offer a means of

addressing the key deficiencies of cognitive psychology.

2.12 Cultural Historical Activity Theory

Cultural Historical Activity Theory, or simply Activity Theory as it is widely known, has been

generating interest amongst HCI researchers since Bødker (1991a) published her PhD thesis,

“Through the Interface: A Human Activity Approach to User Interface Design”, a book now

widely regarded as the cornerstone of Activity Theory based research in HCI. It should be stated

at the outset that, despite its name, Activity Theory is not a fully articulated theory per se.

Instead, it is a multi-disciplinary philosophical framework for studying human behaviour and

practices as developmental processes interlinking individual and social levels from a cultural-

historical perspective (Kuutti, 1996). As such, Activity Theory is a powerful clarifying tool

(Nardi, 1996b), a multi-faceted magnifying glass that offers a set of principles and ideas for

understanding and explaining human activities in context. Brought to the West by Wertsch

(1981), Activity Theory has been widely adopted in European countries in the work of

Engeström (1987; 1990), Bødker (1991a), Kuutti (1991; 1992) and Kaptelinin (1992; 1996), as

well as Cole (1988; 1996), Draper (1992) and Nardi (1996a) in the US. It espouses “active

subjects whose knowledge of pre-existing material reality is founded on their interactions with

it” (Wertsch, 1981, p.10). Providing a complete overview of Activity Theory is beyond the

scope of this thesis. Instead, the following sections will present a brief historical overview of its

development, a discussion of its key principles and an outline of how it has been applied in HCI

to date.

- 95 -

2.12.1 Activity Theory – A Historical Perspective

Unlike their Western counterparts, Soviet psychologists have traditionally based their work on

Marxist theory. (It remains unclear whether Marx was a genuine muse for Soviet researchers, or

whether research was sanctioned only if it was grounded in Marxism as the prevailing official

political philosophy of the time.) Working within the boundaries of Marx and Engels’ notion of

dialectic materialism and inspired by the German philosophy of Kant and Hegel, Lev

Semionovich Vygotsky was concerned with developing a theory of psychology that would

explain "the way natural processes such as physical maturation and sensory mechanisms

become intertwined with culturally determined processes to produce the psychological functions

of adults" (Luria, 1979, p. 43). In other words, Vygotsky set out to answer the question of how

human consciousness is generated. He rejected the cognitivist stimulus-response theories of

behaviour as described by Pavlov (1960) and the reductionist approach of Wundt (1907), which

also broke down consciousness into a series of stimulus-response chains. Instead, Vygotsky

(1987, p. 282) proposed that:

“thought … is not born of other thoughts. Thought has its origins in the motivating

sphere of consciousness, a sphere that includes our inclinations and needs, our

interests and impulses, and our affect and emotion. The affective and volitional

tendency stands behind thought. Only here do we find the answer to the final “why”

in the analysis of thinking.”

Essentially, Vygotsky (1978) proposed that instead of directly interacting with the environment,

human interactions are realised in the form of “objective” activity and mediated through the use

of tools. This notion of tool mediation is central to Activity Theory and based on the ideas of

Marx (1915) who wrote in “Capital”: “an instrument of labour is a thing […] which the labourer

interposes between himself and the subject of his labour, and which serves as the conductor of

- 96 -

his activity” (p. 199). The difference between the traditional notion of unmediated behaviour

(which is typical of animals) and mediated human behaviour is shown in Figure 2.22.

Figure 2.22 Unmediated vs. mediated behaviour (Vygotsky, 1978, pp. 39-40)

Vygotsky (1978) argued that external human activities mediated by tools are the primary

generators of consciousness. The tools themselves are transmitters of cultural knowledge and

are socially derived devices which embody human practices. The human mind, therefore, is a

product of active socio-cultural interactions with the external environment using material and

psychological tools. Material tools are physical tools that allow humans to master nature (for

example, an axe) while psychological tools, which Vygotsky was particularly interested in,

enable humans to control their behavioural processes and influence their mind or behaviour.

Psychological tools include language, signs, symbols, maps, counting systems, laws, methods,

etc.

Even though Vygotsky focused on the independent exploration and interaction with the material

world as the basis for human consciousness, he emphasised the role of language and speech

(Wertsch, 1981). It was his belief that human behaviour is transformed by internalising cultural

sign systems, thus shaping mental processes and bringing about individual development

- 97 -

(Vygotsky, 1978). Herein lies the influence of the Marxist notion of dialectical materialism on

Vygotsky because it highlights the dual nature of tools. The set of tools that are available to

humans at a certain point in time contain the total societal knowledge and reflect the human

activities or practices of the time. As these activities evolve historically, so do the tools.

However, changes in the tools bring about transformations in the activities themselves.

Internalisation is the transfer of these external activities onto the internal plane, thus resulting in

the development of the individual’s mind.

To Wertsch (1981) the notion of internalisation, as “the ontogenesis of the ability to carry out

socially-formulated, goal-directed actions with the help of mediating devices” (p. 32), represents

Vygotsky’s most significant contribution. By rejecting prevailing theories of human behaviour,

Vygotsky, together with one of his most ardent followers, Alexander Luria, demonstrated that

the human mind is developed through the internalisation of external, social activities that are

mediated by the cultural tools and evolve historically. In current literature, Vygotsky’s central

notion of tool mediation has been represented as shown in Figure 2.23. This model can be

thought of as the first generation of Activity Theory.

Figure 2.23 Current conceptualisation of tool mediation in Activity Theory literature (the subject-tool-object triangle)

Much of Vygotsky’s work was too general, abstract and theoretically complex to apply in

practice. Furthermore, there is a lack of empirical evidence or raw data to support Vygotsky’s

research. Instead, his contribution is measured in philosophical terms as a set of broad principles

which have been interpreted in a variety of ways and which form a useful starting point for


explaining human behaviour. As a psychologist, Vygotsky oriented most of his work towards

the individual, and even though he situated the development of the mind in a socio-cultural

context, his model (as shown in Figure 2.23) did not incorporate collaborative aspects of human

behaviour. Vygotsky died of tuberculosis at the age of 38 before his work was completed. It was

his student, Alexei Nikolaivitch Leont’ev, who formalised Activity Theory and expanded it by

making a distinction between individual actions and collective activity. Leont’ev’s work marks

the second generation of Activity Theory.

Leont’ev (also spelt as Leontyev, Leontjew and Leontiev) took up Vygotsky’s work as a

departure point for further investigations towards an integrated “theory of activity” (Leont’ev,

1978). Rather than dealing with activities in a general, abstract sense, Leont’ev (ibid) proposed

dealing with specific activities. He defined a specific activity as a “system that has structure, its

own internal transitions and transformations, its own development” (p. 50). The structure is

shown in Figure 2.24. It appears at first glance to be a simple hierarchy of activity levels.

However, the apparent simplicity is misleading because the hierarchy is dynamic and flexible,

and offers a powerful means for analysing activities at three different levels. The structure can

be briefly described as follows: an activity (“deyatelnost”) is directed towards a motive (or

motives) based on a human need or needs. It is achieved through a series of goal-oriented

actions or chains of actions. The actions themselves consist of operations that are determined by

the available conditions at the time.

Figure 2.24 Hierarchical structure of an activity (Leont’ev, 1978)


Leont’ev (1978) proposes that all collective activities are directed towards a single object. The

object of an activity is its “real motive”, whether material or ideal, real or imagined.

Transforming the object into an outcome over an extended period of time is what motivates the

existence of an activity (Kuutti, 1996). Therefore, all activities exist to satisfy this “true motive”:

“An activity does not exist without a motive; ‘non-motivated’ activity is not activity without a

motive but activity with a subjectively and objectively hidden motive” (Leont’ev, 1978, p. 59).

Motive is a prerequisite for an activity to occur because it is the fundamental need

that a human is seeking to satisfy through the activity. This is done through a series of actions

(“deistvie”) or chains of actions.

Actions are conscious individual short-term processes or tasks that translate the activity into

reality and are subordinated to a goal or set of goals (Leont’ev, 1978). Actions and goals are

conscious representations of desired outcomes that are limited in duration compared to an

activity (which has a longer life span). An action has an intentional aspect (what must be done)

as well as an operational aspect (how it can be done). Any number of actions can be directed

towards the achievement of any number of goals, and the same action can be used to realize

different activities. A single action can also be carried out iteratively to achieve the desired goal.

These actions often become routine with practice and humans internalise them as operations.

Operations (“operatsiya”) are the means by which an action is executed. When an action is

performed repeatedly, it is internalised and suppressed to the unconscious level. Operations are

actions which do not require conscious thinking to carry out (e.g. holding a pencil when writing

a letter is an operation). Operations are usually crystallised into tools (Leont’ev, 1978) because

they take on the character of machine-like functions. Wartofsky (1979) refers to those tools

which correspond to the level of operations as primary tools. The operational aspect of activities

is defined by and dependent on the conditions under which it is performed. If these conditions


change, the operation may be transformed into an action or series of actions. The goals of an

action are embedded in those conditions.

Leont’ev’s hierarchy is not a hierarchy in the traditional sense because unlike other similar task

analysis models in HCI (e.g. GOMS by Card et al (1983)), it is dynamic. If the conditions of an

activity change, the activity itself is dynamically transformed, implying that conditions can

shape an activity. For example, the levels of the activity can move up from operation level to

action level when conditions change (this is referred to by Bødker (1991a) as

conceptualisation), and vice versa when the previous conditions are restored or the new action

is internalised (this is known as operationalisation). Hence, the use of arrows in both directions

in Figure 2.24. Regardless of these changes, however, the motive of the activity remains

constant. This brings to prominence an important facet of Activity Theory and one which

Leont’ev emphasised: the dynamic nature of the activity hierarchy implies that an activity

cannot be simply reduced or decomposed into a set of basic elements, such as actions and

operations because if the conditions change, the activity is reshaped at every level. Instead, the

activity hierarchy enables us to understand the relationships between the actions and operations

that constitute an activity. This flexible hierarchy reflects the nature of human activities better

than traditional HCI hierarchies (such as GOMS and task analysis) because it captures the way

humans respond to changes in conditions and redirect their focus appropriately. It is a means of

disclosing the characteristic internal relations and transformations of an activity and allows an

analysis of the activity from a variety of viewpoints to be performed (Wertsch, 1981).

To illustrate the activity hierarchy, Leont’ev (1978) used the example of driving a car. To a

learner, every operation is a conscious action which is directed towards a specific goal. For

example, changing gears is an action directed towards the goal of changing the speed of the car.

Through repetition, this action is internalised into an operation so that experienced drivers do

not consciously think about changing gears. An experienced driver is only consciously aware of

the action of driving the car with the goal of getting from A to B, while changing gears is


performed automatically. However, if a situation arose where the gear lever didn’t function as

expected (e.g. it got stuck) then the operation of changing gears would be transformed into an

action because the conditions of the activity have changed and driver now has to consciously

turn his/her attention to the gearbox. This does not affect the goals or the motive of the activity.

They remain unchanged. This example is illustrated in Figure 2.25.

Figure 2.25 The activity hierarchy – An example (showing the normal activity of an experienced driver and the activity of an experienced driver if the gear gets stuck)
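The dynamics of this hierarchy can also be summarised informally in code. The following Python sketch is purely illustrative: the class and method names are chosen for this example only and are not part of any formal Activity Theory notation. It encodes the driving example above, including the conceptualisation of an operation into an action when the conditions of the activity change.

class Operation:
    # An internalised, automatic unit of behaviour (e.g. changing gears).
    def __init__(self, name):
        self.name = name

class Action:
    # A conscious, goal-directed process realised through operations.
    def __init__(self, goal, operations=None):
        self.goal = goal
        self.operations = operations or []

class Activity:
    # A long-term formation directed towards a motive and realised through actions.
    def __init__(self, motive, actions=None):
        self.motive = motive
        self.actions = actions or []

    def conceptualise(self, action, operation):
        # When conditions change (e.g. the gear lever sticks), the operation is
        # raised to the level of a conscious action; the motive stays the same.
        action.operations.remove(operation)
        self.actions.append(Action(goal=operation.name))

# Normal activity of an experienced driver.
change_gear = Operation("change gear")
drive = Action("drive vehicle",
               [Operation("press clutch"), change_gear, Operation("turn steering")])
commute = Activity("get from A to B", [drive, Action("find parking")])

# The gear lever gets stuck: changing gears becomes a conscious action,
# but the motive of the activity remains unchanged.
commute.conceptualise(drive, change_gear)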

Leont’ev’s work made a significant contribution towards creating a more coherent theory of

activity. He reinforced the notion of meaningful activity as the basic unit of analysis and

illuminated the hierarchical structure of activities. Kuutti (1996) views these multiple levels of

an activity as a significant contribution of Activity Theory because it allows multi-level analysis

to take place. Unlike traditional HCI task analysis models (e.g. GOMS by Card et al (1983)), the

Activity Theory hierarchy provides a flexible structure to analyse what users do. According to

Kuutti (1996) there are no firm borders between activities, actions and operations. What is an

activity to one person, may be an action to another. Similarly, if the conditions in which an


activity is carried out change, an action may become an activity in its own right. As a rule of

thumb, to make a distinction between activities and actions, Kuutti (1996) suggests that

activities are longer-term formations, while actions are short-term processes that are goal-

oriented.

Despite his contribution, Leont’ev has been criticised for focusing too much on the activity

itself, and not enough on those who engage in it (Lektorsky, 1990, cited in Engeström et al,

1999a). Although he made a distinction between individual actions and collective activity, he

did not elucidate the role of individuals in the activity. This element remained unaddressed in his hierarchy and was clarified much later by a Finnish researcher, Yrjö Engeström.

Engeström’s work marks the beginning of the third (and current) generation of Activity Theory

which has the notion of an activity system at the centre. An activity system is an expanded

version of Vygotsky’s original model of mediated human behaviour (as depicted in Figure 2.23)

which incorporates the collective and collaborative nature of activities through the inclusion of

the subject, the community and the division of labour. The activity system consists of six

elements (or nodes) as shown in Figure 2.26.

Figure 2.26 The activity system (Engeström, 1987)


In Engeström’s (1987) model, the subject is the person or sub-group from whose point of view

the activity is analysed. Thus the subject can be an individual or a collective group. The object

is seen as the “raw material” or “problem space” towards which an activity is directed. The

object reflects the purposeful nature of human activities which are undertaken in order to satisfy

motives. The relationship between the subject and the object is mediated through the use of

material and psychological tools. The subject uses these tools to transform the object into an

outcome. Therefore, the historical relationship between the subject and the object is condensed

in the tools (Kuutti, 1996). The community consists of individuals or stakeholders who share the

subject’s object. The addition of this community element permits us to understand how an

activity is situated in a socio-cultural environment, and forms two new relationships: subject-

community and community-object. These relationships are mediated by the rules that regulate

the activity and the division of labour, respectively. The rules refer to both explicit and implicit

regulations, social conventions and norms which constrain the activity system. The division of

labour describes the community roles, or the horizontal division of tasks and vertical division of

power and status between the members of a community (Engeström, 1987). The elements or

nodes of an activity system do not exist independently of each other, nor is an activity system

reducible to these elements. They are interrelated and can only be understood as a systemic

whole.
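For readers who find a schematic summary helpful, the six nodes described above can be captured as a simple record. The following Python sketch is illustrative only: the field names mirror Engeström’s (1987) terms, while the example values (a student enrolment activity) are hypothetical.

from dataclasses import dataclass
from typing import List

@dataclass
class ActivitySystem:
    subject: str                   # person or sub-group from whose viewpoint the activity is analysed
    object: str                    # the "raw material" or "problem space" of the activity
    tools: List[str]               # material and psychological mediating tools
    community: List[str]           # stakeholders who share the subject's object
    rules: List[str]               # explicit and implicit regulations, conventions and norms
    division_of_labour: List[str]  # horizontal division of tasks, vertical division of power and status
    outcome: str = ""              # what the object is transformed into

# Hypothetical example: enrolling in courses through an online system.
enrolment = ActivitySystem(
    subject="student",
    object="enrolment in next session's courses",
    tools=["online enrolment system", "course handbook"],
    community=["other students", "faculty administrators"],
    rules=["enrolment deadlines", "prerequisite requirements"],
    division_of_labour=["students select courses", "administrators approve enrolments"],
    outcome="a confirmed program of study",
)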

The activity system is a heterogeneous, disturbance- and innovation-producing entity that is

constantly being renegotiated by the subjects as they construct the object and other system

elements in conflicting ways (Engeström, 1999). There is also movement between the elements

of the activity system (Engeström & Middleton, 1996). For example, when the object is

transformed into an outcome, it may become a tool and then later a rule. Similarly, a rule may

be re-interpreted and turned into an object or a tool. The activity system exists as one

component of a larger network of activity systems with which it interacts. These external

influences are appropriated into the activity system under study and subsequently modified into


internal factors. When an alien element is introduced into the activity system, an imbalance

occurs, resulting in so-called “contradictions” within and between the activity system nodes.

A contradiction is a “misfit within elements, between them, between different activities, or

between different developmental phases of a single activity. Contradictions manifest themselves

as problems, ruptures, breakdowns, clashes” (Kuutti, 1996, p. 34) in the activity system.

Initially, a contradiction emerges as an “individual exception” in the activity system and as it is

assimilated, it becomes the new “universal norm”, consequently causing the activity system to

develop (Il’enkov, 1977). A hierarchical system of contradictions is inherent to the activity

system, and Engeström (1999) identifies four levels of contradictions:

1. Primary contradictions occur within the elements or nodes of the central activity,

usually between the use value and exchange value of an element. As an example,

Engeström (1999) cites the primary contradiction in the object of a doctor’s work. The

contradiction manifests itself in the clash between a patient being viewed as a person

who needs to be helped (the use value of the doctor’s object), as opposed to a source of

revenue (the exchange value of the doctor’s object).

2. Secondary contradictions arise between the elements of the central activity system,

usually when a new external element enters the activity system. Following the previous

example, if a new medical instrument is introduced into a nurse’s work enabling the

nurse to perform tasks that are usually carried out by a doctor, a secondary contradiction

between the tool and the division of labour will emerge.

3. Tertiary contradictions take place between the object of the activity and the object of a

more culturally advanced activity, if a culturally advanced object is introduced into the

activity system. For example, if a new model for treating patients is adopted, problems

will arise as it collides with existing practices.

4. Quaternary contradictions occur between the activity and its ‘neighbouring’

activities, such as the tool-producing activity, subject-producing activity, etc. Engeström


(1999) uses the example of a doctor who practices alternative medicine referring a

patient to a hospital using the traditional medical approach.

All four levels of contradictions are shown in Figure 2.27, with each level represented by the

corresponding number (i.e. primary (1), secondary (2), tertiary (3) and quaternary (4)). In order

to understand how these contradictions cause an activity system to develop, it is necessary to

analyse the relationships within an activity and between activities, as well as the hierarchical

structure of the activity as explicated by Leont’ev (i.e. the goal directed actions and the

conditions under which an activity takes place).

Figure 2.27: Four levels of contradictions in a network of activity systems (Engeström, 1999)
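The four levels lend themselves to a simple classification, summarised in the following illustrative Python sketch. It merely encodes the definitions and the medical-work examples given above and is not part of Engeström’s own apparatus.

from enum import Enum

class ContradictionLevel(Enum):
    PRIMARY = 1      # within a single node, e.g. between use value and exchange value
    SECONDARY = 2    # between nodes, e.g. a new tool versus the existing division of labour
    TERTIARY = 3     # between the object and the object of a more culturally advanced activity
    QUATERNARY = 4   # between the central activity and its neighbouring activities

# Examples paraphrased from the medical-work illustration above.
examples = {
    ContradictionLevel.PRIMARY:
        "a patient seen as a person to be helped versus a source of revenue",
    ContradictionLevel.SECONDARY:
        "a new instrument lets nurses perform doctors' tasks (tool vs division of labour)",
    ContradictionLevel.TERTIARY:
        "a new model of treating patients collides with existing practices",
    ContradictionLevel.QUATERNARY:
        "an alternative-medicine doctor refers a patient to a hospital using the traditional approach",
}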

2.12.2 Basic Principles of Activity Theory

As described in the previous section, Activity Theory is a complex conceptual framework that

has evolved historically, and continues to evolve as it is applied in research and in practice. In


order to highlight the fundamental tenets of Activity Theory and explicate its theoretical

scaffolding, Kuutti (1996) and Kaptelinin (1996) provide a useful summary of the basic

principles on which Activity Theory is constructed. All of the principles are equally important

and cannot be understood in isolation from one another. Giving a comprehensive description of

each principle is beyond the scope of this thesis (particularly, as some of the principles continue

to be debated extensively in research communities), and for a more detailed discussion the

reader is referred to Kaptelinin (1996), Kuutti (1996), Cole (1996) and Wertsch (1981).

Principle 1: Activity as the Basic Unit of Analysis

Activity Theory provides an intermediate concept, that of an activity, to resolve the dichotomy

between studying individuals and social systems (Kuutti, 1991). This dichotomy has resulted in

research methods which either remove individuals from their natural social environment (e.g.

laboratory-based usability testing) or focus on analysing collective interactions at the expense of

the individual (e.g. field studies). By adopting the notion of an activity as the minimum

meaningful context for understanding individual actions, which takes place over an extended

period of time (Kuutti, 1996; Hasan, 2000), we can include the context in the unit of analysis,

yet still view the activity from an individual perspective: “the object of our research is always

essentially collective, even if our main interest is in individual actions” (Kuutti, 1996, p. 26).

Leont’ev (1981) uses an example of the bush beaters and hunters to demonstrate this principle.

A beater taking part in a primeval collective hunt directs his actions at frightening the animals

towards the hunters who are hiding in ambush. The action of frightening the animals itself

appears irrational and senseless unless it is understood as part of the larger context of the

hunting activity. This principle highlights the importance of studying human activities in

context, which is of direct relevance to HCI research and practice as discussed previously in this

chapter.


Principle 2: Object-orientation

The term object-orientation in the Activity Theory context should not be confused with object-

orientation in the software engineering domain. The two are homonyms. The principle of

object-orientation refers to the “objectified motive” of an activity (Christiansen, 1996). Leont’ev

(1981, p. 48) explains:

“The basic characteristic of activity is its object orientation. The expression

“nonobjective activity” is devoid of sense. Activity may seem to be without object

orientation, but scientific investigation of it necessarily requires discovery of its

object. In this regard, the object of activity emerges in two ways: first and foremost,

in its dependent existence as subordinating and transforming the subject’s activity,

and secondly, the mental image of the object, as the product of the subject’s detecting

its properties. This detection can take place only through the subject’s activity”.

The object, whether material (conscious) or ideal (unconscious), is of value in itself in that it

serves to fulfil some human need (Kaptelinin, 1992). Every activity is directed towards this

object and defined by it. Therefore, we are able to distinguish between activities according to

their object. Manipulating and transforming a shared object into an outcome (usually over some

extended period of time) is what motivates the very existence of a purposeful activity (Kuutti,

1996). As Leont’ev points out above, the object reveals itself only in the process of doing.

Consequently, the object is continuously under construction and manifests itself in different

forms for different participants of an activity (Engeström, 1990).

It should be noted that the object of an activity does not equate directly to the motive

(Kaptelinin, 2002). While different individuals participating in an activity may have different

motives for doing so, and the motives for carrying out an activity may change over time, this

does not affect the actual object of the activity. For example, if the object of a systems

development project is to build a system to make processing more efficient, the motives for

doing so may vary from cutting costs (from the managers’ perspective) to improving customer

service (from a marketing perspective). A single object gives the activity direction and purpose.


It is a “powerful sense maker” (Kaptelinin, 2002). However, at the same time the activity may

be polymotivated. Considerable debate still surrounds the notion of object-orientation in

Activity Theory. Most recently, at the Fifth Congress of the International Society for Cultural Research and Activity Theory (now the International Society for Cultural and Activity Research), a session titled “Perspectives on Objects of Activities” highlighted the problems

associated with interpreting the principle of an object as defined by Leont’ev and distinguishing

objects from motives, goals and outcomes.

Principle 3: Tool Mediation

Tool mediation is often considered to be the most fundamental of the Activity Theory principles

because it is the very notion on which Vygotsky based his original work. “The tool

mediates activity and thus connects humans not only with the world of objects but also with

other people. Because of this, humans’ activity assimilates the experience of humankind”

(Leont’ev, 1981, p. 56). An activity contains tools and these tools have a mediating role which

implies that the relations between the elements of an activity are not direct, but facilitated by an

intermediate entity (i.e. the tool). As mentioned previously, tools can be physical or

psychological. However, regardless of the type, all tools are transmitters of cultural knowledge

(Kaptelinin, 1996) or a historical residue of activity development (Kuutti, 1996). Tools are

created to carry out a specific activity at a certain point in time, and they embody the practice of

this activity at the time. Thus, tools are repositories of human activities which reveal themselves

fully only when in use (Bannon & Bødker, 1991). Tools have implicit goals built into their

internal structure and this shapes and transforms the external activity of individuals. When the

external activity is internalised, the tools influence and transform the internal mental processes

of individuals about the activity (Kaptelinin, 1996). However, while tools have the ability to

transform an activity, the activity itself affects the way in which the tools are used and therefore

shapes future designs of the tool. This reveals two important and interrelated characteristics of

tools: their dialectic nature and their historical development. The dialectic nature exhibited by

tools can be seen in their role as shaping an activity and at the same time being shaped by the


activity (tools and activities co-evolve). The way in which tools are shaped by the activity

implies that they are “historical devices which reflect the state of praxis” (Bannon & Bødker,

1991). In other words, they can be thought of as “crystallized knowledge” about an activity,

implying that in order to understand a tool in its most current form and how it is being used, it is

necessary to carry out a historical analysis of the tool and the activity (ibid).

Tools both enable and constrain an activity. The enabling aspect of a tool “empowers the subject

in the transformation process with the historically collected experience and skill ‘crystallized’

to it” (Kuutti, 1996, p. 27). It facilitates the activity by extending natural human abilities

(Kaptelinin, 1996). However, at the same time, tools have the ability to restrict the activity. For

example, by requiring students to enroll in courses using an online web-based information

system, we limit the activity to be only “from the perspective of that particular tool” (Kuutti,

1996, p. 27). This notion is incorporated in an attribute that is common to all tools: the

integration of tools into functional organs. “Functional organs are functionally integrated, goal-

oriented configurations of internal and external resources.” (Kaptelinin, 1996, p. 50). When

external tools, such as computers, are integrated into functional organs, they are perceived as an

attribute of the individual, implying that they naturally extend the individual’s abilities, thus

blurring the boundary between internal (based inside the human mind) and external (based in

the outer world) tools. In the context of HCI, this merging of internal and external tools is

particularly evident in expert users who use computers transparently, i.e. as a seamless

extension of their own abilities. For novice users, who are still learning how to use the

computer, the boundary between internal and external tools is most apparent. It is only by using

the tool repeatedly to carry out an activity that the boundary becomes less clear.

Principle 4: Hierarchical Structure of Activity

This principle has been described in detail previously and is based on Leont’ev’s (1981)

structure of an activity (as shown in Figure 2.24). At the highest level are activities which are

directed towards motives. Actions are at the next level. Actions are processes functionally


subordinated to activities and directed towards conscious, auxiliary goals. The actions are

realised through operations, at the lowest level. Operations are determined by the conditions of

the activity. Kaptelinin (1996) explains that, in order to understand human behaviour, it is

important to determine what level of the activity it is directed towards – the motive, goal, or the

actual conditions.

Principle 5: History and Development

The previous discussion highlighted another key principle of Activity Theory: historical

development. A system of inner contradictions (as described in Section 2.12.1) is inherent to an

activity and causes it to develop over time (Engeström, 1990). However, this development is not

linear or structured in a predictable pattern. It is irregular and discontinuous (Kuutti, 1996) as

foreign elements are introduced into and absorbed by the activity. As an activity develops over

time, remnants of older phases of the activity remain embedded in the development process

(ibid). These remnants are embodied in the mediating elements of the activity: tools, rules and

the division of labour. Therefore, in order to understand a current activity, it is important to

analyse its historical development. This historical context is essential for two reasons: activities

develop in an unpredictable manner, and it is not possible to clearly delineate between stages or

phases of an activity. Activities are dynamic and in a continuous state of evolution, with

development taking place at all the different levels of an activity (Kuutti, 1996). By analyzing

the elements (especially, the mediating elements) and the inner contradictions of an activity over

time, it is possible to gain an insight into this evolutionary development process and situate the

activity in its historical context.

Principle 6: Internalisation - Externalisation

Activities have a dual nature because they have an internal and an external side (Kuutti, 1996).

This nature is explicated in Vygotsky’s (1978) original model (see Figure 2.23) by the

relationship between the subject and the object. The subject transforms the object through the

use of mediating tools, while at the same time, the attributes of the object penetrate the subject


and transform his/her mind (Kuutti, 1996). This is referred to as the process of internalisation.

According to Leont’ev, during internalisation “processes are subjected to a specific

transformation: they are generalized, verbalized, abbreviated, and, most importantly, become

susceptible of further development which exceeds the possibility of external activity” (1974, p.

18). Leont’ev (1981) stresses that the process of internalisation is not just a simple transfer of

external activities to a pre-existing internal plane. It is the process by which the actual internal

plane is formed. Vygotsky (1978, pp. 56-57) describes the notion of internalisation as a series of

transformations:

“(a) An operation that initially represents an external activity is reconstructed and

begins to occur internally. […] (b) An interpersonal process is transformed into an

intrapersonal one. […] (c) The transformation of an interpersonal process into an

intrapersonal one is the result of a long series of developmental events.”

Vygotsky reveals several important unifying facets of his theory through the concept of

internalisation. Firstly, higher psychological functions (i.e. the mind) are developed by

internalising practical, external, tool-mediated activities. This is in direct contrast to cognitive

psychology which views the mind as generating activity, rather than being generated by it.

Secondly, the external activities which are internalised are social and collaborative in nature

which implies that higher psychological functions “originate as actual relations between human

individuals” (Vygotsky, 1978, p. 57). Kuutti (1996) refers to this as the assimilation of the

experience of humanity. Thirdly, higher psychological functions are generated in a

developmental fashion. Once internalised, the mental representations are externalised by

individuals in the process of carrying out mediated activities. The external activities are then

internalised and this cycle continues in an evolutionary fashion forever. Even though we speak

of internalisation and externalisation as two separate processes, they are similar in structure and

inextricably intertwined through mediating tools in the previously described notion of a

functional organ.


According to Kaptelinin (1996), in the context of computer use, there are several kinds of

functional organs. The most important of these is an extension of the internal plane of actions

(IPA). The IPA is an Activity Theory construct which refers to the ability of humans to perform

manipulations on an internal representation of external objects, prior to performing actions with

these objects in reality (ibid). The IPA is distinct from mental models because it does not refer

to a specific representation, but to an ability to create and transform representations. It is a

process rather than an entity on the mental plane. Through the process of internalisation, an

individual “acquires the ability to perform some actions ‘in mind’ and in this way avoids costly

mistakes and becomes free from the immediate situation” (Kaptelinin, 1996, p. 52). The IPA is

the system of mental structures that makes it possible to do this. Since computers enable users to

model, manipulate and evaluate objects, they can be thought of as extensions of the IPA (Figure

2.28). Two of the challenges faced by HCI are the lack of knowledge about the internalisation

process and the IPA, and the shortage of methods (or tools) to model internalised activities and

the computer as an extension of the IPA.

Figure 2.28 The computer tool as an extension of the internal plane of actions (Kaptelinin, 1996, p. 52)


Principle 7: Unity of Consciousness and Activity

Kaptelinin (1996) calls this the most fundamental principle of Activity Theory. Activity Theory

was born out of the perceived need by Soviet psychologists for a better way of explaining how

human consciousness is generated. Advocates of behavioural psychology view consciousness as

an outcome of the impact that external objects have on an individual through the senses.

However, Vygotsky (1978) viewed consciousness as emerging from an individual’s activities in

the material world. As such it could not be understood separately from man’s living, practical

connections with the surrounding, social world (Leont’ev, 1975, cited in Wertsch, 1981).

Activity Theory is the framework that enables us to understand how the process of

consciousness generation takes place. “Activity theorists argue that consciousness is not a set of

discrete disembodied cognitive acts (decision making, classification, remembering), and

certainly it is not the brain; rather, consciousness is located in everyday practice: you are what

you do. And what you do is firmly and inextricably embedded in the social matrix of which

every person is an organic part” (Nardi, 1996b, p. 7). This implies that the development of the

human mind can only be understood in the context of purposeful, social activities as described

above.

Together with Leont’ev’s (1981) hierarchy of activity levels and Engeström’s (1987) activity

system, these seven principles make up a powerful explanatory conceptual framework that has

been applied in a variety of disciplines. The use of Activity Theory in the study of organizations

and computer information systems has been widespread in the last decade, particularly in

Scandinavian countries. The work of Engeström has been concerned with formulating a cyclical

model of expansive learning (1987; 1990), the Developmental Work Research (DWR) method,

and more recently the concept of ‘knotworking’ (Engeström et al, 1999b). Hill et al (2001) have

applied Engeström’s DWR method to the study of organizations in New Zealand. Blackler

(1995) is well known for his application of Activity Theory in the field of Computer Supported


Collaborative Work (CSCW) and for proposing a theory of organizations as activity systems

(1993). Kari Kuutti (1991; 1996; 1999) has been an avid advocate of applying Activity Theory

in CSCW (Kuutti & Arvonen, 1992) and the Information Systems domain and has studied

organizational memory (Kuutti & Virkkunen, 1995), and information systems in networked

organizations (Kuutti & Molin-Juustila, 1998) using Activity Theory. For other sources of

research employing Activity Theory in the study of organizations and information systems, the

reader is referred to Star (1989; 1997), Korpela et al (2000; 2002), Bardram (1997), Ellison and

McGrath (1998), Fjeld et al (2002), Gifford and Enyedy (1999), Miettinen (1999), Lim and

Hang (2003), Cluts (2003) and Vrazalic (2001). Since this thesis is situated in the field of

Human-Computer Interaction, the following section will discuss the application of Activity

Theory to HCI research specifically.

2.12.3 Activity Theory in Human-Computer Interaction

Activity Theory has been widely touted as being ideally suited to the study of HCI because it

places the interaction of humans and computers in a socio-cultural context of activities, and

views computers as just another mediating tool. Activity Theory first appeared in HCI research

in the work of Susanne Bødker (1991a). Her work was instrumental in formulating a new

direction for HCI research. Bonnie Nardi (1996a) later paved the way for Activity Theory in

HCI in her aptly named edited book “Context and Consciousness: Activity Theory and Human-

Computer Interaction” consisting of a collection of articles demonstrating how Activity Theory

was being used in the study of HCI. Shortly after, Viktor Kaptelinin and Bonnie Nardi (1997)

took Activity Theory to the HCI community by presenting tutorials at conferences such as

CHI97. In 1999, Kaptelinin, Nardi and Macaulay developed the first Activity Theory based HCI

method: the Activity Checklist. Other researchers who have applied Activity Theory in HCI are

Olav Bertelsen (1998; 2000) and, most recently, Daisy Mwanza (2002). Describing all the

different studies of the above researchers is beyond the scope of this thesis. Therefore, only the

key issues raised in their research will be discussed in the following sections.


2.12.3.1 First Steps “Through the Interface”

If we are to consider, for a moment, HCI research as a purposeful mediated activity system, then

Bødker’s (1991a) work introduced “a series of contradictions into the activity system”, to use

Activity Theory terms. Bødker argued for a shift in HCI design because “a computer

application, from the user’s perspective, is not something that the user operates on but

something that the user operates through on other objects or subjects” (1991a, p. 1). She

proposed that design, from a theoretical perspective, needed to focus on human work activity

and use, instead of the application itself. To demonstrate how this can be done she introduced

the notion of breakdowns which are described as “unforeseen changes in the material

conditions” of an activity (1991a, p. 22). Breakdowns occur when an operation is

conceptualised. Conceptualisation is the opposite of operationalisation (i.e. when actions are

turned into operations). When a conflict occurs between the assumed and actual conditions for

an operation, this causes an individual to conceptualise an operation to the level of a conscious

action, therefore causing a breakdown. In the example used previously to demonstrate

Leont’ev’s activity hierarchy (Section 2.12.1), the vehicle’s gear getting stuck results in a

change of the material conditions of the activity. This causes the driver to conceptualise the

operation of changing gears to the level of a conscious action (i.e. it constitutes a breakdown).

Since it is the use of the tool that is operationalised, a conceptualisation occurs if the tool does

not perform as expected. In the case of HCI, the tool is the system or user interface.

Bødker (1991a, p. 40) distinguishes between three aspects of a user interface: physical, handling

and subject/object-directed aspects. Physical aspects provide support for operations toward the

computer application as a physical object; handling aspects provide support for operations

toward the computer application; and subject/object-directed aspects are the necessary

conditions for operations directed towards the object or the subjects dealt with through the

computer application. A well designed user interface will provide ideal conditions for all three


types of operations so that the user is able to focus on the objects or subjects that he/she is

working with. If this is the case, the user is said to be operating “through the interface” on the

objects or subjects, and not on the interface itself. Bødker uses the example of writing a

document to illustrate this notion. If an individual is using a text editor to write a document and

the interface of the text editor is a good one, the individual will eventually forget that he/she is

working with a computer and focus only on the object (i.e. producing the document). If, on the

other hand, the text editor becomes the object of the individual’s activity, then he/she is left

without a tool (Hasu & Engeström, 2000). The goal of HCI, then, is to ensure systems allow

users to work “through the interface”. In her book, Bødker (1991a) proposes two design

approaches to achieve this: the tools approach and the linguistic approach (p. 122). However,

she states that these approaches are not prescriptive and are also incomplete.

Bødker (1996) later provided a more concrete method of applying Activity Theory in HCI

design. To study computers as tools-in-use, she recommended analyzing breakdowns (as

described above) and focus shifts. A focus shift is a “change of focus or object of the actions or

activity that is more deliberate than those caused by breakdowns” (p. 150). Unlike breakdowns

which usually occur because the tool functions in an unexpected way, a focus shift is a result of

trying to articulate the “otherwise unarticulated” (ibid). For example, when trying to explain to a

novice driver how to change gears, the expert driver is articulating what is to him/her an

operation. Thus, the experienced driver is shifting focus from his/her ‘operationalised’

understanding of changing gears, to the level of an action or activity in order to explain

the operation of changing gears to the novice. The conceptualisation in this case (from operation to

action or activity) is not caused by a breakdown in the experienced driver’s interaction with the

car. Bødker (1996) then proposed a checklist of questions to carry out an analysis using focus

shifts and breakdowns. This checklist is shown in Figure 2.29.


Figure 2.29 A checklist for HCI analysis through focus shifts and breakdowns (Bødker, 1996, pp. 168-169)
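From an evaluation perspective, breakdowns and focus shifts are essentially events to be identified in recordings of computer use. As a rough illustration only (it is not Bødker’s checklist, which is reproduced in Figure 2.29), such events could be recorded as simple tagged observations, as in the following Python sketch based on the text editor example:

from dataclasses import dataclass

@dataclass
class FocusEvent:
    # One observed change of focus during computer use (illustrative structure only).
    timestamp: str       # position in the session recording
    kind: str            # "breakdown" (unexpected tool behaviour) or "focus shift" (deliberate)
    previous_focus: str  # what the user was working on "through the interface"
    new_focus: str       # where the user's attention moved, often the tool itself
    trigger: str         # the change in conditions that caused the shift

# Hypothetical observation based on the text editor example: the editor stops
# behaving as expected while a document is being written.
event = FocusEvent(
    timestamp="00:12:40",
    kind="breakdown",
    previous_focus="producing the document",
    new_focus="the text editor itself",
    trigger="the editor did not respond as expected",
)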

2.12.3.2 Formalizing Activity Theory in HCI: “Context and Consciousness”

Bødker’s (1991a; 1991b; 1996; 1997) work in the 1990s provided the foundation for applying

Activity Theory in HCI. (For more recent research, the reader is referred to Bødker (2000) and

Bødker and Buur (2002)). Bonnie Nardi followed Bødker’s lead in 1994 by initially using

Activity Theory to reflect on her own design practice, concluding that in studying task-

specificity in software design, Activity Theory would have provided a “well-articulated

conceptual apparatus and a core set of concepts” for analyzing the collected data (Nardi, 1992,

p. I-11). Later, she collated and edited a volume of articles by leading Activity Theorists aimed

at introducing Activity Theory concepts to HCI researchers in the West and demonstrating the

usefulness of these concepts to the study of HCI (Nardi, 1996a). In doing so she espoused the

potential benefits of Activity Theory as providing a common vocabulary for the HCI

community and expanding our horizon to think about useful rather than just usable systems.


Nardi (1996b) recognized that Activity Theory did not provide us with a readily available set of

methods and techniques to use. Instead, she argued that Activity Theory was a “powerful and

clarifying descriptive tool” (1996b, p. 7) and recommended four ways of embracing an Activity

Theory perspective in HCI design (1996c, p. 95):

1. Adopt a long-term research time frame in order to fully understand users’ objects.

Since activities are long-term formations whose objects are not instantaneously

transformed into outcomes (Kuutti, 1996) it is necessary to adopt a phased approach to

studying activities. Only this type of approach enables us to understand how objects

change over time and their relation to the objects of others.

2. Pay attention to broad patterns of activity rather than narrow episodes. A narrow

episode view is inherent to usability testing. Nardi (1996c) implies that it is necessary to

examine activities, rather than just goal-oriented actions in isolation.

3. Use a varied set of data collection techniques. The techniques Nardi (1996c) suggests

are based on collecting data directly from users (interviews, observations, video, etc.).

She warns against the reliance on a single technique or method.

4. Commit to understanding things from users’ points of view. Although this appears

to be a general principle of user-centred design, it is not actually explicated by Gould et

al (1985) as such. Rather than simply understanding who the users are, it is important to

view things from their perspective. This implies a more active role for the designer and

a better understanding of how users perceive an activity.

In the same book (1996a), Nardi compared the usefulness of Activity Theory to the study of

contextual factors in relation to Situated Action Models and Distributed Cognition. Her

conclusion was that, while all three approaches have merit, Activity Theory seemed to be the

richest framework for studying context:

“Aiming for a broader, deeper account of what people are up to as activity unfolds

over time and reaching for a way to incorporate subjective accounts of why people do


what they do and how prior knowledge shapes the experience of a given situation is

the more satisfying path in the long run.” (1996c, p. 94).

In the same volume, Kuutti (1996) proposes that Activity Theory can address the ubiquitous

fragmentation in the field of HCI and emphasizes three perspectives for doing so:

multilevelness, interaction context and development. Multilevelness affords HCI researchers and

practitioners the possibility of discussing issues belonging to different levels (activity, actions,

operations) within an integrated framework. The principles of Activity Theory also provide a

useful starting point for studying contextually embedded interactions. Finally, Activity Theory

is a framework capable of explaining the developmental and dynamic features of human

practices. For example, Kuutti (1996) proposes that the dynamics between actions and

operations should be used to show how new operations are formed and integrated into the

design of interfaces. While the recommendations by Nardi (1996c) and perspectives by Kuutti

(1996) are useful and provide a conceptual insight into how Activity Theory can be employed in

HCI practice, they are open to interpretation and fall short of offering a method of doing so.

Kaptelinin et al (1999) attempted to correct this.

2.12.3.3 Towards an Activity Theory Based HCI Method: The Activity Checklist

In an effort to operationalise Activity Theory, Kaptelinin et al (1999) proposed an analytical

tool shaped by the theory, simply called the “Activity Checklist”. The Activity Checklist is

intended for use in the early stages of system design or for evaluating existing systems and in

conjunction with other techniques (e.g. interviews). There are two versions of the Checklist: the

“design version” and the “evaluation version”. The two versions cover a series of contextual

factors which could potentially influence the use of the computer in a real-life setting, so that

designers can identify the most relevant issues that need to be addressed. The structure of the

Checklist reflects the basic principles of Activity Theory combined with four other principles

(p. 33):


1. Means and ends (Activity Theory principle: hierarchical structure of activity) – “the

extent to which the technology facilitates and constrains the attainment of users’ goals

and the impact of the technology on provoking or resolving conflicts between different

goals”;

2. Social and physical aspects of the environment (Activity Theory principle: object-

orientedness) – “integration of target technology with requirements, tools, resources,

and social rules of the environment”;

3. Learning, cognition and articulation (Activity Theory principle:

externalisation/internalisation) – “internal versus external components of activity and

support for their mutual transformations with target technology”; and

4. Development (Activity Theory principle: development) – developmental

transformation of the foregoing components as a whole.

The end result is a Checklist consisting of four sections (with sample questions to ask)

corresponding to the above four perspectives (the full Checklist is too large to replicate here so

the reader is referred to the original research). It is up to the designers to choose sections which

are relevant to a systems development project. Kaptelinin et al (1999) demonstrate the use of the

Checklist in the design of Apple Data Detectors. Although the Checklist appears to be a

practical and handy tool for designers, the authors do not establish a clear link between Activity

Theory principles and the Checklist, since the sample questions are too general.
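The correspondence between the four sections of the Checklist and the Activity Theory principles they reflect can be restated compactly. The following sketch is a plain summary of the list above and does not reproduce the Checklist or its sample questions:

# The four sections of the Activity Checklist and the Activity Theory
# principle each reflects (Kaptelinin et al, 1999); sample questions omitted.
activity_checklist_sections = {
    "Means and ends": "hierarchical structure of activity",
    "Social and physical aspects of the environment": "object-orientedness",
    "Learning, cognition and articulation": "internalisation/externalisation",
    "Development": "development",
}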

2.12.3.4 Latest Activity Theory Developments in HCI: Crystallization and the AODM

Olav Bertelsen and Daisy Mwanza are two other Activity Theorists who have made important

contributions to HCI. Bertelsen (1996; 1997; 1998; 2000) was a PhD student of Bødker.

Together they edited a special edition of the Scandinavian Journal of Information Systems on

Activity Theory in 2000. In his research, Bertelsen has focused on the concept of design

artifacts (tools) as a unifying perspective for systems development, based on Wartofsky’s


(1979) primary, secondary and tertiary artifacts and Star’s (1989) notion of boundary objects.

Bertelsen (2000) views design artifacts as belonging to clusters of primary, secondary and

tertiary artifacts, each simultaneously mediating different elements of the systems development

process. This mediation takes place in a boundary zone “where heterogeneous praxes meet to

change a given praxis through the construction and introduction of new (computer) artifacts”

(1998, p. i). Using this notion of design artifacts, Bertelsen (2000) proposes a design-oriented

epistemology as a way of understanding how knowledge can be transferred or transformed from

research into practice and vice versa.

Mwanza’s (2001; 2002) research is also situated in the domain of HCI design methods. Her

PhD research resulted in the Activity-Oriented Design Method (AODM) to support

requirements capture. The AODM consists of four methodological tools developed from an

empirical analysis of the work practices of two organizations in the UK:

1. The Eight-Step-Model: This model consists of a series of open-ended questions related to

the elements of Engeström’s (1987) activity system. For example, “Who is involved in

carrying out this activity?” (Subject); “By what means are the subjects carrying out this

activity?” (Tools); “Who is responsible for what, when carrying out this activity and how

are the roles organized?” (Division of Labour), etc. (2001, p. 345). The data collected using

this model allows the researcher to gain a basic understanding of the situation.

2. The Activity Notation: The Activity Notation breaks down the information gathered using

the Eight-Step Model into smaller manageable units. Each of these units refers to a sub-

activity triangle from the main activity system. The Activity Notation is shown in Figure

2.30. Each combination in the Activity Notation consists of an Actor (either the Subject(s) or the Community), a Mediator (one of the three mediating elements in Engeström’s activity system), and the object of the activity.


Figure 2.30 Activity Notation (Mwanza, 2001, p. 345)

3. A technique for Generating Research Questions: Questions are generated based on the

Activity Notation combinations and can be general in nature or specific to a particular

situation. For example, “What Tools do the Subjects use to achieve their Objective and

how?”; “What Rules affect the way the Subjects achieve their Objective and how?”, etc.

(2001, pp. 345-346).

4. A technique of Mapping AODM Operational Processes: A visual mapping of how the

various AODM techniques relate to each other.
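The question-generation step can be illustrated with a short sketch. It combines the Actors and Mediators of the Activity Notation (Figure 2.30) and produces one general question per combination; the phrasing only loosely follows the examples quoted from Mwanza (2001) and is indicative rather than a reproduction of her templates.

# Illustrative only: generating general research questions from Activity
# Notation combinations (Actor x Mediator -> Object).
actors = ["Subjects", "Community"]
mediators = ["Tools", "Rules", "Division of Labour"]

def research_question(actor: str, mediator: str) -> str:
    # Wording loosely follows one of the quoted examples: "What Rules affect
    # the way the Subjects achieve their Objective and how?"
    return f"What {mediator} affect the way the {actor} achieve their Objective, and how?"

questions = [research_question(a, m) for a in actors for m in mediators]
for q in questions:
    print(q)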

The AODM represents the second attempt, following the Activity Checklist, to operationalise

Activity Theory. Although Mwanza validated the AODM in her PhD research, it has not been

formally applied in the HCI domain yet.

The previous discussion highlighted the application of Activity Theory to the study of HCI. It is

now appropriate to consider the key benefits that Activity Theory provides as a conceptual

theoretical framework in informing HCI and information systems development in general.


2.12.4 Benefits of Activity Theory

The benefits of Activity Theory to the study of HCI have been touted as numerous, diverse and

wide-ranging by all of the researchers whose work has been described in the previous sections.

However, Kuutti (1991) provides a useful summary of the four key reasons why Activity

Theory is a promising framework for Information Systems research in general. These reasons

are equally applicable to the HCI domain, and they are:

1. The ability of Activity Theory to integrate individual and collective perspectives in the

activity hierarchy, and the relationship between these hierarchical levels which causes

individual and social transformation.

2. Activity Theory places practical, human activities at the core of the research process.

Activities are carried out by real subjects using mediating tools to transform the object

of their work into an outcome. The information system is one such tool, reduced to a

secondary role: “We are never developing only information systems, but rather the

whole work activity in which they will be used” (p. 544). By focusing on the whole

work activity, an analysis of contextual factors, in which the information system exists,

is made possible.

3. Activity Theory is multidisciplinary. By taking the activity as the common focus of study, various disciplines stand to gain significant benefits from each other.

4. Activity Theory offers an elaborate conceptual apparatus for studying developmental

processes. This apparatus has been formalized in the work of Engeström (1987; 1990).

Nardi (1996b) adds two other important benefits to this list. Activity Theory offers a common,

unifying vocabulary for describing activity that all HCI researchers can share. Judging by the

disorderly state of affairs in UEMs and UEM taxonomies described previously, such a

vocabulary is critical. The second benefit is more a statement about the “essential humanness of

Activity Theory”. Nardi (1996a) maintains that Activity Theory “does not forget the end user,


the ordinary person of no particular power who is expected to use the technology created by

others” (p. 252).

Activity Theory is a powerful descriptive tool (Nardi, 1996b), a magnifying glass that helps us

explain situations and understand issues in a new light. It does not offer a set of ready-made

techniques and methods for research, but provides a collection of integrated conceptual

principles and theoretical constructs which can be interpreted and concretized according to the

nature of the situation under study (Engeström, 1987). This flexibility and adaptability of

Activity Theory can perhaps be seen as one of its biggest limitations as well.

2.12.5 Activity Theory Limitations

Activity Theory was developed as a psychological theory of individual activity. The key

principles, as laid out by Vygotsky, have been interpreted by others and over time evolved into

our current conceptualisation of the framework. This conceptualisation remains abstract and

open to further interpretation. While it has been used widely in fields such as education

(Kaptelinin, 1996), it remains difficult to learn, and even more complex to operationalise in HCI

practice. Two attempts by Kaptelinin et al (1999) and Mwanza (2001; 2002) have not yet

proved to be fruitful (although it should be noted that Mwanza’s research is very recent and

therefore it is too early to judge its usefulness). Activity Theory requires researchers to immerse

themselves in the everyday practices of technology users and put on their “Activity Theory

glasses” to understand these practices. Its terminology can be daunting and confusing,

particularly for those encountering it for the first time (Nardi, 1996a). This necessitates training

in Activity Theory, a long-term approach and a variety of research tools and techniques.

It appears that, despite more than a decade of research, HCI has not yet benefited directly from

Activity Theory in the form of a practical method or tool. However, this does not imply that

such an effort would be futile. Activity Theory principles can be used as a theoretical


framework to underpin a usability evaluation method, as this thesis will demonstrate. The

following section will highlight the principles of Activity Theory which are of direct relevance

to the development of this UEM, and therefore the thesis itself.

2.12.6 Activity Theory Principles Used in this Thesis

Activity Theory has been described in some detail in the preceding sections. It is clearly a

complex framework which requires an in-depth understanding of the underlying constructs and

principles and how they can be applied. Apart from the AODM (Mwanza, 2002), previous

Activity Theory research in HCI has not applied these constructs and principles in the form of a

method in the traditional sense. The AODM uses Activity Theory to gather requirements during

systems design.

The current research study proposes using Activity Theory at the other end of the design process

– evaluation. It aims to develop a UEM that will overcome the challenges identified in Section

2.10. The following Activity Theory principles will be utilized to do this. It should be noted,

however, that since Activity Theory tenets are closely intertwined, none of the principles can be

applied independently.

1. The central activity as the basic unit of analysis. [ATP-1]

The central activity in this case is defined as that activity for which the system being

evaluated is one of the mediating tools. The central activity is defined by the six elements of

the activity system as described by Engeström (1987) and the outcome of the activity. It

describes the use-situation or the context in which the user interface being evaluated exists.

2. The subject(s)’ internalisation of the central activity. [ATP-2]

The subjects (users) are described in relation to what they have internalised. Expert users

are deemed to have internalised more of the central activity, as well as the system itself.


3. The object of the central activity. [ATP-3]

The shared object (purpose) of the central activity which defines what the subjects (users)

are trying to achieve. It represents what a user is working on “through the interface”.

4. The mediating role of the system and other tools. [ATP-4]

The role of the system in supporting the central activity and the relationship between the

system and other tools mediating the central activity must be understood.

5. The social context of the central activity. [ATP-5]

The social context of the central activity is defined by the community, the rules that exist in

the community and the division of labour in the community. This principle focuses on all

the stakeholders involved in the central activity where the system is employed, rather than

just an individual user.

6. The hierarchical structure of the central activity. [ATP-6]

The hierarchical structure of the central activity, as described by Leont’ev (1981), must be

analysed at all three levels to determine where breakdowns and focus shifts (Bødker, 1996)

occur. An analysis of the central activity structure will reveal

the subjects’ (users’) motives and goals, and the conditions of the activity.

7. The network of activities in which the central activity is situated. [ATP-7]

The central activity exists as one component of a larger network of activity systems with

which it interacts. This reveals the larger context of the users’ central activity.


8. Contradictions in the central activity and between the central activity and other

activities in the network. [ATP-8]

Primary, secondary, tertiary and quaternary contradictions that occur in the activity network

as defined by Engeström (1999) will be used in order to determine where problems lie. This

implies that the focus is not on usability problems at the system interface per se, but on

problems distributed within the entire activity which may be affected by the system.

9. The historical development of the central activity and the tools mediating the activity.

[ATP-9]

The historical development of the activity entails an understanding of how the activity has

developed and changed over time. In parallel, the chronological development of tools

mediating the activity over this timeframe and the dialectic relationship between the tools

and the activity need to be considered. This implies the need for a historical analysis of the

central activity and the system being evaluated.

The Activity Theory principles described above will be used to inform the design of a UEM that

aims to overcome the UEM challenges identified in Section 2.10. These Activity Theory

principles will form the theoretical scaffolding of this UEM. A number of Activity Theory

principles will be applied to address each challenge when developing the UEM, and

subsequently incorporated into the UEM. A conceptual mapping of how this will be done is

shown in Table 2.1.


Table 2.1 Conceptual mapping of Activity Theory principles to UEM Challenges

LEGEND

UEM Challenges:
[UEMC-1] UEMs are system focused and technology driven
[UEMC-2] Lack of understanding how users’ goals and motives are formed
[UEMC-3] Lack of user involvement in the design of the evaluation and analysis of the evaluation results
[UEMC-4] Limited understanding of users’ knowledge
[UEMC-5] Lack of frameworks for including contextual factors in an evaluation
[UEMC-6] Lack of understanding how the system and system use co-evolve over time
[UEMC-7] No common vocabulary for describing evaluation processes and defining evaluation outcomes
[UEMC-8] Lack of a theoretical framework

Activity Theory Principles:
[ATP-1] The central activity as the basic unit of analysis
[ATP-2] The subject(s) internalisation of the central activity
[ATP-3] The object of the central activity
[ATP-4] The mediating role of the system and other tools
[ATP-5] The social context of the central activity
[ATP-6] The hierarchical structure of the central activity
[ATP-7] The network of activities in which the central activity is situated
[ATP-8] Contradictions in the central activity and between the central activity and other activities in the network
[ATP-9] The historical development of the central activity and the tools mediating the activity

Mapping of AT Principles to UEM Challenges:

UEM Challenge Code     Activity Theory Principle Code(s)
[UEMC-1]               [ATP-1; ATP-4; ATP-7]
[UEMC-2]               [ATP-1; ATP-3; ATP-6]
[UEMC-3]               [ATP-1]
[UEMC-4]               [ATP-2; ATP-4; ATP-9]
[UEMC-5]               [ATP-1; ATP-5; ATP-7; ATP-8]
[UEMC-6]               [ATP-1; ATP-2; ATP-8; ATP-9]
[UEMC-7]               [ATP-1; ATP-6; ATP-8]
[UEMC-8]               [ATP-1 to ATP-9]

Table 2.1 shows, for example, that to overcome [UEMC-1] (UEMs are system focused and

technology driven), the following Activity Theory principles will be applied and incorporated

into the new UEM:

• [ATP-1]: The central activity as the basic unit of analysis.

• [ATP-4]: The mediating role of the system and other tools.

• [ATP-7]: The network of activities in which the central activity is situated.


This means that the UEM will utilise users’ activities, instead of the system, as the basic unit of

analysis [ATP-1], and subsequently, reduce the role of the system to that of a mediating tool in

those activities [ATP-4] and any other activities in the network [ATP-7].
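
To make this mapping easier to trace in later chapters, it can also be expressed as a simple lookup structure. The short Python sketch below is illustrative only (the dictionary and function names are this sketch's own, not part of DUEM); the codes and pairings are taken directly from Table 2.1.

    # Illustrative encoding of Table 2.1: each UEM challenge is mapped to the
    # Activity Theory principles that will be applied to address it.
    UEMC_TO_ATP = {
        "UEMC-1": ["ATP-1", "ATP-4", "ATP-7"],
        "UEMC-2": ["ATP-1", "ATP-3", "ATP-6"],
        "UEMC-3": ["ATP-1"],
        "UEMC-4": ["ATP-2", "ATP-4", "ATP-9"],
        "UEMC-5": ["ATP-1", "ATP-5", "ATP-7", "ATP-8"],
        "UEMC-6": ["ATP-1", "ATP-2", "ATP-8", "ATP-9"],
        "UEMC-7": ["ATP-1", "ATP-6", "ATP-8"],
        "UEMC-8": ["ATP-%d" % i for i in range(1, 10)],  # all nine principles
    }

    def principles_for(challenge_code):
        """Return the Activity Theory principles mapped to a given UEM challenge."""
        return UEMC_TO_ATP[challenge_code]

    # Example: principles_for("UEMC-1") returns ['ATP-1', 'ATP-4', 'ATP-7'].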

2.13 Conclusion

This chapter set out to review the literature relevant to this thesis so that gaps in the research

area could be identified. Based on the review conducted above, several important conclusions

about the research area can be made. It was stated at the beginning of this chapter that a system

(or interface) could be understood in terms of its utility and usability. Utility was defined as the

usefulness of the system in completing a task, while usability was described as the ease of using

the system interface. It is now clear that usability evaluation is primarily directed at assessing

the latter without taking into consideration the former. As a result, it has been proposed that the

traditional definition of usability needs to be expanded to include the usefulness of the system.

In order to do so, the system must be examined in relation to the context in which it resides.

Distributed usability has been proposed as a means of expanding the definition of usability and

achieving this.

This chapter also established the central role of evaluation in the systems design and

development process, highlighting the importance of having holistic, robust and reliable UEMs.

However, the deficiencies of current UEMs were initially demonstrated by the lack of valid and

dependable taxonomies which evaluators might use to compare UEMs and to choose the most

appropriate one. Following this, the most widely used UEMs were described and their

limitations were emphasized. One of the key limitations lies in UEM outcomes. The ambiguity

surrounding the identification and validation of usability problems was shown in Section 2.9. It

was argued that the fragmented state of the UEM field was a direct result of the traditional view

of usability that UEMs were based on. Extending the notion of usability and identifying eight

UEM challenges that need to be overcome were proposed as the first steps towards a more


holistic, robust and reliable UEM which can assess not just the usability, but also the usefulness

of systems.

Finally, Activity Theory was proposed as a theoretical framework which offers a set of

principles that can be used to overcome the UEM challenges and develop a new UEM. A

historical background of Activity Theory was provided and supported by the key concepts and a

list of its basic principles. This was followed by a discussion of how Activity Theory had been

applied in HCI research to date and what its benefits were. Based on this discussion, a list of

nine Activity Theory principles was derived to inform the development of the new UEM. These

principles were then mapped to the eight UEM challenges to indicate how and where they will

be utilised in the UEM development process. The following chapter will describe the research

methodology that will be used to develop the UEM.


Chapter 3

Research Methodology

3.1 Introduction

A review of the relevant literature highlighted a number of gaps in usability evaluation method

(UEM) research and practice that need to be addressed. The gaps were presented as a series of

eight UEM challenges to be resolved by developing a new UEM based on distributed usability.

Activity Theory, a well established theoretical framework, which finds its roots in Soviet

psychology and has been applied previously in HCI research, was proposed as a framework to

inform the development of this new UEM. The aim of this chapter is to describe the process or

research methodology that will be used to build and validate the Distributed Usability

Evaluation Method (DUEM).

A research methodology describes the steps that will be taken in order to achieve the research

goals. It consists of a combination of processes, methods and tools that are chosen by a

researcher to conduct a particular study (Nunamaker et al, 1991). The first step in defining a

research methodology is to establish a research framework on which the methodology can be

based. This is important for several reasons. A suitable, widely-accepted research framework

provides a reliable structure for analysing existing knowledge and creating new knowledge.

Furthermore, a well developed and highly structured research framework also offers guidance in

the form of approaches, methods and/or tools that are most appropriate for the research being

undertaken. Therefore, a suitable research framework is critically important because it defines

the agenda, the methodology and the outcomes of the research. Finally, the research framework

will situate the study in the context of a particular research domain.


This chapter begins by explicating the research goals of the present study. This is followed by a

description of the research framework of March and Smith (1995) which has been selected for

the study. The framework is situated in the domain of design science and will be contrasted with

other well-known research frameworks to justify its use. The research framework will form the

basis for deriving a research methodology. The research methodology will be implemented in

two stages: the method building stage, and the method validation stage. Each stage will be

described in detail and a diagrammatic representation of the research methodology will be

shown in the conclusion of the chapter.

3.2 Research Goals

The research study has three distinct but closely linked goals. These goals are as follows:

1. To build a UEM based on distributed usability and informed by Activity Theory, that will

overcome the UEM challenges identified in the previous chapter.

2. To apply the UEM in practice.

3. To validate the UEM by assessing whether it overcomes the UEM challenges identified in

the previous chapter (Section 2.10).

Having explicated the research goals, the following section will describe the selection of a

suitable research framework that is consistent with the research goals.

3.3 HCI and Information Systems

Human-Computer Interaction (HCI) was defined in Section 2.2 as being “concerned with the

design, evaluation and implementation of interactive computing systems for human use and

with the study of major phenomena surrounding them” (ACM SIGCHI, 1992, p. 6). Therefore,


HCI is fundamental to all disciplines that are concerned with the research and design of

computer-based systems for people (Preece et al, 2002). The relationship of HCI to these other

disciplines is shown in Figure 3.1, which demonstrates the multi-disciplinary nature of HCI

research. Information Systems is one of the disciplines to which HCI is fundamental.

Information Systems is the study of “effective design, delivery, use and impact of Information

Technology (IT) in organisations and society” (Keen, 1987, p. 3). As such, Information Systems

development can be thought of as an artifact-producing activity because it produces a tangible

artifact – usually a computer-based information system. HCI is a discipline which informs this

activity. It offers a plethora of methods, techniques and tools that can be applied to the design

and development of information systems. For example, information system prototypes are

evaluated using UEMs to assess the extent to which a system’s actual performance conforms to

its desired performance (Long & Whitefield, 1986, cited in Whitefield et al, 1991), particularly

in relation to usability.

Figure 3.1 Relationship of HCI to other computer-related disciplines (adapted from Preece et al, 2002)


It has been stated previously that one of the aims of this research study is to develop a new

UEM. Although this type of research is situated primarily in HCI, it is equally relevant in

Information Systems because the UEM will be applied in the development of information

systems for human use. For that reason, it can be argued that an appropriate Information

Systems research framework (rather than a HCI research framework) is equally suitable as an

underlying structure for describing a research methodology in this thesis. A number of

Information Systems research frameworks exist. Some are general Information Systems

research frameworks (e.g. Nunamaker et al, 1991; Ives et al, 1980; March & Smith, 1995;

Nolan & Wetherbe, 1980; Gorry & Scott Morton, 1971), while others are specific to certain

methods or approaches such as interpretivism (Klein & Myers, 1999; Walsham, 1995),

positivism (Jarvenpaa et al, 1985; Jarvenpaa, 1988), ethnographic research (Myers, 1999), case

study research (Gable, 1994; Cavaye, 1996; Darke et al, 1998), action research (Baskerville &

Wood-Harper, 1996), surveys (Pinsonneault & Kraemer, 1993), etc. This thesis is concerned

with the development of a new method therefore it is necessary to select a research framework

that is suited to this purpose. A detailed review of the research frameworks outlined above

reveals that March and Smith’s (1995) framework based on the design science approach appears

to be ideal because it outlines a means for developing methods (amongst other things). It will be

described in the following section.

3.4 March and Smith’s Research Framework

In their article titled “Design and natural science research on information technology”, March

and Smith (1995) propose a two dimensional framework for research in information technology

(IT). The authors begin by defining IT as “technology used to acquire and process information

in support of human purposes” (p. 252). They go on to state that IT is instantiated in the form of

technology based systems. By doing so, the authors are suggesting that, even though technology

is ubiquitous in the industrialised world, it is only one part of a larger system of hardware,

software, procedures, data and people. Such a system is typically known as an Information


System (Stair & Reynolds, 2003). This would imply that March and Smith’s research

framework applies more broadly to information systems, rather than information technology

alone.

March and Smith (1995) begin with the premise that there are two types of research interest in

technology: descriptive and prescriptive. Descriptive research is a knowledge-producing activity

which aims to understand the nature of technology. The authors equate this type of research to

Hempel’s (1966) natural science approach. In contrast, prescriptive research is primarily

concerned with improving the performance of technology by using the knowledge produced by

descriptive research. The authors liken prescriptive research to Simon’s (1969) design science.

According to March and Smith (1995), these two interests have created a schism in Information

Systems because prescriptive research has been more successful than its descriptive knowledge-

producing counterpart, even though the latter is associated with ‘pure’ research. To reconcile the

conflicting points of view, the authors propose a two-dimensional research framework. The

following section will describe the framework’s theoretical background.

3.4.1 Theoretical Background

Natural science is concerned with the study of the physical world and its phenomena. It includes

research in physical, biological, social and behavioural domains and aims to understand the

reality in these domains (March & Smith, 1995). Kaplan (1964) views natural science as

comprising two key activities: discovery and justification. The discovery activity generates or

proposes scientific theories, laws and models about natural phenomena and reality, while the

justification activity is the highly structured process of testing these scientific constructs against

norms of truth or explanatory power. Traditionally, natural science research has been referred to

as basic research (Järvinen, 1999).


While basic research is concerned with understanding and describing reality, applied research

entails utilising knowledge gained through basic research to develop artifacts for human use so

that they can solve real-world problems (Järvinen, 1999). Applied research is, therefore,

associated with design because artifacts are designed and created by humans. Although ‘design’

is both a noun and a verb, and hence a product and a process (Walls et al, 1992), in the context

of this discussion, it is used to refer to the design process. The term ‘design science’ was coined

by Simon (1969) to describe research that specifically results in the creation and invention (as

opposed to the discovery) of artifacts or things that serve human purposes. Also known as

constructive research (Iivari, 1991), design science is technology-oriented and aims to develop

innovative means for humans to achieve their goals. Unlike natural science, the purpose of

design science is not to explain, but rather to prescribe and create artifacts that embody what is

prescribed. These artifacts are the outcomes of design science, built to perform specific tasks

(March & Smith, 1995). They are then assessed against the single criteria of utility or value to a

community of users. In other words, the key question that needs to be addressed is “Does it

work?”. Galliers and Land (1987) support this view by stating that the value of Information

Systems research can only be measured in terms of how well it improves practice when applied.

3.4.2 Design Science Artifacts

There are four types of artifacts produced through design science: constructs, models, methods

and instantiations. Constructs form the basic language of concepts and terms used to describe

problems within a given domain. For example, keystroking, pointing, homing and drawing are

constructs of the Keystroke Level model (described in Section 2.8.2.1). Constructs can be

combined into higher order artifacts called models. Models are propositions that express the

relationships between constructs. For example, the Keystroke Level model is a set of constructs

used to calculate task performance times for expert users. Models are simply descriptions or

representations of how things are. March and Smith (1995) see models as being more concerned

with utility than truth, which implies that some representational inaccuracy is acceptable if the


model is useful. Methods are artifacts produced by design science that represent ways of

performing goal-directed activities. A method consists of a set of steps to perform a task. For

example, the usability testing method (described in Section 2.8.3.3) is a highly structured

method consisting of a set of sequential steps to perform a system evaluation. Methods can be

derived from particular constructs and/or models and, equally, the decision to use a certain

method implies certain constructs and/or models. According to March and Smith (1995), the

development of methods is the domain of design science. Finally, instantiations are constructs,

models and methods that are implemented as artifacts and used to actually perform tasks.

Instantiations are operationalisations or realisations of constructs, models and methods that

demonstrate their feasibility and effectiveness. However, it is also possible that an instantiation

can precede the development of constructs, models and methods which are then formalised

based on the instantiation. (The framework of Nunamaker et al (1991) is based on this premise.)

A diagrammatic representation of these four artifacts and their relationships as described above,

is shown in Figure 3.2. The figure indicates that constructs inform the creation of models and

methods by providing the underlying terminology, while instantiations serve as a ‘proof of

concept’ for the methods. These relationships are depicted with the thick, black arrows in Figure

3.2. The broken arrows indicate that the opposite is also possible when instantiations are built in

order to formalise methods, models and constructs.


Figure 3.2 Relationships between constructs, models, methods and instantiations (based on March & Smith, 1995)
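
As an informal illustration only, the four artifact types and the examples given above can be modelled as simple data structures; the class and attribute names below are this sketch's own and are not part of March and Smith's (1995) framework.

    from dataclasses import dataclass, field

    @dataclass
    class Construct:
        # A basic concept or term used to describe problems within a domain.
        name: str

    @dataclass
    class Model:
        # A proposition expressing relationships between constructs.
        name: str
        constructs: list = field(default_factory=list)

    @dataclass
    class Method:
        # A set of steps used to perform a goal-directed task.
        name: str
        steps: list = field(default_factory=list)

    @dataclass
    class Instantiation:
        # An operationalisation of constructs, models and/or methods.
        name: str
        realises: list = field(default_factory=list)

    # Example from Section 2.8.2.1: the Keystroke Level model combines low-level
    # interaction constructs into a model of expert task performance times.
    klm_constructs = [Construct(n) for n in ("keystroking", "pointing", "homing", "drawing")]
    klm = Model("Keystroke Level model", constructs=klm_constructs)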

According to March and Smith (1995) design science strives to develop constructs, models,

methods and instantiations that are innovative and useful. To do this, design science involves

two key activities: building and evaluation. Building entails constructing an artifact for a

specific purpose and, as a result, demonstrating that it is actually possible to construct the

artifact. Evaluation determines how well the artifact works when used in practice. Evaluation

involves developing appropriate criteria and assessing the performance of the artifact against

these criteria. In order to achieve progress in design science, existing artifacts must be replaced

with more effective ones. Therefore, the purpose of the evaluation activity is to establish

whether any progress has been made. March and Smith (1995) do not specify how the two

activities are undertaken because building and evaluation are complex in nature and not well

understood. This is due to the fact that the artifact and its performance are directly related to and

influenced by the environment in which it operates (ibid). The implication is that the context in

which an artifact is used must be taken into account when building and evaluating the artifact.


3.4.3 Two Dimensional Research Framework

March and Smith (1995) propose a research framework based on two dimensions: research

outputs and research activities. The former dimension is based on the outputs or artifacts

produced by design science (constructs, models, methods and instantiations), while the latter is

based on the broad research activities of design and natural science as described previously

(build, evaluate, theorise, justify). This framework is shown in Figure 3.3.

The four-by-four framework proposed by March and Smith (1995) as shown in Figure 3.3

creates sixteen cells which represent “viable research efforts” (p. 260). It is possible to build,

evaluate, theorise or justify theories about constructs, models, methods or instantiations, with

each cell having its own objectives and utilising its own methods to achieve these objectives. It

is also possible for a research effort to encompass multiple cells.


Figure 3.3 Two-dimensional research framework (March & Smith, 1995)
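
A compact way to see why the framework yields sixteen cells is to cross the four research outputs with the four research activities, as in the following illustrative sketch (the variable names are this sketch's own):

    from itertools import product

    research_outputs = ["constructs", "models", "methods", "instantiations"]
    research_activities = ["build", "evaluate", "theorise", "justify"]

    # Each (output, activity) pair is one cell of March and Smith's (1995) framework,
    # i.e. one "viable research effort"; a single study may span several cells.
    cells = list(product(research_outputs, research_activities))
    assert len(cells) == 16

As Section 3.6 will explain, the present study occupies two of these cells: Build/Method and the (re-named) Validate/Method cell.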

Artifacts are built to perform a specific task. According to March and Smith (1995), research in

the Build activity column should be judged based on its utility or usefulness to a particular

community of users. The Build activity can involve creating a completely new artifact (a first in

a discipline) or re-building an existing artifact. If a new artifact is being developed, the research


contribution lies in the innovation or novelty of the artifact. If the building activity involves re-

developing an existing artifact, then the research contribution is found in the significance of the

improvements made, i.e. is the re-developed artifact more comprehensive and/or does it perform

better?

Once built, artifacts must be evaluated to determine whether progress has been made.

Evaluation requires the development of metrics or criteria based on which the performance of

the artifact can be assessed. These metrics or criteria are a reflection of what the artifact is trying

to accomplish. March and Smith (1995) suggest different sets of metrics for evaluating the four

different artifacts. These are shown in Table 3.1.

Table 3.1 Metrics or criteria used to evaluate different artifacts (March & Smith, 1995)

However, the authors do not provide an explanation of these metrics, nor do they specify how

they can actually be measured. Instead, the metrics are broad and reflect the purpose of the

evaluation process, which is to determine whether progress has been made. This implies that

different sets of evaluation metrics can be developed provided that they are able to demonstrate

research progress. Once the appropriate metrics are selected and developed, the artifact is


implemented in its environment and empirical work is undertaken to apply the metrics. This can

involve observing, surveying or interviewing a subject group using the artifact, or comparing

the artifact to other existing artifacts based on the selected metrics. Iivari (2003) suggests that

empirical evaluation can be postponed in favour of “ideational” evaluation which is sufficient to

demonstrate that the artifact includes novel ideas and addresses significant theoretical or

practical problems with existing artifacts. Regardless of the type of evaluation undertaken, the

aim is to determine “how well” the artifact works, rather than “how or why” it works. The latter

is the purpose of the theorising and justifying activities.

Once the artifact has been evaluated, the natural science activities of theorising and justifying

can be applied to determine how and why the artifact performed or failed in its environment.

Theorising results in explicating the attributes of the artifact, its internal workings, and its

interaction with the environment. For example, to explain why a particular system performs

well, Norman (1986) theorises that there is a “gulf” between the way humans conceptualise a

task and the way a system performs the task. If this gulf is large, then the system performance is

poor. A smaller gulf indicates that the system meets the users’ expectations and its performance

is closely aligned with the way people conceptualise the task. The theorising activity is followed

by justifying the proposed theory. This process may involve any number of research methods

using empirical data collection and analysis.

Based on the above discussion, it appears that March and Smith’s (1995) framework is ideally

suited to the development of a new method, which is the aim of this thesis. However, it is

necessary to justify the suitability of their framework in relation to its benefits and problems, as

well as other existing Information Systems research frameworks.


3.5 Research Framework Justification

March and Smith (1995) define a framework for developing a research methodology. It

explicates research activities and outputs that constitute legitimate research and offers a general

overview of how this research can be undertaken by describing what each activity involves. The

broad nature of the framework is indicative of its wide applicability in Information Systems

research because it encompasses a variety of research activities and outcomes. It permits

researchers a high degree of flexibility in selecting the most appropriate methods to carry out

research, while at the same time imposing constraints which ensure that the research is sound

and worthwhile. For example, a research effort in the Evaluate/Model cell will have its own

distinct set of objectives and methods for achieving them; however, at the same time, the

research must adhere to rigorous research standards by ensuring that a viable set of metrics is

developed and used to measure the effectiveness of the research effort.

The ability to select appropriate methods in March and Smith’s (1995) framework also reveals a

characteristic inherent to Information Systems research: the use of multiple methods or

methodological pluralism (Nunamaker et al, 1991; Robey, 1996; Landry & Banville, 1992;

Iivari, 1991). Not only is Information Systems a domain that is sufficiently broad to encompass

a wide range of research methods, but a single research methodology on its own is often

inadequate for such a diverse field of study (Nunamaker et al, 1991). A multimethodological

approach provides researchers with a set of complementary methods, with the view to

maximising the strengths of each one (Fitzgerald & Howcroft, 1998). The framework of March

and Smith (1995) reflects and facilitates such a multimethodological approach to Information

Systems research.

However, the flexibility and lack of prescriptive detail associated with March and Smith’s

(1995) framework have been the source of its strongest criticism by Järvinen (1999). Järvinen

(1999) claims that the research activities identified by March and Smith (1995) are ill-defined


and restricted mostly to the implementation process, ignoring the specifications and planning

that are an integral component of building and evaluating an artifact. Järvinen (1999) does not

provide any evidence to support this claim, nor is it self-evident in March and Smith’s (1995)

article. In fact, Järvinen’s (1999) claim that the primary focus of March and Smith’s (1995)

framework is on instantiation artifacts is more applicable to another well-known general

Information Systems research framework by Nunamaker et al (1991). Nunamaker et al (1991)

place the development of a system at the hub of their multimethodological framework for

Information Systems research (as shown in Figure 3.4) because “the developed system serves

both as proof-of-concept for the fundamental research and provides an artifact that becomes the

focus of expanded and continuing research” (p. 92). Since the development of a new UEM does

not entail developing an actual system, Nunamaker et al’s (1991) framework was not considered

to be appropriate for the present study.


Figure 3.4 A multimethodological framework for Information Systems research (Nunamaker et al, 1991, p. 94)

Despite its limitations, when compared to other research frameworks, March and Smith’s

(1995) framework is the most appropriate for this thesis for several reasons. The framework is

highly structured yet flexible enough to provide a methodological scaffolding for the thesis

which includes applying different research methods. The goals of this thesis require a multi-

method approach and other Information Systems frameworks are unsuitable because they are

specific to certain research methods (as described in Section 3.3). Furthermore, the framework

outlines a structure for developing a method, and evaluating its effectiveness in relation to its


practical utility. This is unlike other Information Systems frameworks which focus primarily on

tangible system artifacts or, in March and Smith’s (1995) terminology, instantiations.

In conclusion, March and Smith’s (1995) framework is consistent with the research goals

described in Section 3.2. It deals specifically with the building and evaluation of methods, and

supports a multi-method approach for this purpose. Therefore, it has been chosen as the

underlying research framework for this thesis. Having selected an appropriate research

framework and justified its use, the following sections will describe the research methodology

that will be applied to achieve the research goals.

3.6 Research Methodology

In the context of March and Smith’s (1995) framework, the research goals of this thesis are

concerned with building and evaluating a method. Therefore the research is situated in the

Build/Method and Validate/Method cells, as shown in Figure 3.5. To avoid any confusion that

may arise due to the use of the word evaluation in “Distributed Usability Evaluation Method”

and in the “Method Evaluation” stage of the research methodology, for the purposes of this

thesis the Evaluate/Method cell has been re-named Validate/Method.

                              RESEARCH ACTIVITIES
                              Build       Validate      Theorise      Justify
RESEARCH OUTPUTS
    Constructs
    Models
    Methods                     X            X
    Instantiations

Figure 3.5 Situating the thesis in March and Smith’s research framework


As stated previously, the objectives of each cell and the research methods used to achieve these

objectives need to be specified. By doing so, it is possible to define a highly structured research

methodology consisting of several stages to achieve the research goals. Each stage consists of a

series of logically sequenced steps and uses a number of different methods (this is consistent

with the multi-methodological nature of Information Systems and HCI research described

previously in Section 3.5).

Figure 3.5 implies the use of a two-stage research methodology. The first stage is concerned

with developing a UEM that is innovative and useful (the method building stage), while the

second stage will validate the new UEM to determine how well it works when applied in

practice (the method validation stage). The objectives and steps to achieve the objectives of

each stage will now be described in detail.

3.6.1 Stage I: Building the Distributed Usability Evaluation Method

According to March and Smith (1995) the Build activity results in the development of an

artifact that performs a specific task. Such an artifact can be completely new (a first in a

discipline) or a re-development of an existing artifact with the aim of improving it. In this

thesis, the artifact being built is a usability evaluation method (UEM). Since this UEM is based

on distributed usability, it has been named the Distributed Usability Evaluation Method

(DUEM). Although DUEM has been previously referred to at times as a new UEM to

distinguish it from other UEMs, it should be noted that it is not new in the sense that March and

Smith (1995) use the term, i.e. it is not the first UEM in the discipline. Chapter 2 described a

number of existing UEMs in the discipline of HCI. Therefore, in March and Smith’s (1995)

terms, DUEM represents the re-development of an existing UEM to improve it by overcoming

the challenges listed in Section 2.10.


Chapter 2 has highlighted the widespread use of usability testing as the de-facto standard UEM

in HCI. Despite a number of limitations, its popularity lies in the involvement of actual users in

the evaluation and the ability to manage the testing process in a controlled usability laboratory

environment. This is in direct contrast to other user-based UEMs, which do not afford the same

benefits (see Sections 2.8.3.1 and 2.8.3.2), and expert- and model-based UEMs, which do not

involve users directly and subsequently do not have as much value placed on their results

(Hartson et al, 2001). Observing, surveying and interviewing actual users about their interaction

with a particular system is of critical importance if accurate and bona fide primary data is to be

collected about the system’s usefulness and performance. Primary data is richer, more

meaningful and expressive than predicted data gathered from experts or derived through the use

of models such as the Keystroke Level model. Owing to the importance of involving users

directly in the evaluation process, DUEM represents a significant re-development of traditional

usability testing (as described in Section 2.8.3.3).

Therefore, the primary objective of Stage I is to re-develop traditional usability testing with the

aim of improving it by overcoming the UEM challenges identified in Section 2.10. This re-

development would result in the Distributed Usability Evaluation Method (DUEM).

March and Smith’s (1995) framework does not actually specify research methods that can be

used to build artifacts, such as UEMs. The authors state that the process is complex and not well

understood, thus articulating the difficulty of engaging in this type of research. There are few

prescriptive approaches for building a method in Information Systems or Human-Computer

Interaction (Kumar & Welke, 1992; Brinkkemper, 1996; Brinkkemper et al, 1999; Goldkuhl et

al, 1998). Of these, the most formal are the framework proposed by Brinkkemper (1996) called

‘method engineering’, and Goldkuhl et al’s (1998) Method Theory.


Method engineering is the engineering discipline for the design, construction and adaptation of

methods, tools and techniques for systems development. Brinkkemper (1996) argues that

method engineering provides a much needed “structure to take stock, generalize, and evaluate”

methods and tools used in systems development. One of the issues that the discipline of method

engineering deals with is the provision of construction principles for building system

development methods. These construction principles are manifested in the form of standardised

method building blocks called method fragments or method components. A method fragment is

a single coherent ‘piece’ of a method. Method fragments are either product fragments or process fragments. Product

fragments represent the structures of the method products or deliverables (e.g. tables, diagrams,

specifications, etc.), while process fragments are models of the development process. Goldkuhl

et al (1998) formalise the structure of a method fragment (which they have termed a method

component) by describing it as a relationship between procedures, concepts and notations.

Procedures determine how to work or what questions to ask when applying a method, notations

prescribe how the answers to these questions should be documented (this is often referred to as

modelling techniques), and concepts describe what to talk about: processes, activities,

information, objects. Concepts are the “cement between procedure and notation; the overlapping

parts of procedure and notation” (p. 115). A diagram of this relationship is shown in Figure 3.6.

Figure 3.6 Structure of a method component (Goldkuhl et al, 1998)


A method is a composition of several method components (Lind, 2003). Together the method

components form a structure or framework that defines how the questions are related.

Furthermore, all methods are built on an explicit or implicit perspective or philosophy. The

perspective defines the conceptual and value basis of the method (Lind, 2003; Goldkuhl et al,

1998). It reveals what is important or what to focus on when asking the questions. Finally, it is

important to understand how different people interact and co-operate when applying the

method. This aspect has been termed co-operation forms and defines the division of labour

between the participants, i.e. who asks the questions and who answers the questions. The

relationships between method component, perspective, framework, and co-operation forms are

shown in Figure 3.7. Goldkuhl et al (1998) have termed this “Method Theory”.

Figure 3.7 Method Theory (Goldkuhl et al, 1998)
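
Because the Method Theory vocabulary is used throughout the remainder of this chapter, the sketch below restates it informally as data structures. It is illustrative only; the class and attribute names are this sketch's own rather than Goldkuhl et al's (1998) notation.

    from dataclasses import dataclass, field

    @dataclass
    class MethodComponent:
        # A single coherent 'piece' of a method (Goldkuhl et al, 1998).
        procedures: list   # how to work / what questions to ask
        concepts: list     # what to talk about: processes, activities, information, objects
        notations: list    # how the answers to the questions are documented

    @dataclass
    class ComposedMethod:
        # A method is a composition of several method components (Lind, 2003).
        perspective: str        # the conceptual and value basis of the method
        framework: str          # how the components' questions are related to each other
        cooperation_forms: str  # the division of labour: who asks and who answers the questions
        components: list = field(default_factory=list)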

The development or creation of a method is the result of integrating different method

components. “The purpose of this kind of method integration is to create a new method that


goes beyond the original methods but where each of the original methods make a valuable

contribution to the new integrated method” (Goldkuhl et al, 1998, p. 116).

Since DUEM is not a new method per se, but rather an improvement on existing usability

evaluation methods, and usability testing in particular, it will be built by revising the method

components of traditional usability testing based on distributed usability and Activity Theory

principles (as described in Section 2.12.6) and then integrating them. To achieve this, it is

necessary to first examine traditional usability testing in detail and decompose it into the

relevant method components so that a better understanding of these method components can be

developed. The perspective, framework and co-operation forms of the traditional usability

testing method will also be examined.

However, to gain an empirical in-depth understanding of the limitations of traditional usability

testing, it is necessary to apply the traditional usability testing method in an actual evaluation

project. By doing so it will be possible to examine each individual method component

comprehensively and determine the limitations or problems associated with it. Furthermore, it

will be possible to comment on the method component framework, the perspective or

philosophy that traditional usability testing is based on, and the co-operation forms. The results

of the application may add further limitations to the ones identified previously and listed in

Chapter 2 (Section 2.10) as UEM challenges.

Having arrived at a list of empirically derived limitations, the method components of traditional

usability testing will then be revised and re-developed. This will be done by integrating

distributed usability and the Activity Theory principles described in Section 2.12.6 into

traditional usability testing. A proposed conceptual framework for doing this has already been

presented in Chapter 2 (Table 2.1). The UEM challenges listed in Section 2.10 and the

empirically derived limitations represent weaknesses of traditional usability testing. Distributed

usability defined in Section 2.3 and the Activity Theory principles described in Section 2.12.6


will be integrated into traditional usability testing method components in an attempt to

overcome these weaknesses and improve traditional usability testing.

The integration of distributed usability and Activity Theory principles is a multifaceted and

complex process which will involve examining the method components of traditional usability

testing both individually and as a whole, and then making modifications to these components.

Each revised method component will consist of a set of procedures, notations and concepts, and

the method perspective will be informed by distributed usability and Activity Theory. Finally,

the revised method components will be integrated into the Distributed Usability Evaluation

Method (DUEM). The name chosen for the method being built reflects the extended notion of

usability as being distributed across an entire activity system.

Following the building of the method, DUEM will be comprehensively tested and validated in

practice during the Stage II of the research study, which is described in the following section.

3.6.2 Stage II: Validating the Distributed Usability Evaluation Method

According to March and Smith’s (1995) framework, evaluation of methods is undertaken in

order to determine whether progress has been made. The purpose of the validation stage is to

assess to what extent DUEM has improved the traditional usability testing method. March and

Smith (1995) propose that the evaluation of methods requires the development of metrics or

criteria based on which the performance of the method can be gauged. These metrics or criteria

should reflect what the method that has been built is trying to accomplish. A set of metrics

proposed by March and Smith (1995) for evaluating methods was shown in Table 3.1. However,

by their own admission, these metrics are not pre-defined, nor are they specific enough to apply

quantitatively. Rather, they are intended to be used as guidelines. Furthermore, the proposed

metrics are presented in absolute terms, implying that the method built can be assessed

independently and without reference to other existing methods. Since the purpose of the


evaluation stage is to determine whether an improvement has been made to usability testing, it

is not possible to use the proposed metrics because DUEM must be evaluated in relation to the

traditional usability testing method. However, the task of comparing UEMs has historically

been fraught with difficulties. A number of studies illustrating this and describing the problems

associated with comparing UEMs will be described in the next section.

3.6.2.1 Comparing Usability Evaluation Methods

The previous chapter (Section 2.7) highlighted some of the difficulties associated with creating

a universal taxonomy of UEMs and argued that, owing to these difficulties, the process of

comparing UEMs itself was complex and challenging. However, the importance of comparing

UEMs is widely recognised by researchers and practitioners alike because it is necessary to

establish the best evaluation method to use in a specific situation. Comparative studies of UEMs

have, therefore, been widely reported in the literature. A summary of most well-known studies

comparing UEMs is presented in Table 3.2.

Table 3.2 indicates the authors of each study, the UEMs compared, the criteria used to perform

the comparison and the major conclusions. It does not represent an exhaustive list or description

of comparative UEM studies, nor is it intended to. The purpose of summarising some of the key

studies in the area is to demonstrate the disparate approaches and comparison methods used in

this type of research which has implications for the validity and reliability of the claims made

by these studies.


Table 3.2 Summary of comparative UEM studies

Jeffries, Miller, Wharton & Uyeda (1991)
  UEMs compared: Heuristic evaluation; Cognitive walkthrough; Guidelines; User testing
  Comparison criteria/metrics: Total problems found; Severity analysis; Benefit/cost ratio
  Claims/findings/conclusions:
    • Heuristic evaluation produced the best results (found most problems and serious problems, at lowest cost)
    • Usability testing did a good job of finding serious problems, recurring problems, general problems and avoiding low priority problems
    • Cognitive walkthrough is roughly comparable to guidelines

Karat, Campbell & Fiegel (1992)
  UEMs compared: User testing; Individual walkthrough; Team walkthrough
  Comparison criteria/metrics: Total identified usability problems; Problem Severity Classification ratings of usability problems; Cost-effectiveness
  Claims/findings/conclusions:
    • User testing identified largest number of problems and significant number of severe problems
    • Team walkthroughs achieve better results than individual walkthroughs
    • User testing and walkthroughs are complementary UEMs

Desurvire, Kondziela & Atwood (1992)
  UEMs compared: Heuristic evaluation; Cognitive walkthrough
  Comparison criteria/metrics: Named problems; Problem Severity Code (PSC); Problem Attitude Scale (PAS)
  Claims/findings/conclusions:
    • Heuristic evaluation is better than cognitive walkthroughs at predicting specific problems
    • Evaluators using heuristic evaluation are able to identify twice as many problems that caused task failure compared to evaluators using cognitive walkthroughs

Nielsen & Phillips (1993)
  UEMs compared: Cold heuristic estimates; Warm heuristic estimates; Hot heuristic estimates; GOMS; User testing
  Comparison criteria/metrics: Time estimates in seconds; Cost-benefit analysis
  Claims/findings/conclusions:
    • User testing is the best method for arriving at estimates of user performance
    • User testing is more expensive than cold heuristic estimates
    • Performance estimates from heuristic estimation and GOMS analyses are highly variable

Virzi, Sorce & Herbert (1993)
  UEMs compared: Heuristic evaluation; Think Aloud Protocol; Performance testing
  Comparison criteria/metrics: Problem identification; Cost
  Claims/findings/conclusions:
    • Nine common problems identified by all three
    • Heuristic evaluation identified largest number of problems and was cheapest to use
    • Heuristic evaluation by non-experts is ineffective

Dutt, Johnson & Johnson (1994)
  UEMs compared: Heuristic evaluation; Cognitive walkthrough
  Comparison criteria/metrics: Number, nature and severity of usability problems; Time required; Ability to generate requirements for re-design
  Claims/findings/conclusions:
    • Methods are complementary and should be employed at different stages of the design process
    • Heuristic evaluation found more problems
    • Heuristic evaluation takes less time
    • Both methods are equally difficult to apply

Doubleday, Ryan, Springett & Sutcliffe (1997)
  UEMs compared: Heuristic evaluation; User testing
  Comparison criteria/metrics: Number of problems identified; Comparative costs; Ease of problem fixing; Effectiveness of problem fixing
  Claims/findings/conclusions:
    • User testing indicates only symptoms, while heuristic evaluation identifies causes of usability problems
    • Heuristic evaluation suggests solutions to problems

John & Marks (1997)
  UEMs compared: Claims Analysis; Cognitive walkthrough; GOMS; Heuristic evaluation; User action notation; Reading specifications
  Comparison criteria/metrics: UEM effectiveness based on their predictive power; Number of usability problems found that led to changes
  Claims/findings/conclusions:
    • Slightly less than half of the usability problems were observed during usability tests
    • User action notation is not a technique for finding usability problems
    • Relatively few of the predicted problems resulted in changes
    • All UEMs predicted some problems that were observed and some that were not

Sears (1997)
  UEMs compared: Heuristic walkthroughs; Heuristic evaluation; Cognitive walkthroughs
  Comparison criteria/metrics: Number of usability problems found; Validity of UEM; Thoroughness of UEM; Reliability of UEM
  Claims/findings/conclusions:
    • Heuristic evaluation is inexpensive and effective
    • Heuristic walkthroughs identify more intermediate and minor problems than cognitive walkthroughs
    • Heuristic evaluations identify more intermediate and minor problems than cognitive walkthroughs
    • Heuristic evaluations are less valid than heuristic walkthroughs or cognitive walkthroughs
    • Cognitive walkthroughs are less thorough than heuristic evaluations or heuristic walkthroughs
    • Heuristic evaluations and heuristic walkthroughs are more reliable than cognitive walkthroughs


Table 3.2 illustrates how widely UEM comparison studies have varied in their approaches and

findings. A range of different comparison criteria has been used by researchers, generating

disparate and, often, contradictory results. Muller et al (1993) found that comparative studies

had internal and external inconsistencies. Internal inconsistencies were manifested in the use of

differing criteria, while external inconsistencies were evident in the disparity of the outcomes.

The implications of this are considerable because the validity and reliability of the comparative

studies are called into question, subsequently casting doubt on the basic assumptions upon

which UEM selection decisions are made by practitioners and researchers. Gray and Salzman

(1998) reviewed five comparative studies and found severe methodological breakdowns in the

research, suggesting the inappropriate use of UEMs based on misleading and flawed claims.

They examined the influence of five known threats to validity in the comparative studies,

namely threats to statistical conclusion, internal, construct, external and conclusion validity.

Statistical conclusion validity refers to the relationship between the independent variable and the

dependent variable. It determines whether there are real differences between the groups.

Internal validity is concerned with establishing whether these differences are causal or

correlational. Construct validity can be thought of in terms of two issues: the causal construct

(are the researchers manipulating what they claim to be manipulating?) and the effect construct

(are the researchers measuring what they claim to be measuring?). External validity refers to the

scope of the generalisability of the conclusions. A study lacks external validity if it makes

claims that are beyond the scope of the research. Finally, conclusion validity verifies if the

conclusions are consistent with the study findings. The results of Gray and Salzman’s (1998)

review are shown in Table 3.3. They indicate that all five comparative UEM studies suffer from

validity breakdowns which have led to erroneous claims and incorrectly generalised conclusions

about differences between UEMs. This raises serious concerns about the process and usefulness

of comparing UEMs.


Table 3.3 Problems by validity types across comparative UEM studies (Gray and Salzman, 1998)

The cause of such discrepancies can be attributed to the evolutionary nature of UEMs in general

and the lack of well-defined standard comparison criteria and measures. Most UEMs have

evolved over time through use and their definitions have changed. However, these changes have

occurred in an ad hoc and symbiotic manner, whereby the same UEM is applied in different

ways or several UEMs are used jointly to complement each other, therefore blurring the

boundaries between them. Furthermore, the system and the context in which a UEM is applied

will influence its effectiveness, therefore rendering comparative studies somewhat futile

because it is not possible to replicate the same circumstances for the purposes of objective

comparison.

The lack of widely-accepted and well-defined criteria is another reason why comparative UEM

studies have not produced useful results. Table 3.2 shows that various studies have employed a

number of different criteria ranging from the number of usability problems found, to more

complex measures such as the effectiveness of the UEM. Despite the use of the same

terminology, comparative studies have also defined the various criteria and measures in

different ways, further complicating the process of generalising and extracting any useful results

from the studies. Although Gray and Salzman (1998) advocate the use of multiple criteria

compared to a one-dimensional measure, the plethora of criteria being used by researchers has

introduced a high level of complexity into comparative research and created confusion. Hartson


et al (2001) attempted to rectify the situation and simplify our understanding of how UEMs can

be compared by introducing four quantifiable metric criteria: thoroughness, validity, reliability

and effectiveness. Thoroughness refers to the number of real usability problems found in

relation to the number of real problems that exist. This definition was extended by adding the

severity level of a usability problem, generating the following formula:

Thoroughness(s) = Number of real problems found at severity level (s) / Number of real problems that exist at severity level (s)

Validity is described as a measure of how well a method does what it intends to do. In UEM

terms, it indicates the proportion of usability problems found by a UEM that are real problems.

Hartson et al (2001) express this as the following formula:

Validity = Number of real problems found / Number of issues identified as problems

Effectiveness is the simultaneous effect of thoroughness and validity, i.e. the product of the two

previous measures. The authors argue that this is important because neither of the two measures

is sufficient on its own: a UEM with a high thoroughness measure does not rule out

including problems that are not real, and a UEM with a high validity measure still allows valid

problems to be missed. Finally, reliability is defined as the consistency of UEM results across

different evaluators. This has been described previously (in Section 2.8.3.3) as the evaluator

effect (Hertzum & Jacobsen, 2001). A reliable UEM will produce the same or highly similar

results regardless of who the evaluators are (experts or non-experts).
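
Stated as calculations, the thoroughness, validity and effectiveness measures take the following form. The sketch below is simply a transcription of the definitions above into Python; the function names and the example figures are illustrative only, and reliability is omitted because Hartson et al (2001) define it in terms of consistency across evaluators rather than as a single formula.

    def thoroughness(real_found, real_existing):
        # Proportion of the real problems (at a given severity level) that the UEM found.
        return real_found / real_existing

    def validity(real_found, issues_identified):
        # Proportion of the issues identified by the UEM that are real problems.
        return real_found / issues_identified

    def effectiveness(real_found, real_existing, issues_identified):
        # The simultaneous effect of thoroughness and validity, i.e. their product.
        return thoroughness(real_found, real_existing) * validity(real_found, issues_identified)

    # Illustrative figures only: a UEM that identifies 20 issues, 12 of which are real,
    # out of 15 real problems that exist, scores 0.8 thoroughness, 0.6 validity and
    # 0.48 effectiveness.
    print(effectiveness(real_found=12, real_existing=15, issues_identified=20))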

Although Hartson et al (2001) provide four quantifiable measures of UEM performance which

can be applied in comparative studies, the measures are fundamentally flawed because they are

based on the ill-defined concept of a usability problem. The troublesome notion of usability

problems has already been discussed in detail previously (Section 2.9) and will not be re-visited



here except to state that, in its current form, it cannot be used as a reliable underlying concept

for comparing and evaluating UEMs.

Based on the previous discussion, it is clear that the process of evaluating a UEM by comparing

it to other UEMs is complex and challenging. It is also questionable whether such an effort is

worthwhile, considering the evolutionary nature of UEMs, the number of different factors that

have to be taken into account, the lack of established comparative methods and criteria, and the

contradictory results that have emerged from existing research. However, although comparing

UEMs is proving to be a laborious and unsuccessful endeavour, for the purposes of this thesis, it

is still necessary to compare the Distributed Usability Evaluation Method (DUEM) to traditional

usability testing in order to determine whether it represents an improvement. Rather than

employing some of the comparison criteria described above, this will be done by evaluating

DUEM in relation to the challenges associated with existing UEMs, and traditional usability

testing in particular. This approach, which has been used previously by Mwanza (2002), will be

described and justified in the following section.

3.6.2.2 Validation of the Distributed Usability Evaluation Method

The previous chapter (Section 2.10) identified a set of eight challenges associated with existing

UEMs based on an extensive review of the literature. These challenges apply equally to the

most commonly used UEM: usability testing. To further verify that these challenges are real, it

has also been proposed that usability testing will be examined in detail in Stage I of the research

methodology (as described in Section 3.6.1) in order to identify the problems associated with

different method components that make up usability testing. These components will then be re-

developed from the perspective of distributed usability and Activity Theory principles, and

integrated into a single method. The result of this process will be an enhanced form of usability

testing termed the Distributed Usability Evaluation Method (DUEM). The key objective of

Stage II, the method validation stage, is to assess whether DUEM actually represents an


improvement over traditional usability testing. Since an improvement is an indication that the

challenges associated with usability testing have been overcome, the key question that needs to

be addressed is: Does DUEM overcome the eight challenges identified in Chapter 2?

Therefore, in order to validate DUEM, the method will first be applied in practice to evaluate a

system. The results of this application will allow us to establish whether DUEM overcomes the

eight UEM challenges identified in Section 2.10. A similar approach for validating a method

was employed previously by Mwanza (2002) in validating the Activity-Oriented Design Method

(AODM). To evaluate AODM, Mwanza (2002) made six claims about the method and provided

evidence from two case studies to support the claims. For the purposes of Stage II, a series of

claims about DUEM must also be derived. A description of how this has been done is provided

next.

The eight challenges identified in Section 2.10 are replicated in Table 3.4 below.

Table 3.4 Eight UEM Challenges (from Section 2.10)

[UEMC-1] UEMs are system focused and technology driven
[UEMC-2] Lack of understanding how users’ goals and motives are formed
[UEMC-3] Lack of user involvement in the design of the evaluation and analysis of the evaluation results
[UEMC-4] Limited understanding of users’ knowledge
[UEMC-5] Lack of frameworks for including contextual factors in an evaluation
[UEMC-6] Lack of understanding how the system and system use co-evolve over time
[UEMC-7] No common vocabulary for describing evaluation processes and defining evaluation outcomes
[UEMC-8] Lack of a theoretical framework

Based on the eight UEM challenges, a corresponding set of eight claims about DUEM has been

posited, and is shown in Table 3.5. Each claim in Table 3.5 corresponds directly to a challenge

in Table 3.4. For example, claim [DUEM-1] DUEM is user focused and user activity driven,

maps directly to challenge [UEMC-1] UEMs are system focused and technology driven, implying

that to overcome the challenge of UEMs being system focused and technology driven, DUEM


must be user focused and user activity driven. To validate DUEM, each of the eight claims will

be addressed and data collected from applying DUEM in practice will be sought as evidence to

either support or refute each claim.

Table 3.5 Eight claims about DUEM

[DUEM-1] DUEM is user focused and user activity driven
[DUEM-2] DUEM provides a means of understanding users’ goals and motives
[DUEM-3] DUEM involves users directly in the design of the evaluation and in the analysis of the evaluation results
[DUEM-4] DUEM provides a framework for understanding users’ knowledge
[DUEM-5] DUEM provides a framework for including contextual factors in an evaluation
[DUEM-6] DUEM provides a means of understanding how the system and system use co-evolve over time
[DUEM-7] DUEM offers a common vocabulary for describing evaluation processes and defining evaluation outcomes
[DUEM-8] DUEM is a theory informed method

Section 2.10 provides a description of the exact issues that need to be addressed under each of

the eight challenges in order to demonstrate that the challenge has been overcome. These issues

will also be used in the DUEM validation process in the form of questions that the evidence

from applying DUEM in practice must provide answers to. These answers will serve to support

or refute each of the eight claims. Table 3.6 shows the eight claims about DUEM that will be

used in validating this method and the corresponding questions that will be addressed for each

claim.

For example, to address the claim [DUEM-1] DUEM is user focused and user activity driven,

the following questions must be answered based on the data collected from applying DUEM in

practice:

• Is DUEM user driven, instead of system driven?

• Is DUEM focused on the usefulness of the system, and not on the system itself?

• Does DUEM provide a framework for analysing purposeful user activities that the system

supports?


Table 3.6 Eight claims about DUEM and corresponding questions for validating DUEM

[DUEM-1] DUEM is user focused and user activity driven
• Is DUEM user driven, instead of system driven?
• Is DUEM focused on the usefulness of the system, and not on the system itself?
• Does DUEM provide a framework for analysing purposeful user activities that the system supports?

[DUEM-2] DUEM provides a means of understanding users’ goals and motives
• Does DUEM reflect and incorporate the motives and goals of users into the evaluation?
• Does DUEM assess usability in relation to the users’ motives and goals to determine the usefulness of the system?

[DUEM-3] DUEM involves users directly in the design of the evaluation and in the analysis of the evaluation results
• Does DUEM place the user (not the system) at the centre of the evaluation process?
• Does DUEM involve users in the design of the evaluation process in a well-managed way?
• Does DUEM provide the means for users and evaluators to collaborate effectively?

[DUEM-4] DUEM provides a framework for understanding users’ knowledge
• Does DUEM take into account the users’ knowledge about the activity that the system being evaluated supports?
• Does DUEM provide a framework for capturing this knowledge?
• Does DUEM assess the usefulness of the system in relation to the users’ knowledge of the activity that the system supports?

[DUEM-5] DUEM provides a framework for including contextual factors in an evaluation
• Does DUEM identify all the different system stakeholders and their activities?
• Does DUEM reflect the social nature of system use?

[DUEM-6] DUEM provides a means of understanding how the system and system use co-evolve over time
• Does DUEM explain how systems and user activities co-evolve over time?
• Does DUEM identify remnants of previous system use in current use-situations?
• Does DUEM evaluate the usefulness of a system in relation to ongoing activities over a prolonged period of time?

[DUEM-7] DUEM offers a common vocabulary for describing evaluation processes and defining evaluation outcomes
• Does DUEM use consistent terminology to describe the evaluation process and outcomes?
• Does DUEM provide an unambiguous definition of a usability problem?
• Does DUEM allow evaluators to clearly identify and rate usability problems?

[DUEM-8] DUEM is a theory informed method
• Is DUEM based on a theoretical framework that enables evaluators to explain and analyse how systems are used in real activities?
• Does DUEM allow evaluators to design evaluation processes that reflect actual system use?
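To illustrate how the claims-to-questions mapping could be operationalised once evidence is assembled, a minimal sketch in Python is shown below. The structure, names and decision rule are assumptions introduced purely for illustration; they are not part of DUEM or of the validation procedure itself, which weighs qualitative evidence rather than applying a mechanical rule.

# Illustrative sketch only: recording the claims (Table 3.5), their validation
# questions (Table 3.6) and the evidence gathered for each question.
validation_questions = {
    "DUEM-1": [
        "Is DUEM user driven, instead of system driven?",
        "Is DUEM focused on the usefulness of the system, and not on the system itself?",
        "Does DUEM provide a framework for analysing purposeful user activities that the system supports?",
    ],
    # ... the remaining seven claims follow the same pattern
}

def verdict(evidence_answers):
    # One possible decision rule, assumed for illustration: a claim is treated as
    # supported only if the evidence answers every question affirmatively.
    return "supported" if all(evidence_answers.values()) else "refuted"

print(verdict({"Is DUEM user driven, instead of system driven?": True}))  # supported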

DUEM purports to improve the traditional usability testing method by integrating distributed

usability and Activity Theory principles into traditional usability testing. By doing so, DUEM


attempts to overcome the UEM challenges listed in Section 2.10 and any empirically derived

limitations of traditional usability testing. Therefore, the validation of DUEM aims to address

the key question of “Does DUEM do what it purports to do?” (i.e. improve traditional usability

testing). Iivari (2003) refers to this type of validation as “ideational”. The purpose of

“ideational” validation is to demonstrate that a new artifact (in this case DUEM) includes novel

ideas and addresses significant theoretical or practical problems with an existing artifact (in this

case, traditional usability testing). To determine whether DUEM addresses the challenges and

limitations of traditional usability testing, the data collected from applying DUEM in practice

will be used to support or refute the claims made about DUEM in Table 3.6.

3.7 Conclusion

The objective of Chapter 3 was to describe a research methodology that will be used to achieve

the three research goals: to build the Distributed Usability Evaluation Method (DUEM), to

apply it in practice and to validate it. The first step in defining a research methodology is to

establish a suitable research framework on which the methodology can be based. March and

Smith’s (1995) two-dimensional research framework was selected for this purpose. The

framework (shown in Figure 3.3) is situated in the domain of design science or applied research,

which is concerned with developing various artifacts for human use to solve real-world

problems. These artifacts are then assessed against the criteria of utility or value to their

community of users. There are four types of artifacts produced through design science:

constructs, models, methods and implementations. The aim of this thesis is to develop an

evaluation method, which makes March and Smith’s (1995) research framework ideally suited

to defining the research methodology. The framework is structured yet flexible in that

researchers are able to select the methods and tools most appropriate to the research problem at

hand, which falls into one or more of sixteen research cells in the framework. This framework

supports a multi-method approach, which is often necessary in Information Systems research

(Nunamaker et al, 1991), and specifically in this case because of the multiple research goals.


Based on the research goals, the study was situated in two cells of March and Smith’s (1995)

framework: Build/Method and Validate/Method. This implied the use of a two-stage research

methodology. The first stage (Stage I) is the method building stage which utilises the notion of

method components (Brinkkemper, 1996; Goldkuhl et al, 1998). It was argued that DUEM does

not represent a new UEM, per se, but rather a re-development and improvement of the most

commonly used UEM: traditional usability testing. DUEM is based on distributed usability and

informed by Activity Theory principles (as described in Chapter 2). Therefore, to develop

DUEM, during Stage I, traditional usability testing will be decomposed into method

components and then applied in practice to empirically derive the limitations of traditional

usability testing. The framework, perspective and co-operation forms of traditional usability

testing will also be analysed. The method components will then be revised from the perspective

of distributed usability and Activity Theory principles, and integrated into DUEM. A conceptual

mapping for doing this has already been described in Chapter 2 (Table 2.1).

The second stage (Stage II) is the method validation stage. The objective of this stage is to

evaluate DUEM. Although March and Smith’s (1995) framework provides a set of criteria

against which to assess the improvement in a proposed method, these criteria are vaguely

defined and difficult to apply formally. Furthermore, they are absolute criteria which do not

enable a comparison with traditional usability testing. This comparison is necessary because it is

the only way of demonstrating that DUEM represents an improvement over traditional usability

testing. However, comparative studies of UEMs are fraught with problems (as described in

Section 3.6.2.1) because of inconsistent and poorly defined criteria and contradictory results.

Consequently, the decision was made to validate DUEM using the same approach employed by

Mwanza (2002): applying the method in practice and then validating it against a set of claims.

The claims were derived from the set of eight challenges described in Chapter 2 (Section 2.10).

Each of the claims will be addressed and supported or refuted by data collected from applying

DUEM in practice. To do this, a series of questions was generated for each claim. This approach


to validating DUEM represents “ideational” evaluation (Iivari, 2003) because it demonstrates

whether the limitations of traditional usability testing have been addressed, and, subsequently,

whether DUEM represents an improvement over traditional usability testing.

A diagrammatic representation of the research methodology described in this chapter is shown

in Figure 3.8. The two stages shown in Figure 3.8 represent the contents of the Build/Method

and Validate/Method research cells shown previously in Figure 3.5.

Figure 3.8 Research Methodology

STAGE I: Method Building (informed by distributed usability and Activity Theory principles)
Objective: To develop the Distributed Usability Evaluation Method
Methods/Techniques/Tools Used:
• Method Theory (Goldkuhl et al, 1998)
• Method engineering (Brinkkemper, 1996)
Steps:
1. Break down traditional usability testing into method components
2. Examine traditional usability testing method components
3. Examine method framework, perspective and co-operation forms
4. Apply traditional usability testing in evaluation project
5. Determine limitations associated with traditional usability testing based on application
6. Revise and re-develop method components based on the notion of distributed usability and Activity Theory principles (as described in Table 2.1)
7. Integrate method components into DUEM

STAGE II: Method Validation
Objective: To validate (evaluate) the Distributed Usability Evaluation Method
Methods/Techniques/Tools Used:
• Claims (Mwanza, 2002)
Steps:
1. Apply DUEM in practice
2. Document application of DUEM
3. Address DUEM claims using evidence from application to confirm or refute each claim

The thesis will now proceed as follows: Chapter 4 will describe the building of the Distributed

Usability Evaluation Method (i.e. Stage I of the research methodology), and Chapter 5 will

describe the validation of the Distributed Usability Evaluation Method (i.e. Stage II of the

research methodology).


Chapter 4

Stage I: Method Building

4.1 Introduction

Chapter 3 outlined the research methodology that will be followed in developing and validating

the Distributed Usability Evaluation Method (DUEM). The research methodology entails two

stages:

1. the method building stage, and

2. the method validation stage.

The purpose of this chapter is to describe Stage I – how DUEM was built. The structure of this

chapter reflects the steps of Stage I (Method Building) as shown in Figure 3.8 and replicated

below.

Figure 3.8 Research Methodology (replicated)

STAGE I: Method Building (informed by distributed usability and Activity Theory principles)
Objective: To develop the Distributed Usability Evaluation Method
Methods/Techniques/Tools Used:
• Method Theory (Goldkuhl et al, 1998)
• Method engineering (Brinkkemper, 1996)
Steps:
1. Break down traditional usability testing into method components
2. Examine traditional usability testing method components
3. Examine method framework, perspective and co-operation forms
4. Apply traditional usability testing in evaluation project
5. Determine limitations associated with traditional usability testing based on application
6. Revise and re-develop method components based on the notion of distributed usability and Activity Theory principles (as described in Table 2.1)
7. Integrate method components into DUEM

Initially, an in-depth analysis of traditional usability testing will be presented, based on Steps 1

to 3 in Figure 3.8. The analysis will examine the method components that constitute traditional

usability testing, as well as the framework, perspective and co-operation forms as defined by

Method Theory (Goldkuhl et al, 1998). Following this “desk review” of traditional usability

testing, it will be applied in an actual evaluation project (Step 4 in Figure 3.8) to derive an

empirical list of limitations that need to be addressed in order to improve traditional usability

testing and develop DUEM as a result (Step 5 in Figure 3.8). The traditional usability testing

method components will then be revised, based on distributed usability and Activity Theory

principles, and integrated into DUEM based on the conceptual mapping shown in Table 2.1

(Steps 6 and 7 in Figure 3.8). The process carried out in Steps 6 and 7 is a multifaceted, non-

sequential one, which would be overly complex if described explicitly. Instead, the process will

be described implicitly, as part of a detailed description of DUEM itself. DUEM will be

described as a series of phases, and the manner in which distributed usability and Activity

Theory were integrated into each phase of DUEM will be explained, thus indicating how

DUEM was developed. This will be followed by a description of how DUEM overcomes the

limitations associated with traditional usability testing. DUEM will also be compared to

traditional usability testing to illustrate how the latter has been improved. The benefits of

DUEM will also be discussed. It should be noted that the benefits of DUEM described in this

chapter do not constitute actual or proven benefits, but potential benefits. The actual benefits of

DUEM will be discussed following its validation and application in practice in Chapter 5.

Finally, the potential limitations of DUEM will be presented (actual limitations will be

discussed in Chapter 5).

4.2 Traditional Usability Testing

As discussed previously (in Section 3.6.1), DUEM represents a re-development and

improvement of traditional usability testing. Before an improvement can be made, it is


necessary to gain an in-depth understanding of traditional usability testing and its limitations so

that these can be overcome. To do this, Goldkuhl et al’s (1998) Method Theory will be used.

First, traditional usability testing (as described in Section 2.8.3.3) will be decomposed into a

series of method components (as described in Section 3.6.1). These components will then be

analysed to determine any initial limitations that become apparent following the decomposition.

The framework, perspective and co-operation forms of traditional usability testing will also

be discussed. Finally, traditional usability testing will be applied in an actual evaluation project.

The results of this application will be used to derive a list of specific limitations of traditional

usability testing. This will then form the basis for developing DUEM.

4.2.1 Traditional Usability Testing Method Components

Based on the description in Section 2.8.3.3, traditional usability testing can be logically

decomposed into five clearly delineated and sequentially ordered method components. Each

component is self-contained in that it represents a single, coherent activity that results in a

tangible outcome. The outcome from each activity feeds into the activity associated with the

subsequent method component. For example, the first method component “Define usability

testing plan” is a planning activity which involves making decisions about the testing

objectives, defining the users and the tasks, etc. The outcome of the first method component is a

usability testing plan. This plan forms the basis of the activity associated with the next method

component: “Select and recruit representative users”. Figure 4.1 shows the sequence of five

method components that make up the traditional usability testing process.


Figure 4.1 Sequence of five method components that make up traditional usability testing

[Figure 4.1 depicts the sequence, with each component producing an outcome that feeds into the next: Define usability testing plan → Usability testing plan → Select and recruit representative users → Recruited users → Prepare test materials → Test materials → Usability test → Usability test results → Analysis and reporting of results → Conclusions and recommendations.]

As explained previously, a method component consists of concepts, procedures and the

notation. The concepts, procedures and notation for each method component of traditional

usability testing are defined and shown in Table 4.1. The table represents the evaluators’

perspective and indicates what is discussed at different stages of the usability testing process


(i.e. the concepts), what questions are asked in relation to the concepts (i.e. the procedures) and

in what form the answers to these questions are recorded (i.e. the notation). For example, the

method component “Prepare test materials” entails several concepts, including participant

screening, orientation, privacy and consent issues, etc. The evaluators discuss these concepts in

preparing the documents and materials required to carry out the usability testing. The questions

that are raised during the preparation of the test materials are directly related to the concepts

discussed. For example, the concept of ‘participant privacy and consent’ is related to the

question of ‘how ethical issues in usability testing will be resolved’. The notation indicates how

the answers to the questions are recorded and expressed. Using the pervious example, the

answer to the question about ethical issues will be expressed in the form of an Information

Sheet prepared for the participants, indicating their rights.
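As a purely illustrative aside, the three-part structure of a method component can be captured in a few lines of Python; the class and field names below are assumptions for illustration and do not form part of Goldkuhl et al’s (1998) Method Theory.

# Illustrative sketch: a method component bundles concepts, procedures and notation.
from dataclasses import dataclass

@dataclass
class MethodComponent:
    name: str
    concepts: list       # what to talk about
    procedures: list     # what questions to ask
    notation: list       # how to express the answers

prepare_test_materials = MethodComponent(
    name="Prepare test materials",
    concepts=["Participant privacy and consent"],
    procedures=["How will ethical issues in usability testing be resolved?"],
    notation=["Information Sheet describing the participants' rights"],
)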


Table 4.1 Traditional usability testing method components

Method Component: Define usability testing plan
Concepts (What to talk about?): Test purpose; List of objectives; Users’ profile; Test plan; List of users’ tasks; Test environment; Test equipment; Facilitator role; Evaluation measures
Procedures (What questions to ask?): What is the purpose of the usability testing? What are the key objectives of the usability testing? Who are the users of the system being tested? How will the testing be carried out? What tasks will the users perform to test the system? Where will the test take place? What equipment will be used to record the test? What role will the facilitator have? Which measures will be used to evaluate the usability of the system?
Notation (How to express answers?): Standard plan format with objectives, method, procedure and outcomes

Method Component: Select and recruit representative users
Concepts: Target population; Users’ profile; Sample size; Subgroups; Recruitment strategy; Compensation
Procedures: Who are the target population? What are the characteristics of the users (personal history, education history, computer experience, system experience, occupation history)? How many users are required to participate in the test? How will these participants be divided into subgroups? How will participants be recruited? Will any compensation be provided and if so, what?
Notation: Users’ profiles (descriptions); Mathematical models (to calculate sample size); Statistical analysis

Method Component: Prepare test materials
Concepts: Participant screening; Participant orientation; Privacy and consent; Data to be collected from participants; Participant tasks; Participant training
Procedures: How will the right users/participants be recruited? What information will be provided to participants prior to the commencement of the test? How will ethical issues be resolved? What data will be collected from the participants? What tasks will the participants perform using the system? Will the participants be trained in using the system and if so, how?
Notation: Questionnaires; Interview questions; Scripts; Data collection instruments; Information sheets; Scenarios; Manual(s)

Method Component: Usability test
Concepts: Laboratory set-up; Data collection
Procedures: How will the laboratory be set up? How will the data be collected and recorded?
Notation: Notes and diagrams; Video and audio recording; Questionnaires

Method Component: Analysis and reporting of results
Concepts: Usability problems; Recommendations
Procedures: What are the usability problems experienced by participants? How severe are the identified usability problems? What needs to be done to address the identified usability problems?
Notation: List of usability problems and severity ratings; List of recommendations

Decomposing traditional usability testing into method components (as shown in Table 4.1)

immediately reveals several limitations of usability testing. Although the users’ tasks are

considered, there is no mention in the concepts of users’ goals and motives. Goals and motives

are important because the former represents what the users are trying to accomplish and the

latter answers the subsequent question of why they are trying to accomplish it. If goals and

motives are not taken into account, it is not possible to evaluate the system in relation to what

are arguably the two most critical aspects of system use. Also, a closer inspection of the

notations used in traditional usability testing indicates that a range of diverse documents is

produced at different stages of the testing process. This is problematic because it reveals the

lack of notation continuity from one method component to another. This can result in

inconsistencies, errors and omissions because there is no single unifying notation model used

throughout the usability testing which would make it possible to link the method components.

Together, the method components, shown in Table 4.1, constitute a structure or framework in

Goldkuhl et al’s (1998) Method Theory. Such a framework defines how the concepts,

procedures and notations are related. Table 4.1 reveals that, in the case of traditional usability

testing, the framework is fragmented because there are no clear links between the different

method components that form the underlying method structure.

Furthermore, the concepts, procedures and notations are not related through a single unifying

perspective. The perspective in Goldkuhl et al’s (1998) definition describes the conceptual and

value basis of the method. Since traditional usability testing emerged from the classical

experimental methodology (Rubin, 1994), its conceptual basis can be found in the scientific

experiment. However, owing to the impracticalities associated with formal experiments (as

described in Section 2.8.3.1), usability testing was modified to enable evaluators to focus on

understanding real users’ needs and experiences (Good, 1989), rather than deriving causal

conclusions about variables. As a result of these modifications and due to the variety of ways in


which traditional usability testing has been applied in practice (see for example Sazegari, 1994;

Wiklund, 1994; Dayton et al, 1994, etc.), the experimental perspective, on which traditional

usability testing was originally based, was lost. In its current form, traditional usability testing

lacks a single coherent unifying perspective that reveals what is important or what the

evaluators need to focus on. The lack of perspective is also reflected in a fragmented

framework.

Finally, the method’s co-operation forms, which are defined as the division of labour between

the participants (i.e. who asks the questions and who answers the questions), were examined.

Table 4.1 indicates that the evaluators drive the entire usability testing process. There is no input

from the system users into how the evaluation is designed. The evaluators define the objectives,

users’ profiles and tasks of the evaluation based on the functions of the system and assumptions

about the system users. The users are test subjects whose role is limited to completing the set

tasks and providing feedback in a questionnaire or interview. Afterwards, the evaluators analyse

and interpret the test data collected without any input from the users. The evaluators ask the

questions, and they also answer the questions. Even when it is possible to access the actual

users, the usability testing process does not make any provisions in the form of strategies or

means to involve them in a direct way in the design and planning of the evaluation, or the

analysis of the results. Therefore, the answers to the questions shown in the procedures column

of Table 4.1 are primarily from the evaluators’ point of view and therefore potentially

inconsistent with the users’ needs and activities.

Table 4.1 is the first step in deconstructing and analysing traditional usability testing and

examining its limitations so that improvements can be made. The previous discussion

highlighted some of the limitations that are obvious from the table itself, and also commented

on the framework, perspective and co-operation forms of the traditional usability testing

method. A diagrammatic representation summarising the above analysis of traditional usability

testing based on Goldkuhl et al’s (1998) Method Theory is shown in Figure 4.2.

Figure 4.2 Initial analysis of traditional usability testing

[Figure 4.2 summarises the analysis: each method component comprises Concepts (as per Table 4.1, but lacking users’ goals and motives), Procedures (as per Table 4.1) and Notation (as per Table 4.1, with a lack of continuity between method components). The Framework is fragmented, with no clear links between components; the Perspective, originally based on classical experiments, is lacking in the method’s current form; and the Co-operation Forms show that the evaluators both ask and answer the questions.]

The previous analysis represented Steps 1 to 3 of Stage I as shown in Figure 3.8. These three steps are now complete (as shown in the replicated Figure 3.8 below).

Figure 3.8 Research Methodology (replicated)

STAGE I: Method Building (informed by distributed usability and Activity Theory principles)
Objective: To develop the Distributed Usability Evaluation Method
Methods/Techniques/Tools Used:
• Method Theory (Goldkuhl et al, 1998)
• Method engineering (Brinkkemper, 1996)
Steps:
1. Break down traditional usability testing into method components
2. Examine traditional usability testing method components
3. Examine method framework, perspective and co-operation forms
4. Apply traditional usability testing in evaluation project
5. Determine limitations associated with traditional usability testing based on application
6. Revise and re-develop method components based on the notion of distributed usability and Activity Theory principles (as described in Table 2.1)
7. Integrate method components into DUEM


A desk review based on Goldkuhl et al’s (1998) Method Theory alone is insufficient to gain an

in-depth understanding of traditional usability testing and its limitations. Therefore, the fourth

step of Stage I involves applying traditional usability testing in practice in a real-life evaluation

project to collect empirical evidence of its limitations, so that these can be addressed and

improved later. The following section describes the evaluation project in which traditional

usability testing was applied.

4.2.2 Project Background

The traditional usability testing method was used in the formative evaluation of a University

World Wide Web (web) interface. The method was applied in the same sequence as described

in Section 2.8.3.3, to evaluate a high fidelity design prototype of the University’s web interface.

The evaluation formed part of a large-scale web site re-development project at a regional

Australian University in 2002 and 2003. The primary aim of the project was to improve the

existing web site by extending its functionality, reorganising the information, and enhancing the

aesthetic appeal of the site. The aim of the project reflects the twofold purpose of the web site

itself, which is to promote the University to prospective students and visitors, and to provide

information and services to current students and staff members. Prior to the re-development

project, the web site consisted of approximately 60,000 individual web pages providing the

users with information (e.g. information about the University, courses, faculties, research

activities, governance, etc.) and online functions (e.g. an online enrolment system for students, a

student management system for staff, access to e-mail and learning tools, etc.). The top-level

page of the University’s web site prior to the re-development is shown in Figure 4.3.


Figure 4.3 The University’s World Wide Web site prior to re-development

The proposed design of the same top-level page following the re-development is shown in

Figure 4.4. The proposed design in this figure is the high-fidelity prototype that was used to

carry out a formative evaluation.

Figure 4.4 The high-fidelity prototype used in the evaluation


Due to the considerable size of the web site and the variety of online functions available on the

web site, the evaluation of the new design was carried out progressively by testing only sections

of the proposed new design at a time. The sections were chosen on the basis of their intended

users. For example, those pages containing information and functions relevant only to current

students were tested first. This was followed by usability testing of those pages intended for

prospective students, etc. It is beyond the scope of this thesis to discuss the entire evaluation

process and results in detail. Instead, the evaluation of only one section of the new design will

be reported for the purpose of analysing traditional usability testing. The section chosen for this

purpose consists of the pages intended for current students of the University. The information

and functions contained in this section of the web site are shown in Figure 4.5.

Figure 4.5 The Current Students pages of the proposed new design


It is important to state here that although the usability testing reported in this thesis pertains to

the evaluation of an existing web site, it is not the web site itself that was the primary focus of

the research, but the method used to evaluate the web site (i.e. traditional usability testing). It is

not the aim of this research to assess the usability of the University’s web site, but to assess the

traditional usability testing method applied to the web site.

4.2.3 Usability Testing of Proposed New Design

The evaluation of the current students’ section was carried out using the traditional usability

testing method. The evaluation process was documented and will be described next. The

evaluation adhered to the sequence of usability testing steps described in Section 2.8.3.3 and

formalised in Figure 4.1 as a series of five method components. The subsequent sections will

follow the sequence of these five method components.

4.2.3.1 Method Component 1: “Define Usability Testing Plan”

The usability testing process begins with the development of a test plan (i.e. the “Define

usability testing plan” method component in Table 4.1). The test plan for the University’s web

site evaluation was developed by a Project Team consisting of two evaluators, a representative

of the web site owner, and three members of the development team (a programmer and two

designers). Due to privacy and copyright issues, the full test plan cannot be replicated in the

thesis; however, a summarised version of the contents will now be discussed.

The purpose of the evaluation was to “ensure that the […] web site promotes a positive image

of the University, captures what is distinctive about the University and that the information

contained therein is accurate, current, relevant and easily accessible to all potential users of the

site.” (UoW Website Review Project Plan, 2002).


Based on the above purpose, a set of five specific objectives for the usability test was defined.

These include the following:

• To identify aspects of the proposed design which fail to meet the needs of the intended

users;

• To determine if any aspects of the proposed design limit or constrain the intended users’

interaction with the web site;

• To derive a set of usability problems with the proposed design;

• To find out how the intended users feel about the proposed design; and

• To highlight potential new uses or functions that can be added to the proposed design.

Profile of Users

The University’s web site is accessed by a diverse set of users; however, two general categories

of users exist: internal and external users. Internal users include current students and employees

of the University. External users include prospective students, graduates (alumni), research and

business partners, suppliers, casual visitors, individuals interested in using the University’s

facilities (e.g. the Function and Conference Centre, the Sports and Recreation Centre, etc.),

members of the local community, and the media. Due to the diversity of users, the web site was

split into sections relevant to each group of users listed above, and usability testing was

undertaken separately for each section. The usability testing reported in this thesis was carried

out on the current students’ section of the web site.

The current students of the University are generally categorised as undergraduate or

postgraduate, and domestic (Australian citizen or permanent resident) or international (non-

Australian) students. Postgraduate students are further classified as being coursework or

research students. Coursework students are enrolled in a Masters degree by coursework, while

research students undertake a Masters (Research) or Doctor of Philosophy (PhD) degree. Figure

4.6 shows the different categories of current students.


Figure 4.6 Categories of current students at the University

For the purposes of the evaluation, the Project Team decided to treat domestic and international

research students as a single category, based on their assumption that the academic needs of

these students were similar in nature and, consequently, undertaking two separate tests for

domestic research and international research students was thought to be redundant. Furthermore,

undergraduate and postgraduate coursework students were also treated as a single category

because the only distinct difference between these two categories was the coursework level.

While undergraduate students were enrolled in lower level courses, postgraduate students

undertook higher level courses. Academically, their needs were also deemed to be similar.

Therefore, domestic undergraduate and postgraduate (coursework) students were merged into a

single category, and the same rule was applied to international undergraduate and postgraduate

(coursework) students. Following this classification process, a set of three categories of student

users emerged. Figure 4.7 shows these three actual categories of current student users that were

applied in the usability testing process.


Figure 4.7 Actual categories of current students applied in the usability testing

Each of the three actual categories was labelled accordingly (as shown in Figure 4.7) and the

categories were later used as the basis for recruiting test participants and deriving task scenarios

for the usability test.

Following a series of meetings and discussions, a detailed description of the method used to

carry out the usability testing was prepared by the Project Team. The description of the method

was lengthy and complex, and included information about the tasks performed by the

participants; the test environment and equipment; the role of the facilitator; and the evaluation

measures. These aspects of the usability testing method will now be discussed because they are

relevant to the “Define usability test plan” method component. Other aspects such as the

recruitment of representative users will be discussed later as part of the “Select and recruit

representative users” method component.

Tasks Performed by Participants

The Project Team designed three sets of process-based tasks to be performed by the

participants, one set for each of the three categories of users. Each set of tasks was developed


according to the Project Team’s experience and understanding of students’ needs and activities.

All the tasks were thought to be representative of real activities that students would engage in

and each task was worded as an informal narrative description of the users’ activity (as per

Rosson & Carroll’s (2002) requirements for a task scenario). Rubin’s (1994) suggestions of

what to include in a task scenario were taken into consideration when developing the sets of

tasks, and Dumas and Redish’s (1993) guidelines were followed to ensure that the scenario was

unambiguous and written in the users’ language. None of the tasks indicated how they should be

carried out, only what needed to be achieved. A sample of the tasks that were developed for the

usability test is shown in Figure 4.8, while the complete set of tasks is listed in Appendix A.

Certain tasks were thought to be common to more than one category of students and were

included in more than one set. For example, Task 5 in the Domestic Student task set was also

included in the set of tasks developed for Research Students because it was considered to be

relevant to both categories of students. However, there was no overlap between the International

Student tasks and the other two sets of tasks. Most of the International Student tasks were

unique to foreign students and dealt with issues such as finding information about visa

requirements, the use of foreign translation dictionaries in exams, adjusting to the Australian

lifestyle, etc. Users were instructed to return to the University’s home page before attempting to

complete a task.

Domestic Student Tasks

Task 3 You were sick on Monday and missed a test in one of your subjects. You believe this is a legitimate reason for missing an assessment, so now that you are feeling better, you would like to apply for Special Consideration. Using the Policy Directory, find out what you need to do. Begin on the Uni’s home page.

Task 4 Having just arrived at Uni for the first time, you are met with a long queue of cars outside the main gate. You pay the $3 parking fee, and then drive around for 40 minutes looking for a car park. You decide that this may not be the best way to get to Uni, after all, and promise yourself you’ll look into other transport options. You begin your search on the Uni’s home page.

Task 5 You are saving up to go skiing in New Zealand in July 2004. Although that is eight months away, you would like to book early to avoid disappointment. However, you are unsure when the mid-year recess is on in 2004. You begin your search on the Uni’s home page.

International Student Tasks

Task 2 You are nearing the end of your degree, and realise that you have 2 more subjects to do the following session before you can graduate. However, your student visa runs out this session, and you need to extend it. You need to find out how this can be done. You begin your search on the Uni’s home page.

Task 6 You have decided that on-campus accommodation at International House is not for you because you prefer to cook your own meals. So you would like to move to a sharing accommodation arrangement because you can’t afford to rent a house or unit by yourself. You want to find out what accommodation is available. You begin your search on the Uni’s home page.

Research Student Tasks

Task 2 You are having problems with your thesis supervisor. You feel that she is not providing you with an adequate level of supervision. You think that the Uni may have a formal procedure that must be followed in these cases to resolve the problem. You begin your search on the Uni’s home page.

Task 4 You have almost finished writing your thesis and you would like to find out more information about submitting the thesis for examination. You begin your search on the Uni’s home page.

Figure 4.8 Sample tasks developed for usability testing

It should be emphasised that current students were not involved in developing these tasks at any

point. Instead, the Project Team initially decided which aspects of the web site needed to be

tested and then developed the tasks based on this decision. For example, a new web page

(“Getting to UoW”) had been designed to provide information about the different transport


options available, including train and bus transport, parking, etc. The page is shown in Figure

4.9.

Figure 4.9 Web page with information about transport options

The Project Team wished to test this page for ease of navigation and usefulness of content. As a

result, Task 4 in the set of tasks for Domestic Students was developed to evaluate the “Getting

to UoW” page. The Project Team believed that Task 4 was representative of a real activity that a

current student would undertake. However, there was no evidence provided by the team to

support this belief since current students were not consulted. Similarly, the Policy Directory was

a new feature available to students and staff at the University. All of the University’s policies

were stored in the Policy Directory for ease of access from a single location. The Project Team

wished to test the Policy Directory and, as a result, developed Task 3 for the Domestic Students.

Most of the other tasks were developed in the same manner. However, some tasks (such as


finding out what’s showing at the UniMovies and looking for information about submitting a

thesis) were developed based on the premise that they were typical activities that a student

would engage in.

Description of Test Environment and Equipment

It was decided to carry out the actual usability test in a controlled environment - a usability

laboratory. The layout of the usability laboratory used is shown in Figure 4.10.

Figure 4.10 Layout of the usability laboratory

Figure 4.10 resembles a classic usability laboratory layout with two rooms – the test room and

the control room. The test room contains a desk with a desktop computer and two video

cameras. Each camera provided a different angle of the participant so that his/her facial

expressions and hand movements during the use of the keyboard and mouse could be recorded.

Both cameras also captured any sounds and comments made by the participants. The computer

and video cameras were connected to recording equipment in the control room which included a

scan converter (to convert digital signals from the computer to analogue signals), a quad-box (to

record up to four synchronised multiple views from the computer and video cameras

simultaneously), a video recorder (VCR) and a television (TV). Due to physical and resource

constraints, it was not possible to install a one-way mirror between the test room and the control


room. However, the use and positioning of the two video cameras was deemed to be sufficient

for the purposes of observation. Figure 4.11 shows the equipment configuration in the usability

laboratory.

Figure 4.11 Equipment configuration in the usability laboratory

The Role of the Facilitator

One of the evaluators took on the role of the test monitor or facilitator. The role involved being

present in the test room during a usability testing session with a single participant and

facilitating the test by:

• providing information about the project to the participants;

• providing the participants with the tasks and the questionnaires which they were

required to complete;

• ensuring that the participants completed the required documentation (i.e. consent form

and questionnaires);

• prompting the participants to think aloud;

• providing technical assistance in case of an equipment problem; (This did not include

providing assistance to the participant in completing task scenarios.)


• conducting a brief interview with each participant at the end of a usability testing

session.

The facilitator was required to maintain a neutral attitude at all times and was not permitted to

give the participants any deliberate or inadvertent hints and tips on how to complete the tasks.

Finally, the facilitator was also requested to document a participant’s interaction with the

interface by noting the hyperlinks that a participant clicked on while completing the tasks.

Overview of Evaluation Measures

The Project Team decided to collect data from participants using four techniques: observation,

Think Aloud Protocol, questionnaires and an interview. The observation was carried out using

the equipment described in Section 4.2.3.6. The Think Aloud Protocol was employed in

conjunction with the observation by requesting participants to verbalize their thoughts while

completing the tasks. Two questionnaires were developed for the participants to complete. The

first was a pre-test questionnaire which was used to collect demographic data and build a profile

of the participants. The pre-test questionnaire included questions about the participant’s

background, computer and Internet experience, and previous usage of the University’s web site.

It is shown in Appendix B. The second questionnaire was a post-test questionnaire which was

used to collect subjective satisfaction data from each participant about the web site. The

questions in the post-test questionnaire were related to the participants’ interaction with the web

site. Each participant was required to rate twenty-six statements about the web site using a

Likert type rating scale. The statements were related to the perceived ease of use, usefulness and

helpfulness of the web site, as well as specific aspects of the site such as navigation, content,

organisation, language, etc. The post-test questionnaire can be found in Appendix C. Finally, a

brief semi-structured interview was conducted with each participant at the end of a usability

testing session. The rationale for using the interview technique was to avoid participants having

to write free-hand comments on the post-test questionnaire in response to questions such as


“What did you like about the web site?” and “What do you think can be improved?”. By

soliciting responses to these questions in an interview, participants would be more likely to

respond and the facilitator could probe further into specific issues that a participant might raise.

Using these four techniques, both quantitative and qualitative data were collected. These are

shown in Table 4.2.

Table 4.2 Quantitative and qualitative data collected

Quantitative measures (technique used to collect each measure):
• Time taken to complete a task (Observation)
• Number of hyperlinks used (Observation)
• Number of incorrect hyperlinks used (Observation)
• Number of participants who completed task successfully (Observation)
• Statement ratings (Post-test questionnaire)

Qualitative measures (technique used to collect each measure):
• Participants’ comments about the web site (Observation; Think Aloud Protocol)
• Participants’ satisfaction (Interview)

Due to the small number of participants involved, a full statistical analysis of the quantitative

measures was not undertaken. Only simple descriptive statistical measures such as frequencies

and means were used.
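As an indicative sketch only (the figures below are invented and are not the project’s results), the descriptive analysis amounts to a few lines of Python applied to the quantitative measures in Table 4.2:

from statistics import mean

# Hypothetical per-participant observations for a single task
completion_times_sec = [95, 140, 210, 120, 185]    # time taken to complete the task
hyperlinks_used = [4, 7, 12, 5, 9]                 # number of hyperlinks used
task_completed = [True, True, False, True, True]   # successful completion of the task

print("Mean completion time (seconds):", mean(completion_times_sec))
print("Mean number of hyperlinks used:", mean(hyperlinks_used))
print("Completion frequency:", sum(task_completed), "of", len(task_completed), "participants")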

4.2.3.2 Method Component 2: “Select and Recruit Representative Users”

The second method component of traditional usability testing involves selecting and recruiting

representative users. Since the web site being evaluated was the University’s own web interface,

the Project Team had direct access to representative users to participate in the usability testing.


However, before selecting the participants each target population was profiled in accordance

with Rubin’s (1994) generic users’ profile. A generic profile of the current student population is

shown in Table 4.3.

Table 4.3 Generic profile of current student population

Personal History:
• Age ranges from 17 to 50+ (students aged over 25 are classified as mature age students by the University)
• Male and female students
• Left and right handed students
• Diverse learning styles (students undertake degrees from nine faculties ranging from Creative Arts to Science and Engineering)
• Diverse attitudes towards the web site and technology in general

Education History:
• The majority of students have completed the Higher School Certificate (HSC); however, some students have trade qualifications
• Postgraduate students are assumed to have completed an undergraduate degree

Computer Experience:
• The majority of students are assumed to have a basic general knowledge of computer use
• Frequency of computer use ranges from less than once per week to daily
• The majority of students use a Personal Computer (PC); however, some students have experience in using Apple computers
• The majority of students use the Windows operating system; however, some students have experience in using Macintosh, UNIX, Linux and other operating systems
• Most interaction is GUI-based

Web Site Experience:
• The majority of students are assumed to have used the University’s web site at least once
• Frequency of web site usage ranges from less than once per week to daily
• Types of tasks performed using the web site include enrolment, access to e-mail, access to online learning tools, searching the library catalogue and general searching/browsing for information

Occupation History:
• The majority of students are full-time students
• No formal training in the use of the web site is offered by the University

Table 4.3 indicates the sheer diversity of the target population, which introduced an element of

complexity in selecting the number of participants for the usability testing. To resolve the issue,

the Project Team decided to use Nielsen and Landauer’s (1993) model which advocates a

sample of four to five participants as being sufficient to find 80% of usability problems. The

reasons for this decision were the lack of resources available for the project and the time


constraints imposed by the Chief Project Manager. A firm deadline for launching the web site

had been set by the University management and to meet this deadline, the usability testing was

limited to a period of four weeks. As a result, it was initially decided to recruit four to five

participants for each of the three categories of current students. One of the Project Team

members subsequently argued that since domestic undergraduate students constituted almost

70% of the total student population at the University, more than four or five participants needed

to be recruited in the domestic student category. A final decision was made to recruit eight to

ten domestic students for the usability testing. Due to the small sample size for each category of

current students, it was decided not to subdivide the sample into subgroups (e.g. by gender, age,

etc.) because a subdivision would not yield any value. The sample sizes were too small to

enable a reliable statistical analysis to be performed.
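For context, Nielsen and Landauer’s (1993) model estimates the proportion of existing problems uncovered by n participants as 1 - (1 - L)^n, where L is the average problem-discovery rate per participant. The short Python sketch below uses the commonly cited average of L ≈ 0.31; this value is an assumption for illustration and was not estimated empirically in this project.

def proportion_found(n_participants, discovery_rate=0.31):
    # Expected proportion of existing usability problems uncovered by n participants
    return 1 - (1 - discovery_rate) ** n_participants

for n in (3, 4, 5, 8, 10):
    print(n, "participants ->", round(proportion_found(n) * 100), "% of problems expected")

Under these assumptions, four to five participants are expected to uncover roughly 77% to 84% of the problems, which is consistent with the 80% figure cited above.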

The recruitment of the participants was primarily done by putting up posters in student areas,

such as the Student Service Centre, the Food Court, the Library and noticeboards in various

faculties and departments. A sample recruitment poster can be found in Appendix D. A $20 gift

voucher from the University Bookshop was offered to participants as remuneration for taking

part in the usability testing. Despite the gift voucher incentive and direct access to the target

population, a total of only twenty-one students volunteered to participate. Of those, ten were

domestic students, three were international students and two were research students. The

remaining six students fell into the domestic student category; however, since the maximum

sample size of ten had been achieved, they were declined. Due to time constraints, it was not

possible to extend the recruitment period to increase the number of international and research

student participants. The low rate of response from students was primarily attributed to the

timing of the usability testing which coincided with the end of semester. The end of semester is

traditionally a busy time for students with assessment tasks to be completed and preparation for

exams.


4.2.3.3 Method Component 3: “Prepare Test Materials”

The third component of traditional usability testing involves the preparation of the test

materials, including: a screening questionnaire, an information sheet for the participants, a

consent form, the data collection instruments (pre- and post-test questionnaires and interview

questions), the participant tasks and any training materials.

A screening questionnaire was not deemed to be required for the University web site evaluation

because the Project Team had direct access to representative users. Also, the recruitment posters used

clearly indicated the requirements for taking part in the testing. However, a screening process

was implemented to ensure that no more than the required number of participants was recruited.

This was necessary because sixteen domestic students volunteered, but a maximum of ten was

required.

An information sheet describing the project and the participants’ rights in accordance with

standard University ethics guidelines was developed. It is shown in Appendix E. The

information sheet was provided to each participant at the start of a usability testing session to

read. The participant was then given the opportunity to ask questions about the project before

signing the consent form. The consent form (shown in Appendix F) consisted of a statement that

the participant had to sign, indicating that he/she had read the information sheet and understood

his/her rights. The participant signed the second part of the consent form after the usability

testing session was complete to indicate that he/she had received the $20 gift voucher.

The pre-test and post-test questionnaires and the sets of task scenarios for each category of

participants were prepared and formatted professionally. As mentioned previously, these

documents are available in Appendices A, B and C. Finally, since the participants were current

students of the University who had previously used the University’s web site, it was not


necessary to provide training prior to completing the tasks. It was assumed that each participant

had the prescribed minimum level of expertise to use a web interface.

4.2.3.4 Method Component 4: “Usability Test”

The fourth method component of traditional usability testing involves carrying out the actual

usability testing sessions with each individual participant. In the University web site evaluation

project, pilot tests were carried out with two students to eliminate any “bugs” in the process and

the test materials. As a result of the pilot tests, minor adjustments were made to the test

materials. The actual sessions were scheduled on weekdays and took place in the usability

laboratory. Each session lasted approximately 45 to 60 minutes and adhered to the following

format:

1. Facilitator provides information sheet to participant
2. Participant reads information sheet (5 minutes)
3. Facilitator asks if participant has any questions (3 minutes)
   (a) If yes, facilitator responds to questions then asks participant to sign consent form
   (b) If no, facilitator asks participant to sign consent form
4. Facilitator provides pre-test questionnaire to participant
5. Participant completes pre-test questionnaire (5 minutes)
6. Facilitator provides set of tasks to participant, explains what is required and asks participant to think aloud (2 minutes)
7. Participant completes set of tasks (20 to 30 minutes)
   (a) Facilitator documents participant’s interaction with system
   (b) Facilitator prompts participant to think aloud if necessary
8. Facilitator provides post-test questionnaire to participant
9. Participant completes post-test questionnaire (10 minutes)
10. Facilitator briefly interviews participant (5 minutes)

There were no significant deviations between the individual usability testing sessions. On

several occasions, the participants failed to arrive at the scheduled time and it was necessary to

re-schedule the session. Each usability testing session was recorded on a videotape for the final

step – the analysis and reporting of results.

4.2.3.5 Method Component 5: “Analysis and Reporting of Results”

As part of the last method component (“Analysis and reporting of results”), the data collected

during the usability testing was analysed in several ways. First, the data collected through pre-

test questionnaires was collated to create a profile of the participants. A complete profile of the

participants is shown in Appendix G, while a summarised version is provided in Table 4.4.

Table 4.4 Profile of usability testing participants

User Category: Domestic Students
Number of Participants: 10
Summary Profile of Participants: Seven females and three males. Six of the participants were under 25 and the majority (nine) had completed a Higher School Certificate. Enrolled in six different faculties. All of the participants owned a computer and the majority had used computers for six years or more on a daily basis for e-mail, web browsing and word processing. The majority of participants used the web 5 to 6 times per week and classified themselves as competent users of the Internet. All of the participants had used the University’s web site before and the majority used it 3 to 4 times per week on average.

User Category: International Students
Number of Participants: 3
Summary Profile of Participants: One female and two males. Two of the participants were over 25 and all of them had completed an undergraduate degree. All of the participants were enrolled in Master (Coursework) degrees and owned a computer. All had used a computer for more than three years on a daily basis for e-mail, web browsing, word processing and searching databases. All of the participants used the web daily and classified themselves as either average or competent Internet users. All of the participants had used the University’s web site before and use it on a daily basis.

User Category: Research Students
Number of Participants: 2
Summary Profile of Participants: One female and one male, both aged between 21 and 25 years. One participant enrolled in a Masters (Research) program and the other in a PhD. Both had used a computer for more than three years on a daily basis for e-mail, web browsing, word processing, games, searching databases, programming and specialised software. Both participants used the Internet daily and classified themselves as either competent or expert web users. Both participants had used the University’s web site before and use it on a daily basis.

The data recorded on the videotapes was transcribed and used to determine the quantitative

measures, including the time taken to complete a task, the number of hyperlinks used and the

number of incorrect hyperlinks used. The number of participants who completed each task

successfully was also determined. Appendix H shows the quantitative data collected for each

task and each participant, while Table 4.5 provides a summary of the quantitative data collected

for each task.

Table 4.5 Summary of quantitative data collected for each task

DOMESTIC STUDENTS
Quantitative Measures                              T1     T2     T3     T4     T5     T6     T7
Average time taken to complete task (min:sec)      2:44   3:55   2:35   2:17   1:31   1:32   1:42
Average number of hyperlinks used                  9      10     6      8      6      4      4
Average number of incorrect hyperlinks used        4      6      4      4      3      2      1
Successful completion (out of 10 participants)     8      3      5      8      8      9      8

INTERNATIONAL STUDENTS
Quantitative Measures                              T1     T2     T3     T4     T5     T6
Average time taken to complete task (min:sec)      3:07   3:44   4:14   5:52   3:26   2:39
Average number of hyperlinks used                  6      10     9      10     8      5
Average number of incorrect hyperlinks used        3      6      6      6      5      3
Successful completion (out of 3 participants)      1      1      1      1      2      2

RESEARCH STUDENTS
Quantitative Measures                              T1     T2     T3     T4     T5     T6     T7
Average time taken to complete task (min:sec)      2:46   0:40   0:25   0:24   2:49   0:34   1:05
Average number of hyperlinks used                  13     5      4      3      11     2      5
Average number of incorrect hyperlinks used        6      0      0      0      3      0      1
Successful completion (out of 2 participants)      2      2      2      2      2      2      2
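
To illustrate how summary measures of this kind can be derived from the transcribed observation data, the following sketch aggregates per-participant task records into average completion times, average hyperlink counts and completion counts for each task. The record format and the values shown are hypothetical and serve only to demonstrate the calculation.

    # Illustrative sketch only: aggregating hypothetical per-participant task
    # records into the per-task summary measures reported in Table 4.5.
    from collections import defaultdict

    # Each record: (task_id, seconds_taken, links_used, incorrect_links, completed)
    records = [
        ("T1", 164, 9, 4, True),
        ("T1", 150, 8, 3, False),
        ("T2", 235, 10, 6, True),
    ]

    by_task = defaultdict(list)
    for task_id, seconds, links, wrong, completed in records:
        by_task[task_id].append((seconds, links, wrong, completed))

    for task_id, rows in sorted(by_task.items()):
        n = len(rows)
        avg_secs = sum(r[0] for r in rows) / n
        print(f"{task_id}: avg time {int(avg_secs // 60)}:{int(avg_secs % 60):02d}, "
              f"avg links {sum(r[1] for r in rows) / n:.1f}, "
              f"avg incorrect {sum(r[2] for r in rows) / n:.1f}, "
              f"completed {sum(r[3] for r in rows)}/{n}")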

The video and audio data recorded was also examined to identify issues that caused usability

problems at the interface. Specific elements of the web site were noted where it was evident that

they caused difficulties for the participants (e.g. when a participant exhibited frustration or


commented that he/she was unsure which hyperlink to select, when the web site failed to

respond in the way in which the participant anticipated etc.). These elements were noted in the

form of a list of usability problems which were then ranked in order of severity using Nielsen’s

(1993) ratings. The data collected from the post-test questionnaires was also used to find

evidence of usability problems with the web site. The median of each statement rating was

determined to give the Project Team an insight into the participants’ satisfaction with the web site.

The results of this are shown in Appendix I. Finally, the participants’ responses to the interview

questions were also taken into account in determining the usability problems. If a participant

emphasised a particular aspect of his/her interaction with the web site as being negative, this

was treated as a usability problem. A full list of the usability problems derived and their severity

ratings (based on Nielsen, 1993) is shown in Appendix J.
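
As a brief illustration of how the post-test questionnaire ratings were summarised, the sketch below computes the median rating for each statement; the median is appropriate because Likert ratings are ordinal rather than interval data. The statement labels and the ratings shown are invented for demonstration purposes only.

    # Illustrative sketch: median rating per post-test questionnaire statement.
    # Statement labels and ratings are invented for demonstration only.
    from statistics import median

    ratings = {
        "Ease of locating information": [4, 3, 5, 2, 4, 3, 4, 3, 2, 4],
        "Familiarity of terminology":   [3, 2, 3, 4, 2, 3, 3, 2, 3, 3],
    }

    for statement, scores in ratings.items():
        print(f"{statement}: median = {median(scores)}")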

It should be remembered that the purpose of carrying out the usability testing described

above was to compile a list of limitations of traditional usability testing. Consequently, the

actual results of the usability testing (i.e. usability problems with the University web site) are

not considered to be immediately relevant because it is not the web interface that is being

evaluated in this thesis, but the process by which that interface was tested. The actual results of

the usability testing will be reported and referred to in subsequent sections only to the extent

that they are relevant to the discussion of the limitations associated with traditional usability

testing.

The previous two sections (4.2.2 and 4.2.3) represented Step 4 of Stage I of the research

methodology shown in Figure 3.8. This step is now complete (as shown in the replicated Figure

3.8). The next step of Stage I involves determining the limitations of traditional usability testing

based on the results of its application in evaluating the University web site described above.


STAGE I: Method Building

Objective: To develop the Distributed Usability Evaluation Method

Methods/Techniques/Tools Used:
• Method Theory (Goldkuhl et al, 1998)
• Method engineering (Brinkkemper, 1996)

Steps:
1. Break down traditional usability testing into method components
2. Examine traditional usability testing method components
3. Examine method framework, perspective and co-operation forms
4. Apply traditional usability testing in evaluation project
5. Determine limitations associated with traditional usability testing based on application
6. Revise and re-develop method components based on notion of distributed usability and Activity Theory principles (as described in Table 2.1)
7. Integrate method components into DUEM

Figure 3.8 Research Methodology (replicated)

4.2.4 Empirical Limitations of Traditional Usability Testing

The fourth step of Stage I of the research methodology (as shown in Figure 3.8) involved

applying traditional usability testing in a real-life evaluation project to collect empirical evidence of its limitations, so that these limitations could be addressed. This section

will now proceed with Step 5 of Stage I by deriving the limitations of traditional usability

testing based on evidence collected from applying it. This evidence was collected by observing

the usability testing process and noting inconsistencies and issues that arose during this process.

The limitations can be grouped into three broad categories: user-related, process-related and

facilitator-related.

4.2.4.1 User-Related Limitations

User involvement has been identified by a number of authors as being a key factor in system

acceptance and the cornerstone of user-centred design (Preece et al, 1994; Norman, 1986;


Bannon, 1991; Karat, 1997; Robert, 1997; Singh, 2001; Fisher, 2001). However, traditional

usability testing fails to place the user at the centre of the evaluation process, despite being

considered a user-centred method. From the outset of the evaluation, the users and their needs

and activities are assumed to be secondary to the system being tested. During the design of the

University web site evaluation, there was no attempt to involve users in order to discuss their

needs and how they would use the web site being tested. This became apparent in the previously

described evaluation when the Project Team made the decision to treat different categories of

students as a single group, without first determining whether this was a valid assumption. For

example, domestic and international research students were merged into a single “Research

Students” category for the purposes of simplifying the usability testing, despite obvious

cultural and linguistic differences, as well as differences in fees

and legal entitlements between Australian and non-Australian students. The categories shown in

Figure 4.7 created further complications for the Project Team when it became apparent during

the evaluation that the scenarios developed for the “International Students” category were also

appropriate for the international (non-Australian) “Research Students”. By resolving to group a

diverse student body into only three categories of users, the Project Team sought to simplify the

evaluation and conserve resources. However, by doing so they actually introduced a major flaw

into the evaluation process at the very outset.

Limitation 1: Failure to involve users in the design of the evaluation.

This flaw was exacerbated further by a failure to take into account users’ goals and motives,

since no users were interviewed or surveyed in designing the evaluation. Instead, the goals were

pre-determined by the evaluators and prescribed to the users in the task scenarios. There was no

attempt to verify whether the goals specified in the task scenarios were actual goals which were

directly relevant to students. One participant commented that the scenarios were “not things I

would ever do”. Furthermore, the students’ actual motives for doing a particular task or activity


were not examined in any detail. The motives for doing a task can be diverse and intrinsically or

extrinsically driven. For example, a student’s motives for finding a part-time job may be to earn

extra income, gain work experience or save money for buying a car. It is important to

understand students’ goals and motives because they will affect the way in which students use

the University’s web site.

The Project Team derived three sets of tasks for the participants that were based on the

functions of the web site being tested. For example, a new online Policy Directory had been

developed and the Project Team was keen to test it (Task 3 for Domestic Students).

Subsequently, one of the tasks required participants to use the Policy Directory to find out how

they could apply for Special Consideration. Students were confused by this task because they

were unfamiliar with the term “Policy Directory”. To find out how to apply for Special

Consideration, students often used other means which are available through the online student

enrolment system. Instead of assessing whether the web site assisted students with finding out

information about the Special Consideration process, the task focused on the Policy Directory itself. The majority of the Domestic Students commented that they would not

normally use the Policy Directory and if the scenario had not instructed them to use it, they

would have used alternative means.

This particular task also highlighted another important user-related issue that was not taken into

account: the users’ language. The use of official University terminology such as “Special

Consideration” in the scenario was in direct contrast to how users themselves would describe

the situation. During the evaluation, participants questioned the facilitator about the term

“Special Consideration” and offered their interpretation of the situation, using language and

terms more familiar to students, such as: “I missed the test because I was sick last week and I

want to do a supplementary”. By asking users to use the Policy Directory to find how to apply

for Special Consideration, the scenario description failed to indicate the users’ direct goal (to do

a supplementary test) and motives (to pass the subject; to satisfy assessment requirements), and


also neglected to use language associated with those goals and motives (“supplementary test”

and “pass the subject” rather than “Special Consideration” and “Policy Directory”). The use of

language inconsistent with the users’ own prompted a number of users to ask the facilitator for

clarifications (“What do you mean by this?”). This in turn affected the outcome of the

evaluation, an issue that will be discussed later (in Section 4.2.4.3 – Facilitator-related

limitations).

Limitation 2: Failure to incorporate users’ goals and motives into the design of the evaluation.

Limitation 3: Task scenarios are based on system functions, not actual user needs and activities.

Limitation 4: Failure to use language and terminology familiar to typical users in the task scenarios.

The failure to incorporate actual users’ goals and motives into the design of the evaluation was

manifested during the evaluation in other ways as well. While completing the tasks provided,

users were observed to be uninterested and indifferent. This is to be expected since the task

scenarios did not accurately reflect their goals and motives. Since the Project Team did not

involve users in the design of the evaluation and derived the task scenarios based on the web

site functionality, the tasks were not meaningful to the participants. Also, the participants were

offered compensation for taking part in the evaluation. As a result, the participants’ motives for

completing the tasks were artificial and extrinsic (driven by the compensation). Rather than

being genuinely motivated to complete the task because it mirrored their own real motives, the

participants’ motives were unnatural and focused on pleasing or satisfying the facilitator and

receiving the gift voucher. Upon completing a task, one participant stated: “That looks like what

you want”, while another said “Is that what you are looking for?’.


The implications of these statements are significant because they reveal the true nature of the

users’ interest in the evaluation and also highlight that the users’ actual motives were extrinsic

(to receive a $20 gift voucher; to please or satisfy the facilitator; to be a ‘good’ participant; to

perform well and be perceived as being helpful by the facilitator; to finish the usability testing

session as quickly as possible) rather than intrinsic (to achieve a goal meaningful to them using

the system), which is the case in real life. This had a direct effect on the results of the usability evaluation because users simply searched for a satisficing or “good enough” solution,

rather than an optimal solution in completing the tasks. For example, when searching for

information about references for an essay (Task 2 for Domestic Students) some of the

participants clicked on the Library link followed by the Help and Training link. The participants

indicated that they would explore this web page further to look for information about how to

write references, even though a specific link about this topic was not available on the web page

(shown in Figure 4.12).

Figure 4.12 Library web page used by participants to find referencing information

Detailed information about referencing styles was actually available on the UniLearning pages

of the web site (shown in Figure 4.13).


Figure 4.13 UniLearning web page containing referencing information

Since they were not motivated to find information about referencing using the web site in the

first place, participants simply opted for the first feasible solution that they thought would

satisfy the facilitator. In reality, this would not be the case because the users would be driven by

a more powerful intrinsic motive (e.g. to achieve a high mark in the essay) and would more than

likely use a different tool to find the necessary information (e.g. a referencing handbook).

Another instance of indifference and a satisficing solution was documented in a session with an

International Student participant who was completing a task that required her to find

information about getting part-time work (Task 1 for International Students). The participant

started at the University’s home page which contained a link called “Job Vacancies”. Despite

encountering information about job vacancies at the University only (i.e. teaching and

administrative positions at the University), the participant stated that she would look at the

positions available and apply if there was something suitable, without attempting to search for

any other job opportunities available specifically for students.


Limitation 5: Indifference from users in completing task scenarios.

Limitation 6: Users only interested in a satisficing solution.

All of the above limitations stem from the lack of direct user involvement in the design of the

evaluation. Although six distinct user-related limitations have been extracted in the above

discussion, it should be noted that they are all interwoven and characterised by a single

attribute: lack of user involvement. This limitation was also documented in Section 4.2.1 in

describing the co-operation forms of the method. Lack of user involvement in designing the

evaluation is perhaps the most critical limitation of the usability testing process, and one that also underlies some of the second category of limitations – the process-related limitations. These

are described in the next section.

4.2.4.2 Process-Related Limitations

A number of limitations related to the traditional usability testing process were identified. These

were associated primarily with the task scenarios and the pre- and post-test questionnaires filled out

by participants. As mentioned previously, each category of users was provided with a set of

distinct scenarios describing tasks they had to complete. No time frame was specified for

completing a task, however, those participants who had not completed a task within 10 minutes

stated that they were unable to complete the task and asked if they could move on. Once again,

this highlighted the unnaturalness of the testing process because this situation would not occur

in real life. If an individual were unable to accomplish a goal or complete a task using one

means, he/she would not simply give up, but search for an alternate means of doing so, if the

task was meaningful to him/her.


During the evaluation, participants were asked to only use the University’s web site to complete

a task and were discouraged from using other web sites or sources of information. This was

done to ensure that only the University’s web site was tested. However, in doing so, the actual

means that users would use to accomplish a goal were ignored. For example,

when completing the task which required International Student participants to find information

about non-university accommodation (Task 6 for International Students), some participants

stated that they would not use the University web site to do this. Instead, they would search for

available properties on the web site of a local real estate agent or pick up a real estate brochure

available at the University’s student centre. However, the facilitator had to insist that the

participants attempt to complete the task using the University’s web site because that particular

section had been re-developed and needed to be tested. Once again, the issue of artificial

motives (to please the facilitator) arose, affecting the reliability and validity of the results.

Furthermore, by completely disregarding the means that International Students actually use to

find accommodation outside the University, an important design solution is neglected. Since the

students clearly prefer an alternative web site, this represents important feedback to the

designers of the University’s web site who could use this information to create a design that

would satisfy the users. For example, the designers could examine the web sites of local real

estate agents to find out how they are structured and organised and then create the University’s

accommodation web site to mirror this structure, thus providing the users with a familiar

interface. The designers may also re-consider the feasibility of having a non-university

accommodation section of the web site if students do not appear to be interested in using it.

Instead, they could simply provide links to local real estate agents’ web sites. Constraining users

by stipulating that only the system being evaluated can be used to complete a task once again

shows a disregard for actual users’ needs, motives and activities and robs designers of valuable

insights into possible design solutions.


Limitation 7: Users constrained by being required to use only the system under evaluation.

Limitation 8: Lack of understanding of alternative means (tools) for achieving goals.

Each set of scenarios used in the evaluation consisted of distinctly different and discrete tasks.

These tasks were not related in any way and participants were required to complete each one

within a short period of time, before moving on to the next one. The tasks did not resemble

actual user activities, which are usually undertaken over extended periods of time. For example,

one of the Domestic Student scenarios required participants to find out where they could get

assistance with studying and completing assessment tasks (Task 7 for Domestic Students). The

scenario was aimed at a particular section of the web site – the learning and development centre

at the University. Most participants were confused upon encountering this scenario and stated

that they would speak to the lecturer if they needed help. The facilitator provided a more

detailed explanation of the task and asked participants to assume that the lecturer was

unavailable in this instance. Instead, the participants would have to find another form of

assistance. While some participants completed the task successfully and found the learning and

development centre section of the web site, the majority of participants navigated to the Library

section of the web site, stating that they would go to the Library or e-mail a librarian for help.

This scenario highlights the inadequacy of asking participants in usability testing to complete

discrete, short-term tasks which bear no resemblance to actual activities undertaken by users in

real life. In the previously described scenario, participants reacted negatively to the task because

it did not match their actual experience of getting assistance with assignments. In reality,

students may require assistance on different occasions and about related issues that cannot be

addressed in a one-off, discrete manner, as the scenario suggests.

Limitation 9: Scenarios used in usability testing are discrete and short-term tasks.


The use of discrete and short-term tasks in usability testing also draws attention to the lack of

understanding about how user activities develop over time. The scenarios used in the evaluation

did not contain any reference to the historical development of the tasks, assuming instead that

the described situation exists only in its present form. This assumption is problematic because it disregards the previous experiences that users bring with them, which affect their

understanding of the current situation. An example of this occurred in an International Student

scenario (Task 4 for International Students). The University had recently introduced a policy

about the use of foreign translation dictionaries in exams. The new policy required students to

complete a form, on the basis of which a card permitting the use of a dictionary would be issued. The

students would then have to bring the card along with their dictionary to the exam. In the past,

however, students were not required to obtain a card. Instead, the lecturer would specify on the

front of the exam paper that foreign language dictionaries were permitted and those students

needing one could simply bring it to the exam. While the new policy was widely circulated in

student newsletters, some students still appeared to be unaware of the new requirements. One of

the International Student scenarios asked students to find out what they needed to do to use a

foreign language dictionary in the exam. Several students were confused by the task because

they weren’t aware that they had to do anything, saying: “The lecturer will specify this on the

front of the exam paper. I can just bring the dictionary with me”. The scenario failed to state

that the policy had changed and there were new rules about using dictionaries. If historical

information had been provided explaining the development of the new policy, the participants’

previous experience would not have been at odds with the task.

Limitation 10: Task scenarios do not provide historical information (i.e. how an activity changes over time).

The wording used in the scenarios was found to be another problem. The limitation of using

language that was not familiar to users has already been mentioned previously in Section

4.2.4.1, however it was also observed that users’ performance in completing the tasks was


influenced by the terminology used in the scenarios. For example, a number of participants used

the search facility available on the web site. The search terms they entered were based on the

wording in the scenarios. Most users typed “Special Consideration” into the search text box

even though they were unfamiliar with the term. Using an official University term such as

“Special Consideration” is more likely to result in a successful outcome of the search than if the

participants had entered a term that they were familiar with (e.g. “supplementary test”). This

affects the validity of the usability testing results because, by using the term “Special

Consideration” users are able to complete the task successfully. However, in a real-life situation

it is expected that users will have more problems achieving their goal if they use their own term:

“supplementary test”. The differences between the two search results are shown in Figures 4.14

and 4.15.

Figure 4.14 shows the search results for “Special Consideration” which returns relevant and

useful results. However, Figure 4.15, which shows the search results for “supplementary test”,

does not return any useful information to students. By directing the students’ behaviour through

the scenarios and providing official University terminology such as “Special Consideration” in

the scenario, students are able to obtain useful information. However, in real life students are

more likely to use the term “supplementary test” and obtain irrelevant search results. Therefore,

as a result of the usability testing it would appear that the search engine is effective. This is

misleading and incongruent with the actual performance of the search engine in real life using

actual terminology favoured by students.


Figure 4.14 Search results using the term: “Special Consideration”

Figure 4.15 Search results using the term: “supplementary test”


The terminology and wording used in task scenarios has a direct impact on the users’ behaviour

and performance, as well as the results of the usability testing.

Limitation 11: The wording of scenarios affects users’ performance and the results of the usability test.

The perspective from which the scenarios were written was also a source of confusion for

participants. Although the recruitment of participants ensured that the most representative users

completed each set of scenarios, some participants were unsure about whose viewpoint the

scenario represented: their own or that of an imaginary person. This problem was further

exacerbated by the fact that a single participant could relate to some of the tasks and perform

them as if they were his/her own, while other tasks were not as meaningful and the participant

would pretend he/she was the person to whom the task was meaningful. This was evident in the

comments made by participants, such as: “If I was this person, I would…”, “This person

should…”, which were in contrast to statements starting with “I would…”. One participant

asked “Am I domestic or international?”. This confusion over the viewpoint of a scenario arises

largely due to the fact that while some scenarios are meaningful and users can relate to them or

‘own’ them because they have experienced a similar situation previously, others describe

strange and unfamiliar circumstances that users have not encountered in the past. This affects

how a usability testing participant perceives the scenario and whether he/she identifies or feels a

sense of ownership over the situation described in the scenario. Where this sense of ownership

is higher, it can be expected that the scenario is more representative of actual user needs and

activities.

Limitation 12: Confusion over scenario viewpoints affects the extent to which users identify with a scenario.


Additional process-related limitations can be found in the use of pre-test and post-test

questionnaires completed by participants. The pre-test questionnaire is designed to collect data

about the participants: their background, computer experience and knowledge of the system

being evaluated. This data can then be used to create a general profile of the participants, and

correlated with their performance on the task scenarios. For example, a correlation may be found

between the average number of hours a participant spends using the system and the number of

tasks he/she completes successfully. However, this type of analysis is usually not possible with

the small number of users that is recommended in the literature (Nielsen & Landauer, 1993).

The pre-test questionnaire is only useful for the purpose of compiling a profile of the

participants. It does not necessarily indicate anything about the actual user population. In the

University web site evaluation it was not possible to draw conclusions about almost 10,000

domestic students based on a sample size of ten.
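
For instance, a rank correlation between self-reported weekly web use (from the pre-test questionnaire) and the number of tasks completed successfully could, in principle, be computed as sketched below. All of the values shown are invented for illustration; with a sample of ten, any such coefficient would carry little statistical weight, which is why this kind of analysis was not attempted in the evaluation project.

    # Illustrative sketch: Spearman rank correlation between self-reported
    # weekly hours of web use and tasks completed. All values are invented.
    from scipy.stats import spearmanr

    hours_per_week = [2, 5, 8, 10, 12, 14, 15, 20, 25, 30]
    tasks_completed = [3, 4, 4, 5, 6, 5, 6, 7, 6, 7]

    rho, p_value = spearmanr(hours_per_week, tasks_completed)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
    # With n = 10 the estimate is highly uncertain, so no firm conclusion
    # about the wider user population could be drawn from it.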

The post-test questionnaire revealed two specific process-related limitations. The first limitation

arose as participants attempted to complete the questionnaire. The majority of participants posed

the following question to the facilitator: “Should I complete this questionnaire based on my

experiences using the web site today or based on my previous use of the web site?”. Since the

web site being evaluated was an updated version of the existing University web site, most of the

participants were already familiar with some of the information and functions available on the

new web site. The facilitator requested that participants complete the questionnaire based on

their performance on the task scenarios only. However, it is clearly not possible to make a

distinct separation between using the web site to complete a set of discrete and short-term tasks

during usability testing, and prior usage which is based on actual needs and goals. This

observation was made by the Project Team to explain the large discrepancy between the

participants’ recorded performance and their responses on the post-test questionnaire. Although

the video recording analysis showed that the majority of participants had difficulties completing

the tasks described in the scenarios (refer to Appendix H and Table 4.5), the data collected from

the post-test questionnaire indicated that participants were generally satisfied with the web site


(refer to Appendix I). Another possible explanation for this discrepancy is that participants did

not wish to appear negative about the web site because it could be considered impolite or

because they wanted to maintain their self-image as “good testers”. However, from the

evaluators’ point of view, the discrepancy brings into question the validity and usefulness of the

post-test questionnaire data.

The second limitation in relation to the post-test questionnaire arose because participants had to

complete the questionnaire based on their overall experience using the web site rather than with

each individual task scenario. Some participants expressed their concern about this because they

felt that it was not possible to make a single statement about an aspect of the web site. For

example, when asked to rate the navigation statement: “I find it easy to locate the information I

need on this web site” on a Likert scale of 1 (Strongly Disagree) to 5 (Strongly Agree), some

participants declared that it was easy to find the information for some of the tasks, but more

difficult for others. As a result, the participants mostly rated their satisfaction as 3 (Neutral), unless their experience was heavily skewed towards the positive because they had completed most of the tasks successfully, or towards the negative because they had failed to complete most of the tasks.

The limitations related to the use of the pre-test and post-test questionnaires in usability testing

have raised important questions about the reliability, validity and usefulness of this data.

Clearly, the use of questionnaires in traditional usability testing needs to be re-considered. If the

data collected from the questionnaires yields limited value to evaluators and designers, it may

be more beneficial to discontinue the practice of using the questionnaires and spend more time

talking to users instead.

Limitation 13: Due to a small sample size it is not possible to perform statistical analyses.

Limitation 14: Pre-test questionnaires do not indicate if participants are representative of the user population.


Limitation 15: Post-test questionnaire data is contradictory to user performance.

Limitation 16: Prior use of a system (if one exists) affects the users’ perceptions of the system being tested, and subsequently the post-test questionnaire results.

Another process-related limitation arose during the interpretation of the quantitative data

collected by observing users (a summary is shown in Table 4.5, while all of the measures for

each individual participant can be found in Appendix H). Although the measures collected

yielded some value to the Project Team, it was not possible to interpret them separately from the

users’ context. For example, the participants were situated in an isolated usability laboratory, so

these measures were not deemed to be an accurate representation of how users would actually

perform in the real world where other contextual factors such as social influences exist.

Furthermore, the quantitative data collected was based on the participants’ performance on a set

of tasks developed by the Project Team, with no input from the users. If these tasks were not

representative of actual user goals and activities, the data collected would be of little value.

Finally, the measures are too low-level to be considered useful in establishing how well the web

site supports user needs. For example, the time taken by a participant to complete the task is not

a completely reliable indicator of usability or usefulness because other factors may affect this

measure and cannot be taken into account (e.g. the users’ emotional state at the time). The

number of hyperlinks used to get to the required information was also counted as part of the

quantitative measures. Since there may be more than one “hyperlink path” users can take to find

the information they require, it is not possible to make comparisons or draw specific

conclusions about the web site’s navigation simply based on this measure. Nielsen (2004)

recently confirmed the problems associated with these types of quantitative measures, stating

that “Number fetishism leads usability studies astray by focusing on statistical analyses that are

often false, biased, misleading, or overly narrow”.


Limitation 17: Quantitative measures collected during usability testing may be of limited value.

As mentioned previously, the outcome of usability testing is a set of usability problems. The

difficulties associated with defining usability problems have already been discussed (in Section

2.9) and will not be re-visited in detail again. However, it needs to be noted that owing to the

difficulties, a lengthy list of usability problems was derived following the completion of the web

site usability evaluation. Every obstacle or difficulty that a participant experienced was

documented as a usability problem. There was an attempt by the Project Team to categorise the

problems into the following groups: navigation/link problems, content/information update

problems, screen layout problems and language problems. However, it became obvious to the

Project Team that this was not feasible because some problems could be categorised into several

groups. For example, the web site was designed to contain all of the information relevant to

students behind the “Current Students” link on the home page. However, very few participants

used this link to complete the task scenarios. This was initially identified as a language problem

because the official University terminology (“Current Students”) did not match the way students

actually viewed themselves (just “Students”). Therefore, the link was not meaningful to the

students. It also became apparent that the labelling of the link was not the only problem. When

analysing the video recording of the participants’ performance, the Project Team observed that

participants opted for links that represented their goals, rather than their role at the University as

current students. When asked to find the mid-session recess (Task 6 for Research Students),

participants would click on the “Teaching and Learning” link or type “recess” in the search text

box because these actions were aligned with what they perceived to be the goal of the task.

Subsequently, the Project Team categorised this as a navigation/link problem. One of the

Project Team members suggested that the location of the “Current Students” link on the home

page was not optimal and therefore affected whether the participants actually noticed that the

link was available. He believed that this was a screen layout problem and if the link had been in

a more prominent location, participants would have been more likely to use it. Subsequently, a


decision was made to simply list the usability problems and rate their severity using Nielsen’s

(1993) scale. The severity rating then determined the order of priority in which the usability

problems would be fixed.
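
A simple way of turning such a flat list into an order of priority is sketched below. The problem descriptions are condensed from the examples discussed above, but the severity values assigned to them are invented; the 0 to 4 scale (from “not a problem” through cosmetic and minor to major and catastrophic) follows the common presentation of Nielsen’s ratings.

    # Illustrative sketch: ordering a flat list of usability problems by severity.
    # Severity values are invented; the 0-4 scale follows the usual reading of
    # Nielsen's (1993) ratings, with higher numbers indicating more serious problems.
    problems = [
        {"id": "P01", "description": "'Current Students' link not noticed on home page", "severity": 3},
        {"id": "P02", "description": "Search returns nothing useful for students' own terms", "severity": 4},
        {"id": "P03", "description": "Policy Directory label unfamiliar to students", "severity": 2},
    ]

    ranked = sorted(problems, key=lambda p: p["severity"], reverse=True)
    for rank, problem in enumerate(ranked, start=1):
        print(f"{rank}. [severity {problem['severity']}] {problem['id']}: {problem['description']}")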

Also, traditional usability testing does not involve users in the analysis of the evaluation results,

and subsequently in defining usability problems. The extent of the users’ input in traditional

usability testing is limited to completing the task scenarios, filling out the questionnaires and

possibly answering some questions at the end of the evaluation. The users do not have access to

the complete set of data collected and they are not involved in analysing this data. By involving

users in this process, it may be possible to benefit from an insight into the users’ perspective

about the usability problems and how they can be resolved.

Limitation 18: No direct means of defining and categorising what constitutes a usability problem.

Limitation 19: Failure to involve users in the analysis of the evaluation results.

The final process-related limitation is linked to the environment in which the usability testing

was carried out – a formal usability laboratory. Despite the absence of a one-way mirror

between the test room and the control room, and attempts by the Project Team to furnish the test

room to resemble an office with shelves and books, participants were noticeably uncomfortable

with being in an unfamiliar environment and under the scrutiny of cameras. However, not only

was the actual physical environment artificial and foreign, but the participants’ social context

was also inadequately represented. A typical social context usually includes interacting with

other individuals and groups, and working collaboratively on simultaneous activities. The location of the usability laboratory and the goals of the evaluation did not permit the

introduction of social influences into the evaluation in the form of a telephone or another

individual present in the room to complete tasks with the participants.


Isolating the participants from their natural physical and social context in a usability laboratory

does not provide an accurate insight into how the University web site is used. One International

Student participant highlighted this with the following comment while completing a task which

required her to find out about the visa requirements for non-Australian students (Task 2 for

International Students): “I would just ask my friend about this. She would help me find the

information.”. This comment drew attention to the limitations of using an isolated setting for

evaluating a system that is inherently social because its use can be shared collaboratively by

groups of individuals. Considering the widespread availability of screen capture and editing

software such as Camtasia Studio™, SnagIt™ and Morae™, and the introduction of wireless

networking technology, the use of a laboratory for usability testing becomes questionable. By

using a laptop with one of the above screen capture tools installed, in an environment

where wireless networking was enabled, evaluators could test web sites in any type of social

and/or physical setting. This would possibly resolve the problems associated with removing the

users’ natural context.

Limitation 20: The users’ physical and social contexts are removed in a usability laboratory setting.

4.2.4.3 Facilitator-Related Limitations

The final set of limitations observed during the evaluation of the University web site was related

to the presence of the test monitor or facilitator in the room. Two particular limitations were

identified: prompting from the facilitator affected the participants’ performance and the

presence of the facilitator affected the participants’ motives.

Due to the wording of the task scenarios (as described in the previous section), participants

sought clarification from the facilitator about what they were required to do to complete the


task. Although the facilitator made every effort to respond in a neutral and unbiased manner,

participants perceived the information provided by the facilitator as a “hint” to help them

complete the task. In this sense, the tasks set out in the scenarios were viewed as tests by the

participants, which had to be completed successfully. Even a minor prompt by the facilitator,

such as “Why did you click on that particular link?”, stated in an impartial tone to find out the

reasoning behind selecting particular links, often resulted in the participants changing their

behaviour and stating in response: “I’m not sure. Is that the right link?”. This occurred because

the participants perceived the facilitator’s questions as indicators that they hadn’t done

something correctly, causing them to alter their performance in completing the task. This

limitation has been termed the “facilitator effect” and needs to be taken into account when

analysing the results of a usability test.

Limitation 21: Prompting and questions from the facilitator affect the users’ behaviour (the facilitator effect).

The second limitation is linked to the participants’ motives and has been described in some

detail previously under user-related limitations. It refers to the users’ motives in relation to the

facilitator. As a representative of the Project Team and the individual with whom participants

come into contact, the facilitator has the role of the “face of the evaluation”. As such,

participants tend to identify the facilitator with the owner of the system and modify their

behaviour accordingly. This was demonstrated previously by some of the comments made by

participants (“That looks like what you want”; “Is that what you are looking for?’), and the

results of the post-test questionnaire which showed high levels of satisfaction with the system

despite the difficulties experienced by participants in using the system.

Limitation 22: The facilitator is perceived as the system owner, which affects the users’ motives and behaviour.


4.2.4.4 Summary of Traditional Usability Testing Limitations

Sections 4.2.1 and 4.2.4 outlined a number of limitations with traditional usability testing. Some

of these limitations have already been discussed in a review of the literature in Chapter 2

(Section 2.8.3.3). The results of the University web site evaluation project have confirmed the

limitations from the literature and added several new ones. Most of the limitations identified can

be traced to a major flaw with traditional usability testing – the focus on the system. Traditional

usability testing is system driven, which is reflected in the definition of usability as an attribute

of the system (e.g. usability is defined as the ‘effectiveness’ of the system, the ‘memorability’ of

the system, the system ‘constraints’, etc.). In traditional usability testing, the system represents

the starting point for the evaluation. The users’ task scenarios are usually developed in relation

to the system and without reference to actual users’ tasks. The implications of this approach,

which localises usability as a property of the system, are significant because the system is

evaluated as if it exists in a vacuum.

This thesis proposes that the limitations of traditional usability testing can be overcome if the

method is revised from the viewpoint of distributed usability. Distributed usability extends the

notion of usability so that it is situated in the context of a larger system of users’ activities.

This would make usability a quality of the users’ interaction with the system, rather than simply

a quality of the system itself. Evaluating the distributed usability of the system implies the need

to evaluate the system in relation to users’ activities. Instead of just evaluating usability

localised at the system interface, this approach evaluates the distributed usability or usefulness

of the system as well.

The previous three subsections (of Section 4.2.4) represented Step 5 of Stage I of the research methodology

shown in Figure 3.8. This step is now complete (as shown in the replicated Figure 3.8 below).


STAGE I: Method Building

Objective: To develop the Distributed Usability Evaluation Method

Methods/Techniques/Tools Used:
• Method Theory (Goldkuhl et al, 1998)
• Method engineering (Brinkkemper, 1996)

Steps:
1. Break down traditional usability testing into method components
2. Examine traditional usability testing method components
3. Examine method framework, perspective and co-operation forms
4. Apply traditional usability testing in evaluation project
5. Determine limitations associated with traditional usability testing based on application
6. Revise and re-develop method components based on notion of distributed usability and Activity Theory principles (as described in Table 2.1)
7. Integrate method components into DUEM

Figure 3.8 Research Methodology (replicated)

The next step of Stage I involves revising and re-developing the method components of

traditional usability testing by using distributed usability and by applying Activity Theory

principles (as described in Table 2.1). The final step of Stage I involves integrating the revised

method components into the Distributed Usability Evaluation Method (DUEM), a method

which eliminates the limitations of traditional usability testing. These two steps are complex and

multifaceted, with activities from both steps being carried out in parallel. This is a common

phenomenon in the design and development process. Consequently, a chronological description

of Steps 6 and 7 would obscure this process. Instead, the two steps have been amalgamated into

one. By doing so, it is possible to describe DUEM at the same time as discussing how it was developed. This avoids unnecessary repetition, because describing the development of DUEM without referring to DUEM itself would be a difficult and cumbersome process.


4.3 Distributed Usability Evaluation Method (DUEM)

The Distributed Usability Evaluation Method (DUEM) represents a significant re-development

of the traditional usability testing method to overcome its limitations described in previous

sections. The primary aim of DUEM, therefore, is to improve traditional usability testing by

eliminating its limitations, while at the same time retaining all of the benefits of a user-based

UEM. The following sections will describe DUEM and the phases that make it up. Each

phase corresponds to a method component. The development of and rationale behind each phase

will be discussed by referring to distributed usability, which forms the conceptual basis of

DUEM, and Cultural Historical Activity Theory (Activity Theory), which provides a theoretical

framework for the method.

4.3.1 DUEM - An Overview

DUEM consists of the following four phases:

1. Selecting and recruiting users,

2. Understanding users’ activities,

3. Evaluating the system in relation to users’ activities,

4. Analysing and interpreting results.

Each of these phases involves users and is underpinned by distributed usability and Activity

Theory principles. This implies that, before they are able to use DUEM, evaluators must have a

basic understanding of Activity Theory and its fundamental principles as described in Chapter

2. This is also important because the users are directly involved in the evaluation and the

evaluators must be able to guide them through the process without the need for them to

understand Activity Theory in-depth. However, prior research by Gould and Larkin (2000) has

shown that users do not experience difficulties understanding Activity Theory because its

principles are intuitive. For the purposes of describing DUEM in this thesis, it will be assumed

that evaluators possess the pre-requisite knowledge of Activity Theory.


In Phase 1 of DUEM, both direct and indirect users are recruited to participate in the evaluation.

During Phase 2, a team consisting of evaluators and users jointly define the users’ activities

using the activity system notation (as per Engeström, 1987) and the hierarchical structure

notation (as per Leont’ev, 1978). The system being evaluated and its functions are not the focus

of the discussion during this phase. Instead, the focus is on identifying typical users’ activities

and how they are carried out by users using the system and any other tools. The outcome of

Phase 2 is a detailed, shared understanding of what users do and how they do it.

In Phase 3, the evaluators and users once again work jointly and iteratively to design and

develop a set of evaluation goals based on the users’ activities identified during Phase 2. The

evaluation goals are not system specific. They are defined in relation to users’ activities. This is

in contrast to traditional usability testing which uses the system as the starting point for

developing evaluation goals. Based on these goals, a means of conducting the evaluation is

collaboratively negotiated between the evaluators and the users. The outcome of Phase 3 is rich

qualitative data about the users’ interaction with the system in the form of written notes based

on observations and discussions, and/or video and audio recordings of the interaction.

The data collected during Phase 3 is analysed in Phase 4 using Activity Theory principles. This data is used to identify breakdowns (Bødker, 1991a) in the interaction and map these

breakdowns to contradictions in the activity (Engeström, 1999). The mapping of breakdowns

and contradictions defines the usability problems, which are distributed across the activity

network, rather than being found only at the system interface. Finally, the evaluators and users

jointly discuss possible solutions to the usability problems.
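To make this mapping concrete, the following sketch (in Python, and purely illustrative rather than part of DUEM) shows one hypothetical way an evaluator might record a breakdown observed during the interaction and pair it with a contradiction in the activity network, so that the resulting usability problem remains tied to the activity rather than to the interface alone. All class names, fields and example values are assumptions introduced for illustration:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Breakdown:
    description: str   # what was observed, e.g. a halted or repeated action
    activity: str      # the activity in which the breakdown occurred

@dataclass
class Contradiction:
    elements: List[str]  # activity elements in tension, e.g. ["tool", "division of labour"]
    explanation: str     # why the elements are in tension

@dataclass
class UsabilityProblem:
    breakdown: Breakdown
    contradiction: Contradiction
    possible_solutions: List[str] = field(default_factory=list)

# A usability problem is the pairing of an observed breakdown with a contradiction,
# so it stays located in the activity network rather than only at the interface.
problem = UsabilityProblem(
    breakdown=Breakdown(
        description="User re-enters data already captured on a paper form",
        activity="Enrolling a student"),
    contradiction=Contradiction(
        elements=["tool", "division of labour"],
        explanation="The system duplicates work already done by other staff"),
    possible_solutions=["Import the data captured by the admissions office"],
)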

The four phases described briefly above represent four method components that make up

DUEM. They will now be described in detail with references to how they were developed.


Specific revisions made to traditional usability testing in order to improve it will be shown,

along with a rationale explaining why those revisions were made.

4.3.2 Phase 1 – Selecting and Recruiting Users

The first method component of traditional usability testing involves defining the usability

testing plan. This is undertaken from the perspective of the system being tested. The usability

testing goals and objectives, and the tasks that the users will perform to test the system are

defined in relation to the system. This approach was described in the evaluation of the

University web site. As a result of this approach, several limitations were derived in Section

4.2.4, including the following:

Limitation 1: Failure to involve users in the design of the evaluation.

Limitation 2: Failure to incorporate users’ goals and motives into the design of the evaluation.

Limitation 3: Task scenarios are based on system functions, not actual user needs and activities.

Limitation 4: Failure to use language and terminology familiar to typical users in the task scenarios.

The above limitations can be overcome by involving the users in the evaluation from the outset.

Nardi (1996a) maintains that Activity Theory “does not forget the end user, the ordinary person

of no particular power who is expected to use the technology created by others” (p. 252).

Consequently, DUEM defers the “Define usability testing plan” method component and begins

instead with selecting and recruiting users who will be involved in the design of the evaluation.

In traditional usability testing, this is the second method component, while in DUEM it is the

first.


DUEM Revision 1 – First method component
Traditional Usability Testing: Traditional usability testing begins with defining the usability testing plan.
DUEM: DUEM begins with recruiting and selecting representative users.
Rationale: To overcome Limitations 1 to 4 of traditional usability testing, users must be involved in the evaluation from the outset.

In traditional usability testing, only direct users of the system are selected and recruited to

participate in the evaluation. Direct users are individuals who will actually use the system in

their daily work and other activities. Indirect system users, such as individuals or groups who

may be affected by the system inputs and outputs, are not selected to participate in the

evaluation. DUEM is based on distributed usability, which implies that the usability of the

system is distributed across the entire activity that the system supports. This means that both

direct and indirect users need to be involved in the evaluation. As Nardi and O’Day (1999) point

out: “We have to be willing to look at new technologies from the different perspectives of many

people […], not just a few people who play the most visible and important roles” (p. 215). In

Activity Theory terms, the direct users are the “Subjects” of the activity, while the indirect users

are the “Community” which shares the subjects’ object. This is shown in Figure 4.16.


Figure 4.16 Users in a DUEM evaluation (based on Engeström, 1987)

If the distributed usability of a system is to be evaluated, both the direct users (Subjects) and

indirect users (Community) must be involved in the evaluation. This is because they are both

part of the activity across which the usability of the system is distributed. While the direct users

(Subjects) employ the system being evaluated as a tool in this activity, the indirect users

(Community) may be affected by the system indirectly through its inputs and/or outputs. For

example, the administrative staff at the University are direct users of the student management

system, while the management of the University are indirect users because they do not use the

system directly, but they do receive the reports generated by this system.

DUEM Revision 2 – Types of users involved in evaluation
Traditional Usability Testing: Traditional usability testing involves only direct users of the system.
DUEM: DUEM involves both direct and indirect users of the system.
Rationale: DUEM is based on distributed usability, which implies that the usability of a system is distributed across an activity. Activity Theory describes an activity as consisting of subjects and the subjects' community. Subjects are direct users who use the system as a tool to accomplish their object, while the community are indirect users who share the subjects' object, and consequently may be affected by the system.


Traditional usability testing uses Nielsen and Landauer’s (1993) model to determine the number

of users that need to be involved in the evaluation. However, Woolrych and Cockton (2001)

have questioned this model and provided evidence to show that it can fail. As a result of this,

DUEM does not prescribe the number of direct and indirect system users that need to be

selected and recruited for the evaluation. In most cases, the number of evaluation participants

will depend on the type of system being evaluated, the location of the users and the diversity

and size of the user population. If the system being evaluated is an accounting program that

resides inside an organisation and both the direct and indirect users are employees of the

organisation, then the evaluators are able to access and recruit the users more easily than if they

were external to the organisation (e.g. customers). Furthermore, if the organisation is a small

business, it may be possible to test with the whole user population. However, if the system is a

University web site such as the one described in Section 4.2.2, which can be accessed from

anywhere in the world and caters to a large number of diverse users located inside (students,

staff) and outside (prospective students, visitors, etc.) the University, then it may not be

appropriate to use four to five participants as recommended by Nielsen and Landauer (1993).

Since every evaluation is different in terms of the system and the users, in DUEM, it is up to the

evaluators to decide how many users should be involved. This number may also depend on the

time and resources available for the evaluation. The same recruitment techniques used in

traditional usability testing can be applied in DUEM (e.g. advertisements, posters, word of

mouth, etc.) to recruit users.

DUEM Revision 3 – Number of users involved in evaluation
Traditional Usability Testing: Traditional usability testing applies Nielsen and Landauer's (1993) model which states that four to five users are sufficient to find 80% of usability problems.
DUEM: DUEM allows the evaluators to decide how many users to recruit depending on the type of system, the location of the users, the diversity of the users and the size of the user population. The time and resources available may also be taken into account.
Rationale: Since research by Woolrych and Cockton (2001) has shown that Nielsen and Landauer's (1993) model is not reliable, and every evaluation situation is different, it is not possible to prescribe an exact number of users that should be involved.

In traditional usability testing, participants are often offered some form of compensation as a

token gesture of appreciation. However, offering compensation introduces problems into the

usability testing process because it affects the users’ motives. Several limitations were derived

and described in Section 4.2.4 as a result of this practice, including the following:

Limitation 5: Indifference from users in completing task scenarios.
Limitation 6: Users only interested in satisficing solution.

To overcome these limitations, DUEM does not promote providing compensation to users

because of the effect it has on their motives. However, if some form of compensation is

necessary because of difficulties recruiting users, it is anticipated that the high level of user

involvement in DUEM may serve to counter and reduce the effects of the reward on the users’

motives.

DUEM Revision 4 – User compensation
Traditional Usability Testing: Traditional usability testing advocates offering users some form of compensation in return for participating in the evaluation.
DUEM: DUEM does not advocate offering compensation to users unless necessary. If necessary, it is anticipated that the high level of user involvement will reduce the effects of the compensation on users' motives.
Rationale: To overcome Limitations 5 and 6 of traditional usability testing, the use of compensation is not advocated in DUEM.


Once the direct and indirect users have been selected and recruited, and a decision made about

offering compensation, the evaluation can proceed with Phase 2 which involves understanding

users’ activities.

4.3.3 Phase 2 – Understanding Users’ Activities

Any evaluation must begin with an investigation of the users’ actual practice as manifested in

their everyday activities and include the context in which users perform activities. Therefore,

the purpose of this phase of DUEM is to develop an in-depth understanding of actual users’

activities as they occur in the users’ context and in doing so establish a common communication

platform for evaluators and users to work together. This is done primarily through interviews,

focus groups and observation. The system itself is not the central focus of this phase. It is only

deemed to be important to the extent to which it supports the users’ activities. Instead, the

emphasis is on constructing a shared understanding of the users’ needs, goals and activities, so

that the system can be evaluated in relation to these activities in Phase 3 of DUEM.

Phase 2 involves conducting a series of interviews and/or focus groups with the recruited direct

and indirect users, and observing the users perform their activities if possible. An equivalent

process in traditional usability testing does not exist. Instead, the evaluators determine what

typical users’ tasks are and make assumptions about how users make use of the system being

evaluated based on their own prior knowledge and experience. Users are not involved directly in

this process. This is a major limitation of traditional usability testing, which in turn results in a

number of other limitations. These were derived and described in Section 4.2.4, and include the

following:

Limitation 2: Failure to incorporate users’ goals and motives into the design of the evaluation.

Limitation 3: Task scenarios are based on system functions, not actual user needs and activities.


Limitation 4: Failure to use language and terminology familiar to typical users in the task scenarios.

Limitation 5: Indifference from users in completing task scenarios.
Limitation 6: Users only interested in satisficing solution.
Limitation 7: Users constrained by being required to use only the system under evaluation.
Limitation 8: Lack of understanding of alternative means (tools) for achieving goals.

Limitation 10: Task scenarios do not provide information about how an activity changes over time.

Limitation 11: The wording of scenarios affects users’ performance and the results of the usability test.

Limitation 12: Confusion over scenario viewpoints affects the extent to which users identify with a scenario.

Since users are not interviewed or surveyed about their activities and needs in traditional

usability testing, no primary data is collected. As a result, real users’ goals and motives cannot

be incorporated into the evaluation, the task scenarios are not based on real users’ activities, the

language and terminology do not reflect the words and terms that users actually use to describe

their activities, and the evaluators do not have an accurate picture of how users use the system in

real life. The lack of understanding of users’ activities and needs: causes difficulties when

preparing the test materials, affects the evaluation process and outcomes, and results in

indifference from users when completing the task scenarios.

To overcome these limitations, DUEM advocates that users be involved from the outset of the

evaluation and that their involvement begins with interviewing them about their activities and

needs. The main purpose of Phase 2 is to collect primary data about users’ activities and

develop an in-depth understanding of these activities. Due to the problematic nature of gathering

this type of ad hoc data, the Activity Theory principles described previously (in Section 2.12.6)

can be used to make sense of the information gathered and also provide evaluators with a

common vocabulary (Nardi, 1996b) to describe users’ activities. Activity Theory principles,

therefore, provide a single, structured framework for both collecting and mapping the primary

data collected in this phase.


DUEM Revision 5 – User involvement in design of evaluation
Traditional Usability Testing: Traditional usability testing does not involve users in designing the evaluation. Instead, evaluators determine what typical user tasks are and make assumptions about how users make use of the system being evaluated based on their own prior knowledge and experience, and the system functions.
DUEM: In DUEM, users are involved from the outset of the evaluation so that a true and accurate understanding of their activities and needs can be developed.
Rationale: To overcome Limitations 2 to 8 and 10 to 12, it is necessary to involve users in the evaluation from the outset. Their involvement will contribute to overcoming the limitations because evaluators will have a first-hand understanding of users' activities. This revision also addresses the issue of co-operation forms in traditional usability testing (identified in Section 4.2.1) where the evaluators drive the entire usability testing process.

The data about users’ activities can be collected through a series of one-to-one interviews or

focus groups of users. The latter is preferable because it provides a forum for discussion and

also for observing the social interactions between users, which is beneficial for developing an

understanding of the users’ social context. According to Kitzinger (1994, 1995) social

interaction between the participants in a focus group emphasises their view of the world, the

language they use about an issue and their values and beliefs about a situation. Furthermore, it

also enables participants to ask each other questions and to re-assess their own understandings

of specific issues and experiences. Evaluators can combine the interviews and focus groups with

observation in the users’ natural environment (if this is possible). Any relevant documentation

or materials should also be examined.

(The use of ethnography to collect primary data would be ideal. However, ethnography is a

long-term research method, which is not feasible for conducting a formative evaluation of a

system considering the scarce resources that most system evaluation projects suffer from

(Seidel, 1998; Nielsen, 2001).)


A set of twelve open-ended questions has been derived from the Activity Theory principles in

Section 2.12.6, to guide the collection of primary data during the interviews and focus groups.

All of the questions centre around the users’ activities, and the system is only discussed as one

of the many tools used to carry out those activities. This is in contrast to traditional usability

testing where evaluators use the system as the starting point for understanding users’ tasks and

developing scenarios for the evaluation (Sweeney et al, 1993). This is reflected in Limitation 3

(Task scenarios are based on system functions, not actual users’ needs and activities) of

traditional usability testing. Singh (2001) highlights this limitation effectively by stating:

“Though providers are intensely interested in the way people use their products and services,

much of their research starts from the products and services they offer, rather than the world of

the customer” (p. 113). Similarly, in traditional usability testing, even though the evaluators are

interested in users and their tasks, the evaluation process always starts from the system itself.

To overcome this limitation, DUEM starts with the users and their activities, sidelining the

system and reducing its role to that of a mediating tool. To ensure a user-centred viewpoint,

none of the questions refer to the system specifically. Once again, Singh (2001) provides a

useful demonstration of this approach. In a study on the use of electronic money, Singh (2001,

p. 115) reported that questions based on users’ activities (such as “How do you pay for

shopping, education, films and theatre?”) were more effective than questions specifically

related to the system or technology (such as “How do you use plastic cards?”).

The twelve questions listed below have been derived from Activity Theory principles (as

described in Section 2.12.6). They are not intended to be used in a literal sense. Rather, they

serve as guidelines for the evaluators in collecting data about users’ activities. They can be re-

phrased to suit the circumstances or refer to specific entities. However, regardless of the way in

which they are used or phrased, the answers to these questions should yield a rich set of

qualitative data about users’ activities.


1. What is the central (principal) user activity (referred to as ‘the activity’ from here onwards)?

2. What is the object of the activity and what are the motives for carrying out the activity?

3. What are the actions used to carry out the activity, and what are the goals associated with each action? Are there any alternative actions?

4. What are the operations used to carry out the actions, and what are the conditions in which the activity takes place?

5. What are the tools used to carry out the activity? Are there any alternative tools?
6. Who are the stakeholders in the activity (community)?
7. What are the roles of the stakeholders in the activity (division of labour)?
8. What rules govern the activity?
9. What are the outcomes of the activity and what will they be used for?
10. What other activities are related to the activity (network of activities)?
11. Are there any known problems in carrying out the activity (breakdowns and contradictions)?
12. How was the activity carried out previously (historical development)?

Having listed the twelve questions above, each question will now be described in more detail to

demonstrate its purpose and rationale.

Question 1: What is the central (principal) user activity?

Source: [ATP-1] The central activity as the basic unit of analysis.

Identifying the central or principal activity that the users carry out reveals what users actually

do. This is the most fundamental question because it represents the use-situation or the context

in which the system being evaluated resides. The central activity forms the basic unit of analysis

on which subsequent questions are based. Evaluators can use the ‘time period’ to make a

distinction between activities and actions, because activities take place over an extended ‘time

period’. Activities are longer-term formations, while actions are short-term and goal-oriented

(Kuutti, 1996). However, it is possible for an action to become a long-term activity if the

circumstances in which an activity is being carried out change.

Once the evaluators and users agree upon the central activity, it needs to be decomposed into the

relevant elements. It is possible to have more than one central activity if the users perform a variety of activities, in which case all of the questions below must be answered for each of the identified activities.

Question 2: What is the object of the activity and what are the motives for carrying out the activity?

Source: [ATP-1] The central activity as the basic unit of analysis. [ATP-3] The object of the central activity. [ATP-6] The hierarchical structure of the central activity.

All activities are object-oriented. The object of an activity is the shared and unifying purpose

that the users are striving towards. An activity is defined by its object, and the object gives the

activity a direction. However, while the object is common to all the stakeholders, the motives

for carrying out an activity are unique to individuals. Therefore, there could be any number of

visible or hidden motives that individuals have for participating in the activity. An

understanding of the object and the users’ motives allows evaluators to situate the system and its

role in the context of an activity. It draws the attention away from the system itself, and

positions it in relation to what the user is trying to achieve. The answer to the question about the

object and motives addresses the basic issue of “why” the activity exists. It also aims to

overcome Limitation 2 (Failure to incorporate users’ goals and motives into the design of the

evaluation), Limitation 5 (Indifference from users in completing task scenarios) and Limitation

6 (Users only interested in a satisficing solution.) of traditional usability testing because it gives

evaluators an insight into users’ real motives and enables them to incorporate those into the

evaluation.

Question 3: What are the actions used to carry out the activity, and what are the goals associated with each action? Are there any alternative actions?

Source: [ATP-1] The central activity as the basic unit of analysis. [ATP-6] The hierarchical structure of the central activity.


In order to fully analyse the activity, it is necessary to decompose it into actions or individual

processes that are subordinated to their goals. Actions describe what must be done to carry out

the activity, as well as how it can be done. There can be any number of ways to carry out the

activity; therefore, it is important to determine all the different possible alternatives because this

provides evaluators with an insight into preferred ways of performing an activity. Answering

this question results in the users’ goals being clearly identified and taken into consideration,

which contributes towards overcoming Limitation 8 (Lack of understanding of alternative

means (tools) for achieving goals) of traditional usability testing.

Question 4: What are the operations used to carry out the actions, and what are the conditions in which the activity takes place?

Source: [ATP-1] The central activity as the basic unit of analysis. [ATP-6] The hierarchical structure of the central activity.

Operations are the routines through which actions are carried out and goals are accomplished.

Operations tend to be crystallised into tools (Leont’ev, 1978), implying that the system being

evaluated should function at the level of operations, and that users should be able to use the

system without a conscious effort. The conditions in which the operations take place represent

the physical context in which the system is used. The physical context is removed in traditional

usability testing because the evaluation takes place in a usability laboratory (Limitation 20).

The answers to questions 2, 3 and 4 above provide an insight into the hierarchical structure of

the central activity and the relationships between the three different levels (as shown in Figure

4.17). These questions serve to decompose the central activity into actions and operations and

provide data about the users’ motives and goals. This data will later be used to analyse

breakdowns in the users' interaction with the system (in Phase 3 of DUEM).


Figure 4.17 Hierarchical structure of an activity (Leont’ev, 1978)
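As a purely illustrative aside (not part of DUEM itself), the hierarchical structure elicited by Questions 2 to 4 could be recorded in a simple data structure such as the following Python sketch. The class names, fields and example values are hypothetical assumptions introduced only to show how motives, goals and conditions sit at different levels of the hierarchy:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Operation:
    routine: str     # routinised behaviour performed without conscious effort
    conditions: str  # the conditions under which the operation is carried out

@dataclass
class Action:
    goal: str                                              # the goal the action is subordinated to
    operations: List[Operation] = field(default_factory=list)
    alternatives: List[str] = field(default_factory=list)  # alternative ways of acting

@dataclass
class Activity:
    object: str                                            # the shared object of the activity
    motives: List[str] = field(default_factory=list)       # individual motives (Question 2)
    actions: List[Action] = field(default_factory=list)    # actions and goals (Question 3)

# Hypothetical example: a student borrowing a library book
borrowing = Activity(
    object="Obtain a book needed for an assignment",
    motives=["complete the assignment on time"],
    actions=[Action(
        goal="Locate the book",
        operations=[Operation(routine="type the title into the catalogue search",
                              conditions="quiet area, shared catalogue terminal")],
        alternatives=["ask a librarian"])],
)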

Question 5: What are the tools used to carry out the activity? Are there any alternative tools?

Source: [ATP-1] The central activity as the basic unit of analysis. [ATP-4] The mediating role of the system and other tools.

The purpose of this question is to identify the complete set of diverse tools that mediate the

activity and that users make use of in carrying out the activity. There may be several alternative

tools that can be used to carry out the same activity and a single activity may require the use of a

number of tools. All of these tools must be identified so that their attributes can be examined

and any preferences by the users for one tool over another considered. The system being

evaluated is one of these tools. However, it is important to identify other tools and their role as

well, for example manuals, notes, books, forms, writing implements, etc. If possible, these other tools should be examined. This question aims to overcome Limitation 8 (Lack of understanding of alternative means (tools) for achieving goals) of traditional usability testing.

Question 6: Who are the stakeholders in the activity (community)?

Source: [ATP-1] The central activity as the basic unit of analysis. [ATP-5] The social context of the central activity.


The purpose of this question is to determine the attributes of the direct and indirect users in the

activity. The direct users are the subjects of the activity. They use the system on a regular basis

to transform the object of the activity into an outcome. The indirect users are the community

consisting of individuals who share the same object, but do not use the system directly. Instead,

they may provide the inputs for the system or use the outputs produced by the system. The

answer to this question provides an insight into the users’ social context, which is removed in

traditional usability testing (Limitation 20).

Question 7: What are the roles of the stakeholders in the activity (division of labour)?

Source: [ATP-1] The central activity as the basic unit of analysis. [ATP-5] The social context of the central activity.

In Activity Theory terms, this question refers to the division of labour in the activity, i.e. who

does what? It describes the roles of the different stakeholders in relation to the actions that make

up the activity and the division of status or power. This question aims to define and structure the

relationships between the direct and indirect users of the system being evaluated. The answer to

this question provides an additional insight into the users’ social context.

Question 8: What rules govern the activity?

Source: [ATP-1] The central activity as the basic unit of analysis. [ATP-5] The social context of the central activity.

The relationship between the direct users and the indirect users of the system is mediated by a

set of explicit and implicit regulations and social conventions that govern and constrain the

users’ activity. These need to be identified so that their impact on the usability of the system

being evaluated can be assessed. The answer to this question provides an additional insight into

the users’ social context.


Question 9: What are the outcomes of the activity and what will they be used for?

Source: [ATP-1] The central activity as the basic unit of analysis.

Every activity results in the transformation of the object into an outcome. The outcome can be

material or non-material in nature. It is necessary to identify the outcome of the central activity

and also determine what it will be used for. This is important because the mediating elements in

the activity system (tools, rules, division of labour) may affect the outcome of the activity and

subsequent activities to which the outcome is relevant. As Engeström (1996) points out, the

outcome of the activity may become a tool used in another activity. Therefore, it is important to

develop an understanding of what the users are ultimately trying to achieve so that the system

can be assessed in relation to whether it facilitates the achievement of the outcome.

The answers to questions 5 to 9 above can be mapped onto Engeström’s (1987) activity model

(shown in Figure 4.16) to derive a representation of users’ activities. Once the evaluators have

an understanding of the different elements that make up the activity system, they are able to

identify relationships between them, as well as any contradictions that may occur.
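By way of illustration only (this recording format is not prescribed by DUEM), the answers to Questions 5 to 9 could be captured against the elements of Engeström's (1987) model in a structure such as the following Python sketch; the field names and the example, which reuses the student management system described earlier in this section, are assumptions made for the purpose of the illustration:

from dataclasses import dataclass
from typing import List

@dataclass
class ActivitySystem:
    subjects: List[str]            # direct users of the system (Question 6)
    community: List[str]           # indirect users and other stakeholders (Question 6)
    object: str                    # the shared object of the activity
    outcome: str                   # what the object is transformed into (Question 9)
    tools: List[str]               # mediating tools, including the system (Question 5)
    rules: List[str]               # explicit and implicit rules (Question 8)
    division_of_labour: List[str]  # who does what (Question 7)

enrolment = ActivitySystem(
    subjects=["administrative staff"],
    community=["University management", "students"],
    object="accurate student enrolment records",
    outcome="reports used for planning and decision making",
    tools=["student management system", "paper enrolment forms", "procedure manuals"],
    rules=["enrolment policy", "privacy requirements"],
    division_of_labour=["staff enter and maintain records", "management review reports"],
)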

Question 10: What other activities are related to the activity (network of activities)?

Source: [ATP-1] The central activity as the basic unit of analysis. [ATP-7] The network of activities in which the central activity is situated.

The central activity does not exist in a vacuum. There are a number of other activities which

have an impact on the central activity, such as the tool-producing activity (e.g. systems

development), the rule-producing activity (e.g. policy development), the subject-producing

activity (e.g. formal education), etc. Together all of these activities make up a network or web of


activities (Bødker, 1996). Since the system being evaluated may have an impact on various

activities in this network, it is necessary to identify all of them.

Question 11: Are there any known problems in carrying out the activity (breakdowns and contradictions)?

Source: [ATP-1] The central activity as the basic unit of analysis. [ATP-6] The hierarchical structure of the central activity. [ATP-7] The network of activities in which the central activity is situated. [ATP-8] Contradictions in the central activity and between the central activity and other activities in the network.

Problems may exist in the users’ activities which affect the way in which the system is used. In

Activity Theory terms these problems are manifested as breakdowns in the users’ activity

caused by one or more contradictions within and between the elements of the central activity

and its neighbouring activities (as defined by Engeström, 1999). If this is the case, these

breakdowns and contradictions need to be identified so that they can be taken into account when

evaluating the system.

Question 12: How was the activity carried out previously (historical development)?

Source: [ATP-1] The central activity as the basic unit of analysis. [ATP-2] The subject(s)' internalisation of the central activity. [ATP-9] The historical development of the central activity and the tools mediating the activity.

A full understanding of the current form of the central activity is only possible if the historical

development of that activity is taken into account. According to Activity Theory, all activities

develop over time in an irregular and discontinuous fashion (Kuutti, 1996). As this occurs, parts

of the older phases of the activity remain embedded in the new activities through the mediating

elements (tools, rules and division of labour), which are shaped by the previous experiences

(Bannon & Bødker, 1991). These remnants may have an impact on the activity in its current

form and cause contradictions in the activity system. Therefore it is necessary to examine the


evolutionary development of the activity, so that it can be situated in a historical context and

any contradictions which arise as a result of the evolutionary development can be identified.

This question also enables evaluators to assess the effects of a previous system (if one existed)

on the current activity, and to determine to what extent users have internalised the activity and

how the users perceive the previous system. (If a previous version of the system exists, the

evaluators can observe how the users make use of it.) This is important in determining the

knowledge that the users have of the activity and the system itself. Although users may be

familiar with the activity itself, they may not have actually used the system to perform the

activity because there are alternative tools. For example, a student may be familiar with the

activity of borrowing books from the library but has only used an in-library catalogue to

perform this activity. In evaluating an alternative tool (for example, an online Internet-based

library catalogue), the student’s knowledge of the activity must be taken into account because

this knowledge of the activity will affect his/her use of the alternative tool. Similarly, a user's

experience with a previous version of the system being evaluated will colour his/her views and

perceptions of the new system.

The data collected from this question is very important because it identifies remnants of the

previous activity (or activities) in the current one, and also provides an insight into the extent to

which the users have internalised both the activity and any previous versions of the system. This

contributes towards overcoming Limitation 9 (Scenarios used in usability testing are discrete

and short-term tasks), Limitation 10 (Task scenarios do not provide information about how an

activity changes over time) and Limitation 16 (Prior use of a system (if one exists) affects the

users’ perceptions of the system being tested, and subsequently the post-test questionnaire

results) of traditional usability testing.

It also helps evaluators assess the differences between novice and expert users. Unlike

traditional usability testing, which relies on pre-test questionnaire data to determine which users are novices and which ones are experts, DUEM advocates that the level of expertise of a user

can only be determined if both the activity and any previous versions of the system are taken

into account. This is because the system cannot be treated separately from the activity in which

it is used. There is a dialectic relationship between the activity and the system, so both must be

taken into account to understand where a user is situated on the novice to expert continuum.

DUEM Revision 6 – Users' knowledge
Traditional Usability Testing: In traditional usability testing users are classified as novices or experts depending on the data collected from pre-test questionnaires. The classification of a user is based on his/her previous use of the system being evaluated.
DUEM: In DUEM, users are classified as novices or experts depending on their previous use of the system being evaluated AND their knowledge about the activity where the system is used.
Rationale: Activity Theory proposes that individuals internalise practical external tool-mediated activities over extended periods of time. As this process occurs, the activity itself changes. Computer systems are one of many elements that constitute these activities. Therefore, the internalisation of a system (i.e. a tool) cannot be understood separately from the internalisation of the activity itself.

Answering the above twelve questions in an interview or focus group, coupled with observation,

will enable evaluators to develop an in-depth understanding of users’ activities. This is useful

because it provides a context in which the system can be evaluated and assists in the preparation

of the test materials. Instead of relying on their own prior knowledge and experience, and the

system functions to define users’ tasks, evaluators have primary data about real activities that

users engage in.

Since Activity Theory principles are intertwined with one another, and as such cannot be

applied completely independently of each other, the above questions cannot be expected to

produce discrete or independent answers. For example, question 3 asks about the actions that

are used to carry out the activity. In specifying these actions, users will also inevitably specify

the tools they use. This information will also provide an answer to question 5, which asks about the tools that are used to carry out the activity. Due to the possibility of overlapping or repetitive answers, the questions should be used mainly as a guide in analysing users' activities and

mapping these on to Activity Theory principles. The questions also set a boundary for the

discussion because they are limited to activities as defined by Activity Theory. Without the

structured framework that Activity Theory provides, it would not be possible to establish such a

boundary. The questions in this phase of DUEM enable the evaluators to focus on specific

elements of users’ activities without straying into irrelevant issues.

The discussion that takes place between the evaluators and users in order to obtain answers to

the above questions serves three important purposes. Firstly, it enables both evaluators and users

to jointly construct a shared understanding and interpretation of the system’s role as defined

through the users’ activities. Since the system being evaluated only has a mediating role in these

activities, it does not become the focus at the outset of the evaluation. Instead, the system is

viewed as one of many tools that are used in performing activities, rather than being seen as an

object that needs to be mastered (Singh, 2001). Secondly, the discussion provides the evaluators

with an understanding of the users’ language and terminology. Instead of using the system’s

terminology to describe users’ activities, evaluators are able to become familiar with the

vocabulary that the users apply in describing their own activities. This ultimately serves to

facilitate improved communication between the evaluators and users and overcomes Limitation

4 (Failure to use language and terminology familiar to typical users in the task scenarios) and

Limitation 11 (The wording of scenarios affects users’ performance and the results of the

usability test). Finally, due to the broad nature of the questions and the user-centred (rather than

system-centred) focus of the questions, the discussion serves to build trust between evaluators

and users. In traditional usability testing, the users are viewed primarily as subjects or

participants in a testing situation. They do not have an active role in the evaluation process, and

they never meet or interact with the evaluators or designers. The only individual they have

contact with is the usability test facilitator who explains what they are required to do. By

involving the users from the very outset of the evaluation the opportunity exists to establish a


relationship of trust between the evaluators and users, which may contribute to overcoming

other facilitator-related limitations (Limitation 21 and Limitation 22).

DUEM Revision 7 – Understanding users' activities
Traditional Usability Testing: In traditional usability testing evaluators determine what typical user tasks are and make assumptions about how users make use of the system being evaluated based on their own prior knowledge and experience, and the system's functions.
DUEM: In DUEM, a set of twelve questions based on Activity Theory principles has been developed to guide the evaluators in interviews, focus groups and observation aimed at developing an in-depth understanding of real users' activities.
Rationale: Traditional usability testing is driven by the system (Sweeney et al, 1993) and focused on the system from the outset. To overcome this major limitation, DUEM focuses on the users' activities at the outset and reduces the role of the system to a mediating tool in those activities. By developing an in-depth understanding of the users' activities, the system can be evaluated against a backdrop of these activities. This is consistent with distributed usability where usability is a property of the users' interaction with the system in an activity, and not the system itself.

The data collected from the interviews, focus groups and observation in Phase 2 of DUEM

provides:

• an integrated, holistic view of the users’ central or principal activity (or activities) and

other intersecting activities;

• a description of the various tools, rules and responsibilities that mediate the activity;

• an understanding of the users’ physical and social context; and

• an explanation of the historical development of the activity.

The outcome of Phase 2 is an in-depth understanding of real users’ activities mapped on to

Activity Theory principles. It provides a context against which the system being evaluated can

be tested. This is arguably the most critical phase because DUEM does not evaluate the

usability, but the distributed usability (or usefulness) of a system. Distributed usability views

the usability of a system as being distributed across users’ activities. Therefore, if DUEM is to

be used to assess distributed usability, the evaluators must first become familiar with the


activities across which the usability is distributed. Having done this, the evaluators can proceed

with testing the system.

4.3.4 Phase 3 – Evaluating the System in relation to Users’ Activities

Following the development of the shared understanding of users’ activities in Phase 2 of

DUEM, the evaluation proceeds with Phase 3 which consists of three sub-phases: defining an

evaluation plan; preparing the appropriate evaluation resources and materials; and testing the system. The equivalents of these three sub-phases in traditional usability testing are the following

three method components: “Define usability testing plan”, “Prepare test materials” and

“Usability test”. In DUEM these three method components have been revised as sub-phases of a

single phase (method component). This has been done in order to encapsulate the three

processes and ensure closer coupling between them for continuity and consistency. Each of the

processes will now be described as a sub-phase.

4.3.4.1 Sub-phase 3.1 – Defining the Evaluation Plan

While traditional usability testing begins with defining a usability testing plan, in DUEM the

evaluation plan is defined in Phase 3 jointly and iteratively with the users. The evaluation plan

is a document that lists the goals of the evaluation and how the evaluation will be performed to

achieve these goals. It is derived primarily from the outcome of Phase 2, and is developed together with the users.

DUEM Revision 8 – Defining the evaluation plan
Traditional Usability Testing: In traditional usability testing evaluators define the usability testing plan.
DUEM: In DUEM, the evaluation plan is defined jointly by the evaluators and users.
Rationale: This revision was designed to overcome Limitation 1 (Failure to involve users in the design of the evaluation) and thus involve users directly in the design of the evaluation. It is anticipated that this will give users a greater sense of ownership over the evaluation, and subsequently have a positive effect on their motives, thus overcoming Limitation 5 (Indifference from users in completing task scenarios) and Limitation 6 (Users only interested in a satisficing solution).

The goals of the evaluation should be specified in relation to the users' activities, and not the system.

For example, “To assess the functionality of the system” or “To identify usability problems with

the interface” are inappropriate goals because they are directly related to the system only, and

do not make any references to the users’ interaction with the system in performing activities. A

more appropriate goal would be: “To assess how well the system supports [insert details of

users’ activity here]”, because it shifts the goals of the evaluation away from the system and

emphasises the evaluation of the system in relation to the users’ activities. This prevents the

localisation of the evaluation goals in the system and transforms the evaluation from being

system-driven to a focus on the system in actual use.
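As a small, purely illustrative sketch (the activity names below are hypothetical), the phrasing suggested above can be applied mechanically to each activity identified in Phase 2 to produce activity-centred evaluation goals, for example in Python:

# Hypothetical activities identified in Phase 2
activities = [
    "enrolling a new student",
    "producing end-of-session reports for management",
]

# Each goal follows the phrasing "To assess how well the system supports ..."
evaluation_goals = [
    f"To assess how well the system supports {activity}" for activity in activities
]

for goal in evaluation_goals:
    print(goal)
# To assess how well the system supports enrolling a new student
# To assess how well the system supports producing end-of-session reports for management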

DUEM Revision 9 – Defining the evaluation goals
Traditional Usability Testing: In traditional usability testing evaluators define the usability testing goals first, and without reference to users' activities and needs.
DUEM: DUEM defines the goals of the evaluation after the users' activities have been defined. This implies that the system is tested in relation to the users' activities.
Rationale: The very first step in traditional usability testing (refer to Table 4.1) is defining the purpose and goals of the evaluation. These are usually defined in relation to the system only. However, the system does not exist in a vacuum. The system is a tool used by users in their daily activities. Consequently, the goals of the evaluation cannot be based on the system (i.e. the tool) in isolation from these activities. The users' activities must be taken into account when defining the evaluation goals.

Once the goals of the evaluation have been agreed upon, the evaluators and users can

collaboratively negotiate how the goals can be achieved (i.e. how the evaluation will be


performed). DUEM is not intended to be a prescriptive method. It aims to provide evaluators

and users with the opportunity to work together in a flexible manner that suits their goals and

needs. Since every evaluation is different in terms of the resources available to the evaluators

(Preece et al, 2002), DUEM makes a provision for these differences by allowing evaluators and

users to select their own means for performing the evaluation. This approach enables evaluators

and users to design an evaluation that suits their needs and constraints.

DUEM Revision 10 – Defining the evaluation means
Traditional Usability Testing: In traditional usability testing the evaluation is carried out in a usability laboratory using task scenarios.
DUEM: In a DUEM evaluation, evaluators and users can collaboratively negotiate how the evaluation will be carried out, depending on their needs, resources and constraints.
Rationale: Similarly to DUEM Revision 3, which argues that the number of participants in an evaluation will depend on a number of different factors, this revision is also based on the premise that every evaluation is different. Evaluators should be given the flexibility to select the evaluation means that suit their needs, resources and constraints. Consequently, DUEM does not prescribe the evaluation means.

Although DUEM does not prescribe how an evaluation will be carried out, several options are

presented. These include scenario-based evaluation, free-form evaluation or controlled

evaluation.

Scenario-based evaluation is similar to the traditional usability testing approach of using task

scenarios. However, in DUEM, these scenarios are not developed solely by evaluators based on

their prior knowledge and experience, and on the functions of the system. Instead, the primary data

collected in Phase 2 can be used to develop scenarios that are based on real users’ activities.

These scenarios have been termed user activity scenarios. Instead of setting discrete and short

tasks, which prescribe what the participants must do, in DUEM the evaluators and users develop

longer scenarios describing typical users’ activities and their historical development using


language and terminology that the users are familiar with. The evaluators and users could then

work through the scenarios jointly using the system to determine whether it facilitates the

activities described in the scenarios. For example, if the user activity scenario describes an

activity using language and terminology that the users are familiar with, and the system does not

use the same language and terminology, then the system does not facilitate the users’ activity.

The use of user activity scenarios and the direct involvement of users in preparing these

scenarios may contribute towards overcoming the following limitations of traditional usability

testing:

Limitation 2: Failure to incorporate users’ goals and motives into the design of the evaluation.

Limitation 3: Task scenarios are based on system functions, not actual user needs and activities.

Limitation 4: Failure to use language and terminology familiar to typical users in the task scenarios.

Limitation 5: Indifference from users in completing task scenarios.
Limitation 6: Users only interested in satisficing solution.
Limitation 9: Scenarios used in usability testing are discrete and short-term tasks.

Limitation 10: Task scenarios do not provide information about how an activity changes over time.

Limitation 11: The wording of scenarios affects users’ performance and the results of the usability test.

Limitation 12: Confusion over scenario viewpoints affects the extent to which users identify with a scenario.

In a different situation, the evaluators and users might choose to carry out a free-form

evaluation where the two groups simply use and discuss the system jointly in the users’ natural

environment (if possible), without the need for scenarios to guide the evaluation. This type of

evaluation is flexible and useful because of its ability to uncover problems arising from using a

system, which may not have been anticipated otherwise (Omanson & Schwartz, 1997; Sears,

1997). Users are free to demonstrate how they use the system from their own point of view and

experiences. This approach also enables the evaluators to study the effects of the physical and

social contexts on the users’ interaction (if, for example, the telephone rings, or a colleague

interrupts, etc.). DUEM advocates the inclusion of the users’ natural environment in the


evaluation whenever possible and feasible. This is important because the environment

represents the users’ physical and social contexts and, by omitting these contexts in evaluating

the system, it is highly likely that aspects of the system, which may not be crucial in use, will be

overemphasised (Preece et al, 2002). However, this approach is also generally thought to be

problematic because it does not provide a framework or guidelines to structure the evaluation, with the result that system coverage is not systematic or consistent across different users (Omanson &

Schwartz, 1997). To overcome this problem, DUEM places broad constraints on how the

system is used in a free-form evaluation. These constraints are defined by the users’ activities

identified in Phase 2. Users are asked to demonstrate and discuss how they use the system to

carry out the activities that they have identified in the previous phase.

Finally, if the evaluators and users feel that they would benefit from a more controlled

evaluation in a usability laboratory setting, DUEM does not eliminate this evaluation means

altogether. Since DUEM represents an improvement of the traditional usability testing method,

it cannot disregard the use of a laboratory completely. The limitations of using a laboratory have

been documented in Section 4.2.4 (Limitation 20: The users’ physical and social contexts are

removed in a usability laboratory setting). As long as the evaluators and users are aware of these

limitations and take them into account, and so long as the users’ activities defined in Phase 2 are

used as the basis of the evaluation, they may be able to benefit from a controlled evaluation in a

laboratory.

The controlled environment of a laboratory enables evaluators to record and capture the users’

interaction with the system. However, even though DUEM does not exclude the use of a

laboratory, it does not advocate the use of quantitative measures such as the number of clicks a

user makes, or the number of minutes and seconds a user takes to complete a task. The

limitations of these types of measures have been described previously in Section 4.2.4

(Limitation 17: Quantitative measures collected during usability testing may be of limited

value) and documented by Nielsen (2004) who states that “Number fetishism leads usability


studies astray by focusing on statistical analyses that are often false, biased, misleading, or

overly narrow”. Therefore, DUEM does not make any provisions for collecting these types of

quantitative measures.

DUEM Revision 11 – Quantitative measures
Traditional Usability Testing: In a traditional usability test in a laboratory setting, quantitative measures such as the length of time it takes to complete a task and the number of clicks a user makes are recorded.
DUEM: In DUEM no provision is made for recording quantitative measures if a controlled evaluation is used.
Rationale: This revision was made in response to Limitation 17 (Quantitative measures collected during usability testing may be of limited value) of traditional usability testing, and recent research by Nielsen (2004) which confirms Limitation 17.

4.3.4.2 Sub-phase 3.2 – Preparing the Evaluation Resources and Materials

The second sub-phase of Phase 3 involves the preparation of the evaluation resources and

materials. This step is conceptually equivalent to the third component of traditional usability

testing (Prepare test materials). However, in practice it differs from traditional usability testing

because it does not prescribe the use of specific resources such as a usability laboratory, or

specific materials such as questionnaires. Depending on what the evaluators and users have

negotiated in the previous sub-phase, the preparation of the evaluation resources and materials

can be done in any number of ways. For example, if a decision has been made to employ user

activity scenarios, then these will have to be written and developed jointly and iteratively.

Similarly, if a controlled evaluation will be used, then the appropriate physical resources for

conducting the evaluation will have to be arranged.


For the purposes of analysing the evaluation results in Phase 4 of DUEM, it is recommended

that the evaluation be recorded. The advent of screen capture software such as Camtasia

Studio™, SnagIt™ and Morae™, which can be installed on any computer and activated to record the computer screen without being intrusive, has made this process relatively easy. The

software has the ability to capture both audio and video images in a standard digital format that

can later be re-played for analysis. The software can be installed on a user’s own computer in

his/her natural environment. With the introduction of wireless networking technology, it is even

possible to conduct evaluations when this environment is outdoors.

If the evaluators wish to capture the users’ physical environment, small digital video recording

cameras can be placed in discreet locations around the evaluation location. It is also possible to

use mini ‘cams’ which are connected directly to the computer and are able to capture videos and

still images. Some users may already have these cameras installed on their computers and are

familiar with how they operate. This serves to facilitate the evaluation process because the user

is already comfortable with the use of cameras.

If a decision is made to use video and audio recording equipment in the evaluation, appropriate

measures must be taken to ensure that the privacy of the users is protected. Evaluators and users

can negotiate a mutually agreeable consent form for this purpose.

The use of video and audio recording equipment will depend on the scope and duration of the

evaluation, as well as the resources available. If the evaluation involves a simple system and a

small number of users over a short period of time, it may not be necessary to employ elaborate

recording devices. In such a situation, simply recording the data using a pen and paper may be

sufficient. In the opposite extreme, where the evaluation concerns a large scale organisational

system with possibly hundreds of stakeholders over an extended period, the evaluators must

develop a detailed strategy for collecting and storing the data as part of Phase 3.
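One minimal, purely illustrative sketch of such a strategy (the file names, fields and values below are hypothetical assumptions, not part of DUEM) is a simple catalogue that records, for each evaluation session, the participants, the activity evaluated and the artefacts collected, so that the material can be retrieved for analysis in Phase 4:

import json
from datetime import date

# Hypothetical catalogue of evaluation sessions and the artefacts collected in each
sessions = [
    {
        "session_id": "S01",
        "date": str(date(2004, 3, 15)),
        "participants": ["direct user: administrative officer",
                         "indirect user: faculty manager"],
        "activity_evaluated": "enrolling a new student",
        "artefacts": ["recordings/S01_screen.avi", "notes/S01_observations.txt"],
        "consent_form_signed": True,
    },
]

# Store the catalogue so it can be revisited when breakdowns are analysed in Phase 4
with open("evaluation_catalogue.json", "w") as f:
    json.dump(sessions, f, indent=2)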


Among the most frequently used materials in traditional usability testing are the pre-test and

post-test questionnaires, used to determine the users’ profile and satisfaction with the system,

respectively. However, Section 4.2.4 described several limitations with the use of pre-test and

post-test questionnaires, including:

Limitation 13: Due to a small sample size it is not possible to perform statistical analyses.

Limitation 14: Pre-test questionnaires do not indicate if participants are representative of the user population.

Limitation 15: Post-test questionnaire data is contradictory to user performance.

Limitation 16: Prior use of a system (if one exists) affects the users’ perceptions of the system being tested, and subsequently the post-test questionnaire results.

Due to these limitations, DUEM does not advocate the use of questionnaires. Since the

evaluators and users work together extensively from the outset of the evaluation, there is no

need for a pre-test questionnaire to profile the users. This can be done during the interviews and

focus groups in Phase 2. Similarly, the post-test questionnaire is not necessary because the users

are not required to provide their views about the system or aspects of the system (e.g. ease of

use, navigation, etc.). Instead, the evaluators would benefit more from an interview with users

because they would be able to collect rich qualitative data about the system, instead of discrete

quantitative measures which may not be meaningful (see Limitation 15 above).

DUEM Revision 12 – Use of pre-test and post-test questionnaires

Traditional Usability Testing: In traditional usability testing data is collected about the users' background in a pre-test questionnaire, and about their satisfaction with the system in a post-test questionnaire.

DUEM: In DUEM data is collected about the users' background during interviews and/or focus groups in Phase 2. Data about the users' satisfaction with the system is collected in Phase 4 through interviews.

Rationale: The aim of this revision is to overcome Limitations 13 to 16 of traditional usability testing by eliminating the use of predominantly quantitative pre-test and post-test questionnaires. Instead, qualitative data about the users is collected through interviews and/or focus groups. This approach enables evaluators to probe further by asking questions.


Finally, in preparing the evaluation resources and materials, the evaluators and users must also

decide what other tools will be used in the evaluation. For example, if the system requires inputs

or produces outputs, these will have to be prepared and made available. This may involve

preparing various documents such as forms, manuals, notes, etc. These materials are important

because they may represent additional tools or alternative tools that a user makes use of while

interacting with the system.

DUEM Revision 13 – Additional materials

Traditional Usability Testing: Traditional laboratory testing usually involves using the system only to complete the task scenarios. It is possible, however, to include other materials if the users require these to use the system.

DUEM: DUEM views materials other than the system as additional or alternative tools that users make use of in their activities.

Rationale: To overcome Limitation 7 (Users constrained by being required to use only system under evaluation) of traditional usability testing, it is important to include additional or alternative tools in the evaluation so that it can be observed how users make use of these.

4.3.4.3 Sub-phase 3.3 – Testing the System

The final sub-phase of Phase 3 involves implementing or executing the evaluation plan.

Following the design of the evaluation plan and the preparation of the evaluation resources and

materials, the evaluators and users need to carry out the evaluation of the system. When, where

and how this takes place will be dependent on what is specified in the evaluation plan.

The outcome of Phase 3 of DUEM is a rich set of qualitative data about the users’ interaction

with the system in the form of written notes based on observations, video and audio recordings


(if used) and interviews with the users. This data is analysed in Phase 4 using Activity Theory

principles, as described in the following section.

4.3.5 Phase 4 – Analysing and Interpreting Results

DUEM is based on distributed usability, which implies that usability is not contained within the

system or at the interface, but is distributed across an activity or network of activities which

the system mediates. Therefore, when evaluating distributed usability it is insufficient to

examine the system or interface alone. Instead, usability problems in the entire activity system

must be identified. The purpose of Phase 4 is to analyse the data collected in Phases 2 and 3 to

identify the usability problems distributed across the activity system. Activity Theory provides a

framework to achieve this through the notions of breakdowns and contradictions. These notions

have been discussed at length in Chapter 2; however, they will be re-visited here briefly before a

full description of Phase 4 of DUEM is presented.

Bødker (1991a) introduced the notion of breakdowns to describe “unforeseen changes in the

material conditions” of an activity (p. 22). A breakdown in the users’ interaction with a system

occurs when an operation is conceptualised to the level of an action, so that the user has to shift

his/her focus from the activity at hand to resolving an unexpected problem with the tool (i.e. the

system). For example, an employee engaged in the activity of preparing a report for her

manager using word processing software will experience a breakdown if the software has a

complex page numbering function. In order to paginate the report, the employee will have to

shift the focus of her attention away from the report itself to the word processing software. The

conditions in which the employee is performing the activity have changed and this has caused

her to shift the use of the software from a subconscious, operational level, to the conscious level

of an action with the goal of inserting page numbers into the report. This breakdown is shown in

Figure 4.18.


Figure 4.18 Breakdown in report producing activity
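Purely as an illustration of the shift depicted in Figure 4.18, the employee's activity structure before and after the breakdown could be written out as follows. This is a rough Python sketch, not a prescribed DUEM notation; the grouping of items into levels is approximate.

# A rough, illustrative encoding of the activity hierarchy in Figure 4.18
# (the exact grouping of actions and operations is approximate).
before_breakdown = {
    "activity": "Produce report",
    "actions": ["Research & collect info", "Plan report format", "Write notes"],
    "operations": ["Type", "Use word processing software"],
}

after_breakdown = {
    "activity": "Produce report",
    # Using the complex page-numbering function has shifted the use of the word
    # processor from an unconscious operation to a conscious, goal-directed action.
    "actions": ["Research & collect info", "Plan report format", "Write notes",
                "Use word processing software"],
    "operations": ["Type"],
}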

According to Kuutti (1996) a breakdown is a manifestation of a contradiction in the activity

system. It is a symptom, while the underlying cause can be found in contradictions, which are

inherent to activity systems. Engeström (1999) identified four levels of contradictions: primary,

secondary, tertiary and quaternary. They are shown in Figure 2.27. In the previous example, the

breakdown was caused by a secondary contradiction between the tool and the object of the

activity because the system did not support the employee’s activity of producing a report. After

the breakdown, the object of the activity changed from producing a report to inserting page

numbers. Since the word processor itself became the object of the activity, the employee was

left with no tools (Hasu & Engeström, 2000).

In traditional usability testing, there are no effective means for defining what constitutes a

usability problem (as explained in Section 2.9). This was identified as a limitation of traditional

usability testing (Limitation 18: No direct means of defining and categorising what constitutes a


usability problem). To overcome this limitation, DUEM makes use of Activity Theory

principles, specifically breakdowns, conceptualisation, and contradictions to define a usability

problem. Since DUEM is based on distributed usability, it is concerned with usability problems

that are distributed across the activity, or distributed usability problems. In DUEM, a

breakdown in the users’ activity which causes the user to conceptualise operations to the level

of actions or activities, and which can be traced to one or more contradictions in the activity is

deemed to be a distributed usability problem. This is because the problem occurs when one or

more contradictions in the activity emerge and cause a breakdown in the users’ interaction with

the system.

It must be noted that according to the above definition, a DUEM evaluation should still identify

usability problems at the interface (i.e. the types of problems that might be identified in a

traditional usability test). These types of usability problems are breakdowns which can be traced

to a primary contradiction (i.e. a contradiction in the system itself). However, in addition to

these problems, DUEM also identifies usability problems which are caused by other elements of

the activity interacting with the system (i.e. secondary, tertiary and quaternary contradictions).

DUEM Revision 14 – Defining a usability problem

Traditional Usability Testing: In traditional usability testing there are no explicit criteria defining when a difficulty users are having, or a fault with the system, constitutes a usability problem (Hertzum & Jacobsen, 2001). Any problem reported is deemed to be a usability problem.

DUEM: DUEM defines a usability problem as a breakdown in the users' interaction with the system which causes the user to conceptualise operations to the level of an action or activity. This breakdown is caused by one or more contradictions in the users' activity.

Rationale: To overcome Limitation 18 (No direct means of defining and categorising what constitutes a usability problem) of traditional usability testing it was necessary to provide a definition of a usability problem. Since DUEM is based on distributed usability (i.e. usability distributed across an activity), a usability problem is defined as a problem which can be traced to contradictions in the activity and which causes a breakdown in the users' interaction with the system (i.e. tool).


Phase 4 of DUEM involves analysing and interpreting the data collected in the preceding two

phases using breakdowns, conceptualisation and contradictions to identify distributed usability

problems. The use of Activity Theory principles in Phase 2 to develop an understanding of the users' activities and to map these activities to those principles permits the evaluators and users to undertake this analysis. This implies a continuity in the notation between the method components used in DUEM. When traditional usability testing was decomposed into method components and examined in Section 4.2.1, it was noted that diverse notations are used from one method component to another. This lack of notation continuity can result in inconsistencies, errors and omissions. However, in

DUEM, Activity Theory principles are used in Phase 2 to define users’ activities. In Phase 3

these activities are used in the evaluation as the backdrop against which the system is tested.

Finally, in Phase 4, breakdowns in the users’ interaction with the system are mapped to

contradictions in the users’ activities, thus identifying distributed usability problems. This

approach permits a higher level of continuity from one method component of DUEM to the next

by using the outcome of one component directly as a tool in the next component.

Furthermore, the use of Activity Theory and its principles as an underlying framework for

Phases 2 to 4 of DUEM, overcomes another limitation of traditional usability testing which was

identified in Section 4.2.1. In its current form, traditional usability testing lacks a single

coherent unifying perspective or theoretical basis that reveals what is important or what the

evaluators need to focus on. Using Activity Theory principles to define users’ activities, test the

system in relation to those activities and subsequently analyse the results of the test, implies that

a single unifying perspective exists in DUEM.


DUEM Revision 15 – Notation continuity and unifying perspective

Traditional Usability Testing: In traditional usability testing different notations are used in different method components. This implies a lack of notation continuity from one method component to another. Also, traditional usability testing in its current form does not have a single unifying perspective (theoretical framework).

DUEM: DUEM makes use of Activity Theory principles to provide notation continuity from one method component to the next (through users' activities). The use of these principles also implies the existence of a single unifying perspective (theoretical framework) that underpins the evaluation.

Rationale: To overcome the lack of notation continuity and a unifying perspective, Activity Theory principles have been incorporated into Phases 2, 3 and 4 of DUEM as a means of identifying users' activities, testing the system in relation to those activities and analysing the results in relation to those activities.

Before proceeding with a description of how to analyse the data collected from Phases 2 and 3,

it should be noted that similarly to the two previous phases, Phase 4 of DUEM requires that

evaluators and users work together. They jointly analyse the collected data to derive a list of

distributed usability problems, which need to be addressed. This approach is used for two

reasons. Firstly, it is intended to overcome Limitation 19 (Failure to involve users in the

analysis of the evaluation results) of traditional usability testing. Involving users in the analysis of the results is beneficial because it ensures that all of the data collected is interpreted correctly and users are able to provide explanations and clarifications if necessary. DUEM recommends

that the analysis of results be carried out in a focus group with users. By using a focus group

users can ask each other questions and re-assess their own understandings of specific issues and

experiences (Kitzinger, 1994; 1995). Secondly, by analysing the data collected and identifying

the distributed usability problems jointly, users can also suggest ways of resolving these

problems and improving the system to the evaluators.


DUEM Revision 16 – User involvement in analysis of results

Traditional Usability Testing: Traditional usability testing does not involve users in analysing the results of the evaluation. Once the users have completed the task scenarios and filled out the questionnaires, their involvement in the evaluation ends.

DUEM: DUEM advocates the involvement of users in the analysis of the evaluation results to ensure that the data collected is interpreted correctly, and to seek potential solutions to distributed usability problems from users.

Rationale: The aim of this revision is to overcome Limitation 19 (Failure to involve users in the analysis of the evaluation results) of traditional usability testing. It also addresses the issue of co-operation forms in traditional usability testing (identified in Section 4.2.1) where the evaluators drive the entire usability testing process.

Identifying distributed usability problems in Phase 4 involves two steps. The first step involves

noting breakdowns observed in the users’ interaction with the system during Phase 3 (sub-phase

3.3). The second step requires that these breakdowns be mapped to contradictions in the activity

system.

In the first step, any breakdowns, which resulted in users’ conceptualising their use of the

system (i.e. operation) to the level of an action or activity, must be noted by evaluators and

users. Depending on how the evaluators and users chose to record their data in Phase 3, the

breakdowns can be identified from the notes taken during Phase 3 or from recorded video

images and audio records. Every time that a user is observed to shift his/her focus away from

the activity in order to resolve an issue with the system (i.e. the tool), a breakdown has occurred

and the evaluators and users must discuss the nature of the breakdown and what occurred when

the breakdown took place.

The severity of the breakdown can also be assessed depending on how high up the activity

hierarchy the operation was conceptualised and how long it took the user to recover from the

breakdown. If the operation is conceptualised to the level of an action (as was the case in the


example of the employee using the word processor described above and shown in Figure 4.18)

and the user is able to recover from the breakdown efficiently, the severity of the breakdown is

deemed to be relatively low. However, if the operation is conceptualised to the level of an

activity so that the users’ entire activity is now directed towards resolving it (i.e. fixing the

problem with the system becomes the object of the users’ activity), and the user requires an

extended period of time to recover from the breakdown (or indeed, is unable to), then the

severity of the breakdown is deemed to be relatively high. Although this can be used as a guide

for determining the severity of the breakdown, the evaluators and users must jointly decide on

the actual severity due to the differences between users in their knowledge of the activity itself and any previous versions of the system. The users' knowledge and familiarity with the activity and/or the previous version of the system were determined in Phase 2 by answering Question 12 (How was the activity carried out previously?). The more knowledgeable and familiar the users are with the system, the more likely they are to recover from the breakdown relatively easily. Therefore, to assess the severity of a breakdown, the following must be taken

into account:

• the level of conceptualisation;

• the length of time it takes a user to recover from the breakdown;

• the users’ knowledge of the system and the activity.
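As a guide only, these three factors could be combined into a rough severity heuristic. The Python sketch below is one hypothetical way of operationalising the assessment; the scales, weights and thresholds are assumptions made for illustration, and the final severity rating should still be agreed jointly by evaluators and users.

def breakdown_severity(conceptualised_to: str,
                       recovery_minutes: float,
                       user_familiarity: str) -> str:
    """Rough guide to breakdown severity (illustrative heuristic only).

    conceptualised_to: "action" or "activity" (level the operation was raised to)
    recovery_minutes:  time the user needed to recover (use a large value if unrecovered)
    user_familiarity:  "low", "medium" or "high" knowledge of the system and activity
    """
    score = 0
    score += 2 if conceptualised_to == "activity" else 1                        # level of conceptualisation
    score += 2 if recovery_minutes > 5 else (1 if recovery_minutes > 1 else 0)  # recovery time
    score += {"low": 1, "medium": 0, "high": -1}[user_familiarity]              # familiarity offsets severity
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"

# Example: the page-numbering breakdown from Figure 4.18
print(breakdown_severity("action", recovery_minutes=2, user_familiarity="high"))  # -> "low"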

After a breakdown is identified, in the second step of Phase 4 it must be traced or mapped to a primary, secondary, tertiary or quaternary contradiction in the activity (these levels of contradiction are described in Chapter 2 and shown in Figure 2.27). Only those contradictions related directly to the system

(i.e. the tool) are of interest to the evaluators and users. These are shown in Figure 4.19. Any

other contradictions (e.g. a secondary contradiction between the division of labour and the

object of the activity) are only relevant to the extent that they are indirectly caused by the

system, or affect the use of the system in some way. The contradictions can only emerge through discussions with the users; therefore, each breakdown must be considered carefully by the evaluators and users in order to determine the levels of contradiction that caused it.

Figure 4.19 Contradictions directly related to the system (i.e. tool)

A breakdown is not necessarily caused by a single contradiction. There may be several different

contradictions in the activity system which lead to a breakdown in the users’ activity. Therefore,

it is recommended that evaluators and users examine all the possible levels of contradictions to

pinpoint the cause of a breakdown. This can be done by answering the following questions for

each breakdown: Is the breakdown caused by

a) a primary contradiction within the tool?

b) a secondary contradiction between the tool and other elements in the activity?

c) a tertiary contradiction between the activity and another more culturally advanced

activity (or historical activity)?

d) a quaternary contradiction between the activity and other activities in the network?


This type of analysis examines the mediating role of the system (i.e. the tool) in relation to the

activity and determines whether the system is compatible with the other elements of the activity

or other activities. If the system is not compatible with another element, then contradictions

emerge and are manifested through breakdowns. Mapping the breakdowns to contradictions

serves the purpose of identifying distributed usability problems that affect the users’ entire

activity.

Where possible, the two steps above should be carried out in the users’ natural environment so

that the evaluators and users can refer to the system and other relevant tools if necessary.
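The two analysis steps can also be thought of as building a record for each observed breakdown and tracing it to one or more contradictions. The following sketch is a minimal, hypothetical way of keeping such records during the joint analysis; the level names follow Engeström's four levels of contradiction, but the data structure itself is not prescribed by DUEM.

from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Contradiction(Enum):
    PRIMARY = 1      # within the tool (system) itself
    SECONDARY = 2    # between the tool and another element of the activity
    TERTIARY = 3     # between the activity and a culturally more advanced activity
    QUATERNARY = 4   # between the activity and other activities in the network

@dataclass
class Breakdown:
    description: str                     # what was observed during Phase 3
    conceptualised_to: str               # "action" or "activity"
    contradictions: List[Contradiction] = field(default_factory=list)

    def is_distributed_usability_problem(self) -> bool:
        # In DUEM, a breakdown that can be traced to one or more contradictions
        # in the activity is deemed a distributed usability problem.
        return len(self.contradictions) > 0

# Step 1: note the breakdown; Step 2: map it to contradictions together with the users.
b = Breakdown("User left the report to work out the page-numbering function", "action")
b.contradictions.append(Contradiction.SECONDARY)   # tool vs object of the activity
print(b.is_distributed_usability_problem())        # True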

The two-step analysis above will be demonstrated with an example based on a system described

by Nielsen (1990) called LYRE. The LYRE system, a French hypertext system used for

teaching poetry, allows students to analyse poetry by annotating poems using hypertext anchors.

The system is based on the French tradition of students working within a framework set up by

the teacher. The teacher adds new viewpoints to a poem and the students can then annotate the

poem based on the pre-defined viewpoints. The system does not allow students to add

viewpoints, a facility reserved only for the teachers. The French students’ activity is shown in

Figure 4.20.

Figure 4.20 The French students' activity (Subject: French student; Object: Analyse poetry; Tool: LYRE; Rules: Students can't add viewpoints; Division of Labour: Teacher-driven; Community: Teachers and students)

In Scandinavian countries, the focus of teaching is on increasing students’ potential to explore

and learn independently. The object of the Scandinavian students' activity is independent exploration. The division of labour between the students and teachers is less hierarchical and the focus is on student-centred, rather than teacher-driven, learning. The rules reflect this division of labour

because students are encouraged to analyse poetry from any perspective they wish. This means

that they are allowed to define their own viewpoints. The Scandinavian students’ activity is

shown in Figure 4.21.

Figure 4.21 The Scandinavian students’ activity

Had the French LYRE system been implemented in Scandinavia, its use would have resulted in

a series of breakdowns caused by primary contradictions within the system itself (students not

being able to access the “viewpoints” menu), secondary contradictions between the tool (LYRE) and the subjects (the students), the object (to explore and learn independently), the rules

(students are not allowed to add their own viewpoints) and the division of labour (between the

teacher and the students). Tertiary contradictions would also have occurred between the object

of the French teaching activity (to analyse poetry) and the object of a culturally more advanced

Scandinavian activity (independent student exploration and learning). These contradictions are

shown in Figure 4.22. If DUEM had been applied to evaluate the LYRE system, the questions in

Phase 2 would have provided an insight into the users’ activities (as shown in Figures 4.20 and


4.21) and this information would have been used in Phase 4 to analyse the breakdowns in the

Scandinavian students’ activity, resulting in the identification of a series of distributed usability

problems caused by the contradictions listed above.

Figure 4.22 Contradictions in the Scandinavian students’ activity

By using DUEM to identify distributed usability problems across the entire activity system, it is

possible to determine how well LYRE supports the users’ activity of poetry analysis, instead of

focusing only on localised usability attributes at the LYRE interface such as ease of use,

learnability, memorability, etc.
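To illustrate how the activity systems in Figures 4.20 and 4.21 might be compared element by element, the two activities can be encoded as simple records and the differing elements listed. The sketch below is an informal illustration only, not a DUEM artefact; the dictionary keys are assumptions made for the example.

# Activity systems from Figures 4.20 and 4.21, encoded as plain dictionaries.
french = {
    "subject": "French student",
    "object": "Analyse poetry",
    "tool": "LYRE",
    "rules": "Students cannot add viewpoints",
    "division_of_labour": "Teacher-driven",
    "community": "Teachers and students",
}
scandinavian = {
    "subject": "Scandinavian student",
    "object": "Independent exploration",
    "tool": "LYRE",
    "rules": "Students can add own viewpoints",
    "division_of_labour": "Student-centred learning",
    "community": "Teachers and students",
}

# Elements that differ point to where contradictions would emerge if the French
# LYRE system were used, unchanged, to mediate the Scandinavian activity.
for element in french:
    if french[element] != scandinavian[element]:
        print(f"{element}: '{french[element]}' vs '{scandinavian[element]}'")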

In the third and final step of Phase 4, which follows the two analysis steps described above, users are consulted on how breakdowns can be avoided

and contradictions resolved so that a solution for addressing the identified distributed usability

problems can be found. This enables evaluators to gain insights into the users’ own opinions by

discussing the causes of the breakdowns (i.e. the contradictions) and the means for resolving

them. Evaluators and users, therefore, jointly identify the distributed usability problems and

propose solutions to address them. This approach increases the likelihood that the users will

accept the proposed solution because they are involved in designing it. The approach also


engenders a sense of responsibility in the users for the design of the system itself, thus

increasing the likelihood of system acceptance (Baronas & Louis, 1998).

An evaluation that involved users in the analysis of the collected data will be used to demonstrate the effectiveness of the approach described above. The evaluation involved a web-

based system of a large government organisation. The system was developed to support

investigative activities by analysts (i.e. the direct users). If an analyst wished to access personal

information about a client, he/she had to use the web-based system to submit a request seeking

permission from his/her supervisor (i.e. the indirect user) to do so. The analyst was required to

complete a request form indicating the nature of the investigation and the personal information

required. This form was then submitted to the supervisor for approval as an e-mail message. The

supervisor would receive the request through e-mail, and respond to the e-mail indicating

whether approval was granted or not.

In certain investigations, analysts required access to personal information urgently. In these

cases, the system described above was ineffective if the analysts’ supervisor was away from

his/her e-mail because the request could not be approved at short notice. To overcome this

problem, the analyst would first contact every supervisor by telephone to determine who was

available to approve the request immediately. Once such a supervisor was identified, the analyst

would complete and submit the request form. An analysis of the data collected from the

evaluation with the users revealed this problem with the web-based system.

How might we understand this situation using DUEM? The object of the analysts’ activity was

to investigate clients. One of the many actions carried out by the analysts to transform this

object into an outcome included obtaining approval to access personal information. At the

operational level, the means by which this action was executed involved using the web-based

system to complete and submit a request form. However, in urgent cases, the analysts had to

shift their focus from the investigative activity to finding an available supervisor by phone. This


caused a breakdown in the interaction with the web-based system because it did not support the

analysts’ activity. This breakdown was mapped to a primary contradiction in the web-based

system itself (the system did not indicate which supervisor was online and available), and

secondary contradictions between the web-based system and the object of the analysts’ activity

(the system did not support the investigative activity effectively in urgent cases), the web-based

system and the division of labour (the system did not provide a facility for overriding the chain

of command in urgent cases when no supervisors were available), and the web-based system

and the rules (the system did not allow for exceptions to the rules in urgent cases when no

supervisors were available). These contradictions are shown in Figure 4.23.

Figure 4.23 Contradictions in the investigation activity

Discussions with the supervisors (i.e. indirect users) during the analysis of the data collected

also revealed a major problem. The supervisors received the request in the form of an e-mail

message. The use of e-mail was intended to facilitate the process of granting approval because

the supervisor did not have to log in to the actual web-based system to approve a request.

However, this approach was ineffective in reality because it was difficult for supervisors to


distinguish a request from potentially tens of other e-mail messages. The problem was

exacerbated in urgent cases. From the supervisors' perspective, the breakdown in the activity occurred because the operation of reading and responding to an e-mail was conceptualised to

the level of a conscious action of searching for the e-mail message. This was due to a primary

contradiction with the web-based system (the system did not flag the requests) and a secondary

contradiction between the web-based system and the object of the supervisors’ activity (the

system did not support the supervisors’ activity of facilitating the investigation process).

The results of the analysis presented above were generated together with the analysts and

supervisors (i.e. the direct and indirect users). The users were also asked to suggest potential

solutions to the problems identified. They provided two suggestions: implementing a function in

the web-based system that indicates to analysts which supervisor is immediately available (i.e.

online and logged in to their e-mail account), and flagging the e-mail message, containing the

request, so that supervisors were able to detect these types of requests easily and distinguish

them from other e-mail messages.
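The two suggestions could be realised in many ways. Purely as an illustration, the sketch below shows one hypothetical approach: a flagged subject line for request e-mails and a simple availability lookup for supervisors. None of the names or behaviour come from the actual web-based system.

REQUEST_PREFIX = "[APPROVAL REQUEST]"   # hypothetical flag to make requests stand out

def flag_request_subject(case_id: str, urgent: bool) -> str:
    """Build a flagged e-mail subject so supervisors can spot approval requests."""
    urgency = " URGENT" if urgent else ""
    return f"{REQUEST_PREFIX}{urgency} Access to personal information (case {case_id})"

def available_supervisors(presence: dict) -> list:
    """Return supervisors currently shown as online/logged in (presence is assumed data)."""
    return [name for name, online in presence.items() if online]

print(flag_request_subject("C-1042", urgent=True))
print(available_supervisors({"Supervisor A": True, "Supervisor B": False}))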

Using the Activity Theory principles of breakdowns, conceptualisation and contradictions to

analyse the data collected in the above example was beneficial because it was possible to

identify problems with the web-based system that were not located at the interface per se. While

the web-based system itself was not difficult to use, that was only the case once the analyst was

able to get around the problem of finding an available supervisor by phone. The analysis

showed, therefore, that the system did not support or mediate the analysts’ activity effectively

due to contradictions in the activity. As a result of this analysis two changes were made to the

web-based system to ensure that it supported the analysts’ and supervisors’ activities

effectively. Also, involving the analysts and supervisors in the data analysis and taking into

account their suggestions for overcoming the problems proved to be beneficial because they

were supportive of the proposed changes.


Before concluding the description of Phase 4, it is important to note that, although DUEM is

structured into four phases, they are not necessarily disjointed. Distributed usability and

Activity Theory principles underpin all the phases and users are involved directly at all times.

This implies that the phases are interlinked and it may not be possible to explicitly indicate

where one phase ends and another one begins. For example, evaluators and users may begin to

identify breakdowns and contradictions in the activities during Phase 2, while discussing users’

activities, as well as in Phase 3, while performing the evaluation. These breakdowns and

contradictions can later be examined in more detail during the last phase. Regardless of the phase in which breakdowns and contradictions reveal themselves, the evaluators can map and analyse them in relation to Activity Theory principles. Therefore, the last three phases of

DUEM, in particular, should not be treated as being completely independent or rigidly

sequential. A high degree of overlap between them is possible.

Having described DUEM and how it was developed in detail above, a graphical representation of the phases in DUEM can now be offered in Figure 4.24. The diagram indicates What (is done?), Why

(it is done?), How (it is done?), Who (does it?), Where (it is done?) and the Outcomes for each

phase of DUEM. The ‘wavy’ line at the end of each phase indicates the aforementioned overlap

between the phases. It implies that a distinct straight line between each phase cannot be drawn

because the phases are closely interlinked. Figure 4.24 extends over two pages due to the

lengthy contents.


Figure 4.24 Distributed Usability Evaluation Method (DUEM)

Phase 1: Selecting and recruiting users
What? 1. Decide how many direct (i.e. subject) and indirect (i.e. community) users to recruit (based on type of system being evaluated, location of users, diversity of users, size of user population, time and resources available for evaluation). 2. Decide if compensation will be offered to users who participate. 3. Recruit users (participants). 4. Re-visit decision about compensation if no compensation is offered and the number of users recruited is insufficient.
Why? To find users who will participate in the evaluation.
Who? Evaluators.
How? Standard participant recruitment techniques (advertisements, posters, word of mouth, etc.).
Where? In users' environment.
OUTCOMES: Group of participants including direct and indirect users.

Phase 2: Understanding User Activities
What? Collect primary data from users about users' activities and map the data to Activity Theory principles by answering the following questions: 1. What is the central (principal) user activity (referred to as 'the activity' from here onwards)? 2. What is the object of the activity and what are the motives for carrying out the activity? 3. What are the actions used to carry out the activity, and what are the goals associated with each action? Are there any alternative actions? 4. What are the operations used to carry out the actions, and what are the conditions in which the activity takes place? 5. What are the tools used to carry out the activity? Are there any alternative tools? 6. Who are the stakeholders in the activity (community)? 7. What are the roles of the stakeholders in the activity (division of labour)? 8. What rules govern the activity? 9. What are the outcomes of the activity and what will they be used for? 10. What other activities are related to the activity (network of activities)? 11. Are there any known problems in carrying out the activity (breakdowns and contradictions)? 12. How was the activity carried out previously (historical development)?
Why? To develop an understanding of users' activities.
Who? Evaluators and users.
How? Interviews; focus groups; observation; previous system (optional); documentation (optional).
Where? In users' environment (if possible).
OUTCOMES: Shared understanding of users' activities; understanding of users' language and terminology; trust built between evaluators and users.

Figure 4.24 (continued) Distributed Usability Evaluation Method (DUEM)

Phase 3: Evaluating the System in relation to Users' Activities
What? Sub-phase 3.1: Defining the evaluation plan (evaluation goals; how the evaluation goals will be achieved, e.g. scenario-based evaluation, free-form evaluation or controlled evaluation; strategy for collecting and storing data). Sub-phase 3.2: Preparing the evaluation materials and resources (user activity scenarios, if applicable; relevant materials, e.g. forms, manuals, documents; relevant resources, e.g. equipment). Sub-phase 3.3: Testing the system (based on sub-phases 3.1 and 3.2).
Why? To evaluate the system in relation to user activities.
Who? Evaluators and users.
How? Collaborative negotiation between evaluators and users with reference to users' activities defined in Phase 2; observation; interviews; notes; user activity scenarios (optional); video and audio recording (optional).
Where? In users' environment (if possible) / in a laboratory / other environment.
OUTCOMES: Rich set of qualitative data about users' interaction with the system.

Phase 4: Analysing and Interpreting Results
What? Step 1: Identify breakdowns (Bødker, 1991a) in users' interaction with the system. Step 2: Map breakdowns to contradictions (Engeström, 1999). Step 3: Discuss possible solutions with users.
Why? To identify breakdowns and their causes (contradictions) and determine distributed usability problems.
Who? Evaluators and users.
How? Analysis and discussion with users.
Where? In users' environment (if possible).
OUTCOMES: Distributed usability problems; suggestions for overcoming problems.

4.3.6 Summary of Stage I of Research Methodology

Section 4.3 described steps 6 and 7 of Stage I of the research methodology. The development of

DUEM was discussed in this section, including a description of each phase of DUEM and the

revisions that were made to traditional usability testing in response to the limitations listed in

step 5. Steps 6 and 7 are now complete (as shown in Figure 3.8 below).

The final part of this chapter will briefly compare and contrast DUEM and traditional usability

testing for the purposes of highlighting the differences between them. The aim of this

comparison is not to evaluate DUEM or suggest that it does improve usability testing. This

evaluation will be done in Chapter 5, which describes Stage II of the research methodology

(method validation). However, in light of the differences between DUEM and traditional

usability testing, the potential benefits and limitations of DUEM will be considered briefly at

the end of this chapter.

Figure 3.8 Research Methodology (replicated)

STAGE I: Method Building
Objective: To develop the Distributed Usability Evaluation Method
Methods/Techniques/Tools Used: Method Theory (Goldkuhl et al, 1998); Method engineering (Brinkkemper, 1996)
Steps: 1. Break down traditional usability testing into method components. 2. Examine traditional usability testing method components. 3. Examine method framework, perspective and co-operation forms. 4. Apply traditional usability testing in evaluation project. 5. Determine limitations associated with traditional usability testing based on application. 6. Revise and re-develop method components based on notion of distributed usability and Activity Theory principles (as described in Table 2.1). 7. Integrate method components into DUEM.
(Steps 6 and 7 are informed by distributed usability and Activity Theory principles.)

4.4 DUEM Vs. Traditional Usability Testing

To formally compare and contrast DUEM with traditional usability testing, Goldkuhl et al’s

(1998) Method Theory will be applied to analyse the method components, framework,

perspective and co-operation forms of DUEM. The results of this analysis will then be

compared to the results of analysing traditional usability testing using Method Theory from

Section 4.2.

DUEM can be logically decomposed into four method components, each component mapping

on to one of the four phases that make up DUEM. These four method components are closely

interlinked through distributed usability and Activity Theory principles. The issues discussed at

different phases of DUEM (i.e. the concepts), the questions asked in relation to these issues (i.e.

the procedures) and the form in which the answers to these questions are recorded (i.e. the

notation) are all related. For example, Activity Theory principles are used in Phase 2 to develop

an understanding of users’ activities. The data is collected using a set of twelve questions. This

data is then applied in Phase 3 to define the evaluation plan and test the system. Finally, in

Phase 3 the results of the testing are analysed in relation to the users’ activities defined in Phase

2 to determine breakdowns and contradictions (i.e. distributed usability problems) in the activity

caused by the system. The method components that make up DUEM are shown in Table 4.6.

The procedures, concepts and notation for each method component are also defined.


Table 4.6 Distributed Usability Evaluation (DUEM) method components

Method component: Selecting and recruiting users
Concepts (What to talk about?): Direct and indirect users; user compensation.
Procedures (What questions to ask?): Who are the direct users and the indirect users of the system (based on the central activity that the system mediates)? Will users be compensated? If so, what form of compensation will be offered?
Notation (How to express answers?): Activity system (Engeström, 1987).

Method component: Understanding users' activities
Concepts: Activities; object and motives; actions and goals; operations and conditions; tools; stakeholders (community); division of labour; activity rules; outcomes; related activities; breakdowns and contradictions; historical development.
Procedures: 1. What is the central (principal) user activity (referred to as 'the activity' from here onwards)? 2. What is the object of the activity and what are the motives for carrying out the activity? 3. What are the actions used to carry out the activity, and what are the goals associated with each action? Are there any alternative actions? 4. What are the operations used to carry out the actions, and what are the conditions in which the activity takes place? 5. What are the tools used to carry out the activity? Are there any alternative tools? 6. Who are the stakeholders in the activity (community)? 7. What are the roles of the stakeholders in the activity (division of labour)? 8. What rules govern the activity? 9. What are the outcomes of the activity and what will they be used for? 10. What other activities are related to the activity (network of activities)? 11. Are there any known problems in carrying out the activity (breakdowns and contradictions)? 12. How was the activity carried out previously (historical development)?
Notation: Data collected mapped to Activity Theory (AT) principles, namely the hierarchical structure of activities (Leont'ev, 1981) and the activity system (Engeström, 1987).

Method component: Evaluating the system in relation to users' activities
Concepts: Evaluation plan; evaluation goals; evaluation means; evaluation materials and resources.
Procedures: What are the goals of the evaluation? How will the evaluation goals be achieved? What materials and resources are required for the evaluation? How will data be collected and stored for analysis?
Notation: Evaluation plan; notes (mapped to AT principles); user activity scenarios based on Phase 2 data (optional); recording (optional).

Method component: Analysing and interpreting results
Concepts: Breakdowns; contradictions; distributed usability problems; solutions.
Procedures: What breakdowns were observed in the users' interaction with the system? What contradictions in the activity system have caused a breakdown (distributed usability problems)? How can breakdowns be avoided and contradictions resolved?
Notation: Breakdowns (Bødker, 1991) in the hierarchical structure of activities (Leont'ev, 1981); contradictions in activities (Engeström, 1999).

Table 4.1 (on page 171) shows the five method components that make up traditional usability

testing, while Table 4.6 indicates the four method components that DUEM consists of. The two

tables differ in several aspects, including the following:

1. The “Select and recruit representative users” method component from traditional

usability testing has been placed at the very beginning of DUEM, before any other

activities take place. This is consistent with the underlying philosophy of DUEM that

users be involved in every phase of the evaluation.

2. The “Define usability testing plan” and “Prepare test materials” method components

from traditional usability testing have been integrated as sub-phases (3.1 and 3.2) into the

“Evaluating the system in relation to users’ activities” method component in DUEM.

3. “Define the evaluation plan” in DUEM does not involve all of the different concepts

that the “Define usability testing plan” method component in traditional usability testing

involves (e.g. list of user tasks, test environment, test equipment, facilitator role,

evaluation measures) because those will depend on what the evaluators and users

negotiate collaboratively. The concepts which the “Prepare test materials” method

component in traditional usability testing refers to (e.g. participant screening,

participant orientation, participant tasks, participant training) are omitted for the same

reason.

4. A new method component titled “Understanding users’ activities” has been introduced,

with a set of concepts, procedures and notations that are based on Activity Theory

principles. This method component is conceptually similar to the concept of “List of

user tasks” in the “Define usability testing plan” method component of traditional

usability testing. However, in DUEM this concept has been expanded into a method

component.

5. Instead of referring to usability problems and recommendations in the “Analysing and

interpreting results” method component, DUEM refers to breakdowns and


contradictions (distributed usability problems), and solutions which are discussed with

users, rather than being recommended by evaluators.

Whereas traditional usability testing suffers from a fragmented framework, the notion of distributed usability and Activity Theory principles are intended to provide a single underlying framework for DUEM. This framework reveals itself in the direct relationships

between the method components, and the concepts, procedures and notation within each

component. Unlike traditional usability testing, which does not have a coherent unifying perspective, Activity Theory is intended to provide a unifying perspective or theoretical basis for DUEM. As the perspective on which DUEM is based, Activity Theory reveals what is important and what the evaluation must focus on: users' activities. In assessing the differences

between the co-operation forms, DUEM advocates that evaluators and users jointly ask and

answer the questions, in contrast to traditional usability testing where only evaluators ask and

answer the questions. Figure 4.2 showed the results of an analysis of traditional usability testing

based on Goldkuhl et al's (1998) Method Theory. Figure 4.25 shows the results of the same analysis for DUEM.


Figure 4.25 Analysis of DUEM (based on Goldkuhl et al, 1998)

Having described DUEM and how it was developed to overcome the limitations of traditional usability testing (Sections 4.2 and 4.3), and compared and contrasted DUEM with traditional usability testing using Method Theory (Section 4.4), it is now appropriate to consider the key differences

between traditional usability testing and DUEM. These are shown in Table 4.7. Table 4.7 is

based on the limitations of traditional usability testing which DUEM is intended to overcome

and the Method Theory comparison of the two methods described above.


Table 4.7 Key differences between traditional usability testing and DUEM

Number of method components. Traditional usability testing: five. DUEM: four.
Concepts, procedures and notation. Traditional usability testing: diverse. DUEM: based on Activity Theory principles.
Theoretical basis (framework and perspective). Traditional usability testing: fragments of scientific experiments. DUEM: Activity Theory.
Co-operation forms. Traditional usability testing: evaluators. DUEM: evaluators and users.
Focus of evaluation. Traditional usability testing: system. DUEM: user activities.
View of users. Traditional usability testing: controlled by evaluators. DUEM: collaborative participants.
View of usability. Traditional usability testing: localised at the interface. DUEM: distributed across the activity system.
Scenarios. Traditional usability testing: defined by evaluators. DUEM: defined based on users' activities.
Evaluation location. Traditional usability testing: usability laboratory. DUEM: negotiated collaboratively.
Data collected. Traditional usability testing: mainly quantitative and some qualitative. DUEM: qualitative.
Evaluation techniques used. Traditional usability testing: questionnaires, task scenarios, short interviews. DUEM: interviews, focus groups, observation, user activity scenarios (optional).
Definition of usability problem. Traditional usability testing: no standard definition. DUEM: distributed usability problems, i.e. breakdowns in users' interaction caused by contradictions in the activity system.

4.5 Potential Benefits and Limitations of DUEM

The final part of this chapter will consider some potential benefits and limitations of DUEM. It

should be noted that these benefits and limitations are solely based on the expected performance

of DUEM in practice. The benefits and limitations identified here will be confirmed and/or

refuted in Chapter 5, following the validation of DUEM.

4.5.1 Potential Benefits of DUEM

DUEM is intended to overcome the limitations of traditional usability testing and provide

evaluators with a flexible and practical usability evaluation method that involves users from the


outset. Its flexibility is manifested in the lack of prescriptive rules. Instead, evaluators are

provided with a set of guidelines that include means for identifying direct and indirect users,

issues to consider when deciding how many users to involve, questions to ask in understanding

users’ activities, and three options to consider in selecting the means to carry out the evaluation.

This level of flexibility is intended to allow DUEM to be used in any type of evaluation project,

regardless of the project scope, and the time and resources available. This flexibility gives evaluators a useful UEM that is also practical, because it can be adjusted to suit their particular needs and circumstances.

DUEM also aims to provide a unifying framework that underpins every phase of the evaluation.

This is achieved by integrating Activity Theory principles into DUEM to: understand users’

activities, evaluate the system in relation to those activities, and identify distributed usability

problems. Distributed usability has also been integrated into DUEM. This notion extends the

traditional view of a system’s usability to include the system’s usefulness in relation to users’

activities. In DUEM, the system is viewed as one of many tools that users utilise in their

activities and is evaluated as such. The purpose of doing this was to overcome a major

limitation of traditional usability testing – the focus on the system. The difference between how

the system is viewed in traditional usability testing and how it is viewed in DUEM is shown in

Figure 4.26.


Figure 4.26 System view differences between traditional usability testing (the system as the main focus of evaluation) and DUEM (the system as a tool in users' activity)

Figure 4.26 highlights a key distinction between traditional usability testing and DUEM, namely

the way the system is viewed. While traditional usability testing is more system-focused and

does not evaluate the system in any particular context, DUEM situates the system in the context

of users’ activities as a tool, and evaluates it from this perspective. This distinction draws

attention to a potential benefit of DUEM, which is the evaluation of systems in the context of

users’ activities, thus providing an indication of a system’s usability and usefulness (i.e.

distributed usability).

The strong user involvement and the focus on users’ activities in DUEM are intended to give

users a sense of ownership and control in the evaluation, which in turn should have a positive

effect on their motives for participating in the evaluation and lead to useful outcomes. Unlike in traditional usability testing, these outcomes are not recommendations made by the evaluators, but solutions

proposed by the evaluators and users jointly.

The potential benefits described above will be considered again in Chapter 5 after data from the

validation of DUEM is available to confirm and/or refute them.


4.5.2 Potential Limitations of DUEM

Two sources of DUEM’s potential benefits may also be the sources of its key limitations. These

include user involvement and Activity Theory principles.

Although user involvement is the cornerstone of user-centred design, DUEM requires that users

be involved at all phases of the evaluation. Compared to traditional usability testing, this

increases the workload of users significantly. Kanalakis (2003) argues that too much user

involvement can put users off. Furthermore, Alter (1999) reports that too much user

involvement can be a hindrance in an information systems design and development project,

while Göransson (2004) argues that the involvement of users does not guarantee system

success. Therefore, one potential limitation of DUEM is the high level of user involvement it requires, which creates its own set of problems.

Activity Theory has been described as satisfying a fundamental need for a theory of practice in

HCI (Nardi, 1996a). The use of Activity Theory, as an underlying theoretical framework for

DUEM, stems from the benefits that this theory offers (as described in Section 2.12.4).

However, as Nardi (1996a) admits, Activity Theory terminology can be daunting and confusing,

particularly for those who are not trained in this theoretical framework and those encountering it

for the first time. The use of Activity Theory principles in DUEM implies the need for such

training, for both evaluators and users. This is the second potential limitation of DUEM that

might need to be addressed if evaluators and users find it difficult to understand and

operationalise Activity Theory principles.


4.6 Conclusion

The purpose of this chapter was to describe how the Distributed Usability Evaluation Method

(DUEM) was built. The chapter was structured to follow the steps described in Stage I of the

research methodology as shown in Figure 3.8 (replicated at the beginning of this chapter). To

demonstrate that DUEM represents an improvement of the traditional usability testing method,

it was necessary to begin by analysing traditional usability testing in order to identify its

limitations. An initial analysis was carried out based on Goldkuhl et al’s (1998) Method Theory

(Steps 1 to 3 of the research methodology). This involved decomposing traditional usability

testing into its method components and commenting on the concepts, procedures and notation

of each component. Traditional usability testing was decomposed into five logical method

components and their associated concepts, procedures and notation. The analysis showed that

the users’ goals and motives were omitted and that the notation was inconsistent from one

method component to the next. A review of the framework, perspective and co-operation

forms of traditional usability testing also revealed several deficiencies. The traditional usability

testing method suffers from a fragmented framework, a lack of a unifying theoretical

perspective, and its co-operation forms are evaluator-driven because the evaluators both ask and

answer the questions in the evaluation.

Following this desk review of traditional usability testing, it was applied in an actual evaluation

project to derive an empirical list of limitations that need to be addressed in order to improve

traditional usability testing (Step 4 of the research methodology). The project involved the

evaluation of a university’s web site. Based on the data collected, a total of twenty-two

limitations of traditional usability testing were identified and grouped into three categories (Step

5 of the research methodology). Six limitations were user-related, fourteen were process-

related and two were facilitator-related. The data collected through the Method Theory desk

review and the application of the method in practice, was used to develop DUEM (Steps 6 and 7

of the research methodology). To develop DUEM, the method components of traditional


usability testing were revised and re-developed to incorporate distributed usability and Activity

Theory principles, and address the limitations of traditional usability testing. As a result of this

multifaceted process, DUEM emerged as a four-phased evaluation method, consisting of the

following phases:

1. Selecting and recruiting users;

2. Understanding users’ activities;

3. Evaluating the system in relation to users’ activities;

4. Analysing and interpreting the results.

Each of these phases involves users and is underpinned by distributed usability and Activity

Theory principles.

In Phase 1 of DUEM, both direct and indirect users are recruited to participate in the evaluation.

During Phase 2, a team consisting of evaluators and users jointly define the users’ activities

using the activity system notation (as per Engeström, 1987) and the hierarchical structure

notation (as per Leont’ev, 1978). The system being evaluated and its functions are not the focus

of the discussion during this phase. Instead, the focus is on identifying typical users’ activities

and how they are carried out by users using the system and any other tools. The outcome of

Phase 2 is a detailed shared understanding of what users do and how they do it.

In Phase 3, the evaluators and users once again work jointly and iteratively to design and

develop a set of evaluation goals based on the users’ activities identified during Phase 2. The

evaluation goals are not system specific. They are defined in relation to users’ activities. This is

in contrast to traditional usability testing which uses the system as the starting point for

developing evaluation goals. Based on these goals, a means of conducting the evaluation is

collaboratively negotiated between the evaluators and the users, and implemented. The outcome

of Phase 3 is rich qualitative data about the users’ interaction with the system in the form of

written notes based on observations and discussions, and/or video and audio recordings of the

interaction.


The data collected during Phase 3 is analysed in Phase 4 using Activity Theory principles. The

data is used to identify breakdowns in the interaction and map these breakdowns to

contradictions in the activity system. The mapping of breakdowns and contradictions defines the

usability problems, which are distributed across the activity network, rather than being found

only at the system interface. Consequently, they have been termed distributed usability

problems. Finally, the evaluators and users jointly discuss possible solutions to the usability

problems identified. A graphical representation of DUEM is shown in Figure 4.24.
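The mapping at the heart of Phase 4 can also be expressed as a simple data structure. The Python sketch below is illustrative only (none of the class or field names are prescribed by DUEM, and the example content is hypothetical); it shows how an observed breakdown is linked to a contradiction between activity system elements, and how the pair constitutes a distributed usability problem.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Breakdown:
        description: str   # an observed interruption in the users' interaction
        activity: str      # the activity in which the breakdown occurred

    @dataclass
    class Contradiction:
        elements: List[str]   # activity system elements in tension, e.g. ["tool", "rules"]
        description: str

    @dataclass
    class DistributedUsabilityProblem:
        breakdown: Breakdown
        contradiction: Contradiction

    # Hypothetical example: a breakdown observed at the interface mapped to a
    # tool-rules contradiction located elsewhere in the activity network.
    example = DistributedUsabilityProblem(
        breakdown=Breakdown("User cannot complete an exceptional enrolment online",
                            "Enrolling in subjects/courses"),
        contradiction=Contradiction(["tool", "rules"],
                                    "The online tool does not support cases that the rules require"),
    )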

Following a description of DUEM and how it was developed, the method was compared to

traditional usability testing in order to demonstrate the differences between the two methods.

Once again, Goldkuhl et al’s (1998) Method Theory was used to analyse DUEM and contrast it

to traditional usability testing. The results indicate that DUEM differs from traditional usability

testing in the number and sequence of the method components, the existence of an underlying

framework, an Activity Theory perspective, and collaborative co-operation forms involving

evaluators and users. A summary of the key differences between DUEM and traditional

usability testing was also shown in Table 4.7. Finally, based on the highlighted differences

between the two methods, the potential benefits of DUEM were presented. Two potential

limitations were also raised. However, it is not possible to confirm or refute these benefits and

limitations without evidence. The following chapter is concerned with collecting such evidence

to validate DUEM.


Chapter 5

Stage II: Method Validation

5.1 Introduction

Chapter 4 described the building and development of the Distributed Usability Evaluation

Method (DUEM). It is intended to be a flexible, yet structured usability evaluation method with

a strong focus on users and their activities. The purpose of this chapter is to describe the

validation of DUEM. The validation of DUEM will be carried out by applying DUEM in

practice in the evaluation of a web site. The application process will be documented and the

empirical data collected used as evidence to confirm or refute the claims made about DUEM in

Table 3.6. The chapter is structured to follow the steps of Stage II (Method Validation) of the

research methodology described in Chapter 3 and replicated in Figure 3.8 below.

Figure 3.8 Research Methodology (replicated)

STAGE II: Method Validation
Objective: To validate (evaluate) the Distributed Usability Evaluation Method
Methods/Techniques/Tools Used: • Claims (Mwanza, 2002)
Steps:
1. Apply DUEM in practice
2. Document application of DUEM
3. Address DUEM claims using evidence from application to confirm or refute each claim


In this chapter, the term validation refers to the evaluation of DUEM. Because DUEM is itself an evaluation method (and this chapter therefore aims to 'evaluate the evaluation method'), the term validation is used throughout to avoid confusion. The validation of DUEM will be carried out as an

“ideational” evaluation (Iivari, 2003) of the method. Iivari (2003) suggests that a full empirical

evaluation of a method can be postponed in favour of “ideational” evaluation, which is

sufficient to demonstrate that the method includes novel ideas and addresses significant

theoretical or practical problems with existing methods. Rather than proving the correctness of a

method, the aim of “ideational” evaluation is to determine the general usefulness of a method.

Therefore, the primary objective of the validation process described in this chapter is to

demonstrate that DUEM contains novel ideas (i.e. distributed usability and Activity Theory as

the underlying theoretical framework) and addresses significant theoretical or practical

problems with existing methods (i.e. the limitations of traditional usability testing), making it a

generally useful artefact for HCI practitioners. The use of the term “novel” to describe the

theoretical basis of DUEM is not intended to imply a previously unknown idea, but rather the

application of an existing theory, which has been widely used in other domains, to a new

domain – that of usability evaluation. The key question that this chapter aims to answer is:

“Does DUEM include novel ideas and overcome the limitations of traditional usability

testing?”. If, as a result of the DUEM validation process, the answer to this question is a “Yes”,

then DUEM represents an improvement of traditional usability testing.

To validate DUEM, the method was applied in the evaluation of the same University web site

described in Section 4.2.2. This system was selected for several reasons. By applying DUEM to

the same system that was evaluated using traditional usability testing, it would be possible to

draw comparisons between the processes and outcomes of the two UEMs. It has been stated

previously that comparing UEMs is fraught with difficulties due to the lack of precise and stable

comparison criteria and measures (as described in Section 3.6.2.1). Therefore, the purpose of the


comparison is mainly to highlight differences between the two methods, rather than make

formal and proven assertions about the superiority of one method over the other. Furthermore,

the methods will only be compared when discussing the claims about DUEM in the final part of

this chapter. In order to confirm or refute a DUEM claim, the results of the DUEM application

(in this chapter) will be discussed in relation to the results of the traditional usability testing

application (from Chapter 4). For example, to confirm or refute the claim that “DUEM provides

a means of understanding users’ goals and motives”, the empirical data collected using the

DUEM method will be discussed and compared to the empirical data collected from traditional

usability testing to determine whether DUEM actually addresses the issues of users’ goals and

motives, or whether these issues remain the same in both UEMs.

Other reasons for selecting the same web site include ease of access to its direct and indirect users, and the author's familiarity with the system. Both are important pre-requisites in the DUEM validation process because they allow attention to be focused on the application and validation of DUEM, rather than diverted to logistical issues such as participant recruitment and learning about the interface. In a real-life evaluation project these logistical issues are critical (and will be referred to in the subsequent discussion); for the purposes of this thesis, however, minimising them serves the main objective at hand – the validation of DUEM.

At the outset it should be noted that the validation of DUEM was not carried out in a real-life

evaluation project. A real-life evaluation project usually involves a team of evaluators working

together to investigate the usability of a specific interface with the ultimate aim of actually

improving it. The outcomes of the DUEM application described in this chapter do not serve this

purpose. The outcomes of the DUEM application are not intended to improve the usability of

the University’s web site. Instead, the outcomes will be chiefly used to validate DUEM and

determine whether it represents an improvement over traditional usability testing. The outcomes

of the evaluation will not be presented to the University’s management, nor will they be

reported anywhere without reference to their purpose described in this thesis.


This chapter begins with a description of the application of DUEM to evaluate the web site of

an Australian university. The DUEM evaluation of the web site followed the phases and sub-

phases of DUEM described in the previous chapter. For a variety of reasons, it was not always

possible to follow these phases and sub-phases completely and accurately. For example, since

the DUEM evaluation described here was not a real-life evaluation, it was not possible to work

with a team of evaluators. Instead the author worked alone with the users. Where these types of

situations arose, they will be highlighted in the description of the DUEM application and their

causes and consequences will be discussed. The implications of these situations for the

validation of DUEM and the outcomes of the evaluation will also be examined.

Following a description of the DUEM application, the claims made about DUEM in Section

3.6.2.2 (and listed in Table 3.6) will be addressed. Empirical data collected from the application

of DUEM will be used as evidence to confirm or refute the eight claims. Each claim is

associated with a list of questions that must be answered in order to determine whether the

claim about DUEM is supported. Since the DUEM claims are based on the eight UEM

challenges identified in Chapter 2, the answers to the questions will also indicate if DUEM

provides a means for overcoming these challenges. Finally, the evidence provided to confirm or

refute the DUEM claims will serve to demonstrate the specific contributions that DUEM makes

to usability evaluation practices.

The chapter concludes with a discussion of the actual benefits and limitations of DUEM based

on the results of the DUEM validation process and the examination of the DUEM claims.

5.2 The Application of DUEM

This section describes Steps 1 and 2 of Stage II of the research methodology as shown in Figure

3.8.


STAGE II: Method Validation
Objective: To validate (evaluate) the Distributed Usability Evaluation Method
Methods/Techniques/Tools Used: • Claims (Mwanza, 2002)
Steps:
1. Apply DUEM in practice
2. Document application of DUEM
3. Address DUEM claims (from Chapter 3) using evidence from application to confirm or refute each claim

Figure 3.8 Research Methodology (replicated)

The two steps have been amalgamated because they are directly related. As DUEM was applied

in practice, the application was documented at the same time.

DUEM was described in detail in the previous chapter (Section 4.3). The four phases of DUEM

were applied in practice in the evaluation of the same University web site used in Chapter 4. At

the time that DUEM was applied, the project described in Section 4.2.2 had ended and the new

University web site had been officially launched. The launched web site differed only slightly in

design to the version which was used in the traditional usability testing evaluation. Only the top

level pages of the web site had been completed, with most lower level pages and web-based

systems (such as the online student enrolment system) still under development. The top level

and lower level pages of the web site are shown in Figure 5.1 below.


Figure 5.1 Top level and lower level pages on the University’s web site

Discussions with the Project Team (involved in the traditional usability testing evaluation)

indicated that, although the top level pages had been launched, minor revisions were being made

to these pages as the need arose and feedback from the university community was received

(Personal communication, 2003). For example, new links were added, modified or removed as

required. The nature of these changes was usually simple and straightforward. Almost all of the

changes were done by one or two of the designers and required minimal effort. However, some

changes required the Project Team to consider alternative designs before implementing them.

Due to these changes, the web site was still considered to be “under development” and

“continuing review” (Personal communication, 2003). As a result the DUEM evaluation of the

web site was a formative, rather than a summative, evaluation. The following sections will

describe how this formative evaluation was undertaken through the four phases of DUEM.

[Figure 5.1 content – Top level pages: University’s Home Page, Current Students Page, Prospective Students Page, Staff Page, etc. Lower level pages: Subject Timetable, Online Enrolment System, etc.]


5.2.1 DUEM Phase 1: Selecting and Recruiting Users

The first phase of DUEM involves selecting and recruiting users to take part in the evaluation.

Both direct and indirect users of the University’s web site were sought for participation. Due to

the large number of diverse users located both inside and outside the University (as described in

Section 4.2.3.2), it was not possible to evaluate the web site for every category of users. In order

to maintain consistency for the purposes of comparing DUEM to traditional usability testing,

only students currently enrolled at the University were involved in the evaluation. The web site

was evaluated from their perspective, rather than users who were University staff members or

external parties.

In order to identify the direct users and indirect users of the web site, the students’ central or key

activity must be examined. As mentioned previously, the direct users correspond to the subject

in an activity system (as defined by Engeström, 1987) and the indirect users are represented by

the community in the activity. A student (the subject) performs a variety of activities during the

time he/she spends at the University, however the object of every student’s key or central

activity is study. Transforming this shared object into an outcome over an extended period of

time is what motivates the existence of a student’s activity. One of the many tools that mediate

this activity is the University’s web site. Using the web site, a student is able to find information

about study and degree requirements, subjects and academic policies and procedures, as well as

manage his/her enrolment. Other tools used by a student include course guides, handbooks,

subject outlines, manuals, policy documents, forms, etc. The activity is governed by a set of

academic rules which a student must abide by. The rules include regulations about the number

of subjects and/or credit points a student must complete, the grades a student must achieve to

progress satisfactorily, etc. In the activity system (Engeström, 1987), the community consists of

individuals or stakeholders who share the subject’s object. In the student’s activity, the

community consists of stakeholders who share the subject’s object of studying. Therefore, the

community consists of other students, including full-time and part-time students, coursework


and research students, undergraduate and postgraduate students, domestic and international

students, and school-leavers and mature age students. Other individuals such as academic and

administrative staff members at the University may also be viewed as belonging to the student’s

community. Although the object of the staff members’ activity is not study itself, they share the

subject’s object in that they facilitate the students’ study. For example, administrative staff

members at the University facilitate the student’s object by providing support functions to

enable the student to enrol in subjects, complete assessments, carry out research, etc. Therefore,

in this activity, other students and staff members at the University constitute the community.

There exists a distinct vertical and horizontal division of labour in the community, with both

students and staff members having specific roles and duties in the activity. These roles are

defined by the University’s organisational structure. The activity described above is shown in

Figure 5.2.

Figure 5.2 A current student’s central activity

[Figure 5.2 content – Subject: Student; Object: Study; Tools: course guides, handbooks, manuals, policy documents, web site, forms, etc.; Rules: academic rules, policies and procedures; Community: students and staff members; Division of Labour: University’s organisational structure; Outcome: Graduation]

In comparison to traditional usability testing, DUEM does not make distinctions between different types of students such as undergraduate and postgraduate students, domestic and international students, coursework and research students, etc. Nor does it make assumptions


about the academic needs of these students for the purposes of deriving student categories such

as the ones shown in Figure 4.6. Instead, all students are understood from the perspective of

their activities, the tools that mediate those activities, the motives of the activities (object) and

the social context in which those activities take place (as represented by the community, rules

and division of labour). By viewing students from the perspective of their real-life activities and

by using a structured theoretical framework such as Activity Theory to do so, evaluators are in

the position to evaluate the distributed usability of the University’s web site in relation to the

users’ activities.
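For illustration, the elements of the student’s central activity shown in Figure 5.2 can be captured in a simple structure. The Python sketch below is not part of DUEM; the class and field names are illustrative, and the values are taken directly from Figure 5.2.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ActivitySystem:              # Engestrom's (1987) activity system elements
        subject: str
        activity_object: str           # the 'object' of the activity
        tools: List[str]
        rules: List[str]
        community: List[str]
        division_of_labour: str
        outcome: str

    student_central_activity = ActivitySystem(
        subject="Student",
        activity_object="Study",
        tools=["Course guides", "Handbooks", "Manuals", "Policy documents", "Web site", "Forms"],
        rules=["Academic rules, policies and procedures"],
        community=["Other students", "Staff members"],
        division_of_labour="University's organisational structure",
        outcome="Graduation",
    )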

Traditional recruitment techniques were used to invite both direct and indirect users to

participate in the evaluation. Students were primarily recruited using large posters describing

the project, all of which were placed in the Student Services Centre and on noticeboards around

campus (for sample poster refer to Appendix K). Due to ethical considerations, students who

were enrolled in a subject co-ordinated by the author were excluded from the evaluation in order

to avoid any potential dependency situations. Indirect users who were staff members were

recruited via personal contact and mainly over the telephone. This approach was considered to

be the most appropriate because it was more personal than e-mail and provided staff members

with the opportunity to ask questions and inform themselves about the project before making a

commitment. This was important because by agreeing to participate, staff members would be

adding to their daily workload. No form of compensation was offered to any of the users in

order to reduce the effects of providing a reward on the users’ motives.

Following the initial recruitment drive, only two students responded within a period of two

weeks. Although DUEM does not prescribe a specific number of users required, two users were

considered to be insufficient in a University with more than fifteen thousand students. None of

the staff members contacted were prepared to take part in the evaluation, primarily because of

the time commitment required. Unlike traditional usability testing, where participants complete

questionnaires and a set of task scenarios in a relatively short period of time in a usability


laboratory, DUEM requires a longer time commitment from users because it involves them at

every stage of the evaluation. All of the staff members contacted declined to participate because

they were unable to dedicate the time to the project. Furthermore, due to unavoidable

circumstances, the timing of the evaluation coincided with the end of the academic year, which

is traditionally a busy time for both staff members and students.

In order to recruit more users, a decision was made to offer a $20 gift voucher for participating

in the evaluation. Although the same amount was used in the traditional usability testing

evaluation, the extent of the users’ involvement in the DUEM evaluation was greater, making

the amount disproportionately small relative to the level of commitment. The recruitment poster was modified

to reflect the decision to offer a gift voucher and the previous recruitment posters were replaced

with the modified one. Also, a subset of the staff members contacted previously was contacted

once more. To maintain equity amongst the participants, it was also decided that the two

students who had already volunteered would be given a $20 gift voucher.

The second recruitment drive, which lasted three weeks, resulted in an additional nineteen

students agreeing to participate in the evaluation. Table 5.1 provides a summary of the student

participants and their backgrounds.


Table 5.1 Summary of student participants and their backgrounds

1. Master of Business Administration student from the Philippines
2. Civil Engineering PhD student from Thailand
3. Bachelor of Computer Science student from Malaysia
4. Master of Professional Accounting student from China
5. Exchange student from Norway studying Bachelor of Information Technology (in Australia for six months)
6. Exchange student from USA studying Education (in Australia for six months)
7. Exchange student from USA studying Finance (in Australia for six months)
8. Bachelor of Arts (Information Studies) student from Australia
9. Bachelor of Software Engineering student from Australia
10. Bachelor of Information and Communication Technology (ICT) student from Australia
11. Environmental Engineering PhD student from Thailand
12. Bachelor of Psychology/Bachelor of Commerce (double degree) student from Australia (fifth year at University)
13. Master of Computer Engineering student from China
14. Bachelor of Commerce (Marketing and Management) student from Australia
15. Bachelor of Electrical Engineering student from Australia (currently part-time)
16. Master of Electrical Engineering (Research) student from China; transferred from Master by coursework degree
17. Public Health PhD student from China; also completed Master in Public Health at the University
18. Electrical Engineering PhD student from China; also completed Master (Research) degree at the University
19. Bachelor of Science/Bachelor of Commerce (double degree) student from Australia (fifth year at University)
20. Master of Business Administration student from Australia
21. Bachelor of Accounting student from Australia

The table shows that the participant group was made up of a diverse set of students from almost

every faculty at the University. There was also a mix of undergraduate and postgraduate

students, and coursework and research students. Most of the students enrolled in a research

degree were mature age students (according to the University’s definition of mature age

students). In addition, twelve of the students were international students. However, some of

these students had been studying in Australia for extended periods of time, progressing from an

undergraduate to a postgraduate, or from a coursework to a research degree. The participants

were considered to be representative of the student community at the University and to be able

to provide a detailed insight into the activities that typical students at the University engaged in.


The second recruitment drive was unsuccessful in attracting any staff members to participate.

This was a source of concern because staff members are key members of the students’

community, and it was important to involve them in the evaluation. Therefore, rather than

disregarding staff members completely, a decision was made to involve them on a smaller scale

through interviews at certain stages of the evaluation. This would eliminate the need for staff

members being involved over a prolonged period of time. It was thought to be an appropriate

compromise and in keeping with the participatory values of DUEM. The proposition was made

to a number of staff members and three agreed to participate by being interviewed at later stages

in the evaluation. Two of the staff members were academics from two of the University’s

faculties, and one was a staff member providing administrative support in the Student Services

Centre.

Having recruited a core group of twenty-one student participants representing both direct and

indirect users, and a peripheral group of three staff members representing indirect users (who

participated only during certain stages of the evaluation), the formal evaluation process was

ready to begin. The previous discussion has highlighted a number of issues during Phase 1 of

DUEM that need to be addressed; however, these will be analysed and discussed in more detail

later, after the application of DUEM has been described.

5.2.2 DUEM Phase 2: Understanding Users’ Activities

The purpose of Phase 2 is to gain an in-depth understanding of the users’ activities and the

context in which they perform those activities. The system being evaluated is secondary to this

purpose and does not have a central role during this phase. Focus groups, interviews and

observation are used to collect data about users’ activities. A set of twelve questions based on

Activity Theory principles is used to guide the data collection (refer to Section 4.3.3). The

questions can be adapted to reflect the activities in a particular setting. In the current evaluation,

the questions were adapted to reflect student activities in a university setting. Table 5.2 shows


the adapted questions, based on the original set. Once the activity was known (i.e. once the first

question was answered), subsequent references to the activity in the other questions were

replaced with the actual activity named by the student (shown in square brackets in Table 5.2).

Table 5.2 Adapted questions for Phase 2 used to collect data about users’ activities

1. What is an activity that you, as a student at the university, are involved in?
2. Why do you do [the activity]? What motivates you to be involved in [the activity]?
3. What are the goals of your [activity] and how are they achieved? How is your [activity] carried out? Are there any other ways of doing [the activity]?
4. What are the necessary conditions for your [activity] to take place?
5. What tools do you use to carry out your [activity]? Are there any other tools?
6. Who else is involved in your [activity]?
7. What are the roles of the different people involved in your [activity]?
8. What are the rules that affect the way your [activity] is carried out?
9. What is the outcome of your [activity]? What is it used for?
10. Are there any other activities which are directly related to your [activity]?
11. Do you have any problems carrying out the [activity]?
12. Have you done this [activity] previously? Has it changed? If so, how?

As indicated in Section 4.3.3, these questions are not intended to produce discrete responses

from the participants. They are used as a guide in Phase 2 of DUEM to understand typical

activities that students are involved in at the University, and they can be adapted during the

interview, depending on the responses. The questions shown in Table 5.2 address two basic

issues: “What does a student do at university?” and “How does a student do these things?”. The

system is not referred to specifically in any of the questions. The primary aim is to create a

shared understanding of typical student activities, so that the role of the web site can be

evaluated in relation to these activities.

The use of the open-ended, adaptable questions shown in Table 5.2 as a guide during the

interview process implies a general discussion rather than a formal interview with the student

participants. This is important for two reasons. Firstly, by keeping the questions broad and

general in nature, without addressing specific issues or using explicit terminology, the students

are able to express and discuss their responses in terms that they are familiar with. Secondly, a


formal interview, where the roles of the interviewer and the interviewee are clearly delineated,

does not facilitate relationship-building between the evaluator and the students. DUEM is a

collaborative evaluation method which relies on evaluators and users working jointly in a co-

operative fashion. To make this possible, situations such as formal interviews were avoided.

Therefore, although the term interview is used to describe the data collection process, it does not

denote a formal interview, but a joint discussion between the evaluator and the students.

The responses to some of the questions in Table 5.2 indicate the existence of an artifact (tool)

that a student may make use of. For example, in answering the fifth question (“What tools do

you use to carry out your [activity]? Are there any other tools?”), participants may refer to the

web site, textbooks, notes, forms, manuals, and other tools. DUEM advocates that if such tools

exist, it is useful to examine them during Phase 2 to facilitate an in-depth understanding of the

users’ activities. In order to do this, it is necessary to have access to these tools, which are often

situated in the users’ natural environment.

In the University web site evaluation, the location of the interviews posed a minor problem.

Although students are present at the University on a daily basis and use the facilities and

classrooms, they do not have a fixed location on campus that is their own personal space.

Unlike employees in an organisation, students do not have offices or rooms dedicated for their

personal use at the University. For students, this space is usually located at their home where

they carry out most activities (e.g. study, prepare assignments, etc.). Therefore, it was not

possible to interview students in their natural environment where the above mentioned tools

were easily accessible. For the purposes of the evaluation, it was decided that the University

would be considered a natural environment for students. This was done because students were

familiar with the University and spent a significant proportion of their weekly time at the

University. However, it was not possible to identify a specific location at the University that

could be deemed the natural environment for interviewing the students. Instead the student

participants were asked to indicate their preference for the interview location. Some participants


opted to be interviewed at the Library, while others preferred the researcher’s office. It is

important to note that none of the locations selected by participants enabled access to the full

range of tools that a student makes use of in his/her activities.

DUEM proposes the use of focus groups during Phase 2, where possible. This serves the

purpose of observing the social interaction between students and eliciting multiple views from

students which are only possible in a group setting. Each of the student participants was given

the option of participating in a focus group at the time of arranging the interview. Seven of the

participants expressed an interest in taking part in a focus group, instead of a one-to-one

interview. However, due to the student participants’ conflicting timetables, it was only possible

to schedule one focus group with three students.

Each of the one-to-one interviews and the focus group followed the same basic format. The

initial part of the interview involved introductions and familiarising the participants with the

project. Participants were also asked about their background. This was followed by a discussion

of the students’ activities, using the questions shown in Table 5.2 to guide the process. During

the course of the interviews and the focus group, various questions from Table 5.2 had to be re-

phrased so that participants could understand the nature of the question. For example, the

participants found it difficult to respond to the fourth question (“What are the necessary

conditions for your [activity] to take place?”) because they did not understand the reference to

“conditions”. It was necessary to re-phrase the question in terms that were more familiar to the

students. One of the activities identified by students was “doing assignments”. Therefore,

question four was re-phrased to “What do you need in order to do your assignments?”. As

mentioned previously, questions were also re-phrased to refer to the specific activity being

discussed. For example, when discussing the enrolment activity, the questions were re-phrased

to refer to enrolment. Instead of “How is the activity carried out?”, students were asked “How

do you enrol in a subject?” or “What do you do to enrol in a subject?”.
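A sketch of how the Table 5.2 placeholders can be handled is shown below (illustrative only; the function name is not part of DUEM). Simple substitution of the named activity is only a starting point; as the examples above show, the evaluator re-phrases the question further so that it reads naturally to the student.

    QUESTION_TEMPLATES = [
        "What tools do you use to carry out your [activity]? Are there any other tools?",
        "Who else is involved in your [activity]?",
        "What are the rules that affect the way your [activity] is carried out?",
    ]

    def adapt_question(template: str, activity: str) -> str:
        """Replace the [activity] placeholder with the activity named by the student."""
        return template.replace("[activity]", activity)

    adapted = [adapt_question(q, "enrolment") for q in QUESTION_TEMPLATES]
    # e.g. "What tools do you use to carry out your enrolment? Are there any other tools?"
    # In practice the evaluator re-phrases further (e.g. "How do you enrol in a subject?").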


Extensive notes were taken during the interviews and the focus group discussions, which took

place over a period of three weeks. The data collected was qualitative, rich in detail and

revealed a number of activities that students said they engaged in at the University. It provided

the evaluator with an in-depth understanding of the activities that a typical student is involved in

and the language students use to describe those activities. Although the web site was mentioned

by the student participants on many occasions, it was not the focus of the interviews and it was

always discussed only in relation to the activities.

In response to the first question (“What is an activity that you, as a student at the university, are

involved in?”), the students provided a variety of answers. All of the students interviewed

indicated that they engaged in a large number of activities at the University. To make a

distinction between activities and actions, activities were viewed as taking place over extended

periods of time (Kuutti, 1996; Hasan, 2000). Of the activities mentioned by the students, four in

particular were referred to by every participant. These activities were considered to be the

activities that most students at the University have in common. The activities took place over an

extended period of time and required the use of the University’s web site. Since the purpose of

the evaluation was to assess the distributed usability of the web site in relation to the students’

activities, these four activities were selected. Described in the participants’ own language, they

include the following:

1. enrolling in subjects/courses;

2. taking subjects/doing research;

3. doing assignments/studying for and taking tests and exams/writing a thesis;

4. socialising.

The four activities listed above can be thought of as actions of the “study” activity (shown in

Figure 5.2), which was defined as a key or central student activity in Phase 1. However, the

multilevelness of Activity Theory (Kuutti, 1996) indicates that the hierarchical elements of an

activity (activities, actions and operations) are dynamic and can move up or down as conditions

change (Nardi, 1996b). While the overarching student activity is to study at the University, the


actions that are used to achieve that activity (enrolling in subjects/courses; taking subjects/doing

research; doing assignments/studying for and taking tests and exams/writing a thesis;

socialising) become activities in their own right as conditions change (i.e. students enter the

University) and those four activities become the focus of the students’ daily practice. At the

time of graduation, the four activities may once again be viewed as goal-oriented actions that

students undertook to transform the object of their central activity into an outcome (i.e. a

degree).

Although it was not possible to identify a single main activity, all of the above four activities are

interrelated. In Activity Theory terms, the four activities identified by students formed a

network of activities. Each of these activities was analysed using the questions from Table 5.2

in order to collect data about how they were carried out (i.e. the hierarchical structure of the

activity), the tools used to carry them out, the rules that constrained the way that the activities

were carried out, individuals at the University involved in each activity (i.e. the community) and

their role (i.e. the division of labour), the outcomes of the activity, any problems or confusion

that the students had encountered in carrying out the activities (i.e. breakdowns and

contradictions), and how the activity had changed over time (i.e. the historical development).

The diverse backgrounds of the students resulted in multiple viewpoints about the same activity

and different levels of knowledge about each activity. The following sections outline the data

collected about each of the four activities.

5.2.2.1 Activity 1: “Enrolling in subjects/courses”

This activity generated a lot of discussion. There were diverse opinions about what enrolling

entailed and how it was done. The current enrolment procedures require students to ensure that

their enrolment complies with the relevant Course Handbook rules and guidelines, and to enrol

in their subjects each semester using an online enrolment system which can be found on the

University’s web site. (In the case of research students the enrolment was handled by


administrative staff at the Research Office; however, the students still used the online enrolment

system to verify their enrolment). All of the students were familiar with the online enrolment

system. However, students were only able to enrol in subjects (and their associated tutorials)

using this system. If they wished to transfer to a different course (e.g. from Nursing to

Education), the students had to get prior approval from the academic advisor (whom the

University calls “Undergraduate/Postgraduate Co-ordinator”), and submit the required paper

documentation to the Student Administration Services before being permitted to enrol in

subjects associated with the new course. There appeared to be a lot of confusion amongst the

students over the difference between subjects and courses. Most of the students used the terms

interchangeably, or referred to courses as “degree” or “faculty” (e.g. “If I changed my

degree…”, “If I wanted to transfer to another faculty…”). Some students used the term “course”

to refer to both subjects and courses.

Also, there are special rules about enrolling in subjects after a certain date or if a student did not

meet the required pre-requisites for a subject. In both instances, students are required to gain

prior approval from an authorised person. Students expressed different views on who this

person was. In some cases, it was the subject lecturer, while in others it was the academic

advisor or the head of the school/department. Four students commented that they found the

inconsistency in this procedure between different subjects and faculties confusing. Also, these

exceptional cases (students enrolling late or being granted pre-requisite waivers) were not

automated and could not be completed by the student independently. Instead, the student was

required to complete a manual form, have it signed by the authorised person and take it to the

Student Administration Services where the staff member would enrol the student. International

students were constrained by a special set of enrolment rules which were imposed by the

government (Department of Immigration) and required that international students enrol in a

specific number of subjects in order to comply with the student visa requirements. The student

participants also commented that they often consulted their friends about enrolment to “find out

what a subject was like” or to “do subjects together”.


Based on the discussions with the students, the data collected was mapped on to the relevant

Activity Theory principles, and Figures 5.3 and 5.4 were derived. Figure 5.3 shows the

hierarchical structure of Activity 1 (Enrolling in subjects/courses) using Leont’ev’s (1981)

notation. Figure 5.4 indicates the elements of the activity system, based on Engeström’s (1987)

notation.

Activity: Enrolling in subjects/courses
Motive(s): To complete degree/graduate

Actions:
1. Select subjects/tutorials/courses – Goal: To make sure subjects satisfy degree requirements
2. Enrol in subjects/tutorials/courses – Goal: To enrol in required subjects
3. Change course/degree/faculty – Goal: To match personal interests and skills with degree
4. Obtain permission from authorised person in exceptional cases – Goal: To enrol in required subjects; to complete degree early or on time

Operations:
1. Use online student enrolment system – Conditions: Online enrolment system operational and easy to use
2. Talk to authorised person – Conditions: Authorised person available
3. Complete paper form – Conditions: Paper form available and easy to complete

Figure 5.3 Hierarchical structure of Activity 1: Enrolling in subjects/courses
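The hierarchical structure in Figure 5.3 can also be expressed as nested data. The Python sketch below is illustrative only (the class names are not part of DUEM, and for simplicity the operations are listed flatly rather than attached to the specific actions they realise); the content is taken from Figure 5.3.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Action:
        description: str
        goals: List[str]

    @dataclass
    class Operation:
        description: str
        conditions: str

    @dataclass
    class HierarchicalActivity:        # Leont'ev's activity-action-operation levels
        description: str
        motives: List[str]
        actions: List[Action]
        operations: List[Operation]

    enrolling = HierarchicalActivity(
        description="Enrolling in subjects/courses",
        motives=["To complete degree/graduate"],
        actions=[
            Action("Select subjects/tutorials/courses",
                   ["Make sure subjects satisfy degree requirements"]),
            Action("Enrol in subjects/tutorials/courses", ["Enrol in required subjects"]),
            Action("Change course/degree/faculty",
                   ["Match personal interests and skills with degree"]),
            Action("Obtain permission from authorised person in exceptional cases",
                   ["Enrol in required subjects", "Complete degree early or on time"]),
        ],
        operations=[
            Operation("Use online student enrolment system",
                      "Online enrolment system operational and easy to use"),
            Operation("Talk to authorised person", "Authorised person available"),
            Operation("Complete paper form", "Paper form available and easy to complete"),
        ],
    )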


Figure 5.4 Enrolling in subjects/courses activity

Those students who had been at the University for an extended period of time, having

completed one degree and enrolled in further study, were able to discuss how the enrolment

activity was carried out previously. Prior to the development of the online enrolment system, to

enrol, students were required to complete a lengthy form, listing the subjects they had selected.

These subjects would then have to be signed off by an academic advisor or the subject lecturer

on the actual form. This process could take up to an entire week, if the signing individual was

not available. The student would then have to take the form to the Student Administration

Services and wait in a long queue, at times for hours. When his/her turn in the queue arrived, an

administrative staff member would enter the subjects into the computer-based student

management system to enrol the student. An enrolment record was then printed out for the

student and he/she was asked to check the record before leaving the counter. If a student later

made variations to his/her enrolment, the same process would have to be followed.

[Figure 5.4 content – Subject: Student; Object: To enrol in subjects/courses; Tools: online student enrolment system, paper forms, course handbook; Rules: as specified in the course handbook (e.g. must satisfy pre-requisite requirements; must enrol by specific date, etc.), government-imposed rules on international students, informal rules (e.g. students doing subjects together); Community: other students, academic advisors, administrative staff, government in the case of international students; Division of Labour: students enrol independently except where prior approval is required by authorised persons, only administrative staff can modify course enrolments and exceptional cases, Research Office staff enrol research students; Outcome: Successful enrolment]


Clearly, remnants of the way in which enrolment activity was previously carried out still exist in

the current activity. Specifically, the use of paper forms, the signatures by academic advisors

and the requirement that administrative staff only process certain enrolments, all imply that the

new tool (the online student enrolment system) has not been integrated completely into the

current enrolment activity (shown in Figure 5.4). This caused some participants to remark that

they were initially confused about how to enrol because there was inadequate information about

the correct procedure. In one instance, the participant went to the Student Administration

Services and waited in the queue for an extended period of time, only to be told that she had to

enrol using the student online enrolment system.

5.2.2.2 Activity 2: “Taking subjects/Doing research”

The activities of “Taking subjects” and “Doing research” have been grouped under one heading

because they both constitute what students do on a daily basis, except that the former refers to

students enrolled in undergraduate or postgraduate coursework degrees, while the latter refers to

postgraduate research students. The two activities have also been integrated because some

degrees require undergraduate students to write an honours thesis (which involves research),

while some research students are required to take subjects as part of their degree. While some

subjects are compulsory, others can be taken as electives, which gives students the freedom to

select subjects that are suited to their interests and skills. Research students who are only

engaged in doing research and writing a thesis are enrolled in only one subject for the duration

of their degree. These students work under the direction of one or more supervisors assigned to

them by the University at the time of application.

The student participants varied in their responses to the questions regarding this activity. For

example, students used a diverse range of tools while taking subjects or doing research. These

include, lecture notes, textbooks, a commercial online learning system, materials from the


library, reference books, journals, the Internet, online chat facilities and bulletin boards, e-mail,

databases, etc. Also, depending on the subject, different sets of rules applied about participation

and attendance requirements, assignments, submission of assignments, pre-requisite knowledge,

group work, missing a test or handing in an assignment late, etc. When taking a subject, a

student directly interacts with various individuals, including the lecturer, the tutor and other

students in the class. In the case of research students, these individuals are their supervisors.

However, there is also interaction with staff members in support functions such as the library,

the learning support centre, guest and/or substitute lecturers and tutors, the university bookshop,

etc. The students’ community in this instance is quite large. The role of the different

community members varies depending on the subject. Although the formal role of lecturers and

tutors is to lecture and mark assessment tasks, the in-class roles may differ. Some subjects are

focused on student-centred learning and creativity, while others are driven by the lecturer. The

same issues apply where a student is “Doing research”. Some students work more closely with

their supervisors, while others have more laissez-faire arrangements.

Due to the diversity of ways in which subjects are taught and research undertaken at the

University, the student participants indicated that it was difficult to know what to expect from

subject to subject, and this could lead to confusion. Several undergraduate student participants

expressed frustration with accessing lecture and tutorial materials:

“Some lecturers have a WebCT* site. Others put their notes in the library. And others

have their own [web] site. So if I am doing five subjects, I have to use so many different

things just to get the lecture notes. It’s a waste of my time.”

*WebCT is the commercial online learning system used at the University

Students doing research also expressed some frustration with accessing research materials.

Although the University library houses the majority of the research publications available on


campus, some Faculties have their own resources and collections. Furthermore, the library

stores its collection using various media, including print, micro film and online databases.

Based on the discussions with the students, the data collected was mapped on to the relevant

Activity Theory principles, and Figures 5.5 and 5.6 were derived. Figure 5.5 shows the

hierarchical structure of Activity 2 (Taking Subjects/Doing Research) using Leont’ev’s (1981)

notation. Figure 5.6 indicates the elements of the activity system, based on Engeström’s (1987)

notation.

Activity: Taking subjects/Doing research
Motive(s): To pass; to increase knowledge; to learn

Actions:
1. Attend lectures and tutorials – Goals: To learn about topics and solve problems; to find out about assignments/tests/final exam; to satisfy attendance requirements
2. Complete assignments; do tests and exams; write thesis – Goals: To pass the subject; to satisfy pass requirements; to achieve a specific mark
3. Do required readings – Goals: To learn; to prepare for final exam and tests; to do assignments
4. Research specific topics – Goals: To increase knowledge; to collect data; to write thesis; to do assignments
5. Conduct surveys/experiments – Goals: To collect data for thesis; to solve research problem

Operations:
1. Get lecture/tutorial notes and materials – Conditions: Lecture notes available and accessible
2. Make notes – Conditions: Resources available
3. Read – Conditions: Reading materials and textbooks available
4. Collect information – Conditions: Resources available in the library, in databases, etc.
5. Write/type/edit – Conditions: Access to computer

Figure 5.5 Hierarchical structure of Activity 2: Taking subjects/Doing research


Figure 5.6 Taking subjects/Doing research activity

The historical development of this activity could not be mapped clearly due to the diverse nature

of subjects. All subjects change frequently, depending on the lecturer and changes to degree

requirements. However, the student participants commented that in recent years more subjects

were making use of computer technologies, whereas in the past lecturers would mostly provide

materials to students in hard copy made available through the library. Students would have to

physically go to the library to photocopy lecture and tutorial notes or readings. Currently, a

large number of subjects make these materials available through the commercial online learning

system, or by other electronic means. However, some subjects still employ this manual method,

which has caused confusion amongst students, as described above.

[Figure 5.6 content – Subject: Student; Object: To pass subjects; Tools: lecture and tutorial notes, textbooks, a commercial online learning system, materials from the library, reference books, journals, the Internet, online chat facilities and bulletin boards, e-mail, databases, etc.; Rules: vary depending on subject (participation and attendance requirements, assignments, submission of assignments, pre-requisite knowledge, group work, etc.); Community: other students, lecturers, tutors, research student supervisors, support staff; Division of Labour: academic staff lecture, run tutorials, mark assessments and determine final results; in-class division of labour varies depending on subject (student-centred or lecturer-driven); research students do research and write a thesis under direction from supervisors, and examiners assess the thesis; Outcome: Subject grade]


5.2.2.3 Activity 3: “Doing assignments/Studying for and taking tests and exams/Writing a thesis”

Assignments, tests and exams are generally thought of as assessment tasks. Assessment tasks

are completed by all undergraduate and postgraduate coursework students, and also by those

postgraduate research students who are required to take subjects as part of their degree. Some

postgraduate research students are just enrolled in a single subject for the duration of their

degree. The only assessment task they have to complete is a thesis. The thesis constitutes a

long-term assessment task because, similarly to other assessment tasks, it also requires students

to solve a problem, do research and write it up. Therefore, thesis writing is included together

with the activities of doing assignments and studying for and taking tests and exams.

Although Activity 3 was defined as an action used to achieve the previous activity (Activity 2:

“Taking subjects/Doing research”), the student participants singled out “Doing

assignments/Studying for and taking tests and exams/Writing a thesis” as something they

engaged in on a long-term basis and in a significant way. Based on the ability to analyse

activities at multiple levels, the action of completing assignments; doing tests and exams; and

writing a thesis from Activity 2 was analysed as an activity in its own right (Activity 3) because

it is the focus of the students’ daily practice. At the time of completing a particular subject, this

activity may once again be viewed as a goal-oriented action that students undertook to transform

the object of the “Taking subjects/Doing research” activity into an outcome (i.e. a subject

grade).

Completing assessment tasks constitutes a large part of the time any student spends at the

University. Assessment tasks take many forms and include assignments, tests, exams and

theses. Student participants who were engaged in coursework referred to this activity simply as

“doing assignments” and “studying for tests and exams” or “taking tests and exams”. Research

student participants described it as “writing my thesis”.


Each subject at the University has different types of assignments. Coursework students talked

about writing essays, literature reviews, reports, reviews, experiments, commentaries, diaries,

journals, research papers and critiques, as well as doing presentations. Some assignments are

done individually by each student, while others are group-based assignments. Similarly, there

are a variety of tests and exams that students take, including written, oral, online, and even

group-based tests and exams. Students are required to complete assignments and take tests and

exams in order to satisfy the requirements to pass a subject. However, this differs depending on

the faculty or the school in which the subject is offered. For example, student participants

enrolled in the Master of Business Administration reported that they had to achieve a minimum

mark of 40% on the final exam in order to pass a subject. In contrast, the undergraduate student

enrolled in the Bachelor of ICT reported that for some subjects, he only had to achieve a total

mark of 50% or above. It was observed that students were generally familiar with the pass

requirements for various subjects, although several commented that the information was not

readily available. In some instances, students contacted the lecturer to find out the exact

requirements.

Research students who were enrolled in a single subject and only required to write a thesis,

discussed the rules associated with this process. They are also required to write literature

reviews, summaries and report to their supervisors. In addition, they have a special set of pass

requirements and must complete a progress report at the end of each year of enrolment in order

to qualify for re-enrolment the following year.

To “do assignments”, “take tests and exams” and “write a thesis” students had to engage in

some form of research or study. This could involve simply reading the lecture notes and the

textbook to solve a problem or prepare for tests and exams. However, some assessments (for

example, diaries, journals and theses) were undertaken over longer periods of time and required

students to use a variety of resources. Therefore, the students often use a large number of tools


to perform this activity. Since this activity is closely related to Activity 2: “Taking

subjects/Doing research”, the same individuals are involved. These include lecturers, tutors,

thesis supervisors, other students, as well as support staff.

The roles of these individuals are better defined in this activity because students complete the

assignments, tests and exams, and write theses, while the other community members provide

support to students in completing these assessment tasks. In addition, lecturers and tutors set

and mark the assessments for coursework students, while University appointed examiners assess

research students’ theses. The activity is governed by a variety of formal and informal rules.

Formal rules involve the University’s policies on plagiarism, assessment, examinations, appeals

and special consideration. Research students showed a strong awareness of the rules of their

degree. All of them commented that they had attended the University’s research student

orientation day at which time these rules were discussed. In contrast, few of the coursework

student participants showed an in-depth awareness or knowledge of their rules. They only

referred to the formal rules as prescribed by the lecturer in the Subject Outline, which is a

document handed out to students during their first lecture. None of the student participants had

actually read all of the official University policies and rules about assessment tasks, even though

these documents are widely available in hard copy format and on the web site. Informal rules

are rules set by individual lecturers and tutors for specific subjects. For example, one student

commented that her tutor set rules about forming teams for group assignments, so that there had

to be at least one international student on each team. Another student stated that her lecturer

required the use of specific software to complete assignments.

The student participants also described instances of informal rules taking precedence over

formal rules. For example, when the rules were probed further, students were asked “What do you

do if you miss a test or an exam?” and “What do you do if you’re sick and you can’t hand in an

assignment on time?”. The majority of the student participants responded that they would

simply get in touch with the lecturer or tutor, explain the situation and ask for an extension. In


most cases the problem would be resolved informally between the lecturer and the student,

usually through e-mail or over the telephone. However, the University has a formal set of rules

that apply in this situation and these are described under the Special Consideration policy.

According to this policy, a student is required to formally apply for special consideration using

the online student system (which also handles the enrolments) and provide documented

evidence to support his/her claims. This documentation is verified by staff at the Student

Administration Services and entered into the student management system. The system generates

and sends the student’s request to the lecturer via e-mail and the lecturer then has to log in to the

student management system and either approve or deny the request. Discussions with the

student participants revealed that this process was often circumvented by both students and staff

members, possibly because it was thought to be too complex and time-consuming. The informal

method appeared to be more widely used due to its simplicity and fast resolution.
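To make the difference between the two paths concrete, the formal procedure described above can be thought of as a short sequence of hand-offs between different parties. The sketch below is purely illustrative; the status names, transitions and roles are assumptions made for the example and are not drawn from the University’s student management system.

```python
from enum import Enum, auto

class RequestStatus(Enum):
    SUBMITTED = auto()          # student lodges the request via the online student system
    VERIFIED = auto()           # administrative staff verify the supporting documentation
    SENT_TO_LECTURER = auto()   # the system e-mails the request to the lecturer
    APPROVED = auto()
    DENIED = auto()

# Permitted hand-offs in the formal process; each step depends on a different party acting.
TRANSITIONS = {
    RequestStatus.SUBMITTED: {RequestStatus.VERIFIED},
    RequestStatus.VERIFIED: {RequestStatus.SENT_TO_LECTURER},
    RequestStatus.SENT_TO_LECTURER: {RequestStatus.APPROVED, RequestStatus.DENIED},
}

def advance(current: RequestStatus, target: RequestStatus) -> RequestStatus:
    """Move a request to the next state, but only along a permitted hand-off."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"Cannot move from {current.name} to {target.name}")
    return target

if __name__ == "__main__":
    status = RequestStatus.SUBMITTED
    for step in (RequestStatus.VERIFIED, RequestStatus.SENT_TO_LECTURER, RequestStatus.APPROVED):
        status = advance(status, step)
    print(status.name)  # APPROVED, after three separate hand-offs
```

Each transition in the formal path requires a different party to act, which is consistent with the participants’ view of the process as complex and time-consuming; the informal path collapses the exchange into a single contact between student and lecturer.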

Based on the discussions with the students, the data collected was mapped on to the relevant

Activity Theory principles, and Figures 5.7 and 5.8 were derived. Figure 5.7 shows the

hierarchical structure of Activity 3 (Doing assignments/Studying for and taking tests and

exams/Writing a thesis) using Leont’ev’s (1981) notation. Figure 5.8 indicates the elements of

the activity system, based on Engeström’s (1987) notation.


Activity: Doing assignments/Studying for and taking tests and exams/Writing a thesis
Motive(s): To get a high mark; To pass; To increase knowledge

Actions (with goals):
1. Discuss with lecturer/tutor. Goals: To become familiar with assignment; To find out what is needed to get a high mark
2. Research and make notes. Goals: To prepare assignment
3. Prepare assignment/do presentation. Goals: To get a good mark
4. Get feedback. Goals: To find out where marks were lost; To improve for next time
5. Study for tests and exams. Goals: To increase knowledge
6. Take tests and exams. Goals: To pass; To get a high mark
7. Prepare research materials. Goals: To conduct research
8. Conduct surveys/experiments. Goals: To collect data for thesis; To solve research problem

Operations (with conditions):
1. Meet with lecturer/tutor. Conditions: Lecturer/tutor available
2. Read and make notes. Conditions: Reading materials and resources available
3. Write/type/edit. Conditions: Access to computer and other materials (books, journal articles, etc.)
4. Collect assignments. Conditions: Assignment marked by lecturer/tutor; Comments and marks provided
5. Attend tests and exams. Conditions: Test and exam schedule available

Figure 5.7 Hierarchical structure of Activity 3: Doing assignments/Studying for and taking tests and exams/Writing a thesis


Subject: Student
Object: To pass assessment tasks
Tools: Lecture and tutorial notes, textbooks, materials from the library, electronic readings, reference books, journals, the Internet, online chat facilities and bulletin boards, e-mail, databases
Rules: Formal rules (official University assessment and thesis policies); Informal rules set by individual lecturers/tutors/supervisors
Division of Labour: Students do assessment tasks; Lecturers and tutors set and mark assessment tasks; Supervisors direct research students and examiners assess theses; Community members support students in completing assessment tasks
Community: Other students; lecturers, tutors, research student supervisors, support staff
Outcome: Assignment/test/exam mark; Thesis pass

Figure 5.8 Doing assignments/Studying for and taking tests and exams/Writing a thesis activity

Similarly to Activity 2 (“Taking subjects/Doing research”), the historical development of this activity could not be mapped clearly due to the diverse nature of assessment tasks. Assessments change frequently, depending on the lecturer and changes to the subject. Currently, a number of subjects make use of a commercial online learning system and lecturers set assessment tasks using this tool, such as online tests and student discussions. Similarly, thesis writing differs depending on the discipline and the student’s supervisors.
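For readers who prefer a concrete representation, the elements recorded for each activity system in this phase can be captured in a simple structure. The following sketch is illustrative only and is not part of DUEM; the class and field names are assumptions, with the sample values taken from Figure 5.8.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ActivitySystem:
    """Illustrative record of the elements documented for one activity system."""
    subject: str
    object_: str                                    # trailing underscore avoids Python's built-in name
    tools: List[str] = field(default_factory=list)
    rules: List[str] = field(default_factory=list)
    division_of_labour: List[str] = field(default_factory=list)
    community: List[str] = field(default_factory=list)
    outcome: str = ""

# Sample values drawn from Figure 5.8 (Activity 3).
activity_3 = ActivitySystem(
    subject="Student",
    object_="To pass assessment tasks",
    tools=["Lecture and tutorial notes", "Textbooks", "Library materials", "E-mail", "Databases"],
    rules=["Official University assessment and thesis policies",
           "Informal rules set by individual lecturers/tutors/supervisors"],
    division_of_labour=["Students do assessment tasks",
                        "Lecturers and tutors set and mark assessment tasks",
                        "Supervisors direct research students; examiners assess theses"],
    community=["Other students", "Lecturers", "Tutors", "Research student supervisors", "Support staff"],
    outcome="Assignment/test/exam mark; thesis pass",
)

print(activity_3.subject, "->", activity_3.object_)
```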


5.2.2.4 Activity 4: “Socialising”

Most of the student participants commented that they engaged in various forms of social

interaction at the University. While some of the students attended the movie evening every

week (commonly referred to as “UniMovies”), others were members of University clubs and

associations. Some of the student participants also used the sports facilities on a regular basis,

while others preferred to meet with friends at the food hall, at the bar or in the outdoor areas.

Approximately half of the student participants indicated that they searched for information

about social events on the university’s web site. Five of the students indicated that they

subscribed to a social events newsletter which was sent via e-mail to both staff and students on a

regular basis, informing them about the various social functions and events on campus. Some of

the students also stated that they occasionally picked up a hard copy of the social newsletter

from the food halls. Those student participants who lived at University-managed residence halls

also commented that they would find out about social activities through the noticeboards or on

the residence halls’ web sites.

While the majority of the students engaged in social activities on a regular basis, three of the

student participants did not socialise at the University regularly. One of these three participants

commented that he was “too busy” to do too much socialising, while another remarked that

sometimes it was “too hard” to find information about what’s on. While all of the students had

different motives for socialising, they agreed that it was part of “enjoying the university

experience”. Various other members of the University community engaged in social activities

with the students, including academic and administrative staff members. The rules of the

“Socialising” activity are mostly informal, compared to the other three student activities which

are governed by official University policies, procedures and rules. Although formal rules exist

about student conduct on campus, this activity is predominantly governed by informal rules

related to the social activity itself. For example, the social rules will differ between the weekly

movie evening and a lunch time sports event. Similarly, traditional vertical division of labour


does not apply in this activity because both students and staff engage jointly in social functions

and events. For example, the weekly movie evening is organised jointly by volunteer students

and staff members. Therefore, the division of labour will vary depending on the situation.

Based on the discussions with the students, the data collected was mapped on to the relevant

Activity Theory principles, and Figures 5.9 and 5.10 were derived. Figure 5.9 shows the

hierarchical structure of Activity 4 (Socialising) using Leont’ev’s (1981) notation. Figure 5.10

indicates the elements of the activity system, based on Engeström’s (1987) notation.

Activity: Socialising
Motive(s): To meet new people; To get together with friends; To network; To have fun

Actions (with goals):
1. Find out about social events and functions. Goals: To find out what’s on; To decide which social events to go to
2. Go to social events and functions. Goals: To have a good time; To meet friends

Operations (with conditions):
1. Look for information. Conditions: Information is available and accessible
2. Decide what to attend. Conditions: Information is available and accessible
3. Arrange transport. Conditions: Transport is available and suitable

Figure 5.9 Hierarchical structure of Activity 4: Socialising


Subject: Student
Object: To enjoy University
Tools: Newsletters, e-mail, posters, noticeboards, web site
Rules: Formal University rules on student conduct; Informal rules depending on social event type
Division of Labour: Vertical division of labour does not apply; Varies according to social event type
Community: Other students; Staff members
Outcome: A good time

Figure 5.10 Socialising activity

5.2.2.5 Summary of DUEM Phase 2: Understanding User Activities

The primary aim of Phase 2 in DUEM was to develop an understanding of the activities that students at the University undertake and engage in. This provides a holistic view of what students actually do on a daily basis, which forms the context for evaluating the selected system. The previous sections described four typical activities that students are involved in at the University. These four activities represent the context in which students use the University web site as one of many tools available to them. Having gained an understanding of what students do and how they do it, the next phase of DUEM aims to evaluate whether the University web site supports what students do. Phase 3 aims to answer the following questions:

1. How does the University web site support students in enrolling in subjects/courses?

2. How does the University web site support students in taking subjects/doing research?

3. How does the University web site support students in doing assignments/studying for and taking tests and exams/writing a thesis?

4. How does the University web site support students in socialising?

5.2.3 DUEM Phase 3: Evaluating the System in relation to Users’ Activities

Phase 3 of DUEM consists of three sub-phases: defining the evaluation plan, preparing the

evaluation resources and materials and testing the system. Each of these sub-phases will now be

described in the University web site evaluation.

5.2.3.1 Sub-Phase 3.1: Defining the Evaluation Plan

Following Phase 2, the student participants were once again contacted to arrange a meeting to

jointly define the goals of the evaluation. As before, it was not possible to arrange a time that

suited all of the participants. Therefore, different meeting times had to be arranged, resulting in

only two group meetings of two and three participants respectively. The other participants

attended individual meetings. Six of the participants were unable to attend the meetings for

various reasons. The inability to meet as a single group at the same time was not conducive to

working as a team.

In addition to the above logistical problems, the process of defining the evaluation plan was also

made difficult by the nature of the evaluation process itself. None of the students had

participated in a usability evaluation before, and while they found it relatively simple to discuss

their activities in Phase 2, they were unfamiliar with the task of defining the evaluation plan.

The participants found the task complex and daunting due to the nature of DUEM which

proposes that the evaluation should be carried out in relation to users’ activities. The

participants were provided with explanations and prompted about what was required. The first

question required the participants to discuss the goals of the evaluation. However, in response to

the question of “What do you want to evaluate?”, the students would say that they wanted to

evaluate the web site. This response brought the focus of the evaluation back to the system,


rather than the system in the context of users’ activities. When prompted for more specific

aspects and issues related to the activities they had discussed in Phase 2, participants once again

referred to the web site. Some of the typical responses from participants include the following:

• “… to find out if the web site is easy to use”;

• “… to find out if all the information is on the web site”;

• “… to see if the web site is any good”.

None of the participants referred to the issue of how well the web site supported their activities

or met their needs.

To resolve the issue of defining the evaluation goals, the evaluator had to steer the discussion

towards users’ activities and the usefulness of the University’s web site in those activities. This

was done by making reference to the activities defined in Phase 2 and providing examples such

as “If you wanted to enrol in a new subject, does the web site help you do this?”. While some

students were able to appreciate the change in perspective from “the web site itself” to “the web

site in relation to students’ activities”, the majority of participants did not appear to understand

the distinction. Finally, the evaluator proposed a single evaluation goal, and sought the students’

feedback on this goal in order to make a decision. The goal was defined broadly and

encompassed all of the users’ activities described in Phase 2. It was stated as the following: “To

evaluate how well the University’s web site supports the students in their daily activities”.

While two students sought clarification and asked questions about this goal (e.g. “What types of

activities?”), none of the participants proposed changes to the evaluation goal and it was

adopted.

The second task in this sub-phase involved collaboratively negotiating how the evaluation goal

would be achieved. Participants were asked to state a preference for one of the three evaluation

options: scenario-based evaluation, free-form evaluation or controlled evaluation. Based on

discussions with the student participants, a decision was made to use free-form evaluation. The

majority of the participants preferred this option because they felt that it was relatively simple


and easy. When questioned why they did not express a preference for the scenario-based or

controlled evaluation, the participants commented that it was “too hard” or involved “too much

extra work”. The participants indicated that they did not wish to develop the user activity

scenarios for the scenario-based evaluation.

A free-form evaluation involves observing and discussing with participants how they use and

interact with the system being evaluated (preferably in the participants’ natural environment)

without imposing any specific tasks to complete. Rather than just demonstrating and describing

how they use the system in general, the participants are asked to demonstrate and describe how

they use the system to carry out the activities defined in the previous phase. This approach

provides a structure to the free-form evaluation, albeit one which is defined by the participants

themselves (through their own activities). At the same time, this structure does not constrain

their interaction with the system in the same way that a task scenario used in traditional

usability testing would because each participant views and carries out the activities differently.

Therefore, the means by which they do these activities is not specifically prescribed.

5.2.3.2 Sub-Phase 3.2: Preparing the Evaluation Resources and Materials

The use of a free-form evaluation implies that those resources and materials which users employ

in their activities are required in order to facilitate the evaluation process. These resources and

materials are generally available in the users’ natural environment. Since DUEM views the web

site as one of the many tools that users employ in performing activities which are also social in

nature, it is preferable to carry out the actual evaluation in the users’ natural environment where

additional tools, resources and materials are easily accessible and any social interactions occur

naturally. However, in this instance, due to the diversity of the tools used in the activities (as

seen in Figures 5.4, 5.6, 5.8 and 5.10) and the inability to access all of the tools at a single time

(for example, some of the tools are only available in the library), it was not possible to

undertake the evaluation in any specific location. It was also not possible to define a natural


environment where students would use the web site on a regular basis. Several potential

locations on campus were considered including the library, the computer laboratories and group

study areas. Although students used the web site in all of these locations, none of them enabled

students to access multiple tools and engage in social interactions in the way they do in their

actual natural environment – a student’s room at home. Since it was not possible to completely

replicate a student’s room at the University, a decision was made to use the evaluator’s office

because it resembled the students’ actual natural environment more closely than any of the other

locations. The office contains the usual furniture and fittings that may be found in a typical

student’s room, including a desk, shelves, a computer, books, and stationery. To replicate a

student’s room more accurately, materials which a student may have at home were placed in the

office (for example, student handbooks and course guides), while students were asked to bring

their textbooks or other materials that they typically used at home, if they wished.

This approach enabled the evaluator to observe how the web site was used in conjunction with

other tools. However, it did not resolve the issue of social interactions in the natural

environment (for example, telephone calls or visits from friends). It was not possible to fully

replicate and observe these interactions naturally during the evaluation. It was only possible to

discuss with students how these social interactions would impact their use of the web site.

5.2.3.3 Sub-Phase 3.3: Testing the System

Having set the evaluation goal, defined the means for achieving this goal and prepared the

required materials, the evaluation sessions were scheduled. Due to the decreasing levels of

interest shown by the participants, a decision was made at this point to combine sub-phase 3.3

and phase 4 of DUEM. This meant that the participants would test the system and comment on

the problems at the same time. Instead of meeting once more after the evaluation sessions to

analyse the results, participants were asked to comment on the various problems that they


encountered using the web site, during the actual evaluation session. In addition, participants

were asked to make suggestions about overcoming those problems and improving the web site.

Eighteen of the student participants evaluated the web site over a period of two weeks. Three

participants were unable to attend. On average, each evaluation session lasted approximately

one hour. Detailed notes were made during the evaluation sessions about the students’

interaction with the web site, their comments and any suggestions they made. It was not

possible to record the evaluation sessions using screen capture software because of privacy

concerns. After the evaluation sessions were finalised, the data was collated and analysed by the

evaluator using Activity Theory principles. This is described in the following section.

5.2.4 DUEM Phase 4: Analysing and Interpreting Results

Phase 4 of DUEM involves analysing and interpreting the rich data collected during the

preceding phases. DUEM proposes that users should be involved in this process. First, the data

collected is examined to identify any breakdowns in the users’ interaction with the system. Each

breakdown is then traced to a contradiction in the activity. This enables evaluators to identify

distributed usability problems.
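The pairing of breakdowns with contradictions can be illustrated with a minimal sketch. The class names below are hypothetical and not part of DUEM; the brief comments restate standard Activity Theory characterisations of the four contradiction levels, and the sample data echoes one of the enrolment breakdowns analysed in Section 5.2.4.1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class Contradiction(Enum):
    PRIMARY = 1      # within a single element of the activity system (e.g. within the tool itself)
    SECONDARY = 2    # between two elements (e.g. between the tool and the object, or the rules)
    TERTIARY = 3     # between the current activity and a culturally more advanced form of it
    QUATERNARY = 4   # between the central activity and a neighbouring activity

@dataclass
class Breakdown:
    """An observed interruption in the user's interaction with the system."""
    description: str
    activity: str

@dataclass
class DistributedUsabilityProblem:
    breakdown: Breakdown
    contradictions: List[Contradiction]   # one breakdown may trace to several contradictions

# Sample data: the missing clash warning in the enrolment activity.
problem = DistributedUsabilityProblem(
    breakdown=Breakdown("No warning when a chosen tutorial overlaps with a lecture",
                        "Enrolling in subjects/courses"),
    contradictions=[Contradiction.PRIMARY, Contradiction.SECONDARY],
)
print(problem.breakdown.description, [c.name for c in problem.contradictions])
```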

During the DUEM validation process, it became obvious that the student participants would not

be able to actively take part in the analysis and interpretation of the data collected due to their

decreasing levels of interest, their lack of experience with usability evaluation in general, and

their lack of understanding of Activity Theory principles. However, it was still important to

involve the student participants in some form in analysing the results to ensure that the data

collected was interpreted correctly and from the students’ point of view, rather than from the

evaluator’s perspective. To achieve this, the some of the data analysis and interpretation was

carried out in parallel with the actual data collection. While performing the free-form evaluation

in Phase 3, student participants discussed their interaction with the web site and raised problems


that they experienced, at the same time. Whenever a breakdown in the interaction occurred, the

student was asked to comment on the breakdown and describe possible reasons for its

occurrence. This assisted the evaluator to identify contradictions in the students’ activities.

Armed with an in-depth understanding of the students’ activities from Phase 2 of DUEM, it was

possible to situate each breakdown in the context of a specific activity and identify the source of

this breakdown (i.e. the contradiction).

During the evaluation sessions a number of breakdowns were identified and mapped on to

contradictions in the activity system. Each breakdown and its associated set of contradictions

represented a distributed usability problem. As with the traditional usability testing (described in

Section 4.2.2), the purpose of Phase 4 in the validation of DUEM was not to identify usability

problems with the university’s web site per se, but to demonstrate how these problems are

identified and analysed using DUEM. Therefore, only a subset of the usability problems found

will be presented and analysed in this chapter. One specific breakdown in each of the four user

activities defined in Phase 2 will be discussed and interpreted here, while a more complete set of

distributed usability problems identified can be found in Appendix L.

5.2.4.1 Breakdowns in Activity 1: “Enrolling in subjects/courses”

Section 5.2.2.1 described the student activity of “Enrolling in subjects/courses” and mapped the

elements of the activity on to Leont’ev’s (1978) activity hierarchy and Engeström’s (1987)

activity system. Students use an online enrolment system which is available on the university’s

web site to enrol in their chosen subjects and tutorials each semester. The student participants

described and demonstrated a variety of breakdowns in this activity caused by contradictions

that were directly related to the online enrolment system and the web site, including the

following:

• Students are unable to check the timing of lectures and tutorials when enrolling;

• Students have no way of determining which subjects they are allowed to enrol in.


These breakdowns will now be described and mapped to contradictions in the enrolment

activity.

One student participant described her enrolment activity as “a big ordeal”. Having enrolled in

the prescribed subjects, she found that most of the tutorial timings she had chosen overlapped

with lecture timings, which meant that she was unable to attend the lecture for one subject if she

attended the tutorial for another subject. She demonstrated this problem on the online enrolment

system and indicated that the actual timetable information had to be found in a separate location

on the web site and manually checked before using the online enrolment system to enrol in

subjects. Furthermore, the enrolment system did not provide a message to warn the student of

the overlapping timings. This breakdown in the student’s enrolment activity can be described as

severe because one of the operations of the activity shown in Figure 5.3 (i.e. using the online

student enrolment system) was conceptualised to the level of an activity. (Conceptualisation

occurs when operations become actions or activities, if the conditions change.) The student had

to find and refer to a subject timetable and then determine which tutorials she could enrol in

prior to using the online enrolment system to select those tutorials. This caused a breakdown in

the interaction with the online student enrolment system because it did not support the student’s

activity. The student had to shift her focus from the enrolment activity to finding the subject

timetable in a different section of the web site, and manually developing a schedule that did not

result in overlapping lectures and tutorials.

Furthermore, the severity of the breakdown was exacerbated by the student’s lack of knowledge and familiarity with both the activity and the enrolment system because she was an exchange student from another university in a different country. Clearly, the online student enrolment

system did not mediate the student’s activity of enrolling in a way that made it possible to carry

out that activity effortlessly and efficiently, and without the need to focus consciously on the

operational aspects of the activity.


Another student reported enrolling in a second year subject in his first semester because he had

not consulted an academic adviser or the course handbook to determine which subjects to enrol

in. Although the second year subject had no pre-requisites and therefore the student could enrol

in it, he was unaware of the fact that it was more important to complete all the first year subjects

in the first year because these subjects were pre-requisites for other subjects later on. The online

enrolment system did not offer any useful information to the student or check that the student

had consulted an academic advisor prior to selecting his subjects. Other students also reported

not being able to view which subjects were available to them, describing the system as “very

difficult and confusing to use” and only being able to learn how to use it through “trial and

error”.

The University’s course rules change on a frequent basis as new subjects are added or others

deleted. A student may have several different options available when deciding which course

rules to follow. This implies that, depending on the chosen course rules, a student will be

required to complete different subjects. However, the online enrolment system does not provide

the student with options, information or advice on which subjects he/she can enrol in, causing a

breakdown in the interaction with the system. Prior to using the system, a student must consult

several different course handbooks and speak to an academic advisor in order to determine what

subjects to enrol in. Only then is a student able to use the system to enrol in the correct

subjects. Those students who are unaware of the manual component of this activity (i.e. the use

of course handbooks and speaking to academic advisors) experience difficulties in using the

online enrolment system because it does not provide any information about the enrolment

process. Once again, the breakdown is exacerbated for new, first year students who are unaware

of the enrolment procedure and have no history of using the system or being involved in the

enrolment activity itself.

The breakdowns in the students’ interaction with the online enrolment system were mapped to

primary, secondary, tertiary and quaternary contradictions using the convention described in


Section 4.3.5. Discussions with students revealed a number of contradictions in the enrolment

activity. There were primary contradictions within the tool itself (i.e. the online enrolment

system) because it did not provide students with a warning message when they enrolled in

overlapping tutorials, and it did not provide students with information about how to select the

subjects they could enrol in. There were also secondary contradictions between the tool and the

student’s object (to enrol), between the tool and the rules (as represented by the subject

timetable and various course handbooks), and between the tool and the division of labour

(students are required to seek advice from academic advisors prior to enrolling). Furthermore,

there were tertiary contradictions caused by the introduction of a more advanced object into the

enrolment activity system in the form of a new method for enrolling using an online system.

This new method collided with the previous practice of students having to complete a manual

enrolment form together with an academic advisor prior to enrolling. The academic advisor

would sign the form and students would then take it to the Student Administration Services to

enrol. Remnants of the previous activity (seeking advice from an academic advisor) were

present in the current activity, however they were not fully integrated into the new activity,

causing tertiary contradictions in the activity system. These contradictions are shown in Figure

5.11.


Subject: Student
Object: To enrol in subjects/courses
Tool: Online enrolment system
Rules: Subject timetable; course handbooks
Division of Labour: Academic advisors required to provide advice
Community: Other students; academic advisors; administrative staff
(The historical, manual enrolment activity is also represented. The figure marks the primary contradictions within the tool, the secondary contradictions between the tool and the object, the rules and the division of labour, and the tertiary contradiction arising from the historical activity.)

Figure 5.11 Contradictions in Activity 1: Enrolling in subjects/courses

One staff member was interviewed about this process to gain an insight into an indirect user’s perspective. He agreed that, although the use of the online enrolment system was easier because it placed more responsibility with the student, it was still inefficient because the enrolment system did not facilitate the enrolment activity by providing students with relevant information. Furthermore, he indicated that staff members who were academic advisors did not have any tools available to them to assist in the process of giving advice. Just like students, academic advisors are required to consult the relevant course handbooks.

Based on the breakdowns and contradictions identified during the evaluation, student participants were asked for potential solutions to the problems. Various suggestions were made, including:

• providing a facility which enables students to check subject timetables from within


the online enrolment system (i.e. without having to access a different web page on the University web site);

• providing a function which warns a student that he/she is about to enrol in a tutorial which overlaps with another tutorial class;

• integrating course handbooks into the online enrolment system so that it provides a student with a list of subjects or a proposed schedule based on the student’s course (e.g. if a student is enrolled in a Bachelor of Accounting course, the system lists the compulsory subjects he/she has to enrol in and offers a recommended schedule for completing the subjects); and

• providing an online chat facility or bulletin board which students could use to contact academic advisors if necessary.

One student pointed out that the library offers such a facility, whereby students can chat to a University librarian online from 8am - 12pm and 2pm - 3pm on weekdays.
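The clash-warning suggestion amounts to a simple overlap check performed before an enrolment is confirmed. The sketch below is illustrative only; the data structures and subject codes are hypothetical and do not reflect the University’s actual enrolment system.

```python
from datetime import time
from typing import List, NamedTuple

class ClassSlot(NamedTuple):
    subject: str
    kind: str        # "lecture" or "tutorial"
    day: str
    start: time
    end: time

def clashes(a: ClassSlot, b: ClassSlot) -> bool:
    """Two slots clash if they fall on the same day and their times overlap."""
    return a.day == b.day and a.start < b.end and b.start < a.end

def clash_warnings(enrolled: List[ClassSlot], candidate: ClassSlot) -> List[str]:
    """Return a warning for every already-chosen slot the candidate overlaps with."""
    return [f"{candidate.subject} {candidate.kind} clashes with {s.subject} {s.kind} on {s.day}"
            for s in enrolled if clashes(candidate, s)]

# Hypothetical example: a tutorial that overlaps with an already-chosen lecture.
enrolled = [ClassSlot("ACCY101", "lecture", "Monday", time(9, 30), time(11, 30))]
candidate = ClassSlot("MGMT110", "tutorial", "Monday", time(10, 30), time(11, 30))
print(clash_warnings(enrolled, candidate))
```

Running the example prints the kind of warning the participants felt the online enrolment system should have displayed at the point of selecting a clashing class.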

5.2.4.2 Breakdowns in Activity 2: “Taking subjects/Doing research”

Section 5.2.2.2 described the student activity of “Taking subjects/Doing research” and mapped

the elements of the activity on to the activity hierarchy and activity system. Students make use of the University’s web site to access lecture and tutorial materials, which they then use to study. Research students mainly use the library pages and online catalogue, which can be accessed through the University’s web site, to do research. The student participants described and

demonstrated a variety of breakdowns in this activity caused by contradictions that were directly

related to the use of the University web site for this purpose, including the inability to access

relevant materials easily and directly.

Most students at the University take more than one subject every semester. Students with a full-time load usually take four or five subjects. These subjects may be offered by a single faculty,

however it is also possible that they are offered and taught by several faculties. Each faculty has

its own set of procedures to provide lecture and tutorial materials to students. Most faculties

make use of a commercial online learning system to provide information, materials and various

resources to students enrolled in a particular subject. However, some academic staff members


prefer not to use this system, opting instead to develop their own subject web site. Each

academic at the University is given space on the web server which he/she can use to create and

upload a web site. If an academic chooses to do this for the subjects that he/she teaches, students

must remember the Uniform Resource Locator (URL) in order to access the web site because

there is no mechanism for linking an academic staff member’s personal web site to the main

University web site. Other academics opt not to use the online medium to distribute and provide

lecture and tutorial materials at all. Instead, they place copies of the relevant materials in the

University library or in various other libraries local to a particular faculty. It is possible that a

single subject can make use of a multitude of methods to provide students with the relevant

subject materials.

The student participants indicated that these diverse arrangements posed difficulties in finding

relevant materials as well as accessing them. The initial problem encountered by a student was

to find out where the subject materials for a particular subject could be accessed. The only

means of finding out this information was to attend the first lecture. Although students are able

to find out staff contact details and textbook and assessment information on the University’s

web site for a particular subject, they were unable to find out any information about the subject

materials. All of the student participants doing coursework commented on this issue, indicating

that it is of great significance to them. The participants suggested that during the first week of

the semester they spent a considerable amount of time searching for and accessing subject

materials because the University web site failed to provide this information along with all of the

other subject information. This implied a breakdown in the activity of taking subjects because

instead of focusing on the activity (taking a subject or studying), the students focused on the

operation of accessing subject materials (as shown in Figure 5.5). This operation was

conceptualised to the level of an activity.

Discussions with the student participants revealed several contradictions that led to the

breakdowns. The primary contradiction within the tool (i.e. the University web site) occurred


because the web site failed to provide all of the information relevant to a subject. Only some of

the information was available to students. Secondary contradictions also occurred between the

tool and the object because students were unable to direct their activity towards the object of

passing subjects and studying. Instead, finding the subject materials became the object of their

activity. There were further secondary contradictions between the tool and the division of labour

because academic staff members were unable to link their subject web sites to the University’s

web site. Only certain staff members had the rights to do this. This implied that students had to

remember a complex URL to access the information they required. The contradictions in the

activity system are shown in Figure 5.12.

Subject: Student
Object: To pass subjects
Tool: University web site
Rules: Vary depending on subject
Division of Labour: Academic staff members do not have rights to link subject web sites to the University web site
Community: Other students; lecturers, tutors, support staff
(The figure marks the primary contradiction within the tool and the secondary contradictions between the tool and the object, and between the tool and the division of labour.)

Figure 5.12 Contradictions in Activity 2: Taking subjects/Doing research

Student participants made several suggestions to resolve the problems identified, including providing information through the University’s web site about the location of subject materials and how they can be accessed, along with all the other information about a subject, and standardising access to subject materials (for example, making it compulsory for all subjects to use the commercial online learning system to provide materials).
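The first of these suggestions is essentially a request for a per-subject record of where materials are held. The following sketch is illustrative only; the subject codes, URLs and field names are hypothetical and not drawn from the University’s systems.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MaterialsLocation:
    """Illustrative per-subject record of where its teaching materials are held."""
    subject_code: str
    delivery: str                 # e.g. "online learning system", "staff web site", "library"
    url: Optional[str] = None     # present when the materials are available online
    notes: str = ""

# Hypothetical examples of the three arrangements described above.
locations = [
    MaterialsLocation("CSCI123", "online learning system", url="https://elearning.example.edu/CSCI123"),
    MaterialsLocation("MGMT110", "staff web site", url="https://www.example.edu/~staffname/mgmt110"),
    MaterialsLocation("ACCY101", "library", notes="Printed notes held at the main library counter"),
]

def where_are_my_materials(code: str) -> str:
    """Look up the access point for a subject's materials, as the participants requested."""
    for loc in locations:
        if loc.subject_code == code:
            return loc.url or loc.notes or loc.delivery
    return "No materials location recorded for this subject"

print(where_are_my_materials("MGMT110"))
```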

5.2.4.3 Breakdowns in Activity 3: “Doing assignments/Studying for and taking tests and exams/Writing a Thesis”

Section 5.2.2.3 described the student activity of “Doing assignments/Studying for and taking

tests and exams/Writing a Thesis” and mapped the elements of the activity on to the activity

hierarchy and activity system. Students make use of the University’s web site in various ways to

perform this activity. The activity is related to assessment tasks which are governed by specific

University policies and rules. The student participants described and demonstrated a variety of

breakdowns in this activity caused by contradictions that were directly related to the use of the

University web site, including breakdowns when applying for a make-up test or exam, or

handing in an assignment late.

Several of the student participants indicated that they had missed an assessment task for reasons

beyond their control, such as illness, a death in the family, etc. In these situations, students are

required to submit a formal Special Consideration request using a function available through the

online enrolment system described previously. However, none of the students who had missed

assessment tasks were aware of this formal procedure, or indeed what the term “Special

Consideration” meant. Students would instead seek “an extension” or a “make-up test” directly

from the subject co-ordinator or lecturer either by e-mail or by contacting him/her personally.

The students commented that most academic staff members allowed this informal procedure,

rather than following the more formal Special Consideration request process. One staff member

who was interviewed briefly about this process commented that he followed the formal process

where a subject had a large number of students enrolled and it was not possible to maintain an

accurate record of informal requests. He indicated a preference for the informal method because


it was quicker and easier than having to log in to the student management system and approve

or reject a formal request.

The use of two distinct methods for applying for special consideration caused confusion

amongst students who were unsure which rules to follow in different subjects. All of the student

participants who had applied for special consideration in the past indicated that they had

initially approached the academic staff member who would then either approve or reject the

request, or direct them to use the online enrolment system to apply. If the academic staff

member had a preference for the latter, a student would first have to locate the Special

Consideration request function on the University’s web site. The function is available through

the online enrolment system, however, it is not immediately apparent. To apply, a student has to

complete an online form and indicate on the form whether he/she would be providing

supporting documentation with his/her request. The student would then have to take the form to

the Student Administration Service and hand it in along with the supporting documentation

which would be verified and processed by the administrative staff. The system does not provide

any facilities to upload a scanned copy of the supporting documentation along with the

application. The requirement to manually hand in the supporting documentation can pose

problems for students who apply for Special Consideration at the end of the week and are not

able to hand in the documentation until the following week or at some other opportune time.

During the period immediately before final exams, the Student Administration Service is often

inundated with Special Consideration requests to process. This delays the response because an

academic staff member is not notified about the student’s request until the administrative staff

process it. According to one administrative staff member, it can take up to a week to verify and

process a Special Consideration request. In the meantime the student remains unaware of the

outcome of the request and unsure how to proceed because the system does not provide any

feedback about the status of the application.


There are several contradictions in this activity system. The primary contradictions are

manifested in the location of the Special Consideration function which is only available through

the online enrolment system on the University’s web site and not immediately recognisable.

Students are often not aware that such a function exists or what purpose it serves. The use of the

term “Special Consideration”, which is unfamiliar to students, only serves to obscure the

function further. (One student participant referred to it as the “misadventure form”.) Finally, the

function does not provide students with the ability to upload a scanned copy of the supporting

documentation or obtain a status report on the progress of their application.

Secondary contradictions also exist: between the tool and the object of the students’ activity,

between the tool and the rules because it does not support the informal rules which students and

academic staff members prefer, as well as between the tool and the division of labour because

the roles of the different community members (students, academic staff and administrative staff

members) are not organised to efficiently support the students’ activity.

Furthermore, there were tertiary contradictions caused by the introduction of a more advanced

object into the activity system in the form of a new method for applying for Special

Consideration. This new method collided with the previous practice of simply attaching

appropriate supporting documentation to an assignment or contacting the academic staff

member by e-mail to seek an extension or make-up test. Some remnants of this previous activity

are still present in the current activity because students have to provide supporting

documentation in hard copy, however they were not fully integrated into the new activity,

causing tertiary contradictions in the activity system. These contradictions are shown in Figure

5.13.


Subject: Student
Object: To get a good mark
Tool: Special Consideration function (inside the online enrolment system)
Rules: Informal rules vs. formal rules
Division of Labour: Students apply for special consideration; administrative staff verify the application; academic staff approve/reject the application
Community: Other students; academic advisors; administrative staff
(The historical activity is also represented. The figure marks the primary contradictions within the tool, the secondary contradictions between the tool and the object, the rules and the division of labour, and the tertiary contradiction arising from the historical activity.)

Figure 5.13 Contradictions in Activity 3: Doing assignments/Studying for and taking tests and exams/Writing a thesis

The student participants made suggestions to resolve the problems caused by the contradictions identified in Figure 5.13, including providing an upload facility so that supporting documentation can be submitted electronically rather than handed in manually, providing a function which enables them to track the progress of an application, or sending the application with the supporting documentation to the academic staff member instead of the administrative staff. Some students also expressed a preference for the informal rules and suggested that the formal Special Consideration request was too inefficient and should be eliminated. This preference was supported by comments made by those student participants who had never applied for special consideration. When asked what they would do if this situation arose, they indicated that they would contact the academic staff member directly. None of them were aware of the formal


Special Consideration request function. This would appear to suggest a low level of

internalisation of this aspect of their activity by students.
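Two of the suggestions made above, uploading supporting documentation and tracking the status of a request, can be illustrated with a minimal sketch. The record structure, field names and sample values below are hypothetical and are not based on the University’s student management system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class SpecialConsiderationRequest:
    """Illustrative request record incorporating the two improvements suggested above."""
    student_id: str
    subject_code: str
    reason: str
    lodged_on: date
    documents: List[str] = field(default_factory=list)   # scanned evidence uploaded by the student
    status: str = "Submitted"                             # visible to the student at any time

    def upload_document(self, filename: str) -> None:
        """Attach supporting documentation instead of handing it in at Student Administration."""
        self.documents.append(filename)

    def status_report(self) -> str:
        """The progress-tracking facility the participants asked for."""
        return f"Request for {self.subject_code} lodged {self.lodged_on:%d %b %Y}: {self.status}"

request = SpecialConsiderationRequest("1234567", "CSCI123", "Illness", date(2004, 6, 4))
request.upload_document("medical_certificate.pdf")
print(request.status_report())
```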

5.2.4.4 Breakdowns in Activity 4: “Socialising”

Section 5.2.2.4 described the student activity of “Socialising” and mapped the elements of the

activity on to the activity hierarchy and activity system. Most of the University’s students

engage in some form of socialising by attending social events and functions. Students can make

use of the University’s web site to find out about upcoming social events and functions. They

can also receive notifications via e-mail or by reading noticeboards and picking up flyers and

advertisements placed around the campus. The student participants indicated one key

breakdown in this activity – the lack of a single point of access to information about social

activities on campus.

The University has a web site which provides information about various activities and social

events on campus. The web site has a section titled “What’s On” which provides a list of all the

upcoming social events for the week. However, students do not have a means of accessing this

web site directly. They have to be familiar with the URL in order to do so. The “Current

Students” page on the University’s web site contains a link called “What’s On (Events

calendar)”. However, when a student selects this link, he/she is shown the Events Calendar page

(refer to Figure 5.14). Intended to provide information about various events and activities on

campus at a glance, the calendar is a poorly used resource. Very few students and staff members

use it to add events.


Figure 5.14 Events Calendar

One student participant commented that he never searched for information about social events

on the University web site because it was “too hard to find”. Instead he browsed the

noticeboards and picked up flyers from the food hall. Other students found out about social

events from friends. One of the most frequently attended social events on campus is the

UniMovies night every Wednesday. UniMovies is one of the many clubs and societies that exist

on campus. As a result, it cannot be given more advertising and exposure than other clubs and

societies on campus, despite its popularity. To avoid a breach of the rules governing clubs and

societies, the information about the UniMovies is not available on a high-level page of the

University web site. All of the student participants who were aware of the UniMovies

commented that it was too difficult to find on the web site.

The University web site does not provide adequate support to facilitate the student activity of

socialising. Students have to rely on alternative sources of information because of several

contradictions that exist in the activity system, including primary contradictions within the tool

itself because the information on the web site is poorly structured, as well as secondary contradictions between the tool and the student’s object, and between the tool and the rules governing

clubs and societies. These contradictions are shown in Figure 5.15.


Subject: Student
Object: To enjoy University
Tools: University web site; Events Calendar
Rules: Formal rules governing clubs and societies on campus
Division of Labour: Varies according to social event type
Community: Other students; Staff members
(The figure marks the primary contradiction within the tool and the secondary contradictions between the tool and the object, and between the tool and the rules.)

Figure 5.15 Contradictions in Activity 4: Socialising

To overcome these problems, student participants suggested making the information more easily accessible through the University’s home page and the online enrolment system which students use on a frequent basis. Those student participants who were aware of the University’s web site which provides information about various activities and social events on campus also commented that it was not easily accessible, even though it contained useful information. Suggestions were made to provide a direct link to this web site from various other pages which were used by students on a daily basis. This includes the web-based e-mail system which most students accessed daily, as well as web pages developed and hosted by the University’s residence halls. Each residence hall has its own web site and several student participants indicated that their residence hall’s web site was their default home page.
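The single point of access that students asked for amounts to pooling event information from the separate sources described above into one feed that could be linked from pages students already visit daily. The sketch below is illustrative only; the source names and sample events (other than the weekly UniMovies night) are hypothetical.

```python
from typing import Dict, List

def aggregate_events(sources: Dict[str, List[str]]) -> List[str]:
    """Merge per-source event lists into a single alphabetised feed, tagged by source."""
    return sorted(f"{event} ({source})"
                  for source, events in sources.items()
                  for event in events)

# Hypothetical event listings pooled from the separate channels students currently check.
sources = {
    "Clubs and societies": ["UniMovies night (Wednesday)"],
    "Residence halls": ["Trivia night (Friday)"],
    "Sports facilities": ["Lunchtime futsal (Tuesday)"],
}

for entry in aggregate_events(sources):
    print(entry)
```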


5.2.4.5 Summary of DUEM Phase 4: Analysing and Interpreting Results

In summary, carrying out the actual evaluation and the analysis and interpretation of the results

in parallel proved to be beneficial because student participants were able to refer directly to the

system, demonstrate the problems they experienced and then discuss the causes of these

problems from their point of view. In addition, the overall amount of time spent on the

evaluation was reduced.

Section 5.2 described Steps 1 and 2 of Stage II of the research methodology as shown in Figure

3.8. These steps are now complete.

STAGE II: Method Validation
Objective: To validate (evaluate) the Distributed Usability Evaluation Method
Methods/Techniques/Tools Used:
• Claims (Mwanza, 2002)
Steps:
1. Apply DUEM in practice
2. Document application of DUEM
3. Address DUEM claims (from Chapter 3) using evidence from application to confirm or refute each claim

Figure 3.8 Research Methodology (replicated)

Having applied DUEM in practice in order to validate it, the following section (which

represents Step 3 of Stage II of the research methodology shown above) will discuss this

process in relation to the eight claims made about DUEM in Chapter 3. Data collected from the

validation process will be used as evidence to confirm or reject each claim.


5.3 DUEM Claims

A set of eight claims about DUEM was derived in Chapter 3 based on the eight challenges to

UEMs identified in a review of the literature. DUEM was developed in response to those

challenges. To demonstrate whether DUEM has succeeded in overcoming the challenges, this

section will address each of the eight claims and evidence will be sought from the results of

applying DUEM in practice to either confirm or reject each claim.

5.3.1 [DUEM-1]: DUEM is user focused and user activity driven

To examine this claim, the following questions must be addressed:

• Is DUEM user driven, instead of system driven?
• Is DUEM focused on the usefulness of the system, and not on the system itself?
• Does DUEM provide a framework for analysing purposeful user activities that the system supports?

DUEM involves users from the outset of the evaluation. Prior to evaluating the system, DUEM

requires that evaluators develop an in-depth understanding of users’ activities using Activity

Theory principles as a framework to analyse these activities. This is in contrast to other UEMs

which begin with an examination of the system itself, and an assumption about the types of

tasks that a user may perform with the system. Understanding users’ activities before attempting

to evaluate a system which is intended to support those activities is critical because “some of a

tool’s affordances emerge during use, unanticipated by designers” (Nardi & O’Day, 1999).

Therefore, without an understanding of the users’ activities (i.e. the use-situation), it is not

possible to evaluate a system effectively. Activity Theory provides a framework for analysing

the use-situation so that the usability and the usefulness of a system can be assessed. Rather

than simply evaluating a system at the interface, Activity Theory places the system in the

context of an activity as a mediating tool. This implies that the system as a tool facilitates the


activity and should be evaluated on how well it facilitates the activity, rather than just how well

it performs.

In the validation of DUEM described previously, students (users) were involved from the outset

by describing their daily activities in Phase 2. In later phases of the evaluation, the University

web site was evaluated in the context of those activities. This approach enables evaluators to

gain an insight into the users’ activities and the language and terminology that users employ.

For example, students described their activities as “doing assignments” or “taking subjects”

rather than using the more formal terms employed by the University of “assessment tasks” and

“enrolment”. The system could then be evaluated in relation to the users’ activities and their

language and terminology. Working with the users from the outset of the evaluation and

defining the users’ activities as the starting point indicates that DUEM is a user focused and

user activity driven evaluation method.

However, it must be noted that defining the users’ activities in Phase 2 of DUEM may be a

complex and challenging task for evaluators unless they are very familiar with Activity Theory

principles. This is because the participants can list a large number of activities that they take

part in on a daily basis. The evaluators must decide which of these activities are genuine, long-term activities, and which ones are actions. Activity Theory does not provide any support for

making this distinction, other than to state that activities are long-term formations, while actions

are short-term processes (Kuutti, 1996).

5.3.2 [DUEM-2]: DUEM provides a means of understanding users’ goals and motives

To examine this claim, the following questions must be addressed:

• Does DUEM reflect and incorporate the motives and goals of users into the evaluation?
• Does DUEM assess usability in relation to the users’ motives and goals to determine the usefulness of the system?


DUEM incorporates users’ motives and goals into the evaluation process by examining these in

relation to the users’ activities, and not in relation to the use of the system. Leont’ev’s (1981)

hierarchical structure of an activity provides a means for achieving this. Questions relating to

users’ motives and goals are integrated into Phase 2 of DUEM during which users describe their

activities. The motives and goals are also taken into account in later phases when assessing the

usefulness of the system. The object of an activity is a reflection of an activity’s motives.

Therefore, if there are secondary contradictions between the system (the tool) and the object

(the ultimate motive) of the activity, it implies that the system does not facilitate the

achievement of the users’ goals, and subsequently may not be useful to the user.

In the validation of DUEM described previously, student participants discussed their motives

and goals in the context of their activities during Phase 2 (as shown in Figures 5.4 to 5.10). In

Phase 4 of DUEM, secondary contradictions were shown to exist between the University web

site and the objects of the students’ activities (as shown in Figures 5.11 to 5.15), which implied

that the students did not find the web site useful in performing certain activities. The web site

did not facilitate the achievement of their goals and satisfy their motives in an efficient and

transparent way.

Despite the ability of DUEM to take into account users’ motives and goals in relation to their

activities, problems with users’ motives remain during the evaluation process. The validation of

DUEM described above still demonstrates that it is necessary to use extrinsic motivators (e.g.

gift vouchers) to motivate users to participate in an evaluation. Even with the use of these

extrinsic motivators, it is still not possible to ensure that users will remain motivated for the

duration of the evaluation. As indicated previously, some student participants failed to take part

in different phases of the evaluation. Although it was proposed during the method building of

DUEM that the active involvement of users would serve as a strong motivating factor, it appears

that this alone is insufficient. As Nardi and O’Day (1999) point out, it is necessary to involve

users in “an activity which they find meaningful and intrinsically interesting” (p. 163). The

evaluation of the University’s web site was not a meaningful and interesting activity to the

student participants and it can be argued that their involvement was primarily driven by the

extrinsic motivator. Therefore, the question of “How to motivate users to be actively involved in

the evaluation?” remains unanswered.

5.3.3 [DUEM-3]: DUEM involves users directly in the design of the evaluation and in the analysis of the evaluation results

To examine this claim, the following questions must be addressed:

• Does DUEM place the user (not the system) at the centre of the evaluation process?
• Does DUEM involve users in the design of the evaluation process in a well-managed way?
• Does DUEM provide the means for users and evaluators to collaborate effectively?

Users are involved actively in all the phases of a DUEM evaluation, from the planning and

design of the evaluation, to the analysis of the results. Their role is not limited to being a subject

of the evaluation process. In DUEM users are active participants and collaborators in the

evaluation. The use of Activity Theory principles provides a framework to involve users in a

structured way by asking specific questions about the users’ activities and analysing the results

of the evaluation by identifying breakdowns and contradictions in the activities caused by the

tool.

The validation of DUEM described previously indicates that the direct involvement of students

in the design of the evaluation was beneficial. However, there were also a number of problems

associated with involving students. In addition to the issue of motivating students to participate,

there were problems with establishing team dynamics. DUEM advocates that evaluators and

users work together collaboratively as a team and negotiate the design of the evaluation.

However, during the validation process, this proved to be a difficult task. Due to the diversity of

the student group involved, it was not possible to find a convenient time to arrange team

meetings. Furthermore, some participants were unable to participate in various phases, resulting

in discontinuity in their involvement and understanding of the evaluation process. Participants

also required assistance during Phase 3 because they were unfamiliar with the process of

evaluation, and had difficulties in setting the goals of the evaluation. This implies that some

form of training (for example, a workshop) may be necessary to familiarise users with the

evaluation process prior to Phase 3.

Furthermore, participants showed decreasing levels of interest as the evaluation progressed. The

extrinsic motivator (i.e. the gift voucher) and the attempts to involve users in the evaluation

actively (thus giving them a sense of ownership over the evaluation) were inadequate to

motivate the students to participate. Involving users in all phases of DUEM requires a longer

time commitment from the users when compared to traditional usability testing. Unless users

demonstrate a strong commitment to the evaluation for personal reasons, it is necessary to

ensure that they are appropriately compensated for their involvement. This compensation does

not have to be material. Instead, users may be recognised in a public forum for their

participation.

Some of the problems that arose with the involvement of users in the University web site

evaluation can be attributed to the size of the team involved in the validation process. In an

attempt to capture the activities of a range of diverse students across the University, twenty-one

students were recruited for the evaluation. This proved to be a difficult team size to co-ordinate

and organise. To improve this aspect of DUEM, it is necessary to carry out further validation

and determine the optimal size of an evaluation team. Furthermore, it is important to incorporate

team-building activities into the evaluation in order to facilitate the development of the team and

collaboration amongst the team members.

Although DUEM advocates the involvement of users in the analysis of the evaluation results,

the University web site evaluation indicated that this may not be feasible as a step that is

separate from the evaluation sessions in Phase 3. Instead of requiring an additional time

commitment from all of the users to participate in Phase 4, it may be more appropriate to invite

only a small sub-set of the participants to take part in the analysis of the results. This ensures

that the users’ perspective is still represented during the analysis. The views of those

participants who are not involved in Phase 4 can be noted during the evaluation sessions (i.e.

sub-phase 3.3 of phase 3), which was done in the University web site evaluation.

Although DUEM involves users in both the design of the evaluation and the analysis of the

evaluation results, evidence from the University web site evaluation indicates that the way in

which this involvement is managed requires further refinement. This may involve suitable user

training prior to the evaluation as well as providing adequate compensation to the users for their

involvement.

5.3.4 [DUEM-4]: DUEM provides an understanding of users’ knowledge

To examine this claim, the following questions must be addressed:

• Does DUEM take into account the users’ knowledge about the activity that the system being evaluated supports?

• Does DUEM provide a framework for capturing this knowledge?
• Does DUEM assess the usefulness of the system in relation to the users’ knowledge of the activity that the system supports?

Unlike other UEMs, DUEM does not conceptualise the users’ knowledge in terms of the system

alone. For example, in traditional usability testing users are placed on a novice to expert

continuum based on their experience and knowledge of the system being evaluated. While

DUEM takes the users’ knowledge into account, it also considers the users’ familiarity with the

activity which the system supports. For example, if the system being evaluated is a web-based

search engine, DUEM will take into account the users’ knowledge of the search engine and of

the searching activity. The twelve questions used in Phase 2 of DUEM to gain an understanding

of users’ activities are also a framework for capturing the users’ knowledge about a particular

activity. Question 12 (“How was the activity carried out previously?”) in particular, is not only

aimed at understanding the historical development of an activity, but also the extent to which

users have internalised that activity and any previous versions of the system being tested.

In the University web site evaluation, some of the student participants were able to provide

insights into how the enrolment activity was carried out prior to the introduction of the online

enrolment system. Those students who had been at the University for an extended period of

time were familiar with the need for contacting an academic advisor prior to enrolment and had

less difficulty using the online enrolment system. However, those students who had no

knowledge of how the activity was carried out previously (for example, the exchange students

from other countries or students who were in their first year) related that they had more

difficulty enrolling because they were not fully aware of the enrolment procedure. The online

enrolment system did not provide these students with support in the form of information and

enrolment assistance. Therefore, the system was not deemed to be useful to those students who

had a limited knowledge of the enrolment activity.

Taking into consideration the users’ knowledge about the system, and the activity that the

system supports or facilitates is an important contribution of DUEM. DUEM provides

evaluators with a framework of twelve questions to develop an understanding of how well the

users have internalised an activity and the way the system is used in that activity.
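As a hypothetical illustration of this contrast, the following Python sketch records both the users’ knowledge of the system and their knowledge of the activity, rather than the single novice-to-expert rating used in traditional usability testing. The field names, rating scale and threshold are assumptions made solely for illustration and are not prescribed by DUEM.

from dataclasses import dataclass

# Illustrative sketch only: DUEM considers the users' familiarity with the
# activity the system supports, not just their familiarity with the system.

@dataclass
class UserProfile:
    system_knowledge: int      # assumed scale: 0 (novice) to 5 (expert) with the system
    activity_knowledge: int    # assumed scale: 0 (novice) to 5 (expert) with the activity
    prior_practice: str        # answer to Question 12: "How was the activity carried out previously?"

def needs_activity_support(user: UserProfile, threshold: int = 2) -> bool:
    # Users with little knowledge of the activity (e.g. first-year or exchange
    # students enrolling for the first time) need the system to provide
    # procedural support, regardless of how usable the interface itself is.
    return user.activity_knowledge < threshold

exchange_student = UserProfile(system_knowledge=3, activity_knowledge=0,
                               prior_practice="has never enrolled at this University before")
print(needs_activity_support(exchange_student))    # True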

5.3.5 [DUEM-5]: DUEM provides a framework for including contextual factors in an evaluation

To examine this claim, the following questions must be addressed:

• Does DUEM identify all the different system stakeholders and their activities?
• Does DUEM reflect the social nature of system use?

Although DUEM does not prescribe the number of users involved in the evaluation, it does

advocate the involvement of both direct and indirect users in the evaluation. Direct and indirect

users represent the community or the stakeholders involved in a particular activity. The

involvement of both types of users implies that the system is evaluated from different

perspectives, and “not just a few people who play the most visible and important roles” (Nardi

& O’Day, 1999, p. 215). In the University web site evaluation, both direct and indirect users of

the system were identified in Phase 1. However, due to problems with recruiting users to take

part in the evaluation, it was not possible to involve indirect users who were staff members of

the University for the entire duration of the evaluation. Their involvement was confined to brief

interviews in Phase 4 during the analysis and interpretation of the results.

DUEM requires a significant time commitment from users and as a result it can be difficult to

recruit direct and indirect users. This may be even more problematic in a business environment

in particular, where time is precious. One way of overcoming this problem may be to allocate

staff members a specific portion of their workload to participate in the evaluation. Therefore,

although all the different stakeholders can be identified in a DUEM evaluation, practical issues

emerge in ensuring their involvement and participation over a longer period of time.

The University web site evaluation also raised the issue of how many users to involve in the

evaluation. As mentioned previously, DUEM does not prescribe the number of users, leaving

this decision to the evaluators, depending on their resources and circumstances. However, the

University web site evaluation clearly showed that if there are a large number of users involved,

it is necessary to develop guidelines to manage and organise the involvement of these users

effectively, in particular where arranging and scheduling focus groups and interviews is

concerned.

To reflect and incorporate the social nature of system use, DUEM proposes that the evaluation

should take place in the users’ natural environment whenever possible, so that the effects of the

social context can be taken into account. The University web site evaluation showed that this

depends on the type of users involved. For example, student users do not have permanent

physical space on campus that represents the environment in which they use the web site

most frequently. This environment is usually the students’ home where they have access to the

web site and any other tools that they use in conjunction with the web site. However, since it

was not possible to carry out the evaluation at the students’ home, it was necessary to simulate

their natural environment in the evaluator’s office. Although the student participants were

provided with the majority of materials they would find at their desk at home, it was observed

that they did not feel comfortable in the simulated environment. This highlights the issue that

evaluating in the users’ natural environment is only possible for certain types of users. For

example, staff members in an organisation have their own office or desk space where they

would use the system being evaluated. In this instance, it is possible to evaluate in their natural

environment due to ease of access. In contrast, it is more difficult to evaluate in the customers’

natural environment, because they may use the system (for example, the organisation’s web

site) from a variety of locations, including their home, an Internet café, a friend’s house, etc.

Although DUEM takes into account contextual factors in an evaluation by involving all of the

system’s stakeholders and advocating evaluation in the stakeholders’ natural environment, there

are practical issues that remain unaddressed. These issues include providing suitable

compensation to the stakeholders in return for their involvement and finding a suitable location

for the evaluation when it is not possible to evaluate in the users’ natural environment.

5.3.6 [DUEM-6]: DUEM provides a means of understanding how the system and system use co-evolve over time

To examine this claim, the following questions must be addressed:

• Does DUEM explain how systems and users’ activities co-evolve over time?
• Does DUEM identify remnants of previous system use in current use-situations?
• Does DUEM evaluate the usefulness of a system in relation to ongoing activities over a prolonged period of time?

In Phase 2, DUEM contains a set of twelve questions which can be adapted to suit the needs of

a particular system or evaluation. These questions all relate to users’ activities, however, one

question in particular seeks to understand the evolutionary development of activities. This is

intended to provide evaluators with an insight into the historical development of the use-

situation and the role of the system over time in this use-situation. This information is used to

determine the users’ knowledge of the activity and also later in Phase 4 to analyse the impact of

any remnants of previous versions of the activity on the current form.

In the University web site evaluation, student participants were asked to describe how they

carried out each of the four activities in the past. Only those participants who had knowledge

about this were able to answer this question. The information was used to create a historical

context for the activity and analyse the impact of the previous activity (and any previous

versions of the system which mediated this activity) on the current one. An example of this

impact is shown in Figures 5.11 and 5.13, where tertiary contradictions arose between the

current activity and the historical activity as a result of introducing a new tool into the

enrolment activity and the assessment activity.

DUEM appears to provide a useful structure for understanding how the system and the activity

it mediates co-evolve over time because it incorporates Activity Theory principles directly in

Phase 2 of the evaluation. One of these principles is the developmental nature of activities,

which implies that it is necessary to examine the historical evolution of an activity and its tools

in order to understand users’ activities.

5.3.7 [DUEM-7]: DUEM offers a common vocabulary for describing evaluation processes and defining evaluation outcomes

To examine this claim, the following questions must be addressed:

• Does DUEM use consistent terminology to describe the evaluation process and outcomes?
• Does DUEM provide an unambiguous definition of a usability problem?
• Does DUEM allow evaluators to clearly identify and rate usability problems?

DUEM makes use of Activity Theory terminology to describe various concepts and issues in the

evaluation, including concepts such as an activity and its object, tools, rules, division of labour,

motives, actions, goals, operations and conditions. This terminology is used to describe the

users’ activities. In the University web site evaluation, these concepts were not raised

specifically. Instead, the twelve questions in Phase 2 were adapted and re-phrased to match the

evaluation context (i.e. a University). This was necessary because none of the participants had

prior training in Activity Theory. At times, the participants also required further clarification. For example, when asked the question: “What rules apply to the enrolment activity?”, it

was necessary to qualify this question with further prompting, such as: “For example, can you

just enrol in any subject you like or are there certain rules you have to follow?”.

Despite having to re-phrase and clarify the questions in Phase 2, the answers from the

participants were useful and could easily be mapped to the Activity Theory principles by the

evaluator. It was important to do this because the mapping of the data collected from Phase 2

was critical in Phase 4 when analysing and interpreting the results of the evaluation and

identifying distributed usability problems.

DUEM defines a usability problem as a breakdown in the users’ interaction with the system

which can be attributed to one or more contradictions in the activity. This definition enables

evaluators to identify distributed usability problems across an activity and subsequently

determine the overall usefulness of the system and not just the usability at the system interface.

Depending on the level of conceptualisation of the breakdown, the severity of the problem can

also be determined. For example, if an operation is conceptualised to the level of an activity so

that the object of the users’ activity becomes resolving the cause of the breakdown, this

indicates a severe problem.
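The following Python sketch illustrates how the severity of a distributed usability problem might be derived from the level to which a breakdown is conceptualised. The three-level scale and the severity labels are assumptions made for illustration only; DUEM does not prescribe a particular rating scheme.

from enum import IntEnum

# Illustrative sketch only: rating severity by the level of conceptualisation
# of a breakdown, as discussed above.

class ConceptualisationLevel(IntEnum):
    OPERATION = 1   # the breakdown disturbs a routine operation only
    ACTION = 2      # the breakdown must be handled as a conscious, goal-directed action
    ACTIVITY = 3    # resolving the breakdown becomes the object of the activity itself

def severity(level: ConceptualisationLevel) -> str:
    # The labels below are assumed for illustration.
    return {ConceptualisationLevel.OPERATION: "minor",
            ConceptualisationLevel.ACTION: "moderate",
            ConceptualisationLevel.ACTIVITY: "severe"}[level]

# A breakdown that forces the user to abandon enrolment and spend the session
# working out why subjects clash is conceptualised to the level of an activity,
# and would therefore be rated severe.
print(severity(ConceptualisationLevel.ACTIVITY))    # "severe"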

In the University web site evaluation, it was possible to identify breakdowns and map them to

contradictions in the activity system jointly with users during Phase 3. However, the terms

“breakdown” and “contradiction” were not used specifically in order to avoid any confusion that

they may cause. Instead, breakdowns were discussed more generally as problems that the users

had with the system. Users would first demonstrate the problems (breakdowns) they experience

or have experienced in the past using the University web site, and the potential causes

(contradictions) of these problems would then be discussed at length. The evaluator was

responsible for mapping the problems and their causes to breakdowns and contradictions in the

activity.

Clearly, DUEM offers a common vocabulary for describing evaluation processes and defining

outcomes (i.e. usability problems) that is beneficial for evaluators because users’ activities are

described using Activity Theory principles, and the same principles are applied in identifying

distributed usability problems. However, the University web site evaluation showed that the

users may not be able to internalise these principles to use them effectively. Therefore, it is

counterproductive to train users in Activity Theory principles. Instead, the evaluators should

have an understanding of the principles and how to apply them in an evaluation as a guide to the

types of questions that need to be asked and addressed.

5.3.8 [DUEM-8]: DUEM is a theory informed method

To examine this claim, the following questions must be addressed:

• Is DUEM based on a theoretical framework that enables evaluators to explain and analyse how systems are used in real activities?

• Does DUEM allow evaluators to design evaluation processes that reflect actual system use?

DUEM is based on distributed usability and on the framework of Cultural Historical Activity

Theory. The use of these constructs in DUEM implies that it is a theory informed method which

enables evaluators to collect and analyse data in a structured, integrated and meaningful way.

The questions derived for Phase 2 of DUEM are based on Activity Theory principles, and the

data generated from those questions is later used in analysing the users’ interaction with the

system in order to evaluate the system in relation to users’ activities. Therefore, a single

theoretical framework underpins DUEM from the outset of the evaluation to the determination

of usability problems. The benefits of having this theoretical scaffolding manifest themselves in

the integrated approach of DUEM, whereby the output of each phase can be directly transferred

into the next phase. Indeed, as mentioned previously, the phases of DUEM are so closely

intertwined through Activity Theory principles that it is not possible to explicitly indicate where

one phase ends and another one begins.

Activity Theory was selected as the underlying framework because of its “essential humanness”

(Nardi, 1996a). This can be seen in Phase 2 of DUEM where users’ activities are analysed in-

depth and form the context for evaluating the system. These activities are described by the users

themselves and represent actual system use. Furthermore, DUEM attempts to include all of the

different stakeholders in the evaluation to ensure that a broad range of user perspectives is

represented.

Although DUEM, unlike most other UEMs, is a theory informed method, Activity Theory is a complex framework and its principles may take some time to internalise. In order to use Activity Theory effectively as part of a DUEM evaluation, evaluators require prior training,

education and exposure to this framework.

Section 5.3 described Step 3 of Stage II of the research methodology as shown in Figure 3.8.

This step is now complete.

STAGE II: Method Validation
Objective: To validate (evaluate) the Distributed Usability Evaluation Method
Methods/Techniques/Tools Used:
• Claims (Mwanza, 2002)
Steps:
1. Apply DUEM in practice
2. Document application of DUEM
3. Address DUEM claims (from Chapter 3) using evidence from application to confirm or refute each claim

Figure 3.8 Research Methodology (replicated)

The following sections will discuss the actual benefits and limitations of DUEM based on the

data collected from the DUEM validation process.

5.4 Actual Benefits of DUEM

Based on the validation of DUEM (i.e. the University web site evaluation) and the claims about

DUEM discussed in the previous section, the following statements can be made about the actual

benefits of DUEM:

1. DUEM is a user focused evaluation method with high levels of user involvement. DUEM

begins with and focuses on the users and their activities, and evaluates the system in relation

to those activities. Data about users’ activities is collected from the users themselves before

the actual evaluation of the system proceeds. The system itself is understood in the context

of the users’ activities as a tool that mediates, facilitates and supports those activities. At no

time is the system considered in isolation from this context. Furthermore, the user focus is

extended through the direct involvement of users in the design of the evaluation, the

evaluation itself, as well as the analysis of the evaluation results. This level of involvement

provides users with the opportunity to give feedback and comment on various aspects of the

evaluation.

2. DUEM is a flexible usability evaluation method. DUEM is not intended to be prescriptive. It

provides the evaluators and users with the ability to adjust the evaluation to suit their needs

and constraints. However, at the same time, it provides them with a structured framework to

guide the evaluation process. This framework is manifested through the Activity Theory

principles that have been incorporated into DUEM. Regardless of the logistical design of

the evaluation (i.e. how, when and where the system will be tested), evaluators and users

work jointly to identify the users’ activities and evaluate the system in relation to those

activities.

3. DUEM evaluates the usability and usefulness of the system. DUEM is based on distributed

usability. Distributed usability extends the traditional view of a system’s usability to include

the system’s usefulness in relation to users’ activities.

4. DUEM defines a usability problem. Usability problems in DUEM are considered to be

distributed across the activity that users engage in and manifested as a contradiction or

disturbance in this activity caused by the system.

5. DUEM is based on an underlying theoretical framework. The integration of Activity

Theory principles into DUEM provides this UEM with strong theoretical scaffolding, which

enables evaluators and users to evaluate the system in relation to the users’ real-life

activities. Activity Theory views the system as a mediating tool in these activities and

assesses its usability and usefulness from this perspective. Activity Theory also provides a

framework for understanding and analysing users’ activities and identifying usability

problems based on this framework.

The benefits of DUEM described above indicate that the method represents an improvement over traditional usability testing. However, it is also important to consider the limitations of DUEM in its current form so that the method itself can be improved.

5.5 Actual Limitations of DUEM

Based on the validation of DUEM (i.e. the University web site evaluation) and the claims about

DUEM discussed in the previous section, the following statements can be made about the actual

limitations of DUEM:

1. DUEM requires a significant commitment from users. While user involvement in design

and evaluation is highly desirable, it is also important to ensure that this involvement does

not impose on the users. In its current form DUEM requires that users be involved at all

phases of the evaluation. The results of the University web site evaluation indicate that users

must be proportionately compensated in return for this level of involvement. Failure to

compensate users results in disinterest and unwillingness to participate. (The issue of

compensation itself is problematic because it represents and extrinsic motivator which also

affects the users’ interest in the evaluation.) Also, the users involvement must be managed

in a more structured way in Phase 3 of DUEM, and sub-phase 3.1 in particular. For

example, the University web site evaluation showed that users need guidance in defining the

evaluation plan. Furthermore, the management of the user involvement needs to be

consistent with the number of participants involved in the evaluation, and it would be useful

to incorporate team-building strategies into DUEM for more effective management.

2. Activity Theory can be complex and confusing. Although Activity Theory provides an

underlying theoretical framework for DUEM, its principles and constructs are complex and,

at times, confusing. Users should not be required to learn Activity Theory, however,

evaluators must have knowledge of its key principles if DUEM is to be applied effectively.

This is necessary because Activity Theory principles are used in all phases of the

evaluation.

3. It is not always possible to evaluate in the users’ natural environment. DUEM favours an

evaluation in the users’ natural environment for two reasons: to study the effects of the

users’ social context on their use of the system, and to enable users to access other tools that

they use in conjunction with the system so that the role of these tools can be examined.

However, as the University web site evaluation demonstrated, this is not always possible.

The limitations of DUEM in its current form described above represent issues that need to be

resolved in order to improve the method. Although DUEM has overcome most of the limitations

of traditional usability testing, and a number of the UEM challenges listed in Chapter 2, in the

process it has introduced its own set of limitations and problems. Future research will be

focused on resolving and overcoming these limitations.

5.6 Conclusion

The purpose of this chapter was to validate the Distributed Usability Evaluation Method

(DUEM) by applying it in practice to evaluate a system. This application process was described

in detail and the data collected was used as evidence to support or reject the claims made about

DUEM in Chapter 3. The outcome of the validation indicates that DUEM offers some important

benefits to evaluators. However, at the same time it suffers from several limitations which need

to be overcome through further refinement of the method.

The validation of DUEM was an “ideational” evaluation (Iivari, 2003), which aims to

demonstrate that the artifact includes novel ideas and addresses significant theoretical or

practical problems with existing artifacts. The objective of an “ideational” evaluation is to

determine “how well” the artifact works, rather than “how or why” it works. The results of the

validation presented in this chapter suggest that DUEM works sufficiently well to be deemed an

improvement over traditional usability testing. It overcomes most of the UEM challenges

identified in Chapter 2 successfully because it is based on distributed usability and Activity

Theory principles. Furthermore, it involves users at every stage of the evaluation process and

permits evaluators a high level of flexibility in how they choose to evaluate a system. As a

theory informed method which involves users in a significant way, DUEM’s approach is novel

compared to the existing user-based UEMs identified and described in Chapter 2.

DUEM was validated in the complex multi-stakeholder environment of a university and used to

evaluate the usability and usefulness of a complex system – the University’s web site. This

environment highlighted several limitations of DUEM which need to be addressed through

further refinement and development of the method. These include the need to train users and to compensate them adequately for their commitment in order to facilitate more effective user involvement,

the need for training evaluators in Activity Theory if they are to apply it effectively, and the

need for strategies to address the problem of not always being able to evaluate in the users’

natural environment.

According to Mwanza (2001), “validating a theory informed method can prove to be a very

complex task due to the fact that the contribution of such methods is usually viewed in the

context of its role within the wider systems development process” (p. 196). Only the individuals

involved in a systems development process are able to determine if a particular method is

suitable for the context and purpose of use and the crux of the validation is ultimately

determined by the “extent to which the method relates to concepts of the underlying theoretical

framework” (ibid, p. 208). A single validation of DUEM can only begin to demonstrate its

validity and usefulness. Nonetheless, this chapter has been able to demonstrate that DUEM is an

operational UEM which is based on a theoretical framework and makes use of the principles

and concepts of that framework to evaluate a system from the users’ point of view. Continued

testing of DUEM in a range of different real-life systems development and evaluation projects

will no doubt confirm its usefulness in different contexts and situations.

Chapter 6

Conclusion

“It is tremendously valuable to wonder about why things are the way they are.”

(Nardi & O’Day, 1999, p. 69)

6.1 Introduction

This chapter provides a summary of the research presented in this thesis. The research set out to

develop and validate a usability evaluation method which is based on distributed usability (as

described by Spinuzzi, 1999) and the principles of Cultural Historical Activity Theory (as

described in the works of Vygotsky (1978), Leont’ev (1978; 1981), Bødker (1991a) and

Engeström (1987, 1999)). The method was named the Distributed Usability Evaluation Method

(DUEM). This chapter summarises the key issues addressed by the research presented in the

thesis in relation to the goals of the research and the contribution of the research. The chapter

also provides an overview of the research limitations and a personal reflection on the research

process. The chapter concludes with suggestions for future research directions in light of the

research limitations and the personal reflection.

6.2 Summary of Key Issues

Humans use diverse computer-based systems and technology on a daily basis in their personal

and work activities, with the expectation that these systems and technology will facilitate the

activities they are used for. The activities represent the basic context in which systems exist and

operate. It is not possible to divorce the systems from the activities because, on their own, the

systems are meaningless. It is only in the context of use that they become meaningful tools. In

order to evaluate these tools, therefore, the context in which they are used (or the use-situation)

must be taken into account.

However, current evaluation methods are focused primarily on evaluating the usability of

systems in isolation from the context in which these systems are used. This is because they are

based on traditional usability, which is localised at the system interface. Traditional usability

has been criticised by a number of authors and researchers as being inadequate for this reason.

They have argued the need for incorporating the use-situation or context into the design,

development and evaluation of systems in order to build systems that are both usable and useful.

This thesis set out to develop an evaluation method which incorporates the use-situation into the

evaluation process, resulting in an assessment of a system that takes into account the context in

which it is used.

The starting point for developing this evaluation method was the troubled notion of usability,

which forms the basis of current evaluation methods. It was necessary to find an alternative

view of usability, one that encompasses the notion of usefulness, to use as the basis for

developing the method. Spinuzzi’s (1999) notion of distributed usability offered a suitable basis

for this purpose and was adopted. Distributed usability views usability as being spread out or

distributed across the entire activity that a human may engage in. This implies that the elements

of the activity itself can have an impact on the usability of a system. Cultural Historical Activity

Theory was applied to define the elements of the activity because it offered a suitable

framework for analysing the elements and examining the relationships between them. The result

of applying distributed usability and Cultural Historical Activity Theory principles to develop

an evaluation method which takes into account the context in which a system is used was the

Distributed Usability Evaluation Method (DUEM).

DUEM consists of four highly integrated and potentially overlapping phases which involve

users directly in every stage of the evaluation process. The phases are integrated through

Activity Theory principles, which are used at the outset to define the context in which humans

use systems. This context is manifested in the activities that humans engage in. Once evaluators

have defined the activities jointly with the system users and mapped these to Activity Theory

principles, the system can be tested against this backdrop, rather than in isolation from it. The

data collected from the testing is used to identify distributed usability problems which highlight

breakdowns in the users’ activities that are caused by “deeper discoordinations” (Spinuzzi,

1999) or contradictions between the system and the other elements of the users’ activity.

The validation of DUEM indicated that evaluating a system in relation to users’ activities, and

involving users in all aspects of the evaluation was beneficial. Rather than dealing with issues at

the system interface, the DUEM evaluation identified a number of distributed usability

problems that could be traced to contradictions in the users’ activities caused by the system.

These problems did not only indicate the usability of the system, but its usefulness as well. For

example, the exchange student whose enrolment was a “big ordeal” thought that the online

student enrolment system was fairly easy to use (i.e. usable). However, the system was not

useful because it did not support her enrolment activity by informing her that all the subjects she

had enrolled in overlapped.

Although the validation of DUEM indicated the benefits of the method, it also showed that

DUEM had its own set of limitations. The level of user involvement in DUEM is significant

which implies that users require some form of incentive to participate effectively. Also, their

involvement needs to be managed more carefully where large groups of users are involved. For

example, it would be useful to integrate team building techniques and strategies into DUEM.

Being part of a team may motivate the users to work together and sustain their interest for

longer. Finally, the principles of Activity Theory that DUEM relies upon can be complex and

confusing. Although it is possible to undertake a DUEM evaluation without requiring users to

have an understanding of these principles, the evaluators must be trained in Activity Theory.

Training is a pre-requisite for applying any UEM. While DUEM requires training in Activity

Theory, other UEMs require training in cognitive psychology.

In summary, this thesis achieved the three research goals it set out to accomplish:

1. To build a UEM based on distributed usability and informed by Activity Theory, that will

overcome the UEM challenges identified in Chapter 2.

2. To apply the UEM in practice.

3. To validate the UEM by assessing whether it overcomes the UEM challenges identified in

Chapter 2.

In doing so, it has made a contribution to the existing knowledge and research about usability

evaluation methods in general by developing a flexible and practical user-centred method that

assesses the usability and usefulness of systems. The Distributed Usability Evaluation Method

also represents an operationalisation of distributed usability and Activity Theory, and as such,

contributes to HCI and Activity Theory research. Furthermore, DUEM offers evaluators a set

of practical, analytical tools in the form of twelve questions to examine and model users’

activities, and a means for identifying and analysing distributed usability problems. Although it

is intended to be used primarily as a formative evaluation method, DUEM can also be applied in

summative evaluations. Finally, DUEM represents an important contribution to both researchers

and HCI practitioners alike because it proposes a whole new way of thinking about systems and

their effectiveness in users’ activities.

6.3 Limitations of Research Study

Apart from the limitations of DUEM itself, described in the previous chapter (Section 5.5), the

research study presented in this thesis has some inherent limitations, including the following:

• DUEM was validated in only one environment and using only one system – a university

web site. Universities are multi-stakeholder environments and their web sites are often

complex systems used by the stakeholders to perform a large number of different activities.

Although DUEM was applied to a complex, multi-stakeholder system, it is unclear how

DUEM would perform in alternative environments and systems (e.g. environments and

systems of lower complexity, with users being fewer in number and performing fewer

activities). Since the validation of DUEM was an “ideational” evaluation, the aim was to

demonstrate that the method includes novel ideas and addresses limitations of existing

methods. Therefore, the validation of DUEM, by applying it to a university web site, has

achieved this aim. However, further validation is required to provide additional evidence of

DUEM’s validity in diverse environments.

• The validation of DUEM was limited to the activities performed by current students at the

University. Other stakeholders in the University were unable to participate fully in the

evaluation which implies that it was not possible to incorporate their perspective to a large

extent during the validation process.

• The researcher’s familiarity with the University, its procedures and systems is an advantage

that external evaluators of the University’s web site would not have possessed, and

therefore, may have influenced the evaluation process.

• DUEM represents one of only a handful of methods in HCI that have operationalised

Activity Theory. As such, it is still too early to make any conclusive claims about the

application of Activity Theory in HCI research and practice.

6.4 Personal Reflection

Almost twenty years ago, Winograd and Flores (1986) argued for the need to develop an

understanding of what computer systems do in the context of human practice and human

activities. They maintained that systems and technology could not be treated in isolation from

this context: “It is clear […] that one cannot understand a technology without having a

functional understanding of how it is used. Furthermore, that understanding must incorporate a

holistic view of the network of technologies and activities into which it fits” (p. 6). Even though

almost twenty years have passed, we still haven’t completely adopted a holistic understanding

of technology use as an underlying philosophy in HCI methods. This thesis is intended to be a

single step towards this understanding. It is only a small step, but nevertheless a significant one

because with each step, we are closer to our goal of putting users ahead of technology. At times

it may appear to be an uphill struggle to keep pace with the rate of technological developments,

as new devices and gadgets emerge on a daily basis. However, it is important that the HCI

community perseveres in its quest towards a holistic understanding of how these devices and

gadgets are used by humans in their daily lives, and, as Nardi and O’Day (1999) point out at the

start of this chapter – why.

The inspiration to write this thesis stems from my personal experiences with HCI evaluation

methods (and usability testing in particular) over a number of years, and an inherent

dissatisfaction with the results that these methods produce. It stems from a desire to place the

user and his/her needs at the very core of the evaluation process and to develop systems that fit

into the fabric of everyday life (Beyer & Holtzblatt, 1998). Having evaluated numerous systems

and web sites, it became apparent to me that the outcomes of the evaluations rarely served to

truly improve the daily work activities of the people who used these systems and web sites.

Changing the label of a link or re-positioning a button on the screen perhaps did save these

people some time, but it never addressed their real needs, which often ran beyond issues at the

interface.

The Distributed Usability Evaluation Method has been brewing in my mind for years. It was a

hint rather than a thought. A hint that had no name, no structure and no means of realising itself

until I came across Activity Theory. From the cross-fertilisation of my own thoughts and this

remarkable theoretical framework, a full-blown method evolved. This method has now been

formalised in this thesis. It has been presented at a number of conferences at different stages of

its evolution, and it inherently contains the ideas of many like-minded colleagues who have

offered their comments and shared the same passion for Activity Theory and for listening to

users. Building a method is a challenging task and a long-term commitment to future research.

This thesis does not, by any means, represent the end of the evolution. In fact, this thesis

presents a first version of DUEM, which now needs to be refined and developed further so that

it can evolve to become a mature, comprehensive and eminently useful evaluation method to the

HCI community.

6.5 Future Research Directions

There are a number of future research directions that can be taken in order to evolve DUEM

further. They are based on the research outcomes described previously and include the

following:

1. Further improvements of DUEM: The validation of DUEM highlighted several limitations

with the method that need to be addressed in order to improve DUEM. These improvements

may include managing users’ involvement more effectively and developing incentives for

users that are proportionate to the level of involvement required.

2. Further validation of DUEM: Since DUEM was only validated once in an “ideational

evaluation” by the author to demonstrate its usefulness, it is necessary to conduct further

validation studies involving evaluators other than the author, including evaluators who have

had no previous exposure to Activity Theory and require some form of training in this

framework.

Also, DUEM was validated by evaluating a university web site. To fully assess the

usefulness of DUEM it is necessary to validate it using a variety of different computer-

based systems (ranging from simple static web pages to complex organisation-wide

interactive systems) in diverse environments and contexts. The outcome of this process may

be a series of variations on DUEM.

3. Automation of DUEM: Some phases and sub-phases of DUEM may be automated. For

example, the data collection in Phase 2 can be automated so that evaluators can input all of

the data collected and software can automatically map it to the relevant Activity Theory principles. Similarly, the software can be used in Phase 4 to identify contradictions in the users’ activities based on the data collected during Phases 2 and 3 (a minimal, hypothetical sketch of such a mapping is shown after this list).

4. Development of DUEM training kits: Since DUEM relies on the use of Activity Theory

principles to analyse the users’ activities and identify distributed usability problems, it is

necessary to develop resources for evaluators who are unfamiliar with this theory. This

would involve operationalising Activity Theory principles to a practical level and

demonstrating how these principles can be applied effectively in HCI evaluation.

5. Development of team building strategies: Since DUEM requires that evaluators and users

collaborate over an extended period of time, it would be useful to develop and integrate

team building strategies into DUEM, particularly in Phase 1 after the users are selected and

recruited.
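As a purely speculative illustration of the automation proposed in item 3 above, the following Python sketch shows one possible way of recording Phase 2 answers against Activity Theory elements and pairing Phase 3 breakdowns with the elements they appear to contradict. The data structures, element names and example content are assumptions; no such software currently exists as part of DUEM.

# Hypothetical sketch of automating the Phase 2 mapping and a Phase 4
# contradiction check. All names and rules below are illustrative assumptions.

# Phase 2: users' answers recorded against Activity Theory elements.
phase2_answers = {
    "object": "be enrolled in a coherent set of subjects",
    "tools": ["online enrolment system", "subject database"],
    "rules": ["prerequisites must be satisfied", "timetable clashes must be avoided"],
    "division_of_labour": ["student selects subjects", "academic advisor approves the selection"],
}

# Phase 3: breakdowns observed during the evaluation sessions, each tagged
# with the Activity Theory element it appears to conflict with.
phase3_breakdowns = [
    {"description": "system accepted two overlapping subjects", "conflicts_with": "rules"},
    {"description": "no way to contact an academic advisor from the site", "conflicts_with": "division_of_labour"},
]

def identify_contradictions(answers, breakdowns):
    # Pair each observed breakdown with the activity element it contradicts.
    contradictions = []
    for b in breakdowns:
        element = b["conflicts_with"]
        contradictions.append({
            "breakdown": b["description"],
            "element": element,
            "element_detail": answers.get(element),
        })
    return contradictions

for c in identify_contradictions(phase2_answers, phase3_breakdowns):
    print(f"{c['breakdown']} -> contradiction with {c['element']}: {c['element_detail']}")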

6.6 Conclusion

This thesis set out to develop a usability evaluation method that would evaluate the usability and

the usefulness of computer-based systems, using distributed usability and Activity Theory

principles. The end result is DUEM – a novel usability evaluation method that involves users

from the outset of the evaluation and assesses the system in relation to their activities. DUEM is

a method that aims to expand our horizons to think about useful systems (Nardi, 1996b). It

incorporates the four recommendations made by Nardi (1996c) to embrace Activity Theory in

HCI: adopt a long-term research time frame in order to fully understand users’ objects; pay

attention to broad patterns of activity rather than narrow episodes; use a varied set of data

collection techniques, and commit to understanding things from users’ points of view. These are

important aspects of HCI methods because “new technologies create new possibilities for

knowing and doing” (Nardi & O’Day, 1999, p. 125), and “we need to carefully consider [these]

technologies to make sure they will fit well with our practices and values” (ibid, p. 184). DUEM

aims to provide us with the means to do so.

References

ACM SIGCHI (1992) Curricula for Human-Computer Interaction. ACM Press, New York.

Alter, S. (1999) Information Systems: A Management Perspective. Prentice Hall.

Bailey, R. W. (1993) Performance vs. Preference. “Designing for Diversity”, Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting, Human Factors and Ergonomics Society (HFES), pp. 282-286.

Bailey, R. W. (2001) Heuristic Evaluations vs. Usability Testing Part I [online]. Usability Newsletter, Available from http://webusability.com/usability_article_Heuristic_Eval_vs_Usability_Testing_part1_2001.htm, [Accessed May 12, 2004].

Bailey, R. W., Allan, R. W. and Raiello, P. (1992) Usability Testing vs. Heuristic Evaluation: A Head-to-head Comparison. “Innovations for Interactions”, Proceedings of the Human Factors and Ergonomics Society 36th Annual Meeting, Human Factors and Ergonomics Society (HFES), pp. 409-413.

Bannon, L. J. (1990) From Human Factors to Human Actors: The Role of Psychology and Human-Computer Interaction Studies in System Design. In Greenbaum, J., & Kyng, M. (Eds) Design at Work: Cooperative Design of Computer Systems. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 25-44.

Bannon, L. and Bødker, S. (1991) Beyond the Interface: Encountering Artifacts in Use. In Carroll, J. M. (Ed) Designing Interaction: Psychology at the Human-Computer Interface, Cambridge University Press, pp. 227-253.

Bardram, J. E. (1997) Plans as Situated Action: An Activity Theory Approach to Workflow Systems. Proceedings of Fifth European Conference on Computer Supported Co-operative Work (ECSCW '97), pp. 17-32.

Baronas, A.K. and Louis, R. (1988) Restoring a Sense of Control During Implementation: How User Involvement Leads to System Acceptance. MIS Quarterly, March, pp. 111-124.

Baskerville, R. L. and Wood-Harper, A. T. (1996) A Critical Perspective on Action Research as a Method for Information Systems Research. Journal of Information Technology. 11(3), pp. 235-246.

Bertelsen, O. W. (1996) Contradictions in the Festival Project: Activity Systems, Obstacles and Dynamic Forces in Design. “The Future”, Proceedings of the 19th Information Systems Research Seminar in Scandinavia (IRIS 19), pp. 597-612.

Bertelsen, O. W. (1997) Understanding Objects in Use-oriented Design. “Social Informatics”, Proceedings of the 20th Information Systems Research Seminar in Scandinavia (IRIS 20), pp. 311-324.

Bertelsen, O. W. (1998) Elements of a Theory of Design Artefacts: A Contribution to Critical Systems Development Research. PhD Thesis, Aarhus University, Denmark.

Bertelsen, O. W. (2000) Design Artefacts: Towards a Design-oriented Epistemology. Scandinavian Journal of Information Systems, 12(1), pp. 15-27.

Bevan, N. and Macleod, M. (1994) Usability measurement in context. Behaviour & Information Technology, 13(1&2), pp. 132-145.

Beyer, H. and Holtzblatt, K. (1998) Contextual Design: Defining Customer-Oriented Systems. Morgan Kaufmann, San Francisco.

Bias, R. G. (1991) Walkthroughs: Efficient Collaborative Testing. IEEE Software, 8(5), pp. 94-95.

Bias, R. G. (1994) The Pluralistic Usability Walkthrough: Coordinated Empathies. In Nielsen, J. & Mack, R. L. (Eds) Usability Inspection Methods. John Wiley & Sons, New York, pp. 63-76.

Bias, R. G. and Mayhew, D. J. (1994) Cost-justifying Usability. Academic Press, Boston.

Blackler, F. (1993) Knowledge and the Theory of Organisations: Organisations as Activity Systems and the Reframing of Management. Journal of Management Studies, 30(6), pp. 863-884.

Blackler, F. (1995) Knowledge, Knowledge Work and Organizations: An Overview and Interpretation. Organization Studies, 16(6), pp. 1021-1046.

Blackmon, M. H., Polson, P. G., Kitajima, M. and Lewis, C. (2002) Cognitive Walkthrough for the Web. “Changing the World, Changing Ourselves”, Proceedings of CHI 2002, ACM Press, New York, pp. 463-470.

Blatt, L. A. and Knutson, J. F. (1994) Interface Design Guidance Systems. In Nielsen, J. & Mack, R. L. (Eds) Usability Inspection Methods. John Wiley & Sons, New York, pp. 351-384.

Bødker, S. (1991a) Through the Interface: A human activity approach to user interface design. Lawrence Erlbaum Associates, Hillsdale, NJ.

Bødker, S. (1991b) Activity Theory as a Challenge to Systems Design. In Nissen, H. E., Klein, H. K. & Hirscheim, R. (Eds) Information Systems Research: Contemporary Approaches and Emergent Traditions. Elsevier Science, Amsterdam, pp. 551-564.

Bødker, S. (1996) Applying Activity Theory to Video Analysis: How to Make Sense of Video Data in HCI. In Nardi, B. A. (Ed) Context and Consciousness: Activity Theory and Human-Computer Interaction, MIT Press, Cambridge, MA, pp. 147-174.

Bødker, S. (1997) Computers in Mediated Human Activity. Mind, Culture and Activity, 4(3), pp. 149-158.

Bødker, S. (2000) Scenarios in User-Centred Design: Setting the stage for reflection and action. Interacting with Computers, 13(1), pp. 61-75.

Bødker, S. and Buur, J. (2002) The Design Collaboratorium: A place for usability design. ACM Transactions on Computer-Human Interaction, 9(2), pp. 152 - 169.

Bradford, J. S. (1994) Evaluating High-level Design: Synergistic Use of Inspection and Usability Methods for Evaluating Early Software Designs. In Nielsen, J. & Mack, R. L. (Eds) Usability Inspection Methods. John Wiley & Sons, New York, pp. 235-253.

Brinkkemper, S. (1996) Method Engineering: Engineering of Information Systems Development Methods and Tools. Information and Software Technology, 38(4), pp. 275-280.

Brinkkemper, S., Saeki, M. and Harmsen, F. (1999) Meta-Modelling Based Assembly Techniques for Situational Method Engineering. Information Systems. 24(3), pp. 209-228.

Brooks, F. P. (1988) Grasping Reality through Illusion: Interactive Graphics Serving Science. Proceedings of CHI’88, ACM Press, New York, pp. 1-11.

Brooks, P. (1994) Adding Value to Usability Testing. In Nielsen, J. & Mack, R. L. (Eds) Usability Inspection Methods. John Wiley & Sons, New York, pp. 255-271.

Butler, K. A. (1996) Usability Engineering turns 10. interactions, 3(1), pp. 58-75.

Card, S. K., Moran, T. P. and Newell, A. (1980) The keystroke-level model for user performance time with interactive systems. Communications of the ACM, 23(7), pp. 396-410.

Card, S. K., Moran, T. P. and Newell, A. (1983) The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, NJ.

Card, S. K., Moran, T. P. and Newell, A. (1986) The Model Human Processor. In Boff, K. R., Kaufman, L. & Thomas, J. P. (Eds) Handbook of Perception and Human Performance, Volume 2: Cognitive processes and performance. John Wiley & Sons, New York.

Carroll, J. M. (1995a) Scenario-based Design: Envisioning work and technology in system development. John Wiley & Sons, New York.

Carroll, J. M. (1995b) Introduction: The scenario perspective on system development. In Carroll, J. M. (Ed), Scenario-based Design: Envisioning work and technology in system development. John Wiley & Sons, New York, pp. 1-17.

Carroll, J. M. (2000a) Five reasons for scenario-based design. Interacting with Computers, 13(1), pp. 43-60.

Carroll, J. M. (2000b) Making Use: Scenarios and Scenario-Based Design. Symposium on Designing Interactive Systems, ACM Press, New York, p. 4.

Carroll, J. M. (2000c) Making Use: Scenario-based design of human-computer interactions. MIT Press, Cambridge, MA.

Carroll, J. M. and Rosson, M. B. (1992) Getting Around the Task Artifact Cycle: How to make claims and design by scenario. ACM Transactions on Information Systems, 10(2), pp. 181-212.

Cavaye, A. L. M. (1996) Case Study Research: A multi-faceted research approach for IS. Information Systems Journal. 6(3), pp. 227-242.

Chin, J. P., Diehl, V. A. and Norman, K. L. (1988) Development of an instrument measuring user satisfaction of the human-computer interface. Proceedings of CHI’88, ACM Press, New York, pp. 213-218.

Christiansen, E. (1996) Tamed by a Rose: Computers as Tools in Human Activity. In Nardi, B. A. (Ed) Context and Consciousness: Activity Theory and Human-Computer Interaction. MIT Press, Cambridge, MA, pp. 175-198.

Chrusch, M. (2000) The Whiteboard: Seven great myths of usability. interactions, 7(5), pp. 13-16.

Cluts, M. M. (2003) The Evolution of Artifacts in Cooperative Work: Constructing meaning through activity. Proceedings of the International SIGGROUP Conference on Supporting Group Work, ACM Press, New York, pp. 144-152.

Cole, M. (1988) Cross-cultural Research in the Sociohistorical Tradition. Human Development, 31, pp. 137-151.

Cole, M. (1996) Cultural Psychology: A once and future discipline. Harvard University Press (Belknap Press), London.

Cordes, R. E. (2001) Task-Selection Bias: A Case for User-Defined Tasks. International Journal of Human Computer Interaction, 13(4), pp. 411-419.

Cuomo, D. L. (1994) A method for assessing the usability of graphical, direct-manipulation style interfaces. International Journal of Human-Computer Interaction, 6(3), pp. 275-297.

Cuomo, D. L. and Bowen, C. D. (1994) Understanding usability issues addressed by three user-system interface evaluation techniques. Interacting with Computers, 6(1), pp. 86-108.

Darke, P., Shanks, G. and Broadbent, M. (1998) Successfully completing case study research: Combining rigour, relevance and pragmatism. Information Systems Journal. 8(4), pp. 273-289.

Dayton, T., Tudor, L. G. and Root, R. W. (1994) Bellcore's user-centred-design support center. Behaviour & Information Technology, 13(1&2), pp. 57-66.

Desurvire, H. W. (1994) Faster, cheaper!! Are usability inspection methods as effective as empirical testing?. In Nielsen, J. & Mack, R. L. (Eds) Usability Inspection Methods. John Wiley & Sons, New York, pp. 173-202.

Desurvire, H. W., Kondziela, J. M. and Atwood, M. E. (1992) What is gained and lost when using evaluation methods other than empirical testing. “People and Computers VII”, Proceedings of HCI 92, Cambridge University Press, pp. 89-102.

Dix, A., Finlay, J., Abowd, G. and Beale, R. (1998) Human Computer Interaction (2nd ed). Prentice Hall.

Doubleday, A., Ryan, M., Springett, M. and Sutcliffe, A. (1997) A comparison of usability techniques for evaluating design. Proceedings of DIS 97, ACM Press, New York, pp. 101-110.

Draper, S. W. (1992) Activity theory: The new direction for HCI?. International Journal of Man-Machine Studies, 37(6), pp. 812-821.

Dumas, J. S. and Redish, J. C. (1993) A Practical Guide to Usability Testing. Ablex Publishing, Norwood, NJ.

Dutt, A., Johnson, H. and Johnson, P. (1994) Evaluating Evaluation Methods. “People and Computers IX”, Proceedings of HCI 94, Cambridge University Press, pp. 109-121.

Ehn, P. (1988) Work-oriented design of computer artifacts. Arbetslivscentrum, Falköping, Sweden.

Ellison, M. and McGrath, M. (1998) Business Process Modelling using Activity Theory: An Approach to Data Capture and Analysis. In Hasan, H., Gould, E., Larkin, P. & Vrazalic, L. (Eds) Information Systems and Activity Theory: Volume 2 Theory and Practice, University of Wollongong Press, pp. 143-172.

Engeström, Y. (1987) Learning by Expanding: An activity-theoretical approach to developmental research. Orienta-Konsultit Oy, Helsinki, Finland.

Engeström, Y. (1990) Learning, working and imagining: Twelve studies in activity theory. Orienta-Konsultit Oy, Helsinki, Finland.

Engeström, Y. (1999) Activity Theory and Individual and Social Transformation. In Engeström, Y., Miettinen, R. & Punamäki, R.L. (Eds) Perspectives on Activity Theory. Cambridge University Press.

Engeström, Y., Engeström, R. and Vahaaho, T. (1999b) When the center does not hold: The importance of knotworking. In Chaiklin, S., Hedegaard, M. & Jensen, U. J. (Eds) Activity Theory and Social Practice. Aarhus University Press, Aarhus, Denmark.

Engeström, Y. and Middleton, D. (1996) Cognition and Communication at Work. Cambridge University Press.

Engeström, Y., Miettinen, R. and Punamäki, R.L. (1999a) Perspectives on Activity Theory. Cambridge University Press.

Ereback A.L. and Hook, K. (1994) Using Cognitive Walkthrough for Evaluating a CSCW Application. “Celebrating Interdependence”, Proceedings of CHI’94, ACM Press, New York, pp. 91-92.


Ericsson, K. A. and Simon, H. A. (1984) Protocol analysis: Verbal reports as data. MIT Press, Cambridge, MA.

Fath, J. L., Mann, T. L. and Holzman, T. G. (1994) A Practical Guide to Using Software Usability Labs: Lessons Learned at IBM. Behaviour & Information Technology, 13(1&2), pp. 25-35.

Finkelstein, A. and Dowell, J. (1996) A Comedy of Errors: The London Ambulance Service Case Study. Proceedings of the 8th International Workshop on Software Specification and Design (IWSSD-8), IEEE CS Press, pp. 2-4.

Fisher, J. (2001) User Satisfaction and System Success: Considering the Development Team. Australian Journal of Information Systems, 9(1), pp. 21-29.

Fitzgerald, B. and Howcroft, D. (1998) Towards Dissolution of the IS Research Debate: From Polarisation to Polarity. Journal of Information Technology. 13(4), pp. 313-326.

Fitzpatrick, R. (1999) Strategies for Evaluating Software Usability [online], Department of Mathematics, Statistics and Computer Science, Dublin Institute of Technology. Available from: http://www.comp.dit.ie/rfitzpatrick/papers/chi99%20strategies.pdf [Accessed May 10, 2004].

Fitzpatrick, R. and Dix, A. (1999) A Process for Appraising Commercial Usability Evaluation Methods. “Creating New Relationships”, Proceedings of HCI International ’99, Lawrence Erlbaum Associates, Hillsdale, NJ.

Fjeld, M., Lauche, K., Bichsel, M., Voorhorst, F., Krueger, H. and Rauterberg, M. (2002) Physical and Virtual Tools: Activity Theory Applied to the Design of Groupware. Computer Supported Cooperative Work, 11(1-2), pp. 153-180.

Gable, G. (1994) Integrating Case Study and Survey Research Methods: An Example in Information Systems. European Journal of Information Systems, 3(2), pp. 112-126.

Galliers, R.D. and Land, F. (1987) Choosing appropriate information systems research methodologies. Communications of the ACM, 30(11), pp. 900-902.

Gifford, B. and Enyedy, N. (1999) Activity Centered Design: Towards a theoretical framework for CSCL. Proceedings of the 3rd International Conference on Computer Support for Collaborative Learning (CSCL), pp. 189-196.

Goldberg, J. H., Stimson, M. J., Lewenstein, M., Scott, N. and Wichansky, A. M. (2002) Eye Tracking in Web Search Tasks: Design Implications, Proceedings of Eye Tracking Research and Applications (ETRA) Conference, pp. 51-58.

Goldkuhl, G., Lind, M. and Seigerroth, U. (1998) Method Integration: The Need for a Learning Perspective. IEE Proceedings - Software, 145(4), pp. 113–118.

Good, M. D. (1989) Seven experiences with contextual field research. ACM SIGCHI Bulletin, 20(4), pp. 25-32.

Göransson, B. (2004) User-Centred System Design: Designing Usable Interactive Systems in Practice. PhD Thesis, Uppsala University, Sweden.

Gorry, G. A. and Scott Morton, M. S. (1971) A Framework for Management Information Systems. Sloan Management Review, 13(1), pp. 55-70.

Gould, J. D. and Lewis, C. (1985) Designing for Usability: Key Principles and What Designers Think. Communications of the ACM, 28(3), pp. 300-311.

Gray, W. D. and Salzman, M. C. (1998) Damaged merchandise? A review of experiments that compare usability evaluation methods. Human-Computer Interaction, 13(3), pp. 203-261.

Greenbaum, J. and Kyng, M. (1991) Design at Work: Cooperative design of computer systems. Lawrence Erlbaum Associates, Hillsdale, NJ.


Hackman, G. S. and Biers, D. W. (1992) Team usability testing: Are two heads better than one?. “Innovations for Interactions”, Proceedings of the Human Factors and Ergonomics Society 36th Annual Meeting, Human Factors and Ergonomics Society (HFES), pp. 1205-1209.

Hammontree, M., Weiler, P. and Nayak, N. (1994) Remote usability testing. interactions, 1(3), pp. 21-25.

Hartson, H. R. (1998) Human-Computer Interaction: Interdisciplinary roots and trends. The Journal of Systems and Software, 43(2), pp. 103-118.

Hartson, H. R. and Hix, D. (1989) Toward Empirically Derived Methodologies and Tools for Human-Computer Interface Development. International Journal of Man-Machine Studies, 31(4), pp. 477-494.

Hartson, H. R., Castillo, J. C., Kelso, J., Kamler, J. and Neale, W. C. (1996) Remote evaluation: The network as an extension of the usability laboratory. “Common Ground” Proceedings of CHI’96, ACM Press, New York, pp. 228-235.

Hartson, H. R., Andre, T. S. and Williges, R. C. (2001) Criteria for evaluating usability evaluation methods. International Journal of Human-Computer Interaction, 13(4), pp. 373-410.

Hasan, H. M. (2000) Designing Systems to Support Complex Work: The Application of Activity Theory to HCI. “Interfacing Reality in the New Millennium”, Proceedings of OZCHI 2000, University of Technology Sydney.

Hasu, M. and Engeström, Y. (2000) Measurement in Action: An Activity-Theoretical Perspective on Producer-User Interaction. International Journal of Human Computer Studies, 53(1), pp. 61-89.

Haunold, P. and Kuhn, W. (1994) A keystroke level analysis of a graphics application: Manual map digitizing. “Celebrating Interdependence”, Proceedings of CHI’94, ACM Press, New York, pp. 337-343.

Hempel, C. G. (1966) Philosophy of Natural Science. Prentice Hall.

Hertzum, M. and Jacobsen, N. E. (2001) The Evaluator Effect: A chilling fact about usability evaluation methods. International Journal of Human-Computer Interaction, 13(4), pp. 421-443.

Hill, R., Capper, P., Hawes, K. and Wilson, K. (2001) Using Activity Theory and Developmental Work Research as Tools to Analyse Contradictions in Information and Operational Systems: a Case Study of DHL. In Hasan, H., Gould, E., Larkin, P. & Vrazalic, L. (Eds) Information Systems and Activity Theory: Volume 2 Theory and Practice. University of Wollongong Press, pp. 99-116.

Holleran, P. A. (1991) A Methodological Note on Pitfalls in Usability Testing. Behaviour & Information Technology, 10(5), pp. 345-357.

Hollnagel, E. (1993) Human Reliability Analysis: Context and Control. Academic Press.

Hong, J. I., Heer, J., Waterson, S. and Landay, J. A. (2001) WebQuilt: A Proxy-based Approach to Remote Web Usability Testing. ACM Transactions on Information Systems, 19(3), pp. 263-285.

Howard, S. and Murray, M. D. (1987) A taxonomy of evaluation techniques for HCI. Proceedings of INTERACT '87, IOS Press, pp. 453-459.

Hutchins, E. (1995) Cognition in the Wild. MIT Press, Cambridge, MA.

IEEE (1990) IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. Institute of Electrical and Electronics Engineers, New York.

Iivari, J. (1991) A paradigmatic analysis of contemporary schools of IS development. European Journal of Information Systems, 1(4), pp. 249-272.

Iivari, J. (2003) Towards Information Systems as a Science of Meta-Artifacts. Communications of the AIS, 12, pp. 568-581.

Il'enkov, E. V. (1977) Dialectical logic: Essays in its history and theory. Progress, Moscow.

ISO 9241-11 (1998) Ergonomic requirements for office work with visual display terminals (VDTs) - Part 11: Guidance on usability [online]. International Organization for Standardization. Available from: http://www.iso.ch/iso/en/ISOOnline.frontpage [Accessed March 23, 2003].

Ives, B., Hamilton, S. and Davis, G. B. (1980) A Framework for Research in Computer-Based Management Information Systems. Management Science, 26(9), pp. 910-934.

Ivory, M. Y. and Hearst, M. A. (2001) The State of the Art in Automated Usability Evaluation of User Interfaces. ACM Computing Surveys, 33(4), pp. 1-47.

Jacobsen, N. E., Hertzum, M. and John, B. E. (1998) The Evaluator Effect in Usability Tests. “Making the Impossible Possible”, Proceedings of CHI’98, Conference Summary, ACM Press, New York, pp. 255-256.

Jarvenpaa, S. (1988) The Importance of Laboratory Experimentation in Information Systems Research. Communications of the ACM, 31(12), pp. 1502-1504.

Jarvenpaa, S., Dickson, G. and DeSanctis, G. (1985) Methodological Issues in Experimental IS Research: Experiences and Recommendations, MIS Quarterly, 9(2), pp. 141-156.

Järvinen, P. (1999) On Research Methods. Opinpaya Oy, Tampere, Finland.

Jeffries, R., Miller, J. R., Wharton, C. and Uyeda, K. M. (1991) User interface evaluation in the real world: A comparison of four techniques. “Reaching Through Technology”, Proceedings of CHI’91, ACM Press, New York, pp. 119-124.

John, B. E. and Kieras, D. E. (1996) Using GOMS for user interface design and evaluation: Which technique?. ACM Transactions on Computer-Human Interaction, 3(4), pp. 287-319.

John, B. E. and Marks, S. J. (1997) Tracking the effectiveness of usability evaluation methods. Behaviour & Information Technology, 16(4&5), pp. 188-202.

John, B. E. and Packer, H. (1995) Learning and using the cognitive walkthrough method: A case study approach. “Mosaic of Creativity”, Proceedings of CHI’95, ACM Press, New York, pp. 429-436.

Jordan, P. W. (1998) An Introduction to Usability. Taylor & Francis Books.

Kahn, M. K. and Prail, A. (1994) Formal Usability Inspections. In Nielsen, J. & Mack, R. L. (Eds) Usability Inspection Methods. John Wiley & Sons, New York, pp. 141-171.

Kanalakis, J. (2003) Developing .NET Enterprise Solutions. Apress.

Kantner, L. and Rosenbaum, S. (1997) Usability Studies of WWW Sites: Heuristic Evaluation vs. Laboratory Testing. Proceedings of SIGDOC, ACM Press, New York, pp. 153-160.

Kaplan, A. (1964) The Conduct of Inquiry. Crowell, New York.

Kaptelinin, V. (1992) Human Computer Interaction In Context: The Activity Theory Perspective, Proceedings of the 2nd East-West Conference on Human-Computer Interaction, ICSTI, Moscow, pp. 7-13.

Kaptelinin, V. (1996) Activity Theory: Implications for Human-Computer Interaction. In Nardi, B. A. (Ed) Context and Consciousness: Activity Theory and Human-Computer Interaction. MIT Press, Cambridge, MA, pp. 103-116.


Kaptelinin, V. (2002) Making Use of Social Thinking: The Challenge of Bridging Activity Systems. In Dittrich, Y., Floyd, C. & Klischewski, R. (Eds) Social Thinking: Software Practice. MIT Press, Cambridge, MA, pp. 45-68.

Kaptelinin, V. and Nardi, B. (1997) Activity Theory: Basic Concepts and Applications. “Looking to the Future”, Proceedings of CHI’97, ACM Press, New York.

Kaptelinin, V., Nardi, B. and Macaulay, C. (1999) The Activity Checklist: A tool for representing the “space” of context. interactions, 6(4), pp. 27-39.

Karat, C. M. (1994) A Comparison of User Interface Evaluation Methods. In Nielsen, J. & Mack, R. L. (Eds) Usability Inspection Methods. John Wiley & Sons, New York, pp. 203-233.

Karat, J. (1997) User-centred software evaluation methodologies. In Helander, M. G., Landauer, T. K. & Prabhu, P. V. (Eds) Handbook of Human-Computer Interaction (2nd ed). Elsevier Science, Amsterdam, pp. 689-704.

Karat, J. and Bennett, J. L. (1991) Working within the design process: Supporting effective and efficient design. In Carroll, J. M. (ed), Designing Interaction: Psychology at the Human-Computer Interface, Cambridge University Press, pp. 269-285.

Karat, C. M., Campbell, R. L. and Fiegel, T. (1992) Comparison of empirical testing and walkthrough methods in user interface evaluation. “Striking a Balance” Proceedings of CHI‘92, ACM Press, New York, pp. 397-404.

Kato, T. (1986) What "question-asking protocols" can say about the user interface. International Journal of Man-Machine Studies, 25(6), pp. 659-673.

Keen, P. G. W. (1987) MIS Research: Current Status, Trends and Needs. In Buckingham, R. A., Hirschheim, R. A., Land, F. F. & Tully, C. J. (Eds) Information Systems Education: Recommendations and Implementation. Cambridge University Press, pp. 1-13.

Kieras, D. E. and Polson, P. G. (1985). An Approach to the Formal Analysis of User Complexity. International Journal of Man-Machine Studies, 22(4), pp. 365-394.

Kitzinger, J. (1994) The Methodology of Focus Groups: The importance of interaction between research participants. Sociology of Health & Illness, 16(1), pp. 103-121.

Kitzinger, J. (1995) Introducing focus groups. British Medical Journal, 311, pp. 299-302.

Klein, H. K. and Myers, M. D. (1999) A Set of Principles for Conducting and Evaluating Interpretive Field Studies in Information Systems. MIS Quarterly, 23(1), pp. 67-93.

Kling, R. and Iacono, S. (1989) The Institutional Character of Computerized Information Systems. Office: Technology & People, 5(1), pp. 7-28.

Korpela, M., Soriyan, H. A. and Olufokunbi, K. C. (2000) Activity analysis as a method for information systems development: General introduction and experiments from Nigeria and Finland. Scandinavian Journal of Information Systems, 12, pp. 191-210.

Korpela, M., Mursu, A. and Soriyan, H.A. (2002) Information systems development as an activity. Computer Supported Cooperative Work, 11(1-2), pp. 111-128.

Kumar, K. and Welke, R. J. (1992) Methodology Engineering: A proposal for situation-specific methodology construction. In Cotterman, W. W. & Senn, J. A. (Eds) Challenges and Strategies for Research in Systems Development. John Wiley & Sons, pp. 257-269.

Kurosu, M., Matsuura, S. and Sugizaki, M. (1997) Categorical Inspection Method - Structured Heuristic Evaluation (sHEM). Proceedings of IEEE International Conference on Systems, Man and Cybernetics, IEEE, pp. 2613-2618.

Kuutti, K. (1991) Activity Theory and its applications to information systems research and development. In Nissen, H. E., Klein, H. K. & Hirschheim, R. (Eds) Information Systems Research: Contemporary Approaches and Emergent Traditions. Elsevier Science, Amsterdam, pp. 529-549.

Kuutti, K. (1992) HCI Research Debate and Activity Theory Position, Proceedings of the 2nd East-West Conference on Human-Computer Interaction, ICSTI, Moscow, pp. 13-22.

Kuutti, K. (1996) Activity Theory as a Potential Framework for Human-Computer Interaction Research. In Nardi, B. A. (Ed) Context and Consciousness: Activity Theory and Human-Computer Interaction. MIT Press, Cambridge, MA, pp. 17-44.

Kuutti, K. (1999) Activity Theory, Transformation of Work and Information Systems. In Engeström, Y., Miettinen, R. & Punamäki, R. L. (Eds) Perspectives on Activity Theory. Cambridge University Press, pp. 360-376.

Kuutti, K. and Arvonen, T. (1992) Identifying Potential CSCW Applications by Means of Activity Theory Concepts: A Case Example, Proceedings of Computer-Supported Cooperative Work (CSCW’92), ACM Press, New York, pp. 233-240.

Kuutti, K. and Molin-Juustila, T. (1998) Information System Support for "Loose" Coordination in a Network Organization: An Activity Theory Perspective. In Hasan, H., Gould, E. & Hyland, P. (Eds) Information systems and Activity Theory: Tools in Context. University of Wollongong Press, pp. 73-92.

Kuutti, K. and Virkkunen, J. (1995) Organisational Memory and Learning Network Organisations: the Case of Finnish Labour Protection Inspectors. Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS), IEEE, pp. 313-322.

Landauer, T. (1991) Let's get real: A position paper on the role of cognitive psychology in the design of humanly useful and usable systems. In Carroll, J.M. (Ed) Designing Interaction: Psychology at the Human-Computer Interface, Cambridge University Press, pp. 60-73.

Landauer, T. K. (1995) The Trouble with Computers: Usefulness, Usability, and Productivity. MIT Press, Cambridge, MA.

Landry, M. and Banville, C. (1992) A Disciplined Methodological Pluralism for MIS Research. Accounting, Management and Information Technology. 2(2), pp. 77-97.

Larkin, P. A. J. and Gould, E. (2000) Activity Theory Applied to the Corporate Memory Loss Problem. “Doing IT Together”, Proceedings of the 23rd Information Systems Research Seminar in Scandinavia (IRIS 23).

Lave, J. (1988) Cognition in practice. Cambridge University Press.

Lavery, D. and Cockton, G. (1997) Cognitive walkthrough usability evaluation materials. Technical Report 1997-20, University of Glasgow, Glasgow, UK.

Leont’ev, A. N. (1974) The Problem of Activity in Psychology. Soviet Psychology, 13(2), pp. 4-33.

Leont’ev, A. N. (1978) Activity, Consciousness, and Personality. Prentice Hall.

Leont’ev, A. N. (1981) Problems of the Development of the Mind. Progress Publishers, Moscow.

Lewis, J. R. (1994) Sample sizes for usability studies: Additional considerations. Human Factors, 36(2), pp. 368-378.

Lewis, J. R. (1995) IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction, 7(1), pp. 57-78.

Lewis, C., Polson, P., Wharton, C. and Rieman, J. (1990) Testing a walkthrough methodology for theory-based design of walk-up-and-use interfaces. “Empowering People” Proceedings of CHI’90, ACM Press, New York, pp. 235-242.


Lewis, C. and Wharton, C. (1997). Cognitive Walkthroughs. In Helander, M. G., Landauer, T. K. & Prabhu, P. V. (Eds) Handbook of Human-Computer Interaction (2nd ed). Elsevier Science, Amsterdam, pp. 717-732.

Lim, C. P. and Hang, D. (2003) An activity theory approach to research of ICT integration in Singapore schools. Computers & Education, 41(1), pp. 49-63.

Lin, H. X., Choong, Y.Y. and Salvendy, G. (1997) A proposed index of usability: A method for comparing the relative usability of different software systems. Behaviour & Information Technology, 16(4 & 5), pp. 267-278.

Lind, M. (2003) Commentary paper: Methods and Projects as Action Patterns for Enhancing Improvisation. In Goldkuhl, G., Lind, M. & Ågerfalk, P. (Eds) Proceedings of Action in Language, Organisations and Information Systems (ALOIS), Linköping University, Linköping, Sweden, pp. 123-138.

Lindgaard, G. (1994) Usability Testing and System Evaluation: A Guide for Designing Useful Computing Systems. Chapman & Hall.

Lund, A. M. (1997) Expert ratings of usability maxims. Ergonomics in Design, 5(3), pp. 15-20.

Luria, A. R. (1979) The Making of Mind. Harvard University Press, Cambridge, MA.

Mack, R. L. and Nielsen, J. (1994) Executive summary. In Nielsen, J. & Mack, R. L. (Eds) Usability Inspection Methods. John Wiley & Sons, New York, pp. 1-23.

March, S. T. and Smith, G. F. (1995) Design and Natural Science Research on Information Technology. Decision Support Systems, 15(4), pp. 251-266.

Marx, K. (1915) Capital Vol I, Chicago.

Mayo, E. (1933) The Human Problems of an Industrial Civilization. Viking, New York.

Mayhew, D. J. (1999) The usability engineering lifecycle: A practitioner's handbook for user interface design. Morgan Kaufmann, San Francisco.

Miettinen, R. (1999) The riddle of things: Activity theory and actor network theory as approaches to studying innovations. Mind, Culture, and Activity, 6(3), pp. 170-195.

Molich, R., Bevan, N., Curson, I., Butler, S., Kindlund, E., Miller, D. and Kirakowski, J. (1998) Comparative evaluation of usability tests. “Capitalizing on Usability”, Proceedings of the Usability Professionals’ Association (UPA) Conference, pp. 189-200.

Molich, R., Thomsen, A. D., Karyukina, B., Schmidt, L., Ede, M., van Oel, W. and Arcuri, M. (1999) Comparative evaluation of usability tests. “The CHI is the Limit”, Proceedings of CHI’99, Extended Abstracts, ACM Press, New York, pp. 83-84.

Monk, A., Wright, P., Haber, J. and Davenport, L. (1993) Improving your human-computer interface: A practical technique. Prentice Hall.

Muller, M. J., Wildman, D. M. and White, E. A. (1993) Taxonomy of PD Practices: A Brief Practitioners Guide. Communications of the ACM, 36(4), pp. 26-27.

Mwanza, D. (2001) Where Theory Meets Practice: A Case for an Activity Theory based Methodology to guide Computer System Design. Proceedings of INTERACT '01, IOS Press, pp. 342-349.

Mwanza, D. (2002) Towards an Activity-Oriented Design Method for HCI Research and Practice. PhD Thesis, The Open University, UK.

Myers, M. D. (1999) Investigating information systems with ethnographic research. Communications of the AIS, 2.


Nardi, B. A. (1992) Studying Task-Specificity: How we could have done it right the first time with Activity Theory. Proceedings of East-West HCI Conference. St Petersburg, ICSTI, pp. I-6 - I-12.

Nardi, B. A. (1996a) Context and Consciousness: Activity Theory and Human-Computer Interaction. MIT Press, Cambridge, MA.

Nardi, B. A. (1996b) Activity Theory and Human-Computer Interaction. In Nardi, B. A. (Ed) Context and Consciousness: Activity Theory and Human-Computer Interaction. MIT Press, Cambridge, MA, pp. 7-16.

Nardi, B. A. (1996c) Studying Context: A Comparison of Activity Theory, Situated Action Models, and Distributed Cognition. In Nardi, B. A. (Ed) Context and Consciousness: Activity Theory and Human-Computer Interaction. MIT Press, Cambridge, MA, pp. 69-102.

Nardi, B. A. and O’Day, V. L. (1999) Information Ecologies: Using Technology With Heart. MIT Press, Cambridge, MA.

Nielsen, J. (1989) Usability engineering at a discount. Proceedings of HCI International ‘89, pp. 394-401.

Nielsen, J. (1990) Designing for International Use. “Human Factors in Computing Systems”, Proceedings of CHI'90, ACM Press, New York, pp. 291-294.

Nielsen, J. (1992) Finding usability problems through heuristic evaluation. “Striking a Balance”, Proceedings of CHI’92, ACM Press, New York, pp. 373-380.

Nielsen, J. (1993) Usability Engineering. Academic Press, Boston.

Nielsen, J. (1994a) Heuristic evaluation. In Nielsen, J. & Mack, R. L. (Eds) Usability Inspection Methods. John Wiley & Sons, New York, pp. 25-62.

Nielsen, J. (1994b) Usability laboratories. Behaviour & Information Technology, 13(1&2), pp. 3-8.

Nielsen, J. (2000) Designing Web Usability: The Practice of Simplicity. New Riders Publishing, Indianapolis.

Nielsen, J. (2001) Usability Metrics [online], Alertbox, January 21, Available from: http://www.useit.com/alertbox/20010121.html [Accessed May 18, 2004].

Nielsen, J. (2002) Top Ten Guidelines for Homepage Usability [online], Alertbox, May 12, Available from: http://www.useit.com/alertbox/20020512.html [Accessed May 12, 2004].

Nielsen, J. (2004) Risks of Quantitative Studies [online], Alertbox, March 1, Available from: http://www.useit.com/alertbox/20040301.html [Accessed April 21, 2004].

Nielsen, J. and Landauer, T. K. (1993) A mathematical model of the finding of usability problems. “Bridges Between Worlds”, Proceedings of INTERCHI’93, ACM Press, New York, pp. 206-213.

Nielsen, J. and Molich, R. (1990) Heuristic evaluation of user interfaces. “Empowering People”, Proceedings of CHI’90, ACM Press, New York, pp. 249-256.

Nielsen, J. and Phillips, V. L. (1993) Estimating the relative usability of two interfaces: Heuristic, formal, and empirical methods compared. “Bridges Between Worlds”, Proceedings of INTERCHI’93, ACM Press, New York, pp. 214-221.

Nolan, R. L. and Wetherbe, J. C. (1980) Toward a Comprehensive Framework for MIS Research. MIS Quarterly, June, pp. 1-19.

Norman, D. A. (1986) Cognitive engineering. In Norman, D. A. & Draper, S. W. (Eds), User centered system design: New perspectives on human-computer interaction. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 31-61.


Norman, D. A. (1988) The Psychology of Everyday Things. Basic Books, New York.

Nunamaker, J. F., Chen, M. and Purdin, T. D. M. (1991) Systems Development in Information Systems Research. Journal of Management Information Systems, 7(3), pp. 89-106.

Olson, J. R. and Olson, G. M. (1990) The growth of cognitive modeling in human-computer interaction since GOMS. Human-Computer Interaction, 5(2&3), pp. 221-265.

O'Malley, C. E., Draper, S. W. and Riley, M. S. (1984) Constructive interaction: A method for studying human-computer-human interaction. Proceedings of INTERACT ‘84, IOS Press, pp. 269-274.

Omanson, R. C. and Schwartz, A. L. (1997) Usability Testing of Web Sites at Ameritech. Usability Testing World Wide Websites Workshop, “Looking to the Future”, Proceedings of CHI’97, ACM Press, Available from: http://www.acm.org/sigchi/chi97/proceedings/workshop [Accessed April 2, 2004].

Oppermann, R. and Reiterer, H. (1997) Software Evaluation using the 9241 Evaluator. Behaviour & Information Technology, 16(4/5), pp. 232-245.

Pavlov, I. P. (1960) Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. Dover, New York.

Pejtersen, A. M. and Rasmussen, J. (1997) Effectiveness testing of complex systems. In Salvendy, G. (Ed) Handbook of Human Factors and Ergonomics. John Wiley & Sons, New York, pp. 1514-1542.

Pinsonneault, A. and Kraemer, K. L. (1993) Survey Research Methodology in Management Information Systems: An assessment. Journal of Management Information Systems, 10(2), pp.75-105.

Polson, P., Lewis, C., Rieman, J. and Wharton, C. (1992) Cognitive Walkthroughs: A method for theory-based evaluation of user interfaces. International Journal of Man-Machine Studies, 36(5), pp. 741-773.

Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S. and Carey, T. (1994) Human-Computer Interaction. Addison-Wesley, Wokingham, UK.

Preece, J., Rogers, Y. and Sharp, H. (2002) Interaction Design: Beyond human-computer interaction. John Wiley & Sons, New York.

Reiterer, H. and Oppermann, R. (1993) Evaluation of user interfaces: EVADIS II - A comprehensive evaluation approach. Behaviour & Information Technology, 12(3), pp. 137-148.

Rieman, J., Davies, S., Hair, D. C., Esemplare, M., Polson, P. G. and Lewis, C. (1991) An automated cognitive walkthrough. “Reaching Through Technology”, Proceedings of CHI’91, ACM Press, New York, pp. 427-428.

Robert, D. (1997) Creating an Environment for Project Success. Information Systems Management, 14(1), pp. 73-77.

Robey, D. (1996) Research Commentary. Diversity in Information Systems Research: Threat, promise, and responsibility. Information Systems Research. 7(4), pp. 400-408.

Rosenbaum, S. (1989) Usability Evaluations versus Usability Testing: When and Why?. IEEE Transactions on Professional Communication, 32(4), pp. 210-216.

Rosson, M. B. and Carroll, J. M. (2002) Usability engineering: Scenario-based development of human-computer interaction. Morgan Kaufmann, San Francisco.

Rowley, D. E. and Rhoades, D. G. (1992) The cognitive jogthrough: A fast-paced user interface evaluation procedure. “Striking A Balance”, Proceedings of CHI’92, ACM Press, New York, pp. 389-395.


Rubin, J. (1994) Handbook of usability testing. John Wiley & Sons, New York.

Sazegari, S. (1994) Designing a Usability Lab: A Case Study from Taligent. Behaviour & Information Technology, 13(1&2), pp. 20-24.

Scriven, M. (1967) The methodology of evaluation. In Tyler, R., Gagne, R. & Scriven, M. (Eds) Perspectives of curriculum evaluation. Rand McNally, Chicago, pp. 39-83.

Sears, A. (1997) Heuristic Walkthroughs: Finding the problems without the noise. International Journal of Human-Computer Interaction, 9(3), pp. 213-234.

Seidel, E. J. (1998) When Testing for Ease of Use and Testing for Functionality Diverge. Usability Interface, 5(2).

Shackel, B. (1986) Ergonomics in Design for Usability. In Harrison, M. D. & Monk, A. F. (Eds) “People and Computers: Designing for Usability”, Proceedings of HCI 86, Cambridge University Press, pp. 45-64.

Shneiderman, B. (1992) Designing the user interface: Strategies for effective human-computer interaction (2nd ed). Addison-Wesley, Reading, MA.

Simon, H. A. (1969) The Sciences of the Artificial. MIT Press, Cambridge, MA.

Singh, S. (2001) Studying the User: A matter of perspective. Media International Australia. 98, pp. 113-128.

SINTEF Group (2001) The Foundation for Scientific and Industrial Research at the Norwegian Institute of Technology [online]. Available from: http://www.oslo.sintef.no/avd/32/3270/brosjyrer/engelsk/6.html [Accessed March 17, 2001].

Smith, S. L. and Mosier, J. N. (1986) Guidelines for designing user interface software. Report ESD-TR-86-278, The MITRE Corporation, Bedford, MA.

Spencer, R. (2000) The streamlined cognitive walkthrough method. “The Future is Here” Proceedings of CHI 2000, ACM Press, New York, pp. 353-359.

Spinuzzi, C. (1999) Grappling with distributed usability: A cultural-historical examination of documentation genres over four decades. Proceedings of the 17th Annual International Conference on Computer Documentation (SIGDOC), ACM Press, New York, pp. 16-21.

Spool, J. M., Scanlon, T., Schroeder, W., Snyder, C. and DeAngelo, T. (1999) Web Site Usability: A Designer’s Guide, Morgan Kaufmann, San Francisco.

Star, S. L. (1989) The Structure of Ill-Structured Solutions: Heterogeneous Problem-Solving, Boundary Objects and Distributed Artificial Intelligence. In Gasser, L. & Huhns, M. (Eds) Distributed Artificial Intelligence Vol 2. Morgan Kaufmann, San Francisco, pp. 37-54.

Star, S. L. (1997) Working Together: Symbolic Interactionism, Activity Theory and Information Systems. In Engeström, Y. & Middleton, D. (Eds) Cognition and Communication at Work. Cambridge University Press, pp. 296-318.

Stair, R. M. and Reynolds, G. W. (2003) Principles of Information Systems: A Managerial Approach (6th ed). Thomson - Course Technology.

Suchman, L. (1987) Plans and Situated Actions: The Problem of Human Machine Communication. Cambridge University Press.

Sutcliffe, A. G. (1995) Human-Computer Interface Design (2nd ed). Macmillan Press Ltd, Basingstoke, UK.

Sutcliffe, A. G. (2002) Assessing the Reliability of Heuristic Evaluation for Website Attractiveness and Usability. Proceedings of the 35th Hawaii International Conference on System Sciences (HICSS), IEEE, pp. 137-146.


Sutcliffe, A. G., Ryan, M., Doubleday, A. and Springett, M. V. (2000) Model mismatch analysis: towards a deeper explanation of users’ usability problems. Behaviour & Information Technology, 19(1), pp. 43-55.

Sweeney, M., Maguire, M., and Shackel, B. (1993) Evaluating user-computer interaction: a framework. International Journal of Man-Machine Studies, 38(4), pp. 689-711.

Szczur, M. (1994) Usability testing on a budget: A NASA usability test case study. Behaviour & Information Technology, 13(1&2), pp. 106-118.

Thimbleby, H. (1990) User Interface Design. Addison Wesley, Harlow, UK.

Thomas, J. C. and Kellogg, W. A. (1989) Minimizing ecological gaps in interface design. IEEE Software, 6(1), pp. 78-86.

Thomas, P. and Macredie, R. D. (2002) Introduction to The New Usability. ACM Transactions on Computer-Human Interaction, 9(2), pp. 69-73.

Tiedtke, T., Martin, C. and Gerth, N. (2002) AWUSA – A Tool for Automated Website Usability Analysis [online]. PreProceedings of the 9th International Workshop on the Design, Specification and Verification of Interactive Systems, Available from: www.uni-paderborn.de/cs/ag-szwillus/lehre/ss02/seminar/semband/MarcoWeissenborn/WebUsabilityTools/AWUS1605.pdf [Accessed May 17, 2004].

Tufte, E. R. (1997) Visual Explanations: Images and quantities, evidence and narrative. Graphics Press, Cheshire, Connecticut.

Tullis, T. S. (1993) Is user interface design just common sense?. Proceedings of HCI International ’93, pp. 9-14.

UoW Website Review Project Plan (2002) Internal document, University of Wollongong.

Virzi, R. A. (1992) Refining the test phase of usability evaluation: How many subjects is enough?. Human Factors, 34(4), pp. 457-468.

Virzi, R. A. (1997). Usability inspection methods. In Helander, M. G., Landauer, T. K. & Prabhu, P. V. (Eds) Handbook of Human-Computer Interaction (2nd ed). Elsevier Science, Amsterdam, pp. 705-715.

Virzi, R. A., Sorce, J. F. and Herbert, L. B. (1993) A comparison of three usability evaluation methods: Heuristic, think-aloud, and performance testing. Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting, Human Factors and Ergonomics Society (HFES), pp. 309-313.

Vrazalic, L. (2001) Techniques to Analyse Power and Political Issues in IS Development. In Hasan, H., Gould, E., Larkin, P. & Vrazalic, L. (Eds) Information Systems and Activity Theory: Volume 2 Theory and Practice. University of Wollongong Press, pp. 39-54.

Vygotsky, L. S. (1978) Mind in society. Harvard University Press, Cambridge, MA.

Vygotsky, L. S. (1987) The Collected Works of L.S. Vygotsky, Vol 1. Plenum, New York.

Walsham, G. (1995) The emergence of interpretivism in IS research. Information Systems Research, 6(4), pp. 376-394.

Wartofsky, M. W. (1979) Models: Representation and the Scientific Understanding. D. Reidel Publishing Company, Boston.

Wertsch, J. V. (1981) The Concept of Activity in Soviet Psychology. M.E. Sharpe, New York.

Wharton, C., Bradford, J., Jeffries, R. and Franzke, M. (1992). Applying cognitive walkthroughs to more complex user interfaces: Experiences, issues, and recommendations. “Striking a Balance”, Proceedings of CHI’92, ACM Press, New York, pp. 381-388.


Wharton, C., Rieman, J., Lewis, C. and Polson, P. (1994) The cognitive walkthrough method: A practitioner's guide. In Nielsen, J. & Mack, R. L. (Eds) Usability Inspection Methods. John Wiley & Sons, New York, pp. 105-140.

Whitefield, A., Wilson, F. and Dowell, J. (1991) A framework for human factors evaluation. Behaviour & Information Technology, 10(1), pp. 65-79.

Whiteside, J., Bennett, J. and Holtzblatt, K. (1988) Usability Engineering: Our Experience and Evolution. In Helander, M. (Ed) Handbook of Human-Computer Interaction. Elsevier Science, Amsterdam, pp. 791-817.

Whiteside, J. and Wixon, D. (1987) The dialectic of usability engineering. Proceedings of INTERACT '87, IOS Press, pp. 17-20.

Wiklund, M. E. (1994) Usability in practice: How companies develop user-friendly products. AP Professional, Boston.

Winograd, T. and Flores, F. (1986) Understanding Computers and Cognition: A New Foundation for Design, Ablex Publishing, Norwood, NJ.

Wixon, D. and Wilson, C. (1997) The usability engineering framework for product design and evaluation. In Helander, M. G., Landauer, T. K. & Prabhu, P. V. (Eds) Handbook of Human-Computer Interaction (2nd ed). Elsevier Science, Amsterdam, pp. 653-688.

Woolrych, A. and Cockton, G. (2001) Why and when five test users aren't enough. “Interaction without Frontiers”, Proceedings of IHM-HCI 2001, pp. 105-108.

Wundt, W. (1907) Outlines of psychology. Engelmann, Leipzig.

Yourdon, E. (1989) Structured walkthroughs (4th ed). Yourdon Press, Englewood Cliffs, NJ.

Zirkler, D. and Ballman, D. R. (1994) Usability testing in a competitive market: Lessons learned. Behaviour & Information Technology, 13(1&2), pp. 191-197.


Appendix A

Task Scenarios

Domestic Student Tasks

Task 1
You have no assignments due this week so you decide to go to the UniMovies. But first you would like to find out what’s showing. You begin your search on the Uni’s home page.

Task 2
This is your first year at Uni. It is Week 4 into the Autumn Session and you are typing an essay that is due the next day. You have found a relevant quote to support your argument and would like to include it in the essay, but you are not quite sure how this is supposed to be done. You remember the lecturer mentioning that you had to use the appropriate referencing style. There is probably more information about this on the Uni’s web site. You begin your search on the Uni’s home page.

Task 3
You were sick on Monday and missed a test in one of your subjects. You believe this is a legitimate reason for missing an assessment, so now that you are feeling better, you would like to apply for Special Consideration. Using the Policy Directory, find out what you need to do. Begin on the Uni’s home page.

Task 4
Having just arrived at Uni for the first time, you are met with a long queue of cars outside the main gate. You pay the $3 parking fee, and then drive around for 40 minutes looking for a car park. You decide that this may not be the best way to get to Uni, after all, and promise yourself you’ll look into other transport options. You begin your search on the Uni’s home page.

Task 5
You are saving up to go skiing in New Zealand in July 2004. Although that is eight months away, you would like to book early to avoid disappointment. However, you are unsure when the mid-year recess is on in 2004. You begin your search on the Uni’s home page.

Task 6
Things haven’t been great lately with your finances and, unfortunately, part-time work is hard to come by at the moment. You seem to be doing OK academically so perhaps a scholarship would help. You’re sure that these are available but you want more information about them. You begin your search on the Uni’s home page.

Task 7
Ever since you arrived at Uni, you’ve been having difficulties with studying and completing assessment tasks. You finally decide to get some help with this and want to find out who to contact. You begin your search on the Uni’s home page.


International Student Tasks

Task 1
Having settled into Uni and life in a foreign country, you have decided that a part-time job would be a good idea, since you don’t have any classes on Thursdays and Fridays. The Uni’s web site may have information for students to help them find a part-time job. You begin your search on the Uni’s home page.

Task 2
You are nearing the end of your degree, and realise that you have 2 more subjects to do the following session before you can graduate. However, your student visa runs out this session, and you need to extend it. You need to find out how this can be done. You begin your search on the Uni’s home page.

Task 3
It is your last session at Uni and you are enrolled in your final 4 subjects. At the end of Week 2 you realise that one of the subjects you are doing is actually not required for your particular course. Having withdrawn from the subject, you now wish to get a refund for the fees that you have already paid. You need to know how to apply for a refund. You begin your search on the Uni’s home page.

Task 4
It is exam time and you feel that you need a foreign translation dictionary to help you understand and answer the questions in the exam. But first you need to find out if you can take one with you into the final exam. You begin your search on the Uni’s home page.

Task 5
Having just recently arrived in Australia, you would like to meet more people of your nationality to make friends and get some help with adjusting to the lifestyle. Someone mentioned to you that the Uni has various clubs that you can join to do this, but you need to know more. You begin your search on the Uni’s home page.

Task 6
You have decided that on-campus accommodation at International House is not for you because you prefer to cook your own meals. So you would like to move to a sharing accommodation arrangement because you can’t afford to rent a house or unit by yourself. You want to find out what accommodation is available. You begin your search on the Uni’s home page.


Research Student Tasks

Task 1
You have just arrived at Uni as a research student. You already know that special resources, services, assistance and facilities are available to research students, but you would like more information about what you are entitled to. You begin your search on the Uni’s home page.

Task 2
You are having problems with your thesis supervisor. You feel that she is not providing you with an adequate level of supervision. You think that the Uni may have a formal procedure that must be followed in these cases to resolve the problem. You begin your search on the Uni’s home page.

Task 3
You are about to begin writing your thesis, but feel that you would benefit from some assistance. Perhaps there is more information about this, or a person who can help you, on the Uni web site. You begin your search on the Uni’s home page.

Task 4
You have almost finished writing your thesis and you would like to find out more information about submitting the thesis for examination. You begin your search on the Uni’s home page.

Task 5
Having just arrived at Uni for the first time, you are met with a long queue of cars outside the main gate. You pay the $3 parking fee, and then drive around for 40 minutes looking for a car park. You decide that this may not be the best way to get to Uni, after all, and promise yourself you’ll look into other transport options. You begin your search on the Uni’s home page.

Task 6
You are saving up to go skiing in New Zealand in July 2004. Although that is eight months away, you would like to book early to avoid disappointment. However, you are unsure when the mid-year recess is on in 2004. You begin your search on the Uni’s home page.

Task 7
Things haven’t been great lately with your finances and, unfortunately, part-time work is hard to come by at the moment. You seem to be doing OK academically so perhaps a scholarship would help. You’re sure that these are available but you want more information about them. You begin your search on the Uni’s home page.


Appendix B

Pre-Test Questionnaire

Personal Information

1. Gender ❏ Female ❏ Male

2. Age ❏ Under 21 ❏ 21 to 25 ❏ 26 to 35 ❏ 36 to 45 ❏ 46 to 55 ❏ Over 55

3. Highest level of education:

❏ Primary school

❏ High school

❏ Higher School Certificate (HSC)

❏ Apprenticeship or trade (TAFE) qualification

❏ Bachelor degree

❏ Master degree

❏ PhD

4. What is your usual (full-time) occupation? _____________________________

5. If you are a student, what degree are you pursuing? _____________________

6. What is your first language? _________________________________________

Computer Experience

1. Do you own a computer? ❏ Yes ❏ No

2. How long have you been using computers?

❏ Less than a year

❏ 1 – 2 years

❏ 3 – 5 years


❏ 6 – 10 years

❏ More than 10 years

❏ I have never used a computer

3. On average, how often do you use a computer?

❏ Everyday

❏ 5 – 6 times per week

❏ 3 – 4 times per week

❏ 1 – 2 times per week

❏ Less than once a week

❏ Never

4. Which type of computer system do you usually use? ❏ PC/IBM ❏ Apple

5. What do you usually use a computer for (please tick all that apply)?

❏ Surfing the Internet

❏ E-mail

❏ Word processing (typing)

❏ Games

❏ Education (online courses)

❏ Accessing and searching databases

❏ Multimedia (CD-ROMs)

❏ Programming

❏ Specialised software (finance, statistics, graphics, etc.)

❏ Specialised administrative software at work

❏ Other: ________________________________________________________

Internet Experience

1. How long have you been using the Internet?

❏ Less than a year

❏ 2 – 4 years

❏ 5 – 7 years

❏ 8 – 10 years

❏ More than 10 years


2. On average, how often do you use the Internet?

❏ Everyday

❏ 5 – 6 times per week

❏ 3 – 4 times per week

❏ 1 – 2 times per week

❏ Less than once a week

❏ Never

3. On average, how many hours per week do you spend ‘surfing’ the Internet?

❏ 1 – 5 hours per week

❏ 6 – 10 hours per week

❏ 11 – 15 hours per week

❏ 16 – 20 hours per week

❏ 21 – 25 hours per week

❏ More than 25 hours per week

❏ I do not use the Internet

4. How would you rate your ability to use the Internet?

❏ Beginner/novice

❏ Mediocre/average

❏ Fairly Competent

❏ Expert

5. What do you usually use the Internet for (please tick all that apply)?

❏ General ‘surfing’

❏ Searching for specific information for personal use (e.g. study)

❏ Searching for specific information for my work

❏ E-mail

❏ Downloading (software, music, videos, etc.)

❏ Online chat

❏ Online shopping

❏ Other: ________________________________________________________

6. Which Internet Browser do you usually use to access the Internet? Please tick one only.

❏ Netscape Navigator


❏ Internet Explorer

❏ Other: ________________________________________________________

7. Where do you usually (i.e. most of the time) access the Internet from? Please tick one only.

❏ Home

❏ Work

❏ School/TAFE/university

❏ Internet café

❏ Other: ________________________________________________________

University Web Site Experience

1. Have you ever used the University web site before? ❏ Yes ❏ No

2. How often do you use the University web site?

❏ Everyday

❏ 3 – 4 times per week

❏ 1 – 2 times per week

❏ 5 – 6 times per month

❏ 3 – 4 times per month

❏ Less than once a month

❏ Never


Appendix C

Post-Test Questionnaire

Please indicate how strongly you agree or disagree with the following statements about the UoW web site by ticking the appropriate box. Each statement is rated on the same five-point scale:

Strongly Disagree ❏   Disagree ❏   Neutral ❏   Agree ❏   Strongly Agree ❏

1. The web site was easy to use.
2. The web site does all the things I would like it to do.
3. I find the information on this web site useful.
4. Learning to use the web site would be/was easy for me.
5. This web site contained the information I required.
6. I find it easy to locate the information I need on this web site.
7. I am able to find the information I need quickly using this web site.
8. My interaction with this web site was clear and understandable.
9. I find this web site flexible to use.
10. It would be easy for me to become skilful at using this web site.
11. Overall, I am satisfied with how easy it is to use this web site.
12. The language used on this web site is simple and easy to understand.
13. The language used on this web site is appropriate for someone like me.
14. The language on this web site is consistent.
15. I feel comfortable using this web site.
16. Whenever I make a mistake using the web site, I am able to recover easily and quickly.
17. The organisation of information on this web site is clear.
18. It is easy to get lost using this web site.
19. The links on this web site are clear and understandable.
20. I am able to navigate through this web site easily.
21. I always knew where I was at this web site.
22. This web site has all the information, functions and capabilities I expect it to have.
23. I found using this web site was a frustrating experience.
24. This web site does all the things I would like it to do.
25. This web site is designed for all types and levels of users.
26. Overall, I am satisfied with this web site.
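Purely as an illustrative sketch (not a scoring procedure taken from this questionnaire or the thesis), the snippet below shows one way responses to a 26-item, five-point instrument like this could be summarised per participant. It assumes items are coded 1 (Strongly Disagree) to 5 (Strongly Agree) and that the negatively worded items 18 and 23 are reverse-coded before averaging; the function name and these coding choices are assumptions made only for the example.

```python
# Illustrative only: one possible way to summarise post-test responses.
# Assumes items are coded 1 (Strongly Disagree) to 5 (Strongly Agree) and
# that items 18 and 23 (negatively worded) are reverse-coded -- assumptions
# for this sketch, not a rule taken from the thesis.

NEGATIVE_ITEMS = {18, 23}  # assumed reverse-scored items


def score_post_test(responses: dict) -> float:
    """Return the mean satisfaction score (1-5) for one participant.

    `responses` maps item number (1-26) to the ticked value (1-5).
    """
    total = 0.0
    for item, value in responses.items():
        if not 1 <= value <= 5:
            raise ValueError(f"Item {item}: response {value} is outside the 1-5 scale")
        # Reverse-code negatively worded items so higher always means "more satisfied".
        total += (6 - value) if item in NEGATIVE_ITEMS else value
    return total / len(responses)


# Example: a hypothetical participant who mostly agreed, but also agreed that
# it is easy to get lost (item 18), which reverse-codes from 4 to 2.
example = {i: 4 for i in range(1, 27)}
print(round(score_post_test(example), 2))  # -> 3.92
```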


Appendix D

Recruitment Poster – Traditional Usability Testing

Do you want to find out what the new Uni of Wollongong web site will look like and receive a $20 gift voucher in return???

We are currently designing the new Uni web site which will be launched early next year. We need students to take part in the evaluation of the web site to provide us with feedback about the design. The evaluation is done on campus and it takes approx 45 minutes to do. All you are required to do is fill out a questionnaire and do a few tasks using the new web site (for example, find out what’s on at the UniMovies next week). In return, you will receive a $20 gift voucher to spend at the UniShop on anything you like.

We need:
• 8 to 10 domestic (undergraduate/postgraduate coursework) students,
• 4 to 5 international students, and
• 4 to 5 research students
from any faculty!

If you fall into one of the above categories and are interested in participating, please send an e-mail to [email protected] or call 9999 9999.


Appendix E

Information Sheet

INFORMATION SHEET Web Site Evaluation

Project Description
The primary objective of this project is to evaluate the functionality, usability and structure/navigation of the first prototype of the new University web site. The evaluation is designed to:
• Gauge how well the web site meets the needs of the users;
• Determine how the users feel about the web site;
• Identify specific problems and difficulties experienced by users while using the web site;
• Highlight areas for improvement in the web site;
• Draw attention to areas that require further investigation and evaluation.
The project involves participants using the web site to find out general information about the University. Each participant will carry out a number of tasks using the web site.

Evaluation Process
In order to establish the participants’ background and level of computer experience for analysis purposes, the evaluation process will begin with a pre-test questionnaire about your background and general usage of computers and the Internet. The questionnaire is designed in such a way that the data which is collected will not allow anyone to identify you as an individual. You are not required to disclose any personal information (including your name, address, phone number, etc.). Following the pre-test questionnaire, you will be provided with a set of task scenarios to complete using the web site. This will be recorded using video cameras and recording equipment which videotapes computer screens. There are two cameras in the room. One will record your facial expressions and the other is focused on the keyboard and mouse you are using. The cameras can also capture any comments you make. In addition to the two cameras, participants will have their computer screens videotaped during their use of the web site so that detailed notes can be made about the patterns of use.

Please rest assured that we are not testing you or your ability to use a computer. The purpose of this evaluation is to assess the web site in order to determine how well it suits your needs.


During this process, we will be available to answer questions about the procedure and deal with equipment problems. However, we are not able to answer questions about how to use the web site. If it helps, pretend that you are logged in from home. If you feel that you are unable to complete a task scenario after several attempts, do not hesitate to indicate this to us. Remember, we are testing the web site, and not you! It is very useful for us to know what you are thinking about the web site while you are using it. If you feel comfortable doing it, we would like you to think aloud. Feel free to make comments and ask questions out loud, although we probably won’t answer them unless you need help interpreting our intentions on the task scenarios. Upon completing the task scenarios, you will be asked to complete a questionnaire indicating your satisfaction with the performance of the web site, and be given the opportunity to comment briefly on your experience in an interview. The entire process will take approximately 45 to 60 minutes.

Your Rights
Your rights, as a participant in this project, are guaranteed at all times and include the following:

• Your participation in this project is entirely voluntary and you are free to decline to participate. You can withdraw from the project at any time. Your decision not to participate or to withdraw your consent will not affect your treatment or your relationship with the Chief Investigator, the University or the Owner of the web site who has requested the evaluation, in any way.

• The Owner of the web site who has requested this evaluation will not have access to any individual data or recordings which result from this project. This data is classified as confidential and private. Under no circumstances will the Owner be permitted to attend or observe any individual session. This is in order to ensure your privacy and anonymity, and, consequently, address any potential dependency relationship between the Owner and participants.

• All the data collected from the evaluation will be stored securely for a period of 5 (five) years, as per University policy, and maintained in full confidence at all times. Upon expiry of the stipulated secure storage period, the data will be destroyed. As a participant in the project, you have the right to access and view data collected and stored about or from you at any time.

• With the exception of the computer screens, none of the questionnaire data, audio or video recordings will be made public at any time. Only certain researchers associated with the project will have access to the recordings and then only for analysis and research purposes. The Owner of the web site will not be able to access or view any individual data. Furthermore, the data collected will not be sold or made available for commercial purposes at any time.

• Data collected from the evaluation will not allow any person, including the Owner of the web site, to identify you as an individual. The questionnaire and video data will be stored in separate locations to protect your privacy. You are not required to disclose any personal information which could potentially identify you. Instead, each participant will be allocated a random Participant ID.

• The project findings will be presented to the Owner of the web site who has requested the evaluation. The findings may also be presented at academic conferences or published in research journals. The project findings will be presented in summarised form only, therefore ensuring your anonymity and privacy at all times. Individual data will only be made available to researchers directly associated with the project who are authorised to carry out the data analysis for research purposes.

If you would like to discuss this research further please contact the Chief Investigator on […]. This research is being undertaken as per Ethics Clearance No. HE02/021. If you have questions about the ethical conduct of this research, please phone the Secretary, Human Research Ethics Committee on (02) 4221 xxxx.


Appendix F

Consent Form

CONSENT FORM

Web Site Evaluation

To indicate your consent to participate in the project outlined above, please complete this form by writing your name in BLOCK letters in the space provided, and signing and dating at the bottom. Please return this form to the project facilitator upon completion.

Thank you.

I, ........................................................................ have read and understood the Information Sheet, and consent to participate in the web site evaluation project as it has been described to me in the Information Sheet. I understand that the data collected will be used to evaluate the usefulness and effectiveness of the system and to identify potential problems and highlight parts of the system which require improvement. I consent to the data being used in that manner. I am also aware that the data will be published in formal client and research reports but only in summarised form and that my privacy will be assured at all times.

Signed: .......................................................................
Date: ....…../…...../…………......

I, ........................................................................ hereby confirm that I have received a $20.00 Gift Voucher as payment for my participation in this project and that I will make no further claims against the Chief Investigator or any other party associated with the project.

Signed: .......................................................................
Date: ....…../…...../…………......

Gift Voucher # …………..

Participant ID:


Appendix G

Profile of Participants – Traditional Usability Testing

Legend (based on pre-test questionnaire in Appendix B)

Gender: 1 = Female; 2 = Male
Age: 1 = Under 21; 2 = 21 to 25; 3 = 26 to 35; 4 = 36 to 45; 5 = 46 to 55; 6 = Over 55
Education: 1 = Primary school; 2 = High school; 3 = Higher School Certificate (HSC); 4 = Apprenticeship or trade (TAFE) qualification; 5 = Bachelor degree; 6 = Master degree; 7 = PhD
Occupation: S = Student
Faculty: A = Arts; C = Commerce; CA = Creative Arts; E = Engineering; Ed = Education; I = Informatics; L = Law; S = Science
Language: 1 = English; 2 = Other
Own computer: 1 = Yes; 2 = No
Using computer: 1 = Less than a year; 2 = 1 – 2 years; 3 = 3 – 5 years; 4 = 6 – 10 years; 5 = More than 10 years; 6 = I have never used a computer
Average use: 1 = Everyday; 2 = 5 – 6 times per week; 3 = 3 – 4 times per week; 4 = 1 – 2 times per week; 5 = Less than once a week; 6 = Never


Computer type: 1 = PC/IBM; 2 = Apple
Use for (Surfing; E-mail; Word processing; Games; Education; Databases; Multimedia; Programming; Special software; Special admin; Other): 1 = Option ticked; 0 = Option not ticked
Using Internet: 1 = Less than a year; 2 = 2 – 4 years; 3 = 5 – 7 years; 4 = 8 – 10 years; 5 = More than 10 years
Average use: 1 = Everyday; 2 = 5 – 6 times per week; 3 = 3 – 4 times per week; 4 = 1 – 2 times per week; 5 = Less than once a week; 6 = Never
Surfing time: 1 = 1 – 5 hours per week; 2 = 6 – 10 hours per week; 3 = 11 – 15 hours per week; 4 = 16 – 20 hours per week; 5 = 21 – 25 hours per week; 6 = More than 25 hours per week; 7 = I do not use the Internet
Ability Rating: 1 = Beginner/novice; 2 = Mediocre/average; 3 = Fairly Competent; 4 = Expert
Use for (Surfing; Personal info; Work info; E-mail; Downloads; Online chat; Shopping; Other): 1 = Option ticked; 0 = Option not ticked
Browser: 1 = Netscape Navigator; 2 = Internet Explorer; 3 = Other
Access: 1 = Home; 2 = Work; 3 = School/TAFE/university; 4 = Internet café; 5 = Other
Used uni web site: 1 = Yes; 2 = No
How often: 1 = Everyday; 2 = 3 – 4 times per week; 3 = 1 – 2 times per week; 4 = 5 – 6 times per month; 5 = 3 – 4 times per month; 6 = Less than once a month; 7 = Never


Domestic Students

PARTICIPANTS’ RESPONSES (participants 1 to 10, in order)
Gender: 1 2 1 2 1 1 2 1 1 1
Age: 4 5 2 2 1 1 4 1 1 6
Education: 2 3 3 3 3 3 3 3 3 3
Occupation: S S S S S S S S S S
Faculty: Ed I A I A/C CA L/C A/S S A
Language: 1 1 1 2 1 1 1 1 1 1
Own computer: 1 1 1 1 1 1 1 1 1 1
Using computer: 4 5 4 5 3 3 5 4 5 3
Average use: 2 1 2 1 1 1 1 1 1 1
Computer type: 1 1 1 1 1 1 1 1 1 1
Use for Surfing: 1 1 1 1 0 1 1 1 1 1
E-mail: 1 1 1 1 1 1 1 1 1 1
Word processing: 1 1 1 1 1 1 1 1 1 1
Games: 0 1 0 1 0 0 0 1 1 0
Education: 0 1 1 1 0 0 1 0 1 0
Databases: 0 1 0 0 1 1 1 1 0 0
Multimedia: 0 1 0 1 0 0 0 1 0 0
Programming: 0 1 0 1 0 0 0 0 1 0
Special software: 0 0 1 1 0 1 0 0 0 0
Special admin: 0 0 1 0 1 0 0 0 0 0
Other: 0 1 0 0 0 0 0 0 0 0
Using Internet: 2 5 3 3 2 3 2 3 4 2
Average use: 2 1 2 1 3 1 2 2 1 1
Surfing time: 1 2 2 3 1 3 1 1 1 1
Ability rating: 2 3 3 3 2 3 3 2 3 2
Use for Surfing: 0 1 0 1 0 1 1 1 0 1
Personal info: 1 1 1 1 1 1 1 1 1 1
Work info: 0 0 1 0 0 0 1 1 0 0
E-mail: 1 1 1 1 1 1 1 1 1 1
Downloads: 0 1 1 1 0 1 1 1 1 0
Online chat: 0 0 0 0 0 1 0 0 0 0
Shopping: 0 0 0 0 0 0 0 0 0 0
Other: 0 0 0 0 0 0 0 0 0 0
Browser: 2 2 2 2 2 2 2 1 2 2
Access: 1 1 1 1 2 3 3 3 1 1
Used uni site: 1 1 1 1 1 1 1 1 1 1
How often: 4 3 2 2 1 1 2 1 2 5


International Students

PARTICIPANTS’ RESPONSES (participants 1 to 3, in order)
Gender: 2 1 2
Age: 3 3 2
Education: 6 5 5
Occupation: S S S
Faculty: C C C
Language: 2 2 2
Own computer: 1 1 1
Using computer: 5 3 3
Average use: 1 1 1
Computer type: 1 1 1
Use for Surfing: 1 1 1
E-mail: 1 1 1
Word processing: 1 1 1
Games: 1 0 1
Education: 1 0 1
Databases: 1 1 1
Multimedia: 1 1 0
Programming: 1 0 1
Special software: 0 1 0
Special admin: 0 1 0
Other: 0 0 0
Using Internet: 5 2 2
Average use: 1 1 1
Surfing time: 6 1 2
Ability rating: 3 2 2
Use for Surfing: 1 1 1
Personal info: 1 1 1
Work info: 1 1 0
E-mail: 1 1 1
Downloads: 1 1 1
Online chat: 1 1 0
Shopping: 0 0 0
Other: 0 0 0
Browser: 1 2 2
Access: 2 3 1
Used uni site: 1 1 1
How often: 1 1 1


Research Students

PARTICIPANTS’ RESPONSES (participants 1 and 2, in order)
Gender: 1 2
Age: 2 2
Education: 6 5
Occupation: S S
Faculty: C E
Language: 1 1
Own computer: 1 1
Using computer: 5 3
Average use: 1 1
Computer type: 1 1
Use for Surfing: 1 1
E-mail: 1 1
Word processing: 1 1
Games: 1 1
Education: 0 1
Databases: 1 1
Multimedia: 0 1
Programming: 1 1
Special software: 1 1
Special admin: 0 0
Other: 0 0
Using Internet: 2 3
Average use: 1 1
Surfing time: 6 2
Ability rating: 4 3
Use for Surfing: 1 0
Personal info: 1 1
Work info: 1 1
E-mail: 1 1
Downloads: 1 1
Online chat: 1 0
Shopping: 0 0
Other: 1 0
Browser: 2 2
Access: 2 1
Used uni site: 1 1
How often: 1 1
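The coded rows above are read against the legend at the start of this appendix. As a minimal illustration only (this sketch is not part of the original thesis materials and covers just three of the legend's questions), the decoding works as follows in Python:

```python
# Minimal sketch: decode a participant's coded answers using part of the
# Appendix G legend. Only three questions are included here for brevity.
legend = {
    "Gender": {1: "Female", 2: "Male"},
    "Age": {1: "Under 21", 2: "21 to 25", 3: "26 to 35",
            4: "36 to 45", 5: "46 to 55", 6: "Over 55"},
    "Own computer": {1: "Yes", 2: "No"},
}

# Coded answers of domestic student 1, copied from the table above.
participant_1 = {"Gender": 1, "Age": 4, "Own computer": 1}

for question, code in participant_1.items():
    print(f"{question}: {legend[question][code]}")
# Gender: Female
# Age: 36 to 45
# Own computer: Yes
```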


Appendix H

Quantitative Data

Domestic Students

Time taken to complete a task (min:sec), participants 1 to 10, followed by the mean:
Task 1: 1:49 7:04 1:38 2:55 1:18 1:09 1:09 2:16 1:16 6:48 | Mean 2:44
Task 2: 7:39 4:58 4:11 0:36 3:27 4:28 6:10 2:22 2:06 3:15 | Mean 3:55
Task 3: 6:21 4:03 3:08 1:21 2:08 4:15 1:49 0:36 0:26 1:41 | Mean 2:35
Task 4: 1:01 3:35 2:03 3:41 0:55 1:21 5:41 0:36 2:37 1:22 | Mean 2:17
Task 5: 1:11 0:39 1:23 1:57 0:11 3:58 0:37 0:11 0:21 4:38 | Mean 1:31
Task 6: 2:01 0:55 1:56 0:21 0:38 3:56 3:09 0:14 0:19 1:51 | Mean 1:32
Task 7: 1:42 4:53 1:57 0:38 1:08 2:18 2:37 0:33 0:26 0:49 | Mean 1:42

Number of hyperlinks used, participants 1 to 10, followed by the mean:
Task 1: 3 23 5 13 9 3 4 14 4 9 | Mean 9
Task 2: 14 10 5 3 13 15 17 10 12 4 | Mean 10
Task 3: 7 9 6 3 7 10 9 4 2 2 | Mean 6
Task 4: 7 9 3 17 3 4 11 3 14 4 | Mean 8
Task 5: 3 2 6 18 2 8 3 2 2 12 | Mean 6
Task 6: 6 2 6 2 4 8 9 2 2 2 | Mean 4
Task 7: 4 5 2 4 4 7 3 2 2 2 | Mean 4

Number of incorrect hyperlinks used, participants 1 to 10, followed by the mean:
Task 1: 1 4 2 7 7 0 2 11 2 5 | Mean 4
Task 2: 5 1 5 3 8 12 11 8 4 2 | Mean 6
Task 3: 4 9 3 0 5 7 7 0 0 0 | Mean 4
Task 4: 3 4 0 10 0 2 8 0 10 1 | Mean 4
Task 5: 0 0 3 9 0 4 0 0 0 8 | Mean 3
Task 6: 3 0 4 0 1 5 6 0 0 0 | Mean 2
Task 7: 1 2 2 2 3 2 0 0 0 0 | Mean 1


Number of participants who completed the task successfully (out of 10):
Task 1: 8
Task 2: 3
Task 3: 5
Task 4: 8
Task 5: 8
Task 6: 9
Task 7: 8

International Students

Time taken to complete a task (min:sec), participants 1 to 3, followed by the mean:
Task 1: 2:53 4:37 1:52 | Mean 3:07
Task 2: 2:25 3:28 5:19 | Mean 3:44
Task 3: 2:31 7:13 2:59 | Mean 4:14
Task 4: 6:21 4:52 6:22 | Mean 5:52
Task 5: 2:02 6:28 1:49 | Mean 3:26
Task 6: 3:40 2:21 1:57 | Mean 2:39

Number of hyperlinks used, participants 1 to 3, followed by the mean:
Task 1: 8 6 4 | Mean 6
Task 2: 4 6 21 | Mean 10
Task 3: 4 15 9 | Mean 9
Task 4: 7 8 16 | Mean 10
Task 5: 6 15 3 | Mean 8
Task 6: 5 8 3 | Mean 5

Number of incorrect hyperlinks used, participants 1 to 3, followed by the mean:
Task 1: 2 6 1 | Mean 3
Task 2: 0 6 12 | Mean 6
Task 3: 0 14 4 | Mean 6
Task 4: 1 7 11 | Mean 6
Task 5: 1 14 0 | Mean 5
Task 6: 1 8 0 | Mean 3


Number of participants who completed the task successfully (out of 3):
Task 1: 1
Task 2: 1
Task 3: 1
Task 4: 1
Task 5: 2
Task 6: 2

Research Students

Time taken to complete a task (min:sec), participants 1 and 2, followed by the mean:
Task 1: 2:45 2:46 | Mean 2:46
Task 2: 0:52 0:27 | Mean 0:40
Task 3: 0:34 0:16 | Mean 0:25
Task 4: 0:09 0:39 | Mean 0:24
Task 5: 2:54 2:43 | Mean 2:49
Task 6: 0:40 0:27 | Mean 0:34
Task 7: 1:52 0:17 | Mean 1:05

Number of hyperlinks used, participants 1 and 2, followed by the mean:
Task 1: 17 9 | Mean 13
Task 2: 5 4 | Mean 5
Task 3: 4 3 | Mean 4
Task 4: 3 3 | Mean 3
Task 5: 10 11 | Mean 11
Task 6: 2 2 | Mean 2
Task 7: 8 2 | Mean 5

Number of incorrect hyperlinks used, participants 1 and 2, followed by the mean:
Task 1: 11 1 | Mean 6
Task 2: 0 0 | Mean 0
Task 3: 0 0 | Mean 0
Task 4: 0 0 | Mean 0
Task 5: 4 2 | Mean 3
Task 6: 0 0 | Mean 0
Task 7: 2 0 | Mean 1


Number of participants who completed the task successfully (out of 2):
Task 1: 2
Task 2: 2
Task 3: 2
Task 4: 2
Task 5: 2
Task 6: 2
Task 7: 2
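The means reported in the tables in this appendix appear to be arithmetic means of the per-participant values, with times rounded to the nearest second. As a minimal sketch only (this code is not part of the original study; the values are copied from the Task 1 row for domestic students), the mm:ss means can be reproduced as follows:

```python
# Minimal sketch: reproduce a mean task time from mm:ss values.
# The times below are the Task 1 values for the ten domestic students.
times = ["1:49", "7:04", "1:38", "2:55", "1:18", "1:09", "1:09", "2:16", "1:16", "6:48"]

def to_seconds(t: str) -> int:
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)

mean_seconds = sum(to_seconds(t) for t in times) / len(times)
# Format back to mm:ss, rounding to the nearest second.
print(f"{int(mean_seconds // 60)}:{round(mean_seconds % 60):02d}")  # prints 2:44
```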


Appendix I

Post-Test Questionnaire Data Summary

Legend (based on post-test questionnaire in Appendix C)

1 = Strongly Disagree; 2 = Disagree; 3 = Neutral; 4 = Agree; 5 = Strongly Agree

Domestic Students

Statement: responses of participants 1 to 10, followed by the median
The web site was easy to use: 4 3 4 4 4 4 4 4 4 4 | Median 4
The web site does all the things I would like it to do: 3 4 4 4 4 3 5 4 4 4 | Median 4
I find the information on this web site useful: 4 4 5 4 5 4 4 4 4 5 | Median 4
Learning to use the web site would be/was easy for me: 3 3 4 3 5 5 5 4 4 4 | Median 4
This web site contained the information I required: 4 4 5 4 5 3 5 4 4 4 | Median 4
I find it easy to locate the information I need on this web site: 4 3 4 3 4 3 4 4 4 4 | Median 4
I am able to find the information I need quickly using this web site: 3 3 4 3 4 3 4 4 3 4 | Median 3.5
My interaction with this web site was clear and understandable: 4 3 5 4 4 3 5 4 4 4 | Median 4
I find this web site flexible to use: 4 3 5 4 5 4 5 4 4 4 | Median 4
It would be easy for me to become skilful at using this web site: 4 4 5 4 5 5 5 4 5 5 | Median 5
Overall, I am satisfied with how easy it is to use this web site: 4 4 5 4 5 4 4 4 4 4 | Median 4
The language used on this web site is simple and easy to understand: 4 4 5 3 5 3 4 4 4 4 | Median 4
The language used on this web site is appropriate for someone like me: 4 4 5 4 5 4 4 4 3 4 | Median 4
The language on this web site is consistent: 5 4 5 5 5 5 5 4 5 4 | Median 5
I feel comfortable using this web site: 4 4 5 4 4 4 4 4 4 4 | Median 4
Whenever I make a mistake using the web site, I am able to recover easily and quickly: 4 4 5 5 4 5 5 4 5 4 | Median 4.5
The organisation of information on this web site is clear: 4 2 4 3 4 3 5 4 3 4 | Median 4
It is easy to get lost using this web site: 3 3 2 3 2 1 1 2 2 1 | Median 2
The links on this web site are clear and understandable: 4 3 3 4 4 4 5 4 4 4 | Median 4
I am able to navigate through this web site easily: 4 3 3 3 4 4 5 4 4 4 | Median 4
I always knew where I was at this web site: 3 4 2 4 4 3 5 3 2 4 | Median 3.5
This web site has all the information, functions and capabilities I expect it to have: 3 4 4 3 4 5 5 4 4 4 | Median 4
I found using this web site was a frustrating experience: 2 2 2 2 2 2 1 1 2 1 | Median 2
This web site does all the things I would like it to do: 3 4 4 3 4 3 4 4 4 4 | Median 4
This web site is designed for all types and levels of users: 4 2 4 3 3 2 4 4 2 3 | Median 3
Overall, I am satisfied with this web site: 4 3 5 4 4 4 4 4 4 4 | Median 4

International Students

Statement: responses of participants 1 to 3, followed by the median
The web site was easy to use: 4 4 4 | Median 4
The web site does all the things I would like it to do: 3 3 4 | Median 3
I find the information on this web site useful: 3 4 3 | Median 3
Learning to use the web site would be/was easy for me: 5 5 3 | Median 5
This web site contained the information I required: 3 3 2 | Median 3
I find it easy to locate the information I need on this web site: 3 2 3 | Median 3
I am able to find the information I need quickly using this web site: 3 3 1 | Median 3
My interaction with this web site was clear and understandable: 4 4 4 | Median 4
I find this web site flexible to use: 4 3 4 | Median 4
It would be easy for me to become skilful at using this web site: 5 5 3 | Median 5
Overall, I am satisfied with how easy it is to use this web site: 4 4 4 | Median 4
The language used on this web site is simple and easy to understand: 4 5 4 | Median 4
The language used on this web site is appropriate for someone like me: 4 5 4 | Median 4
The language on this web site is consistent: 4 5 3 | Median 4
I feel comfortable using this web site: 4 5 4 | Median 4
Whenever I make a mistake using the web site, I am able to recover easily and quickly: 4 4 5 | Median 4
The organisation of information on this web site is clear: 4 4 2 | Median 4
It is easy to get lost using this web site: 2 2 2 | Median 2
The links on this web site are clear and understandable: 4 4 3 | Median 4
I am able to navigate through this web site easily: 4 5 4 | Median 4
I always knew where I was at this web site: 4 5 3 | Median 4
This web site has all the information, functions and capabilities I expect it to have: 3 3 2 | Median 3
I found using this web site was a frustrating experience: 2 2 2 | Median 2
This web site does all the things I would like it to do: 3 3 4 | Median 3
This web site is designed for all types and levels of users: 3 4 2 | Median 3
Overall, I am satisfied with this web site: 4 4 3 | Median 4

Research Students

Statement: responses of participants 1 and 2, followed by the median
The web site was easy to use: 3 4 | Median 3.5
The web site does all the things I would like it to do: 4 4 | Median 4
I find the information on this web site useful: 4 4 | Median 4
Learning to use the web site would be/was easy for me: 4 5 | Median 4.5
This web site contained the information I required: 4 4 | Median 4
I find it easy to locate the information I need on this web site: 2 5 | Median 3.5
I am able to find the information I need quickly using this web site: 1 5 | Median 3
My interaction with this web site was clear and understandable: 3 5 | Median 4
I find this web site flexible to use: 2 5 | Median 3.5
It would be easy for me to become skilful at using this web site: 5 4 | Median 4.5
Overall, I am satisfied with how easy it is to use this web site: 2 5 | Median 3.5
The language used on this web site is simple and easy to understand: 2 5 | Median 3.5
The language used on this web site is appropriate for someone like me: 3 5 | Median 4
The language on this web site is consistent: 2 4 | Median 3
I feel comfortable using this web site: 2 5 | Median 3.5
Whenever I make a mistake using the web site, I am able to recover easily and quickly: 2 5 | Median 3.5
The organisation of information on this web site is clear: 2 4 | Median 3
It is easy to get lost using this web site: 5 2 | Median 3.5
The links on this web site are clear and understandable: 3 4 | Median 3.5
I am able to navigate through this web site easily: 2 5 | Median 3.5
I always knew where I was at this web site: 1 5 | Median 3
This web site has all the information, functions and capabilities I expect it to have: 1 4 | Median 2.5
I found using this web site was a frustrating experience: 4 2 | Median 3
This web site does all the things I would like it to do: 2 4 | Median 3
This web site is designed for all types and levels of users: 2 4 | Median 3
Overall, I am satisfied with this web site: 2 5 | Median 3.5
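The medians above are taken over the individual Likert responses (1 = Strongly Disagree to 5 = Strongly Agree). As a minimal sketch only (not part of the original analysis), the following reproduces the 3.5 reported for the domestic students' responses to "I am able to find the information I need quickly using this web site":

```python
from statistics import median

# Responses of the ten domestic students to
# "I am able to find the information I need quickly using this web site"
responses = [3, 3, 4, 3, 4, 3, 4, 4, 3, 4]

print(median(responses))  # prints 3.5 (five 3s and five 4s)
```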


Appendix J

Usability Problems – Traditional Usability Testing

Usability problems identified, each followed by its severity rating (as per Nielsen (1993)):

Students relied heavily on the search engine and site index to find information. The performance of the former was not satisfactory.
Severity: High

Students thought the Current Students link was useful, but ONLY after they had found it. Those students who never used it were less satisfied with the web site than those who did use it. Students who did not use it cited the following reasons in the interview:
• They did not find it meaningful, or
• They did not have a positive view of the link on the current site and their expectations were negative, or
• They expected this link to contain general information only.
Severity: High

The Refund policy is not available on the International Students web site.
Severity: High

Students experienced some difficulty understanding terminology used on the web site because it did not reflect a student point of view. Some examples:
• Policy directory was not a term meaningful to most students, especially international students who did not think to look there for the rules on the use of foreign dictionaries in exams;
• Campuses was associated with accommodation (because one of the residence halls is called Campus East);
• “Learning development” did not reflect its purpose (helping students with study).
Severity: Medium

Students are not aware of what is “behind a link”, and they click on incorrect links because they have to guess. For example:
• Students used the Teaching and Learning link on the home page often, expecting it to contain information relevant to them, or
• Students looked for referencing guides under Research because they associated this with “researching an essay”.
Severity: Medium

It is necessary to add links within and between different sections of the On-line calendar and Policy Directory. The structure of these is confusing to students because of information duplication and difficulty navigating to the correct information. It is easy to “get lost” in this part of the site.
Severity: Medium

Students expect certain information to be located in places where it is currently not. For example:
• The “Getting to the University” page does not have information about transport options, or
• Students looked up transport and scholarship information under Prospective students because that is where they expected to find it.
Severity: Medium

There is a great deal of information duplication, as well as information which is out of date. For example:
• Accommodation information can be found on several different pages, or
• The What’s on link was a week out of date at one point.
Severity: Medium

Students commented that a Courses link should be on the home page.
Severity: Medium

Career opportunities does not contain information about careers for students (which is what students expected) but job vacancies at the University.
Severity: Medium

The use of the term Graduates caused some confusion as students contemplated whether it referred to a graduate degree, a person who is about to graduate or a person who has graduated.
Severity: Medium

Students do not expect to find a lot of information on the University’s web site, and are therefore not aware that it is there. For example, UniLearning is a service students are not aware of, so they did not search for it.
Severity: Low

The Scholarships page requires re-linking so that scholarships for current students (instead of commencing students) are linked to from the current students page and those for commencing students are linked to from the prospective students page.
Severity: Low

Acronyms need to be added: students are used to SOLS, not Student Online Services.
Severity: Low

A Clubs and societies link should be added to the Current students page, and possibly the Prospective students page as well, because that is where students expected to find the information.
Severity: Low

News and Events is duplicated on the home page (once as a link and then as a feature article concerning a recent news item or event). Also, the News and events link should contain more student-oriented information (e.g. UniMovies, which was the most difficult thing for students to find on the site).
Severity: Low

Students commented that the Career Services web page is overloaded with too many links.
Severity: Low

Students associate certain services with buildings (e.g. special consideration applications through Administration, accommodation and part-time/casual jobs through Student Services in Bldg 17). They suggested that this should be addressed in some form on the web site.
Severity: Low

Students are not aware that certain images and maps can be clicked on (have “hotspots”), while they expect others that do not have this functionality to have it.
Severity: Low

International Students found International Student Advisors and Additional Info for International Students irrelevant to their needs.
Severity: Low


Appendix K

DUEM Recruitment Poster

What do you think of the new Uni web site?

Let us know by taking part in an evaluation of the web site!

The new Uni web site was launched earlier this year. We want to know what you think about it! We need students to take part in the evaluation of the web site to let us know whether they find the web site useful and easy to use. ALL students on campus are welcome to participate.

If you are interested in taking part, or would simply like to find out more about the project, please send an e-mail to [email protected] or call 9999 9999.


Appendix L

Distributed Usability Problems

Activity 1: Enrolling in subjects/courses

Information about enrolment procedure (e.g. rules) is not available on the online student enrolment system

Information about subjects and courses is not integrated into the online student enrolment system

Unable to change/withdraw from courses/degrees using online student enrolment system

Information about relevant academic advisers and their contact information is not available on the web site

Information about pre-requisite waivers and other special circumstances is only in the course handbook, which is not integrated into the online student enrolment system

System does not provide a warning when a student enrols in overlapping subjects

Not all subjects use the student online enrolment system for enrolling in tutorials

Activity 2: Taking subjects/Doing research

Subject information is not integrated with subject content (on commercial online learning software)

Subject materials are provided via different means (ranging from online to hard copy) and no information is available from a single location about what means each subject uses

Inability to access a subject website from the student online enrolment system

Some readings specified on subject web sites are not linked to the Library readings

Academic staff who have designed their own site (instead of using the commercial online learning system) are unable to link the site to the University web site or the online student enrolment system

Research materials are provided via different means and each faculty/supervisor has its own requirements for standard research materials (e.g. writing a literature review)


Students have no means of recording meetings with research supervisors and tracking their progress

Activity 3: Doing assignments/Studying for and taking tests and exams/Writing a Thesis

Information about subject/assignment rules, policies and procedures is not integrated into the online student learning system

Certain subject policies vary across faculties, however, this is not reflected on the web site

Application for special consideration is not integrated into the commercial online learning system

The online student enrolment system and the commercial online learning system need to be integrated

There is no link to the University’s web site after logging out of the web-based e-mail system (the system cited by most students as being their default ‘home page’ and the system they most frequently log in to on a daily basis)

Students must access different sources to find out their marks for assessment tasks

Activity 4: Socialising

It is difficult to find and access information about social activities and events

Students living in the residence halls use the residence hall’s home page to find out about social activities and events. However, these home pages are not accessible from the University’s web site

Most clubs and societies have their own web sites, but these are not accessible from the University’s home page

UniMovies (the most popular club) is difficult to find

Newsletters about social activities are sent to students via e-mail if they sign up, however the information in the e-mail is not replicated on the University’s web site