Cardiovascular physiology predicts learning effects in a serious game activity


Ben Cowley a,b,*, Niklas Ravaja a,c,d, Tuija Heikura e

a Centre for Knowledge and Innovation Research, School of Economics, Aalto University, Helsinki, Finland
b Cognitive Science Unit, Department of Behavioural Sciences, University of Helsinki, Finland
c Department of Social Research, University of Helsinki, Finland
d Helsinki Institute for Information Technology, University of Helsinki, Finland
e Finnish Science Park Association TEKEL, Finland

Article info

Article history: Received 4 November 2011; Received in revised form 20 July 2012; Accepted 21 July 2012

Keywords: Serious game; Peacemaker; Learning; Assessment; Heart rate variability; Facial electromyography; Bloom taxonomy

Abstract

In a study on learning in serious games, 45 players were tested for topic-comprehension by a questionnaire administered before and after solo-playing of the serious game Peacemaker (Impact Games 2007), during which their psychophysiological signals were measured. Play lasted for 1 h, with a break at half time. The questionnaire was divided into two parts, with fixed and open questions respectively. We use the Bloom taxonomy to distinguish levels of difficulty in demonstrated learning – with the first five levels assigned to fixed questions – and gain scores to measure actual value of demonstrated learning. We present the analysis of the psychophysiology recorded during game play and its relationship to learning scores. The Heart Rate Variability (HRV) (an indicator of mental workload) and interaction between HRV and electromyography of Orbicularis Oculi (an indicator of positive affect) significantly predicted the learning results at certain levels of difficulty. Results indicate that increased working-memory related mental workload in support of on-task attention aids learning at these levels.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

The potential of “serious games” as new tools for learning is recognised as an exciting possibility for Technology Enhanced Learning (TEL), but there remain barriers to achieving all possible benefits, such as how to find the optimal balance between entertainment and education. Part of the problem lies with insufficient understanding of the embodied experience of learning-while-having-fun.

Squire advocates that educators designing games should shift the focus from delivering content to designing experiences “in which participants learn through a grammar of doing and being” (Squire, 2006). In the ideal case, when learning through playing such games, the learner undergoes an engaging experience that contributes to the development of her knowledge or competences (Kolb, 1984). In the fields of game theory (Koster, 2005), learning theory (Gee, 2003), even animal behaviour (Groos, 1898) it has been observed that the subjective experiences of learning and play are intrinsically (though not necessarily) linked. Thus the players’ psychological and physiological experience is of central interest, motivating the need to objectively measure this subjectivity.

We present results from an experiment designed to address this knowledge gap. We tested learning outcomes from a serious game activity using psychophysiological methods, with the motivation to improve understanding of how learning from a serious game affects the player. Psychophysiological research is defined as using physiological signals to study psychological phenomena, in this case, the player’s experience during serious game play/learning. In this respect, it is important to use physiological signals for which we have some theoretical justification to link to learning. That is, the entire action of the physiology of the player is so complex that only by focussing on specific features of signals can sensible interpretations be made if a relationship is found. Two features of the physiology1 which are particularly interesting in respect of learning are the cardiac measures and the facial signs of valence – i.e. respectively, bodily activation and pleasure or displeasure in response to the experimental activity. These may act as a sufficient causal explanation (Peters, 1960) of observed learning.

* Corresponding author. Address: Centre for Knowledge and Innovation Research, School of Economics, Aalto University, PO Box 21250, 00076 Aalto, Helsinki, Finland. Tel.: +358 40 353 8339; fax: +358 9 4313 8391.

E-mail address: [email protected] (B. Cowley).
1 Excluding Electroencephalography (EEG) measures of brain activity. Note that results were discovered with respect to EEG; however this work is still on-going and is complex enough to require treatment as a standalone topic.


Several studies have shown that digital games (i.e. an active coping task) elicit considerable emotional arousal- or stress-related cardiovascular reactivity in terms of Heart Rate (HR) and blood pressure (e.g., Johnston, Anastasiades, & Wood, 1990). Given that digital game play may elicit both arousal and attentional engagement, the dual innervation of the heart may entail interpretative difficulties associated with HR (Ravaja, 2004). That is, HR does not index only emotional arousal (accompanied by sympathetically-mediated HR acceleration) but also attentional engagement (accompanied by parasympathetically-mediated HR deceleration (Turpin, 1986)). However, the previously found convergent relations between HR and arousal during digital game playing suggest that HR covaries primarily with emotional arousal during playing (Ravaja, Saari, Salminen, Laarni, & Kallinen, 2006). Thus we state Hypothesis 1a: as an indicator of underlying emotional arousal, HR will predict learning outcomes from a serious game activity. Additionally, HR variability (HRV) has been associated with mental workload (high mental workload results in a decrease in HRV), giving Hypothesis 1b: as an indicator of mental workload, HRV will predict learning outcomes from a serious game activity.2

Facial Electromyography (fEMG), direct measurement of the electrical activity associated with facial muscle contractions (Tassinary & Cacioppo, 2000), has the potential to record the unconscious emotional reactions of the player to interaction with in-game stimuli. More specifically, recording at the sites of Zygomaticus major (cheek) and Corrugator supercilii (brow) can index positive and negative valence, respectively (Lang, Greenwald, Bradley, & Hamm, 1993; Ravaja, Saari, Kallinen, & Laarni, 2006; Witvliet & Vrana, 1995). In addition, recording at the Orbicularis oculi (periocular) can index high arousal positive valence (Ekman, Davidson, & Friesen, 1990). We may suppose that emotional engagement in the game play will influence learning, hence Hypothesis 2: stronger expressions of valence as indexed by activation in one or more fEMG channels will predict learning scores. Further, because several studies have shown that Orbicularis oculi activity is particularly high during pleasurable high-arousal emotions (Ravaja, Saari, Kallinen, et al., 2006; Witvliet & Vrana, 1995), and we would expect to see the physiology react to stimuli in a cohesive way, we state Hypothesis 2a: interaction between HR and Orbicularis oculi will predict learning scores; and Hypothesis 2b: interaction between HRV and Orbicularis oculi will predict learning scores.

Participants were recruited to play 1 h of the Peacemaker serious game (Impact Games 2007), see Fig. 1. This game attempts to teach the player about the nature and causes of the Israel–Palestine conflict, and has seen considerable success in its aim (Burak, Keylor, & Sweeney, 2005).

This work is a part of the European project TARGET (Transformative, Adaptive, Responsive and enGaging EnvironmenT, IST 231717), which aims to reduce time to competence by providing life-like learning experiences through educational games. It was the final goal of this study to advance the state of the art in support of this project.

In the next section we describe the state of the art in educational games and measurement of such activity using methods of psychophysiology. Section 3 details the study methodology, including sub-sections to outline the experiment procedure, the relevant aspects of this game and its method of choice, the assessment questions and psychophysiological methods. Section 4 details our results, Section 5 their interpretation, and Section 6 gives conclusions and future directions.

2. Background

The basic premise of psychophysiological methods for TEL evaluation is that the player cannot give inaccurate physical signals (discounting acquisition issues), and the acquisition of signals is non-intrusive, freeing the player’s attention. Psychophysiology-enabled evaluation can then take place alongside other forms, such as self-report, entirely consistently. This has the potential to improve on-task attention during the protocol and also reduce reactivity (also known as the Hawthorne effect).

2.1. Psychophysiological methods for studying game players

We present some of the most relevant literature in the field of psychophysiology-enabled evaluation, but for a thorough review see Kivikangas et al. (2011). As suggested by the recent EU-funded NEST project FUGA (“The fun of gaming: Measuring the human experience of media enjoyment”3), psychophysiological measurements provide an innovative method for assessing digital games. That is, they can be applied when designing new digital games for different purposes (e.g., entertainment, education, therapy) (Mandryk & Atkins, 2007; Ravaja, Saari, Kallinen, et al., 2006; Ravaja, Saari, Salminen, et al., 2006; Ravaja, Saari, Turpeinen, et al., 2006).

Psychophysiological methods are particularly useful for objectively examining experience: because the physiological processes measured are mostly non-voluntary, the measurements are not contaminated by participant answering style, social desirability, interpretations of questionnaire item wording, or limits of their memory. Combined with other methods (e.g., self-report and observational data), the user experiences can be studied with a precision not possible without psychophysiological methods. It is even possible to boost the reliability of self-report methods by reference to the psychophysiological data. The limitation of this method in general is that, although all mental phenomena are based on the physical, in practice the inference is limited by the extent of the current knowledge and the signal-to-noise ratio of non-invasive instruments. Also, physiological processes are typically not related to psychological phenomena with a one-to-one relationship, but a change in physiological activity may be related to several psychological processes and vice versa (Cacioppo, Tassinary, & Berntson, 2000).

A large number of studies have shown that psychophysiological measures (e.g., EDA, HR, fEMG, fronto-cortical asymmetry of Electroencephalography (EEG)) can be used to index emotional, motivational, and cognitive responses to media messages (e.g., video, television, radio and textual messages) and digital games (Ravaja, 2004; Ravaja, Saari, Turpeinen, et al., 2006). Prior research has associated psychophysiological measures (e.g., EEG, facial EMG, EDA, and different cardiac measures, such as heart rate variability) with psychological states and processes, including joy/enthusiasm, frustration, mental stress/cognitive overload, approach/withdrawal motivation, threat vs. challenge appraisals, and attentional processes (Harmon-Jones, 2003; Tomaka, Blascovich, Kelsey, & Leitten, 1993; Witvliet & Vrana, 1995).

2 Note that as this is an exploratory study, we do not propose directed hypotheses – it is plausible to consider learning-triggered physiological activation as either increasing or decreasing with respect to baseline, depending on context.

3 http://project.hkkk.fi/fuga/.


In one set of studies, laboratory experiments were carried out to examine how psychophysiological recordings (i.e. EEG, facial EMG, EDA, cardiac activity) obtained during serious game use predict (short-term) learning outcomes (Chaouachi & Frasson, 2010; Chaouachi, Jraidi, & Frasson, 2011; Heraz, Jraidi, Chaouachi, & Frasson, 2009).

Several authors have suggested that it is advisable to use multiple measures so that differential response patterns (or profiles) can be identified (Cacioppo et al., 2000). There is also evidence suggesting that the synchronisation of activity of different physiological response systems (i.e. response coupling) may reflect the central state of the individual particularly well (Kettunen, Ravaja, Naatanen, & Keltikangas-Jarvinen, 2000).

2.2. Educational games

Player subjective experience and engagement is a key determinant in the potential for learning from games, and as mentioned in the opening, we study it with both objective measurements and experimental manipulation. The experiment design draws on the TEL literature to select a factor of learning/play to manipulate.

Several researchers advocate that learning must involve more than the transmission of knowledge, but must instead encourage and happen through rich contexts that reflect real life learning processes (Lave & Wenger, 1991), suggesting a variation of classroom context to supplement the game. McGinnis, Bustard, Black, and Charles (2008) draw a parallel between the factors describing student engagement and those involved in gameplay, which illustrates the elements necessary for creating game-like engagement in pedagogy. However, learning does not necessarily follow engagement, as argued by Kirschner, Sweller, and Clark (2006) who point out that discovery, problem-based, experiential and enquiry-based techniques are the main tools of games, but all require prior knowledge on the part of the student to evoke learning. A solution is sought in scaffolding the game, i.e. instructional support during learning to reduce cognitive load (O’Neil, Wainess, & Baker, 2005). One approach is described by de Freitas (2006, p. 37):

“In order for learning outcomes to be achieved it is necessary with simulations (and games) to reinforce learning that has taken place through meta-reflection and post exercise consideration. This may be done through replaying the simulation, discussion, and dedicated activities that aim to highlight key aspects of the learning.”

Thus we explored two contrasting ways in which the serious game could be deployed. The contrast involved the presence or absence of interaction between the game-play and a period of reflection/guided group discussion, to address the question: would playing with a reflection period/group discussion predict improved learning scores compared to playing the game alone? We described the results pertaining to this second question in detail previously (Cowley, Heikura & Ravaja, submitted). This design also complements the investigation of psychophysiological results, as described in the next section.

Fig. 1. Screenshot of the Peacemaker game. Players act as a regional leader (Israeli or Palestinian) and play is oriented around strategic management of conflict, taking governmental actions as shown by the menu on the left. Conflict is modelled by factions/stakeholders who each have approval ratings for the player – information can be obtained by clicking on a faction’s icon. ‘Spontaneous’ events are reported as news (marked on the screenshot by reticules), drive the game narrative, and as player approval ratings with a particular faction vary these events become more or less critical (in the screenshot crisis is indicated by the colour of the reticule). Events and player actions combine to drive approval ratings – winning is defined as achieving 100/100 on both Israeli and Palestinian ratings (see bottom left), while losing happens after scoring −50/100 on either.


3. Materials and methods

Our experimental protocol is designed to test hypotheses 1 and 2 with the protocol shown in Fig. 2. As noted above, the conditions were designed to investigate variants of deployment for a serious game – a between-subjects question; these variants interact with the physiological aspect of experience by varying the time-on-task – a within-subjects question. Thus we examine how the continuous psychophysiological data predict learning scores, and also see how the varied deployment affects players physiologically.

We enrolled a random sample of 45 participants (29 males), derived from subscribers to a number of university mailing lists especially relevant to the topics of project management (business students) and conflict resolution (psychology students). All participants gave their informed consent after a two-stage briefing (by email and in person) which allowed opportunities for questions for the experimenters. Background information was also obtained on participants’ age (19–32 years, M = 24.6, SD = 3.5), gender, ethnic background (Finn 35, Latin 4, Russian 4, Chinese 1, Canadian 1), language knowledge, education (41 graduated or studying), religious view (other 2, none 13, atheist/agnostic 14, Christian 16), and computer-game playing frequency (on a scale from ‘1: Not a lot’ to ‘5: A lot’, average score 3.13). Personal connections to Israel or Palestine and prior knowledge of the subject-matter were used as exclusion criteria to prevent bias in the learning process. Participants were randomly divided into two conditions, 20 in condition 1 (13 males) and 25 in condition 2 (16 males), to achieve a specific-effects randomised controlled trial (the condition samples were of unequal size because some data from the psychophysiological recordings in condition 2 was corrupted). Age and gender were balanced between conditions with no significant difference (age: t(43) = 0.0, ns.; gender: t(43) = −0.68, ns.).

Participants first took a questionnaire to assess their knowledge of the topic area, were dressed in measurement electrodes, then engaged in two game play trials, and after play retook the same questionnaire. The precise phases are shown in Fig. 2.

The experiment procedure was divided into six main phases. First the participants answered 41 questions concerning the Israel–Palestine crisis, which usually took them an hour (M = 56.0 min, SD = 17.3 min) – a time that did not significantly vary between conditions (t(43) = 0.85, ns.).

In phase 2 the participants were dressed in the measuring equipment, and seated in an electrically-shielded lab for impedance inspection and game-play. This process took, on average, 49.4 min for condition 1 (SD = 18.0 min) and 100.1 min for condition 2 (SD = 22.1 min). The difference was partially due to the ratio of experimenters to participants – which for the first condition was 2:1 and for the second was 1:1 – and partially due to the increase in technical issues when using 3 sets of equipment as opposed to 1. However this duration difference did not significantly predict learning scores or physiological-recording data.

Next in phase 3 the participants were seated at computers, visually and aurally isolated from one another in condition 2, and played a game tutorial and the first of two 30 min gaming sessions. In each game session, game difficulty was held fixed at the lowest level and players completing the game were asked to play again, to maintain similar experiences for all participants.

The two game sessions were broken by phase 4. For condition 1 this consisted only of answering two quick experiential self-report questionnaires, the Self-Assessment Manikin (Bradley & Lang, 1994) and the Perceived Competence Survey (Guay, Ratelle, & Chanal, 2008). Condition 2 differed from condition 1 by the presence of a reflection period during phase 4: the players were brought into a group to participate in a guided discourse reflecting on their game experience, in addition to completing the self-reports. This discussion was the only point at which participants in condition 2 were not visually and aurally isolated from each other, so as to create a similar playing experience in both conditions. The lead experimenter directed the discussion period so that it remained on topic, encouraging free discussion.

In phase 5 the second game session was played. The sixth and final phase of the experiment (after removing the monitoring equipment) was to answer the 41 questions a second time, taking on average 34.4 min (SD = 12.7 min), again without significant difference in time taken (t(43) = 0.95, ns.).

[Fig. 2 diagram: protocol phases 0–6 for the two groups – G1 (20 subjects, one at a time) and G2 (25 subjects, two or three at a time, separate playing location). Phase boxes: assessment A on topics relating to content C in exam setting (qualitative open questions; quantitative structured questions); placement of psychophysiological sensors (Varioport ARM device); Peacemaker tutorial and game session 1 (scenario S with learning content C); emotional self-report, with G1 proceeding straight to session 2 and G2 co-located for guided group reflection on in-game performance; game session 2 continuing scenario S with content C; removal of psychophysiological sensors; repeat of assessment A in exam setting and end interview.]

Fig. 2. The experiment protocol setting out activities for 2 groups. Participants first take a questionnaire to assess their knowledge of the topic area, psychophysiological sensors are then attached and they engage in two game play trials. After sensor detachment they take the same questionnaire again. Markers are recorded during game play for the following events: Menu click (‘start’, etc); In-Game action click; Click for in-Game information; Unprompted (‘inciting’) event; End game (‘finish’).


The phases described were designed to help achieve a measurable learning outcome. Orientation of the participants by the pre-test was a concern which the long period of sensor attachment may have partially addressed. The game session length was maximised with respect to the overall length of the experiment and the other periods, to enable a better chance of learning by sufficient exposure.

3.1. Proxy game

The Peacemaker serious game was designed to teach a peace-oriented perspective on the Israel–Palestine conflict. For a thorough study on the interaction effects between psycho-social personalities of players and their performance in Peacemaker see Gonzalez and Czlonka (2010).

It is a point-and-click strategy game, where the player acts as a regional leader and must choose how to react to the (deteriorating) situation, deploying more or less peaceful options from diplomacy and cultural outreach to police and military intervention. Prerequisites were set for the game choice – it must model a conflict negotiation scenario (reminiscent of the aims of TARGET); the interaction mechanics should be simple and intuitive yet rich; and pragmatically it must be possible to instrument so all player actions could be logged.4 Peacemaker gave an added advantage – it implied through its subject matter the potential for a sincere emotional reaction from players, which for a psychophysiological study of learning is a distinct advantage over more ‘logistical’ or ‘managerial’ styles of play (Bateman & Boon, 2005).

The Peacemaker game was not chosen arbitrarily – six criteria were used to evaluate a number of diverse games, chosen for their suitability in terms of educational theme and capacity to be instrumented for play logging. This entire process is described in detail in Appendix A. Another important aspect of the game choice and pre-processing was the assessment of learning – deriving the questions which were put to the participant. The questions had to be answered pre-game and so they could not reference too specifically the content in the game, but must be answered again post-game and so also be able to elicit the participant’s learning of the topic represented by that content. The questions also need to address several levels of learning, which the game must therefore provide scope for.

3.2. Assessment questions

The test questions needed to measure learning at several levels and in line with the theoretical approach to learning already established within the TARGET project. The questions were generated by mining the content of the game, assigning Bloom taxonomy (Anderson, Krathwohl, & Bloom, 2001) levels based on complexity of interactions between content in the question itself and the acceptable answers to the question. For instance, first order interactions exist between a question such as “What is the religious capital of Israel?” and the answer “Jerusalem”, which would place this question at the first Bloom level. The Bloom levels describe the difficulty of attaining a particular level of learning, the levels themselves being represented (in Bloom’s system) by descriptions of the kinds of content one would produce to show attainment of such learning.

Below we describe the format of questions, and briefly the assessment protocols for each question format. Table 1 gives an outline of the relationship between types of questions, the Bloom level assigned to them and the game data or experience which the question addresses – it also lists the number of such questions asked (or part thereof for multi-item rating questions).

3.2.1. Learning score derived

Assessment strategies were developed for each level. For the Bloom levels 1–5 we derived a ‘correct’ answer from the game documentation and data mining of empirical records (logs) of games played – i.e. a ‘truth’ value in relation to each question was established by studying what the game had shown the players. Using these answers we assessed fixed-choice responses by scoring the difference between the participant’s first and second response with respect to how much more accurate (or inaccurate) they became, i.e. gain scores. Normalised gain scores were considered a non-prejudicial approach with high flexibility, in that the gain could be readily transformed for weighting or data exploration (as advised by Lord, French, and Crow (2009, p. 22), who also detail other potential scoring methods). For questions (of level 1–5) which requested specific information but allowed open answers (free text input) we defined a synonymy set, that is, a set of answers which could legitimately be given in lieu of the ‘correct’ answers. Rating questions were assessed by a formula which preserved the magnitude of the participant’s response preference without giving an arbitrary ‘truth’ value to the rating item.

Before summation to a final learning score, gain scores for each question were weighted by the Bloom level rating of the associated question (giving more weight to questions which theoretically indicated a higher level of learning) and then normalised. Initially, weights were the product of the gain score and the number of the Bloom level, which gives a linear increase in importance over Bloom levels. Yet the ‘learning value’ of the Bloom levels is not defined in a scalar sense, only as ordinals, so there is more than one option supported by theory for weighting each level. For instance, the importance of learning at higher levels could be considered parametrically greater than lower levels (because mastery at each level is considered to require mastery at all the lower levels first): applying this changes the weight values from linear scaling [1, 2, 3, 4, 5, 6] into exponential scaling e.g., [1, 2, 4, 8, 16, 32]. We tested permutations of the weighting options for each Bloom level, including linear vs. exponential scaling, and the latter option provided the most differentiated learning outcomes (for more rationale on linear vs. non-linear weighting see Gribble, Meyer, and Jones (2003, p. 26)).
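As a minimal illustration of the weighting scheme just described, the sketch below combines per-question gains using exponential Bloom-level weights; the input values, the function name and the final normalisation step are assumptions for illustration, not the exact procedure used in the study.

```python
import numpy as np

def gain_exp(normalised_gains: np.ndarray, bloom_levels: np.ndarray) -> float:
    """Exponentially weighted, normalised combination of per-question gain scores (sketch)."""
    weights = 2.0 ** (bloom_levels - 1)            # levels 1..6 map to weights 1, 2, 4, 8, 16, 32
    weighted = normalised_gains * weights          # higher Bloom levels count for more
    return float(weighted.sum() / weights.sum())   # one of several plausible normalisations

# Hypothetical example: three questions at Bloom levels 1, 3 and 5 with gains 0.5, 0.2 and 0.1.
print(gain_exp(np.array([0.5, 0.2, 0.1]), np.array([1, 3, 5])))
```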

Thus our final dependent variable (DV) was the exponentially weighted normalised gain score, with parametrically increasing importance of Bloom levels, which we will refer to as gainExp (mean μ = 12.02, SD σ = 12.06, range [−13.56, 41.59]). The gainExp distribution (split by Condition) was normal, where Condition 1 was D(20) = 0.12, ns and Condition 2 was D(25) = 0.08, ns. For the two conditions the variances were equal, F(1, 43) = 1.24, ns.

To assess the Bloom level 6 open questions we used a more qualitative approach, whose final quantification was not directly comparable (on an interval scale) to the level 1–5 questions. Instead of basing assessment on empirical analysis of game data, we judged responses by six detailed criteria, using two separately-working assessors adjudicated by a third. Each of the six criteria below was applied to the pre- and post-test responses and the potential ‘gain’ was analysed by comparison.

4 The Peacemaker logging plugin was obtained through personal communication with the game producer Eric Brown.


1. Indications of finding, understanding central factors or components of a phenomenon.
2. Indications of being able to take into account different perspectives.
3. Indications of being able to approach the phenomena from different themes.
4. Indications of being able to apply a principle derived from or presented in the game to another context.
5. Indications of being able to reflect upon the principles presented in the game and assessing their validity and applicability in other contexts.
6. Indications of self-reflection or questioning of one’s own beliefs.

Unfortunately, many participants did not demonstrate any level 6 learning, giving a score of zero and overall quite low variance. Also, the inter-rater reliability for the six level 6 questions was not good – the question “What is your current understanding of the causes for the Israeli–Palestine conflict?” had Cohen’s kappa = 0.44, which is considered ‘moderate agreement’, but all others scored less than 0.4, i.e. poor to fair. Thus these responses are not used as dependent variables below.

3.3. Psychophysiological data acquisition and pre-processing

For data acquisition, we recorded the fullest range of psychophysiological signals possible given available resources. This included EEG, Electrocardiogram (ECG), ElectroOculogram (EOG), EDA, Respiration (Resp) and facial EMG. In this report we describe signal settings only for results given below. Facial EMG was measured at Zygomaticus major, Corrugator supercilii, and Orbicularis oculi (pars orbitalis). ECG was taken from electrodes placed at the upper sternum (manubrium) and lower left-hand ribs, using the back of the neck as ground. Impedance testing was carried out, and 8 min of baseline were recorded.

The preprocessing of data is at least as important to consider as the acquisition. ECG and facial EMG signals were examined by eye for artefacts (for instance in ECG there may be ectopic or arrhythmic beats) and corrected by interpolation using the Anslab toolbox for Matlab. We performed ‘QRS’ wave detection on ECG data, and calculated the Heart Rate (HR) and Heart Rate Variability (HRV). HR is the average of interbeat intervals (IBI), which are the distances between R-peaks from the QRS complex; HRV was calculated by Formula 1.

HRV = \sqrt{\frac{\sum_{i=0}^{n} (x_i - x_{i+1})^2}{n}} \quad (1)

where x_i is the occurrence time of the ith R-peak, and n is the number of peaks, within the elapsed time of measurement. Thus we obtain the HRV by the square root of the mean of the sum of squared differences between successive RR intervals (RMSSD). RMSSD is denoted as the best predictor for variability of the heart rate of the short term time domain measures (Berntson et al., 1997; Malik, Camm, & Smith, 1996). Facial EMG and Respiration are characterised for analysis by the moving average of the rectified signal.

We took 1 min mean values from each participant’s psychophysiological data. The mean-values approach allowed us to test for relationships that hold across trial duration – i.e. are not specific to individual events. The tutorial and reflection trials were excluded (as not being sufficiently comparable between conditions), and all remaining data were examined to check for distribution characteristics. To achieve approximate normality, we rectified the data (summed with a constant so x ≥ 1), and calculated the z-scores. After excluding any records which had a z-score greater than 2.58 (i.e. any outliers plus the most extreme 1% of the distribution; for HRV 36 of 2302; for Orbicularis 30 of 2471), data were transformed by taking the square root. With all data ≥ 1, this transform preserved relative values while helping to correct skew. Although the data were still not normal by Kolmogorov–Smirnov tests, this is not unusual for large data sets according to Field (2009, p. 139), whose visual criterion (histogram-to-normal curve matching) and z-score criterion (95% < 1.96, less than 1% > 2.58) we used to judge that the data showed a good approximation to normal.
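The following is a rough sketch of that normalisation sequence (rectification, z-score screening at 2.58, square-root transform); it is an illustration under assumed array input, not the exact Anslab/SPSS pipeline used in the study.

```python
import numpy as np
from scipy import stats

def normalise_minute_means(x: np.ndarray) -> np.ndarray:
    """Rectify, screen extreme values by z-score, then square-root transform (sketch)."""
    if x.min() < 1.0:
        x = x + (1.0 - x.min())        # shift so all values are >= 1
    z = stats.zscore(x)                # standard scores used for outlier screening
    kept = x[np.abs(z) <= 2.58]        # drop outliers and the most extreme 1% of the distribution
    return np.sqrt(kept)               # square root preserves order while reducing skew
```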

3.4. Statistical analysis

As mentioned, our DV was gainExp. Since both conditions provide data on the psychophysiological correlates of playing, our independent variables (IVs) were simply the physiological signals measured over time for each player; and our models also always accounted for the factor Condition. The full list of IVs included: HR, HRV, fEMG Zygomaticus, fEMG Orbicularis oculi, fEMG Corrugator supercilii. Additional factors included the baseline measures for each of the IVs, and Condition.

Table 1
Relationship between types of question, their Bloom level, and the game. Final column is the number of questions of that level that the player had to answer (for level 4 questions the number in parentheses shows how many items were asked in multi-part rating questions).

Type of question | Bloom | Game data covered | N
Recall/multiple choice questions | 1 or 2 | Geography, groups and leaders, polls, tutorial, timeline, intro videos | 28
Application/active recall | 3 | The effect of actions available to the player: Security (retaliation, suppression, public order, prisoners); Politics (international, domestic, cabinet, Palestinian Authority, workers, trade); Construction (medical aid, civil aid, domestic or Palestinian infrastructure, settlements, security wall, refugee camps) | 5
Analysis of given data to provide inference | 4 | Sequences of actions where the outcome can be defined with certainty, given the particular state/stage of the game. Which actions do or don’t work together. | 4 (28)
Evaluation or judgement of given game scenarios | 5 | Strategies in the game with their probable outcomes. | 4
Synthesis of game data, or use of game scenario, to relate real world situation; open format | 6 | Personal expressions/descriptions of topical knowledge/lessons from game experience. | 6


The Generalized Estimating Equations (GEE) procedure in SPSS was used to test all hypotheses, to support a repeated measures model over the 1 min means. We specified participant ID as the subject variable and trial number and minute as the within-subject variables. On the basis of the Quasi-likelihood under Independence Model Criterion (QIC), we specified autoregressive as the structure of the working correlation matrix. We specified a normal distribution with identity as the link function.

GEE are an extension of the generalized linear model, and were first introduced by Liang and Zeger (Liang & Zeger, 1986). GEEs allow relaxation of many of the assumptions of traditional regression methods such as normality and homoscedasticity, and provide unbiased estimation of population-averaged regression coefficients despite possible misspecification of the correlation structure. Where psychophysiology is modelled in several variables, the usual assumption of independent observations would be violated. Unless the model accounts for the ‘within’ correlation, the result may inflate the Type II error – thus GEEs suit well the analysis of time series psychophysiological data.
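Although the analysis was run in SPSS, an equivalent model can be sketched in Python with statsmodels; the data file and column names below are assumptions for illustration, mirroring the H1b model (condition, baseline HRV and task-level HRV predicting gainExp) with an autoregressive working correlation.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant per minute.
df = pd.read_csv("physio_minute_means.csv")  # assumed columns: participant, condition, hrv_base, hrv, gainExp

model = smf.gee(
    "gainExp ~ C(condition) + hrv_base + hrv",  # main effects, as in the H1b model
    groups="participant",                       # repeated measures clustered by participant
    data=df,
    cov_struct=sm.cov_struct.Autoregressive(),  # AR(1) working correlation (cf. the QIC-based choice)
    family=sm.families.Gaussian(),              # normal distribution with identity link
)
result = model.fit()
print(result.summary())                         # Wald tests for each coefficient
```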

4. Results

Hypothesis 1a predicted that HR would correlate with learning scores. This was not supported by results, as shown by a GEE model for gainExp scores (dependent variable), including main effects of playing condition (factor), baseline HR (covariate) and task-level HR (covariate). Task-level HR was not associated with gainExp, with p > .9.

Hypothesis 1b predicted that HRV levels during the serious game playing would be related to improved learning scores. When predicting gainExp scores, we specified a model that included the main effects of playing condition (factor), baseline HRV (covariate), and task-level HRV (covariate). In agreement with H1b, task-level HRV was positively associated with gainExp scores, B = 0.003, SE = 0.001, Wald Chi-Square (df = 1) = 4.11, p < .05. That is, increased HRV during game playing was associated with better learning.

To interpret Hypothesis 2 we required three models, one for each fEMG channel. Each was built with main effects of playing condition(factor), baseline fEMG channel (covariate) and task-level fEMG channel (covariate). However no model supported the hypothesis, all p > .1.

Hypothesis 2a proposed that HR would interact with fEMG at the Orbicularis oculi to predict learning scores. This was not supported by a GEE model including the main effects of playing condition (factor), baseline HR (covariate), baseline Orbicularis oculi EMG activity (covariate), task-level HR (covariate), task-level Orbicularis oculi activity, and the task-level HR × task-level Orbicularis oculi activity interaction.

Hypothesis 2b stated that the interaction between HRV and Orbicularis oculi EMG activity would predict learning scores. We specified a GEE model including the main effects of playing condition (factor), baseline HRV (covariate), baseline Orbicularis oculi EMG activity (covariate), task-level HRV (covariate), task-level Orbicularis oculi activity, and the task-level HRV × task-level Orbicularis oculi activity interaction. The interaction was significant, B = 0.002, SE = 0.001, Wald Chi-Square (df = 1) = 3.95, p < .05.

Fig. 3 shows the interaction more clearly by discretisation. The variables HRV and Orbicularis were cast into quartile bins. Plotting these binned variables against the mean learning scores, we can see that the correlation is mainly positive, but at extremes Orbicularis activity seems to act as a mediator of the HRV-gainExp relationship – as discussed in Section 5.2.
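As a small illustration of this kind of binning, the sketch below casts the two task-level signals into quartiles and tabulates mean gainExp per cell; the data frame and column names are assumptions, and the study's own figure was not necessarily produced this way.

```python
import pandas as pd

def interaction_table(df: pd.DataFrame) -> pd.DataFrame:
    """Mean gainExp for every HRV-quartile x Orbicularis-quartile cell (sketch)."""
    binned = df.copy()
    binned["hrv_q"] = pd.qcut(binned["hrv"], 4, labels=[1, 2, 3, 4])          # HRV quartile bins
    binned["orb_q"] = pd.qcut(binned["orbicularis"], 4, labels=[1, 2, 3, 4])  # Orbicularis quartile bins
    return (binned.groupby(["hrv_q", "orb_q"], observed=True)["gainExp"]
                  .mean()
                  .unstack())                                                 # rows: HRV, columns: Orbicularis
```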

Although not the main focus of this paper, the difference between conditions produced some relevant results (we have discussed these in more detail elsewhere (Cowley, Heikura & Ravaja, submitted)). Testing by ANOVA, the factor Condition predicted the learning scores with a negative relationship between presence of reflection period and learning – F(1, 43) = −8.1, p < .01 – we term this the ‘reflection-negative’ result. The relationship between playing condition and HRV is also negative, such that players in condition 2 have a higher mean HRV, t(df = 2300) = −12.5, p < .001.

Fig. 3. Interaction between binned values of ECG HRV, EMG Orbicularis and mean gainExp learning scores. Binning is performed to allow a clear picture of the interaction.


5. Discussion

We begin with a short discussion of the condition-related results, which lend context to the psychophysiological discussion. Recall that the gainExp scores represent questions relating mostly to the context of the game. Thus the reflection-negative result may be explained by considering the effect on both working memory, and immersion in the game context, caused by the reflection period occurring between game sessions. The reflection period acted to refocus players, away from the context of the game, to the context of discussing the game (ibid.). In addition to this, we now see a link between HRV and Condition, and it is known that the 0.10 Hz component of HRV has been found to be specifically sensitive to working memory demands, yet not to response or motor demands (Kramer & Weber, 2000, p. 802). This suggests that unimpeded working memory was instrumental in achieving higher learning scores. To put it another way, the absence of the reflection period in condition 1 may have allowed better concentration on, and short term retention of, the functional operation and other ‘small details’ in the game. This context is relevant as we look at the physical state of the players during the game trials.

5.1. Hypothesis 1b

The positive correlation between HRV and gainExp implies two conclusions regarding the interaction between games and learners. Firstly, we observe that increased HRV is due to increased parasympathetic activity (PA) (Brownley, Hurwitz, & Schneiderman, 2000, p. 232). PA may increase due to attentional demands of an external stimulus, and increased attention leads to better learning. Thus we may infer that intrinsic motivation to focus on the in-game stimuli remains important, despite the oft-repeated claim that games are ‘inherently motivating’.

The second conclusion invokes a more complicated interaction. HRV has been associated with mental workload (Kramer & Weber, 2000, p. 799), such that high mental workload results in a decrease in HRV – thus our result shows that high scoring learners experienced less mental workload. Mental workload has been conceptualised as the processing cost incurred to achieve an acceptable level of performance in the face of varying task requirements (environment remaining stable in this case) (Jorna, 1992). The task of playing a game may be considered to be subject to mental workload because it is time-limited, performance driven and compound (composed of multiple competing tasks). Based on this we expect to see an inverse correlation between HRV and gainExp, but we see the opposite. This can be simply explained with a more nuanced picture of game activity. Atomic game tasks may be subject to mental workload, but an entire game is a series of tasks with interlinked learning curves and skill-sets (Salen & Zimmerman, 2004), which thus may be subject to the phenomenon of cognitive efficiency (Cobb, 1997; Rypma, Berger, & D’Esposito, 2002). That is, players are asked to meet sequential challenges by deploying skills which are learnt in an ordered sequence, such that earlier skills are drawn upon to develop later skills (Cook, 2006). Thus, looking again at our two conditions, players whose concentration is uninterrupted may benefit more from the practice effect than those who practice the same amount, but split over shorter sessions.

The implication of the two conclusions is that learning may be better achieved by a less-stressed, clearer brain. Thus a learning game could be improved by first inducing some degree of cognitive efficiency in players, such that the playing task becomes more automated and frees up mental resources (Cook, 2006), which may then be deployed to learn content and not just game-play skills. Thereafter the HRV may be increased (facilitating increased learning) by tuning the attentional demands of the game – e.g., introducing interesting content beyond the skill-based tasks, such as the rich virtual worlds of games like ICO (Sony, 2001) which carry information beyond the skills required to navigate them.

5.2. Hypothesis 2b

The HRV × Orbicularis interaction is positively related to gainExp, and given the H1b result, this suggests that Orbicularis activity signals a mediator of HRV’s effect on learning. Orbicularis activity is often interpreted as an index of positive valence (Ekman & Friesen, 1982), as this muscle participates in the forming of a Duchenne smile and for most people is an involuntary movement (Ekman et al., 1990). It is worth noting the nature of the Orbicularis measurement – it is not bipolar, as low values do not indicate displeasure, but rather lack of reactive facial expression. Due to this, and because not all enjoyment is characterised by Orbicularis-type facial expressions, we use the phrase positive facial reactivity (PFR) instead of enjoyment for this discussion. Also of note is the link between affect and focused attention. Positive emotions have been linked with broadened attentional focus, negative emotions with narrow focus (Isen, 2000). This finding was recently qualified by Gable and Harmon-Jones (2008), who state that the intensity of motivation is the key determinant responsible for the focal range of attention – low intensity motivation, whether approach or withdrawal, results in broader attention.

Overall, the usually positive correlation between HRV and Orbicularis suggests that those with less mental workload also found the experience more genuinely pleasurable. In line with the concept of cognitive efficiency, we see that reducing mental workload almost always improved scores whatever the player’s PFR – participants who were less cognitively-challenged in the task also scored better. There were two exceptions: although players in the extreme Orbicularis groups continued to increase scores at the highest level of HRV, players in quartiles two and three decreased. And generally the Orbicularis values have a non-linear relationship to learning, for instance the lowest quartile corresponds to the lowest scores, but the second quartile mainly corresponds to the highest scores. Therefore, a more detailed explanation is called for – if we consider that the combined physiological signals indicate a singular state of mind, the behaviour-scoring relationship may be clarified by looking at each combination of HRV and Orbicularis, representing progressive levels of ease-of-play and positive reaction.

The lowest Orbicularis quartile implied near-total lack of PFR (perhaps indicating boredom or only negative reactions) and led to consistently bad relative scoring; but for such participants, decreased levels of mental workload consistently meant better results – i.e. boredom mediated by ease of performance may not equate to good engagement, but could here still produce results.

For the two middle quartiles of Orbicularis, reduced mental workload also meant better scores, up to a point – beyond which we suspect the task had insufficient challenge for participants, since difficulty levels were held fixed. Combined with high-to-medium mental workload, middle-quartile Orbicularis participants were usually the best scorers; so it is probably good not to be too relaxed when task-demands are high and some focused attention is needed. Nor is it good to have too little task-demand, unless the task elicits no PFR or very much PFR.

In the latter case, i.e. for the fourth Orbicularis quartile, participants’ high PFR may imply a lack of focused attention (compared to quartile 2); at higher task demand levels this was not good (compared to quartile 2). But the lowest level of mental workload appears to have mediated broad attention to produce a ‘harmonious optimum’. In all three upper HRV × Orbicularis quartiles, it may be the case that improved game performance led to higher PFR (smiling after scoring), and game performance followed from cognitive efficiency, but this is not confirmed from analysis of the game scores.


While acknowledging that enjoyment can be characterised by more than just the Orbicularis-type facial expressions, this finding may be interpreted to suggest that in order to learn best, an emotionally-positive, cognitively-efficient state of mind was conducive; with the caveat that ability to engage the game tasks must come before ‘enjoying it’.

5.3. Further issues and future work

There were a number of issues with the study design which need to be addressed by replication or further studies. Of common concern with psychological studies, the age range of the sample was small, reflecting the student body from which it was recruited, and further work would benefit from recruiting directly from companies or other professional bodies (at whom the funding project was aimed). The design had an unrecognised flaw in changing the amount of time for phase four between conditions. Ideally the solo condition group would have had the time to self-reflect alone whilst groups were interacting. In the event, such differences were controlled statistically where possible. This flaw may have been inadvertently beneficial however – given the results reported here pointing to working memory as a mediator of condition differences in learning scores, if both conditions had featured long breaks interrupting the players’ ascent of the learning-curve then perhaps physiological data would not have been strongly different enough between conditions to highlight the perils of using exactly this kind of break in pedagogical game design.

The test duration (to answer 41 questions) was fairly long, and while participants were remunerated for their time, a short overall duration may run less risk of incurring automatic answering. In future studies, the research questions could be refined to allow less chance of confounds entering the results. An ideal multi-site study would partition research questions over multiple sessions of learning, and have participants practice the game beforehand in a structured manner to ensure the effects in tests relate to content learning.

The game chosen as a test-bed was (the authors believed) emotionally evocative – this was a design decision made to improve the effect size of psychophysiological signals. The corollary is that our results might be less evident for a serious game which doesn’t evoke emotional reactions from players. However we did see a group with no PFR, who nevertheless were affected strongly by their HRV. Also, because physiological correlates of emotion vary continuously in strength, our results may be expected to hold with varying strength depending on the emotional impact of the learning game.

Finally, some results may still be expected from the same study in future work, as analysis is on-going for EEG and game-event reactions. This may help to clarify the picture of cognitive efficiency hinted at here. Any study involving games is tackling a highly multi-modal protocol, with very many variables and interactions of interest, and this complex picture requires careful probing.

6. Conclusions

We reported on a study into learning effects in serious games, investigating the impact of an inserted reflection period, in addition to studying the psychophysiological correlates of learning. For studying game experiences, psychophysiological measures have several advantages over traditional self-report: measurements are continuous, fairly covert, and index non-conscious emotional, motivational, and cognitive processes (Ravaja, 2004). The learning assessment was done in two parts, fixed-format and open questions, though significant results apply only to the fixed-format questions. We found that increased HRV alone, and HRV × Orbicularis Oculi, both predicted learning scores.

Positively correlated HRV, Orbicularis and learning scores imply that low mental-workload and positive affective reaction signify better learning at the Bloom levels 1–5. If the nature of the pedagogy is such that these levels of learning are important, as with simulation games for learning basic skills, then it is important to consider this result in designing the game and its application. How this might best be done depends on context – people differ in their learning, social and play styles (Goldberg, 1993). For each person, some forms of learning are more entertaining than others (Bateman & Boon, 2005). The learning principles that naturally occur in games are covered comprehensively by Gee (2003), while Cowley, Moutinho, Bateman, and Oliveira (2011) give an example of the application of such principles to the design of a serious game. Since Bloom levels 1–5 relate mainly to information content and game processes, it is plausible that learning at this level largely depends on the ability to separate task-demands from semantically-relevant stimuli, while being positively engaged by the process. That suggests including significant training time (not often an option in corporate environments) or designing tasks for naturally-occurring skills (e.g., motion-based games) and semantic spaces with naturally-occurring signs.

Given how often both engagement and enjoyableness are cited as mediating factors in learning, and these features of entertainment games are held up as desirable to emulate in serious games, it is important to study their psychophysiological mechanisms – to which purpose we submit this small work.

Acknowledgements

The authors would like to thank research assistants Lauri Janhunen, Siiri Peli and Svetlana Kirjanen for their help in conducting this study, as well as Marco Rapino and Simo Jarvela for technical assistance. We would also like to warmly thank Eric Brown for his help in using the Peacemaker game and its monitoring software.

This work has been funded by the EU project TARGET (IST 231717). This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Appendix A

In order to select a suitable proxy game for our study, ten diverse serious games were shortlisted {Science Supremo, Thinking Worlds: Sims, DoomEd, Making History, Colobots, Typing for Racing, IBM Innov8 game, Peacemaker, A Force More Powerful, Facade}. The games were chosen for their suitability: to one degree or another they all met the same prerequisites as were listed for Peacemaker above.

From this list the Peacemaker game was chosen by three assessors using six criteria {Content complexity, Simple controls, Relevance to Target, No Nash equilibrium, No biasing content, Surety of learning outcome} to evaluate suitability. The first and second criteria were weighted (×3 and ×2 respectively) due to their importance to the particulars of the experiment – namely that players had to be able to learn from the game, and that learning had to be assessable. The games were played independently by the assessors, who rated each game on a scale from 1 to 5 for each criterion. The average of all criteria was taken and rounded to obtain a final objective score, for which Peacemaker performed best.
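The following is a minimal sketch of the weighted criterion scoring described above, matching the weights listed in Table 2; the ratings in the example and the helper name are hypothetical, not the assessors' actual values.

```python
# Criterion weights in the order listed above: content complexity x3, simple controls x2,
# then relevance, Nash equilibrium, biasing content and surety of learning outcome at x1 each.
WEIGHTS = [3, 2, 1, 1, 1, 1]

def weighted_game_score(ratings_1_to_5):
    """Weighted sum of the six criterion ratings (1-5) given to one candidate game."""
    return sum(w * r for w, r in zip(WEIGHTS, ratings_1_to_5))

# Hypothetical ratings for a candidate game: 4, 4, 5, 5, 3, 3.
print(weighted_game_score([4, 4, 5, 5, 3, 3]))  # 12 + 8 + 5 + 5 + 3 + 3 = 36
```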


The outcome of this scoring method is included in Table 2.

Table 2
The weighting of a selection of candidate games to fairly choose a proxy for TARGET.

Correct content complexity (×3) | Simple controls (×2) | Relevant to Target (×1) | Nash equilibrium (×1) | No biasing content (×1) | Surety of learning outcome (×1) | Game | Sum
6 | 10 | 2 | 5 | 2 | 5 | Science Supremo | 30
6 | 8 | 3 | 4 | 3 | 4 | Thinking Worlds: sims | 28
12 | 6 | 3 | 3 | 3 | 3 | DoomEd | 30
15 | 2 | 4 | 2 | 4 | 2 | Making History | 29
15 | 8 | 3 | 3 | 3 | 3 | Colobots | 35
3 | 10 | 1 | 5 | 3 | 5 | Typing for Racing | 27
9 | 10 | 4 | 4 | 3 | 5 | IBM Innov8 game | 35
12 | 8 | 5 | 5 | 3 | 3 | Peacemaker | 36
15 | 4 | 5 | 3 | 3 | 3 | A Force More Powerful | 33
12 | 6 | 4 | 5 | 4 | 2 | Facade | 33

Appendix B

Fig. 4. The unbinned values of HRV for each quartile of Orbicularis, plotted against gainExp learning scores.


References

Anderson, L. W., Krathwohl, D. R., & Bloom, B. S. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. New York: Longman.
Bateman, C., & Boon, R. (2005). 21st Century game design, Vol. 1. London: Charles River Media.
Berntson, G. G., Bigger, J. T., Jr., Eckberg, D. L., Grossman, P., Kaufmann, P. G., Malik, M., et al. (1997). Heart rate variability: origins, methods, and interpretive caveats. Psychophysiology, 34(6), 623–648.
Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: the self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1), 49–59.
Brownley, K. A., Hurwitz, B. E., & Schneiderman, N. (2000). Cardiovascular psychophysiology. In J. T. Cacioppo, L. G. Tassinary, & G. G. Berntson (Eds.), Handbook of psychophysiology (2nd ed.) (pp. 224–264). Cambridge: Cambridge University Press.
Burak, A., Keylor, E., & Sweeney, T. (2005). PeaceMaker: a video game to teach peace. In M. Maybury, O. Stock, & W. Wahlster (Eds.), Intelligent technologies for interactive entertainment, Vol. 3814 (pp. 307–310). Berlin/Heidelberg: Springer.
Cacioppo, J. T., Tassinary, L. G., & Berntson, G. G. (2000). Psychophysiological science. In J. T. Cacioppo, L. G. Tassinary, & G. G. Berntson (Eds.), Handbook of psychophysiology (2nd ed.) (pp. 3–26). Cambridge: Cambridge University Press.
Chaouachi, M., & Frasson, C. (2010). Exploring the relationship between learner EEG mental engagement and affect. In V. Aleven, J. Kay, & J. Mostow (Eds.), Intelligent tutoring systems, Vol. 6095 (pp. 291–293). Berlin/Heidelberg: Springer.
Chaouachi, M., Jraidi, I., & Frasson, C. (2011). Modeling mental workload using EEG features for intelligent systems. In J. Konstan, R. Conejo, J. Marzo, & N. Oliver (Eds.), User modeling, adaption and personalization, Vol. 6787 (pp. 50–61). Berlin/Heidelberg: Springer.
Cobb, T. (1997). Cognitive efficiency: toward a revised theory of media. Educational Technology Research and Development, 45(4), 21–35.
Cook, D. (2006). The chemistry of game design. Gamasutra. Retrieved from: http://www.gamasutra.com/view/feature/1524/the_chemistry_of_game_design.php.
Cowley, B., Heikura, T., & Ravaja, N. (submitted). The effect of guided reflection on experience-based learning in a serious game activity.
Cowley, B., Moutinho, J., Bateman, C., & Oliveira, A. (2011). Learning principles and interaction design for 'Green My Place': a massively multiplayer serious game. Entertainment Computing, 2(2), 10.
Ekman, P., Davidson, R. J., & Friesen, W. V. (1990). The Duchenne smile: emotional expression and brain physiology. II. Journal of Personality and Social Psychology, 58(2), 342–353.
Ekman, P., & Friesen, W. V. (1982). Felt, false, and miserable smiles. Journal of Nonverbal Behavior, 6(4), 238–252.
Field, A. P. (2009). Discovering statistics using SPSS: (and sex and drugs and rock 'n' roll). London: SAGE Publications.
de Freitas, S. (2006). Learning in immersive worlds. Report for the Joint Information Systems Committee (JISC). Bristol, UK. Available at: http://www.jisc.ac.uk/eli_outcomes.html.
Gable, P. A., & Harmon-Jones, E. (2008). Approach-motivated positive affect reduces breadth of attention. Psychological Science, 19(5), 476–482.
Gee, J. P. (2003). What video games have to teach us about learning and literacy. New York: Palgrave Macmillan.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. The American Psychologist, 48(1), 26–34.
Gonzalez, C., & Czlonka, L. (2010). Games for peace. In J. Cannon-Bowers, & C. Bowers (Eds.), Serious game design and development (pp. 134–149). Hershey, PA: Information Science Reference.
Gribble, J., Meyer, L., & Jones, A. (2003). Quantifying and assessing learning objectives. Working Paper Series Paper no. 112. Australia: Centre for Actuarial Studies, University of Melbourne. Available at: http://repository.unimelb.edu.au/10187/665.
Groos, K. (1898). The play of animals: A study of animal life and instinct. London: Chapman & Hall.
Guay, F., Ratelle, C. F., & Chanal, J. (2008). Optimal learning in optimal contexts: the role of self-determination in education. Canadian Psychology, 49(3), 233–240.
Harmon-Jones, E. (2003). Clarifying the emotive functions of asymmetrical frontal cortical activity. Psychophysiology, 40(6), 838–848.
Heraz, A., Jraidi, I., Chaouachi, M., & Frasson, C. (2009). Predicting stress level variation from learner characteristics and brainwaves. Paper presented at the international conference on artificial intelligence in education.
Isen, A. M. (2000). Some perspectives on positive affect and self-regulation [Editorial Material]. Psychological Inquiry, 11(3), 184–187.
Johnston, D. W., Anastasiades, P., & Wood, C. (1990). The relationship between cardiovascular responses in the laboratory and in the field. Psychophysiology, 27(1), 34–44.
Jorna, P. G. (1992). Spectral analysis of heart rate and psychological state: a review of its validity as a workload index. Biological Psychology, 34(2–3), 2–3.
Kettunen, J., Ravaja, N., Naatanen, P., & Keltikangas-Jarvinen, L. (2000). The relationship of respiratory sinus arrhythmia to the co-activation of autonomic and facial responses during the Rorschach test. Psychophysiology, 37(2), 242–250.
Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work: an analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41(2), 75–86.
Kivikangas, J. M., Chanel, G., Cowley, B., Ekman, I., Salminen, M., Järvelä, S., et al. (2011). A review of the use of psychophysiological methods in game research. Journal of Gaming & Virtual Worlds, 3(3), 181–199.
Kolb, D. A. (1984). Experiential learning: Experience as the source of learning and development. Beverley Hills: Sage Publications.
Koster, R. (2005). A theory of fun for game design. Scottsdale, AZ: Paraglyph Press.
Kramer, A. F., & Weber, T. (2000). Applications of psychophysiology to human factors. In J. T. Cacioppo, L. G. Tassinary, & G. G. Berntson (Eds.), Handbook of psychophysiology (2nd ed.) (pp. 794–814). Cambridge, UK: Cambridge University Press.
Lang, P. J., Greenwald, M. K., Bradley, M. M., & Hamm, A. O. (1993). Looking at pictures: affective, facial, visceral, and behavioral reactions. Psychophysiology, 30(3), 261–273.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge; New York: Cambridge University Press.
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13–22.
Lord, T. R., French, D. P., & Crow, L. W. (2009). College science teachers guide to assessment. Arlington, VA: National Science Teachers Association.
Malik, M., Camm, A. J., & Smith, J. M. (1996). Heart rate variability. Journal of Cardiovascular Electrophysiology, 7(4), 386.
Mandryk, R. L., & Atkins, M. S. (2007). A fuzzy physiological approach for continuously modeling emotion during interaction with play technologies. International Journal of Human-Computer Studies, 65(4), 329.
McGinnis, T., Bustard, D. W., Black, M., & Charles, D. (2008). Enhancing E-learning engagement using design patterns from computer games. In Proceedings of the first international conference on advances in computer-human interaction (pp. 124–130). ACM.
O'Neil, H. F., Wainess, R., & Baker, E. L. (2005). Classification of learning outcomes: evidence from the computer games literature. Curriculum Journal, 16(4), 455–474.
Peters, R. S. (1960). The concept of motivation. London; New York: Routledge & Kegan Paul; Humanities Press.
Ravaja, N. (2004). Contributions of psychophysiology to media research: review and recommendations. Media Psychology, 6(2), 193–235.
Ravaja, N., Saari, T., Kallinen, K., & Laarni, J. (2006). The role of mood in the processing of media messages from a small screen: effects on subjective and physiological responses. Media Psychology, 8(3), 239–265.
Ravaja, N., Saari, T., Salminen, M., Laarni, J., & Kallinen, K. (2006). Phasic emotional reactions to video game events: a psychophysiological investigation. Media Psychology, 8(4), 343–367.
Ravaja, N., Saari, T., Turpeinen, M., Laarni, J., Salminen, M., & Kivikangas, M. (2006). Spatial presence and emotions during video game playing: does it matter with whom you play? Presence: Teleoperators and Virtual Environments, 15(4), 381–392.
Rypma, B., Berger, J., & D'Esposito, M. (2002). The influence of working-memory demand and subject performance on prefrontal cortical activity. Journal of Cognitive Neuroscience, 14(5), 721–731.
Salen, K., & Zimmerman, E. (2004). Rules of play: Game design fundamentals, Vol. 1. London: MIT Press.
Squire, K. (2006). From content to context: videogames as designed experience. Educational Researcher, 35(8), 19–29.
Tassinary, L. G., & Cacioppo, J. T. (2000). The skeletomotor system: surface electromyography. In J. T. Cacioppo, L. G. Tassinary, & G. G. Berntson (Eds.), Handbook of psychophysiology (2nd ed.) (pp. 163–199). Cambridge, UK: Cambridge University Press.
Tomaka, J., Blascovich, J., Kelsey, R. M., & Leitten, C. L. (1993). Subjective, physiological, and behavioral effects of threat and challenge appraisal. Journal of Personality and Social Psychology, 65(2), 248–260.
Turpin, G. (1986). Effects of stimulus intensity on autonomic responding: the problem of differentiating orienting and defense reflexes. Psychophysiology, 23(1), 1–14.
Witvliet, C. V., & Vrana, S. R. (1995). Psychophysiological responses as indices of affective dimensions. Psychophysiology, 32(5), 436–443.
