Learning and Individual Differences - MSR Lab



Contents lists available at ScienceDirect

Learning and Individual Differences

journal homepage: www.elsevier.com/locate/lindif

Is academic diligence domain-specific or domain-general? An investigation of the math, verbal, and spatial academic diligence tasks with middle schoolers☆

Catherine A. Spann a,⁎, Alisa Yu b, Brian M. Galla c, Angela L. Duckworth d, Sidney K. D'Mello a

a University of Colorado Boulder, United States of America
b Stanford University, United States of America
c University of Pittsburgh, United States of America
d University of Pennsylvania, United States of America

ARTICLE INFO

Keywords: Academic diligence; Grit; Self-control; Academic Diligence Task; Performance measures

ABSTRACT

We tested the domain-specificity or domain-generality of academic diligence in middle-school students using the Academic Diligence Task (ADT), a performance task that assesses effort on tedious problems in the face of digital distractions. Students in 8th grade (N = 439) were randomly assigned to individually complete a math, verbal, or spatial ADT or to a combination of all three. Confirmatory factor analyses suggested domain-generality, as did the fact that ADT scores in a given domain did not differentially predict academic achievement in that domain. Results indicated that all three ADTs had adequate external and predictive validity, but convergent validity varied. Whereas both math and verbal ADT scores correlated with teacher-reports of grit and self-control, only math scores consistently correlated with self-reports of the same constructs; these measures did not correlate with spatial ADT scores. Thus, the math ADT is the best performance measure of diligence, followed by the verbal ADT.

“What we hope ever to do with ease we may learn first to do with diligence.”

–Samuel Johnson

1. Introduction

Homework or social media? Students often select the latter, choosing an immediately gratifying option over one requiring sustained effort and self-control (Duckworth, Taxer, Eskreis-Winkler, Galla, & Gross, 2019). Can we blame them? Remaining engaged in tedious school work when digital distractions abound is a challenge. Whereas short breaks upon completion of a certain amount of work or elapsed time might be a beneficial “recharge” (Ariga & Lleras, 2011; Rhee & Kim, 2016), disengaging from schoolwork over extended periods of time has its costs. Thus, students must have diligence to stay focused through the tedium of developing academic skills, all the while resisting the temptation to engage in something more immediately rewarding. In the words of a student, “I wanted to finish an essay but I also wanted to watch Netflix and I was at a great part in [a show called] The Office so The Office was more important at that time, but I eventually did the essay.” (Duckworth, White, Matteucci, Shearer, & Gross, 2016).

We previously developed a performance-based measure called the Academic Diligence Task (ADT) to simulate this very conundrum (Galla, 2014). In the ADT, students freely choose between working on “good for you” math skill-building problems and indulging in entertaining distractions (i.e., Tetris, YouTube videos), and can repeatedly swap between the two. A validation study suggested that the ADT is a reliable and valid measure of diligence (Author, 2014). Specifically, after controlling for socio-demographics and fluid intelligence, high school students who performed better on the ADT had higher grade point averages (GPAs¹), better math and reading standardized test scores, and were more likely to graduate high school and enroll in college. Performance on the math ADT also exhibited convergent validity with self-reported grit and self-control, both facets of conscientiousness, as well as discriminant validity from openness,

https://doi.org/10.1016/j.lindif.2020.101870
Received 29 January 2019; Received in revised form 19 December 2019; Accepted 20 March 2020

☆ This research was made possible by the Templeton Foundation and the Walton Family Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

⁎ Corresponding author at: University of Colorado Boulder, Institute of Cognitive Science, Boulder, CO 80309, United States of America.
E-mail addresses: [email protected] (C.A. Spann), [email protected] (S.K. D'Mello).

1 Grade point average is a numeric measure of academic achievement widely used in the U.S.

Learning and Individual Differences 80 (2020) 101870

1041-6080/ © 2020 Published by Elsevier Inc.


emotional stability, test anxiety, life satisfaction, and positive and negative affect. Yeager et al. (2014) replicated the convergent validity findings (Study 1) and also found that the ADT was sensitive to a brief social psychological intervention in that students who received a self-transcendent purpose (i.e., pro-social) for learning message had higher ADT scores than those receiving a control message (Study 4).

Despite these promising results, the original investigation with the ADT only assessed diligence in math. However, math is unique when compared to other academic subjects. It is associated with subject-specific phenomena, such as math anxiety (Ashcraft, 2002) and math stereotype threat (Spencer, Steele, & Quinn, 1999). Thus, the focus on math in the original ADT might make it less generalizable to other academic subject domains.² This raises the following questions: Does a student who is diligent in math exhibit diligence in other subjects as well? Moreover, does diligence in other domains predict achievement to the same degree as the original math diligence task? To address these questions, we examined the extent to which academic diligence is domain-specific or general across academic subject domains.

Our findings have important implications for research and practice. Specifically, should measures and interventions target academic diligence more broadly, or will it be more effective to target diligence in separate domains? If academic diligence is found to be domain-general, then both domain-general and domain-specific instruments and interventions are appropriate. If, on the other hand, academic diligence appears to be domain-specific, then perhaps we should target the academic domain in which diligence is weak.

Our work also contributes to broader efforts towards developing reliable and valid measures of individual differences in social-emotional competencies in academic contexts. Performance tasks like the math ADT have some advantages over more traditional assessments including self- and informant-reports in that they directly measure behavior; see Duckworth and Yeager (2015) for a detailed discussion on the strengths/limitations of various measurement approaches. However, it is unclear if changes to the task domain affect the validity of performance measures. For example, should a researcher interested in studying the effect of a particular instructional practice utilize the math ADT to assess diligence of students in an English class or should they use a version of the task more suited towards spelling or vocabulary items? The current study aims to shed light on this question by developing and validating versions of the ADT for two new domains.

Finally, given the increased calls for replicability and generalizability of psychological studies, we replicate the findings with the original math ADT and extend them to other domains. We also study the generalizability of the ADTs to a different population. Whereas the original ADT research studied high school students, we focused on middle school students, because the transition from early to middle adolescence represents a significant developmental window to promote positive behaviors/attitudes towards schoolwork and address negative ones (Yeager, Dahl, & Dweck, 2018).

1.1. What is academic diligence?

Diligence is the ability to productively engage with tedious tasks despite competing distractions (Author, 2014). Simply put, it means choosing the tedious, but important, over the more appealing, but unimportant. This is a major challenge in the digital age where distractions abound and multitasking during academic work is the norm. For example, according to a recent Common Sense Media Report, a large-scale study on media use among 2600 youth, >50% report engaging with social media, watching TV, or texting while doing homework (CSM, 2015). Thus, when choosing between the competing goals of academics vs. entertainment, a diligent student chooses to sustain effort on the academic task and delay gratification by resisting temptations.

Thus, diligence is related to, but distinct from, psychological constructs such as self-control, grit, and persistence. Self-control involves voluntarily altering one's own responses to align them with personally valued goals, especially when these goals conflict with more immediately gratifying alternatives (Baumeister, Heatherton, & Tice, 1994; Duckworth et al., 2019; Hofmann, Baumeister, Förster, & Vohs, 2012). Diligence requires self-control to regulate impulses and resist the temptation to engage in more pleasurable distractions. Whereas self-control is a broad umbrella concept often referred to as self-regulation, diligence is more specific, referring to the ability to engage in tedious work-related activities with focus and concentration. We refer to diligence as academic diligence when the “work” is construed as an academic skill-building exercise such as homework.

Another related construct is grit, the inclination to pursue goals with passion and perseverance over years and years (Duckworth, Peterson, Matthews, & Kelly, 2007). Although diligence and grit are conceptually related, the former addresses momentary decisions and the latter longer-term pursuits that are intrinsically motivating (Duckworth & Gross, 2014). We also distinguish diligence from persistence, a more general term, which pertains to the pursuit of a course of action or outcome in the face of difficulty, distress, and frustration (Meindl et al., 2019). Whereas persistence is a broad term that applies to staying on a planned course despite rejection and failure (e.g., sticking with a diet despite lapses and little progress), we consider diligence to more narrowly apply to productively engaging in a valued, but tedious, activity despite more attractive alternatives.

Given the emphasis on tedium, we suggest that diligence requires regulating boredom, defined as the aversive experience of wanting, but being unable, to engage in a more satisfying activity (Csikszentmihalyi, 1975; Eastwood, Frischen, Fenske, & Smilek, 2012; Pekrun, Goetz, Daniels, Stupnisky, & Perry, 2010). The need to remain diligent in the face of boredom is not uncommon, as boredom is a frequent emotion reported by students (Macklem, 2015). For example, one early study indicated that middle-school students report feeling bored approximately 32% of the time in class (Larson & Richards, 1991), whereas a more recent study of 11th grade students indicated a much higher (58%) incidence rate of boredom (Nett, Goetz, & Hall, 2011). Further, mind wandering, which is related to boredom (Eastwood et al., 2012), has been reported to occur between 30% and 50% of the time in classrooms (Varao-Sousa & Kingstone, 2019; Wammes, Boucher, Seli, Cheyne, & Smilek, 2016). Boredom is not merely an unpleasant feeling; it is associated with lower attentional control (Danckert & Merrifield, 2018; Eastwood et al., 2012; Hunter & Eastwood, 2018) and is consistently negatively correlated with learning (Pekrun, Hall, Goetz, & Perry, 2014; Putwain, Becker, Symes, & Pekrun, 2018). To this point, a recent meta-analysis of 29 studies (N = 19,052 students) found an overall significant negative effect (r = −0.24) of boredom on academic outcomes (Tze, Daniels, & Klassen, 2016). Given the prevalence of boredom and its negative relationship to learning, a diligent student must effectively regulate boredom in order to maintain attention and remain motivated to learn (see Nett et al. (2011) for strategies students use to regulate boredom in academic settings).

1.2. Domain-specificity or domain-generality of academic diligence

Psychological constructs measured in academic settings are often measured in a domain-specific or subject-specific fashion, where items reference target domains (e.g., “I find my Math homework boring”). Examples include measures of emotions such as boredom and anxiety (Goetz, Frenzel, Pekrun, Hall, & Lüdtke, 2007), homework effort (Trautwein, Lüdtke, Schnyder, & Niggli, 2006), interest (Linnenbrink-Garcia et al., 2010), need for cognition (Keller, Strobel, Martin, & Preckel, 2019), self-concept (Brunner et al., 2010; Marsh, 1990), and task-value (Gaspard, Häfner, Parrisius, Trautwein, & Nagengast, 2017), among others. Individual differences in these constructs generally show

2 We use the term “domain” to narrowly refer to academic subjects like Math, English, and Science.


domain specificity (e.g., Goetz et al., 2007; Marsh, 1990), meaning students experience different levels of the construct depending on the academic subject. However, there are some exceptions. For example, performance goals appear to be more domain general (Bong, 2001), as is test anxiety (Sarason & Sarason, 1990), and whether students endorse a growth vs. fixed mindset (Dweck, 1986; Stipek & Gralinski, 1996). What about academic diligence?

A number of factors may contribute to the degree of domain-specificity of academic diligence. First, diligence towards schoolwork may depend on students' preconceived notions generated from lay theories about a topic or cognitive schemas (Bartlett, 1932). For example, a student may have a cognitive schema that social studies homework involves reading terse texts whereas science homework is about learning about fun experiments and discoveries. For this student, completing social studies homework would require more diligence than science homework. Diligence may also depend on which academic subject receives the most attention at the present time, perhaps due to failing grades in a particular subject or, alternatively, which subject is of most interest to the student. In these examples, measures of diligence across domains should only correlate weakly and should predict academic achievement in specific domains better than achievement in other domains.

Theoretical support for the notion that academic diligence might be domain-specific can be found in its underlying psychological components discussed above. In particular, there is considerable evidence to suggest that self-control is domain-specific (Duckworth & Tsukayama, 2015), at least when domains are construed broadly (e.g., distinguishing self-control in alcohol use vs. financial decisions). Similarly, impulsivity, the opposite of self-control, is distinct in the schoolwork and interpersonal domains (Tsukayama, Duckworth, & Kim, 2013). Diligence also requires the ability to regulate boredom, and research suggests that boredom is experienced and presumably also regulated in a domain-specific manner (Acee et al., 2010; Daschmann, Goetz, & Stupnisky, 2011; Goetz et al., 2007). For example, Acee et al. (2010) distinguish between boredom experienced in under-challenging vs. over-challenging academic situations, each potentially involving different regulatory mechanisms (Nett et al., 2011). Students are also likely to be more diligent in domains that they value more and in which they have mastery goals and higher self-efficacy, and research suggests that these motivational constructs vary by academic domain (Bong, 2001).

On the other hand, it can be argued that academic diligence is domain-general. One argument is that diligence may be strongly influenced by general factors that are consistent across domains. In particular, diligence is most closely related to the Big Five personality trait of conscientiousness (Author, 2014), and more specifically to its facets of industriousness and self-control (DeYoung, Quilty, & Peterson, 2007). Conscientiousness is widely viewed as a stable trait with consistent patterns of thoughts, feelings, and behaviors across domains (Jackson & Roberts, 2017), which would imply that diligence should similarly be relatively consistent across domains. Another factor contributing to the domain generality of diligence pertains to the overall well-being of the student. For example, mental health diagnoses, such as depression, have negative associations with academic achievement quite broadly (Fröjd et al., 2008; Owens, Stevenson, Hadwin, & Norgate, 2012). In general, worries that occupy a student's thoughts and emotions consume limited working memory resources (Curci, Lanciano, Soleti, & Rimé, 2013; Klein & Boals, 2001), making it difficult to remain diligent across domains.

Further, from a developmental perspective, diligence might be more domain-general for the middle school students studied here. To this point, Goetz et al. (2007) found that self-reported boredom (and other academic emotions) exhibited more domain-specificity with respect to different academic subjects for older compared to younger students (e.g., boredom in math and English classes correlated at 0.10 vs. 0.30 for 11th vs. 8th graders). Similarly, motivational orientations also appear to be more differentiated for high schoolers compared to middle-school students (Bong, 2001). Thus, it might also be the case that diligence is more domain-general for our target population of middle schoolers.

In summary, there are theoretical arguments in favor of both the domain-specificity and domain-generality of academic diligence, which we empirically investigate here. This of course requires appropriate measures of diligence, which we consider next.

1.3. How is academic diligence measured?

There are no sufficiently validated self-report measures of academic diligence (see Author, 2014 for a discussion), so our focus is on performance measures, which fall into two categories. In one camp, researchers equate diligence with persistence, assessed as the amount of time participants remain on task before discontinuing. Hartshorne and May (1929) undertook this approach in their landmark study of character development. To assess multiple dimensions of character, they asked several thousand fifth through eighth-grade students to complete a large battery of performance tasks. Tasks measuring persistence included timing how long students spent unscrambling words in a story, how long they persisted in attempting to solve a wooden puzzle, and how consistently they worked on a long series of simple addition problems. Persistence measures have since been expanded to include tedious activities (e.g., solving simple addition problems, crossing out letters on a sheet of paper), manual dexterity tasks (e.g., building a house of cards), sustained mental effort tasks (e.g., difficult anagrams, riddles, or math puzzles), and physical endurance tasks (e.g., prolonged arm extension) (Battle, 1965; Eisenberger, Kuhlman, & Cotterell, 1992; Ryans, 1938; Sansone, Weir, Harpster, & Morgan, 1992; Sansone, Wiebe, & Morgan, 1999; Ventura, Shute, & Zhao, 2013). Although there is wide variation across performance tasks, in this paradigm, tasks are completed in isolation, and persistence is assessed by the amount of time participants spend on a task before quitting.

The other approach incorporates distractions by providing participants with a choice to do monotonous (but important) work or to do something more immediately gratifying. Performance is assessed as the amount of work completed within a certain amount of time and/or the proportion of available time spent working. This approach is inspired by classic studies (Patterson & Mischel, 1975, 1976) of preschoolers' self-control, where children were told that they should work on a tedious activity (e.g., copying a series of letters into a grid) in the presence of a distracting box that was painted like a clown, made noises, and spoke to them. Children needed to exhibit diligence to resist the urge to engage with the clown box and remain on task (Patterson & Carter, 1979; Peake, Hebl, & Mischel, 2002). That task inspired the creation of similar tasks for older children and adults (Gollwitzer & Schaal, 1998; Parks-Stamm, Gollwitzer, & Oettingen, 2010; Wieber, von Suchodoletz, Heikamp, Trommsdorff, & Gollwitzer, 2011) and also inspired the design of the original ADT (Author, 2014).

We focus on this latter measurement approach because it more closely aligns with our theoretical conceptualization of diligence in that it captures the real-world dilemma of choosing a tedious academic activity in the face of distractions. Perhaps with the rare exception of the examination hall, a student's surroundings typically afford numerous distractions through digital media, interacting with peers, and so on. Students can use these distractions as short work breaks, an effective strategy for preventing burn-out during tedious tasks (Ariga & Lleras, 2011; Rhee & Kim, 2016). Further, with some exceptions, such as doing homework immediately before a deadline, a student who quits a task has the opportunity to resume it at a later time.

In line with the above, the ADT (Author, 2014) measures work product (or total time spent [after excluding breaks]) during a tedious task in the face of distractions. In order to increase its ecological validity, the ADT focuses specifically on academic tasks, and students are explicitly informed of the importance of these tasks in building foundational academic skills. The ADT has the following additional


properties: (1) it focuses on academic skills that are relevant to middle school and high school students; (2) the vast majority (>90%) of students can complete the tasks with 100% accuracy if engaged; (3) it requires minimal prior knowledge and/or experience; (4) each trial can be solved in no more than a few seconds (<10 s); and (5) the distractors are relevant and tempting for a majority of students (see Author (2014) for a justification of these choices). The challenge for students, therefore, is two-fold: to remain engaged in a tedious, but important, academic task while simultaneously resisting the temptation to enjoy a gratifying, but unimportant, alternative.
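The two performance indices described above (work completed, and the proportion of available time spent working) can be illustrated with a minimal sketch. The log format, the `Segment` class, and the `score_block` function are hypothetical and are not taken from the ADT software; the numbers are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of a block spent on a single activity (hypothetical log format)."""
    activity: str      # "task" for skill-building problems, "distraction" otherwise
    seconds: float     # duration of the stretch
    solved: int = 0    # problems answered correctly during the stretch


def score_block(segments, block_seconds):
    """Return (problems solved, proportion of block time spent on the task)."""
    solved = sum(s.solved for s in segments if s.activity == "task")
    on_task = sum(s.seconds for s in segments if s.activity == "task")
    return solved, on_task / block_seconds


# An invented 3-minute (180 s) block: work, a break for a video, then more work.
log = [
    Segment("task", 70, solved=20),
    Segment("distraction", 50),
    Segment("task", 60, solved=16),
]
solved, proportion = score_block(log, block_seconds=180)
print(solved, round(proportion, 2))  # 36 0.72
```

Because a student may swap back and forth between the task and the distractions, both indices are summed over all task segments rather than taken from the first departure, which is what separates this approach from time-until-quitting persistence measures.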

1.4. The current investigation

We had two overarching goals. The first was to examine the domain-generality vs. domain-specificity of academic diligence (as assessed by the ADT) in middle-school students. The second was to develop and validate versions of the ADT for new domains beyond math. To accomplish these goals, we developed verbal and spatial ADTs and tested them along with the existing math ADT. The verbal ADT was identical to the math ADT except that it focused on word problems similar to practice exercises in spelling and vocabulary that a student might encounter in an English class. We chose to work with the verbal domain since it is a core subject that offered opportunities for basic practice similar to the math problems without reliance on specialized knowledge as in science and social studies. In contrast, we developed a novel spatial task distinct from any standard academic subject as a point of comparison. Our rationale was as follows. It is clear from the extant literature (reviewed above) that students have preconceived cognitive schemas and attitudes towards specific academic subjects (Gaspard et al., 2017; Goetz et al., 2007). Accordingly, students should relate the math and verbal tasks to the corresponding academic subjects, but there should be no direct associations for the spatial task. Thus, if the spatial ADT shows different relationships compared to the math and verbal ADTs, it would highlight how students' preconceived notions of academic subjects might influence the measurement of diligence by the ADT.

We conducted a study to adjudicate between competing hypotheses regarding the domain-specificity or domain-generality of academic diligence (our first goal) while also validating the new ADTs (our second goal). In our study, one sample of 8th grade students was asked to complete two 1.5-minute blocks of each ADT (combined ADT sample), whereas three other samples of students completed three 3-minute blocks of one of the three randomly assigned ADT domains (math, verbal, or spatial; individual ADT samples).

For our first goal, we used confirmatory factor analyses on the combined sample to compare the fits of a domain-general vs. a domain-specific model. Then, we used the individual ADT samples to ascertain whether ADT performance differentially predicted academic achievement in specific domains (e.g., is verbal ADT a better predictor of English GPA than math ADT?).

For our second goal, we analyzed the validity of the new ADTs as well as replicated findings with the original math ADT. First, we compared students' subjective perceptions of boredom, temptation [of the distractors], and perceived importance of the skill-building activities across the three ADTs. Second, because diligence is expected to be associated with grit and self-control, we collected self-report and teacher-report ratings of these measures to assess the convergent validity of the ADTs. Further, since these measures are reliable predictors of academic achievement (Duckworth et al., 2007; Duckworth et al., 2019), we ascertained the incremental predictive validity of the ADT after controlling for them.

Finally, we tested whether student characteristics (i.e., intelligence, gender, race/ethnicity, and socioeconomic status) predict performance across the three ADTs. We would expect intelligence to predict performance on the ADTs due to well established links between IQ and performance on academic tasks (Galla et al., 2019; Rohde & Thompson, 2007), so the strength of the effect is of interest here. Further, given gender differences in self-regulation (Cross, Copping, & Campbell, 2011; Duckworth & Seligman, 2006; Else-Quest, Hyde, Goldsmith, & Van Hulle, 2006; Silverman, 2003; Voyer & Voyer, 2014), a component of diligence (see above), we expected girls to outperform boys on the ADTs. Differences in ADT performance across race/ethnicity and socioeconomic status would suggest bias and would have implications for its use.

2. Method

All methods were approved by our Institutional Review Board (IRB) and the research was conducted in compliance with APA ethical standards. Students completed the ADT in a school computer lab during a 60-minute time window in Spring 2015. The study was delivered online via Qualtrics. The measures included here were part of a larger study designed to assess character development in adolescence over a two-year timespan. The study included numerous measures (e.g., mindfulness, empathy, perspective, belief biases) and only the pertinent ones are discussed here. The ADT was administered towards the end of the session because we thought it more prudent to measure diligence when students might be fatigued and the distractors would therefore be more appealing.

2.1. Participants

Participants were 439 8th grade students from eight middle schools in the United States. Participation in the study was voluntary and was approved by the Institutional Review Board.

Participants were randomly assigned to complete shorter versions of all three ADTs (math, verbal, and spatial; combined ADT sample; n = 113) or an individual ADT (individual ADT samples; math n = 110, verbal n = 110, spatial n = 106). Due to differences in the combined vs. individual ADT samples (see below), we treated them as distinct samples and performed separate analyses accordingly.

Table 1 shows the socio-demographic characteristics of participants from both samples. These data were provided by the school districts. Based on Chi-Square Tests of Independence, random assignment in the

Table 1
Demographic characteristics of the combined and individual ADT samples.

                        Combined ADT   Individual ADT
Demographic variable                   Math ADT    Verbal ADT   Spatial ADT
                        n = 113        n = 110     n = 110      n = 106
                        %              %           %            %
Gender [female]         48             46          60           45
Ethnicity
  African American      45             45          45           42
  Asian                 5.8            15          10           15
  Caucasian             30             26          28           26
  Hispanic              18             12          14           14
  Multiracial/other     1.0            1.9         1.9          3.0
Free/reduced lunch      56             66          69           55
School
  School 1              31             34          34           33
  School 2              2.8            1.9         2.8          1.9
  School 3              31             30          28           32
  School 4              4.6            5.6         6.4          4.8
  School 5              3.7            3.7         3.7          4.8
  School 6              6.4            6.5         6.4          5.8
  School 7              9.2            7.4         7.3          7.7
  School 8              11             11          11           12

Note. % represents the percentage of the sample that occurred in each category.

C.A. Spann, et al. Learning and Individual Differences 80 (2020) 101870


individual ADT sample was mostly successful (χ2, ps > .08), with the exception of a significantly higher percentage of females in the verbal ADT compared to the math and spatial ADTs, χ2(2, N = 327) = 8.30, p = .02. We included socio-demographic variables as covariates in the analyses.

2.2. Measures

2.2.1. The ADT

The ADT has a split-screen interface where students are asked to perform the skill-building task on the left half of the screen in the presence of the games and movies3 on the right half of the screen (Fig. 1). The distractions on the right half of the screen include an interactive media panel with video clips and games selected to be enticing for students. The student is faced with a choice: complete the tedious academic task or indulge in the distractions, but not both.

The “skill-building” problems consisted of one-digit subtraction (e.g., 9 − 5), addition (e.g., 5 + 3), and multiplication (e.g., 4 × 2) problems4 in random order (math ADT; Fig. 1) or incomplete words two to seven letters in length (e.g., “a_ple”, correct answer “p”), akin to spelling/vocabulary problems (verbal ADT; Fig. 2). Students submitted their responses to the math and verbal ADTs by selecting one of four multiple-choice options, one of which was always the correct response. The spatial ADT used cardinal direction problems similar to a navigation task (Fig. 3) and included additional audio (~20 s) and text instructions (see Fig. 3). Students began in the “START” square (whose location changed at random in every trial) and followed the cardinal directions (e.g., EN for East North) to find the correct square. They submitted their responses by clicking on the correct square.

In all three tasks, students were unable to change a selected response and received an “incorrect” message if they answered the problem incorrectly (no message was given if they answered the problem correctly). After each response, the next problem appeared automatically. Students were informed prior to the tasks that they would find out how many problems they answered correctly, and a message at the end of the task informed them of their scores (e.g., “You attempted 39 spelling problems and got 37 correct”).

Students were randomly assigned to complete shorter versions of all three ADTs (two 1.5-minute blocks of each ADT, nine minutes total; combined sample) or an individual ADT (three 3-minute blocks, nine minutes total; individual samples). Students in the combined sample completed the spatial, verbal, then math tasks. This order was informed by several pilot studies examining fully counterbalanced orderings, where we discovered that the spatial task was perceived as being the most novel and also required the most instructions, potentially disrupting the flow of the assessment. For this reason, we chose to administer the spatial task first. There were no notable differences between the orderings of the verbal and math tasks, so for simplicity, we elected for the verbal task to precede the math task. We note that the different timings and orderings of the combined ADT sample were based on informed design decisions (see Introduction) and preclude comparisons with the individual ADTs, which was never the goal.

2.2.2. State measures of boredom, importance, and temptation

Students rated their levels of boredom (e.g., “How bored were you by the math problems in the last session?”), the importance of the task (e.g., “How important is this spelling task?”), and their levels of temptation (e.g., “How tempted were you to play games/watch videos?”). Scoring was based on a 5-point Likert scale (1 = not at all, 5 = very). Students completed these items after each task for the combined ADT sample (i.e., after two 1.5-min blocks) and after each three-minute block for the individual ADT samples.

2.2.3. Self-report questionnaires

Students self-reported grit using five items designed to assess passion and perseverance for long-term goals (e.g., “I finished whatever I started”) (Park, Yu, Baelen, Tsukayama, & Duckworth, 2018). Reliabilities were adequate: combined ADT sample α = 0.72; individual ADT samples: α = 0.76 for math, 0.73 for verbal, and 0.73 for spatial. Students self-reported self-control using 10 items measuring the ability to exert self-control in two separate domains: work self-control (e.g., “I came to class prepared”) and interpersonal self-control (e.g., “I controlled my temper”) (Park, Tsukayama, Goodwin, Patrick, & Duckworth, 2017). Work and interpersonal self-control were correlated in both samples (combined ADT sample: r(106) = 0.68, 95% CI [0.57, 0.77]; individual ADT samples: r(325) = 0.62, 95% CI [0.54, 0.68]), which warranted averaging the two. Reliabilities of the composite self-control measure were acceptable: combined ADT sample α = 0.88; individual ADT samples: α = 0.85 for math, 0.87 for verbal, and 0.86 for spatial.

2.2.4. Teacher-report questionnaires

Fifty-one teachers who taught one of the four subject areas (Math, English, Science, and Social Studies) provided global, single-item ratings of grit and self-control for students whom they taught. Specifically, we showed teachers the same items that students completed (see above) and asked them to rate how much those items as a whole described each student using a 5-point Likert scale (1 = Not at all like this student to 5 = Very much like this student). Rating the items as a whole reduced the burden on teachers; see Galla et al. (2019) for more details on this procedure.

Thus, each student had four different teacher ratings (one each from their Math, English, Science, and Social Studies teachers) on each of the three outcomes (grit, work self-control, and interpersonal self-control). Mean pairwise correlations across all teacher ratings were moderate in size (grit Rmean = 0.59, work self-control Rmean = 0.63, interpersonal self-control Rmean = 0.53). The size of these correlations compares favorably to meta-analytic effect sizes of cross-informant ratings (Achenbach, McConaughy, & Howell, 1987; Renk & Phares, 2004). For example, Achenbach et al. (1987) found teacher-teacher correlations of 0.64 for ratings of students' behavioral and emotional problems. We averaged teacher ratings to create a single score for each outcome. We then averaged work self-control and interpersonal self-control (r(325) = 0.87, 95% CI [0.84, 0.89]) to create a composite self-control score for each student.
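For readers who wish to check intervals of this kind, the following is a minimal sketch of a confidence interval for a Pearson correlation based on the Fisher z-transformation. It is an illustration only: the source does not state how the reported intervals were computed, the function name pearson_ci is hypothetical, and n = 327 is an assumption derived from the reported df of 325 under the usual df = n − 2 convention.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate CI for a Pearson r via the Fisher z-transformation.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    z = math.atanh(r)               # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    lo = math.tanh(z - z_crit * se) # back-transform the lower bound
    hi = math.tanh(z + z_crit * se) # back-transform the upper bound
    return lo, hi

# Assumption: r(325) = 0.87 corresponds to n = 327 students.
lo, hi = pearson_ci(0.87, 327)
print(f"95% CI [{lo:.2f}, {hi:.2f}]")
```

Under these assumptions the bounds round to the interval reported above for the teacher-rated work and interpersonal self-control correlation.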

2.2.5. Academic performance

We collected students' core GPA (average of Math, English, Science, and Social Studies GPAs) at the end of the Spring semester (the same semester in which the ADT was completed) from school records. We z-score standardized GPA within each school before combining, to accommodate different grading scales across schools.

2.2.6. Crystallized intelligence (IQ)

The previous study by Author (2014) demonstrated incremental predictive validity of the ADT above fluid intelligence as measured by the matrix reasoning subtest of the Kaufman Brief Intelligence Test (Kaufman & Kaufman, 2004). Here, we focus on crystallized intelligence, which was measured during the Fall semester prior to completing the ADT with the Mill Hill Vocabulary Test, Junior version set A (Raven, Court, & Raven, 1998). This assessment provides participants with a target word (e.g., cap) and six options (e.g., splash, hat, leg, ball, smoke, mill), and they have to identify the option that means the same thing as the target (i.e., hat in this case). The task gets progressively more difficult across trials.

3 Due to bandwidth issues at two of the schools, which we identified prior to data collection, a subset of students interacted with a version of the task with only the games instead of the games and videos (combined ADT sample: 13.3%; individual ADT samples: 14% math, 13% verbal, 13% spatial). Pilot testing indicated no differences between these two forms of distractors.

4 The original math ADT only included subtraction problems. We added multiplication and addition problems as a further refinement.

2.3. Procedure

Students were introduced to the study by reading the following cover story:

“We are interested in learning about students your age, especially your interests, habits, and beliefs. We have designed this survey for you to get a good snapshot of who you really are, kind of like a survey selfie.”

Students completed demographic questions and the self-report measures and then read a cover story explaining the importance of the task for building foundational skills:

Fig. 1. Screenshots of the math ADT (top) along with the different types of problems (bottom). Students chose to “do work” or “play game or watch movie” at any time in the task, but could not do both simultaneously. If “play games or watch movie” was selected, students chose among six options: three videos (e.g., basketball highlights) and three games (e.g., Tetris).

Fig. 2. Screenshot of the verbal ADT. Students must select the correct response (“n” for “went” in this case).


“We want to tell you about some interesting new scientific research on students your age. New scientific research shows that practicing basic skills, like math facts, spelling, and geography, lead to better academic performance later on. Even doing simple exercises can improve your academic skills, which can help you in all areas of your life.”

Students in the combined ADT sample read the cover story once before progressing through the spatial, verbal, and math tasks, and students in the individual ADT sample read the cover story once before completing one of the ADTs. We then recorded whether or not students believed the cover story. We asked, “Why are you going to do these problems?” and students chose one of the following answers: (1) “Because I'm in school,” (2) “Practicing basic skills can make me smarter,” and (3) “My teacher told me to.” Choosing option 2 indicated that the student believed the cover story, which was true for the majority of students (combined ADT sample: 74.3% believed the cover story; individual ADT samples: 70.90% math, 76.36% verbal, and 63.81% spatial; χ2(2, N = 327) = 4.04, p = .13). This measure was used to control for whether the diligence effects observed could be associated with internally imposed (option 2) versus externally imposed self-regulation (i.e., compliance effects).

Students in the combined ADT sample were then instructed how to perform the spatial task and students in the individual ADT sample were instructed how to perform their assigned task. All students completed practice trials for 30 s (without the option to play games/watch movies). The instructions then reminded students that they were free to click on the right side of the screen to play games or watch fun videos, yet they were also reminded that the more problems they finished, the more likely their problem-solving abilities would improve. Students were informed that they could do whatever they preferred; they were in no way obligated to do the problems if they did not want to, and they would not be punished for playing games or watching videos.

Fig. 3. Screenshot of the spatial ADT (top) along with instructions (bottom) provided as both audio and text.

Next, students in the combined ADT sample completed two blocks of each ADT for 1 min and 30 s per block (nine minutes total), progressing through the blocks in the following order: spatial, verbal, then math. Students practiced each task for 30 s immediately before completing that particular task type. Students in the individual ADT sample completed three blocks of their assigned domain for 3 min per block (nine minutes total).

2.4. Data treatment

Performance on the ADT was measured by the number of problems solved correctly, which we call productivity. For the combined ADT sample, productivity was calculated by first averaging the number correct across the two blocks for each task type (i.e., spatial, verbal, math), z-score standardizing by task type, and then averaging the three z-scores to obtain a single productivity measure. For the individual ADT samples, productivity was computed by averaging the number correct across the three blocks and then z-score standardizing within each task type.
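The scoring steps for the combined ADT sample can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code used in the study: the function names and the toy block counts are hypothetical, the sample standard deviation is assumed for z-scoring, and missing data are ignored.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize a list of scores (sample SD is an assumption here)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def combined_productivity(blocks_by_task):
    """blocks_by_task maps task -> list of per-student block tuples,
    with students in the same order for every task (hypothetical layout)."""
    z_by_task = []
    for blocks in blocks_by_task.values():
        per_student = [mean(b) for b in blocks]  # average correct across blocks
        z_by_task.append(zscores(per_student))   # standardize within task type
    # Average the task-level z-scores for each student.
    return [mean(zs) for zs in zip(*z_by_task)]

# Toy data: three students, two blocks per task (invented numbers).
toy = {
    "spatial": [(10, 12), (4, 6), (7, 9)],
    "verbal": [(20, 18), (8, 10), (14, 14)],
    "math": [(22, 24), (9, 7), (15, 17)],
}
scores = combined_productivity(toy)
print([round(s, 2) for s in scores])
```

For an individual ADT sample, the same zscores helper would be applied once to the per-student average across the three blocks of the assigned task.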

Author (2014) reported productivity and time on task as outcomes, but we focused solely on productivity for three reasons. First, the two measures were strongly correlated in both samples (combined ADT sample: r(106) = 0.94, 95% CI [0.92, 0.96]; individual ADT samples: r(325) = 0.90, 95% CI [0.88, 0.92]). Second, we consider productivity to be a more germane measure of “work product” compared to time on task because it distinguishes students who productively engage with the task by submitting correct answers from those who may only superficially engage by selecting options randomly or making a lot of errors due to concentration failures. Third, Author (2014) reported a very similar pattern of results for both measures, so examining them separately would be redundant.

2.4.1. Missing data

The following variables had missing data: crystallized intelligence (combined ADT: 18% missing, individual ADT: 15% missing), self-reported grit and self-control (combined ADT: 4% missing, individual ADT: 2% missing), teacher-reported grit and self-control (combined ADT: 8% missing, individual ADT: 4% missing), ADT productivity (combined ADT: 11% spatial, 13% verbal, 19% math; individual ADT: 1% missing), and core GPA (combined ADT: 10% missing, individual ADT: 8% missing). For the combined ADT sample, where we performed confirmatory factor analyses, we used full information maximum likelihood (FIML) estimation. For the individual ADT sample, we used multiple imputation (Rubin, 2004) to fill in missing observations (except for race/ethnicity, which was not imputed). We imputed 25 different versions of the data set using bootstrap sampling and predictive mean matching via the “Hmisc” package in R. Based on recommendations from Tabachnick and Fidell (2007), one dataset was selected at random for analyses. All analyses were computed with both the original and the imputed dataset, and results remained consistent. Reported results are based on imputed data, but note that sample sizes differ slightly across analyses because race/ethnicity was not imputed.

3. Results

3.1. Preliminary analyses

3.1.1. Internal consistency reliability (both samples)

Table 2 shows descriptive statistics for all variables. Cronbach's α on productivity (the number of correct responses) across the two blocks in the combined ADT sample showed adequate reliability for the math (α = 0.76) and verbal (α = 0.77) tasks, though reliability was lower for the spatial task (α = 0.53). Cronbach's α values across the three blocks in the individual ADT samples were somewhat higher (math, α = 0.81; verbal, α = 0.79; spatial, α = 0.83).
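As an illustration of how block-level reliability of this kind can be computed, the sketch below implements the standard Cronbach's α formula, treating each block as an "item". The function name and the toy scores are hypothetical and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for k items (here, blocks).

    items: list of k lists, each holding one block's scores
    for the same students in the same order.
    """
    k = len(items)
    sum_item_vars = sum(variance(col) for col in items)  # sum of block variances
    totals = [sum(row) for row in zip(*items)]           # per-student total score
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Toy data: number correct for five students in two blocks (invented).
block1 = [10, 4, 7, 12, 3]
block2 = [12, 6, 9, 11, 5]
alpha = cronbach_alpha([block1, block2])
print(round(alpha, 3))
```

With only two or three blocks, α depends heavily on how strongly block scores covary, which is one reason values can differ between the two-block combined sample and the three-block individual samples.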

3.1.2. Adequacy of random assignment (individual samples)

We performed between-subject ANOVAs to examine the adequacy of random assignment in the individual ADT samples. There were no significant differences (ps > .28) among ADT domains for crystallized intelligence, grit and self-control (self-report and teacher-report), or GPA, indicating that random assignment was successful.

Students in the individual samples also demonstrated high accuracy rates across ADT domains (92% math, 92% verbal, 86% spatial). The lower accuracy for the spatial ADT was driven by five students with < 10% accuracy.5 Median accuracy rates, which are robust to outliers, were more similar (97% math, 95% verbal, 94% spatial), confirming that the problems were easy to complete in all domains.

3.1.3. Productivity across time and domains (individual samples)

Using the individual ADT samples, a 3 (block) × 3 (domain) mixed ANOVA indicated a main effect of block on productivity, F(2, 648) = 18.15, p < .001, partial η2 = 0.05, with pairwise comparisons indicating an expected decrease in productivity over blocks (Blocks 1 and 2, p < .001; Blocks 1 and 3, p < .001; Blocks 2 and 3, p = .07; see Fig. 4). There was no block × domain interaction, F(4, 648) = 1.92, p = .11, partial η2 = 0.01, but there was a significant main effect of domain, F(2, 324) = 8.54, p < .001, partial η2 = 0.05. Specifically, math and verbal productivity did not differ (p = .78), and both were higher than spatial productivity (math > spatial, p = .03; verbal > spatial, p < .001). We note that this difference was anticipated because the spatial problems took slightly more time to complete. However, this is not of concern since the goal of these analyses was not to study differences among ADT scores across domains, but rather to study consistency of relationships across domains.
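The reported effect sizes can be cross-checked from the F statistics, because partial η2 equals (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to the two main effects reported above; the function name is hypothetical and the calculation is offered only as a consistency check, not as the authors' procedure.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of block: F(2, 648) = 18.15
block_effect = partial_eta_squared(18.15, 2, 648)
# Main effect of domain: F(2, 324) = 8.54
domain_effect = partial_eta_squared(8.54, 2, 324)
print(round(block_effect, 2), round(domain_effect, 2))
```

Both values round to 0.05, matching the effect sizes given in the text.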

3.2. Domain-specificity vs. domain-generality of academic diligence

To examine our first goal regarding the degree of domain-specificity of academic diligence, we used the combined ADT sample data to perform confirmatory factor analyses using procedures motivated by Bong (2001). We then used the individual ADT samples to examine whether ADT performance in each domain differentially predicted GPA for the respective academic subject.

3.2.1. Confirmatory factor analyses (combined sample)

Correlations among the math, verbal, and spatial tasks were large (rmath_verbal = 0.692, rmath_spatial = 0.639, rverbal_spatial = 0.689, ps < .001). We estimated two structural models (Fig. 5). The first model (Fig. 5a) assumed domain-generality and consisted of one domain-general latent factor on which math, verbal, and spatial performance jointly loaded. This model assumes that the three domains share a sizable amount of variance through a general latent factor. The second model (Fig. 5b) assumed domain-specificity and consisted of three correlated domain-specific factors, with math, verbal, and spatial performance loading on separate factors. It assumes that three higher-order factors are necessary to account for the relationships among domain-specific factors. The degree of domain-specificity was determined by examining (1) model fit for both models; (2) a chi-square test comparing the difference between the two models; and (3) the covariance structure of the domain-specific model. Goodness of fit was determined by a non-significant Chi-Square Difference Test, a Comparative Fit Index (CFI) > 0.90, and a Root Mean Square Error of Approximation (RMSEA) between 0.05 and 0.08 (Hu & Bentler, 1999).

5 To ensure our results were not influenced by students with low accuracy rates, we ran all analyses with only those who were > 50% accurate (n = 311). Results remained consistent when this was done.


The domain-general model exhibited a good fit to the data, χ2(9) = 10.8, p = .29; CFI = 0.99; RMSEA = 0.05, as did the domain-specific model, χ2(6) = 7.48, p = .28; CFI = 0.99; RMSEA = 0.06. The models were not significantly different, χ2(3) = 3.32, p = .35, so the more parsimonious (i.e., fewer estimated parameters) domain-general model was preferred. Finally, the latent math, verbal, and spatial factors in the domain-specific model were all significantly positively correlated (ps < .001), providing additional support for domain-generality.
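The nested-model comparison above is a chi-square difference test: the difference between the two model chi-squares is itself evaluated against a chi-square distribution with degrees of freedom equal to the difference in model dfs. The sketch below reproduces that arithmetic from the reported (rounded) statistics. The helper chi2_sf is a hypothetical closed-form implementation for integer df written for this illustration; it is not the software used in the study.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: erfc term plus a finite sum over half-integer gamma values.
    terms = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms

chi2_general, df_general = 10.8, 9     # one-factor model (reported)
chi2_specific, df_specific = 7.48, 6   # three-factor model (reported)

delta_chi2 = chi2_general - chi2_specific
delta_df = df_general - df_specific
p_diff = chi2_sf(delta_chi2, delta_df)
print(round(delta_chi2, 2), delta_df, round(p_diff, 3))
```

Working from the rounded inputs, the difference is 3.32 on 3 df with a p-value of approximately .345, which agrees with the reported p = .35 up to rounding of the inputs.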

3.2.2. Differential predictions of ADT performance on GPA (individual samples)

We investigated the effect of the ADT on GPAs within corresponding academic domains. Specifically, we regressed individual GPAs (Math, English, Science, and Social Studies) on ADT productivity within each individual ADT sample (math, verbal, and spatial), controlling for crystallized intelligence, gender, ethnicity, free/reduced lunch status [as an approximate measure of a student's socio-economic status], belief in the cover story,6 and school attended (see Table 3 for standardized regression coefficients7). We then compared the effect sizes of each ADT on individual GPAs (e.g., is verbal ADT performance a better predictor of English GPA compared to Math GPA?). There were no significant differences among effect sizes (ps > .22).8 Thus, math ADT productivity did not

Table 2
Descriptive statistics for the samples.

Combined ADT sample
Variable               Range    M (SD)      Math ADT    Verbal ADT   Spatial ADT
                                            M (SD)      M (SD)       M (SD)
ADT
  Productivity         0–53     14 (11.1)   17 (15)     16 (13)      11 (9.4)
  Boredom              1–5      3.0 (1.0)   3.0 (1.2)   3.0 (1.2)    2.8 (1.1)
  Importance           1–5      3.1 (1.0)   3.1 (1.2)   2.9 (1.2)    3.1 (1.1)
  Temptation           1–5      3.2 (1.1)   3.3 (1.3)   3.3 (1.2)    3.0 (1.2)
Crystallized Intl.     5–31     18 (4.2)
Grit
  Self-reported        2.2–5    3.9 (0.6)
  Teacher-reported     1.3–5    3.3 (1.0)
Self-control
  Self-reported        1.3–5    3.7 (0.7)
  Teacher-reported     1.3–5    3.4 (1.0)

Individual ADT samples
Variable               Range    M (SD)      Math ADT    Verbal ADT   Spatial ADT
                                            M (SD)      M (SD)       M (SD)
ADT
  Productivity         0–100    30 (24)     31 (25)     35 (24)      23 (22)
  Boredom              1–5      3.1 (1.1)   3.1 (1.2)   3.1 (1.0)    3.2 (1.1)
  Importance           1–5      3.1 (1.0)   3.0 (1.1)   3.2 (1.0)    2.9 (1.0)
  Temptation           1–5      3.1 (1.1)   3.1 (1.1)   3.1 (1.2)    3.0 (1.2)
Crystallized Intl.     0–29     18 (4.3)    17.5 (4.4)  18 (4.1)     17 (4.5)
Grit
  Self-reported        2–5      3.8 (0.7)   3.8 (0.7)   3.8 (0.7)    3.8 (0.6)
  Teacher-reported     1–5      3.4 (0.9)   3.4 (1.0)   3.3 (0.8)    3.4 (0.9)
Self-control
  Self-reported        2–5      3.7 (0.7)   3.6 (0.6)   3.7 (0.7)    3.7 (0.7)
  Teacher-reported     1–5      3.5 (0.9)   3.4 (1.0)   3.5 (0.9)    3.6 (1.0)

Note. Descriptives for GPA are not included due to differences in scales across schools. Intl. = Intelligence.

Fig. 4. Productivity decreased over time for all ADT domains using the Individual samples. There were no differences between tasks in this rate of decrease.

6 We included believing in the cover story as a covariate to control for external self-regulation (i.e., compliance effects for students endorsing “Because I'm in school,” or “My teacher told me to.” [scored as a 0]) vs. internal self-regulation (“Practicing basic skills can make me smarter.” [scored as a 1]). This is also consistent with Author (2014).

7 The full regression models are shown in the Supplemental materials.

8 Regression coefficients were compared using the “multcomp” package in R for testing the equality of regression coefficients with the same independent, but different dependent variables. Although not germane to our central aim, we compared coefficients across the Spatial ADT task upon discovering the sizeable


a. One-factor model (domain-general). Values to indicators are standardized path coefficients. Indicators represent number correct during two blocks of math (M1, M2), verbal (V1, V2), and spatial (S1, S2) ADTs. Values below indicators represent variance estimates. Model fit: CFI = .99, RMSEA = .05.

b. Three-factor model (domain-specific). Values connecting latent factors and values to indicators are standardized estimates. Model fit: CFI = .99, RMSEA = .06.

Fig. 5. Domain-general (a) and domain-specific (b) models using the combined sample.

(footnote continued) differences among beta coefficients (e.g., English GPA β = 0.17, Science GPA β = 0.41). None of the coefficients significantly differed (ps > .07).


differentially predict Math GPA (relative to other subject domains), nor did the verbal ADT differentially predict English GPA, which provides more evidence for domain-generality.

3.3. Validating the new ADTs (individual ADT samples)

We now turn to analyses aimed at validating the math, verbal, and spatial ADTs, all of which were done on the individual ADT samples. Appendix A provides zero-order correlations between ADT performance and all key variables.

3.3.1. Boredom, importance, and temptation during the ADTs

We first examined the subjective experience of students while completing the ADTs. Three 3 (block) × 3 (domain) mixed-model ANOVAs indicated no main effect of academic domain on boredom (p = .63), importance (p = .19), or temptation (p = .97). There was a main effect of block on boredom (F(2, 648) = 6.92, p = .001, partial η2 = 0.02) and importance (F(2, 648) = 10.38, p < .001, partial η2 = 0.03), but no block × domain interactions (ps > .57). As time progressed, students reported greater boredom (Block 1 < Block 3, p < .001) and rated the tasks as less important (Block 1 > Block 3, p < .001) irrespective of domain. There was a main effect of block on temptation, F(2, 648) = 5.98, p = .003, partial η2 = 0.02, but also a marginal interaction effect, F(4, 648) = 2.28, p = .06, partial η2 = 0.02 (Fig. 6). Specifically, the temptation to disengage from academic work significantly increased over time for math (Block 1 < Block 3, p = .001) but not for the verbal or spatial domains (ps > .36).

3.3.2. Associations between student characteristics and productivity

We regressed productivity of the three individual ADTs on crystallized intelligence, gender, ethnicity, and free/reduced lunch status. School and believing the cover story were also included as control variables. As expected, we found that crystallized intelligence predicted productivity for the math and verbal ADTs (p < .01), but not the spatial ADT (p = .21). As predicted, girls had higher productivity scores for the verbal ADT (p < .01) and, marginally, the math ADT (p = .09), but not the spatial ADT (p = .311). Race/ethnicity and free/reduced lunch status were non-significant predictors of productivity (see Table 4).

3.3.3. Incremental predictive validity

We regressed core GPA, computed as the average of Math, English, Science, and Social Studies GPAs, on ADT productivity after controlling for the same six covariates. We ran an additional model that also controlled for self-reports of grit and self-control. We found that ADT performance in all three domains significantly predicted GPA even after controlling for self-reported grit and self-control (Table 5).

3.3.4. Convergent validity

We regressed self- and teacher-reports of grit and self-control on ADT scores in each domain after controlling for the six covariates (see Table 6). We found that math ADT productivity positively predicted both self- and teacher-reports of grit and self-control, but verbal productivity only predicted teacher-reports of grit and self-control. There was no relationship between spatial ADT performance and self- or teacher-reports of grit and self-control.

We further explored the lack of an association between verbal ADT performance and self-reported grit and self-control. Given the earlier finding that girls scored higher on the math and verbal ADTs, which is consistent with previous research on gender differences in self-regulation (Cross et al., 2011; Duckworth & Seligman, 2006; Else-Quest et al., 2006; Silverman, 2003; Voyer & Voyer, 2014), we investigated whether gender interacted with ADT performance in predicting self-reported grit and self-control. This was indeed the case for self-reported grit, where we found a significant interaction for the verbal ADT (p = .005); the interaction was non-significant (ps > .56) for the math and spatial ADTs. Simple slopes analyses revealed that verbal ADT performance was significantly associated with self-reported grit for boys (B = 0.307, p = .02), but not for girls (B = −0.154, p = .14). There were no interactions between gender and ADT performance in any domain when predicting self-reported self-control (ps > .81). Thus, a gender effect might partially explain the lack of convergent validity between verbal ADT performance and self-reports of grit.

4. Discussion

Is academic diligence domain-specific or domain-general during middle school? We addressed this question in the context of a performance-based measure of diligence called the Academic Diligence Task, which was originally validated for high school students in the math domain (Author, 2014). We developed and tested two new versions of the ADT (verbal and spatial) to address this question. We also validated these new performance measures of diligence and investigated the generalizability and replicability of the original math ADT.

4.1. Main findings

We presented competing hypotheses on the domain-specificity or domain-generality of diligence based on related psychological processes such as self-control, impulsivity, boredom, and boredom regulation (see Introduction). We then tested these hypotheses using two samples of students in 8th grade from several U.S. schools. Table 7 provides a summary of the results.

The results supported a domain-general interpretation of academic diligence. First, confirmatory factor analyses on the combined sample that completed all three ADTs provided preliminary (but non-definitive) support for a domain-general factor structure. Specifically, diligence on math, verbal, and spatial ADTs jointly loaded on one domain-general diligence factor, and ADT performance was strongly correlated across domains. Next, regression models on the individual ADT samples revealed that diligence within specific domains (e.g., math ADT performance) did not predict academic achievement within the corresponding domain (e.g., math GPA) more strongly than achievement in the other domains (e.g., English GPA). Rather, the math and spatial ADTs were generally better predictors of GPA in all four academic subjects than the verbal ADT, providing more evidence for domain-generality. Taken together, the data suggest that academic diligence, as assessed by the ADT, is domain-general.

What about the arguments in favor of a domain-specific view of

Table 3
Regression analyses examining unique predictive validity of ADT productivity on individual subject GPAs in the individual samples.

Predictors                 Outcomes: individual subject GPAs
                           Math             English          Science          Social studies
                           β                β                β                β
Math ADT productivity      0.30**           0.34**           0.31**           0.36***
Verbal ADT productivity    0.21 (p = .06)   0.23*            0.19 (p = .07)   0.21 (p = .07)
Spatial ADT productivity   0.35**           0.17 (p = .15)   0.41***          0.27*

Note. Nonsignificant p-values in parentheses. Analyses represent the unique predictive power of ADT performance for individual GPAs after controlling for covariates (crystallized intelligence, gender, ethnicity, free/reduced lunch status, belief in cover story, and school affiliation).
* p < .05. ** p < .01. *** p < .001.


academic diligence? One argument was that key psychological processes implicated in being diligent, such as self-control, boredom, and motivational orientations, appear to be domain-specific. This raises the question of what constitutes a domain. In particular, studies which suggest that self-control and impulsivity are domain-specific construe domains rather broadly, for example distinguishing self-control in finances vs. work habits (Duckworth & Tsukayama, 2015). However, the present investigation is concerned with more nuanced domain differences, such as self-control in specific academic subjects, where domain differences might be less pronounced. More similar to the present study, there are data to support a domain-specific patterning in boredom and motivation for different academic subjects (Bong, 2001; Daschmann et al., 2011; Goetz et al., 2007). An important aspect of all these studies, however, is that they relied on self-reports to measure the focal constructs, unlike the behavioral measure utilized here. The fact that self-reports and behavioral measures are typically only weakly correlated (e.g., Duckworth & Kern, 2011; McHugh et al., 2011; Sharma, Markon, & Clark, 2014; for example, Duckworth and Kern (2011) report a meta-analytic correlation of 0.11 between delay of gratification tasks and self-reports of self-control) might explain why we did not find domain-specific patterns in the present study. Further research is needed to ascertain whether a domain-general view of diligence would emerge if a different measurement approach (e.g., self- or informant-reports) was used.

Our second goal was to validate the ADTs for middle school students. We used the individual ADT samples for this purpose. We found

Fig. 6. Temptation to disengage from academic work significantly increased over time in the math ADT (Block 1 < Block 3, p = .001), but not in the spatial or verbal ADTs. These analyses used the Individual samples.

Table 4
Regression analyses examining crystallized intelligence and sociodemographics as predictors of ADT productivity.

Outcomes: individual ADT productivity

| Predictors | Math (β) | Verbal (β) | Spatial (β) |
|---|---|---|---|
| Crystallized intelligence | 0.26⁎ | 0.27⁎ | 0.13 (p = .21) |
| Gender [male] | −0.16 (p = .09) | −0.28⁎⁎ | 0.11 (p = .31) |
| Asian | 0.09 | −0.06 | 0.06 |
| Caucasian | 0.17 | 0.19 | 0.10 |
| Hispanic | −0.04 | −0.06 | 0.37 |
| Multi racial/other | 0.01 | −0.02 | −0.15 |
| Receives FR lunch | 0.10 | −0.11 | −0.12 |

Note. Analyses control for belief in cover story and school affiliation.
⁎ p < .05. ⁎⁎ p < .01.

Table 5
Regression analyses examining incremental predictive validity of ADT productivity on core GPA using the individual samples.

Outcome: core GPA

| Predictors | Math ADT (n = 106) β | Verbal ADT (n = 106) β | Spatial ADT (n = 100) β |
|---|---|---|---|
| Covariates + ADT productivity only | | | |
| ADT productivity | 0.37⁎⁎⁎ | 0.25⁎ | 0.35⁎⁎ |
| With grit and self-control added | | | |
| Self-reported grit | 0.15 (p = .15) | 0.18 (p = .11) | 0.19 (p = .13) |
| Self-reported self-control | 0.00 (p = .99) | 0.09 (p = .42) | 0.06 (p = .66) |
| ADT productivity | 0.33⁎⁎ | 0.22⁎ | 0.34⁎⁎ |

Note. Nonsignificant p-values in parentheses. Analyses represent the incremental predictive power of ADT performance for GPA after controlling for covariates (crystallized intelligence, gender, ethnicity, free/reduced lunch status, belief in cover story, and school affiliation) and also self-reported grit and self-control (bottom).

⁎ p < .05. ⁎⁎ p < .01. ⁎⁎⁎ p < .001.
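The two-step logic behind Table 5 (fit a baseline model, then add ADT productivity and check its coefficient) can be sketched as follows. This is a minimal illustration on simulated data, assuming ordinary least squares on z-scored variables; the variable names, the effect sizes, and the reduced covariate set are hypothetical and do not reproduce the study's data or analysis code.

```python
import numpy as np


def zscore(a):
    """Standardize each column to mean 0 and sample SD 1."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def standardized_fit(X, y):
    """OLS on z-scored predictors and outcome; returns (betas, R^2)."""
    Xz = zscore(X)
    yz = zscore(y)
    # No intercept column is needed because every variable is centered.
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    resid = yz - Xz @ betas
    r2 = 1.0 - (resid @ resid) / (yz @ yz)
    return betas, r2


# Simulated data only (assumption): n mirrors the math ADT sample in Table 5.
rng = np.random.default_rng(0)
n = 106
intelligence = rng.normal(size=n)
grit = rng.normal(size=n)
self_control = 0.5 * grit + rng.normal(size=n)
adt = 0.3 * intelligence + 0.3 * grit + rng.normal(size=n)
gpa = 0.4 * intelligence + 0.2 * grit + 0.35 * adt + rng.normal(size=n)

# Step 1: covariates plus self-reports. Step 2: add ADT productivity.
base = np.column_stack([intelligence, grit, self_control])
full = np.column_stack([base, adt])

_, r2_base = standardized_fit(base, gpa)
betas_full, r2_full = standardized_fit(full, gpa)
delta_r2 = r2_full - r2_base  # variance explained beyond the baseline model

print(f"ADT beta (full model): {betas_full[-1]:.2f}")
print(f"R^2 baseline: {r2_base:.3f}, R^2 full: {r2_full:.3f}, delta: {delta_r2:.3f}")
```

Because the two models are nested, the change in R² cannot be negative; the question of interest is whether the ADT coefficient remains reliably different from zero once the self-report measures are in the model.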


expected patterns with respect to increased boredom and lower perceptions of task importance across blocks for all three ADTs, but were surprised to find that the temptation to disengage from the problems only increased for the math ADT (more on this below). We also found that all three ADTs predicted core GPA even after controlling for self-reports of grit and self-control. Results were less clear for the analyses of convergent validity. The math ADT scores showed positive relationships with self- and teacher-reports of grit and self-control, thereby replicating results from Author (2014) for self-reports; this previous study did not include teacher-reports. Spatial ADT scores did not correlate with either teacher- or self-reports, suggesting that this novel task, unlike any academic subject, might be assessing different psychological processes.

But why did scores on the verbal ADT, which was designed to reflect schoolwork in an English class, correlate only with teacher-reports, but not self-reports, of grit and self-control? Our results suggest two possible explanations. The first has to do with students' subjective perceptions of the three ADTs. As noted above, only the math ADT showed a significant increase in temptation to disengage from the problems across blocks. This suggests that the math ADT was more successful than the verbal and spatial ADTs at activating cognitive schemas associated with tedium and effort. Second, there might be gender effects at play due to known gender differences in self-control (Cross et al., 2011; Duckworth & Seligman, 2006; Else-Quest et al., 2006) and restraint (Silverman, 2003). Indeed, not only did we find that girls scored higher on the math and verbal ADTs, but the gender effect was also larger for the verbal (β = 0.28) than for the math ADT (β = 0.16). Further, verbal ADT performance did predict self-reported grit (but not self-control), but only for boys. This suggests that the verbal ADT was more likely to activate schemas of tedium/effort for boys, but there was a buffering effect for girls, echoing findings from a recent meta-analysis that found that the female advantage in academic achievement was largest for language courses and smallest for math courses (Voyer & Voyer, 2014).

Aside from this inconsistency, the overall pattern of results summarized in Table 7 suggests that both the math and verbal ADTs appear to be valid measures of academic diligence. The spatial ADT was mainly developed as a non-academic alternative to the math and verbal ADTs. Whereas performance on this task was the least associated with crystallized intelligence and gender, and it predicted GPA, the lack of convergence with either self- or teacher-reported measures of grit and self-control suggests that it is likely tapping additional psychological processes beyond diligence. Thus, we would not recommend using the spatial ADT to assess diligence.

4.2. Research implications

Our results have implications for the measurement of diligence. Performance on all three ADTs significantly predicted overall GPA after accounting for crystallized intelligence, socio-demographics, school affiliation, and even self-reports of grit and self-control. Further, students' race/ethnicity and free/reduced lunch status (a convenience measure of socioeconomic status) did not predict ADT performance, suggesting fairness of the assessments with respect to these dimensions. Although crystallized intelligence predicted scores on all three ADTs (with a non-significant trend for the spatial ADT), this is an expected finding due to the academic nature of the task and well-established links between IQ and academic achievement (Galla et al., 2019; Rohde & Thompson, 2007). Thus, the ADT, a brief objective measure of individual differences in academic diligence, can predict valued achievement outcomes and can complement self-report measures of other pertinent constructs.

But should academic diligence be measured in a domain-general or domain-specific way? Our results suggest that academic diligence, as assessed by the ADT, appears to be a domain-general construct during the middle school years. Thus, although domain-general measures may be justified, it is important to keep in mind that to appropriately assess a psychological construct, it is vital to examine whether the cognitive processes implicated by the measure operate similarly across academic domains. Here, we have found that self-reported self-regulatory processes appear to be more critical for math compared to verbal ADT scores, ostensibly due to different schemas about effort and due to potential gender effects. The implication is that a measure that shows convergent validity in one domain may not show the same pattern in another for all students, so rather than ask whether a measure is valid, one might ask, “valid in what domain and for whom?”

There is also the question of which ADT is best suited for the measurement of academic diligence. Overall, the math ADT was a better predictor of GPA than the verbal ADT, while also demonstrating convergent validity with both self- and teacher-reports of grit and self-control for both genders. Its performance was also less influenced by gender compared to the verbal ADT. The present study, which tested the math ADT on middle schoolers, also replicates the original validation study on high schoolers (Author, 2014). Thus, the math ADT might be

Table 6
Regression analyses examining convergent validity between ADT productivity and self- and teacher-reported grit and self-control using the individual samples.

Predictor: ADT productivity

| Outcome | Math ADT (n = 106) β | Verbal ADT (n = 106) β | Spatial ADT (n = 100) β |
|---|---|---|---|
| Grit | | | |
| Self-report | 0.28⁎ | 0.05 (p = .71) | 0.02 (p = .84) |
| Teacher-report | 0.44⁎⁎⁎ | 0.34⁎⁎ | 0.20 (p = .08) |
| Self-control | | | |
| Self-report | 0.37⁎ | 0.14 (p = .37) | −0.07 (p = .72) |
| Teacher-report | 0.38⁎⁎⁎ | 0.31⁎⁎ | 0.16 (p = .14) |

Note. Nonsignificant p-values in parentheses. We ran separate regression models for each of the four outcomes as dependent variables and performance on each of the three ADT domains as predictors (12 models total). The models control for crystallized intelligence, gender, ethnicity, free/reduced lunch status, belief in cover story, and school affiliation.

⁎ p < .05. ⁎⁎ p < .01. ⁎⁎⁎ p < .001.

Table 7
Summary of main findings from Tables 3–6.

Dependent variable (rows) by independent variable: ADT productivity (columns: Math ADT, Verbal ADT, Spatial ADT)

GPA
Math: + +a +
English: + +
Science: + +a +
Social studies: + +a +
Core GPA: + + +

Self-reports
Grit: + + for boys
Self-control: +

Teacher-reports
Grit: + + +a
Self-control: + +

Note. +/− indicate that ADT productivity is a significant (p < .05; a p < .07) positive/negative predictor of the dependent variable after controlling for covariates (crystallized intelligence, gender, ethnicity, free/reduced lunch status, belief in cover story, and school affiliation). Blank cells indicate no significant relationship.
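The coding rule stated in the Table 7 note can be written out as a small function. This is a sketch of the rule as described in the note, not the authors' code; the function name, its defaults, and the example coefficients and p-values passed to it below are hypothetical.

```python
def summary_symbol(beta, p, alpha=0.05, marginal=0.07):
    """Map a regression coefficient and its p-value to a Table 7 style cell.

    Returns '+' or '-' when p < alpha, the same sign followed by 'a' when
    alpha <= p < marginal, and an empty string (blank cell) otherwise.
    """
    if beta == 0 or p >= marginal:
        return ""
    sign = "+" if beta > 0 else "-"
    return sign if p < alpha else sign + "a"


# Hypothetical inputs chosen only to exercise each branch of the rule.
print(repr(summary_symbol(0.28, 0.04)))   # significant positive predictor
print(repr(summary_symbol(0.20, 0.06)))   # marginal positive predictor
print(repr(summary_symbol(0.05, 0.71)))   # no significant relationship
print(repr(summary_symbol(-0.30, 0.001))) # significant negative predictor
```

Keeping the thresholds as parameters makes it explicit that the marginal band (p < .07) is a reporting choice for this summary table rather than a conventional significance level.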


the best overall performance measure of diligence, followed by the verbal ADT. Both measures can be used if multiple assessments are needed, as in pre-post designs.

Our results also have implications for interventions aimed at improving academic diligence. The so-called “wise intervention” strategies, which precisely target underlying psychological processes (Walton, 2014), might be particularly effective in improving diligence. This approach would not target diligence directly, but would focus on its psychological levers, including promoting a growth mindset (Yeager et al., 2019), increasing motivation (Yeager et al., 2014), and encouraging deliberate practice (Eskreis-Winkler et al., 2016). Given that individual differences in diligence appear to be domain-general, interventions could target diligence across domains rather than focusing on a target domain. The math and verbal ADTs can also be useful to assess changes in diligence if one is administered prior to the intervention and the other post intervention with appropriate counterbalancing.

4.3. Limitations and future research

Like all studies, ours has limitations. First, a few of the results were only marginally significant, suggesting that our sample size might have lacked adequate statistical power to detect some of the smaller effects (e.g., verbal ADT as a predictor of Math GPA, p = .06). Thus, replication with a larger sample is warranted. Second, because the current investigation only included measurements completed across a short period of time, longer timescales are needed to fully study the predictive validity of the ADT. In particular, the first study of the math ADT with high-school seniors showed it predicted later college enrollment status (Author, 2014); this should be replicated with the verbal and spatial ADT. We also did not develop ADTs for other academic subjects including science and social studies, so we cannot comment on the domain-generality of diligence or the use of the ADT for those domains. Another limitation is that the ADT does not allow students to engage with the work task and the distractors simultaneously, whereas this is what students can, and likely do, while engaging in academic work both in and out of the classroom (Flanigan & Babchuk, 2015; Van Der Schuur, Baumgartner, Sumter, & Valkenburg, 2015). Although the between-teacher correlations for grit and self-control were consistent with prior meta-analyses on cross-informant reports (Achenbach et al., 1987), our use of single-item measures is a limitation. Finally, we did not measure student motivation in each of our domains, so we are unable to assess the incremental predictive validity of the ADTs after controlling for motivational orientations. The original study on the math ADT (Author, 2014) did find that the measure predicted GPA and other valued outcomes after controlling for students' attitudes towards math, so we have reason to think that a similar effect would be observed here. We also did not collect self-reports of grit and self-control for specific academic domains. This information would have been useful to ascertain if students in our sample did indeed report exerting more self-control while doing their math compared to English homework, which would shed some light on the differences in convergent validity results with the math and verbal ADTs.

One obvious step for the future is to address the above limitations with subsequent research. It is also pertinent to note that the degree of domain-specificity of academic diligence may change with age. Executive functions and cognitive abilities develop as children progress through adolescence (Best, Miller, & Jones, 2009; Blakemore & Choudhury, 2006; Steinberg, 2005), along with greater metacognitive skills allowing them to become aware of their own strengths, weaknesses, and interests more broadly (Kuhn, 2000). Diligence in subject areas may show greater domain-specificity as students develop metacognitive awareness and develop longer-term goals (e.g., a high school senior planning to major in computer science when she starts her undergraduate degree) and as cognitive schemas become more entrenched. Future studies should examine whether the domain-generality of academic diligence is observed during high-school years. Higher levels of domain-specificity in self-efficacy have been found for high school students compared to middle school students (Bong, 2001), suggesting that psychological constructs can change in the degree of domain-specificity as a function of age.

5. Conclusion

In summary, our study provides evidence for the domain-generality of academic diligence during the middle school years. Additionally, results demonstrate the validity of the ADT to assess academic diligence with middle school students in school settings. The ADT is an objective measure of diligence that is relatively easy to administer, requires < 15 min to complete, has good external validity, adequate convergent validity (depending on domain), and predicts valued academic outcomes net of sociodemographics, crystallized intelligence, and self-reported grit and self-control. Indeed, students who are able to remain diligent in the short-term, as reflected in the seemingly trivial moment-by-moment choices they make while engaging in the ADT, may reap substantial benefits in the long-term.

Appendix A. Zero-order correlations between measured variables and productivity separated by domain using the individual ADT samples

| Variable | Math ADT productivity (n = 110) | Verbal ADT productivity (n = 111) | Spatial ADT productivity (n = 106) |
|---|---|---|---|
| ADT ratings | | | |
| Boredom | 0.01 | 0.11 | 0.06 |
| Importance | 0.20⁎ | 0.18 | 0.07 |
| Temptation | −0.42⁎⁎⁎ | −0.52⁎⁎⁎ | −0.38⁎⁎⁎ |
| Crystallized intelligence | 0.40⁎⁎⁎ | 0.32⁎⁎ | 0.39⁎⁎⁎ |
| Grit | | | |
| Self-reported | 0.32⁎⁎ | −0.03 | 0.13 |
| Teacher-reported | 0.54⁎⁎⁎ | 0.31⁎⁎ | 0.32⁎⁎ |
| Self-control | | | |
| Self-reported | 0.43⁎⁎⁎ | 0.09 | 0.13 |
| Teacher-reported | 0.49⁎⁎⁎ | 0.30⁎⁎ | 0.35⁎⁎⁎ |
| Core GPA | 0.41⁎⁎⁎ | 0.17 | 0.37⁎⁎ |

Note. ⁎ p < .05. ⁎⁎ p < .01. ⁎⁎⁎ p < .001.
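A zero-order correlation and its significance stars, as used in Appendix A, can be sketched as below. This is an illustration only: the p-value uses the Fisher z normal approximation rather than the exact t test, so it is an assumption that will not necessarily match the software or the effective sample sizes behind the table, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z approximation (requires n > 3, |r| < 1)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def stars(p):
    """Significance markers using the thresholds in the table note."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Two entries from the math ADT column (n = 110), used as worked examples.
for label, r in [("Importance", 0.20), ("Temptation", -0.42)]:
    p = approx_p_value(r, 110)
    print(f"{label}: r = {r:+.2f}{stars(p)}")
```

With n = 110, a correlation of 0.20 falls just under the .05 threshold under this approximation, which shows how close to the boundary some of the smaller starred coefficients sit.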


Appendix B. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.lindif.2020.101870.

References

Acee, T. W., Kim, H., Kim, H. J., Kim, J. I., Chu, H. N. R., Kim, M., ... Wicker, F. W. (2010). Academic boredom in under- and over-challenging situations. Contemporary Educational Psychology, 35(1), 17–27.

Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101(2), 213–232.

Ariga, A., & Lleras, A. (2011). Brief and rare mental “breaks” keep you focused: Deactivation and reactivation of task goals preempt vigilance decrements. Cognition, 118(3), 439–443.

Ashcraft, M. H. (2002). Math anxiety: Personal, educational, and cognitive consequences. Current Directions in Psychological Science, 11(5), 181–185.

Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. New York, NY: Cambridge University Press.

Battle, E. S. (1965). Motivational determinants of academic task persistence. Journal of Personality and Social Psychology, 2(2), 209–218.

Baumeister, R. F., Heatherton, T. F., & Tice, D. M. (1994). Losing control: How and why people fail at self-regulation. San Diego, CA: Academic Press.

Best, J. R., Miller, P. H., & Jones, L. L. (2009). Executive functions after age 5: Changes and correlates. Developmental Review, 29(3), 180–200.

Blakemore, S. J., & Choudhury, S. (2006). Development of the adolescent brain: Implications for executive function and social cognition. Journal of Child Psychology and Psychiatry, 47(3–4), 296–312.

Bong, M. (2001). Between- and within-domain relations of academic motivation among middle and high school students: Self-efficacy, task value, and achievement goals. Journal of Educational Psychology, 93(1), 23.

Brunner, M., Keller, U., Dierendonck, C., Reichert, M., Ugen, S., Fischbach, A., & Martin, R. (2010). The structure of academic self-concepts revisited: The nested Marsh/Shavelson model. Journal of Educational Psychology, 102(4), 964–981.

Cross, C. P., Copping, L. T., & Campbell, A. (2011). Sex differences in impulsivity: A meta-analysis. Psychological Bulletin, 137(1), 97–130.

Csikszentmihalyi, M. (1975). Beyond boredom and anxiety. San Francisco, CA: Jossey-Bass.

CSM (2015). The Common Sense Census: Media use by tweens and teens. San Francisco: Common Sense Media.

Curci, A., Lanciano, T., Soleti, E., & Rimé, B. (2013). Negative emotional experiences arouse rumination and affect working memory capacity. Emotion, 13(5), 867–880.

Danckert, J., & Merrifield, C. (2018). Boredom, sustained attention and the default mode network. Experimental Brain Research, 236(9), 2507–2518.

Daschmann, E. C., Goetz, T., & Stupnisky, R. H. (2011). Testing the predictors of boredom at school: Development and validation of the precursors to boredom scales. British Journal of Educational Psychology, 81(3), 421–440.

DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects of the Big Five. Journal of Personality and Social Psychology, 93(5), 880–896.

Duckworth, A., & Gross, J. J. (2014). Self-control and grit: Related but separable determinants of success. Current Directions in Psychological Science, 23(5), 319–325.

Duckworth, A., Taxer, J., Eskreis-Winkler, L., Galla, B., & Gross, G. (2019). Self-control and academic achievement. Annual Review of Psychology, 70, 373–399.

Duckworth, A. L., & Kern, M. L. (2011). A meta-analysis of the convergent validity of self-control measures. Journal of Research in Personality, 45(3), 259–268.

Duckworth, A. L., Peterson, C., Matthews, M. D., & Kelly, D. R. (2007). Grit: Perseverance and passion for long-term goals. Journal of Personality and Social Psychology, 92(6), 1087.

Duckworth, A. L., & Seligman, M. E. (2006). Self-discipline gives girls the edge: Gender in self-discipline, grades, and achievement test scores. Journal of Educational Psychology, 98(1), 198–208.

Duckworth, A. L., & Tsukayama, E. (2015). Domain-specificity in self-control. In C. Miller, R. M. Furr, A. Knobel, & W. Fleeson (Eds.). Character: New directions from philosophy, psychology, and theology (pp. 393–411). New York, NY: Oxford University Press.

Duckworth, A. L., White, R. E., Matteucci, A. J., Shearer, A., & Gross, J. J. (2016). A stitch in time: Strategic self-control in high school and college students. Journal of Educational Psychology, 108(3), 329–341.

Duckworth, A. L., & Yeager, D. S. (2015). Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher, 44(4), 237–251.

Dweck, C. S. (1986). Motivational processes affecting learning. American Psychologist, 41(10), 1040–1048.

Eastwood, J. D., Frischen, A., Fenske, M. J., & Smilek, D. (2012). The unengaged mind: Defining boredom in terms of attention. Perspectives on Psychological Science, 7(5), 482–495.

Eisenberger, R., Kuhlman, D. M., & Cotterell, N. (1992). Effects of social values, effort training, and goal structure on task persistence. Journal of Research in Personality, 26(3), 258–272.

Else-Quest, N. M., Hyde, J. S., Goldsmith, H. H., & Van Hulle, C. A. (2006). Gender differences in temperament: A meta-analysis. Psychological Bulletin, 132(1), 33–72.

Eskreis-Winkler, L., Shulman, E. P., Young, V., Tsukayama, E., Brunwasser, S. M., & Duckworth, A. L. (2016). Using wise interventions to motivate deliberate practice. Journal of Personality and Social Psychology, 111(5), 728–744.

Flanigan, A. E., & Babchuk, W. A. (2015). Social media as academic quicksand: A phenomenological study of student experiences in and out of the classroom. Learning and Individual Differences, 44, 40–45.

Fröjd, S. A., Nissinen, E. S., Pelkonen, M. U., Marttunen, M. J., Koivisto, A.-M., & Kaltiala-Heino, R. (2008). Depression and school performance in middle adolescent boys and girls. Journal of Adolescence, 31(4), 485–498.

Galla, B. M., Plummer, B. D., White, R. E., Meketon, D., D'Mello, S. K., & Duckworth, A. L. (2014). The Academic Diligence Task (ADT): Assessing individual differences in effort on tedious but important schoolwork. Contemporary Educational Psychology, 39(4), 314–325.

Galla, B., Shulman, L., Plummer, B., Gardner, M., Hutt, S. J., Goyer, J. P., ... Duckworth, A. (2019). Why high school grades are better predictors of on-time college graduation than are admissions test scores: The role of self-regulation and cognitive ability. American Education Research Journal, 56(6), 2077–2115.

Gaspard, H., Häfner, I., Parrisius, C., Trautwein, U., & Nagengast, B. (2017). Assessing task values in five subjects during secondary school: Measurement structure and mean level differences across grade level, gender, and academic subject. Contemporary Educational Psychology, 48, 67–84.

Goetz, T., Frenzel, A. C., Pekrun, R., Hall, N. C., & Lüdtke, O. (2007). Between- and within-domain relations of students’ academic emotions. Journal of Educational Psychology, 99(4), 715–733.

Gollwitzer, P. M., & Schaal, B. (1998). Metacognition in action: The importance of implementation intentions. Personality and Social Psychology Review, 2(2), 124–136.

Hartshorne, H., & May, M. A. (1929). Studies in the nature of character: Volume II: Studies in self-control. Vol. 2. New York: McMillan.

Hofmann, W., Baumeister, R. F., Förster, G., & Vohs, K. D. (2012). Everyday temptations: An experience sampling study of desire, conflict, and self-control. Journal of Personality and Social Psychology, 102(6), 1318–1335.

Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.

Hunter, A., & Eastwood, J. D. (2018). Does state boredom cause failures of attention? Examining the relations between trait boredom, state boredom, and sustained attention. Experimental Brain Research, 236(9), 2483–2492.

Jackson, J. J., & Roberts, B. W. (2017). Conscientiousness. In T. A. Widiger (Ed.). The Oxford handbook of the five factor model (pp. 133–151). (New York).

Kaufman, A. S., & Kaufman, N. L. (2004). (KBIT-2) Kaufman Brief Intelligence Test (2nd ed.). Retrieved from https://www.wpspublish.com/store/p/2830/kbit-2-kaufman-brief-intelligence-test-second-edition.

Keller, U., Strobel, A., Martin, R., & Preckel, F. (2019). Domain-specificity of need for cognition among high school students. European Journal of Psychological Assessment, 35, 607–616.

Klein, K., & Boals, A. (2001). The relationship of life event stress and working memory capacity. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition, 15(5), 565–579.

Kuhn, D. (2000). Metacognitive development. Current Directions in Psychological Science, 9(5), 178–181.

Larson, R. W., & Richards, M. H. (1991). Boredom in the middle school years - Blaming schools versus blaming students. American Journal of Education, 99(4), 418–443.

Linnenbrink-Garcia, L., Durik, A. M., Conley, A. M., Barron, K. E., Tauer, J. M., Karabenick, S. A., & Harackiewicz, J. M. (2010). Measuring situational interest in academic domains. Educational and Psychological Measurement, 70(4), 647–671.

Macklem, G. L. (2015). Boredom in the classroom: Addressing student motivation, self-regulation, and engagement in learning. New York: Springer.

Marsh, H. W. (1990). A multidimensional, hierarchical model of self-concept: Theoretical and empirical justification. Educational Psychology Review, 2(2), 77–172.

McHugh, R. K., Daughters, S. B., Lejuez, C. W., Murray, H. W., Hearon, B. A., Gorka, S. M., & Otto, M. W. (2011). Shared variance among self-report and behavioral measures of distress intolerance. Cognitive Therapy and Research, 35(3), 266–275.

Meindl, P., Yu, A., Galla, B. M., Quirk, A., Haeck, C., Goyer, P., ... Duckworth, A. (2019). No pain, no gain: A brief behavioral measure of frustration tolerance predicts academic achievement two years later. Emotion, 19(6), 1081–1092.

Nett, U. E., Goetz, T., & Hall, N. C. (2011). Coping with boredom in school: An experience sampling perspective. Contemporary Educational Psychology, 36(1), 49–59.

Owens, M., Stevenson, J., Hadwin, J. A., & Norgate, R. (2012). Anxiety and depression in academic performance: An exploration of the mediating factors of worry and working memory. School Psychology International, 33(4), 433–449.

Park, D., Tsukayama, E., Goodwin, G. P., Patrick, S., & Duckworth, A. L. (2017). A tripartite taxonomy of character: Evidence for intrapersonal, interpersonal, and intellectual competencies in children. Contemporary Educational Psychology, 48, 16–27.

Park, D., Yu, A., Baelen, R. N., Tsukayama, E., & Duckworth, A. L. (2018). Fostering grit: Perceived school goal-structure predicts growth in grit and grades. Contemporary Educational Psychology, 55, 120–128.

Parks-Stamm, E. J., Gollwitzer, P., & Oettingen, G. (2010). Implementation intentions and test anxiety: Shielding academic performance from distraction. Learning and Individual Differences, 20, 30–33.

Patterson, C. J., & Carter, D. B. (1979). Attentional determinants of children’s self-control in waiting and working situations. Child Development, 50(1), 272–275.

Patterson, C. J., & Mischel, W. (1975). Plans to resist distraction. Developmental Psychology, 11(3), 369–378.

Patterson, C. J., & Mischel, W. (1976). Effects of temptation-inhibiting and task-facilitating plans on self-control. Journal of Personality and Social Psychology, 33(2), 209–217.

Peake, P. K., Hebl, M., & Mischel, W. (2002). Strategic attention deployment for delay of gratification in working and waiting situations. Developmental Psychology, 38(2), 313–326.

Pekrun, R., Goetz, T., Daniels, L., Stupnisky, R. H., & Perry, R. (2010). Boredom in achievement settings: Exploring control–value antecedents and performance outcomes of a neglected emotion. Journal of Educational Psychology, 102(3), 531–549.

Pekrun, R., Hall, N. C., Goetz, T., & Perry, R. P. (2014). Boredom and academic achievement: Testing a model of reciprocal causation. Journal of Educational Psychology, 106(3), 696–710.

Putwain, D. W., Becker, S., Symes, W., & Pekrun, R. (2018). Reciprocal relations between students’ academic enjoyment, boredom, and achievement over time. Learning and Instruction, 54, 73–81.

Raven, J. C., Court, J. H., & Raven, J. (1998). Mill Hill vocabulary scale. Oxford, UK: Oxford Psychologists Press.

Renk, K., & Phares, V. (2004). Cross-informant ratings of social competence in children and adolescents. Clinical Psychology Review, 24(2), 239–254.

Rhee, H., & Kim, S. (2016). Effects of breaks on regaining vitality at work: An empirical comparison of “conventional” and “smart phone” breaks. Computers in Human Behavior, 57, 160–167.

Rohde, T. E., & Thompson, L. A. (2007). Predicting academic achievement with cognitive ability. Intelligence, 35(1), 83–92.

Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys. Hoboken, New Jersey: John Wiley & Sons.

Ryans, D. G. (1938). An experimental attempt to analyze persistent behavior: I. Measuring traits presumed to involve “persistence”. The Journal of General Psychology, 19(2), 333–353.

Sansone, C., Weir, C., Harpster, L., & Morgan, C. (1992). Once a boring task always a boring task? Interest as a self-regulatory mechanism. Journal of Personality and Social Psychology, 63(3), 379–390.

Sansone, C., Wiebe, D. J., & Morgan, C. (1999). Self-regulating interest: The moderating role of hardiness and conscientiousness. Journal of Personality, 67(4), 701–733.

Sarason, I. G., & Sarason, B. R. (1990). Test anxiety. In H. Leitenberg (Ed.). Handbook of social and evaluation anxiety (pp. 475–495). New York, NY: Plenum Press.

Sharma, L., Markon, K. E., & Clark, L. A. (2014). Toward a theory of distinct types of “impulsive” behaviors: A meta-analysis of self-report and behavioral measures. Psychological Bulletin, 140(2), 374–408.

Silverman, I. W. (2003). Gender differences in resistance to temptation: Theories and evidence. Developmental Review, 23(2), 219–259.

Spencer, S. J., Steele, C. M., & Quinn, D. M. (1999). Stereotype threat and women’s math performance. Journal of Experimental Social Psychology, 35(1), 4–28.

Steinberg, L. (2005). Cognitive and affective development in adolescence. Trends in Cognitive Sciences, 9(2), 69–74.

Stipek, D., & Gralinski, J. H. (1996). Children’s beliefs about intelligence and school performance. Journal of Educational Psychology, 88(3), 397–407.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics. New York, NY: Pearson.

Trautwein, U., Lüdtke, O., Schnyder, I., & Niggli, A. (2006). Predicting homework effort: Support for a domain-specific, multilevel homework model. Journal of Educational Psychology, 98(2), 438–456.

Tsukayama, E., Duckworth, A. L., & Kim, B. E. (2013). Domain-specific impulsivity in school-age children. Developmental Science, 16(6), 879–893.

Tze, V. M., Daniels, L. M., & Klassen, R. M. (2016). Evaluating the relationship between boredom and academic outcomes: A meta-analysis. Educational Psychology Review, 28(1), 119–144.

Van Der Schuur, W. A., Baumgartner, S. E., Sumter, S. R., & Valkenburg, P. M. (2015). The consequences of media multitasking for youth: A review. Computers in Human Behavior, 53, 204–215.

Varao-Sousa, T. L., & Kingstone, A. (2019). Are mind wandering rates an artifact of the probe-caught method? Using self-caught mind wandering in the classroom to test, and reject, this possibility. Behavior Research Methods, 51(1), 235–242.

Ventura, M., Shute, V., & Zhao, W. (2013). The relationship between video game use and a performance-based measure of persistence. Computers & Education, 60(1), 52–58.

Voyer, D., & Voyer, S. D. (2014). Gender differences in scholastic achievement: A meta-analysis. Psychological Bulletin, 140(4), 1174–1204.

Walton, G. M. (2014). The new science of wise psychological interventions. Current Directions in Psychological Science, 23(1), 73–82.

Wammes, J. D., Boucher, P. O., Seli, P., Cheyne, J. A., & Smilek, D. (2016). Mind wandering during lectures I: Changes in rates across an entire semester. Scholarship of Teaching and Learning in Psychology, 2(1), 13–32.

Wieber, F., von Suchodoletz, A., Heikamp, T., Trommsdorff, G., & Gollwitzer, P. M. (2011). If-then planning helps school-age children to ignore attractive distractions. Social Psychology, 42(1), 39–47.

Yeager, D. S., Dahl, R. E., & Dweck, C. S. (2018). Why interventions to influence adolescent behavior often fail but could succeed. Perspectives on Psychological Science, 13(1), 101–122.

Yeager, D. S., Hanselman, P., Walton, G. M., Murray, J. S., Crosnoe, R., Muller, C., ... Hinojosa, C. P. (2019). A national experiment reveals where a growth mindset improves achievement. Nature, 573(7774), 364–369.

Yeager, D. S., Henderson, M., Paunesku, D., Walton, G., D’Mello, S., Spitzer, B., & Duckworth, A. (2014). Boring but important: A self-transcendent purpose for learning fosters academic self-regulation. Journal of Personality & Social Psychology, 107(4), 559–580.
