The unobtrusive knowledge test: validity and stereotype threats


Equal Opportunities International
Vol. 28 No. 7, 2009, pp. 577-590
© Emerald Group Publishing Limited 0261-0159
DOI 10.1108/02610150910996416
Received 29 December 2008
Revised 30 March 2009, 16 July 2009
Accepted 23 July 2009

The unobtrusive knowledge test: validity and stereotype threats

Daniel E. Martin and Carol F. Moore
Department of Management, California State University, East Bay, Hayward, California, USA, and

Carol Hedgspeth
Department of Psychology, Morgan State University, Baltimore, Maryland, USA

Abstract

Purpose – The purpose of this paper is to validate the unobtrusive knowledge test (UKT) in a minority population, and examine its potential for limiting stereotype threat.

Design/methodology/approach – Study One (convergent validity): UKT and Wonderlic Personnel Test (WPT) scores were correlated for 131 students. Study Two (stereotype threat): 202 minority students were placed into one of four groups based on whether or not they were given instructions to elicit stereotype threat, and whether they took the Excellence scale of the UKT or the WPT.

Findings – Correlations provided evidence of convergent validity between the Excellence subscale of the UKT and the WPT. The stereotype threat study was inconclusive, with no differences being seen in the threat/non-threat conditions for the WPT, and higher scores in the threat condition than the non-threat condition for the UKT.

Research limitations/implications – Unreliability of some scales, and low correlations of others with the WPT, lessened the overall UKT's convergent validity.

Practical implications – The need to develop measures of intelligence not subject to adverse impact is clear, and the results of the current research provide justification for further research establishing the properties of the UKT as a selection tool.

Originality/value – This paper offers new evidence of the usefulness of the UKT as a measure of cognitive ability for minority populations, and raises questions about the impact of stereotype threat on the UKT test.

Keywords Selection, Cognition, Scoring procedures (tests), Intelligence tests, Prejudice

Paper type Research paper

Introduction

A recent review of court decisions related to cognitive ability testing states that "General cognitive ability is likely the single best predictor of job performance, although it typically results in race-based adverse impact" (Shoenfelt and Pedigo, 2005, p. 271). This conclusion mirrors earlier research on cognitive ability/intelligence testing (Hunter, 1986; Hunter and Hunter, 1984; Murphy, 2002; Outtz, 2002; Ree and Earles, 1992; Ree et al., 1994; Schmidt and Hunter, 1998) and raises a dilemma for the Human Resources professional – is it possible to find an effective cognitive ability test that does not have adverse impact?

One approach to solving this problem includes creating different forms of cognitive ability tests. Cattell and Cattell (1959) and Raven et al. (1998) both tried to reduce bias in intelligence measures due to exposure to language and values by creating abstract, knowledge-free measures of intelligence. These measures produced larger performance differences between African-Americans and White Americans than other available measures. Helms (1992) stated that framing ability test content in social contexts might reduce White-Black mean differences. DeShon et al. (1998) modified ability items to reflect everyday situations, but testing still produced group differences. Hough and colleagues (2001) suggested using tests of specific cognitive abilities, rather than tests

The current issue and full text archive of this journal is available at www.emeraldinsight.com/0261-0159.htm


of general intelligence to reduce adverse impact, and showed some evidence that such an approach is successful, especially when tests are tailored for specific jobs.

Sternberg (2000) has criticized methods of test validation that may lead to systematic cultural bias. He noted that intelligence can be represented in terms of a person's talents and the abilities that are valued in a particular sociocultural context. To the extent that individuals' behavior differs from socially valued behavior, they will be viewed as less successful and less intelligent. As an example, Wiesen (2002) argues that some tests have unnecessarily high reading levels, and others reflect topics accrued through acculturation (knowledge reflected in some societal norms). In a continuation of previous work, Helms (2006) argues that intelligence test scores need to account for individual differences in test takers by considering ethnic cultural psychological constructs that may impact the performance of individuals in designated groups.

A second approach to dealing with adverse impact involves understanding and modifying the conditions which produce differential performance on cognitive ability tests. Steele (1992, 1997) and Steele and Aronson (1995) argue that situational factors may provide a theoretical explanation for minority underachievement (Aronson et al., 1998). They also postulate that poor performance may result from pressures confronting a minority with a negative stereotype about their group while taking a difficult test in a relevant domain (Steele, 1997; Steele and Aronson, 1995, 1998). Steele and Aronson (1995) suggest modifying testing instructions to remove conditions of stereotype threat, or apprehension related to manifesting a negative group stereotype. One recent study demonstrating the effectiveness of this approach showed that when Sub-Saharan Africans who lived in Belgium were told that people from Africa typically did worse on intelligence tests than Belgians, the Africans' performance on such a test dropped. Study participants who were not given such instructions showed no such drop (Klein et al., 2007). Other research shows that when people are told about a behavior that will counteract the stereotype (for example, women being told to act aggressively during a negotiation), there is less of a drop in performance related to stereotype threats (Kray et al., 2001). In a 2007 review, Roberson and Kulik conclude that stereotype threat is pervasive in the workplace, and that an understanding of stereotype threat is imperative for both eliminating adverse impact and managing diversity effectively.

The unobtrusive knowledge test (UKT) (Legree et al., 2000) is a cognitive ability test that appears to be a survey. The test shows promise for use in unproctored testing situations (i.e. a setting with no test facilitator to establish the veracity of individual answers) because people perceive it as a survey rather than a test, even in testing situations (Legree et al., 2000; Martin et al., 2007). This means that people who might take a UKT may perceive it as a survey, and answer honestly, without suffering evaluation apprehension or stereotype threat. This feature may also make the UKT useful as a test that does not arouse test takers' negative group stereotypes, thus minimizing adverse impact. This paper presents two studies – one examining whether or not the UKT shows convergent validity in a minority population with the Wonderlic Personnel Test (WPT), a widely used measure of cognitive ability, and another investigating the impact of stereotype threat conditions on both the UKT and the WPT.

The unobtrusive knowledge test

The UKT is a new form of cognitive ability test. Originally developed as a market segmentation tool for use in targeting potential military recruits, it appears to be a survey that asks respondents to make estimates using Likert scales in five areas: Military Positions (the size of various Army job families); Word Frequency (the


frequency of usage of various English words); Excellence (the connotations of terms implying degrees of excellence); Auto Reliability (the relative reliability of various automobiles) and Miles per Gallon (MPG; the fuel economy of various automobiles) (Legree et al., 2000). For example, one item of the MPG scale asks an individual to rate how efficient/economical it would be to run a Ford Mustang, with a rating of 1 indicating low MPG, and a rating of 9 indicating high MPG. Two of the scales, Word Frequency and Excellence, were designed to capture variance associated with the Verbal factor of the Armed Services Vocational Aptitude Battery (ASVAB; U.S. Department of Defense, 1984), while the other three scales were related to the ASVAB Technical factor (cf. Kass et al., 1983 and Ree and Carretta, 1994 for additional information on ASVAB factor structure). The overall ASVAB score is generally assumed to measure g or generalized intelligence (Ree et al., 1994). Recent work suggests that the ASVAB estimates crystallized intelligence (which is linked to material learned in formal education), rather than fluid intelligence (which is related to perceptual and memory processes) (Roberts et al., 2000).

The developers of the UKT were particularly interested in creating a measure which would be unobtrusive (takers would not realize they were taking a test); unlikely to be compromised (takers would not be likely to refer to outside sources when taking the test); short (the measure would be brief enough that it could be included in other surveys); and correlated with Psychometric g (Legree et al., 2000). Ultimately, the goal was to produce a test that could yield valid selection information even when given in unproctored situations where test control is minimal (e.g. with Internet or mail-based distribution).

In the initial validation study (Legree et al., 2000), 288 Air Force recruits who had taken the ASVAB during enlistment procedures also completed the five UKT scales noted above. UKT scale responses were transformed to control for response bias using a consensus-based scoring system. For any given sample, the mean is considered to be the best performance, and the smaller the individual score distance from the mean, the better the respondent's performance. Thus, a mean distance of 0 is the highest score possible, and as scores increase, performance on the test decreases. Final correlation results were reflected so that higher performance on the ASVAB would be positively correlated with higher performance on the UKT (see Figure 1 for sample UKT scale).
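The consensus-based scoring described above can be sketched as follows. This is our reconstruction from the text (the function name and toy ratings are invented for illustration), not the authors' actual scoring code: each respondent is scored by the mean absolute distance between their item ratings and the sample's mean rating per item, so 0 is the best possible score and larger values indicate worse performance.

```python
def consensus_scores(responses):
    """responses: list of per-respondent rating lists (same item order).

    Returns one score per respondent: the mean absolute distance from
    the sample's consensus (mean) rating across items. Lower is better.
    """
    n_items = len(responses[0])
    # Consensus profile: the sample mean rating for each item
    means = [sum(r[i] for r in responses) / len(responses)
             for i in range(n_items)]
    return [sum(abs(r[i] - means[i]) for i in range(n_items)) / n_items
            for r in responses]

ratings = [
    [7, 2, 5],  # respondent close to the consensus
    [7, 3, 5],  # respondent closest to the consensus
    [1, 9, 1],  # respondent far from the consensus
]
scores = consensus_scores(ratings)
# the outlier receives the largest (worst) distance score
```

In practice a final sign reflection (as the authors describe) would make higher scores mean better performance when correlating with the ASVAB.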

Four scales that best estimated g in the original UKT were developed to measure similar domains to the Armed Services Vocational Aptitude Battery (ASVAB):

(1) Job Positions, the size of various job families, 15 items;

(2) Word Frequency, the frequency of usage of various English words, 30 items;

(3) Excellence, the connotations of terms implying degrees of excellence, 15 items; and

(4) Miles per Gallon (MPG), the fuel economy of various automobiles, 18 items.

Figure 1. Example of unobtrusive knowledge scales


It should also be noted that the scoring system for the UKT is completely sample dependent, given this consensus-based approach. The overall correlation between UKT and ASVAB scores was 0.80 after correction for attenuation. Population estimates of the correlation of each UKT scale with the overall ASVAB (ASVAB-g) ranged from 0.07 (for the Auto Reliability scale) to 0.66 (for the Miles per Gallon scale). The remainder of the scales had correlations with the ASVAB-g above 0.42, suggesting that the UKT showed convergent validity with the ASVAB. Confirmatory factor analyses showed that the UKT scales loaded on the expected factors of the ASVAB, except for MPG, which loaded on both the Verbal and Technical factors of the ASVAB. These findings, taken together, offer evidence of construct validity, especially for the Word Frequency, Excellence and Military Positions scales of the UKT.
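The "correction for attenuation" mentioned above is the standard Spearman disattenuation formula, which divides an observed correlation by the square root of the product of the two measures' reliabilities. A minimal sketch follows; the reliability and correlation values are hypothetical, chosen only to illustrate the arithmetic, and are not taken from the original study:

```python
import math

def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Estimate the correlation between true scores from the observed
    correlation and each measure's reliability (Spearman's formula)."""
    return r_observed / math.sqrt(rel_x * rel_y)

# Hypothetical values: an observed r of 0.62 with reliabilities of
# 0.80 and 0.75 disattenuates to roughly 0.80.
r_true = disattenuate(0.62, 0.80, 0.75)
```

Because measurement error attenuates observed correlations, the corrected value is always at least as large in magnitude as the observed one.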

Current studies

This research consisted of two studies examining UKT results in a non-military, minority population. We sought minority populations (specifically African-American schools, as this is where the stereotype threat hypothesis is most relevant). In both studies (which used separate and different experimental pools), students were part of the psychology experimental pool and had to provide consent to participate in the experiments, which had received Institutional Review Board clearance. All participation was voluntary, and all students received extra credit, resulting in a 100 percent participation rate. The first study sought to establish the convergent validity of the UKT scales by correlating the UKT with another intelligence test commonly used in job selection, the WPT. Convergent validity is the degree to which the measure is similar to (converges on/correlates with) other measures to which it theoretically should be similar.

The WPT is a 50-item, 12 min test of cognitive ability (Wonderlic, 1992) that has been shown to measure general intelligence (g), fluid intelligence and crystallized intelligence (Bell et al., 2002; Matthews and Lassiter, 2007). It is also of interest because it is used most often in non-military settings – clients include everyone from banks (Porter, 1996) to retail stores (Hayes, 2004) to the National Football League (Roberts, 2006; Barra, 2006). The Wonderlic company released norms in 2005 that covered the use of its tests for more than 200 employers and 2000 jobs (Higley, 2008). The WPT shows predictive validity with job performance in over seventy occupations (Gottfredson, 1997; Wonderlic, 1992). As the WPT is a well-known, job-related measure of intelligence, we used it to establish the convergent validity of the UKT scales.

Along with convergent validity information, divergent validity is important in understanding the measure being used (Campbell and Fiske, 1959). Divergent validity provides information regarding the lack of correlation between theoretically different constructs. As part of a larger concurrent project, we also collected data using a scale from the Multidimensional Inventory of Black Identity (MIBI) (Sellers et al., 1998). The MIBI is a paper and pencil instrument developed to measure Black identity-based constructs (specifically racial centrality, ideology, and regard). The Racial Centrality scale measures whether race is considered part of one's self-identity. As intelligence scales (the WPT and UKT scales) and identity scales (Racial Centrality scale) clearly aim to measure different constructs, we used the Racial Centrality scale to establish divergent validity.

In testing both convergent and divergent validity for the UKT, it was hypothesized that:

H1. Significant correlations will be found between the WPT and the UKT.


H2. No significant correlations will be found between the UKT scales and the Racial Centrality Scale of the MIBI.

The second study examined the impact of stereotype threat instructions on the two tests. In this study, we expected that:

H3. WPT performance scores will be lower in the stereotype threat condition than the non-stereotype threat condition, but the UKT performance will not differ between the stereotype threat and non-stereotype threat conditions (as it is perceived and experienced as a survey).

Study 1

Instruments

An overview of the instruments used in this study is given above. The commercially available version of the WPT (12 min, 50 items) was used. Because the scoring system for this test is proprietary, the tests were sent to the Wonderlic company for scoring. Each individual who took the test got an overall score, but it was not possible for the authors to calculate standard inter-item reliabilities for these scores. The Wonderlic company reports test-retest reliabilities ranging from 0.70 to 0.90 for this test (Wonderlic, 1992) and a high correlation with the Wechsler Adult Intelligence Scale (Revised) (r = 0.92). The version of the UKT used in this study was a modification of the original UKT described above. The "Military Positions" scale, which asks respondents to estimate the size of various Army job families (or Military Occupational Specialties), was modified to reflect civilian occupations.

The eight-item Racial Centrality scale of the Multidimensional Inventory of Black Identity (MIBI) questionnaire (Sellers et al., 1998) provided a measure of the construct of racial identity (Cronbach's alpha = 0.78), and was used to test the divergent validity of the UKT scales. The Racial Centrality scale is a measure of the construct of Black racial identity. Participants also completed a set of demographic questions that provided age, gender, and ethnicity information about the sample.
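Cronbach's alpha, the internal-consistency statistic reported for this scale, can be computed directly from raw item responses: alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). A minimal sketch with made-up data (not the study's responses):

```python
def cronbach_alpha(items):
    """items: list of respondents' item-score lists (respondents x items)."""
    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    k = len(items[0])                 # number of items on the scale
    item_cols = list(zip(*items))     # transpose to items x respondents
    item_var_sum = sum(sample_var(col) for col in item_cols)
    total_var = sample_var([sum(row) for row in items])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Five hypothetical respondents answering a three-item scale
data = [[2, 3, 3], [4, 4, 5], [3, 4, 4], [5, 5, 5], [1, 2, 2]]
alpha = cronbach_alpha(data)
```

Higher alpha indicates that the items vary together, i.e. they appear to measure a single underlying construct.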

Participants

One hundred and thirty-one psychology students from a Historically Black university in the Mid-Atlantic region of the United States participated in Study 1. Students volunteered and received academic credit for this study. Due to participants' non-response to various demographic questions during the data collection, twenty-one (21) questionnaires were not included in the demographic analyses, resulting in missing data and a total n of 110. Seventy-two percent were female, 27 percent were male, 87 percent were African-American, 3 percent were African, and 10 percent were Caribbean. Ninety-four percent of participants were between the ages of 16 and 22, with the oldest participant being 28 (see Table I).

All participants were informed of the procedures by the experimenter and gaveconsent before participating in the study.

Method

The participants were randomly placed into one of two conditions to control for order effects. The first group completed the UKT first, then the WPT, while the second group completed the WPT first, then the UKT. Participants were tested in small groups of no more than 25. Each participant provided demographic information after taking the two tests.


Manipulation check

Because the UKT was designed to be an unobtrusive measure, the researchers ran a manipulation check to see if a minority sample would see the test in the same way as the sample on which the test was originally developed and normed. After completing the experiment, participants were asked to indicate whether they thought the experimental measures were tests or surveys. Two chi-square tests are reported here, as the order of the tests was also manipulated to control for order effects. Again, due to participants' non-response to various questions during the data collection, two (2) questionnaires were not included in the second (UKT first, then WPT) chi-square analyses, resulting in missing data with a total n of 129. Half the participants were given the UKT first, and half were given the WPT first. The test not taken first was then given as the second test. The percentage of participants that perceived the UKT as a survey vs the WPT as a test was significant, χ²(1, n = 131) = 23.09, p < 0.001, when the UKT was given first, as well as when it was given second, χ²(1, n = 129) = 53.40, p < 0.001 (see Table II).

Results of this manipulation check indicate that this minority sample indeed saw the UKT as a survey, as did the sample with which the test was developed and normed.
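For a 2×2 contingency table like Table II, the Pearson chi-square statistic has a closed form: χ² = n(ad − bc)² / ((a+b)(c+d)(a+c)(b+d)). As a rough illustration, the sketch below pools Table II's counts into a single 2×2 table; note that the paper itself reports separate tests for each presentation order, so the value computed here is not one of the reported statistics:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Pooled Table II counts: rows = instrument (UKT, WPT),
# columns = perceived as survey / perceived as test
chi2 = chi_square_2x2(93, 38, 23, 106)
# well above the p = 0.001 critical value (10.83) for 1 degree of freedom
```

The pooled statistic points in the same direction as the two reported tests: perception (survey vs test) is strongly associated with which instrument was taken.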

Results

The mean and standard deviation for the WPT scores were 20.95 and 5.168, respectively. As participation was voluntary, we did not have full participation on some of the measures, resulting in missing data. Means and Cronbach alphas for the UKT scales are presented in Table III.

Reliabilities were acceptable for all UKT scales. A correlation matrix among the intelligence variables is presented in Table IV.

Table I. Demographic characteristics of participants in Study One (n = 110)

Characteristic                n     Valid (%)
Age (mean = 19.35 years)
  16-22                     104      94.5
  23-25                       4       3.6
  26-28                       2       1.8
Gender
  Males                      30      27
  Females                    78      72
Ethnicity
  African American           96      87
  African                     4       3
  Caribbean                  10      10

Note: Totals may not equal 110 nor 100 percent, as not all participants responded to all demographic items

Table II. Order of instrument vs instrument type

Instrument    Perceived as survey    Perceived as test
UKT                    93                    38
WPT                    23                   106


Correlations between the WPT and the Excellence scale supported hypothesis one, that the Excellence UKT scale and the WPT were tapping into the same construct – intelligence (r = 0.63 after correction for range restriction; see below). This supports an assessment of convergent validity. Accordingly, we chose to use only the Excellence scale to conduct our second study on stereotype threat effects. It is of interest that the three scales that did not correlate with the WPT (Job Positions, Word Frequency, and MPG) were all significantly correlated with each other. Further research is needed to see if these scales may tap into an aspect of intelligence that is not measured by the WPT, perhaps the tacit or practical intelligence proposed by Sternberg and colleagues (Sternberg, 1982; Sternberg and Hedlund, 2002).

The lack of significant correlations between the UKT scales and the Racial Centrality scale of the MIBI provides support for hypothesis two, and demonstrates the divergent validity of the UKT scales.

Correction for range restriction

As we used the WPT and sampled from a group of university students who were selected on the basis of intelligence as measured by standardized college entrance exams (SAT or ACT), we have a restricted sample. The students had an average Wonderlic score of 20.98, and a standard deviation of 5.168. The average for working adults in the USA is 21.75, with a standard deviation of 7.6 (Wonderlic, 1992). Table V presents both corrected and uncorrected correlations between UKT scales and the WPT.

Corrected correlations show a pattern of results similar to the uncorrected correlations. Of particular interest is the correlation between the Excellence scale and the WPT, which increases substantially in magnitude to −0.63. Based on the results of Study 1, we found partial support for our first hypothesis that significant correlations will be
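The corrected values in Table V are consistent with Thorndike's Case 2 formula for direct range restriction, which rescales the observed correlation by the ratio of the unrestricted to the restricted standard deviation. The sketch below uses the figures reported above; this is our reconstruction, as the paper does not state which correction formula it used:

```python
import math

def correct_range_restriction(r, sd_restricted, sd_unrestricted):
    """Thorndike Case 2 correction for direct range restriction."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1 - r**2 + (r**2) * (u**2))

# Excellence scale vs WPT: observed r = -0.48, student-sample SD = 5.168,
# working-population SD = 7.6 (Wonderlic, 1992)
corrected = correct_range_restriction(-0.48, 5.168, 7.6)
# comes out near the -0.63 reported in Table V
```

Because the student sample's variability is below the working population's, the correction increases the magnitude of every correlation while preserving its sign.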

Table III. Reliability of UKT scales

UKT scale     No. of cases    Mean    Std. deviation    Cronbach alpha
Jobs              102         1.64         0.49              0.70
Frequency         104         1.45         0.44              0.81
Excellence        107         1.04         0.59              0.85
MPG               104         1.66         0.48              0.77

Table IV. Intelligence measures correlational matrix

                    WPT      Job positions   Word frequency   Excellence    MPG    Racial centrality
WPT                 1.00
Job positions      -0.16         1.00
Word frequency     -0.11         0.27**          1.00
Excellence         -0.48**       0.00            0.12            1.00
MPG                -0.13         0.27**          0.25*           0.08       1.00
Racial centrality   0.01         0.03           -0.02           -0.02       0.13        1.00

Notes: *Correlation is significant at the 0.05 level (2-tailed); **Correlation is significant at the 0.01 level (2-tailed)


found between the WPT and the experimental UKT scales (specifically the Excellence scale), again demonstrating convergent validity.

Factor analyses

Factor analysis is a statistical technique often used to determine the factor structure of a set of observed variables within a study. In the current study, it was used to better understand the relationship between the measures of intelligence on both tests (the UKT and the WPT) and their underlying latent factors. In order to extract common factors, an exploratory principal-components analysis using Varimax rotation was used. Kaiser's criterion (Nunnally, 1978) was applied prior to factor rotation in order to retain only those factors with an eigenvalue of 1.0 or greater. In addition, using extraction criteria based on proportion of variance accounted for (a standard of greater than 20 percent was used in order to keep only those factors accounting for the most variance in the data) as well as interpretability, the principal-components analysis yielded a two-factor solution. Extracted factors were examined and named based on an analysis of the instrument scales that loaded on each factor. Factor 1 was called Public Awareness, whereas Factor 2 was named Cognitive Aptitude. Factor analysis results are reported in Tables VI and VII.
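The extraction step described above (Kaiser's criterion applied to the eigenvalues of a correlation matrix before rotation) can be sketched as follows. The 3×3 correlation matrix here is a toy example, not the study's data:

```python
import numpy as np

def kaiser_retained(corr):
    """Eigenvalues of a correlation matrix (descending) and the subset
    retained under Kaiser's criterion (eigenvalue >= 1.0)."""
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending
    return eigvals, eigvals[eigvals >= 1.0]

# Toy correlation matrix: two variables correlate strongly with each
# other, a third is nearly independent of both.
corr = np.array([
    [1.0, 0.6, 0.1],
    [0.6, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])
eigvals, retained = kaiser_retained(corr)
# eigenvalues of a correlation matrix sum to the number of variables;
# here only one component clears the 1.0 cutoff
```

An eigenvalue of 1.0 is the variance of a single standardized variable, so Kaiser's criterion keeps only components that explain more variance than any one variable alone.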

Table V. Range restriction corrections for entire working population

UKT measure         Uncorrected correlation with WPT    Corrected correlation with WPT
Job Positions                   -0.16                              -0.23
Frequency of use                -0.11                              -0.16
Excellence                      -0.48**                            -0.63**
Miles per Gallon                -0.13                              -0.19

Notes: *Significant at the 0.05 level; **significant at the 0.001 level

Table VI. Total variance accounted for using principal-components analysis with Varimax rotation (intelligence measures in WPT and UKT)

                         Initial eigenvalues
Component    Total    % of variance    Cumulative %
1             1.79        29.90            29.90
2             1.27        21.17            51.08
3             0.96        16.02            67.10
4             0.75        12.61            79.71
5             0.71        11.88            91.60
6             0.50         8.40           100.00

Table VII. Rotated component matrix for intelligence measures (PCA, Varimax with Kaiser normalization)

                   1 (Public awareness)    2 (Cognitive aptitude)
MPG                       0.68
Job Position              0.67
Word Frequency            0.62                     0.22
Price                     0.56                    -0.10
Excellence                                         0.86
Total WPT                -0.17                    -0.76


The WPT and Excellence scale clearly load on the second factor, accounting for 21 percent of the variance, providing qualified support for the hypothesis that the WPT and the UKT assess the same intelligence domains; hence our subsequent focus on the Excellence scale only. For subsequent analyses, only the experimental scale that met standard criteria (the Excellence scale) will be used.

Study 2

Participants

Study 2 utilized a different sample of 202 undergraduate psychology students from the same historically Black university in the Mid-Atlantic region of the United States. These students all volunteered for the study and received academic credit for their participation. Seventy-four percent of the subjects were female, 26 percent were male, 82 percent were African-American, 11 percent were Caribbean and 7 percent represented other ethnicities (Middle Eastern – 1.5 percent, Caucasian – 2 percent, and Asian – 2.5 percent). Subjects were between the ages of 17 and 33, with 98 percent being between the ages of 17 and 22 (see Table VIII for details).

Method

In this study, stereotype threat conditions were induced by giving participants instructions that indicated that their performance would be compared to the performance of students in other universities. Because the experiment was performed at a historically Black university, comparison universities would be presumed to be predominately Caucasian. In addition, the experimenter was a white male in his early thirties, and this combination was successful in inducing stereotype threat in other studies (Sloan et al., 2008). Participants were presented one of two booklets that described the group comparison (threat/non-threat) conditions. In the non-threat condition, the participant received instructions saying that their performance would not be examined individually, but in aggregate across groups. Thus, participants were randomly placed into one of four groups: UKT/threat, UKT/non-threat, WPT/threat, or WPT/non-threat.

After being given their instructions, the participants had the opportunity to ask questions. They were then given 12 min to complete the WPT or 15 min to complete the UKT. When the test was finished, students provided demographic information.

Table VIII. Demographic characteristics of participants in Study 2

Characteristic              n     Valid %
Age (mean = 19.07 years)
  17-22                   198       98
  23-33                     4        2
Gender
  Females                 149       74
  Males                    53       26
Ethnicity
  African-American        166       82
  Caribbean                22       11
  Other                    14        7

Note: n = 202; totals may not equal 202 nor 100 percent, as not all participants responded to all demographic items


Results

As the between-subjects experiments took place separately (with randomly assigned participants taking only one test in either threat condition), one-way analyses of variance (ANOVAs) were used to establish significant differences within the threat condition for each test type (WPT vs UKT). Performance on the WPT was reflected in the number of items correctly answered. A one-way ANOVA was used to analyze the results. No significant results were found (see Table IX).

A one-way ANOVA was also used to analyze the effect of a stereotype threat condition on the UKT Excellence scale. Surprisingly, there was a significant difference between the two conditions, such that those in the threat condition scored significantly higher than those in the non-threat condition (see Table X).

This finding seems contrary to our third hypothesis, as participants in the stereotype threat condition performed significantly better (M = 0.67) than those in the non-threat condition (M = 0.84). (Remember that means closer to zero represent higher performance on this test.) This raises the interesting possibility that because the UKT is perceived as a survey rather than a test, takers are less likely to be affected by stereotype threats. This finding lends support to the notion of using the Excellence scale of the UKT as an unobtrusive measure.
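The one-way ANOVAs above compare two independent groups on a single outcome; the F statistic is the ratio of between-group to within-group mean squares. A minimal sketch with made-up scores (not the study's data):

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between-group sum of squares: group sizes times squared deviations
    # of group means from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: squared deviations from each group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Two hypothetical conditions (e.g. threat vs non-threat)
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4]])
```

With two groups, this F is simply the square of the independent-samples t statistic, which is why the paper can report each comparison with a single F value and 1 numerator degree of freedom.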

Limitations
In Study 1, changes to the original UKT scales might have influenced the outcomes of the research. More information needs to be collected to determine exactly what the revised Jobs scale measures. For Study 2, the stereotype threat manipulation was limited by the lack of a non-minority sample. While the phenomenon of stereotype threat is thought to apply predominantly to minority populations, data from majority populations could have facilitated a clearer understanding of the impact of our manipulation.

Table IX. ANOVA for WPT (threat vs non-threat)

Dependent variable       Threat mean   Non-threat mean   F       df     p       Partial η²
Total Wonderlic score    21.00         21.57             0.284   1,99   0.596   0.003

Table X. ANOVA for Excellence UKT (threat vs non-threat)

Dependent variable       Threat mean   Non-threat mean   F       df     p       Partial η²
Excellence scale score   0.67          0.84              8.03    1,99   0.006   0.075

Implications and conclusions
The correlation between the Excellence scale of the UKT and the WPT suggests that this scale has the potential to be used as a brief, unobtrusive measure of intelligence. Such a test would have a wide variety of uses, from research to selection. It would be especially useful in situations where social desirability presents an issue. Further research must be done to fully understand the UKT and its utility in each of the aforementioned settings. One possibility is to perform factor analyses to see how the different scales in the UKT are related to each other in different samples. Another is to examine the relationship of the UKT scales to different types of performance-related tests. As noted above, it would be particularly interesting to see how the UKT correlates with tests of tacit intelligence.
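The convergent-validity check referenced above reduces to a Pearson correlation between paired scores on the two measures. The sketch below uses hypothetical data and a hand-rolled helper for illustration only; note that if lower Excellence-scale means indicate better performance (as in this study), convergence with the WPT would appear as a negative correlation.

```python
# Illustrative sketch only: Pearson correlation between paired scores on
# two measures, using hypothetical data (not the study's).
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired scores: Excellence scale (lower = better) vs WPT.
excellence = [0.9, 0.7, 0.8, 0.6, 0.5]
wpt = [18, 24, 21, 26, 28]
r = pearson_r(excellence, wpt)  # negative r here would indicate convergence
```

A significance test for r against zero would then use a t statistic with n - 2 degrees of freedom.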

Our second study did not yield results consistent with one of our hypotheses. This may have been due to the manipulation itself or the characteristics of the two tests. With regard to the manipulation, although subjects were told that their scores would be compared to the scores of students at other colleges, they were not told explicitly that their scores would be compared to the scores of white students. While this had been a successful manipulation in previous studies, it may be that our sample responded to the manipulation differently than previous samples.

If the manipulation was successful, the reversal of the manipulation effects for the UKT offers great potential for further research. One explanation for this finding may be found in Duval and Wicklund's (1972) objective self-awareness theory, which suggests that people focus on themselves as a way of considering how they may be seen by others. They attribute performance differences to an increased focus on personal effort to avoid the negative experience of missing a goal. With a task that is perceived as simple, the effort may result in better performance, but with tasks that are perceived as complex or difficult, it may result in worse performance. Participants may have performed better in the threat condition for the UKT (a ''simple'' survey) because they felt compared to and more competitive with others, while for the more ''difficult'' WPT, this same comparison resulted in worse performance.

The present research supported aspects of our hypotheses, and provided some insight into the stereotype threat paradigm. Our inability to replicate Steele's findings through the manipulation of threat may suggest that such findings occur in highly specific conditions (i.e. using written stimulus only) and populations. The finding of convergent validity between the UKT and the WPT offers some intriguing possibilities for future research.

The fact that the majority of the test takers perceived the UKT as a survey supports its use as a short and inexpensive measure of intelligence in non-proctored environments. Because of its relationship to both the ASVAB and the WPT, it is possible the UKT could be used to facilitate estimates of intelligence in a wide range of applied arenas where social desirability, evaluation apprehension or test anxiety/stereotype threat could prevent accurate assessment: personnel, clinical, and forensic assessment, as well as marketing research.

References

Aronson, J., Quinn, D.M. and Spencer, S.J. (1998), ''Stereotype threat and the academic underperformance of minorities and women'', in Swim, J.K. and Stangor, C. (Eds), Prejudice: The Target's Perspective, Academic Press, San Diego, CA, pp. 83-103.

Barra, A. (2006), ''Do these NFL scores count for anything?'' Wall Street Journal – Eastern Edition, Vol. 247 No. 96, p. D6.

Bell, N.L., Matthews, T.D., Lassiter, K.S. and Leverett, J.P. (2002), ''Validity of the Wonderlic Personnel Test as a measure of assessment'', North American Journal of Psychology, Vol. 4 No. 1, pp. 113-20.

Campbell, D.T. and Fiske, D.W. (1959), ''Convergent and discriminant validation by the multitrait-multimethod matrix'', Psychological Bulletin, Vol. 56, pp. 81-105.

Cattell, R.B. and Cattell, A.K.S. (1959), The Culture Fair Test, Institute for Personality and Ability Testing, Champaign, IL.


DeShon, R.P., Smith, M.R., Chan, D. and Schmitt, N. (1998), ''Can racial differences in cognitive test performance be reduced by presenting problems in a social context?'' Journal of Applied Psychology, Vol. 83 No. 3, pp. 438-51.

Duval, S. and Wicklund, R.A. (1972), A Theory of Objective Self Awareness, Academic Press, New York, NY.

Gottfredson, L.S. (1997), ''Why g matters: the complexity of everyday life'', Intelligence, Vol. 24 No. 1, pp. 79-133.

Hayes, R. (2004), ''Predicting the job performance of store detectives'', Security Journal, Vol. 17, pp. 7-20.

Helms, J.E. (1992), ''Why is there no study of cultural equivalence in standardized intelligence testing?'' American Psychologist, Vol. 47 No. 9, pp. 1083-101.

Helms, J.E. (2006), ''Fairness is not validity or cultural bias in racial-group assessment: a quantitative perspective'', American Psychologist, Vol. 61 No. 8, pp. 845-59.

Higley, J. (2008), ''A few deep thoughts for 2008'', Hotel and Motel Management, Vol. 223 No. 1, p. 6.

Hough, L.M., Oswald, F.L. and Ployhart, R.E. (2001), ''Determinants, detection, and amelioration of adverse impact in personnel selection procedures: issues, evidence, and lessons learned'', International Journal of Selection & Assessment, Vol. 9 Nos. 1/2, pp. 152-95.

Hunter, J.E. (1986), ''Intelligence, intelligences, job knowledge, and job performance'', Journal of Vocational Behavior, Vol. 29, pp. 340-62.

Hunter, J.E. and Hunter, R.F. (1984), ''Validity and utility of alternate predictors of job performance'', Psychological Bulletin, Vol. 96, pp. 72-98.

Kass, R.A., Mitchell, K.J., Grafton, F.C. and Wing, H. (1983), ''Factorial validity of the Armed Services Vocational Aptitude Battery, Forms 8, 9 and 10: 1981 Army applicant sample'', Educational and Psychological Measurement, Vol. 43, pp. 1077-87.

Klein, O., Pohl, S. and Ndagijimana, C. (2007), ''The influence of intergroup comparisons on Africans' intelligence test performances in a job selection context'', Journal of Psychology, Vol. 141 No. 5, pp. 453-68.

Kray, L.J., Thompson, L. and Galinsky, A. (2001), ''Battle of the sexes: gender stereotype confirmation and reactance in negotiations'', Journal of Personality and Social Psychology, Vol. 80, pp. 942-58.

Legree, P., Martin, D.E. and Psotka, J. (2000), ''Measuring intelligence using unobtrusive knowledge tests: a new survey technology'', Intelligence, Vol. 28 No. 4, pp. 291-308.

Martin, D.E., Moore, C.F., Sloan, L.R. and Legree, P.J. (2007), ''An application of the unobtrusive knowledge test: personnel selection'', Journal of Business and Behavioral Sciences (International Issue), Vol. 16 No. 1, pp. 4-15.

Matthews, T.D. and Lassiter, K.S. (2007), ''What does the Wonderlic Personnel Test measure?'' Psychological Reports, Vol. 100 No. 3, pp. 707-12.

Murphy, K. (2002), ''Can conflicting perspectives on the role of g in personnel selection be resolved?'' Human Performance, Vol. 15 Nos. 1/2, pp. 173-86.

Nunnally, J.C. (1978), Psychometric Theory, McGraw-Hill, New York, NY.

Outtz, J.L. (2002), ''The role of cognitive ability tests in employment selection'', Human Performance, Vol. 15 Nos. 1/2, pp. 161-71.

Porter, J. (1996), ''Find the right person for any job'', Bank Marketing, Vol. 28 No. 6, p. 60.

Raven, J., Raven, J.C. and Court, J.H. (1998), Manual for Raven's Progressive Matrices and Vocabulary Scales. Section 1: General Overview, Oxford Psychologists Press, Oxford/The Psychological Corporation, San Antonio, TX.

Ree, M.J. and Carretta, T.R. (1994), ''Factor analysis of ASVAB: confirming a Vernon-like structure'', Educational and Psychological Measurement, Vol. 54, pp. 457-61.


Ree, M.J. and Earles, J.A. (1992), ''Intelligence is the best predictor of job performance'', Current Directions in Psychological Science, Vol. 1, pp. 86-9.

Ree, M.J., Earles, J.A. and Teachout, M.S. (1994), ''Predicting job performance: not much more than g'', Journal of Applied Psychology, Vol. 79, pp. 518-24.

Roberson, L. and Kulik, C.T. (2007), ''Stereotype threat at work'', Academy of Management Perspectives, Vol. 21 No. 2, pp. 24-40.

Roberts, S. (2006), ''But can Wonderlic scramble or pass?'' New York Times, Vol. 155 No. 53530, Section 8, pp. 1-4.

Roberts, R.D., Goff, G.N., Anjoul, F., Kyllonen, P.C., Pallier, G. and Stankov, L. (2000), ''The Armed Services Vocational Aptitude Battery (ASVAB)'', Learning & Individual Differences, Vol. 12 No. 1, pp. 81-102.

Schmidt, F.L. and Hunter, J.E. (1998), ''The validity and utility of selection methods in personnel psychology: practical and theoretical implications of 85 years of research findings'', Psychological Bulletin, Vol. 124 No. 2, pp. 262-74.

Shoenfelt, E.L. and Pedigo, L.C. (2005), ''A review of court decisions on cognitive ability testing, 1992-2004'', Review of Public Personnel Administration, Vol. 25 No. 3, pp. 271-87.

Sellers, R.M., Chavous, T.M. and Cooke, D.Y. (1998), ''Racial ideology and racial centrality as predictors of African American college students' academic performance'', Journal of Black Psychology, Vol. 24 No. 1, pp. 8-27.

Sloan, L.R., Wilburn, G., Van Camp, D., Barden, J., Price, T., Mixon, L. and Martin, D.E. (2008), ''Out-group (White) presence and evaluation potential may be necessary to induce stereotype threat impacts within in-group minority diagnostic testing settings'', paper presented at the 9th Annual Meeting of the Society for Personality and Social Psychology, Albuquerque, NM, pp. 372-3.

Steele, C.M. (1992), ''Race and the schooling of black Americans'', The Atlantic Monthly, Vol. 269 No. 4, p. 68.

Steele, C.M. (1997), ''A threat in the air: how stereotypes shape intellectual identity and performance'', American Psychologist, Vol. 52 No. 6, pp. 613-29.

Steele, C.M. and Aronson, J. (1995), ''Stereotype threat and the intellectual test performance of African-Americans'', Journal of Personality and Social Psychology, Vol. 69 No. 5, pp. 797-811.

Steele, C.M. and Aronson, J. (1998), ''Stereotype threat and the test performance of academically successful African Americans'', in Jencks, C. and Phillips, M. (Eds), The Black-White Test Score Gap, Brookings Institution Press, Washington, DC.

Sternberg, R. (1982), Handbook of Human Intelligence, Cambridge University Press, New York, NY.

Sternberg, R.J. (2000), ''Implicit theories of intelligence as exemplar stories of success: why intelligence test validity is in the eye of the beholder'', Psychology, Public Policy, and Law, Vol. 6, pp. 159-67.

Sternberg, R. and Hedlund, J. (2002), ''Practical intelligence, g, and work psychology'', Human Performance, Vol. 15 Nos. 1/2, pp. 143-60.

US Department of Defense (1984), Test Manual for the Armed Services Vocational Aptitude Battery (DoD 1340.12AA), US Military Entrance Processing Command, North Chicago, IL.

Wiesen, J. (2002), Possible Reasons for the Black-White Mean Score Differences Seen With Many Cognitive Ability Tests, Applied Personnel Research, Newton, MA.

Wonderlic, E. (1992), Wonderlic Personnel Test, Wonderlic, Libertyville, IL.

Further Reading

Legree, P.J. (1995), ''Evidence for an oblique social intelligence factor established with a Likert-based testing procedure'', Intelligence, Vol. 21, pp. 247-66.


About the authors
Daniel E. Martin is an assistant professor of Management at Cal State East Bay. His areas of research interest include Human Resource Management, Survey Methodology, Ethics, Racism and Prejudice, Assessment, Evaluation Research, Intelligence, and Humor. Dan is also co-founder and vice president of Alinea Group, a California-based firm with offices in Washington, DC and San Francisco; Alinea Group provides Industrial/Organizational Psychology and business management expertise to private and public organizations. Formerly a Research Fellow for the US Army Research Institute as well as a Personnel Research Psychologist for the US Office of Personnel Management, he has worked with a wide array of organizations in personnel selection, organizational assessment, executive coaching and workforce planning. Dan holds a PhD in Social/Industrial/Organizational Psychology from Howard University. Dan is published in various professional journals including the Journal of Applied Psychology, Intelligence, Ethics and Behavior, Management Research News, Military Psychology, and Skeptic Magazine. Daniel E. Martin is the corresponding author and can be contacted at: [email protected]

Carol F. Moore is an adjunct professor in the Department of Management at the California State University, East Bay. She received her PhD in Industrial/Organizational Psychology from Purdue University. Her research interests include performance management, motivation, creativity, and power and influence. She is also the President of the Performance Consulting Group, an employee development organization.

Carol Hedgspeth is an assistant professor in the Department of Psychology at the Morgan State University. She received her PhD in Educational Psychology from Howard University. Her research interests include evaluation, psychometrics and moral development.
