
Industrial and Organizational Psychology, 3 (2010), 371–383. Copyright © 2010 Society for Industrial and Organizational Psychology. 1754-9426/10

RESPONSE

At Sea With Synthetic Validity

PIERS STEEL
University of Calgary

JEFF W. JOHNSON
Personnel Decisions Research Institutes

P. RICHARD JEANNERET
Valtera Corporation

CHARLES A. SCHERBAUM
Baruch College

CALVIN C. HOFFMAN
Los Angeles County Sheriff’s Department and Alliant University

JEFF FOSTER
Hogan Assessment Systems

Abstract

We expected that the commentary process would provide valuable feedback to improve our ideas and identify potential obstacles, and we were not disappointed. The commentaries were generally in agreement that synthetic validity is a good idea, although we also received a fair amount of suggestions for improvements, conditional or tempered praise, and explicitly critical comments. We address the concerns that were raised and conclude that we should move forward with developing a large-scale synthetic validity database, incorporating the suggestions of some of the commentators.

In 1957, researchers representing industry, government, and academia came together in Chicago for a symposium on synthetic validity. The contents of this symposium were published almost intact 2 years later in Personnel Psychology. In 2009, we came together for a similar symposium in New Orleans. Taking inspiration from what happened 52 years ago, we collaborated to put forth this focal article on synthetic validity. Our goals were to summarize how far we as a field have come, to address challenges, and to inspire us to go the remaining distance. How did we do? Let’s look to the commentaries.

Correspondence concerning this article should be addressed to Piers Steel. E-mail: [email protected] Address: Piers Steel, Haskayne School of Business, University of Calgary, SH444 - 2500 University Drive, N.W., Calgary, Alberta, Canada T2N 1N4; Jeff W. Johnson, Personnel Decisions Research Institutes; Charles A. Scherbaum, Department of Psychology, Baruch College; Calvin C. Hoffman, Los Angeles County Sheriff’s Department and Alliant University; P. Richard Jeanneret, Valtera Corporation; Jeff Foster, Hogan Assessment Systems.

On the positive side, there seems to be general agreement that synthetic validity is a good idea. Bartram, Warr, and Brown (2010) write, “We strongly support the principle of synthetic validity and agree with much of the focal article, seeing that form of validity as the only general approach for the future.” Hollweg (2010) thinks “a database that promotes this goal would be a worthy achievement and add significantly to the scientific body of the profession.” McCloy, Putka, and Gibby (2010) argue that substantial funds for realizing it will materialize “given the potential value to the field.” Murphy (2010) admits “there is no doubt that synthetic validity is a great idea.” Oswald and Hough (2010) are “convinced that the payoff [for building a synthetic validity database] would directly benefit the welfare of organizations (in real dollars) as well as employees and the science of work behavior.” Russell (2010) strongly agrees that “synthetic validity is ‘practically useful’ and . . . that it also holds value in developing theory.” Schmidt and Oh (2010), despite several pointed criticisms, ultimately indicate that they are “favorable to the idea of synthetic validity.” Finally, Vancouver (2010) “advocates for the development of a synthetic validity database along the lines described by Johnson et al. (2010), not only because of its practical utility but also because of its potential contribution to the science of I-O psychology generally.”

Almost unanimously, synthetic validity is highly praised, with the majority of commentators believing that this is the future for our field. Not to be misleading, we also received a fair amount of suggestions for improvements, conditional or tempered praise, and explicitly critical comments. We sorted these comments into six categories. The first two categories address discriminant validity concerns, specifically (a) “are you sure you can discriminate validity between jobs?” and (b) “are you sure you can discriminate between performance dimensions?” The third category is “Sure it’s better, but is it better enough?” This is mostly a reply to Murphy (2010), who questions whether synthetic validity is significantly better than validity generalization, challenging us to make sure that the added effort needed to create synthetic validity is worth the result. The fourth category, “the personal touch,” addresses how expert opinion fits into the synthetic validity process. Fifth, “what else can it do” considers other applications and research questions that synthetic validity can address. Finally, we end with “been there, done that.” In this section, we recognize that there are many near-synthetic validity projects and smaller synthetic validity projects currently underway. Their number underscores how feasible a large-scale synthetic validity project can be once we have developed the proper tools.

Following these six sections, we outline next steps. Borrowing Murphy’s (2010) metaphor that synthetic validity is a ship that has already sailed, we argue that synthetic validity has been successfully sailing the selection seas for decades, stopping into port periodically for retrofitting. Synthetic validity is ready for another upgrade, so let’s lay out the gangplank. As the consensus revealed here shows, we have plenty of people who want to come onboard and bring synthetic validity to its next destination.

Are You Sure You Can Discriminate Validity Between Jobs?

The most ardent criticisms of synthetic validity come from Schmidt and Oh’s (2010) commentary. Frank Schmidt, along with Jack Hunter, must be given recognition for pioneering some of the basic work that gave rise to the modern synthetic validity framework. Notably, their work disputing the then widespread notion that validities are situationally specific made modern synthetic validity possible. Our field is deeply indebted to their efforts, as “this was among the most significant changes in organizational research over the past century because it allowed knowledge to accumulate much more rapidly across studies” (Steel & Kammeyer-Mueller, 2008, p. 55). However, their viewpoint reflects the opposite extreme: that there is no substantive situational specificity for validity coefficients and that what little remains can be accounted for by general level of complexity.

For job component validity (JCV) to work, there must be predictable variance in validity coefficients across different jobs. Consequently, we agree with Schmidt and Oh’s (2010) basic idea that job complexity does moderate validity coefficients. We also agree with Schmidt and Oh that validity coefficients are not easy to predict, especially because of measurement error, and that Hunter’s (1983) DOT complexity scale will reach significance if reexamined with far more cases. However, this statement can be made of almost any predictor (e.g., Meehl’s crud factor), and it is not the focus of the issue. We want to know the best way to operationalize job complexity so that it predicts those hard-to-predict validity coefficients, not just whether job complexity can predict at all. This requires us to determine if O*NET or PAQ dimensions are superior to those derived from the DOT. How can we determine which is better?

Schmidt and Oh (2010) are correct that “typically one would classify jobs by complexity level and then look to see whether mean validity for a given test increased as complexity increased.” However, the typical procedure is not necessarily correct. Comparing means without consideration of within-group variance is a fundamentally flawed procedure, with Harvey (2010) reminding us about the problems of predicting central tendency while ignoring variability. Means alone do not tell us the percentage of variance accounted for, and this is what we are interested in (Lubinski & Humphreys, 1996). Weighted least-squares regression provides the requisite statistic, a recommended technique for meta-analytic data if accompanied by outlier analysis (Hunter & Schmidt, 2004; Steel & Kammeyer-Mueller, 2002). To this end, Steel and Kammeyer-Mueller (2009) showed that the PAQ predicts approximately 30% of the variance of general mental ability (GMA) validity coefficients, with the DOT data dimension predicting about 1%. Schmidt and Oh hypothesize that the PAQ does better because of overfitting, as it uses more predictors. There is no need for conjecture on this point. Using Alf and Graf’s (2002) maximum likelihood correction, the R² shrinks from .30 to .25. Steel and Kammeyer-Mueller also did an exploratory estimate based on 32 subdimensions, acknowledging that a larger database is needed to give a more accurate estimate. Corrected, this R² shrinks from .40 to .28. Accounting for about 25 times more variance, the PAQ is a better way to operationalize job complexity than the DOT data dimension. More importantly, job complexity does effectively predict validity coefficients.
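For readers who want to see the mechanics, below is a minimal sketch of this kind of analysis: a weighted least-squares moderator regression of validity coefficients on job-analysis dimensions, with study sample sizes as weights and a Wherry-style shrinkage adjustment standing in for the Alf and Graf (2002) estimator. The data and variable names are hypothetical, and this is an illustration of the general technique rather than the exact procedure Steel and Kammeyer-Mueller (2009) used.

```python
import numpy as np

# Hypothetical meta-analytic database: one row per primary study.
r = np.array([0.25, 0.41, 0.18, 0.52, 0.33, 0.47])   # observed validity coefficients
n = np.array([120, 340, 85, 410, 150, 260])          # study sample sizes (used as weights)
X = np.array([[3.1, 0.4],                            # job-analysis scores for each study's job,
              [4.7, 1.2],                            # e.g., two PAQ-style complexity dimensions
              [2.2, 0.1],
              [5.3, 1.5],
              [3.8, 0.6],
              [4.9, 0.9]])

# Weighted least-squares regression of validity coefficients on job dimensions.
Xd = np.column_stack([np.ones(len(r)), X])           # add an intercept column
W = np.diag(n.astype(float))
beta = np.linalg.solve(Xd.T @ W @ Xd, Xd.T @ W @ r)

# Weighted R-squared for the moderator regression.
r_hat = Xd @ beta
r_bar = np.average(r, weights=n)
ss_res = np.sum(n * (r - r_hat) ** 2)
ss_tot = np.sum(n * (r - r_bar) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Shrinkage adjustment (Wherry formula as a stand-in for Alf & Graf's ML correction).
k, p = X.shape
r2_adj = 1.0 - (1.0 - r2) * (k - 1) / (k - p - 1)

print(f"WLS R^2 = {r2:.3f}, shrinkage-adjusted R^2 = {r2_adj:.3f}")
```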

Finally, Schmidt and Oh (2010) note that Steel and Kammeyer-Mueller’s (2009) results are questionable because Schmidt and colleagues have estimated that between 82% and 87% of observed variability among validity coefficients is attributable to measurement error. In conjunction with what was predicted by the PAQ, this means that between 107% and 115% of variance is accounted for, which is impossible. However, it is also impossible to give a generic percentage of variance that is due to measurement error. Exactly how much measurement error exists in a meta-analytic database is primarily determined by sample size (Murphy, 2000), which will vary from case to case but can be readily computed. In Steel and Kammeyer-Mueller’s analysis, sampling error comprises 55% of the variance for GMA validity coefficients, whereas in the article relied on by Schmidt and Oh for their estimates, the comparable estimate is 81% (Schmidt et al., 1993). Also, Schmidt et al. used a more homogeneous sample, sorting by job families and then averaging results, which will further reduce the variance predictable from differences between jobs. In other words, they removed the exact variance we are trying to predict. As Murphy (2000) concludes, “a frequent criticism of validity generalization [meta-analysis] procedures is that they are biased in favor of the hypothesis that validity is consistent” (p. 199). Perhaps this viewpoint has some merit. There is plenty of variability left for JCV to predict after taking measurement error into account.
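As a rough illustration of how such percentages are computed, here is a small sketch of the standard bare-bones calculation described by Hunter and Schmidt (2004): the proportion of observed variance expected from sampling error alone. The coefficients and sample sizes below are made up and are not the values from either database.

```python
import numpy as np

# Hypothetical observed GMA validity coefficients and their study sample sizes.
r = np.array([0.22, 0.45, 0.31, 0.55, 0.19, 0.38, 0.50, 0.27])
n = np.array([90, 250, 130, 310, 75, 160, 420, 110])

r_bar = np.average(r, weights=n)                   # sample-size-weighted mean validity
var_obs = np.average((r - r_bar) ** 2, weights=n)  # observed (weighted) variance

# Expected sampling-error variance, using the average sample size.
var_e = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)

print(f"Sampling error accounts for roughly {100 * var_e / var_obs:.0f}% "
      f"of the observed variance in these hypothetical data.")
```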

Given that there is relevant variance to account for and that we have already successfully done so, the subsequent question is how to proceed. Should we use the O*NET or the PAQ? Vancouver (2010) makes the sensible suggestion of perhaps neither, as “additional constructs should be added.” So far, our endeavors to predict validity coefficients have primarily focused on those involving GMA, and it is not clear whether similar successes would occur with validity coefficients based on other predictors. Consequently, we should consider what we have achieved here as a good start but not the finished product. One avenue to explore is offered by McCloy et al. (2010) as well as Oswald and Hough (2010), who suggest examining several O*NET factors, such as work context scales, as moderators. Vancouver also suggested we consider time on the job or experience as a moderator, although he was concerned that it would make the project too onerous. We do not believe this to be a problem. As Steel, Huffcutt, and Kammeyer-Mueller (2006) conclude, we can add time-varying covariates later as the database expands to capture this variance. Although modeling time on the job would improve synthetic validity, it is not a necessity and does not have to be pursued immediately. For synthetic validity to be viable, it does not have to account for all the variance among validity coefficients, just a useful amount of variance, and this is a threshold we have already passed.

Are You Sure You Can Discriminate Validity Between Performance Dimensions?

Although it is not necessary for job component validation, an implicit assumption of the job requirements matrix approach to synthetic validation is that we can differentiate among performance dimensions. We do not see discriminant validity among performance dimensions as a “critical requirement if synthetic validity is to have a sound basis and be credible” (Schmidt & Oh, 2010), but it is certainly a desirable quality. The job requirements matrix approach is a construct-oriented approach to personnel selection, using job analysis information to link relevant predictors to specific job components. This not only allows for the demonstration of validity evidence in jobs that are too small for a traditional criterion-related validation study but also ensures that the selection procedure is job related, which should enhance legal defensibility and user acceptance. Although Schmidt and Oh warn us that “critics” will argue that this process is “more akin to Voodoo than to science,” we believe this is a highly scientific approach to personnel selection (cf. Schneider, Hough, & Dunnette, 1996) regardless of whether an alternative weighting of the predictors provides a nearly equal validity coefficient.

At this time, there is too little research to draw definitive conclusions about the discriminant validity of synthetic validity equations. The degree to which a validity coefficient for a job-specific equation differs from the validity coefficient generated by another job’s equation depends on the type of predictors available and the similarity of the jobs. If jobs are similar enough that the same set of predictors can be used to predict performance for all of them, we should not be surprised that the weighting scheme used for one job does not provide very different results from the weighting scheme used for another job. It has been consistently demonstrated that different weights applied to the same predictors often produce similar composite scores and thus similar validities (Koopman, 1988; Ree, Carretta, & Earles, 1998). Research on the discriminant validity of synthetic validity equations that we cited in our focal article has been limited to applying different weights to the same predictors. On the other hand, a synthetic validation strategy allows for the selection of different predictors for different jobs, so a true test of discriminant validity would be comparing equations that include different predictors. For example, if one equation includes a mechanical comprehension test and another replaces it with a language usage test, we are more likely to see discriminant validity when applying the equation for one job to the other job.
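The point that different weights on the same predictors yield similar composites is easy to demonstrate by simulation. The following is a small, self-contained sketch with made-up correlations and weights (not data from Koopman, 1988, or Ree et al., 1998):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate standardized scores on three positively correlated predictors
# (a hypothetical applicant pool; the correlations are made up for illustration).
corr = np.array([[1.0, 0.5, 0.4],
                 [0.5, 1.0, 0.3],
                 [0.4, 0.3, 1.0]])
scores = rng.multivariate_normal(mean=np.zeros(3), cov=corr, size=10_000)

# Two different positive weighting schemes applied to the same predictors,
# as if derived from two different jobs' synthetic validity equations.
w_job_a = np.array([0.6, 0.3, 0.1])
w_job_b = np.array([0.2, 0.3, 0.5])

composite_a = scores @ w_job_a
composite_b = scores @ w_job_b

# The composites correlate highly despite the very different weights.
print(f"r(composite_a, composite_b) = {np.corrcoef(composite_a, composite_b)[0, 1]:.2f}")
```

In this example the two composites correlate at roughly .85, so in practice their validities against a common criterion will tend to be similar, which is exactly why equations built on different predictors offer a stronger test of discriminant validity.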

A more fundamental concern is that Schmidt and Oh (2010) contend that we cannot differentiate between job components in the first place. For example, King, Hunter, and Schmidt (1980) found that job performance dimensions accounted for an average of 8% of the variance in observed ratings. This is the traditional viewpoint on the topic, but it is not the final word, as a new understanding of this issue is emerging. Previous studies, including many of those cited by Schmidt and Oh, made the assumption that “if it ain’t trait it must be method” (Lance, Baranik, Lau, & Scharlau, 2009, p. 339). This assumption is not necessarily true. Raters may differ in their performance ratings, but all may still be rating performance, just based on different standards of performance or based on different experiences with the ratee (e.g., a person acts differently with colleagues than around the boss). More recent research has demonstrated empirically that it is more appropriate to interpret rater and source effects as sources of performance true-score-related variance rather than error (Lance, Dawson, Birkelbach, & Hoffman, 2010). Using a more sophisticated approach to model such true-score idiosyncratic rater effects, Hoffman, Lance, Bynum, and Gentry (2010) found results “consistent with three primary models of performance ratings, each specifying the impact of empirically distinguishable dimensional performance on ratings” [italics added] (p. 144). Again, as Bartram et al. (2010) point out, even more differentiation should occur with motivated raters who are appropriately trained and who are using a well-designed performance appraisal measure.

Having established that it is indeed possible to differentiate between performance dimensions, the next move is to determine what dimensions should be used in a large-scale synthetic validity database. To this end, Bartram et al. (2010) closely consider the benefits of assessing specific performance dimensions instead of just overall performance. We wholeheartedly agree with their assessment, which provides additional reasons for using specific performance dimensions beyond the fact that the Job Requirements Matrix (JRM) methodology needs them by its very nature. Furthermore, we find their two-stage alignment solution very compatible with what we proposed (i.e., most performance ratings would be collected at a specific level and then classified into broader categories). Still, the basic problem remains: What will these dimensions be? We are not sure. In truth, our scientific field is still determining what aspects of performance we are capable of distinguishing and under what conditions. As Bartram et al. conclude, “the primary limitation on progress is coming from the criterion space and not from the predictor space.” Addressing this issue, we suggested starting out with broad performance categories, then going to more specific categories as more data are accumulated, and letting the data determine when sufficient specificity has been reached. Bartram et al. also usefully point out that there will be no single answer to this issue, as several different taxonomies will likely be effective, including SHL’s Great Eight and O*NET’s Broad Nine. Ideally, a finished synthetic validity system will allow users to choose among several configurations of performance. Developing an appropriately flexible performance taxonomy is our first order of business.

Sure It’s Better, But Is It Better Enough?

Murphy (2010) provides one of the harder issues to address. It is not a question of whether we can build a synthetic validity system, which he believes we can, but whether we should. Although synthetic validation provides significantly better results than most local validation studies (in terms of lower standard errors around the validity coefficient, smaller sample sizes necessary within jobs, and greater flexibility for estimating validity in jobs not included in the original study), how does it compare to validity generalization? Validity generalization and synthetic validity are like brothers, related techniques that are based on similar assumptions (Jeanneret, 1992). Although we would expect some sibling rivalry, it appears we must determine which is to be our favorite son.


The case for validity generalization is that it provides a simpler and immediately available way of devising a selection system: simply use something like GMA and Conscientiousness on the basis of their mean validity coefficients across primary studies. This is not a bad idea, as both constructs are typically good predictors of performance and both are universally valid (i.e., credibility intervals do not cross zero) because jobs everywhere benefit from having smart and hardworking employees. For many reasons, however, we do not see validity generalization as superior to synthetic validation.

First, the fact that a test is universally valid just means its validity coefficient will always be on one side of zero. To properly design and evaluate a selection system, however, we need to know the size of the validity coefficients. It tells us how much to weigh predictors relative to each other, and it is vital for utility analysis, which determines how much we should invest in the entire selection process. We also need precise validity coefficients to calculate the likelihood of a candidate meeting a cutoff, for both minimum and maximal performance. The Productivity Measurement and Enhancement system, for example, is built around the insight that higher levels of performance are not always significantly more valuable, that there are diminishing returns (Pritchard, Paquin, DeCuir, McCormick, & Bly, 2002).

Validity generalization avoids this problem if the credibility intervals around estimates are small. Ideally, all the variability would be accounted for, the mean would not vary much across situations, and there would be no need for synthetic validity. Unfortunately, this is not the case, not even for GMA validity coefficients. Consider Hunter and Hunter (1984), who found a 90% credibility interval ranging from .29 to .61. For Hulsheger, Maier, and Stumpp (2007), the credibility interval was from .30 to .77, and for Bertua, Anderson, and Salgado (2005), it was from .09 to .87. For Conscientiousness, its 90% credibility interval can be calculated as .06–.42, with even larger ranges for its facets (Dudley, Orvis, Lebiecki, & Cortina, 2006). Again, this represents plenty of practical variability that should be predictable using synthetic validation.1

1. Mathematically, the techniques used to establish validity generalization systematically underestimate variance, as Steel and Kammeyer-Mueller (2008) review. For example, we should adjust meta-analytic variance estimates by K/(K − 1) but almost never do; it corrects for a downward bias caused by using the observed instead of the true mean during calculations. A meta-analysis based on 20 studies (K) will underestimate the true amount of variance by a little over 5%. The larger the meta-analysis, the less this will be a concern, but it is fair to point out that these credibility intervals, broad as they are, are still to some degree underestimates.
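For readers who want the arithmetic behind these statements, here is a brief sketch in the usual Hunter and Schmidt notation, assuming normally distributed true validities; this is our own illustrative rendering rather than a formula reproduced from the cited studies.

```latex
% Symmetric 90% credibility interval around the mean true validity \bar{\rho},
% where SD_{\rho} is the estimated standard deviation of true validities:
\bar{\rho} \;\pm\; 1.645 \, SD_{\rho}

% Small-sample correction to the observed variance across K studies (see footnote 1):
\hat{\sigma}^{2}_{\mathrm{corrected}} \;=\; \frac{K}{K-1}\,\hat{\sigma}^{2}_{\mathrm{observed}},
\qquad K = 20 \;\Rightarrow\; \frac{20}{19} \approx 1.053 \text{ (about a 5\% underestimate)}.
```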

Furthermore, once you go beyond GMA and Conscientiousness, predictors tend not to generalize in dependable ways. Murphy (2009a) gets around this problem by asserting that this is all we really need because most other predictors are derivable from these two (i.e., positive manifold). Later, Murphy (2009b) admits that this is more true for the cognitive than the personality domain, highlighting that there are still several relevant predictors that cannot be justified through validity generalization alone (Tett & Christiansen, 2007). Still, Murphy is quite right that the remaining predictors almost certainly will not be as important as GMA and Conscientiousness, and that we have encountered a law of diminishing returns. However, we interpret this situation as emphasizing the need for synthetic validity. The selection field is beginning to stagnate, as there are no more easy pickings left and it is not cost-effective for any single organization to pursue what remains. We contend that there are dozens of small, incremental steps left to be taken, such as evaluating personality at a facet level (Rothstein & Goffin, 2006; Schneider et al., 1996; Tett & Christiansen, 2007), each accounting for just a few percentage points of incremental predictive variance but collectively making a substantive difference. In most cases, there will not be a large enough sample size to recognize they exist, but as we mentioned in our focal article, this is exactly the type of advantage that can be accrued through technical economies by using synthetic validation. Sticking entirely to validity generalization, we are left with a field that can only appreciate the largest and most obvious of predictors: GMA and Conscientiousness.

Compounding the matter, so far we have only dealt with the credibility intervals for predicting performance, the primary but not the only criterion of synthetic validity. The synthetic validation methodology can be used to match individuals with jobs (person–job fit), occupations or vocations (person–vocational fit), teams or groups (person–group fit), and organizations (person–organization fit; Kristof-Brown, Zimmerman, & Johnson, 2006). To assess all these different types of fit, we are interested not only in the performance of individuals but also in how satisfied they are in the job. Like performance, job satisfaction is largely a function of individual differences, the work context, and their interaction, all of which can be assessed during a synthetic validation project. Simply by knowing details about individuals and their potential work situation, we should be able to predict whether an occupation or an organization is a good match in terms of both performance and job satisfaction. Unfortunately, predictors of job satisfaction do not generalize quite as readily as those for performance (Griffeth, Hom, & Gaertner, 2000; Judge, Heller, & Mount, 2002). They largely depend on a fit between particular individuals and particular environments.

Even if all these previous issues did not exist, validity generalization has one more major limitation, one that Murphy (2009a) himself brings up: It is not acceptable to stakeholders. As he concludes, “tests that are manifestly relevant to the jobs in question are likely to be more acceptable to organizations and other stakeholders (e.g., unions and political groups) than tests that are not linked in any clear way (other than their empirical validity) to the job” (p. 460). Because synthetic validity customizes the results to specific situations based upon a job analysis that heavily involves the stakeholders, it can largely avoid this weakness. We specifically discussed this advantage in the focal article when we evaluated synthetic validity’s legal standing.

Validity generalization does have a significant advantage over synthetic validation in that validity generalization can be applied to small jobs in the absence of an appropriate synthetic validity database. This is a practical advantage that we hope to remedy through the development of the large-scale synthetic validity database that we have advocated. Conceptually speaking, however, we believe that synthetic validity is superior to validity generalization. It is not just that synthetic validity is the older brother, preceding validity generalization in its development. Rather, synthetic validity has all the advantages of validity generalization plus the ability to further predict validity coefficients through effective moderator searches. If we want the science of I-O psychology to move beyond “general cognitive ability and Conscientiousness predict overall job performance,” we need to look at more specific relationships.

The Personal Touch

Synthetic validity represents a mechanical, or more aptly, an algorithmic system. We let the numbers do the talking. Although we are happy when such math optimizes our Internet searches, Hollweg (2010) argues that it might work against synthetic validity by causing it to be used as a reductionist robotic formula. He worries that a widely available synthetic validity database could produce an overreliance on a mechanized system that underemphasizes the importance of expert judgment within our field. We share this concern to some extent, but we believe that marketing, education, and possibly limiting accessibility to qualified professionals are tools that can do a great deal to alleviate it. The purpose of the synthetic validity system would be to facilitate expert judgment; used appropriately, it would allow practitioners to explore multiple possibilities when creating a selection system for a client. Certainly, overall validity is not the only criterion to consider, but output from this system could help practitioners weigh other concerns, such as costs and accessibility, along with more readily accessible and accurate validity estimates.

Hollweg (2010) raised other concerns about how the database could hurt practitioners. Although these concerns are valid, we do not envision a future in which reputable I-O consultants have been replaced by a synthetic validity database. Take the advent of validity generalization as an example of a huge technical advance. We have known for many years now that general cognitive ability is a good predictor of overall job performance for just about any job, but the employment test publishing industry is much stronger now than it was before validity generalization became mainstream. What we are proposing is really just the next logical step, establishing validity generalization for specific predictors and specific criteria. It is important to remember that the public database would link specific predictor constructs, not specific tests, to job components. People still need measures of those constructs, and in this age of high-tech cheating, new measures of the same constructs need to be continuously developed. Publishers will continue to develop new predictor constructs delivered in new formats in an attempt to provide value beyond what is available in the database. This may even spur the next great technological advance. Consultants may wind up marketing against the database by demonstrating through their own research that their instruments are superior to the average level of validity that would be documented for those constructs by the database.

Also advocating a personal touch are McCloy et al. (2010), who suggest an alternative path for developing the synthetic validity system. Instead of empirically deriving correlations, they can be estimated by groups of subject matter experts (SMEs). McCloy et al. review several steps that would help maximize the accuracy of these estimates, but many of us still have doubts about the underlying mechanism, which is opinion. For example, Harvey (2010) characterized “subjective ‘expert judgment’ based methods” as “anchoring the lower end” of “a continuum of convincingness.” Still, McCloy et al.’s technique has given us inspiration about how to improve our own. There is a “wisdom of crowds,” and we believe that SME-derived coefficients are likely to reflect reality with some degree of accuracy. We just need to determine to what degree. Consequently, once we have developed a basic but empty synthetic validity matrix, establishing only what the predictors and job components are, we would engage SIOP experts to start filling it out. As we collect empirical data, we can compare these SME-derived coefficients to those observed. If SME-derived coefficients prove to be at least as accurate as what would be derived from a typical local validation study, we could start using the synthetic validity system immediately. Eventually, this preliminary SME-derived matrix will evolve into an empirically derived one as we continue to gather data and swap out coefficients. Meanwhile, we still have a working system.
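As a sketch of how that comparison might eventually be run once matching empirical estimates become available (the numbers and the choice of agreement statistics below are hypothetical, not part of the proposal itself):

```python
import numpy as np

# Hypothetical cells of the synthetic validity matrix for which both an
# SME-estimated and an empirically derived validity coefficient now exist.
sme_estimate = np.array([0.30, 0.15, 0.45, 0.25, 0.10, 0.35])
empirical = np.array([0.27, 0.22, 0.41, 0.18, 0.14, 0.33])

agreement = np.corrcoef(sme_estimate, empirical)[0, 1]   # correspondence across cells
bias = np.mean(sme_estimate - empirical)                  # systematic over- or underestimation
rmse = np.sqrt(np.mean((sme_estimate - empirical) ** 2))  # typical size of the error

print(f"r = {agreement:.2f}, mean bias = {bias:+.2f}, RMSE = {rmse:.2f}")
```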

What Else It Can Do

Other commentaries focused on what else we could do with synthetic validity, suggesting that we were not taking into consideration its full potential. They are right. A lot of features can be incorporated within a synthetic validity platform. In particular, Oswald and Hough (2010) make a series of suggestions. We could explore how to best administer a validation study to maximize correlations. For example, who should make the performance ratings and how? How do differences between job analysis protocols affect results? How can we reduce adverse impact, and what are the tradeoffs with validity? We are also not limited to assessing just individual performance; we can consider how the individual would impact team or group performance as well as expand into retention, recruitment, and job satisfaction criteria. Different criteria also open the door to a different set of predictors (Kristof-Brown et al., 2006). As we mentioned and as Harvey (2010) reiterates, synthetic validity is not just for selection. Instead of selecting people for jobs, you can reverse the process and select jobs for people, that is, career guidance and occupational exploration.

Harvey (2010) also makes some other spot-on suggestions, although he precedes them with a critique of mean-based JCV. Again, we view mean-based JCV as the “Plan B” solution to synthetic validity after early attempts at validity coefficient-based JCV were unsuccessful. As we reviewed in our focal article, recent research has revitalized validity coefficient-based JCV, and we are back to “Plan A,” which addresses many of Harvey’s concerns. The remaining issues that Harvey brings up are all related to the idea that the linear approach that dominates selection today is far too simple. We agree, especially because these limitations apply most strongly to traditional criterion-related studies. Vancouver (2010) brings up the exact same point, that “we have tended to assume linear relationships between IDs and performance.” As Vancouver suggests, these interaction effects and nonlinear relationships can be explicitly incorporated during a large synthetic validity project (i.e., they are just additional columns of predictors; see the sketch below). Synthetic validity is a way for us to start collecting and codifying these insights as well as immediately distributing the benefits across the selection field once they are confirmed. We are going to be making improvements and fine-tuning the process for a long while yet. Without synthetic validity, as we stress, we rely on local validity studies, which typically will not have the power to notice anything but the most pedestrian of findings.
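A minimal illustration of that point, with hypothetical predictor scores and names rather than any prescribed set of terms:

```python
import numpy as np

# Hypothetical standardized predictor scores for a handful of applicants.
gma = np.array([0.2, 1.1, -0.5, 0.8, -1.3])
conscientiousness = np.array([0.7, -0.2, 0.4, 1.5, -0.9])

# Nonlinear and interactive effects enter the prediction equation simply as
# additional columns in the predictor matrix, alongside the linear terms.
X = np.column_stack([
    gma,
    conscientiousness,
    gma ** 2,                     # a curvilinear GMA term
    gma * conscientiousness,      # a GMA x Conscientiousness interaction
])
print(X.shape)  # (5, 4): two linear columns plus two added columns
```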

Been There, Done That

Russell (2010) asserts that synthetic validity is much more prevalent than we suggest, as he already possesses or has seen many synthetic validity studies.2 We are aware that synthetic validity research is out there, but these are generally proprietary technical reports, and there has been very little published research on synthetic validity in its nearly 60-year history. Thus, we stand by our contention that synthetic validity is underutilized. In addition, although we have not viewed all the studies to which Russell refers and might readily be proven wrong, what we have seen indicates they are not quite synthetic validity. Rather, they are transportability databases, which are a collection of jobs all with previously developed selection systems. This type of database is used to find a job reasonably similar to a target job and then “transport” the validity of the selection system for the similar job to the target job. Transportability, like validity generalization, is related to synthetic validity, as all three are strategies for generalizing validity evidence (Society for Industrial and Organizational Psychology, 2003). Because transportability is not quite synthetic validity, it has its own challenges, as Gibson and Caplinger (2007) review. For example, with transportability, all the problems associated with the small sample sizes that occurred when the original selection system was developed are also transported to each new application. Also, determining if two jobs are adequately or relevantly similar is largely a subjective exercise.

2. Of note, we also agree with Russell (2010) that there should be a better analogy than synthetic time, especially because synthetic falsely connotes an ersatz reproduction here. Unfortunately, this is the analogy Lawshe (1952) stuck us with, as per “The concept is similar to that involved when the time study engineer establishes standard times for new operations” (p. 32). This is why Lawshe coined the term “synthetic validity.”

On the other hand, the fact that there are large transportability databases underscores how close we are to building a “fully loaded” synthetic validity system, with all the features we described included. These transportability databases can be converted into synthetic validity databases as soon as the selection systems they are composed of are developed with the appropriate infrastructure. As we stress, we need to measure performance with a standard taxonomy (i.e., for JRM) and for each job to be assessed with an appropriate job/context/organization analysis measure (i.e., for JCV). This is a good example of our suggestion that “the creation of synthetic validity can ‘piggyback’ on what we already profitably do: create selection systems.” One caveat, though: As Russell (2010) indicates, these private databases seem to have focused on a few job families and do not seem generalizable across the entire employment landscape. Still, once properly outfitted, they are a good start and will eventually expand.

The Next Destination

We are very grateful to the commentary authors for showing interest in our focal article and taking the time to share their encouragement, ideas, and concerns. This commentary process has sharpened our own thinking, and we are sure that it has helped clarify issues for the reader. Our primary goal was to publicize synthetic validation techniques as preferred methods for documenting validity evidence for selection procedures, and we feel confident that we have achieved that goal. Another goal was to start the wheels turning on the development of a large-scale, widely available synthetic validity database to advance the science and practice of I-O psychology. We expected that the commentary process would provide valuable feedback to improve our ideas and identify potential obstacles, and we were not disappointed. Although we do not expect smooth sailing ahead, we do not see any icebergs.

We still contend that SIOP is the best institution to initiate the synthetic validity database we have in mind. Similarly, McCloy et al. (2010) believe that “SIOP would be an ideal host,” although Hollweg (2010) suggests that we could do better, arguing that a hypothetical institution modeled more closely on the Cochrane Collaboration would be preferable. In all, SIOP appears to be the best existing institution for this project. Consequently, we recommend that SIOP initiate a steering committee to bring this concept to fruition. Members of the committee should come from all relevant stakeholders, including representatives from academia, business, and government.

Originally, we did not expect that we could obtain the necessary support from government officials, but that has changed. Right now, America is experiencing an almost unprecedented extended period of greatly elevated unemployment (Peck, 2010). The costs of this deep recession are severe and multifaceted, both for the nation and its citizens. Synthetic validity has the potential to help. The job market, like all markets, becomes faster and more efficient if you can improve the speed and quality of relevant information. Adding a synthetic validity component to O*NET seems like a logical step. O*NET already has the taxonomies of predictor constructs, work dimensions, and potential moderators, as well as job analysis data for hundreds of occupations in the U.S. economy. With the data-collection experience gained through the development of O*NET, it may not be as difficult as we imagine to collect predictor and criterion data on selected constructs from incumbents in a wide variety of occupations.

In addition to the economic benefits of improved incremental validity in selection, synthetic validity can help select jobs for people instead of just people for jobs. Used in conjunction with O*NET, which already has a vocational guidance component (Converse, Oswald, Gillespie, Field, & Bizot, 2004), a synthetic validity system that includes the opportunity to self-administer tests measuring various predictor constructs can more accurately identify individual strengths, point people toward occupations in demand, estimate the likelihood that they would be good at them, and help determine whether they would enjoy them. People would also quickly learn what training they need to initiate a career change. Such a sophisticated system that helps people self-select into ideal careers would have a noticeable effect on national GDP, with best estimates in the several hundred billion dollar range (Hunter & Schmidt, 1982; Steel & Kammeyer-Mueller, 2006).

Consequently, in return for governmental sponsorship, we provide a very healthy return on investment as we help to maximize national productivity. Pursuing this path allows us to gather the selection data systematically and to a uniformly high standard of quality. This would be harder to ensure if we took the commercial partnership route, a path that Bartram et al. (2010) pointed out is difficult to realize. Also, government sponsorship would allow a longer and more sustained research effort. Although we can quickly get a working synthetic validity system off the ground, especially if we allow some SME estimation as per McCloy et al. (2010), we would want to continue refining and optimizing this system (Oswald & Hough, 2010). Our ability to improve will continue as we explore, for example, not only the best combination of constructs to predict performance but also how each construct should best be operationalized.

Historically, governments have funded exactly this sort of venture to tackle unemployment (Primoff & Fine, 1988). The Social Science Research Council, the National Research Council, and the Occupational Research Program of the U.S. Employment Service all initiated similar efforts. In particular, the U.S. Civil Service Commission, now the Office of Personnel Management, was involved in a massive project involving over 100,000 employees back in 1919. Notably, this is not only a larger number of participants than needed to build synthetic validity today; it was also accomplished at a time when it was considerably harder to recruit them.

The selection industry will also benefit from this initiative in two important ways. First, to enable JCV and JRM, transportability databases will need to be equipped with a few refinements, and having the appropriate methodology, performance taxonomies, and job analysis for synthetic validity is a necessary outcome of this project. If test developers and selection houses choose to develop their own proprietary synthetic validity systems, using the tools from the database will make it much easier for them. Second, if users want to use this public synthetic validity system for personnel selection, they would need to show that the tests they are using are essentially equivalent to the tests from the synthetic validity system. Such a practice will avoid the validity degradation that would inevitably happen for any single widely used selection battery. Also, such a straightforward validation process opens up what is known as the “long tail.” Instead of marketing selection exclusively to large companies, all companies become viable customers. This is especially true for small companies, which have long been expected to benefit from the use of synthetic validity.

We have established that we can build synthetic validity, even demonstrating it with working prototypes. Many agree that the field would hugely benefit from a national synthetic validity database. Gary Becker, who won the Nobel Prize for economics, believes that “in a modern economy, human capital [the work people do] is by far the most important form of capital in creating wealth and growth” (Wheelan, 2002, p. 106). Human capital, and making the most of it, is what synthetic validity is all about. This brings up one final question: What are we waiting for?

References

Alf, E. F., Jr., & Graf, R. G. (2002). A new maximum likelihood estimator of the population squared multiple correlation. Journal of Educational and Behavioral Statistics, 27, 223–235.

Bartram, D., Warr, P., & Brown, A. (2010). Let’s focus on two-stage alignment not just on overall performance. Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 335–339.

Bertua, C., Anderson, N., & Salgado, J. F. (2005). The predictive validity of cognitive ability tests: A UK meta-analysis. Journal of Occupational and Organizational Psychology, 78, 387–409.

Converse, P. D., Oswald, F. L., Gillespie, M. A., Field, K. A., & Bizot, E. B. (2004). Matching individuals to occupations using abilities and the O*NET. Personnel Psychology, 57, 451–487.

Dudley, N., Orvis, K., Lebiecki, J., & Cortina, J. (2006). A meta-analytic investigation of conscientiousness in the prediction of job performance: Examining the intercorrelations and the incremental validity of narrow traits. Journal of Applied Psychology, 91, 40–57.

Gibson, W., & Caplinger, J. (2007). Transportation of validation results. In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence (pp. 29–81). San Francisco: John Wiley & Sons.

Griffeth, R. W., Hom, P. W., & Gaertner, S. (2000). A meta-analysis of antecedents and correlates of employee turnover: Update, moderator tests, and research implications for the next millennium. Journal of Management, 26, 463–488.

Harvey, R. J. (2010). Motor oil or snake oil: Synthetic validity is a tool not a panacea. Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 351–355.

Hoffman, B., Lance, C. E., Bynum, D., & Gentry, W. (2010). Rater source effects are alive and well after all. Personnel Psychology, 63, 119–151.

Hollweg, L. (2010). Synthetic oil is better for whom? Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 363–365.

Hunter, J. E. (1983). Validity generalization for 12,000 jobs: An application of synthetic validity and validity generalization to the General Aptitude Test Battery (GATB). Washington, DC: U.S. Department of Labor Employment Service.

Hunter, J. E., & Schmidt, F. L. (1982). Fitting people to jobs: The impact of personnel selection on national productivity. In M. D. Dunnette & E. A. Fleishman (Eds.), Human performance and productivity: Vol. 1. Human capability assessment (pp. 223–272). Beverly Hills, CA: Sage.

Hulsheger, U. R., Maier, G. W., & Stumpp, T. (2007). Validity of general mental ability for the prediction of job performance and training success in Germany: A meta-analysis. International Journal of Selection and Assessment, 15, 3–18.

Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72–98.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting for error and bias in research findings across studies (2nd ed.). Thousand Oaks, CA: Sage.

Jeanneret, P. R. (1992). Applications of job component/synthetic validity to construct validity. Human Performance, 5, 81–96.

Johnson, J. W., Steel, P., Scherbaum, C. A., Hoffman, C. C., Jeanneret, P. R., & Foster, J. (2010). Validation is like motor oil: Synthetic is better. Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 305–328.

Judge, T. A., Heller, D., & Mount, M. K. (2002). Five-factor model of personality and job satisfaction: A meta-analysis. Journal of Applied Psychology, 87, 530–541. doi:10.1037//0021-9010.87.3.530

King, L. M., Hunter, J. E., & Schmidt, F. L. (1980). Halo in a multidimensional forced choice performance and evaluation scale. Journal of Applied Psychology, 65, 502–516.

Koopman, R. (1988). On the sensitivity of a composite to its weights. Psychometrika, 53, 547–552.

Kristof-Brown, A. L., Zimmerman, R. D., & Johnson, E. C. (2006). Perceived applicant fit: Distinguishing between recruiters’ perceptions of person-job and person-organization fit. Personnel Psychology, 53, 643–671.

Lance, C. E., Baranik, L. E., Lau, A. R., & Scharlau, E. A. (2009). If it ain’t trait it must be method: (Mis)application of the multitrait-multimethod methodology in organizational research. In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and methodological myths and urban legends: Received doctrine, verity, and fable in organizational and social research (pp. 339–362). New York: Routledge.

Lance, C. E., Dawson, B., Birkelbach, D., & Hoffman, B. J. (2010). Method effects, measurement error, and substantive conclusions. Organizational Research Methods. doi:10.1177/1094428109352528

Lawshe, C. H. (1952). Employee selection. Personnel Psychology, 6, 31–34.

Lubinski, D., & Humphreys, L. (1996). Seeing the forest from the trees: When predicting the behavior or status of groups, correlate means. Psychology, Public Policy, and Law, 2, 363–376.

McCloy, R. A., Putka, D. J., & Gibby, R. E. (2010). Developing an online synthetic validation tool. Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 366–370.

Murphy, K. R. (2000). Impact of assessments of validity generalization and situational specificity on the science and practice of personnel selection. International Journal of Selection and Assessment, 8, 194–206.

Murphy, K. R. (2009a). Content validation is useful for many things, but validity isn’t one of them. Industrial and Organizational Psychology: Perspectives on Science and Practice, 2, 453–464.

Murphy, K. R. (2009b). Is content-related validation useful in validating selection tests? Industrial and Organizational Psychology: Perspectives on Science and Practice, 2, 517–526.

Murphy, K. R. (2010). Synthetic validity: A great idea whose time never came. Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 356–359.

Oswald, F. L., & Hough, L. M. (2010). How synthetic validation contributes to personnel selection. Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 329–334.

Peck, D. (2010, March). How a jobless era will transform America. Atlantic Monthly, 305, 42–56.

Primoff, E., & Fine, S. (1988). A history of job analysis. In S. Gael (Ed.), The job analysis handbook for business, industry and government (pp. 14–29). Toronto: John Wiley & Sons.

Pritchard, R. D., Paquin, A. R., DeCuir, A. D., McCormick, M. J., & Bly, P. R. (2002). Measuring and improving organizational productivity: An overview of ProMES, the Productivity Measurement and Enhancement system. In R. D. Pritchard, H. Holling, F. Lammers, & B. D. Clark (Eds.), Improving organizational performance with the Productivity Measurement and Enhancement system: An international collaboration (pp. 3–50). Huntington, NY: Nova Science.

Ree, M., Carretta, T., & Earles, J. (1998). In top-down decisions, weighting variables does not matter: A consequence of Wilks’ theorem. Organizational Research Methods, 1, 407–420.

Rothstein, M. G., & Goffin, R. D. (2006). The use of personality measures in personnel selection: What does current research support? Human Resource Management Review, 16, 155–180.

Russell, C. J. (2010). Better at what? Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 340–343.

Schmidt, F. L., Law, K., Hunter, J. E., Rothstein, J. R., Pearlman, K., & McDaniel, M. A. (1993). Refinements in validity generalization procedures: Implications for the situational specificity hypothesis. Journal of Applied Psychology, 78, 3–13.

Schmidt, F. L., & Oh, I.-S. (2010). Can synthetic validity methods achieve discriminant validity? Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 344–350.

Schneider, R. J., Hough, L. M., & Dunnette, M. D. (1996). Broadsided by broad traits: How to sink science in five dimensions or less. Journal of Organizational Behavior, 17, 639–655.

Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures. Bowling Green, OH: SIOP.

Steel, P., & Kammeyer-Mueller, J. (2002). Comparing meta-analytic moderator search techniques under realistic conditions. Journal of Applied Psychology, 87, 96–111.

Steel, P., & Kammeyer-Mueller, J. (2008). Bayesian variance estimation for meta-analysis: Quantifying our uncertainty. Organizational Research Methods, 11, 54–78.

Steel, P., & Kammeyer-Mueller, J. (2009). Using a meta-analytic perspective to enhance job component validation. Personnel Psychology, 62, 533–552.

Steel, P., Huffcutt, A., & Kammeyer-Mueller, J. (2006). From the work one knows the worker: A systematic review of the challenges, solutions, and steps to creating synthetic validity. International Journal of Selection and Assessment, 14, 16–36.

Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60, 967–993.

Vancouver, J. B. (2010). Improving I-O science through synthetic validity. Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 360–362.

Wheelan, C. (2002). Naked economics: Undressing the dismal science. New York: W. W. Norton.