
PERSONNEL PSYCHOLOGY 2003, 56, 573-605

INTERNATIONAL VALIDITY GENERALIZATION OF GMA AND COGNITIVE ABILITIES: A EUROPEAN COMMUNITY META-ANALYSIS

JESUS F. SALGADO Department of Social Psychology

University of Santiago de Compostela, Spain

NEIL ANDERSON Department of Psychology

University of Amsterdam, The Netherlands

SILVIA MOSCOSO Department of Social Psychology

University of Santiago de Compostela, Spain

CRISTINA BERTUA Department of Psychology

Goldsmiths College, University of London, Great Britain

FILIP DE FRUYT Department of Psychology Ghent University, Belgium

This article reports on a series of meta-analyses into the criterion validity of general mental ability (GMA) and specific cognitive ability tests for predicting job performance ratings and training success in the European Community (EC). Meta-analyses were computed on a large EC database examining the operational validity of GMA and other specific cognitive abilities, including verbal, numerical, spatial-mechanical, perceptual, and memory (N ranged from 946 to 16,065) across 10 EC member countries. The results showed that tests of GMA and specific cognitive ability are very good predictors of job performance and training success across the EC. Evidence for the international validity generalization of GMA and specific cognitive abilities was presented. The results for the EC meta-analyses showed a larger operational validity than previous meta-analyses in the U.S. for predicting job performance. For training success, the European and American results are very similar. Implications for the international generalizability of GMA test validities, practical use of cognitive ability tests for personnel selection, and directions for future research are discussed.

The authors thank three anonymous reviewers for their comments on a previous version of this article. The preparation of this manuscript was supported by the Ministerio de Ciencia y Tecnología grant no. BS02001-3070 to Jesus F. Salgado, and by a review grant from the British Army Research Establishment to Neil Anderson.

Correspondence and requests for reprints should be addressed to Jesus F. Salgado, Departamento de Psicología Social, Universidad de Santiago de Compostela, 15782 Santiago de Compostela, Spain; [email protected].

COPYRIGHT © 2003 PERSONNEL PSYCHOLOGY, INC.

573

574 PERSONNEL PSYCHOLOGY

Tests of general mental ability (GMA) and specific cognitive ability are popular elements in organizational selection procedures in both the U.S.A. and the European Community (EC). Over the years, many meta-analyses of the predictive or criterion-related validity of GMA tests have been performed in the U.S. (e.g., Hunter & Hirsh, 1987; Hunter & Hunter, 1984; Levine, Spector, Menon, Narayanan & Cannon-Bowers, 1996; Pearlman, Schmidt & Hunter, 1980; Schmitt, Gooding, Noe & Kirsch, 1984). These meta-analyses have found that GMA and cognitive ability tests are valid predictors of job performance and training success and, furthermore, that validity generalizes across samples, criteria, and occupations (see Schmidt, 2002, for an extensive review of all major studies). For example, in a seminal article, Hunter and Hunter (1984) reanalyzed the results of previous studies consisting of 515 primary studies, and also reported the main results of a series of meta-analyses carried out with the USES database of General Aptitude Test Battery (GATB) validity studies. The average corrected predictive validity of the ability composite was .53 for predicting job performance. Many other meta-analyses were published in the U.S. in the last 25 years and the conclusions were convergent, claiming that GMA measures are the best single predictors of job performance and training success (see Schmidt, 2002, for a review of these findings; Schmidt & Hunter, 1998). In this sense, Murphy (2002) has suggested that GMA measures are likely to be correlated with performance in virtually any job.

The criterion validity of specific cognitive abilities (i.e., verbal, numerical, spatial, and memory) was a further issue examined in previous meta-analyses of the validity studies conducted in civil settings and in large-sample validity studies conducted with military samples in the U.S. For example, meta-analyses by Schmidt, Hunter, and their colleagues (Hunter & Hunter, 1984; Pearlman, Schmidt, & Hunter, 1980; Schmidt, Hunter & Caplan, 1981; Schmidt, Hunter, & Pearlman, 1981; Schmidt, Hunter, Pearlman, & Shane, 1981) demonstrated that specific abilities showed small validity over and beyond GMA measures or cognitive composites. Ree and his colleagues have examined the validity of the specific abilities for predicting job performance and training success using large military samples (Carretta & Ree, 1997, 2000, 2001; Olea & Ree, 1994; Ree & Earles, 1991; Ree, Earles, & Teachout, 1994). They found that the specific abilities showed very small contributions over GMA for predicting these two criteria. However, some authors have suggested that the use of specific abilities can have important implications for personnel selection. For example, Kehoe (2002) has suggested that if the selection decision were based on specific abilities rather than on GMA measures, the specific abilities could reduce group differences and might result in more positive applicant reactions. Murphy (2002) has suggested that


different applicants may be selected depending on the specific abilities emphasized. Consequently, these are compelling reasons for examining the magnitude of the validity of specific abilities in the EC.

Even in light of their important findings, these meta-analyses still have several characteristics that limit the generalized applicability of their findings. For example, (a) they only included American primary studies, and, therefore, international validity generalization was not examined; (b) they did examine the predictive validity of specific cognitive abilities (e.g., verbal, numerical, spatial), but, due to the differential characteristics of the EC, the magnitude of the relations between the specific cognitive abilities and performance might be different from the relations found in the U.S. These two characteristics are particularly relevant, as the international validity generalization of these findings cannot be taken for granted based on these data sets. Therefore, it is particularly important to establish the criterion-related validity of GMA tests in countries other than the U.S. in order to demonstrate international validity generalization regardless of cultural, economic, social, and legislative differences between the U.S. and EC countries (Herriot & Anderson, 1997). As Newell and Tansley (2001) point out, societal effects, institutional networks, the national regulatory environment, economic factors, and national culture may all have an impact upon the use and criterion-related validity of personnel selection methods.

GMA and Cognitive Ability Testing in the EC

The international validity generalization of GMA and cognitive ability tests for predicting job performance and training success is an area of particular relevance for EC countries. According to different surveys (see Salgado & Anderson, 2002, for a review), cognitive ability tests are, in general, more prevalently used for selection purposes in the EC than in the U.S. The EC is a large multinational and multicultural community consisting at present of 15 member countries, including Austria, Belgium, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, Portugal, Spain, Sweden, the United Kingdom, and The Netherlands, but with a large number of other countries waiting to become new members in the near future. The current population is over 400 million people. An illustration of the diversity within the EC is the coexistence of a number of individualistic cultures (e.g., United Kingdom, Germany, The Netherlands) and collectivist cultures (e.g., France, Italy, Spain, Portugal, Greece; see Hofstede, 1980; Spector et al., 2001). In some countries (e.g., United Kingdom, The Netherlands) the Protestant religion is dominant, but in others, the Catholic religion (e.g., Austria, Italy, Spain, Portugal) or other religions (e.g., the Orthodox religion in


Greece) are more prevalent. Consequently, such a mix of typologies could produce a differential effect on the validity of GMA and other cognitive abilities (e.g., verbal, numerical, spatial), due to the fact that different systems of values, approaches to power, supervision styles, and interpersonal communication may be operating simultaneously within the EC.

According to Levy-Leboyer (1994), there are important differences between the U.S. and the European Community in personnel selection practices. For example, American selection psychologists

have been strongly influenced by lawsuits requiring proof of the validity of decisions that were made based on test results, so that work psychologists in the U.S. have concentrated on legitimizing their selection procedures by showing nondiscrimination. European work psychologists were (and still are) much more concerned than U.S. psychologists with the need to protect applicants by defending their privacy; they see selection as a participative process, where both the applicant and the organization must have their interests represented. In Europe, public action is focused on limiting an organization's right to access an applicant's privacy. (Levy-Leboyer, 1994, pp. 183-184)

Other researchers have also shown that the meaning of selection is different in the EC and the U.S. For example, Roe (1989) pointed out that the European view is not limited to the classical psychometric model of prediction (i.e., the validity of the procedure); the whole selection process is characterized as a negotiation between employers and applicants in which they must make a decision considering the costs and the benefits, taking into account the societal context of selection, such as legal regulations and labor market conditions (e.g., unemployment rates; degree of diversity in the countries). Indeed, the 15 current members of the EC are not completely homogeneous in their practices of personnel selection, but, as Viswesvaran and Ones (2002) also pointed out, the EC countries individually considered are relatively homogeneous in comparison with the U.S., as they have less within-country diversity. Other relevant differences between the U.S. and the EC are, for example, unemployment rates that are larger in the EC than in the U.S., large variability in the educational systems, and differences in demographics. For example, the differences in demographic variables within the countries can also have effects on the answers to the test items, reflecting such sociodemographic differences.

Why EC Validities May Be Lower

There are also differences between the U.S. and the EC with regard to the meaning of job performance. Companies in the EC may be


more interested in the contextual dimensions of job performance than in task performance dimensions (Borman, Penner, Allen, & Motowidlo, 2001). This may be attributed to the effects of the collectivistic values of some national cultures (e.g., Spain, France, Belgium, Portugal) and the European view of the personnel selection process as one of negotiation in which personal characteristics are particularly relevant. Murphy and Shiarella (1997) have suggested that organizations may differ in their definitions of what constitutes good job performance and that these differences in definitions have implications for the validity of selection tests. For example, as Borman et al. (2001) have speculated, GMA measures would show lower validity for predicting contextual performance than for task performance. Therefore, if the European conceptualization of what is good job performance included more contextual connotations relative to the conceptualization in the U.S., then one would expect that GMA and cognitive ability measures would show lower predictive validity in Europe than in the U.S.

With regard to the measurement of training performance, there are also remarkable differences. In the EC, the majority of the studies assess training success using supervisory ratings rather than using objective tests, as is typical in U.S. studies. The objective measures used in training studies are often assessments of job knowledge, but the ratings are assessments of job performance. This is a relevant difference between the EC and the U.S., because it is well known that there are large differences in the reliability of these two types of measures. Furthermore, the American studies may be interested in the maximal performance of the individuals, as assessed with objective measures, and the European studies may be more interested in typical performance, as assessed with ratings (Ackerman, 1994; Ackerman & Humphreys, 1991). Consequently, one would expect that the validity for training success would be lower in the EC than in the U.S.

Moreover, all these contextual and conceptual differences between the U.S. and the European view of the personnel selection process allow us to speculate that there may be other factors producing differences in the magnitude of the relations between GMA and work performance in the EC. For example, there could be differences in the reliability of performance measures, and these differences would affect the criterion-related validity. In addition, differences in unemployment rates could produce differences in the selection ratio, which would produce differences in the restriction of range in the GMA measures; subsequently, the range restriction would differentially underestimate the criterion validity. Therefore, the magnitude of GMA validity could be lower in the EC than in the U.S. Furthermore, as the European approach is more concerned with the societal effects than with the predictive validity


of the personnel selection processes, fewer validity studies will exist in the EC than in the U.S.

Why EC Validities May Be Higher

Currently, over 20 different languages are spoken throughout the current member countries, but the number of languages may be doubled, or even tripled, with the imminent joining of new member states. This phenomenon, together with the powerful cultural differences among the EC countries relative to the smaller differences among the U.S. regions, could make working in the EC more complex and increase the demand for GMA. Furthermore, as a whole, the majority of jobs included in these studies are medium or high complexity jobs. Hunter and Hunter (1984) demonstrated that job complexity is a powerful moderator of GMA validity and found that GMA validity is remarkably larger for high complexity jobs relative to low complexity ones. Consequently, it is possible that these two factors (i.e., many languages and more complex jobs) operate to produce a larger operational validity in the EC than in the U.S.

Although it is true that international validity generalization may exist, as Schmidt, Ones, and Hunter (1992) and Salgado (1997) first conjectured, and that the findings may be transferable to other countries, there is no empirical support for this claim because there are only American meta-analyses carried out exclusively with American primary studies. Therefore, a meta-analytic review of the validity studies conducted in the EC is necessary. Such a study would make three important contributions: (a) to report the results for the EC countries, (b) to compare the EC findings with the American ones, and (c) to explore whether there is international validity generalization for GMA and specific cognitive measures. In view of this, this paper reports the results of new meta-analyses conducted on a very large database of validity studies of GMA and specific cognitive abilities conducted in EC member countries. Given the considerations stated previously, our study has two main objectives: (a) to present the results of a European quantitative synthesis on the relationship between GMA and job performance and training success; and (b) to examine the magnitude of the predictive validity of specific abilities for predicting job performance and training success.

Method

Search for Studies

Based on the goals of this research, a database was developed containing European validity studies. These studies had to meet three criteria in order to be included within the database: (a) they had to report validity coefficients between job performance or training success measures and cognitive ability measures; (b) the samples should be applicants, employees, or trainees, but not students; (c) only civilian jobs would be considered, excluding military occupations, in order to compare the present meta-analytic results with previous American findings.

The search was made using seven strategies. First, a computer search was conducted in the PsycLit database. Second, an article-by-article manual search was carried out in a large number of European journals. This list includes: Journal of Occupational and Organizational Psychology, Journal of the National Institute of Industrial Psychology, Human Factors, Occupational Psychology, British Journal of Psychology, The Occupational Psychologist, International Journal of Selection and Assessment, Revista de Psicología General y Aplicada, Psicotecnia, Revista de Psicología del Trabajo y las Organizaciones, Bolletino di Psicologia Applicata, Travail Humain, Bulletin du CERP, BINOP, Revue de Psychologie Appliquée, Journal of Organizational Behavior, Zeitschrift für Angewandte Psychologie, Praktische Psychologie, Industrielle Psychotechnik, Ergonomics, Applied Psychology: An International Review, Annals de l'Institut d'Orientació, L'Année Psychologique, Irish Journal of Psychology, Reports of the Industrial Fatigue Research Board, Reports of the Industrial Health Research Board, Scandinavian Journal of Psychology, Revue Belge de Psychologie et de Pédagogie, Psychologica Belgica, Nederlands Tijdschrift voor de Psychologie, International Journal of Aviation Psychology, Accident Analysis and Prevention, and Transport Journal. We also reviewed the following American journals: Journal of Applied Psychology, Personnel Psychology, Journal of Personnel Research, Industrial Psychology, and Psychological Bulletin. Third, we reviewed the proceedings of the Congress of the International Association of Psychotechnique (later called the International Association of Applied Psychology) from 1920 to 1998. Fourth, test manuals were checked looking for criterion validity studies. Fifth, several test publishing companies were contacted and asked for reports in which validity studies were reported. Sixth, the reference sections of articles obtained were checked to identify more papers. Finally, several well-known European researchers were contacted in order to obtain additional papers and supplementary information related to published papers (e.g., complete matrices of correlation coefficients).

The literature search resulted in 102 papers, including 234 independent samples. These papers consisted of 80 published studies (163 samples) and 22 unpublished studies (71 samples). Combining the published and unpublished studies conducted across the EC, the final database contained 142 independent samples with training success as the criterion and 120 samples with job performance as the criterion. Training success and job performance were simultaneously used within 14 samples. The following countries contributed studies to the database: Belgium (3), France (18), Germany (9), Ireland (1), The Netherlands (13), Portugal (1), Scandinavian countries (2), Spain (18), and the United Kingdom (37). Consequently, not all the current members of the EC are represented in the database. Specifically, we did not find studies from Austria, Italy, Greece, and Luxembourg. The reasons are different for the various countries. For example, cognitive ability tests are rarely or never used in Italy or Greece (see Ryan, McFarland, Baron, & Page, 1999), and studies are not published in Finland (Levy-Leboyer, 1994; Petteri Niitamo, personal communication, July 19, 1999). The total population of these five countries represents less than 20% of the population of the EC.

Procedure

Two researchers served as judges, working independently to code every study. Each researcher was given a list and a definition of the abilities. The abilities used in this research were general mental ability (GMA), verbal ability, numerical ability, spatial-mechanical ability, perceptual ability, and memory. Examples of the tests included in each ability category are given in Table 1. If the two researchers agreed on the ability, the test was coded in that ability category. Disagreements were resolved through discussion until the researchers agreed on the classification of the ability. The studies conducted in the U.K. (36% of the total) were used to examine the reliability of the coding process. Agreement between researchers (prior to the consensus) was .94, .90, .93, .87, .85, and 1.00 for general mental ability, verbal ability, numerical ability, spatial-mechanical ability, perceptual ability, and memory, respectively. All single tests were then assigned to a single ability. Only one overall validity coefficient was used per sample for each ability condition. In situations in which we found more than one coefficient for an ability (e.g., several tests were used to assess the same ability), they were considered conceptual replications, and linear composites with unit weights for the components were formed. Linear composites provide more construct-valid estimates than the use of the average correlation. Nunnally (1978, pp. 166-168) and Hunter and Schmidt (1990, pp. 457-463) provide formulas for the correlation of variables with composites.
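To make the composite step concrete, the correlation between a criterion and a unit-weighted composite of k tests follows from the component validities and intercorrelations, as in the Nunnally and Hunter and Schmidt formulas cited above. The following is a minimal sketch with illustrative numbers, not values from the database:

```python
from math import sqrt

def composite_validity(r_yx, r_xx):
    """Correlation between a criterion y and a unit-weighted composite
    of k component tests.

    r_yx : validities of the k components (r between y and each x_i)
    r_xx : intercorrelations r(x_i, x_j) for all pairs i < j
    """
    k = len(r_yx)
    # Variance of the unit-weighted composite: k + 2 * (sum of pairwise r's)
    composite_var = k + 2 * sum(r_xx)
    return sum(r_yx) / sqrt(composite_var)

# Two tests of the same ability, each correlating .30 with the criterion
# and .50 with each other (illustrative values only)
print(round(composite_validity([0.30, 0.30], [0.50]), 3))  # 0.346
```

Note that the composite validity (.346) exceeds the average component validity (.30), which is why composites are preferred over the mean correlation.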

After the studies were collated and their characteristics recorded, the following step was to apply the psychometric meta-analytic formulas of Hunter and Schmidt (1990). Psychometric meta-analysis estimates how much of the observed variance of findings across studies is due to artifactual errors. The artifacts considered here were sampling error, criterion and predictor reliability, and range restriction in ability scores. Some of


TABLE 1
Examples of Tests Contributing Validity Coefficients in Each Ability Category

Ability: GMA
Definition: A general capacity of an individual consciously to adjust his thinking to new requirements; it is general mental adaptability to new problems and conditions of life.
Examples: DAT (U.S.), GATB (U.S.), T2 (U.K.), ASDIC (U.K.), IST-70 (D), WIT (D), GVK (U.K.), PMA (U.S.), AMPE (E), Matrix (U.K.), Factor G (U.K.), Otis (U.S.), Alpha Test (U.S.), Intelligence Logique (F), CERP-14 (F), Domino (U.K.), NIIP33 (U.K.)

Ability: Verbal
Definition: Ability to understand the meaning of words and use them effectively; ability to comprehend language, to understand relationships between words, and to understand the meanings of whole sentences and paragraphs.
Examples: IST-70 WA (D), DAT-VR (U.S.), GATB-Vocabulary (U.S.), Reading Comprehension Test (U.K.), Mill-Hill Vocabulary Test (U.K.)

Ability: Numerical
Definition: Ability to understand numerical relations and use numbers effectively; ability to comprehend quantitative material.
Examples: IST-70 RA (D), DAT-NR (U.S.), ATS Arithmetic (U.K.), Mathematical Test (U.K.), Arithmetic Test (U.K.), Arithmetic Reasoning (U.K.), Vernon's Arithmetic Test (U.K.)

Ability: Spatial-Mechanical
Definition: Ability to understand and manage objects in two-dimensional and three-dimensional space; ability to comprehend relations between objects.
Spatial test examples: IST-70 FA (D), Coordinate Reading (U.S.), Dial Reading (U.S.), WAIS Blocks (U.S.), DAT-SR (U.S.), Squares (U.K.), Spatial Intelligence (U.K.), Figures Rotation (E), Judgment of Distance (U.S.), Visualization Test (U.K.)
Mechanical test examples: P. Rennes' Test of Mechanics (F), Bennett's Test (U.S.), DAT-MR (U.S.), ITB-MT4 (U.K.), APU Mechanical Comprehension (U.K.), Test of Lievers (F)

Ability: Perceptual
Definition: Ability to perceive stimuli very quickly and to give fast and accurate answers.
Examples: DAT-CSA (U.S.), Toulouse-Piéron Test (F), Dichotic Attention Test (Israel), Stroop Test (U.S.), Cancelation Test (F), Caras (E), Selective Attention Test (F), Diffused Attention Test (F), Forster-Germain Test (E), Percepto-taquímetro (E), Instrument Reading (U.S.), D2 (D)

Ability: Memory
Definition: Ability to remember information presented in different perceptual modalities (e.g., visual, auditory).
Examples: IST-70 ME (D), Visual Memory Test (U.K.), Memory of Words (F), Topographique Memory Test (F), Associative Memory Test (F), Recognition Memory Test (U.S.)

Note: D = Germany; E = Spain; F = France; U.S. = United States of America; U.K. = United Kingdom.


TABLE 2
Empirical Distributions of Test-Retest Reliability of General Mental Ability and Specific Cognitive Abilities in European Studies of Criterion Validity

Ability               Num. coef.   Mean rxx   SD rxx   Max   Min   Interval time (weeks)
GMA (+Battery)            31         .83        .09    .95   .65          24
Verbal                    15         .83        .15    .97   .50          25
Numerical                 22         .85        .10    .95   .64          24
Spatial                   13         .77        .10    .88   .60          24
Mechanical                14         .77        .07    .88   .64          24
Spatial+Mechanical        27         .77        .08    .88   .60          24
Memory                     1         .64         -      -     -           78
Perceptual                 5         .67        .18    .90   .52          37

these artifacts reduce the correlation below its operational value (e.g., criterion reliability and range restriction), and all of them produce artifactual variability in the observed validity. In our analyses, we corrected the observed mean validity for criterion reliability and range restriction in the predictor. In order to correct the empirical validity for these last three artifacts (i.e., criterion reliability, predictor reliability, and range restriction), the most common strategy is to develop specific distributions for each of them.
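For illustration, the two corrections applied to the mean observed validity can be sketched as follows. The correction order shown here (criterion unreliability first, then Thorndike's Case II range-restriction formula) is one common convention, and the observed validity of .25 is an illustrative value rather than a result of this study; the .52 and .62 artifact values are taken from Tables 3 and 4:

```python
from math import sqrt

def operational_validity(r_obs, r_yy, u):
    """Correct a mean observed validity for criterion unreliability and
    for direct range restriction on the predictor (Thorndike Case II).

    r_obs : mean observed validity
    r_yy  : criterion reliability in the restricted sample
    u     : SD(restricted) / SD(unrestricted) for the predictor
    """
    r = r_obs / sqrt(r_yy)        # disattenuate for criterion measurement error
    big_u = 1.0 / u               # unrestricted/restricted SD ratio
    return big_u * r / sqrt(1 + (big_u ** 2 - 1) * r ** 2)

# Illustrative observed r = .25, with r_yy = .52 and u = .62 (Tables 3-4)
print(round(operational_validity(0.25, 0.52, 0.62), 2))  # 0.51
```

The example shows how a modest observed correlation can imply a substantially larger operational validity once both artifacts are removed.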

Predictor Reliability

According to Schmidt and Hunter (1999), the ideal reliability coefficient is the coefficient of equivalence and stability, which is estimated as the correlation between two parallel forms given on different occasions. However, because this type of coefficient was not reported in any of the studies, test-retest reliability was used. This affords an accurate alternative because it is only consequential in the estimation of the variability in rho, if at all. Consequently, the reliability of the predictors was estimated: (a) from the coefficients reported in the studies included in the meta-analysis, and (b) using the coefficients published in the various test manuals. For each ability, an empirical distribution of test-retest reliability was developed (see Table 2). The average reliabilities were .83, .83, .85, .77, .67, and .64 for GMA, verbal ability, numerical ability, spatial-mechanical ability, perceptual ability, and memory, respectively. As the interest is in the operational validity of the cognitive abilities and not in their theoretical value, predictor reliability estimates are only used to eliminate artifactual variability in SD-rho.
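As an illustration of how artifactual variability is separated from real variability, the first (bare-bones) step of the Hunter and Schmidt procedure compares the observed variance of validities with the variance expected from sampling error alone. The study values below are made up for the sketch, not taken from the EC database:

```python
def bare_bones(rs, ns):
    """Bare-bones meta-analysis step (Hunter & Schmidt, 1990):
    returns the sample-size-weighted mean validity, the observed
    variance of validities, and the variance expected from sampling
    error alone."""
    n_total = sum(ns)
    r_bar = sum(r * n for r, n in zip(rs, ns)) / n_total
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / n_total
    n_bar = n_total / len(ns)
    # Expected sampling-error variance for the average study size
    var_error = (1 - r_bar ** 2) ** 2 / (n_bar - 1)
    return r_bar, var_obs, var_error

# Three hypothetical validity studies of equal size
r_bar, var_obs, var_error = bare_bones([0.20, 0.30, 0.40], [100, 100, 100])
print(round(r_bar, 2), round(var_obs - var_error, 4))
```

If the residual variance (observed minus error) is near zero, essentially all between-study variability is artifactual and validity generalizes.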

Criterion Reliability

In the present research, only validity studies in which job performance ratings and training success were used as criteria were considered. This choice was based on two arguments: (a) American meta-analyses have only used these two criteria, and one of our objectives is to provide a comparison with those meta-analyses; (b) other criteria such as tenure, turnover, promotions, or output have been used in only a very small number of studies and not with all cognitive abilities; therefore, we would not be able to carry out meta-analyses for all abilities. Because not all of the studies provided information regarding criterion reliability, empirical distributions were developed to estimate criterion reliability for both of these criteria. These were developed using the reliability coefficients reported in a number of studies, and are presented in Table 3.

TABLE 3
Empirical Distributions of Criterion Reliability in European Studies of Criterion Validity for General Mental Ability and Cognitive Abilities

Criterion                    K      N      Mean ryy   SD ryy
Job performance ratings     19    1,900      .52       .19
Training success            15    2,897      .56       .09

For job performance ratings, the interrater reliability coefficient is the one of interest (Hunter, 1986; Schmidt & Hunter, 1996) because this type of reliability corrects for most of the unsystematic error in supervisor ratings (Hunter & Hirsh, 1987). We found 19 studies reporting interrater coefficients for job performance ratings, producing an average coefficient weighted by sample size of .52. This coefficient is slightly lower than the coefficient used by Hunter and Hunter (1984), but, interestingly, it is exactly the same as the coefficient found by Viswesvaran, Ones, and Schmidt (1996) in their meta-analysis of the interrater reliability of job performance ratings. All the coefficients included in Viswesvaran et al.'s database were computed using American studies. Consequently, it is interesting to note that we arrived at the same value using an independent data set. In addition, Rothstein (1990) showed that the asymptotic value for job performance ratings was .52.

In the case of training success, ratings by trainers or supervisors were used in the majority of primary studies included in our meta-analyses. A small number of primary studies used pass/fail qualifications given by the supervisor or the ratings given by trainers or external examiners in both theoretical and practical examinations. Therefore, all training success scores used in the present meta-analysis can be considered a form of ratings. In fact, when both a theoretical and a practical examination were given to the trainees, the correlation between examination ratings ranged from .46 to .73. Fifteen studies reporting training success reliability were found in total, leading to a sample-size-weighted average reliability of .56. This figure is remarkably lower than the value of .81 used by Hunter and Hunter (1984). However, we considered our estimate to be representative of training success reliability in the EC. The difference from Hunter and Hunter's estimate is due to three reasons: (a) ours was empirically developed using civilian studies, whereas Hunter and Hunter's estimate was an assumed one; (b) Hunter's estimate was found for training success measures in the U.S. Navy (Hunter, 1986), and these measures typically consisted of objective examinations (Vineberg & Joyner, 1989); and (c) all the coefficients found in this research were calculated using ratings given by trainers or supervisors, whereas objective examinations may be more frequent in the U.S.

TABLE 4
Empirical Distributions of Range Restriction Ratio for General Mental Ability and Cognitive Abilities (u = SDrestricted / SDunrestricted)

Type                      K    N      ū    SDu
Ability-JPR studies       20   1,795  .62  .25
Ability-training studies  12   2,717  .67  .14

Note: JPR = Job performance ratings.

Range Restriction Distribution

The distribution for range restriction was based on two strategies: (a) some range restriction coefficients were obtained from studies that reported both restricted and unrestricted standard deviation data, and (b) another group of range restriction coefficients was obtained from the reported selection ratio, using the formula derived by Schmidt, Hunter, and Urry (1976). This double strategy produced a large number of range restriction estimates. We coded these estimates according to the criterion used in the study and used only one coefficient for each sample. In cases where we had several estimates for the same sample, the average was calculated and this figure was used for computing the empirical distributions. The descriptive statistics of the two distributions appear in Table 4. We found a range restriction ratio (u) of .67 for training success and .62 for job performance. These figures are very similar to those found by Hunter and Hunter (1984) in the U.S. studies, who reported .60 for training and .67 for job performance ratings. To be sure that the two empirical distributions are representative of the restriction in the abilities, we also developed specific empirical distributions for each ability, and the results were very similar (although calculated on a smaller number of estimates).
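Under the truncated-normal model that underlies the Schmidt, Hunter, and Urry (1976) approach, a reported selection ratio maps directly to a range-restriction ratio. A minimal Python sketch (standard library only; the function name is ours, not from the article):

```python
from statistics import NormalDist

def u_from_selection_ratio(p):
    """Range-restriction ratio u = SD(restricted) / SD(unrestricted) implied by
    top-down selection of the top proportion p of a normal applicant pool."""
    nd = NormalDist()
    c = nd.inv_cdf(1 - p)                     # standardized cut score
    lam = nd.pdf(c) / p                       # mean of the selected group
    return (1 + c * lam - lam ** 2) ** 0.5    # SD of the truncated normal
```

Hiring the top half of a normal applicant pool (p = .50), for instance, implies u of about .60, in the neighborhood of the empirical ratios in Table 4.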


Results

Validity of GMA and Cognitive Abilities to Predict Job Performance Ratings

The meta-analytic results for the relationship between general mental ability, the other cognitive abilities, and job performance appear in Table 5. From left to right, the first two columns show the number of validity coefficients and the total sample size. The next four columns list the average observed validity weighted by sample size, the observed variance weighted by sample size, the sampling error variance, and the observed standard deviation weighted by sample size. The following two columns show the operational validity (the observed validity corrected for criterion unreliability and predictor range restriction) and the standard deviation of the operational validity. The next two columns indicate the percentage of variance accounted for by artifactual errors (i.e., predictor and criterion unreliability, predictor range restriction, and sampling error) and the 90% credibility value (i.e., the minimum value expected for 90% of the coefficients included in the distribution).
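As a point-estimate illustration of the correction just described (the article used artifact distributions, so its figures differ somewhat), the observed validity can be disattenuated for criterion unreliability and then corrected for direct range restriction with Thorndike's Case II formula. A sketch in Python; the function name is ours:

```python
def operational_validity(r_obs, ryy, u):
    """Observed validity corrected for criterion unreliability (ryy) and
    direct range restriction (u = restricted SD / unrestricted SD)."""
    r1 = r_obs / ryy ** 0.5          # disattenuate for criterion unreliability
    U = 1.0 / u                      # Thorndike Case II range-restriction correction
    return U * r1 / ((U ** 2 - 1) * r1 ** 2 + 1) ** 0.5

# GMA and job performance: r = .29, criterion reliability .52, u = .62
rho = operational_validity(0.29, 0.52, 0.62)
# about .58 with this simple two-step correction; the artifact-distribution
# procedure reported in the text yields .62
```

The gap between this sketch and the reported value reflects the order of corrections and the use of full artifact distributions rather than single point estimates.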

A large number of studies was found for each ability examined except memory, for which only 14 studies were found. Even so, this is an acceptable number of coefficients upon which to carry out a meta-analysis. With regard to this point, Ashworth, Osburn, Callender, and Boyle (1992) developed a method for assessing the vulnerability of validity generalization results to unrepresented or missing studies. Ashworth et al. (1992) suggested calculating the effects on validity when 10% of studies are missing and their validity is zero. Therefore, we calculated additional estimates representing what the validity would be if we had been unable to locate 10% of the studies carried out and those studies showed zero validity. The last three columns report these new estimates: the lowest ρ value, its standard deviation, and the 90% credibility value.
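Under the simplifying assumption that the hypothetical missing studies are of the same average size as those located, adding 10% zero-validity studies simply divides the mean validity by 1.1. A sketch (the function name is ours):

```python
def lowest_rho(rho, missing_frac=0.10):
    """Ashworth et al. (1992) sensitivity value: mean validity after adding
    missing_frac unlocated studies of equal average size with zero validity."""
    return rho / (1.0 + missing_frac)
```

For GMA this gives .62 / 1.1 ≈ .56, which matches the lowest-ρ column of Table 5.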

The results for general mental ability were exceptionally good. The operational validity was .62, which means that GMA is an excellent predictor of job performance ratings. To our knowledge, no meta-analysis reported to date in the scientific literature for any single personnel selection procedure shows an operational validity of this magnitude. This finding confirms Schmidt and Hunter's (1998) suggestion that GMA is the best predictor of job performance ratings. The 90% credibility value found in the EC indicates that GMA has generalized validity across samples, occupations, measures, and European countries for the job performance criterion. With regard to the percentage of explained

TABLE 5
Meta-Analysis of General Mental Ability and Other Cognitive Ability Tests for Predicting Job Performance Ratings

Source               K    N      r    S2r   S2e   SDr    ρ    SDρ   %VE  90%CV  LRHO  NSD   LCV
GMA                  93   9,554  .29  .031  .008  .176  .62   .19    75   .37    .56   .25   .24
Verbal               44   4,781  .16  .026  .009  .161  .35   .24    53   .04    .32   .25   .00
Numerical            48   5,241  .24  .014  .008  .118  .52   .00   100   .52    .47   .15   .28
Spatial-Mechanical   40   3,750  .23  .038  .010  .195  .51   .29    52   .13    .46   .31   .06
Perceptual           38   3,798  .24  .028  .009  .167  .52   .19    73   .28    .47   .23   .17
Memory               14     946  .26  .017  .013  .130  .56   .00   100   .56    .51   .16   .30

Notes: K = number of studies; N = total sample; r = observed mean validity; S2r = observed variance; S2e = sampling error variance; SDr = observed standard deviation; ρ = operational validity; SDρ = standard deviation of the operational validity; %VE = variance accounted for by artifactual errors; 90%CV = credibility value; LRHO = lowest (hypothetical) ρ; NSD = hypothetical standard deviation of LRHO; LCV = lowest (hypothetical) credibility value.


variance, we found that 75% of the observed variance is accounted for by the four artifactual errors considered here. This estimate is exactly the figure of 75% initially suggested by Schmidt and Hunter (1977) as the percentage of variance explained by artifactual errors in the case of cognitive tests. Moreover, this magnitude of explained variance suggests that the remaining variability may be accounted for by other artifactual errors not considered here (e.g., range restriction in the criterion, imperfect construct measurement of X and Y, typographical and clerical errors, and so on; see Hunter & Schmidt, 1990, for a list of these possible errors).

The second meta-analysis was carried out for verbal ability. Here, the results showed an operational validity of .35, which is remarkably lower than the operational validity of GMA. The 90% credibility value was .04 and the explained variance was 53%. These last two results indicate that validity may be moderated by other variables. The results may partially reflect the effects of an outlier with a large sample size. In effect, we found a study for managers reporting a value of -.02 with a sample size of 437 individuals. When this coefficient is removed, the new operational validity is .39, the 90% credibility value (CV) is .11, and the explained variance is 62%. An anonymous reviewer suggested another explanation for the lower validity of verbal ability. Tests of verbal ability are in different languages across the countries included in the meta-analysis, and different language-based test complexity across countries may be responsible for these results. Other tests, such as numerical, spatial, or perceptual, are probably less prone to this effect because they are more nearly equivalent across languages than verbal tests in different languages are. Because GMA is defined as the higher-order factor underlying all specific abilities, the "equivalence" issue would affect GMA tests less than purely verbal tests.

For numerical ability, the operational validity was .52 and all the observed variance was explained by artifactual errors. These results indicate that numerical ability is a good predictor of job performance and that it has generalized validity across samples and countries. In this case, the predicted variance exceeds the observed variance due to second-order sampling error (Hunter & Schmidt, 1990), and the variance accounted for was rounded to 100%. Second-order sampling error arises because the available studies are not a perfectly random sample of the research domain; consequently, by chance, the average validity and the standard deviation may differ by some amount from the average effect size (and SD) for the entire research domain.
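The sampling-error variance entries in Table 5 follow the standard Hunter-Schmidt expectation for the chance variation of observed correlations across studies. A sketch (the helper name is ours):

```python
def sampling_error_variance(mean_r, k, total_n):
    """Expected variance of k observed correlations around mean_r when the
    total sample size across the k studies is total_n (Hunter-Schmidt)."""
    return (1 - mean_r ** 2) ** 2 * k / total_n

# Numerical ability, Table 5: r = .24, K = 48, N = 5,241
s2e = sampling_error_variance(0.24, 48, 5241)   # about .008, as tabled
```

The same formula reproduces the tabled .008 for GMA (r = .29, K = 93, N = 9,554).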

Spatial-mechanical ability was the next ability analyzed. We found an operational validity of .51. This figure shows that spatial-mechanical aptitude predicts job performance very well, to a similar extent as numerical ability. The 90% credibility value indicates that spatial-mechanical ability has generalized validity, but its magnitude is small. This result, together with the percentage of explained variance (52%), suggests that some moderator may affect the operational validity.

Perceptual ability showed an operational validity of similar magnitude to those of numerical ability and spatial-mechanical ability. The operational validity was .52 and the 90% credibility value was .28. These two results indicate that perceptual ability is a good predictor of job performance and has generalized validity across samples, occupations, and countries. The magnitude of the explained variance was 73%, similar to the figure found for GMA, thus suggesting that the remaining variability may be due to other artifactual errors not considered here.

The results for memory suggest that this ability is the second-best cognitive predictor of job performance. The operational validity found was .56 and all the variability was accounted for by artifactual errors. Therefore, memory demonstrated generalized validity across samples, occupations, and countries. However, as in the case of numerical ability, there is evidence of second-order sampling error, suggesting that the number of studies located may not be completely representative of the total population. In addition, it should be considered that the total sample size of this meta-analysis is the lowest, being only one-tenth of the size of the GMA sample. Nevertheless, Verive and McDaniel (1996) have also shown that memory tests are valid predictors of job performance.

As a whole, the results of the meta-analyses carried out for GMA and cognitive abilities for predicting job performance ratings showed that GMA is the best predictor and has generalized validity across samples, occupations, and EC countries. The other cognitive abilities also predict job performance, showing large operational validities, although lower than that of GMA. Finally, the magnitude of the operational validity for the European studies is remarkably larger than the magnitude reported in the American meta-analyses (e.g., Hartigan & Wigdor, 1989; Hunter & Hunter, 1984; Levine et al., 1996; Schmitt et al., 1984).

Despite the relevance of the latter findings, the overriding point is that our results suggest that GMA and specific cognitive abilities show international validity generalization for predicting job performance. In effect, our results indicate that GMA and cognitive abilities have generalized validity across the EC countries, and the minimum magnitude of validity generalization is considerable because the 90% CV for GMA was .37. This value is larger than the operational validity found for the majority of personnel selection procedures. For example, according to Schmidt and Hunter (1998), in American studies, work-sample tests and GMA tests are the predictors with the largest operational validities, showing average validities of .54 and .51, respectively (see Salgado, 1999; Salgado, Ones, & Viswesvaran, 2001; and Schmidt & Hunter, 1998, for reviews). This finding, together with the validity generalization findings of the American meta-analyses, strongly supports the hypothesis that there is international validity generalization for GMA in Europe and America. Our results also show that, for specific cognitive abilities, validity generalization is also evident across EC countries, although of a lower magnitude.
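The credibility values discussed above follow from the estimated true-validity distribution: assuming normality, the lower 90% credibility value is its 10th percentile, ρ − 1.28 SDρ. A sketch (the function name is ours):

```python
def credibility_value_90(rho, sd_rho):
    """Lower bound exceeded by an estimated 90% of true validities,
    assuming the true-validity distribution is normal."""
    return rho - 1.28 * sd_rho

# GMA and job performance: rho = .62, SDrho = .19
cv = credibility_value_90(0.62, 0.19)
# about .38 from the rounded table inputs; the article reports .37
```

The small discrepancy simply reflects rounding of ρ and SDρ in the table.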

Validity of GMA and Cognitive Abilities for Predicting Training Success

The results of the meta-analyses of validity studies of GMA and cognitive abilities for predicting training success appear in Table 6. As with the job performance criterion, the meta-analyses for training success were carried out with a large number of studies and large total sample sizes in all cases.

GMA showed an operational validity of .54 and was the best cognitive predictor for training success. This figure is the same as the one found by Hunter and Hunter (1984) for the USES database. Therefore, GMA showed that it is an excellent predictor of training success. The 90% CV was .29 and, consequently, GMA had generalized validity across samples, occupations, and EC countries.

Verbal ability showed an operational validity of .44 and a 90% CV of .20. These two findings indicate that verbal ability is a good predictor of training success and that its validity generalizes in the EC. The percentage of explained variance is similar to that found for GMA. Numerical ability showed an operational validity of .48 and was the second-best cognitive predictor of training success. According to its 90% CV, which was the second largest, numerical ability also demonstrated generalized validity across studies. Spatial-mechanical ability showed an operational validity of .40 and also showed validity generalization, because its 90% CV was .16. The four artifacts considered here accounted for 43% to 47% of the observed variance of these four abilities.

The last two cognitive abilities, perceptual and memory, showed the lowest operational validities. The validity for perceptual ability was .25, a value very similar to the figure found by Hunter and Hunter (1984) and lower than the values found by Hartigan and Wigdor (1989) and Levine et al. (1996). Furthermore, the 90% CV for perceptual ability was 0, indicating that its validity for training success does not generalize in the EC. Memory showed an operational validity of .34 and its 90% CV was only .08, indicating that the validity of memory tests generalizes to some extent. The percentage of explained variance for perceptual ability and memory was very similar, 34% and 35%, respectively.

TABLE 6
Meta-Analysis of General Mental Ability and Cognitive Ability Tests for Predicting Training Success

Source               K    N       r    S2r   S2e   SDr    ρ    SDρ   %VE  90%CV  LRHO  NSD   LCV
GMA                  97   16,065  .28  .020  .005  .141  .54   .19    47   .29    .49   .24   .19
Verbal               58   11,123  .23  .017  .005  .130  .44   .19    45   .20    .40   .22   .12
Numerical            58   10,860  .25  .017  .005  .130  .48   .18    46   .24    .44   .22   .15
Spatial-Mechanical   84   15,834  .20  .017  .005  .130  .40   .19    43   .16    .36   .21   .09
Perceptual           17    3,935  .13  .015  .004  .122  .25   .20    34   .00    .23   .20  -.03
Memory               15    3,323  .17  .017  .004  .130  .34   .20    35   .08    .31   .21   .03

Notes: K = number of studies; N = total sample; r = observed mean validity; S2r = observed variance; S2e = sampling error variance; SDr = observed standard deviation; ρ = operational validity; SDρ = standard deviation of the operational validity; %VE = variance accounted for by artifactual errors; 90%CV = credibility value; LRHO = lowest (hypothetical) ρ; NSD = hypothetical standard deviation of LRHO; LCV = lowest (hypothetical) credibility value.


In summary, the results of the EC meta-analysis of GMA and cognitive abilities as predictors of training success showed that GMA is the best cognitive predictor for this criterion. Its operational validity is similar to the value found in the American meta-analyses. The variance accounted for by artifactual errors is also similar in American and European studies. The results for perceptual ability are also similar on both continents. However, no comparisons are possible for verbal, numerical, spatial-mechanical, and memory abilities.

As with job performance, GMA showed validity generalization for training success across the EC countries and across the American studies. Therefore, the hypothesis of international validity generalization is also well supported for this criterion. International validity generalization is also supported for some specific abilities (i.e., verbal, numerical, and spatial-mechanical), although it is not supported for perceptual ability. In the case of memory, international validity generalization also exists but is of a very small magnitude.

Discussion

The major finding of the present meta-analytic investigation is that the criterion-related validity of GMA tests in the U.S. and the EC is notably similar for different criteria, and that the reported validity coefficients generalize internationally. This has important implications for the use of such tests in employee selection on both continents and also for the ongoing debate between validity generalization and situational specificity alluded to earlier in this paper. As evidenced here, validity generalized internationally for GMA and cognitive ability tests for predicting job performance and training success, disconfirming the situational specificity hypothesis.

Validity Generalization of GMA and Specific Cognitive Abilities

The EC data set collected here contained validity studies carried out in 10 member countries of the European Community, all conducted using civilian samples. Regarding the operational validity of GMA, we found it to be .62 for predicting job performance and .54 for predicting training success. These values are some of the largest reported in the meta-analytic literature of personnel selection. These results lead us to conclude that, in comparison with other personnel selection procedures, GMA measures are the best predictors of job performance and training success in the EC, regardless of the country in question (Barrick, Mount, & Judge, 2001; Salgado, 1997, 2002; Salgado & Anderson, 2002; Salgado, Viswesvaran, & Ones, 2001). The validity of the specific cognitive abilities was also large for both criteria. In the case of job performance, validity ranged from .35 (.39 if we excluded the possible outlier) for verbal ability to .56 for memory. For training success, validity ranged from .25 for perceptual ability to .48 for numerical ability. Consequently, the second conclusion is that specific cognitive abilities are good predictors of job performance and training success, although the magnitude of their validities is lower than that of GMA.

Regarding the validity generalization of GMA and specific cognitive abilities, our findings fully support the hypothesis that these measures have generalized validity across settings. For job performance, the 90% CV of GMA was .37, a very large value, and the 90% CVs for numerical ability and memory tests were even larger, .52 and .56, respectively. However, these last two estimates may have been affected by second-order sampling error, in which case they could vary slightly. Similarly, for training success, all cognitive measures showed validity generalization with the exception of perceptual ability. The 90% CVs ranged from .29 for GMA to .08 for memory. The 90% CV for perceptual ability was 0 and, consequently, measures of perceptual ability did not show validity generalization.

These results, together with the American results, suggest our third and more relevant and robust conclusion: There is international validity generalization for GMA and specific cognitive abilities for predicting job performance and training success. In other words, the criterion validity of cognitive measures generalizes across different conceptualizations of job performance and training, differences in unemployment rates, differences in the tests used, and differences in cultural values, demographics, and languages. The only exception found within the current investigation was for perceptual ability tests as predictors of training success.

Similarities and Differences Between American and EC Findings

It is also of interest to comment on the similarities and differences between the American and the EC findings. There is very close agreement for training success, as the magnitude of the operational validity is almost identical on both continents. There is also a similarity between our findings and the results of a recent meta-analysis by Kuncel, Hezlett, and Ones (2001) on the validity of cognitive abilities for predicting educational criteria. Specifically, the results of both meta-analyses are similar when we compare our results for training success with Kuncel et al.'s results for the faculty-rating criterion. For example, we found operational validities of .44 and .48 for verbal and numerical abilities, respectively, and Kuncel et al. (2001) found operational validities of .42 and .47 for the same abilities.


However, there are important differences in the case of job performance. The magnitude of the validity for GMA in the European Community is larger than the validity found in the American studies. Furthermore, GMA predicts job performance better than training success in the EC, although GMA predicts training success better than job performance in the U.S. These differences suggest that job performance may be predicted differentially in the EC compared to the U.S. A possible explanation is that training success is assessed as typical performance in the EC but as maximal performance in the U.S. (Ackerman, 1994; Ackerman & Humphreys, 1991). In addition, it is possible that objective measures of training success and ratings assess different constructs. Because the majority of the European studies used ratings, the construct assessed may be job performance, whereas in the U.S. studies the construct examined could be job knowledge (i.e., declarative knowledge). A third possible explanation is methodological. Many European studies have used a dichotomous criterion for assessing training. This is an artifactual error that produces an underestimation of the real validity, and the magnitude of the underestimation depends on the proportion of cases in the successful and unsuccessful groups. For example, if the proportion is .50, the underestimation would produce a figure that is 80% of the real validity; for extreme splits, the underestimation could be as dramatic as producing only 25% of the real value. Another explanation for these different findings is that the data set contained a different proportion of studies of more complex jobs in the EC than in the U.S.
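The 80% figure can be checked from the attenuation produced by artificially dichotomizing a continuous criterion: the observed point-biserial correlation equals the underlying correlation multiplied by φ(z_p)/√(p(1−p)), where p is the pass rate. A minimal Python sketch (standard library only; the function name is ours):

```python
from statistics import NormalDist

def dichotomization_attenuation(p):
    """Fraction of the underlying validity retained when the criterion is
    split into pass/fail groups with pass proportion p."""
    nd = NormalDist()
    z = nd.inv_cdf(p)                        # cut point on the latent criterion
    return nd.pdf(z) / (p * (1 - p)) ** 0.5
```

At p = .50 the factor is about .80, and for very extreme pass rates (around 1%) it drops below .30, consistent with the range described above.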

Other relevant similarities were found for job performance reliability and for the range restriction of scores on the cognitive predictors. We found that the average interrater reliability for job performance ratings was .52. This was exactly the estimate found in the large-scale meta-analysis carried out by Viswesvaran et al. (1996), using studies mainly conducted in the U.S. This convergence in the interrater reliability of job performance ratings shows that reliability is independent of the type of scale and is not nationally limited. Our results, like those of Viswesvaran et al. (1996), support the value used by Hunter and Hunter (1984) in their meta-analysis of the GATB validity and contradict the position of the National Research Council panel (Hartigan & Wigdor, 1989), which considered .60 (i.e., Hunter's value) an unrealistic value for job performance ratings. In fact, our results suggest that Hunter's estimate was a conservative one. Another similarity was found regarding the magnitude of the range restriction estimates. Hunter (1986; Hunter & Hunter, 1984) used a value of .67 as the range restriction ratio of GMA scores in studies carried out in civil occupations when predicting job performance and .60 for studies using training success. In our studies, we


found a range restriction ratio of .62 for studies using job performance and .67 for studies using training success. Therefore, there is a small difference between Hunter’s estimates and our estimates.

Implications for the Theory and Practice of Personnel Selection

The findings of this research have important implications for the theory and practice of personnel selection. Crucially, our findings contradict the view that the criterion-related validity of GMA tests is moderated by differences in country culture, religion, language, socioeconomic conditions, or employment legislation (e.g., Herriot & Anderson, 1997; Newell & Tansley, 2001). At least across EC countries, these results demonstrate unequivocal international generalizability for cognitive ability tests. By implication, these findings hint at the scientific feasibility of a general theory of personnel selection, applicable at least to the U.S. and EC countries and probably all over the world. In such a theory, GMA would have a core position, as the magnitude of its validity appears to be the largest of all personnel selection procedures. However, caution is warranted toward claiming wider, global generalizability, as Europe and the U.S. share aspects of cultural similarity that may not be equally shared by all cultures.

From a practical point of view, our findings scientifically support the use of GMA and cognitive measures for selection purposes for all kinds of occupations in the EC and U.S. The findings also suggest that the wide use of cognitive measures in personnel selection for all jobs in all countries, documented in European surveys, has a scientific basis. There is now unequivocal evidence that GMA tests are very good predictors of job performance and training success across the U.S. and the EC, and, therefore, the transfer of such findings to organizational practices in employee selection is important (Anderson, Herriot, & Hodgkinson, 2001). Although some selection practitioners may believe that tests of specific cognitive abilities produce higher criterion-related validity than GMA, our results refute this notion. That is, tests of specific abilities such as verbal, numerical, spatial-mechanical, perceptual, and memory failed to demonstrate higher validity than GMA measures. It is thus prudent to reiterate the main practical implication of this finding: GMA tests predicted both criteria most successfully.

Some Limitations of the Present Research

This research has some limitations. First, only two criteria were considered, yet job activity is a multidimensional construct that incorporates both actions and omissions related to work goals. In this sense, job performance and training success are only facets of job activity, which includes both productive actions (e.g., job performance, training success, outputs, career advancement, citizenship behavior, knowledge sharing) and counterproductive actions (e.g., thefts, accidents, absences, norm violations; see Campbell, McHenry, & Wise, 1990; Sackett, 2002; Viswesvaran, 2002; Viswesvaran & Ones, 2000). Another limitation is related to possible group differences in GMA (Hough, Oswald, & Ployhart, 2001). In the EC context, scarce research has addressed this issue. However, some studies have suggested that there is evidence of no differential prediction for minority groups (te Nijenhuis & van der Flier, 1997). A third limitation is that we have not examined whether the criterion-related validity of GMA and specific cognitive abilities is moderated by occupation type. Despite these three limitations, the present research focused, as an initial investigation, on the more restricted question of the international validity generalization of GMA tests in personnel selection in the countries of the EC. A call for future research to address these limitations is therefore warranted. In addition, similar meta-analytic validity studies should be conducted on data sets obtained from Africa and Asia, and from the American countries not represented in the U.S. meta-analyses.

REFERENCES

References marked with an asterisk indicate studies included in the meta-analysis.

Ackerman PL. (1994). Intelligence, attention, and learning: Maximal and typical performance. In Detterman DK (Ed.), Current topics in human intelligence: Theories of intelligence (Vol. 4, pp. 1-27). Norwood, NJ: Ablex.

Ackerman PL, Humphreys LG. (1991). Individual differences theory in industrial and organizational psychology. In Dunnette MD, Hough LM (Eds.), Handbook of industrial and organizational psychology (Vol. 1, pp. 223-282). Palo Alto, CA: Consulting Psychologists Press.

*Amthauer R. (1973). I-S-T 70. Intelligenz-Struktur-Test [I-S-T 70. Intelligence-Structure-Test]. Göttingen: Hogrefe.

*Amthauer R, Brocke B, Liepmann D, Beauducel A. (1999). I-S-T 2000. Intelligenz-Struktur-Test 2000 [I-S-T 2000. Intelligence-Structure-Test 2000]. Göttingen: Hogrefe.

*Anderberg R. (1936). Psychotechnische Rekrutierungsmethoden bei den schwedischen Staatsbahnen [Psychotechnical methods of recruitment in the Swedish railways]. Industrielle Psychotechnik, 13, 353-383.

Anderson N, Herriot P, Hodgkinson GP. (2001). The practitioner-researcher divide in industrial, work, and organizational (IWO) psychology: Where are we now, and where do we go from here? Journal of Occupational and Organizational Psychology, 74, 391-441.

*Anstey E. (1963). D-48. Manual. Madrid: TEA.

Ashworth SD, Osburn HG, Callender JC, Boyle KA. (1992). The effects of unrepresented studies on the robustness of validity generalization results. PERSONNEL PSYCHOLOGY, 45, 341-361.

596 PERSONNEL PSYCHOLOGY

*Bacqueyrisse L. (1935). Psychological tests in the Paris tramway and omnibus services. Human Factors, 9,231-234.

*Banister D, Slater P, Radzan M. (1962). The use of cognitive tests in nursing candidate selection. Occupational Psychologist, 36,75-78.

Barrick MR, Mount MK, Judge TA. (2001). Personality and performance at the beginning of the new millennium: What do we know and where do we go next? International Journal of Selection and Assessment, 9, 9-30.

*Bartram D, Dale HCA. (1982). The Eysenck Personality Inventory as a selection test for military pilots. Journal of Occupational Psychology, 55,287-296.

*Beaton R. (1982). Use of tests in a national retail company. In Miller K (Ed.), Psychological testing in personnel assessment (pp. 137-148). Epping, U.K.: Gower Press.

*Blanco MJ, Salgado JF. (1992). Diseño y experimentación de un modelo de intervención de la psicología en el reconocimiento y selección de conductores profesionales. [Design and testing of a model of psychological intervention in the assessment and selection of professional drivers]. In AA.VV. (Eds.), Conducta y seguridad vial (pp. 5-57). Madrid: Fundación Mapfre.

*Boerkamp RG. (1974). Een criteriuminstrument voor de bedrijfsselectie-situatie. [A criterion measure for applicant selection in companies]. Unpublished doctoral dissertation, University of Amsterdam, The Netherlands.

*Bonnardel R. (1949). L'emploi des méthodes psychométriques pour le contrôle des conditions psychologiques du travail dans les ateliers. [Use of psychometric methods for monitoring the psychological conditions of work in workshops]. Le Travail Humain, 12, 7545.

*Bonnardel R. (1949). Examens psychométriques et promotion ouvrière (étude portant sur un groupe d'ouvriers électriciens en cours de perfectionnement). [Psychometric tests and worker promotion (a study of a group of electrician workers in a training course)]. Le Travail Humain, 22, 113-117.

*Bonnardel R. (1949). Recherche sur la promotion des ouvriers dans les cadres de maîtrise. [Research on the promotion of workers to supervisory positions]. Le Travail Humain, 12, 245-256.

*Bonnardel R. (1954). Appréciations professionnelles et notations psychométriques. Étude portant sur un groupe de jeunes ouvriers. [Professional appraisals and psychometric scores: A study of a group of young workers]. Le Travail Humain, 17, 119-125.

*Bonnardel R. (1954). Examen de chauffeurs de camions au moyen de tests de réactions. [Examination of truck drivers using reaction tests]. Le Travail Humain, 17, 272-281.

*Bonnardel R. (1956). Un exemple des difficultés soulevées par la question des critères professionnels. [An example of the difficulties raised by the question of professional criteria]. Le Travail Humain, 19, 234-237.

*Bonnardel R. (1958). Comparaison d'examens psychométriques de jeunes ingénieurs et de cadres administratifs. [Comparison of psychometric tests of young engineers and administrative managers]. Le Travail Humain, 21, 248-253.

*Bonnardel R. (1959). Liaisons entre résultats d'examens psychométriques et réussite dans le perfectionnement professionnel ouvrier. [Relations between psychometric test results and success in workers' further vocational training]. Le Travail Humain, 22, 239-246.

*Bonnardel R. (1961). Recherche sur la promotion des ouvriers dans la maîtrise. [Research on the promotion of workers to supervision]. Le Travail Humain, 24, 21-34.

*Bonnardel R. (1962). Recherche sur le recrutement à des cours d'étude du travail au moyen de techniques psychométriques. [Research on recruitment for work-study courses by means of psychometric techniques]. Le Travail Humain, 25, 73-78.

JESUS F. SALGADO ET AL. 597

Borman W, Penner LA, Allen TD, Motowidlo SJ. (2001). Personality predictors of citizenship performance. International Journal of Selection and Assessment, 9, 52-69.

*Bruyère MJ. (1937). Quelques données sur l'intelligence logico-verbale et les aptitudes techniques pour l'orientation vers la carrière d'ingénieur. [Some data on logical-verbal intelligence and technical aptitudes for guidance toward the engineering career]. Bulletin de l'Institut National d'Orientation Professionnelle, 9, 141-147.

*Burt C. (1922). Tests for clerical occupations. Journal of the National Institute of Industrial Psychology, 1, 23-27; 79-82.

Campbell JP, McHenry JJ, Wise LL. (1990). Modeling job performance in a population of jobs. PERSONNEL PSYCHOLOGY, 43, 313-333.

*Campos F. (1947). Selección de aprendices. [Apprentice selection]. Revista de Psicología General y Aplicada, 2, 427-438.

Carretta TR, Ree MJ. (1997). Expanding the nexus of cognitive and psychomotor abilities. International Journal of Selection and Assessment, 5, 149-158.

Carretta TR, Ree MJ. (2000). General and specific cognitive and psychomotor abilities in personnel selection: The prediction of training and job performance. International Journal of Selection and Assessment, 8, 227-236.

Carretta TR, Ree MJ. (2001). Pitfalls of ability research. International Journal of Selection and Assessment, 9, 325-335.

*Castillo-Martín J. (1963). Estudio del Otis en sujetos de la industria eléctrica. [A study of the Otis test with electrical industry workers]. Revista de Psicología General y Aplicada, 18, 1015-1020.

*Castillo J, Bueno R, Moldes F, Fernández J, Barras M. (1969). Estudio estadístico del test de matrices progresivas de Raven, escalas general y especial. [A statistical study of Raven's Progressive Matrices Tests, general and special scales]. Revista de Psicología General y Aplicada, 24, 1004-1009.

*Castle PFC, Garforth FI. (1951). Selection, training and status of supervisors. Occupational Psychologist, 25, 109-123.

*Cattell RB. (1981). Manual del test Factor G. Escalas 2 y 3. [Manual for the Factor G Test, Scales 2 and 3]. Madrid: TEA.

*Child R. (1990). Graduate and managerial assessment data supplement. Windsor, Berkshire, U.K.: ASE, NFER-Nelson.

*Chleusebairgue A. (1939). Industrial psychology in Spain. Occupational Psychology, 13, 33-41.

*Chleusebairgue A. (1940). The selection of drivers in Barcelona. Occupational Psychology, 14, 146-161.

*Cleary TA. (1968). Test bias: Prediction of grades of Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115-124.

*de Buvry de Mauregnault D. (1998). Evaluatie selectieprocedure automatiseringsmedewerkers, validatieonderzoek aan de hand van task performance en contextual performance. [Evaluation of the selection procedure for automation employees: Validation research using task and contextual performance]. Major paper in Work and Organizational Psychology, University of Amsterdam.

*Decroly O. (1926). Etude sur les aptitudes nécessaires au relieur. [Study of the aptitudes needed by the bookbinder]. Bulletin de l'Office d'Orientation Professionnelle, 26, 1-31.

*Evers A. (1977). De constructie van een testbatterij voor de selectie van leerling-verplegenden aan de "De Tjongerschans." [The construction of a test battery for the selection of student nurses at "De Tjongerschans"]. Unpublished first report.

*Farmer E. (1930). A note on the relation of certain aspects of character to industrial proficiency. British Journal of Psychology, 21, 46-49.

*Farmer E. (1933). The reliability of the criteria used for assessing the value of vocational tests. British Journal of Psychology, 24, 109-119.


*Farmer E, Chambers EG. (1936). The prognostic value of some psychological tests. Industrial Health Research Board Report No. 74. London: HM Stationery Office.

*Feltham R. (1988). Validity of a police assessment centre: A 1-19 year follow-up. Journal of Occupational Psychology, 61, 129-144.

*Fokkema SD. (1958). Aviation psychology in The Netherlands. In Parry JD, Fokkema SD (Eds.), Aviation psychology in western Europe and a report on studies of pilot proficiency measurement (pp. 58-69). Amsterdam: Swets & Zeitlinger.

*Foster WR. (1924). Vocational selection in a chocolate factory. Journal of the National Institute of Industrial Psychology, 3, 159-163.

*Frankford AA. (1932). Intelligence tests in nursing. Human Factors, 6, 451-453.

*Frisby CB. (1962). The use of cognitive tests in nursing candidate selection: A comment. Occupational Psychology, 36, 79-81.

*García-Izquierdo A. (1998). Validación de un proceso de selección de personal en un centro de formación del sector de la construcción. [Validation of a personnel selection process in a training center in the construction industry]. Unpublished doctoral dissertation, University of Oviedo, Spain.

*Germain J, Pinillos JL, García-Moreno E, Aberasturi NL. (1969). La validez de unas pruebas selectivas para conductores. [Validity of drivers' selection tests]. Revista de Psicología General y Aplicada, 24, 1067-1114.

*Goguelin P. (1950). Recherches sur la sélection des conducteurs de véhicules. [Research on the selection of vehicle drivers]. Le Travail Humain, 13, 9-?5.

*Goguelin P. (1951). Etude du poste d'électricien de tableau et examen de sélection pour le poste. [Study of the switchboard electrician job and of the selection examination for the job]. Le Travail Humain, 14, 15-65.

*Goguelin P. (1953). Etude du poste de dispatcher dans l'industrie électrique et de la sélection pour ce poste. [Study of the dispatcher job in the electrical industry and of selection for this job]. Le Travail Humain, 16, 197-205.

*González-Peiró A, Sanz-Cid A, Ferrer-Martín A. (1984). Prevención de accidentes mediante una batería psicofisiológica. [Prevention of accidents using a psychophysiological battery]. International Meeting on Psychology and Traffic Security, Valencia, Spain.

*Greuter MAM, Smit-Voskuyl OF. (1983). Psychologisch onderzoek geëvalueerd, een valideringsonderzoek naar de psychologische selectie bij de VNG. [Psychological assessment evaluated: A validation study of the psychological selection process at the VNG]. Research report, University of Amsterdam.

*Handyside JD, Duncan DC. (1954). Four years later: A follow-up of an experiment in selecting supervisors. Occupational Psychology, 28, 9-23.

*Hänsgen KD. (2000). Evaluation des eignungstests für das medizinstudium in der Schweiz. Zuverlässigkeit der vorhersage von studienerfolg. [Evaluation of the aptitude test for medical studies in Switzerland: Reliability of the prediction of study success]. CTD Centre for Test Development and Diagnostics, Department of Psychology, University of Fribourg, Switzerland.

Hartigan JA, Wigdor AK. (Eds.). (1989). Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery. Washington, DC: National Academy Press.

*Hebling H. (1987). The self in career development: Theory, measurement and counseling. Unpublished doctoral dissertation, University of Amsterdam.

*Hebling JC. (1964). Results of the selection of R.N.A.E air traffic and fighter control officers. In Cassie A, Fokkema SD, Parry JB (Eds.). Aviation psychology. Studies on accident liability, proficiency criteria, and personnel selection (pp. 71-76). The Hague: Mouton.

*Heim AW. (1946). An attempt to test high-grade intelligence. British Journal of Psychology, 37-38, 71-81.


*Henderson P, Boohan M. (1987). SHL's technical test battery: Validation. Guidance and Assessment Review, 3(6), 3-4.

*Henderson P, Hopper E. (1987). SHL's TTB: Development of norms. Guidance and Assessment Review, 3(6), 5-6.

*Henderson P, Lockhart H, O'Reilly S. (1987). SHL's technical test battery: Test-retest reliability. Guidance and Assessment Review, 3(3), 4-5.

*Henderson P, O'Hara R. (1990). Norming and validation of SHL's personnel test battery in Northern Ireland. Guidance and Assessment Review, 6(4), 2-3.

Herriot P, Anderson NR. (1997). Selecting for change: How will personnel selection psychology survive? In Anderson N, Herriot P (Eds.), International handbook of selection and assessment (pp. 1-34). London, U.K.: Wiley.

Hofstede G. (1980). Culture's consequences: International differences in work-related values. Beverly Hills, CA: Sage.

Hough LM, Oswald FL, Ployhart RE. (2001). Determinants, detection, and amelioration of adverse impact in personnel selection procedures: Issues, evidence, and lessons learned. International Journal of Selection and Assessment, 9, 152-193.

Hunter JE. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29, 340-362.

Hunter JE, Hirsh HR. (1987). Applications of meta-analysis. In Cooper CL, Robertson IT (Eds.), International review of industrial and organizational psychology, (vol. 2, pp. 321-357). Chichester, U.K.: Wiley.

Hunter JE, Hunter RF. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.

Hunter JE, Schmidt FL. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.

*Jäger AO, Althoff K. (1994). Der WILDE-Intelligenz-Test (WIT). [The WILDE Intelligence Test]. Göttingen: Hogrefe.

*James DJ. (1964). Prediction of performance in the early stages of flying training. In Cassie A, Fokkema SD, Parry JB (Eds.), Aviation psychology. Studies on accident liability, proficiency criteria, and personnel selection (pp. 78-82). The Hague: Mouton.

*Jones ES. (1917). The Woolley test series applied to the detection of ability in telegraphy. Journal of Educational Psychology, 8, 27-34.

Kehoe J. (2002). General mental ability and selection in private sector organizations: A commentary. Human Performance, 15, 97-106.

*Keizer L, Elshout JJ. (1969). Een onderzoek naar de validiteit van selectierapporten en enkele andere voorspellers. [A study of the validity of selection reports and some other predictors]. Unpublished report.

*Kokorian A, Valser C. (1999, October). Computer-based assessment for aircraft pilots: The pilot aptitude tester (PILAPT). Paper presented at the 41st Annual Conference of the International Military Testing Association (IMTA), Monterey, CA.

*Kragh U. (1960). The defense mechanism test: A new method for diagnosis and personnel selection. Journal of Applied Psychology, 44, 303-309.

*Krielen F. (1975). De validiteit van de testbatterijen "Middelbaar Algemeen" en "Middelbaar Administratief." [The validity of the test batteries "Middelbaar Algemeen" and "Middelbaar Administratief"]. Unpublished doctoral dissertation, University of Amsterdam.

Kuncel NR, Hezlett SA, Ones DS. (2001). A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate student selection and performance. Psychological Bulletin, 127, 162-181.

*Lahy JM. (1933). Sur la validité des tests exprimée en "pourcent" d'échecs. [On the validity of tests expressed as "percentage" of failures]. Le Travail Humain, 1, 24-31.


*Lahy JM. (1934). La sélection professionnelle des aiguilleurs. [The vocational selection of signalmen]. Le Travail Humain, 2, 15-38.

*Lahy JM, Korngold S. (1936). Recherches expérimentales sur les causes psychologiques des accidents du travail. [Experimental research on the psychological causes of work accidents]. Le Travail Humain, 4, 21-59.

*Lahy JM, Korngold S. (1936). Sélection des opératrices des machines à perforer "samas" et "hollerith." [Selection of operators of "Samas" and "Hollerith" punching machines]. Le Travail Humain, 4, 280-290.

*Ledent R, Wellens L. (1935). La sélection et la surveillance des conducteurs des tramways unifiés de Liège et extensions. [The selection and monitoring of drivers of the unified tramways of Liège and extensions]. Le Travail Humain, 3, 401-405.

Levine EL, Spector PE, Menon S, Narayanan L, Cannon-Bowers J. (1996). Validity generalization for cognitive, psychomotor, and perceptual tests for craft jobs in the utility industry. Human Performance, 9,1-22.

Levy-Leboyer C. (1994). Selection and assessment in Europe. In Triandis HC, Dunnette MD, Hough LM (Eds.), Handbook of industrial and organizational psychology, (vol. 4, pp. 173-190). Palo Alto, CA: Consulting Psychologists Press.

*Mace CA. (1950). The human problems of the building industry: Guidance, selection and training. Occupational Psychology, 24,96-104.

*McKenna FP, Duncan J, Brown ID. (1986). Cognitive abilities and safety on the road: A re-examination of individual differences in dichotic listening and search for embedded figures. Ergonomics, 29, 649-663.

*Meili R. (1951). Expériences et observations sur l'intelligence pratique-technique. [Experiences and observations on practical-technical intelligence]. L'Année Psychologique, 50, 557-574.

*Miles GH. (1924-25). Economy and safety in transport. Journal of the National Institute of Industrial Psychology, 2,192-197.

*Miller K. (1982). Choosing tests for clerical selection. In Miller K (Ed.), Psychological testing in personnel assessment (pp. 109-121). Epping, U.K: Gower Press.

*Miret y Alsina FJ. (1964). Aviation psychology in the Sabena. In Parry JD, Fokkema SD (Eds.), Aviation psychology in western Europe and a report on studies of pilot proficiency measurement (pp. 22-30). Amsterdam: Swets & Zeitlinger.

*Mira E. (1922-23). La selecció del xòfers de la companyia general d'autòmnibus. [Selection of drivers for the general bus company]. Annals de L'Institut d'Orientació Professional, 3-4, 60-71.

*Montgomery GWG. (1962). Predicting success in engineering. Occupational Psychologist, 36, 59-80.

*Moran A. (1986). The reliability and validity of Raven's standard progressive matrices for Irish apprentices. International Review of Applied Psychology, 35, 533-538.

*Morin J. (1954). Notation psychométrique et professionnelle de jeunes ingénieurs. [Psychometric and professional rating of young engineers]. Le Travail Humain, 20, 201-211.

*Mulder F. (1974). Characteristics of violators of formal company rules. Journal of Applied Psychology, 55, 500-502.

*Munro MS, Raphael W. (1930). Selection tests for clerical occupations. Journal of the National Institute of Industrial Psychology, 5,127-137.

Murphy KR. (2002). Can conflicting perspectives on the role of g in personnel selection be resolved? Human Performance, 15, 173-186.

Murphy KR, Shiarella A. (1997). Implications of the multidimensional nature of job performance for the validity of selection tests: Multivariate frameworks for studying test validity. PERSONNEL PSYCHOLOGY, 50, 823-854.

*National Institute of Industrial Psychology. (1932). The selection of salesmen. Human Factors, 6,26-29.

*Nelson A, Robertson IT, Walley L, Smith M. (1998). Personality and work performance: Some evidence from small and medium-sized firms. Occupational Psychologist, 12, 28-36.


*Neuman E. (1938). Psychotechnische eignungsprüfung und anlernung im flugmotorenbau. [Psychotechnical aptitude testing and training in aircraft engine construction]. Industrielle Psychotechnik, 15, 111-162.

Newell S, Tansley C. (2001). International uses of selection methods. In Cooper CL, Robertson IT (Eds.), International review of industrial and organizational psychology, (vol. 21, pp. 195-213). Chichester, U.K.: Wiley.

*Newman SH, Howell MA. (1961). Validity of forced choice items for obtaining references on physicians. Psychological Reports, 8, 367.

*Notenboom CGM, van Leest PF. (1993). Psychologische selectie werkt! [Psychological selection works!]. Tijdschrift voor de Politie, 3, 6547.

Nunnally JC. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

*Nyfield G, Gibbons PJ, Baron H, Robertson I. (1995, April). The cross-cultural validity of management assessment methods. Paper presented at the 10th Annual Conference of the Society for Industrial and Organizational Psychology, Orlando, FL.

Olea MM, Ree MJ. (1994). Predicting pilot and navigator criteria: Not much more than g. Journal of Applied Psychology, 79, 845-851.

*Pacaud S. (1946). Recherches sur la sélection psychotechnique des agents de gare dits "facteurs-enregistrants." [Research on the psychotechnical selection of station agents known as "registering porters"]. Le Travail Humain, 9, 23-73.

*Pacaud S. (1946). Recherches sur la sélection professionnelle des opératrices de machines à perforer et de machines comptables. La "sélection omnibus" est-elle possible en mécanographie? [Research on the vocational selection of punching-machine and accounting-machine operators: Is "omnibus selection" possible in mechanography?]. Le Travail Humain, 9, 74-86.

*Pacaud S. (1947). Sélection des mécaniciens et des chauffeurs de locomotive. Etude de la validité des tests employés et composition des batteries sélectives. [Selection of locomotive engineers and drivers: A study of the validity of the tests used and of the composition of selection batteries]. Le Travail Humain, 10, 180-253.

*Patin J, Vinatier H. (1962). Un test verbal: Le CERP 15. [A verbal test: The CERP 15]. Bulletin du CERP, 11, 327-342.

*Patin J, Vinatier H. (1963). Un test d'intelligence: Le CERP 2. [An intelligence test: The CERP 2]. Bulletin du CERP, 12, 187-201.

*Patin J, Vinatier H. (1963). Le test CERP 14 du docteur Morali-Daninos. [Doctor Morali-Daninos's CERP 14 test]. Bulletin du CERP, 12, 381-390.

*Patin J, Nodiot S. (1962). Le test mécanique de P. Rennes. [The mechanical test of P. Rennes]. Bulletin du CERP, 11, 61-97.

*Patin J, Panchout MF. (1964). L'admission des candidats à une formation professionnelle de techniciens. Approche docimologique, validités du concours et de l'examen psychotechnique. [The admission of candidates to vocational training for technicians: A docimological approach, validities of the competitive and psychotechnical examinations]. Bulletin du C.E.R.P., 13, 147-165.

Pearlman K, Schmidt FL, Hunter JE. (1980). Validity generalization results for tests used to predict job proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373-406.

*Pérez-Herrera J, Pérez-Muñoz J. (1984). Validez de la batería Belrampa. [Validity of the Belrampa battery]. II Congreso de Psicología del Trabajo, Valencia, Spain.

*Petrie A, Powell MB. (1951). The selection of nurses in England. Journal of Applied Psychology, 35, 281-286.

Ree MJ, Earles JA. (1991). Predicting training success: Not much more than g. PERSONNEL PSYCHOLOGY, 44, 321-332.

Ree MJ, Earles JA, Teachout MS. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518-524.

*Roe RA. (1973). Een onderzoek naar de validiteit van een selectieprocedure voor lager en uitgebreid lager administratieve functies. [A validity study of a selection procedure for lower-level administrative jobs]. Unpublished report, University of Amsterdam.

Roe RA. (1989). Designing selection procedures. In Herriot P (Ed.), Assessment and selection in organizations (pp. 97-118). Chichester, U.K.: Wiley.

*Roe RA, Boerkamp RG. (1971). De selectie voor functies uit de groep Uitgebreid Algemeen. Een validatiestudie. [The selection for functions of the cluster Uitgebreid Algemeen: A validation study]. Technical report, University of Amsterdam, The Netherlands.


*Roloff HP. (1928). Ueber eignung und bewährung. [On aptitude and proficiency]. Beihefte zur Zeitschrift für Angewandte Psychologie, 148. (Abstract in L'Année Psychologique, 1928, 29, 897-899).

*Ross J. (1962). Predicting practical skill in engineering apprentices. Occupational Psychologist, 36, 69-74.

Rothstein HR. (1990). Interrater reliability of job performance ratings: Growth to asymptote level with increasing opportunity to observe. Journal of Applied Psychology, 75, 322-327.

Ryan AM, McFarland L, Baron H, Page R. (1999). An international look at selection practices: Nation and culture as explanations for variability in practice. PERSONNEL PSYCHOLOGY, 52, 359-391.

Sackett PR. (2002). The structure of counterproductive work behaviors: Dimensionality and relationships with facets of job performance. International Journal of Selection and Assessment, 10, 5-11.

*Salgado JF. (1993). Validación sintética y utilidad de pruebas de habilidades cognitivas por ordenador. [Synthetic validity and utility of computerized cognitive ability tests]. Revista de Psicología del Trabajo y las Organizaciones, 11, 79-92.

Salgado JF. (1997). The five factor model of personality and job performance in the EC. Journal of Applied Psychology, 82, 30-43.

Salgado JF. (1999). Personnel selection methods. In Cooper CL, Robertson IT (Eds.), International review of industrial and organizational psychology, (vol. 14, pp. 1-53). Chichester, U.K.: Wiley.

Salgado JF. (2002). The Big Five personality dimensions and counterproductive behaviors. International Journal of Selection and Assessment, 10, 117-125.

Salgado JF, Anderson N. (2002). Cognitive and GMA testing in the EC: Issues and evidence. Human Performance, 15, 75-96.

*Salgado JF, Blanco MJ. (1990). Validez de las pruebas de aptitudes cognitivas en la selección de oficiales de mantenimiento de la Universidad de Santiago de Compostela. [Validity of cognitive ability tests for selecting maintenance workers at the University of Santiago de Compostela]. Paper presented at the III National Congress of Social Psychology, Santiago de Compostela, Spain.

Salgado JF, Viswesvaran C, Ones DS. (2001). Predictors used for personnel selection. In Anderson N, Ones DS, Sinangil HK, Viswesvaran C (Eds.), Handbook of industrial, work, and organizational psychology, (vol. 1, pp. 165-199). London, U.K.: Sage.

*Samuel JA. (1970). The effectiveness of aptitude tests in the selection of postmen. Studies in Personnel Psychology, 2, 65-73.

*Sánchez J. (1969). Validación de tests con monitores de formación. [Test validation with training monitors]. Revista de Psicología General y Aplicada, 24, 301-313.

Schmidt FL. (2002). The role of general cognitive ability and job performance: Why there cannot be a debate. Human Performance, 15, 187-211.

Schmidt FL, Hunter JE. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529-540.

Schmidt FL, Hunter JE. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199-223.

Schmidt FL, Hunter JE. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.

Schmidt FL, Hunter JE. (1999). Theory testing and measurement error. Intelligence, 27, 183-198.

Schmidt FL, Hunter JE, Caplan JR. (1981). Validity generalization results for two job groups in the petroleum industry. Journal of Applied Psychology, 66, 261-273.


Schmidt FL, Hunter JE, Pearlman K. (1981). Task differences and the validity of aptitude tests in selection: A red herring. Journal of Applied Psychology, 66, 166-185.

Schmidt FL, Hunter JE, Pearlman K, Shane GS. (1981). Further tests of the Schmidt-Hunter Bayesian validity generalization model. PERSONNEL PSYCHOLOGY, 32, 257-281.

Schmidt FL, Hunter JE, Urry VW. (1976). Statistical power in criterion-related validation studies. Journal of Applied Psychology, 61, 473-485.

Schmidt FL, Ones DS, Hunter JE. (1992). Personnel selection. Annual Review of Psychology, 43, 627-670.

*Schmidt-Atzert L, Deter B. (1993). Intelligenz und ausbildungserfolg: Eine untersuchung zur prognostischen validität des I-S-T 70. [Intelligence and training success: A study of the predictive validity of the I-S-T 70]. Zeitschrift für Arbeits- und Organisationspsychologie, 37, 5243.

*Schmidt-Atzert L, Deter B. (1993). Die vorhersage des ausbildungserfolgs bei verschiedenen berufsgruppen durch leistungstests. [The prediction of training success in different occupational groups by achievement tests]. Zeitschrift für Arbeits- und Organisationspsychologie, 37, 191-196.

Schmitt N, Gooding RZ, Noe RD, Kirsch M. (1984). Meta-analyses of validity studies published between 1964 and 1982 and the investigation of study characteristics. PERSONNEL PSYCHOLOGY, 37, 407-422.

*Schuler H. (1993). Social validity of selection situations: A concept and some empirical results. In Schuler H, Farr JL, Smith M (Eds.), Personnel selection and assessment: Individual and organizational perspectives (pp. 11-26). Hillsdale, NJ: Erlbaum.

*Schuler H, Moser K, Diemand A, Funke U. (1995). Validität eines einstellungsinterviews zur prognose des ausbildungserfolgs. [Validity of an employment interview for the prediction of training success]. Zeitschrift für Pädagogische Psychologie, 9, 45-54.

*Schuler H, Moser K, Funke U. (1994, July). The moderating effect of rater-ratee acquaintance on the validity of an assessment center. Paper presented at the 23rd Congress of the International Association of Applied Psychology, Madrid.

*Serrano P, García-Sevilla L, Pérez JL, Pina M, Ruiz JR. (1986). Municipal police evaluation: Psychometric versus behavioral assessment. In Yuille JC (Ed.), Police selection and training: The role of psychology (pp. 257-265). Dordrecht, The Netherlands: Martinus Nijhoff.

*SHL. (1989). Validation review. Surrey, U.K.: Saville & Holdsworth Ltd.

*SHL. (1996). Validation review II. Surrey, U.K.: Saville & Holdsworth Ltd.

*Smith MC. (1976). A comparison of the trainability assessments and other tests for predicting the practical performance of dental students. International Review of Applied Psychology, 25-26, 125-130.

*Smith-Voskuyl OF. (1980). De constructie van een testbatterij voor de selectie van leerling-verplegenden aan de "De Tjongerschans." [The construction of a test battery for the selection of student nurses at "De Tjongerschans"]. Unpublished final report.

*Sneath F, Thakor M, Medjuck B. (1976). Testing of people at work. London: Institute of Personnel Management.

Spector PE, Cooper CL, Sparks K, Bernin P, Dewe P, Lu L, et al. (2001). An international study of the psychometric properties of the Hofstede Values Survey Module 1994: A comparison of individual and country/province level results. Applied Psychology: An International Review, 50, 269-281.

*Spengler G. (1971, July). Die praxis der auswahl von führungskräften in der Glanzstoff AG. [The practice of executive selection at Glanzstoff AG]. In Piret R (Ed.), Proceedings of the 17th Congress of the International Association of Applied Psychology. Belgium.


*Spielman W. (1923). Vocational tests for dressmakers' apprentices. Journal of the National Institute of Industrial Psychology, 1, 277-282.

*Spielman W. (1924). The vocational selection of weavers. Journal of the National Institute of Industrial Psychology, 2, 256-261.

*Spielman W. (1924). Vocational tests for selecting packers and pipers. Journal of the National Institute of Industrial Psychology, 2, 365-373.

*Srinivasan V, Weinstein AG. (1973). Effects of curtailment on an admissions model for a graduate management program. Journal of Applied Psychology, 58(3), 339-346.

*Stanbridge RH. (1936). The occupational selection of aircraft apprentices of the Royal Air Force. The Lancet, 230, 1426-1430.

*Starren AML. (1996). De selectie van instroomkandidaten binnen het HR-beleid van een onderneming in de informatietechnologie. Een evaluatieonderzoek. [Applicant selection and the HR policy of an IT company: An evaluation study]. Paper developed for the major Work and Organizational Psychology, University of Amsterdam, The Netherlands.

*Stevenson M. (1942). Summaries of researches reported in degree theses. British Journal of Psychology, 12, 182.

*Stratton GM, McComas HC, Coover JE, Bagby E. (1920). Psychological tests for selecting aviators. Journal of Experimental Psychology, 3(6), 405-423.

*Tagg M. (1924). Vocational tests in the engineering trade. Journal of the National Institute of Industrial Psychology, 2, 313-323.

*te Nijenhuis J. (1997). Comparability of test scores for immigrants and majority group members in The Netherlands. Unpublished doctoral dissertation. University of Amsterdam.

te Nijenhuis J, van der Flier H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675-687.

*Thurstone LL. (1976). Manual del test PMA. [Manual for the PMA test]. Madrid: TEA.

*Timpany N. (1947). Assessment for foremanship. British Journal of Psychology, 38, 23-28.

*Trost G, Kirchenkamp T. (1993). Predictive validity of cognitive and noncognitive variables with respect to choice of occupation and job success. In Schuler H, Farr JL, Smith M (Eds.), Personnel selection and assessment: Individual and organizational perspectives (pp. 303-314). Hillsdale, NJ: Erlbaum.

*van Amstel B, Roe RA. (1974). Verslag van twee studies m.b.t. de part-time opleiding kinderbescherming B aan de Sociale Academie te Twente. [Report on two studies of the part-time program "Kinderbescherming B" at the Social Academy of Twente]. Unpublished report, University of Amsterdam.

*van der Maesen de Sombreff PEAM, Zaal JN. (1986). Incremental utility of personality questionnaires in selection by the Dutch government. Unpublished paper.

*van Leest PF, Olsder HE. (1991). Psychologisch selectieonderzoek en praktijkbeoordelingen van agenten bij de gemeentepolitie Amsterdam. [Psychological selection research and on-the-job appraisals of officers of the Amsterdam municipal police]. Research report. Amsterdam: RPD Advies.

*Van Lierde AM. (1960). Essai de validation d'un test de surveillance sur une population de chauffeurs d'autobus. [A validation study of a vigilance test in a population of bus drivers]. Bulletin du C.E.R.P., 10, 193-205.

Verive JM, McDaniel M. (1996). Short-term memory tests in personnel selection: Low adverse impact and high validity. Intelligence, 23,15-32.

*Vernon PE. (1950). The validation of Civil Service selection board procedures. Occupational Psychology, 24, 75-95.

Vineberg R, Joyner JN. (1989). Evaluation of individual enlisted performance. In Wiskoff MF, Rampton GM (Eds.), Military personnel measurement: Testing, assignment, evaluation (pp. 169-200). New York: Praeger.


Viswesvaran C. (2002). Absenteeism and measures of job performance: A meta-analysis. International Journal of Selection and Assessment, 10, 12-17.

Viswesvaran C, Ones DS. (2000). Perspectives on models of job performance. International Journal of Selection and Assessment, 8, 216-226.

Viswesvaran C, Ones DS. (2002). Agreements and disagreements on the role of general mental ability (GMA) in industrial, work, and organizational psychology. Human Performance, 15, 211-231.

Viswesvaran C, Ones DS, Schmidt FL. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557-574.

*Vlug T. (1988). De selectie van aspirant verkeersvliegers: Een evaluatie-onderzoek. [The selection of applicant airline pilots: An evaluation study]. Paper developed for the major Work and Organizational Psychology, University of Amsterdam, The Netherlands.

*Vogelaar FJ. (1972). Een validatie onderzoek van een selectieprocedure voor politiefunctionarissen. [A validation study of the selection procedure for policemen]. Technical report, University of Amsterdam, The Netherlands.

*Vogue F, Darmon EG. (1966). Validation d'examens psychotechniques de conducteurs de chariots automoteurs de manutention. [Validity of psychotechnical examinations of fork-lift truck drivers]. Bulletin du CERP, 15, 183-187.

*Weiss RH. (1980). Grundintelligenztest Skala 3 - CFT 3. [Basic intelligence test scale 3 - CFT 3]. Braunschweig: Georg Westermann.

*Wickham M. (1949). Follow-up of personnel selection in the A.T.S. Occupational Psychology, 23, 153-168.

*Wittersheim JC, Schlegel J. (1970). Essai de validation d'une batterie de sécurité. [Validation study of a safety test battery]. Le Travail Humain, 33, 281-294.

*Yela M. (1956). Selección profesional de especialistas mecánicos. [Professional selection of mechanics]. Revista de Psicología General y Aplicada, 13, 717-719.

*Yela M. (1968). Manual del test de rotación de figuras macizas. [Manual for the solid figures rotation test]. Madrid: TEA.