
PERSONNEL PSYCHOLOGY, 1997, 50

REVIEWER AND EDITOR DECISION MAKING IN THE JOURNAL REVIEW PROCESS

STEPHEN W. GILLILAND Department of Management and Policy

University of Arizona

JOSE M. CORTINA George Mason University

Much research on the journal review process has found little consistency among reviewers' evaluations of manuscripts. We propose theoretical explanations for these differences related to gatekeeping and particularism phenomena and generate hypotheses regarding influences on initial editorial decisions. A sample of 823 original submissions to the Journal of Applied Psychology was analyzed with respect to author and paper characteristics, reviewer evaluations, and editor decisions. Support was found for gatekeeping functions in that reviewers and editors appeared to pay particular attention to the adequacy of the research design, operationalization of constructs, and theoretical development. Evidence was found for variable gatekeeping in reviewer evaluations, and the impact of reviewer evaluations on editor decisions was moderated by this variability across reviewers. Little evidence was found for social particularism (i.e., favoritism based on gender or affiliation) or content particularism (preference for or against particular research settings or methodologies).

Journal publications are the primary means by which scientists communicate their ideas and the results of their research efforts. Journal publications also play an important role in the reward structure (pay, promotions, etc.) of most academics (Gomez-Mejia & Balkin, 1992). The majority of the respected journals in the social sciences use a peer review process to select papers for publication, and in most cases, this process is partially structured. The reviewer is informed as to the criteria (writing style, theoretical development, research design, measurement, etc.) that should be employed in judging the adequacy of a paper, and many journals include a rating form on which reviewers provide their evaluation of the degree to which various criteria are met. However,

We thank Neal Schmitt for providing access to the editorial files and both Neal and Robert T. Anderson for their assistance conducting this study. An earlier version of this paper was presented at the Ninth Annual Conference of the Society for Industrial and Organizational Psychology, Inc., Nashville, TN, April, 1994.

Correspondence and requests for reprints should be addressed to Stephen W. Gilliland, Department of Management and Policy, 405 McClelland Hall, University of Arizona, Tucson, AZ 85721, [email protected]. COPYRIGHT © 1997 PERSONNEL PSYCHOLOGY, INC.



differences in reviewers’ judgments about these criteria and the degree to which other criteria are also employed add elements of subjectivity to the process (Beyer, Chanove, & Fox, 1995; Daft, 1985; Kerr, Tolliver, & Petree, 1977; Lindsay, 1978). Discrepancies in judgments of a paper are frequently found (Jauch & Wall, 1989; Lindsay, 1978) and studies of interrater reliability have reported indices less than .30 (e.g., Fiske & Fogg, 1990; Gottfredson, 1978). Low interrater reliability suggests that individual reviewers use different criteria to judge manuscripts, or that they use those criteria inconsistently, or both.

In this paper, theoretical explanations for the lack of interrater reliability are proposed. We begin by presenting prior frameworks that have been used to study the editorial review process. We then extend these frameworks by developing the notion of variability in criterion weights used by reviewers and editors and the possibility of particularism for or against a given research setting, methodology, or statistical analysis. Hypotheses are developed with regard to these influences on the initial editorial review decisions.

Theoretical Frameworks

Beyer and colleagues (1995) developed a model of factors that influence the journal publication process, and organized this model around four frameworks taken primarily from the sociological literature.

Gatekeeping. The most traditional view of the review process is termed gatekeeping and suggests that those papers that make a unique contribution to the existing literature through novelty of ideas, clear development of theory, and rigor of method receive higher ratings and more favorable editorial decisions. Considerable research using a variety of methods has investigated and supported the existence of the gatekeeping role (Daft, 1985; Schwab, 1985). Campion (1993b) provides a comprehensive list of gatekeeping criteria used by reviewers in applied psychology.

Particularism. The idea that reviewer ratings and editorial decisions are based to some extent on personal relations and status as opposed to their being driven entirely by scientific merit is termed particularism (Zuckerman, 1988). It suggests that a paper submitted by a friend or a colleague of high status is evaluated more favorably than a paper submitted by an enemy or a person whose status is unknown. Beyer et al. (1995) point out that, although blind review solves this problem to some extent at the reviewer level, it may still be an issue with respect to editorial decisions. However, in their analyses of editors' and consulting editors' decisions, they found little evidence of particularism.


Accumulative advantage. The accumulative advantage framework suggests that certain demographic characteristics of authors are associated with more favorable ratings and editorial decisions independent of the effects of any other paper characteristics that might be correlated with those demographics. For example, Beyer et al. (1995) hypothesized that senior faculty in large, research-oriented universities may have more graduate students at their disposal, more knowledgeable colleagues, and more opportunities for external funding, all of which may lead to papers of higher quality that receive more favorable evaluations. The influence of demographic characteristics on quality is what differentiates accumulative advantage and particularism. Beyer et al. (1995) tested the effects of accumulative advantage on reviewer recommendations and found effects for professorial rank, department rank, and research funding.

Reviewer style. A final framework discussed by Beyer et al. (1995) is reviewer style. Reviewers are thought to vary with respect to the degree to which they provide developmental feedback to authors. Presumably, the more developmental the feedback, the greater the chances that the manuscript will be improved and accepted for publication, if not in the present journal, then in a different journal. Beyer et al. (1995) found some support for the reviewer style framework such that author revisions and editorial decisions were related to the developmental style of reviewers.

Conclusions. Both the accumulative advantage and reviewer style frameworks suggest factors that may influence the publication process through their impact on paper quality. That is, through an accumulative advantage or through developmental feedback, authors are able to produce higher quality work that is more likely to be published. In this way, the gatekeeping function, which addresses paper quality, can be seen as mediating the accumulative advantage and reviewer style functions. In addition, the gatekeeping function may be more complex than was suggested by Beyer et al. (1995). Specifically, we propose expanding the notion of gatekeeping such that variation in the selection and weighting of gatekeeping criteria is studied under the heading of variable gatekeeping. Further, the particularism framework can be expanded to include content particularism toward a particular methodology or research setting. Both variable gatekeeping and content particularism are discussed in the following sections.


Variable Gatekeeping

With multiple reviewers evaluating manuscripts, even if there were agreement with respect to the evaluation criteria, there is likely variability with respect to the weighting of these various criteria. This is consistent with previous research on the journal review process (Jauch & Wall, 1989; Lindsay, 1978) as well as the decision-making literature in general (e.g., Hitt & Barr, 1989). Variable gatekeeping suggests that there is reasonable agreement with respect to the general components of gatekeeping, but variability with respect to the way that those criteria are applied.

Efforts to associate this variability with reviewer characteristics have demonstrated only limited success (e.g., Campion, 1993a; Chase, 1970). In an ambitious research effort, Campion (1993a) asked editorial board members and ad hoc reviewers from three journals to provide importance weights for a list of 223 criteria. A major finding was a lack of correlation between many reviewer characteristics and the weights given to criteria. However, Campion (1993a) did find differences between reviewers with regard to professional affiliation and years of work experience. Thus, although there is evidence of variable gatekeeping, research has not been able to systematically describe this variability.

Content Particularism

Particularism as described by Beyer et al. (1995) involves such phenomena as cronyism, nepotism, and "old boys' networks." However, it may be that this is merely one form of particularism, what we might call social particularism. Another form of particularism involves preferences for or against certain approaches to research and can be labeled content particularism. For example, it has been suggested that sophistication of analyses is an important criterion when assessing paper quality. When the research questions being addressed are complex, sophisticated analysis may be necessary, and this criterion would fall into the realm of gatekeeping. Given that sophisticated analyses are not always required for the answers to certain questions, insistence on such analyses becomes content particularism. Likewise, there are some in the applied psychological fields who eschew the use of lab designs regardless of the research question. One purpose of the present paper is to examine whether or not reviewer ratings reflect content particularism. To our knowledge, no previous study has addressed this question.


Objectives of the Present Study

In the current study, we extend prior research on the editorial review process by developing and examining the role of variable gatekeeping within the gatekeeping framework and the role of content particularism within the particularism framework. In addition, we examine further the presence of social particularism in reviewer ratings and editorial decisions. Finally, we extend the model and research of Beyer et al. (1995) by examining the editor's initial decision regarding the assignment of reviewers to particular manuscripts.

Given the detail with which we have chosen to examine gatekeeping and particularism phenomena with reviewers' and editors' decisions, we have limited the scope of this study by (a) not examining the accumulative advantage and reviewer style frameworks examined by Beyer et al. (1995), and (b) focusing only on the initial review process. The accumulative advantage and reviewer style frameworks both received some support from Beyer et al. (1995). The gatekeeping and particularism frameworks received less support and may, therefore, be more interesting to re-examine. With regard to focusing on the initial review process, certainly very few manuscripts are accepted upon initial submission, and many go through multiple rounds of reviews. However, the initial review process is often the most critical in terms of the evaluation of the manuscript, and is most likely to reflect phenomena such as particularism in the decision process. We chose therefore to focus on this process. In the following sections we discuss the series of decisions in the initial review process and generate specific hypotheses regarding the influences of gatekeeping and particularism on these decisions.

Decision Making in the Initial Review Process

The initial review process begins when an author submits his or her paper to the journal for publication consideration. Assuming the paper is deemed appropriate for the journal, the editor must assign the paper to reviewers. In the case of the journal that was the focus of our study (Journal of Applied Psychology), this decision usually involved selecting three reviewers, of which one or sometimes two came from the editorial board and the remainder were ad hoc reviewers. These reviewers produce ratings and written evaluations of the manuscript, which constitute the second major decision in the initial review process. Finally, the reviewers' ratings and written evaluations are compiled and interpreted by the editor or one of two associate editors who reads the manuscript and makes a decision regarding its potential suitability for publication. We elaborate on these three initial decisions and generate hypotheses


regarding the roles of gatekeeping, variable gatekeeping, content particularism, and social particularism in these decisions.

Editor's decision regarding reviewers. Editors of most journals in the behavioral sciences (e.g., Personnel Psychology, Journal of Applied Psychology, Academy of Management Journal, Academy of Management Review, Psychological Bulletin) must choose the reviewers for every suitable paper that is submitted. At this point in the process, the primary phenomenon likely to affect decisions is gatekeeping: The editor must decide which types and levels of expertise are required from the reviewers.

Hypothesis 1: Editors assign papers to certain reviewers based on topical content, methodology, and quality of the paper.

The logic behind this hypothesis is that the editor will try to match reviewers' expertise to manuscripts on the basis of characteristics of the manuscript such as content and type of research method. The extent to which content versus methods is emphasized by the editor in this decision is likely the result of both the type of paper and the expertise of the reviewers (e.g., papers with unusual methodologies may be assigned to specific reviewers with expertise in that methodology). The final portion of the hypothesis, which suggests differences in paper quality across reviewers, is more speculative and is based on the possibility that some reviewers may be better at providing constructive reviews for very poor papers or that editors are reluctant to send papers of apparently low quality to the more competent reviewers.

Reviewers' evaluations. After papers are sent to reviewers, the reviewers provide ratings and written evaluations of the manuscript. The gatekeeping function suggests that these reviewer evaluations are based on the quality, contribution, and presentation of the manuscript. The role of gatekeeping in reviewer evaluations has been examined through three research approaches. In the first approach, reviewers and editorial board members provide self-reports of the importance of criteria used in judging papers. These studies have found that the most important characteristics are methodology, significance and importance of results, and clarity of presentation (Campbell, 1982; Chase, 1970; Kerr et al., 1977; Schwab, 1985). A limitation of this approach to assessing criteria importance is that respondents may have limited insight into their judgment processes such that they may not be able to accurately report what factors influence their decisions (Nisbett & Wilson, 1977).

The second type of study involves content coding reviews to discover what reviewers actually mention when they conduct reviews and justify their recommendations regarding a manuscript. Results from these


analyses are consistent with self-report findings and suggest that the most frequently criticized aspects of manuscripts include presentation of theory, procedures, and conclusions, inadequate design or execution, and irrelevance or unimportance of the topic literature (Daft, 1985; Fiske & Fogg, 1990). Content analyses are limited by the assumptions that all important reviewer criteria are conveyed in the written review and that the frequency of comments is indicative of criterion importance.

A third approach to studying the gatekeeping function involves policy capturing, wherein paper characteristics are used to explain reviewer and editor ratings and decisions. To our knowledge, Beyer et al. (1995) is the only study to adopt this methodology, and they only examined gatekeeping with regard to authors' revision decisions and the editor's final decision. In their study the only significant gatekeeping dimensions were the novelty of the study, the significance of the results, and the clarity of presentation. Unfortunately, Beyer et al. (1995) did not examine a number of the gatekeeping dimensions identified in the self-report and content analysis studies, such as technical merit and theoretical development. The research based on self-reports from reviewers (e.g., Schwab, 1985) and content analyses of reviews (e.g., Daft, 1985) suggests the following hypothesis regarding the gatekeeping function:

Hypothesis 2: Reviewers’ overall evaluations will be positively related to their evaluations of the manuscript’s theoretical development, technical merit, problem significance, and presentation clarity.

This hypothesis is also consistent with the fact that most journals use rating forms that focus reviewer attention on these gatekeeping dimensions.

Variable gatekeeping must also be considered as reviewers may agree to a reasonable extent on the evaluation criteria, but the importance that they attribute to these criteria may vary. Based on Campion (1993a), we expect reviewer policy differences to be related to the professional affiliation and experience of the reviewers. Academics tended to give more weight to theoretical importance and less weight to practical importance and presentation than nonacademics. Those in psychology tended to give more weight to appropriateness of topic for the journal and presentation, but less weight to theoretical importance than those in business departments. Those with more experience gave less weight to theory and more weight to procedures, discussion, and presentation. Although Campion's (1993a) results were based on self-reports from reviewers, we expect similar results from a policy-capturing approach.

Hypothesis 3: Reviewers differ with respect to the weights that they at- tach to criteria when evaluating manuscripts, and these differences will be related to reviewers’ professional affiliation and experience.


Beyer et al. (1995) investigated the effects of social particularism on final editorial decisions, but not on reviewer evaluations because the journal used blind review procedures. In a system that includes optional blind review (Journal of Applied Psychology; Personnel Psychology and many other journals have adopted a full blind review), there may be evidence of social particularism in reviewer ratings. In this case, we would expect author characteristics such as gender and affiliation to be related to reviewer evaluations. Based on the possibility of social particularism, we generated the following hypothesis:

Hypothesis 4: Reviewers’ overall evaluations of manuscripts will be related to author gender and affiliation.

In addition to social particularism, there may also be favoritism toward or against certain methods of research. Such content particularism is distinguished from gatekeeping in that gatekeeping is an evaluation of the appropriateness of the methodology, given a particular topic and research question. Assuming that research questions vary across papers, and that some questions suggest certain methodologies while other questions suggest other methodologies, there should be no consistent pattern of relationships between method/analysis characteristics and reviewer ratings of overall manuscript quality. The content particularism perspective, on the other hand, suggests the following hypothesis:

Hypothesis 5: Reviewers’ overall evaluations of manuscripts will be related to the research setting and design, subject type, and data analysis method.

Editorial decision. After the reviews are returned to the editor or one of the associate editors, this editor usually reads the manuscript and the reviews and makes a decision regarding publication potential. Traditionally, this decision is to reject the manuscript outright, to suggest that the authors revise and resubmit the manuscript for further consideration, or to accept the manuscript pending revisions. From a gatekeeping perspective, we expect that editors' decisions will be based on the reviewers' ratings of a manuscript's theoretical development, technical merit, problem importance, and presentation clarity.

Hypothesis 6: The editor’s decision will be positively related to reviewers’ evaluations of the manuscript’s theoretical development, technical merit, problem significance, and presentation clarity.

Variability of reviewer policies may also affect editorial decisions such that agreement among reviewers may be more convincing for the editor than disagreement. Given that reviewers are sometimes chosen


for diversity of positions and training instead of similarity, such agreement across reviewers might be particularly convincing. With less agreement among reviewers, the editor may base the editorial decision more heavily on his or her own evaluation of the manuscript. The variable gatekeeping perspective suggests the following hypothesis:

Hypothesis 7: Reviewer agreement will moderate the positive relationship between reviewer evaluations and editor's decision, such that the editor's decision will be most strongly related to reviewer evaluations under conditions of high reviewer agreement.

The editor, as suggested by Beyer et al. (1995), may be influenced by social particularism such that author characteristics might affect editorial decisions over and above reviewer ratings of the paper. In addition, the editor may also be influenced by content particularism, such that a particular methodology consistently receives more positive or negative editorial decisions than other types of methodologies.

Hypothesis 8: The editor's decision will be related to author gender and affiliation over and above the evaluations of the reviewers.

Hypothesis 9: The editor's decision will be related to the research setting and design, subject type, and data analysis method over and above reviewer ratings of these aspects of the papers.

Method

Sample

The sample consisted of 823 articles submitted for review to the Journal of Applied Psychology between the years of 1989 and 1991. Journal of Applied Psychology articles were chosen because we were able to obtain access to this sample. Given the policy-capturing nature of some of the hypotheses, the use of a single journal ensured that our analyses would be based on common independent and dependent variables. Because of our interest in capturing the policy of individual board members (Hypothesis 3), we chose only those papers reviewed by one or more of the 35 editorial board members who had served the entire 3-year period so as to maximize the number of reviews they had completed. Finally, all papers involving authors affiliated with the same university with which the authors of the present paper were affiliated were removed from the study to protect confidentiality during the data coding process. For the sample period, a total of 1,430 new submissions were received; the 823 papers studied represented 58% of the total received during that period.


TABLE 1
Primary Content of Submissions and Number of Submissions

Content area                         Number      %
Selection/utility                        92   11.2
Performance/performance ratings          73    8.9
Organizational psychology (a)            85   10.3
Job attitudes                            72    8.7
Stress                                   70    8.5
Motivation                               59    7.2
Leadership                               46    6.0
Gender/family issues                     44    5.3
Statistics/measurement                   43    5.2
Job analysis/job evaluation              40    4.9
Employee withdrawal                      31    3.8
Training/learning                        28    3.4
Organizational entry/recruiting          26    3.2
Negotiation/union issues                 19    2.3
Other                                    91   11.1

(a) Organizational psychology was a broad category that included groups, climate, culture, commitment, justice, power, and roles.

Measures

Paid undergraduate students coded data from the files associated with each paper. The paper was coded as to whether or not the review was blind (69 of the 823 authors requested and were granted blind reviews). The action editor (the editor or one of two associate editors) and this editor's decision regarding the paper (1 = reject, 2 = revise and resubmit, 3 = accept pending revision) were also coded, as was the editorial board member who read the paper. Ratings of the paper made by the editorial board member and two ad hoc reviewers were coded on nine dimensions using 5-point Likert-type scales. Dimensions included appropriateness of topic, mastery of relevant literature, conceptual or theoretical arguments, operationalization of key constructs, adequacy of research design, data analysis and interpretation, writing style (clarity), overall contribution, and probability of making at least a moderate contribution if revised. All dimensions included specific anchors on the rating scales.

Coded paper characteristics included research setting (laboratory, field, method paper, review paper), research design (experimental, survey, correlational, other), type of subjects (0 = college students, 1 = other), and content of the paper. Eighteen content areas were originally coded. Because of the small number of papers in several content areas, the final set of content areas was reduced to 15 (see Table 1 for the number of papers coded in each category). Finally, the statistical analysis


used to test primary hypotheses was coded by one of four experts familiar with the current study, as we did not believe that the undergraduate coders would be able to accurately identify this information. Statistical analyses were coded as (1) correlational or regression, (2) analysis of variance, (3) factor analysis, (4) LISREL, confirmatory factor analysis, or path analysis, or (5) other. In addition to the paper characteristics, two author characteristics were coded: the first author's gender based on her/his first name (0 = male, 1 = female) and the author's affiliation (1 = academic, 0 = other).

Although the coding of most of the variables was simply a matter of transcription, coding some variables involved a degree of judgment on the part of the coder (e.g., research design, paper content). Coding of all papers was checked by one of four experts familiar with the current study for both transcription and judgment errors. In cases of disagreement between original and expert coding, the expert's evaluation was used. To maximize experts' coding agreement, 30 papers were initially coded by all four experts. Complete agreement as to the coding of these manuscripts was achieved in 83% of the cases for all variables except for the data analysis variable. Here, experts agreed in 70% of the cases, with most disagreements resulting when multiple data analysis techniques were used in a single paper. All disagreements were discussed so as to produce greater rating consistency before evaluating the remainder of the manuscripts.

Results

Means, standard deviations, and intercorrelations among the primary variables are presented in Table 2. Author affiliation and author gender were overrepresented by academics and males, respectively, leading to some restriction of variance in these variables. Correlations among manuscript ratings within a reviewer tended to be higher than correlations across reviewers, suggesting that (a) multicollinearity was a potential concern in some of the analyses and (b) interrater consistency was low. To further investigate interrater consistency, interrater reliabilities and interrater agreement indices were calculated among the editorial board members and the two ad hoc reviewers (see Table 3). Consistent with past research (e.g., Fiske & Fogg, 1990), interrater reliability tended to be in the .3 to .4 range. Interrater agreement was even lower, with many values only slightly above chance agreement.

TABLE 2
Means, Standard Deviations, and Intercorrelations Among Primary Variables

Variable                    M     SD      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
1.  Author affiliation (a)  0.06  0.24
2.  Author gender (b)       0.28  0.45  -.12
3.  EB appropriate (c)      4.10  0.99  -.01 -.06
4.  EB literature           3.43  0.92  -.06 -.01  .20
5.  EB theory               2.81  0.96  -.07  .00  .27  .55
6.  EB constructs           2.80  0.98   .01 -.06  .22  .36  .45
7.  EB design               2.56  1.24   .02 -.04  .24  .36  .49  .52
8.  EB data analy.          2.72  1.10   .03 -.01  .11  .35  .42  .47  .56
9.  EB writing              3.41  0.98  -.04  .02  .15  .34  .38  .29  .26  .36
10. EB overall              2.00  1.10   .04 -.06  .35  .41  .56  .52  .69  .51  .30
11. AH appropriate (d)      4.22  1.01   .02 -.03  .20  .10  .11  .08  .09  .04  .03  .11
12. AH literature           3.33  0.95  -.05  .02  .09  .27  .18  .09  .16  .11  .12  .17  .24
13. AH theory               2.80  0.98   .00  .06  .05  .13  .12  .07  .11  .09  .07  .11  .25  .57
14. AH constructs           2.97  1.03   .00  .04  .02  .11  .13  .19  .18  .13  .10  .17  .19  .43  .49
15. AH design               2.71  1.25   .02  .01  .02  .12  .15  .16  .18  .15  .06  .13  .21  .43  .52  .57
16. AH data analy.          2.72  1.10  -.05  .00  .06  .10  .17  .08  .15  .17  .10  .13  .18  .37  .45  .44  .61
17. AH writing              3.36  1.07  -.04  .03  .05  .14  .12  .08  .09  .09  .22  .11  .15  .33  .39  .28  .28  .32
18. AH overall              2.24  1.13   .00  .00  .11  .14  .21  .20  .24  .18  .11  .23  .35  .45  .61  .58  .69  .57  .36
19. Editor decision (e)     1.25  0.44   .05 -.02  .23  .27  .37  .38  .45  .34  .19  .51  .16  .23  .26  .32  .29  .25  .16  .39

Note: Correlations of r > .07 are significant at p < .05.
(a) Author affiliation was coded 0 = Academic, 1 = Other.
(b) Author gender was coded 0 = male, 1 = female.
(c) EB Appropriate through EB Overall represent ratings made by the editorial board reviewer.
(d) AH Appropriate through AH Overall represent ratings made by the first ad hoc reviewer. Data for the second ad hoc reviewer are available from the first author.
(e) Editor decision was coded 1 = reject, 2 = revise and resubmit or accept pending revision.


TABLE 3
Interrater Agreement and Interrater Reliability for Reviewer Evaluations

                      Interrater     Interrater agreement (kappa) (a)
Dimensions            reliability    EB-AH1    EB-AH2    AH1-AH2
Appropriateness          .44          .01       .04       .09
Literature               .30          .08      -.01       .09
Theory                   .36          .04       .08       .08
Constructs               .46          .01       .11       .04
Design                   .37          .04       .02       .00
Data analysis            .31          .02       .03       .03
Writing                  .41          .05       .04       .04
Overall evaluation       .39          .06       .07       .06

(a) EB = editorial board member; AH1 = first ad hoc reviewer; AH2 = second ad hoc reviewer.
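The indices in Table 3 can be computed directly from paired ratings. The following Python sketch is not the authors' code; it assumes a toy data layout in which each rater's 1-5 ratings of the same papers sit in an array (the rater names eb, ah1, and ah2 are hypothetical), and it uses pairwise Pearson correlations as one common consistency index (the paper does not report its exact reliability formula) and Cohen's kappa for agreement.

import numpy as np
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Toy stand-in for one rating dimension across a set of papers.
ratings = {
    "eb":  np.array([2, 4, 3, 1, 5, 2, 3, 4]),
    "ah1": np.array([3, 4, 2, 1, 4, 2, 2, 5]),
    "ah2": np.array([2, 5, 3, 2, 4, 1, 3, 4]),
}

for a, b in combinations(ratings, 2):
    r = np.corrcoef(ratings[a], ratings[b])[0, 1]       # consistency index
    kappa = cohen_kappa_score(ratings[a], ratings[b])   # agreement index
    print(f"{a}-{b}: r = {r:.2f}, kappa = {kappa:.2f}")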

Editor's Decision Regarding Reviewers

Hypothesis 1 suggested that papers assigned to different reviewers would vary in topic content, research methods, and quality. To test this hypothesis, the distributions of paper content, research setting, study design, data analysis, and editorial decision were each examined across the 35 editorial board members using chi-square analyses. Paper content varied across editorial board members (χ²(476) = 1957.24, p < .05), with some board members receiving more than 65% of the papers they reviewed from a single content category. Although we do not have objective data on board members' areas of expertise, our knowledge of their expertise suggests that paper content was matched to expertise. For example, board members who are known for publishing in areas of gender issues or leadership tended to receive the majority of the papers they reviewed in these areas. Variation across board members was also found for the research setting (χ²(103) = 352.16, p < .05), study design (χ²(102) = 323.99, p < .05), and data analysis (χ²(170) = 342.46, p < .05). In terms of research setting, some board members tended to review only field studies, whereas others reviewed an equal mix of laboratory and field studies, and still others predominantly reviewed method papers. Similar variation was seen in study design and data analysis. The findings with regard to paper content, research setting, study design, and data analysis all tend to support the hypothesized gatekeeping component of the decision process in the editor's assignment of papers to reviewers.
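As an illustration of this kind of test, the sketch below runs a chi-square test of independence on a toy board-member-by-content crosstab; the column names are hypothetical, and the real analysis used the full 823-paper file.

import pandas as pd
from scipy.stats import chi2_contingency

# Toy assignment records: which board member reviewed a paper on which topic.
papers = pd.DataFrame({
    "board_member": ["bm01", "bm01", "bm01", "bm02",
                     "bm02", "bm03", "bm03", "bm03"],
    "content": ["selection", "selection", "stress", "leadership",
                "leadership", "stress", "selection", "stress"],
})

table = pd.crosstab(papers["board_member"], papers["content"])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")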

Hypothesis 1 also suggested that paper quality would vary across reviewers. The distribution of editorial decisions varied across board members (χ²(68) = 94.54, p < .05), with one reviewer having 95% of the papers they reviewed rejected by the editor and another having 35%


rejected, 35% revise and resubmit, and 30% accepted pending minor revisions. Although this finding supports the notion that board members receive papers of different quality, it is also possible that different evaluation standards of board members affected the editorial decisions, causing different rejection rates. A measure of paper quality that is independent of board members' judgments is the average evaluations of ad hoc reviewers. A one-way ANOVA using board members as the independent variable and average ad hoc reviewers' final judgments as the dependent variable indicated a main effect of board members (F(34, 774) = 1.55, p < .05). This effect was significant, although it appears to have been fairly weak. Nevertheless, average ad hoc reviewer evaluations for board members ranged from 1.85 to 3.03, indicating that paper quality did indeed vary across board members. This finding supports the notion that the editor considered paper quality when assigning papers to reviewers.
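A one-way ANOVA of this form can be sketched as follows; each list holds the mean ad hoc evaluations of papers sent to one board member (toy values, not the study's data).

from scipy.stats import f_oneway

adhoc_means_by_board_member = [
    [2.1, 1.8, 2.4, 2.0],   # papers handled by board member 1
    [3.0, 2.7, 2.9, 2.8],   # board member 2
    [1.9, 2.2, 2.0, 2.3],   # board member 3
]
F, p = f_oneway(*adhoc_means_by_board_member)
print(f"F = {F:.2f}, p = {p:.3f}")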

Reviewers’ Evaluations

Hypothesis 2 addressed the gatekeeping function and suggested that reviewers' evaluations would be related to the manuscripts' theoretical development, technical merit, problem significance, and presentation clarity. To test this hypothesis, we regressed each reviewer's overall evaluation of the manuscript on their ratings of the paper on various dimensions. Only the first seven dimensions were used as predictors because the eighth dimension, contribution, correlated r = .8 with overall evaluation and was considered an alternate form of the final recommendation. These regressions were conducted for editorial board members and the two ad hoc reviewers (see Table 4). To control for a lack of independence in editorial board members' evaluations, we created 34 dummy variables to represent the 35 board members and entered these into the regression before entering the seven paper ratings. Editorial board member dummy variables explained 16% of the variance in overall ratings (F(34, 664) = 3.83, p < .05). Examining both zero-order correlations and beta weights, the adequacy of the research design was most predictive of overall evaluations. Conceptual or theoretical arguments and operationalization of key constructs were the next most predictive dimensions, followed by appropriateness of topic and adequacy of data analysis. Mastery of relevant literature and writing style (clarity) both demonstrated significant correlations, but did not contribute significantly in the regressions. The writing style (clarity) dimension demonstrated a significant beta weight for the second ad hoc reviewers only. Overall, these results are consistent with the gatekeeping function, and with prior self-report and content-analysis studies.
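A sketch of this two-step regression, assuming hypothetical column names and simulated data (statsmodels' C() builds the board-member dummies; this is an illustration, not the authors' code):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
dims = ["appropriate", "literature", "theory", "constructs",
        "design", "data_analysis", "writing"]

# Simulated stand-in for the coded review file (the real data had 823 papers).
df = pd.DataFrame({d: rng.integers(1, 6, n) for d in dims})
df["board_member"] = rng.integers(1, 6, n)   # 5 toy board members
df["overall"] = rng.integers(1, 6, n)        # overall evaluation, 1-5

step1 = smf.ols("overall ~ C(board_member)", data=df).fit()
step2 = smf.ols("overall ~ C(board_member) + " + " + ".join(dims),
                data=df).fit()
print("R2, dummies only:", round(step1.rsquared, 2))
print("delta R2 for the seven ratings:",
      round(step2.rsquared - step1.rsquared, 2))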


TABLE 4
Regression of Reviewers' Overall Evaluation on Their Ratings of Specific Paper Dimensions

                   Board member           Ad hoc reviewer 1     Ad hoc reviewer 2
Dimensions         r      beta            r      beta           r      beta
Appropriateness    .37*   .19*            .34*   .16*           .34*   .16*
Literature         .41*   .03             .44*   .04            .44*   .03
Theory             .58*   .15*            .60*   .22*           .63*   .20*
Constructs         .54*   .09*            .58*   .17*           .59*   .12*
Design             .68*   .46*            .71*   .36*           .73*   .43*
Data analysis      .52*   .13*            .59*   .16*           .56*   .08
Writing            .33*   .02             .36*  -.03            .36*   .11*
R2                 .65* (delta R2 = .48*) (a)   .64*            .67*
n                  699                    700                   321

*p < .05
(a) Delta R2 represents the variance explained beyond the editorial board member dummy variables.

Hypothesis 3 addressed variable gatekeeping and suggested that reviewers would vary with respect to the weights they attached to different evaluation criteria and that this variability would overlap with reviewers' professional affiliation and experience. To capture individual differences in evaluations, we first regressed each editorial board member's recommendations (ad hoc reviewers had not evaluated enough papers to make these regressions meaningful) on their seven paper ratings (again, contribution was eliminated given its correlation with final recommendations). This yielded standardized regression weights for each of the seven rating dimensions for each of the editorial board members. Due to missing ratings from three board members, policies could be examined for only 32 of the 35 board members. The multiple R from each board member's regression provided an index of association between dimension ratings and overall recommendation.

Results indicated that editorial board member ratings of the seven paper dimensions were highly predictive of their publication recommendation (the average adjusted multiple correlation was .74). However, multiple correlations for individual board members, based on 17 to 35 reviews, ranged from .48 to .99. The means and ranges of the beta weights are presented in Table 5. Consistent with results from Hypothesis 2, adequacy of research design demonstrated the largest mean beta weight. It should be noted that, although the sizes of these weights may well be due in part to common method variance, there is no reason to believe that the pattern of the weights is due to method variance. Of particular interest in Table 5 is the wide range of variability in the beta weights for each dimension, which supports the first half of Hypothesis 3.
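The per-reviewer policy-capturing step can be sketched by grouping on board member and saving standardized betas; this reuses the toy df and dims from the earlier regression sketch and is illustrative only.

import pandas as pd
import statsmodels.api as sm

policies = {}
for bm, sub in df.groupby("board_member"):
    # Standardize predictors and criterion so the slopes are beta weights.
    Z = (sub[dims] - sub[dims].mean()) / sub[dims].std()
    y = (sub["overall"] - sub["overall"].mean()) / sub["overall"].std()
    policies[bm] = sm.OLS(y, sm.add_constant(Z)).fit().params[dims]

betas = pd.DataFrame(policies).T   # one row of beta weights per board member
print(betas.agg(["mean", "std", "min", "max"]).round(2))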



TABLE 5
Means and Ranges of Beta Weights for Each of the Seven Dimensions

Dimensions        M     SD    Minimum   Maximum
Appropriateness   .13   .26   -.49      .58
Literature        .06   .22   -.45      .76
Theory            .15   .27   -.28      .69
Constructs        .08   .25   -.44      .85
Design            .45   .23   -.10      .85
Data analysis     .15   .26   -.48      .78
Writing           .06   .25   -.35      .48
R                 .74   .14    .48      .99

In an attempt to explain the differences in rater policies, we used t tests to compare several training and demographic characteristics of board members with their seven beta weights. Board member characteristics included (a) academic versus nonacademic affiliation; (b) psychology versus business school affiliation; (c) gender; (d) years since doctoral degree was obtained; and (e) whether doctoral degree was obtained from a psychology or business department. Few differences in beta weights were found, and they tended to be associated with professional affiliation and training. Mastery of relevant literature was given more weight by business department doctorates (M = .22) than those with psychology doctorates (M = .03, t(30) = 2.04, p < .05). On the other hand, operationalization of key constructs was given more weight by psychology doctorates (M = .14) than by business doctorates (M = -.14, t(30) = 2.67, p < .05) and given more weight by those currently affiliated with a psychology department (M = .18) than by those affiliated with a business school (M = .01, t(30) = 2.11, p < .05). Interestingly, nonacademics had higher beta weights associated with data analysis and interpretation (M = .45) than academics (M = .11, t(30) = 2.65, p < .05).

Social particularism was addressed in Hypothesis 4 with the prediction that reviewers' evaluations would be related to author gender and affiliation. To test this hypothesis, the editorial board members' and the two ad hoc reviewers' overall evaluations were each examined with t tests for differences on the basis of author gender and affiliation. These analyses were restricted to papers that did not receive a blind review. No differences between author genders were found for any of the reviewers (editorial board member, t(667) = 1.41, p > .10; ad hoc reviewer 1, t(667) = .26, p > .10; ad hoc reviewer 2, t(294) = .19, p > .10). Likewise, no differences were found for those authors with academic versus nonacademic affiliation (editorial board member, t(735) = 1.33, p > .10; ad hoc reviewer 1, t(735) = .12, p > .10; ad hoc reviewer 2, t(320) =


.76, p > .10). We also repeated these analyses using blind review status as a second factor in an ANOVA model. Neither significant main effects nor significant interactions were found with any of the variables. Thus, we found no evidence of social particularism in reviewer evaluations. It should be noted that, although there would be alternative explanations for results supportive of this hypothesis, such as accumulative advantage, these explanations cannot be readily applied to the present findings.
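Each of these comparisons is an independent-samples t test; a minimal sketch with toy evaluation values (not the study's data):

from scipy.stats import ttest_ind

# Toy overall evaluations of non-blind papers, split by first-author gender.
evals_male = [2.1, 1.8, 2.4, 2.0, 1.9, 2.6]
evals_female = [2.2, 1.7, 2.5, 2.1, 2.3]
t, p = ttest_ind(evals_female, evals_male)
print(f"t = {t:.2f}, p = {p:.3f}")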

Content particularism was the final hypothesis examined with respect to reviewers' evaluations. Hypothesis 5 predicted that reviewers' evaluations would be related to the research setting and design, subject type, and data analysis method. One-way ANOVAs were used to examine the impact of each of these paper characteristics on editorial board members' and ad hoc reviewers' overall evaluations. It should be noted that the independent variables in these ANOVAs are not independent of each other. For the editorial board members, significant effects were found for research setting (F(3, 801) = 3.36, p < .05), experimental design (F(3, 802) = 4.03, p < .05), and data analysis (F(4, 751) = 3.79, p < .05). Post hoc LSD tests revealed that methodological studies (M = 2.36) received more favorable evaluations than laboratory studies (M = 1.98), field studies (M = 1.95), or review papers (M = 1.98). Similarly, in terms of design, "other" papers (which included method papers) (M = 2.24) were evaluated more favorably than experimental (M = 1.90) and survey designs (M = 1.91), although they did not differ significantly from correlational designs (M = 2.09). In terms of data analysis method, papers that relied predominantly on factor analysis (M = 1.68) or analysis of variance (M = 1.81) were evaluated less favorably than papers relying on correlation or regression (M = 2.06), LISREL, confirmatory factor analysis, or path analysis (M = 2.14), or other methods (M = 2.15). Because of the small number of papers relying primarily on factor analysis (n = 25), the differences between this method and the other methods were not significant in the post hoc LSD tests. None of these differences in research setting, experimental design, or data analysis method were significant with the ad hoc reviewers (ps > .10). In addition, none of the reviewers' evaluations demonstrated any relationship with the subject type (student sample vs. other) (all ps > .10). Overall, there is some evidence of content particularism among the editorial board members, with a tendency to favor methods papers and give less favorable evaluations to papers that rely on analysis of variance. However, these differences are small, and it is unclear to what extent these papers may also differ in terms of overall paper quality.


Editorial Decision

The editorial decision placed each manuscript into one of three categories, with the following proportions in each category: reject (74.6%), revise and resubmit (21.7%), and accept pending revisions (3.6%). Because of the small number of studies that actually received an accept pending revision, and because of the operational similarity with revise and resubmit decisions, we combined this category of decisions with revise and resubmit, thereby making the editor's decision a dichotomous variable. All analyses on this variable were conducted with logistic regression.

Hypothesis 6 suggested that the editorial decision should be related to reviewers' evaluations of theoretical development, technical merit, problem significance, and presentation clarity. This hypothesis was tested by regressing the dichotomous editorial decision variable onto a set of 34 editorial board member dummy variables (to control for lack of independence in editorial board member evaluations) and 14 predictors: the seven ratings from the board member and the seven ratings from the first ad hoc reviewer. Because fewer than half the papers had a third reviewer, we excluded them from the analyses because of the impact the reduced sample size would have had on the power of the analyses. The addition of the board member dummy variables to a null model did not improve the model fit (χ²(34) = 44.37; p > .10). However, the addition of the 14 predictor variables resulted in a considerable improvement in model fit (χ²(14) = 247.79; p < .05). It is also interesting to note that if the board member ratings are entered in Step 2 and the ratings of the first ad hoc reviewer are entered in Step 3, both steps result in a significant increment in the fit of the model (χ²(7) = 182.31; p < .05 for the first step, χ²(7) = 60.16; p < .05 for the second step), suggesting that editors use unique information from multiple reviewers in making editorial decisions. Table 6 presents the final regression weights for both sets of ratings. Whereas research design demonstrated the strongest regression weights in the analyses of reviewers' evaluations, editor decisions tended to be most strongly associated with board members' and ad hoc reviewers' ratings of construct operationalization. Consistent with the gatekeeping function, other significant regression coefficients included the editorial board members' evaluations of appropriateness, theory, and design. The regression coefficients suggest that editors rely more heavily on editorial board members' evaluations than ad hoc reviewers'.
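The incremental chi-square at each step of such a hierarchical logistic regression is a likelihood-ratio statistic, 2*(llf_full - llf_reduced), with degrees of freedom equal to the number of added predictors. A simplified sketch (one reviewer's seven ratings rather than the full 14 predictors), again reusing the toy df and dims from the earlier sketch:

import statsmodels.formula.api as smf

df["accept"] = (df["overall"] >= 3).astype(int)   # toy dichotomous decision

m0 = smf.logit("accept ~ 1", data=df).fit(disp=0)                 # null model
m1 = smf.logit("accept ~ C(board_member)", data=df).fit(disp=0)   # dummies
m2 = smf.logit("accept ~ C(board_member) + " + " + ".join(dims),
               data=df).fit(disp=0)                               # + ratings

print("LR chi2, dummy step:", round(2 * (m1.llf - m0.llf), 2))
print("LR chi2, rating step:", round(2 * (m2.llf - m1.llf), 2))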

Although not explicitly hypothesized, an implicit model within the gatekeeping function suggests that reviewers' evaluations of paper characteristics determine their overall publication recommendations, which in turn help to determine the editor's editorial decision. In order to


TABLE 6
Logistic Regression Weights for Editorial Decision

Dimensions             Coefficient   Standard error   Wald
Editorial board member
  Appropriateness         .64           .19           11.41*
  Literature              .08           .22             .14
  Theory                  .34           .20            2.91
  Constructs              .72           .19           14.32*
  Design                  .49           .15           10.62*
  Data analysis           .44           .17            6.93*
  Writing                -.07           .18             .15
Ad hoc reviewer 1
  Appropriateness         .02           .18             .01
  Literature              .01           .20             .13
  Theory                  .24           .19            1.53
  Constructs              .48           .18            6.89*
  Design                  .26           .16            2.74
  Data analysis           .20           .16            1.51
  Writing                 .13           .15             .72

*p < .05

assess this model, we regressed editorial decision onto the 34 editorial board member dummy variables (Step 1), the board members' and first ad hoc reviewers' overall recommendations (Step 2), and the seven paper characteristics rated by the board member and first ad hoc reviewer (Step 3). The reviewers' overall recommendations improved the model fit considerably over the dummy variables entered in the initial step (χ²(2) = 212.58; p < .05). Addition of each reviewer's paper evaluations in Step 3 also improved the fit of the model (χ²(14) = 46.58; p < .05), which suggests that the editor considers the reviewers' individual ratings in addition to their overall recommendations. However, the fact that the improvement in model fit was less than when the overall recommendations were not added to the model (χ²(14) = 247.79; p < .05) suggests that the reviewers' overall recommendations partially mediate the relationship between their evaluations of paper characteristics and the editor's decision.

Hypothesis 7 dealt with variable gatekeeping and suggested that reviewer agreement would moderate the impact of reviewer evaluations on the editor's decision. Using hierarchical moderated logistic regression, editorial decision was regressed onto the overall recommendations of the board member and first ad hoc reviewer (Step 1), the variance across these ratings (Step 2), and the overall recommendations by variance cross-products (Step 3). The first step resulted in a substantial increment in model fit over a null model (χ²(2) = 283.17; p < .05), while the second step resulted in a negligible improvement in fit (χ²(2) = .11; p > .05).


Step 3 resulted in a considerable improvement in model fit (χ²(2) = 19.94; p < .05), thus supporting the hypothesis that the effect of reviewer ratings on editorial decisions depends on the consistency of ratings across reviewers.
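A self-contained sketch of this three-step moderated logistic regression, with simulated ratings, an illustrative variance term, and hypothetical column names (the ":" operator in the formula builds the cross-products):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
d = pd.DataFrame({
    "eb_overall": rng.integers(1, 6, n),   # board member's recommendation
    "ah_overall": rng.integers(1, 6, n),   # ad hoc reviewer's recommendation
})
d["variance"] = d[["eb_overall", "ah_overall"]].var(axis=1)  # (dis)agreement
d["accept"] = (rng.random(n) < .25).astype(int)              # toy decision

s1 = smf.logit("accept ~ eb_overall + ah_overall", data=d).fit(disp=0)
s2 = smf.logit("accept ~ eb_overall + ah_overall + variance",
               data=d).fit(disp=0)
s3 = smf.logit("accept ~ eb_overall + ah_overall + variance"
               " + eb_overall:variance + ah_overall:variance",
               data=d).fit(disp=0)

print("step 2 LR chi2(1):", round(2 * (s2.llf - s1.llf), 2))
print("step 3 LR chi2(2):", round(2 * (s3.llf - s2.llf), 2))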

To further investigate this interaction, a plot was constructed by performing a median split on the variability index and regressing editorial decision onto overall reviewers' ratings for each of the two variability groups. Consistent with Hypothesis 7, the editorial decision was more strongly predicted by reviewer ratings when agreement was high (overall classification accuracy = 88.75%) than when agreement was low (overall classification accuracy = 78.09%). Separate plots were constructed relating the logit function to rating values for the board member and the ad hoc reviewer for each variability group (see Figure 1). The relationship between reviewer ratings and editorial decision was positive regardless of the amount of variability in the ratings, but the relationship was stronger when the reviewers agreed. Although not hypothesized, it is also interesting to note that when variability was low, the editor appears to place equal weight on the ratings of the board member (B = 1.24, S.E. = .36, Wald = 11.75, p < .05) and the ad hoc reviewer (B = 1.26, S.E. = .39, Wald = 10.57, p < .05). However, when variability is high, the editor places more weight on the ratings of the board member (B = .98, S.E. = .11, Wald = 78.16, p < .05) than the ad hoc reviewer (B = .64, S.E. = .10, Wald = 39.82, p < .05).

In Hypothesis 8, social particularism was examined in terms of the relationship between author gender and affiliation and the editor's decision. Using hierarchical logistic regression, editorial decision was regressed onto the overall ratings of the board member and first ad hoc reviewer (Step 1) and author gender and affiliation (Step 2). The second step resulted in a negligible improvement in fit (χ²(2) = 2.02; p > .05) over the first step (χ²(2) = 252.27; p < .05), thus providing little support for the notion of social particularism in editorial decisions.

Content particularism suggested a relationship between the editor's decision and the research setting, research design, subject type, and data analysis method (Hypothesis 9). These relationships were examined with four hierarchical logistic regression analyses. In the first three analyses, editorial decision was regressed onto reviewer ratings of research design adequacy (Step 1) and a set of dummy variables representing the study characteristic in question (either research design, research setting, or subject type). In the fourth analysis, editorial decision was regressed onto reviewer ratings of data analysis adequacy (Step 1) and a set of dummy variables representing data analysis method. In every case, the first step produced a considerable improvement in model fit over a null model (χ²(2) = 183.13; p < .05; χ²(2) = 183.13; p < .05; χ²(2) = 181.34;


p < .05; and χ²(2) = 120.96; p < .05, respectively), while the second step produced a negligible improvement in model fit (χ²(3) = 3.56, p > .05; χ²(3) = 6.57, p > .05; χ²(1) = 1.49, p > .05; and χ²(4) = 6.71, p > .05 for design, setting, subject type, and analysis type, respectively). Thus, there was no evidence of content particularism in editorial decisions.

Discussion

As with much prior research on the peer review process with journals in the social sciences (e.g., Fiske & Fogg, 1990), this study found little agreement and low reliabilities among reviewers. The primary purpose of this paper was to examine a number of theoretical explanations for this lack of interrater reliability. Specifically, it extended the traditional gatekeeping notion that papers are selected using several indicators of paper quality to the possibility that, while reviewers and editors might agree on the attributes of manuscripts that suggest quality, they may not agree on the relative contributions of these attributes. This phenomenon was referred to as variable gatekeeping. Another source of reviewer disagreement is the possibility that some reviewers or editors demonstrate favoritism toward or against certain authors (social particularism) or manuscript characteristics (content particularism) such as laboratory versus field settings, student versus real-world samples, and so forth. We tested hypotheses based on gatekeeping, variable gatekeeping, social particularism, and content particularism frameworks in three editorial decisions including the assignment of manuscripts to reviewers, the evaluations of the reviewers, and the editor's decision regarding publication potential.

Gatekeeping was examined in the first decision, and we found that papers assigned to different reviewers varied with respect to topic and methods, such that reviewers with certain areas of expertise, or perceived areas of expertise, were more likely to be sent papers with content or methods corresponding to that area. In addition, there was some evidence that paper quality varied consistently across reviewers, suggesting that editors might, for a variety of reasons, send manuscripts of ostensibly lower quality to certain reviewers. Also consistent with the gatekeeping function, reviewers' overall evaluations of manuscripts were most strongly related to their ratings of adequacy of research design, conceptual development, and adequacy of operationalizations of key constructs. Similarly, editorial decisions were found to be related to the reviewer ratings of operationalization of constructs, adequacy of design, and adequacy of analyses. One important reason for the rank ordering suggested by the regression weights in both these sets of analyses is that some manuscript problems are both serious and uncorrectable. Problems with research design and operationalizations are typically difficult or impossible to fix. Minor theoretical errors of omission or commission can often be remedied, but major logical flaws or misrepresentations cannot. These often represent "fatal flaws" and supersede other considerations. On the other hand, writing style problems are more tractable and, therefore, rarely represent the primary basis for overall evaluations of manuscripts. Although these results all suggest strong support for the gatekeeping function, they do not address the primary issue of variability in reviewer evaluations.

Variable gatekeeping hypotheses were tested with respect to reviewer recommendations and editorial decisions. Consistent with the variable gatekeeping notion, considerable variability in the weights associated with paper characteristics was found across reviewers. Unfortunately, little of this variability overlapped with variability in the reviewer characteristics that we coded, which is consistent with Campion (1993a). Perhaps the most notable exception was the finding that nonacademics placed more importance, in terms of standardized regression weights, on data analysis and interpretation than did academics. In addition, Campion found that reviewers affiliated with business schools placed more weight on theory than those affiliated with psychology departments. In the current study, those with doctorates from business schools demonstrated larger weights for mastery of the literature than those with psychology doctorates. One interesting explanation for the variability in weights assigned to given paper characteristics relates to the earlier mention of fatal flaws. Reviewers may differ with respect to which dimensions are considered critical. Future research might focus on fatal flaws and the extent to which they differ across reviewers.

Variable gatekeeping was also demonstrated in editorial decisions such that the reviewers' overall recommendations were more predictive of editorial decisions when variability across reviewers was low than when variability across reviewers was high. It is also interesting to note that when there is less agreement, the editor's decision tends to be more strongly associated with the editorial board member's evaluation than with the ad hoc reviewer's evaluation. This difference may reflect greater trust in the editorial board member's evaluations because of greater perceived expertise or competence. Alternatively, this difference may reflect the fact that board members are chosen by the editor and, therefore, may tend to view research and critical concerns similarly to the editor. Unfortunately, these different interpretations would likely be very difficult for future research to tease apart. Thus, with regard to variable gatekeeping, we have presented data that support its existence, but have done little to systematically describe the sources of this variability.


Social particularism was examined with respect to author gender and institutional affiliation in both the reviewers' evaluations and the editorial decision. The data offered no evidence of social particularism in either set of evaluations. This observation is consistent with Beyer et al. (1995) and extends that prior research by demonstrating that social particularism with respect to author gender and affiliation is not an issue in reviewer evaluations, even in a journal that did not employ a blind review policy.

Some evidence was found for content particularism, suggesting that board members hold preferences for certain study characteristics such as research setting and analysis method. Although it was difficult to tease content particularism apart from legitimate concerns, one interesting finding was that much of the power of research setting and experimental design to predict editorial board members' overall ratings was due to the fact that methods papers as a group received higher ratings than did papers with substantive foci. One possible interpretation of this finding is that methodological papers, on average, are of higher quality than are papers with a substantive focus. A more likely reason, however, is that methodological papers are not subject to the same set of criteria to which substantive papers are subjected. For example, consider a paper by Cortina (1993). This paper involved discussions about the meaning and appropriate applications of coefficient alpha. Operationalization of key constructs is virtually a nonissue for such a paper. Furthermore, the research design and the analyses were relatively straightforward. Thus, the number of criteria that must be met by such a paper is considerably smaller than the number that must be met by papers with a more substantive focus.

Although editorial board members demonstrated some content particularism, no evidence was found for content particularism among ad hoc reviewer ratings. Similarly, results showed that research design, research setting, subject type, and data analysis type failed to contribute to the prediction of editorial decisions over and above the relevant ratings provided by the reviewers. However, it should be noted that this result would occur not only when content particularism is absent from editorial decisions, but also when editors and reviewers share forms of content particularism. Given that content particularism was found in editorial board member evaluations, the latter possibility cannot be ruled out.

The research reported in this paper does have limitations. With respect to generalizing to the peer review process as a whole, all of the papers in this study were submissions to a single journal. Ideally, a study like this would sample multiple journals, which would provide more confidence in our ability to generalize results to a wide variety of journals.

A second limitation is that the present study did not examine the content of reviews and decision letters. This left several questions unanswered. For example, what aspects of the content of reviews contribute to editorial decisions over and above ratings? What is the impact of a "fatal flaw" on the correlation between overall reviewer recommendations and dimension ratings? What is the impact of the mention of a "fatal flaw" on the editorial decision? Is such information more prominent in editorial decision letters than is other information? These and other questions remain to be addressed adequately.

A third limitation is that issues regarding the blind review process are not completely resolved. Because of the relatively small number of blind-reviewed articles contained in the present study, comparisons between blind-reviewed and open-reviewed manuscripts were difficult to conduct. In addition, we only considered two forms of social particularism, author gender and affiliation, and did not consider the more difficult to quantify issues of status, reputation, and friendships. The evidence we present suggests that the extent to which blind review is necessary to reduce potential problems such as particularism may have been overestimated in the past. As suggested by one of the reviewers of the present paper, one reason for this might be that "blind" reviews are rarely truly blind, as many reviewers are able to make educated guesses as to the identity of authors based on writing style, content, papers that are cited, and so forth. Research aimed directly at this issue is needed.

Fourth, the present paper did not address the role that authors can play in the selection of reviewers. Editors may often rely not only on paper content, but also on citations within the present manuscript when choosing reviewers. Authors can sometimes manipulate this process by including citations of people that they would like to serve as reviewers, or by failing to include citations for people that they would not want to serve as reviewers. Future research should examine the extent to which editors attend to such information when selecting reviewers.

Fifth, sample size is an issue for some of the analyses. Although the overall sample was large, the number of cases for some analyses (regressions of individual board members' recommendations, and individual editor decisions) was relatively small.

Finally, the usual criticism of policy-capturing studies applies in this case as well; that is, we have considered only linear relationships, and these typically produce very good predictive models even in the presence of significant curvilinearity (e.g., Dawes, 1979). The implication, of course, is that linear models may predict decisions well while still failing to capture the true nature of the decision makers' policies.
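Dawes's point is easy to demonstrate. In the toy simulation below (entirely fabricated for illustration, with an arbitrary functional form and made-up variable names), judgments are generated by a curvilinear, interactive policy, yet an ordinary linear model reproduces them almost perfectly:

    # Toy illustration of Dawes (1979): a linear model fit to judgments
    # produced by a curvilinear, interactive policy still predicts well.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    design, theory = rng.uniform(1, 5, size=(2, n))  # two rating dimensions
    judgment = np.sqrt(design) * theory              # curvilinear, interactive
    X = np.column_stack([np.ones(n), design, theory])
    beta, *_ = np.linalg.lstsq(X, judgment, rcond=None)
    r = np.corrcoef(X @ beta, judgment)[0, 1]
    print(f"linear model vs. true policy: r = {r:.3f}")  # typically r > .95

High predictive accuracy for the linear model is therefore no evidence that the underlying policy is itself linear, which is exactly the limitation noted above.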


In spite of these potential or real limitations, we believe that the results of the analyses reported in this paper indicate that reviewers and editors are focusing their attention on factors that most researchers agree are critical features of empirical research and that they are affected little, if at all, by features of the research or author that might reflect a "bias" of some type.

REFERENCES

Beyer JM, Chanove RG, Fox WB. (1995). The review process and the fates of manuscripts submitted to AMJ. Academy of Management Journal, 38, 1219-1260.

Campbell JP. (1982). Editorial: Some remarks from the outgoing editor. Journal of Applied Psychology, 67, 691-700.

Campion MA. (1993a). Are there differences between reviewers on the criteria they use to evaluate research articles? The Industrial-Organizational Psychologist, 31(2), 29-39.

Campion MA. (1993b). Article review checklist: A criterion checklist for reviewing research articles in applied psychology. PERSONNEL PSYCHOLOGY, 46, 705-718.

Chase JM. (1970). Normative criteria for scientific publication. American Sociologist, 5, 262-273.

Cortina JM. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98-104.

Daft RL. (1985). Why I recommend that your manuscript be rejected and what you can do about it. In Cummings LL, Frost PJ (Eds.), Publishing in the organizational sciences (pp. 193-209). Homewood, IL: Irwin.

Dawes RM. (1979). The robust beauty of improper linear models. American Psychologist, 34, 571-582.

Fiske DW, Fogg L. (1990). But the reviewers are making different criticisms of my paper! Diversity and uniqueness in reviewer comments. American Psychologist, 45, 591-598.

Gomez-Mejia LR, Balkin DB. (1992). Determinants of faculty pay: An agency theory perspective. Academy of Management Journal, 35, 921-955.

Gottfredson S. (1978). Evaluating psychological research reports: Dimensions, reliability, and correlates of quality judgments. American Psychologist, 33, 920-934.

Hitt MA, Barr SH. (1989). Managerial selection decision models: Examination of configural cue processing. Journal of Applied Psychology, 74, 53-61.

Jauch LR, Wall JL. (1989). What they do when they get your manuscript: A survey of Academy of Management reviewer practices. Academy of Management Journal, 32, 157-173.

Kerr S, Tolliver J, Petree D. (1977). Manuscript characteristics which influence acceptance for management and social science journals. Academy of Management Journal, 20, 132-141.

Lindsay D. (1978). The scientific publication system in social science. San Francisco: Jossey-Bass.

Nisbett RE, Wilson TD. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Schwab DP. (1985). Reviewing empirically based manuscripts: Perspectives on process. In Cummings LL, Frost PJ (Eds.), Publishing in the organizational sciences (pp. 171-181). Homewood, IL: Irwin.

Zuckerman H. (1988). The sociology of science. In Smelser NJ (Ed.), Handbook of sociology (pp. 511-571). Newbury Park, CA: Sage.