
PERSONNEL PSYCHOLOGY 1998, 51

SELF-OTHER AGREEMENT: DOES IT REALLY MATTER?

LEANNE E. ATWATER School of Management

Arizona State University West

CHERI OSTROFF Department of Management

Arizona State University

FRANCIS J. YAMMARINO School of Management

SUNY Binghamton

JOHN W. FLEENOR Mediappraise Corp.

A current controversy in the self-other rating and 360-degree feedback literature is the extent to which self-other agreement (and lack of agreement) has an impact on individual and organizational outcomes. Using a large sample and a multi-source data set, the current study addressed some methodological limitations of prior research. Results from polynomial regression analyses demonstrated that both self- and other ratings are related to performance outcomes. This procedure revealed the underlying three-dimensional relationship between self-ratings, other ratings, and effectiveness. Findings indicate that the relationship between self-ratings, other ratings, and outcomes is somewhat more complex than previous conceptualizations in this area. Simultaneous consideration of both self- and other ratings, in terms of their direction and magnitude, is important for explaining effectiveness outcomes.

Upward and 360-degree feedback are rapidly growing in popularity as leadership development and/or performance appraisal tools (London & Smither, 1995). Generally, these processes involve surveying subordinates (or subordinates, peers, supervisors, and customers in the case of 360-degree feedback) about a manager's performance and then providing averaged ratings as feedback to the manager about how others rated him or her. The rationale behind this type of intervention rests with the notion of self-perception. Because individuals are not very good at evaluating themselves similarly to others or objective criteria (Harris & Schaubroeck, 1988), anonymous feedback from subordinates should help managers see themselves as others see them, and provide them with developmental feedback about needed changes in their behavior.

Correspondence and requests for reprints should be addressed to Leanne Atwater, School of Management, Arizona State University West, 4701 West Thunderbird Road, Phoenix, AZ 85069-7100.

COPYRIGHT © 1998 PERSONNEL PSYCHOLOGY, INC.

The feedback process also allows the comparison of self- and other ratings, that is, assessments can be made of whether self-ratings are higher, lower, or in-agreement with ratings provided by others. At present, there is controversy concerning the relevance of self-other agreement for predicting individual outcomes such as performance or effectiveness. Specifically, some earlier work has suggested that whether an individual rates himself or herself similarly to others, or provides high or low self-ratings compared to others, has no impact on whether the self-rater is effective on the job (cf. Fleenor, McCauley & Brutus, 1996). Others have suggested that self-other agreement is related to effectiveness (cf. Atwater & Yammarino, 1997). The controversy concerning the relationship between self-other agreement and effectiveness is the focus of this paper.

Case for Agreement

Recent work by Atwater and Yammarino (1992); Van Velsor, Taylor, and Leslie (1993); Atwater, Roush, and Fischthal (1995); and others has suggested that the degree and direction of self-other agreement are relevant to various outcome measures. For example, Van Velsor et al. (1993) found overraters (those with self-ratings above other ratings) received the lowest subordinate ratings of managerial practices and self-awareness when compared to underraters (those with self-ratings below other ratings) and those in-agreement (those with self-ratings similar to other ratings). Atwater and Yammarino (1992) assessed both predictors and outcomes of self-other agreement and discovered that self-other agreement was a moderator of predictor-leadership and leadership-performance relationships. Specifically, in their study, leadership ratings of overestimators, underestimators, and in-agreement estimators were predicted by different variables. In addition, performance evaluations were positively related to leadership ratings for in-agreement estimators, but not for under- or overestimators. Recommendations for promotion were negatively related to leadership for overestimators, positively related for those in-agreement, and unrelated for underestimators.

We are interested in self-other rating agreement (i.e., the match between one's self-rating and n other ratings of the focal individual). The degree of agreement then permits us to discuss "categories of agreement" (i.e., overestimators, underestimators, in-agreement/good, and in-agreement/poor) as well as the "accuracy" of a self-rating in relation to the other rating. In brief, we are focusing on a simple comparison process of self- and other ratings and not "accuracy" in the more traditional psychometric sense of a rating estimate in relation to some "true score."


The conclusion drawn from this study was that self-other agreement, as an individual difference variable, could moderate predictor-outcome relationships.

Atwater and Yammarino (1997), in their review and model of self-other rating agreement and its relationship to outcomes, suggested that both degree and type of agreement between self- and other ratings were relevant to performance or effectiveness as well as to leadership training and feedback efforts. They proposed a 4-group categorization model that included overestimators, underestimators, in-agreement/good estimators, and in-agreement/poor estimators. In-agreement is defined as self-ratings within a half-standard deviation of other ratings. In-agreement/good estimators have other ratings above the mean and in-agreement/poor estimators have other ratings below the mean. This extension reflected the fact that those agreeing with others that they were good performers likely differed in meaningful ways from those agreeing with others that they were poor performers.

Regarding the effectiveness of these four groups, Atwater and Yammarino (1997) contended that overestimators would be expected to be poorer performers and to be less effective. The effectiveness of underestimators would be mixed because, although underestimators were always trying to improve, they would also likely suffer from low self-confidence. In-agreement/good estimators would be good performers with views of themselves that were similar to those held by others. In-agreement/poor estimators would be poor performers who recognized their poor performance but were unwilling or unable to make changes. In summary, these researchers contended that the direction of difference between self- and other ratings, as well as the magnitude of that difference, were relevant to predicting and understanding the self-rater's performance. An opposing argument is presented below.

Case Against Agreement

Recently, Fleenor et al. (1996) expanded the 4-group model proposed by Atwater and Yammarino (1997) to six groups. They proposed that overestimators and underestimators could receive either above- or below-average scores from others. In their model, high and low performers are distinguished from overestimators, underestimators, and in-agreement estimators, resulting in six categories. Using the 6-group categorization model, they tested relationships between self-other rating congruence and effectiveness and concluded that variance in effectiveness for over-, under-, and in-agreement estimators could be accounted for simply by others' ratings (i.e., whether ratings were above or below the mean), and that the degree or type of agreement was unimportant.


Brutus, Fleenor, and Taylor (1996) drew a similar conclusion: effectiveness ratings were accounted for primarily by peer ratings, and neither self-ratings nor self-other agreement had much relevance. In summary, they concluded that although self-other agreement may be a relevant comparison for training purposes, it was not useful for predicting or understanding individual performance or effectiveness.

Problems with Previous Work

Both conceptual and methodological problems have hindered the advancement of self-other agreement research. Specifically, researchers have largely neglected two critical issues: (a) the conceptual clarification of what self-other agreement means, and (b) the relationship between the precise conceptualization and the appropriate operationalization of self-other agreement. That is, researchers in the area of self-other agreement have not provided explicit definitions of the conceptual form of the relationship between self-ratings, other ratings, and outcomes, nor have they provided strong rationales for choosing one analytic strategy or measure of self-other agreement over another.

Many researchers in the self-other agreement area have suggested that self- and other ratings interact in influencing outcomes. Yet, one problem with this perspective is that little theoretical explanation has been provided as to how the two elements "interact" (e.g., Edwards, 1991; Pervin, 1978; Terborg, 1981). One way to view how the two elements interact is based on the well-accepted definition of interaction, whereby the relationship between an independent variable and a dependent variable is different for different levels of a third variable. In other words, such an interaction in the self-other agreement area would imply that the slope of the line depicting the relationship between other ratings and outcomes would differ for different levels of self-ratings. Further, when considering the form of the interaction, the underlying assumption is that agreement results in higher outcomes than a lack of agreement: when both self and other agree that the ratee is "good," and when self and other believe the ratee is "poor," outcomes will be high; whether self- and other ratings are high or low is unimportant. Although in none of the studies reviewed was it clear that self-other agreement researchers implied this form of a relationship between self-ratings, other ratings, and outcomes (e.g., Brutus et al., 1996), researchers have often assumed that a significant interaction is a requirement for supporting self-other agreement hypotheses. Further, in recent years, it has been clearly established that agreement relationships can take a wide variety of functional forms (e.g., Edwards, 1993; Edwards, 1994; Edwards & Van Harrison, 1993; Kulka, 1979). The key issues are specifying the form of the relationship and using appropriate analytical techniques that allow discovery of the form.
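
To make the implied functional form concrete, such a moderated regression can be written (a schematic sketch in our notation, not an equation taken from the studies cited) as Outcome = b0 + b1(Other) + b2(Self) + b3(Self × Other) + e, in which the slope relating other ratings to outcomes, b1 + b3(Self), changes with the level of self-ratings; a nonzero b3 is what a test of this interaction requires.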

As noted earlier, the general consensus in conceptualizations of self-other agreement is that overestimators should have lower outcomes than underestimators. Further, individuals whose ratings correspond to those of others when the ratings are at a high level (in-agreement/good) should have the highest outcomes, while individuals whose ratings correspond to those of others when the ratings are at a low level (in-agreement/poor) should have low outcomes. However, the relative effects of disagreement at varying levels of the attribute have not been clearly specified. Consequently, at present, the specific form of the hypothesized relationship between self-ratings, other ratings, and outcomes is unclear.

Nevertheless, a wide variety of indices have been used to represent self-other agreement (e.g., algebraic difference scores, categorical agreement, and interaction terms), and these different indices of self-other agreement imply very different functional forms of the relationships among the variables of interest. In some studies, an agreement index has been created based on a correlation between self and other scores (e.g., London & Wohlers, 1991; Wohlers & London, 1989), or based on the magnitude of the difference score between self-ratings and other ratings (e.g., Nilsen & Campbell, 1993), and then these agreement indices have been correlated with an outcome variable such as performance. In other studies (e.g., Brutus et al., 1996; Nowack, in press), regression-based models have been used to examine the effect of self-ratings, other ratings, and their interaction on some outcome. And, in yet other studies (e.g., Atwater & Yammarino, 1992; Atwater, Roush, & Fischthal, 1995; Fleenor et al., 1996; Roush & Atwater, 1992; Van Velsor, Taylor, & Leslie, 1993), categorical agreement has been used whereby individuals are grouped into categories based on the difference or the absolute difference of self-other scores, and then differences among mean scores on the outcome variables are examined. The problem with these approaches is two-fold. First, each of these indices assumes a specific type of functional form to represent the self-other agreement model. However, the functional form of the relationship among the variables to be tested has rarely been theoretically specified. Second, the indices used to assess agreement are often flawed (see Edwards 1993, 1994 for a thorough discussion of their problems). Rarely have researchers in this area considered both the theoretical functional form of self-other agreement and the appropriate operational procedures for testing the functional form hypothesized. With one exception (Brutus et al., 1996), researchers have devised single indices to represent the degree of self-other agreement. Optimally, the relationship between self-ratings, other ratings, and outcomes should be conceptualized in three dimensions (Edwards, 1994). In such a way, both self- and other ratings can be viewed as distinct constructs and a more accurate picture of the relationship can be detected.

Earlier work has suggested a wide variety of functional forms of congruence or agreement (for a more complete review of congruence models and functional forms, see Edwards 1991, 1993, 1994; Kulka, 1979). Forms include an absolute difference model, where correspondence between self- and other ratings results in the highest outcomes and lack of agreement results in lower outcomes (see Edwards, 1993, for a graphic depiction of this absolute difference model). An alternative hypothesis regarding the relationship between self-ratings, other ratings, and outcomes is based on a monotonic relationship: as self-ratings become lower than other ratings, outcomes increase; as self-ratings become higher than other ratings, outcomes decrease (see Edwards' 1993 algebraic difference model for a graphic depiction of this relationship). Researchers (e.g., Nilsen & Campbell, 1993) who have used algebraic differences between self- and other scores have implicitly tested this model. In a third model, an additive main effects model, both self-ratings and other ratings are believed to be related to outcomes. Along the line of perfect agreement, agreement at higher levels of rated behaviors results in higher outcomes than agreement at lower levels. Further, from any given point of agreement, as one moves away from perfect agreement, lack of agreement when self-ratings are lower than other ratings (underestimation) is related to higher outcomes than lack of agreement when self-ratings are higher than other ratings (overestimation). This is a linear model with both variables having a positive (but unequal) relationship to outcomes. Researchers hypothesizing 4-group and 6-group conceptualizations of agreement (cf. Atwater & Yammarino, 1997; Fleenor et al., 1996) implicitly hypothesize some aspects of this functional form, but specific tests of this form have not been conducted.
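
Expressed schematically (our summary of the forms just described, not equations reproduced from the cited studies), with S for the self-rating and O for the other rating: the absolute difference model is Outcome = b0 + b1|S - O| with b1 negative; the algebraic difference model is Outcome = b0 + b1(S - O) with b1 negative; and the additive main effects model is Outcome = b0 + b1(S) + b2(O) with b1 and b2 positive but typically unequal. Each single index of agreement forces the analysis into one of these shapes before the data are examined.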

Finally, some researchers (e.g., Brutus et al., 1996) have concluded that self-other agreement is not relevant, and that effectiveness is accounted for solely by others' ratings. This is a single main effects model (see Edwards, 1993, for a graphic depiction), with only one element (other ratings) having a significant positive relationship to outcomes.

A variety of possible agreement models can be delineated, and those described above do not encompass all previous conceptualizations of agreement. Nevertheless, we believe it is critical that self-other researchers begin to specify the functional form of the agreement model they are hypothesizing so that conceptual representations of agreement are explicitly defined. Further, serious problems are associated with previous operationalizations of agreement (e.g., difference scores, correlations, categorical agreement), and problems with using these types of measures have been clearly delineated (e.g., Edwards, 1994). The use of these indices provides no conceptual or analytic advantage over using separate measures (e.g., separate self- and other ratings), and the computation of a single index of agreement (e.g., a difference score, categories) should be avoided in favor of separate measures (Edwards, 1991). Thus, we strongly advocate that self-other agreement researchers determine the functional form of the hypothesis they wish to test, and test the form using the quadratic model advocated by Edwards (1991, 1994). That is, regression-based models which use five predictors (self, other, self squared, other squared, and the product of self- and other ratings) yield a surface that can have slope, curvature, and tilt, and can test virtually any functional form of agreement, while avoiding problems associated with commonly used indices of agreement. (See Edwards, 1993, 1994 for more details regarding the observed betas for supporting other functional forms.)
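
Written out, the unconstrained quadratic model referred to here (with S for the self-rating and O for the other rating) is Outcome = b0 + b1(S) + b2(O) + b3(S²) + b4(S × O) + b5(O²) + e. Each of the simpler agreement indices corresponds to a constrained special case of this equation, so estimating all five terms lets the data indicate which functional form actually holds.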

Current Study

The relevance of the type or direction of agreement (over, in-agreement, under), as well as the degree of agreement and the level of behavior as rated by self and other, needs to be considered if the effects of self-other agreement on outcomes are to be assessed accurately. Previous conceptualizations of self-other agreement (e.g., Atwater & Yammarino, 1997; Fleenor et al., 1996) have proposed that: (a) overestimators have low effectiveness; (b) individuals whose ratings correspond with others that their performance is high have the highest effectiveness; (c) individuals whose ratings correspond with others that their performance is low have the lowest effectiveness; and (d) underestimators may have high or low effectiveness depending on the nature of the outcome measure. This conceptualization suggests an additive model where both self- and other ratings, but not their interaction, are related to outcomes.

Rationale for hypothesizing an additive model of self-other agreement and effectiveness. The conceptual rationale for adopting the model that represents two positive main effects for self and other is rather straightforward. First, there is a large literature on the value of "other" sources for providing valid and reliable ratings and the relationship of these ratings to various outcomes (e.g., performance; see Bass, 1990; Edwards, 1993, 1994; Harris & Schaubroeck, 1988; Landy & Farr, 1980; Wohlers & London, 1989). This work provides justification for an "other" main effect in any model adopted.


However, other ratings alone do not explain or account for all of the variance in criteria. There is also literature on the value of the "self" for providing ratings. Although these self-ratings may be less valid and reliable than other ratings (see Hoffman, 1923; Podsakoff & Organ, 1986; Thornton, 1980) and may account for less variance in certain criteria (cf. Podsakoff & Organ, 1986), they are nevertheless important in the performance and related literatures (see Bass, 1990; Harris & Schaubroeck, 1988; Landy & Farr, 1980). This work provides justification for the "self" main effect in any model adopted to predict outcomes.

Third, again given an extensive literature (e.g., Ashford, 1989; Bass, 1990; Edwards, 1993, 1994; Harris & Schaubroeck, 1988; Landy & Farr, 1980; Mabe & West, 1982; Thornton, 1980), these "self" and "other" sources tend not to provide ratings that correlate very well with one another. This may be due, in part, to the relatively different perspectives of the focal individual held by the self and various relevant others. This lack of relationship and correlation, as well as the unique perspectives, diminishes the likelihood of the self and other sources working in an interactive way; however, this does not preclude simultaneous consideration of self- and other ratings.

The expected impact of self- and other ratings on outcomes. Self-perception and self-ratings are affected by numerous factors (e.g., age, gender, tenure, personality characteristics, self-esteem, comparative information; see Atwater & Yammarino, 1997, for an extensive review and integration); and these factors determine the accuracy of self-ratings. When these self-ratings are compared to ratings provided by others (which also are affected by numerous factors), the degree of agreement between self- and other ratings can be used to characterize self-raters in terms of their agreement/disagreement with ratings provided by others.

Researchers (e.g., Paulhaus, 1986; Sackheim, 1983; Taylor & Brown, 1988) have suggested that when self-ratings are higher than ratings provided by others (i.e., self-raters are overestimators), the difference often results from self-enhancement bias. Although this bias reflects a positive self-view and results in fewer negative thoughts and higher expectations of success in new endeavors, it also leads overestimators to ignore criticism and discount failure which, in turn, may result in poorer future performance (cf. Atwater & Yammarino, 1997; Paulhaus, 1986; Sackheim, 1983; Taylor & Brown, 1988).

As noted by Ashford (1989) and Taylor and Brown (1988), self-perception is a key element of the self-regulation process. In the case of enhanced self-assessment (i.e., overestimators), a falsely positive sense of accomplishment may lead people to pursue tasks for which they are ill-suited, or unrealistic optimism may lead people to ignore risks or fail to prepare for uncertainties. These overestimators also are likely to misdiagnose their strengths and weaknesses, and have negative attitudes stemming from the fact that they rarely feel they receive the credit they deserve. Moreover, empirical studies have demonstrated that individuals who provided inflated self-ratings relative to the ratings of others are poorer performers, less effective leaders, and more likely to suffer career derailment (e.g., Atwater & Yammarino, 1992; Bass & Yammarino, 1991; Flocco, 1969; Van Velsor et al., 1992).

In the case of negative self-perceptions and assessments lower than those of others (i.e., underestimators), unfounded negative assessments can keep individuals from pursuing options for which they are qualified; yet, these underestimators tend to be somewhat successful and effective because they overestimate their weaknesses and underestimate their strengths. This tendency to overestimate weaknesses could lead the individual to work hard to compensate. Consequently, a tendency to compensate for weaknesses with hard work will likely result in greater success at tasks they undertake (see Ashford, 1989; Atwater et al., 1995; Atwater & Yammarino, 1997; Taylor & Brown, 1988; Yammarino & Atwater, 1997).

In contrast, work by Smircich and Chesser (1981) suggested that agreement, whether based on high or low ratings (i.e., in-agreement/good and in-agreement/poor, respectively), is preferable to disagreement (i.e., under- or overestimation) because it indicates some level of understanding between self and other. In the optimal case, where self- and other ratings are in agreement and favorable (i.e., in-agreement/good), the focal individual understands how he or she is seen by others, feedback to this individual is positive, expectations for reward and recognition are realistic, and very positive outcomes should result (e.g., high performance, effective leadership, and promotability) (see Atwater et al., 1995; Atwater & Yammarino, 1997; Yammarino & Atwater, 1997). In the case where self- and other ratings are in agreement and unfavorable (i.e., in-agreement/poor), the focal individual understands how he or she is seen by others, but feedback to this individual will likely be negative and self-worth low. Motivation to improve may be high or low, depending on whether the individual believes he or she can improve and whether the resources are available for improvement. Negative outcomes generally result for the in-agreement/poor estimators (e.g., low performance and promotability) and few actions are taken to improve performance (see Atwater et al., 1995; Atwater & Yammarino, 1997; Smircich & Chesser, 1981; Smither et al., 1995; Yammarino & Atwater, 1997).

Thus, based on the attitudes and behaviors expected to accompany self-rating tendencies and self-other agreement, we expect the type and degree of self-other agreement to be related to the effectiveness of the self-rater. We propose, following the procedure outlined by Edwards (1993, 1994), that self-ratings and other ratings be viewed as separate measures and that the form of the relationship between self-ratings, other ratings, and outcomes be viewed in three dimensions. Further, we hypothesize that the functional form of agreement will be an additive one whereby self- and other ratings will be positively related to outcomes. We do not hypothesize non-linear or interaction effects for self- and other ratings. Rather, the hypothesis is that both self- and other ratings must be considered simultaneously in their relationship to outcomes. That is, we believe that along the line of perfect agreement (self-ratings equal to other ratings), outcomes will be greater as ratings are higher. For any given point of agreement, deviations from agreement will be such that underestimators will have equal to or only slightly lower outcomes than those in-agreement, and overestimators will have lower outcomes than both underestimators and those in-agreement. In addition, given earlier work done on the validity of self- and other ratings, we expect other ratings to account for more variance than self-ratings. Finally, most previously cited literature has addressed manager-subordinate agreement. Based on prior research, the relationship hypothesized above would be expected for manager-subordinate agreement. However, with the increasing use of 360-degree feedback instruments where peers also provide ratings, it would be interesting to compare the functional form of self-peer congruence with self-subordinate congruence in relation to outcomes. As such, in this study, we explored the relationship between self-ratings, peer ratings, and outcomes, as well as the relationship between self-ratings, subordinate ratings, and outcomes.

Method

Sample

Data were collected from and about 1,460 managers who participated in leadership development programs. As part of each program, a multi-rater feedback instrument was completed by the manager, his or her peers, and subordinates. The manager's effectiveness was rated by each manager's direct supervisor. The sample of managers was 69% male and 88% White. The average age of the managers was 42. Participants represented six levels of management and a variety of public and private sector organizations.

Fourteen hundred and forty-six managers completed self-ratings, 3,939 subordinates completed surveys about their managers, 3,958 peers completed surveys about managers, and 1,012 direct supervisors of the managers completed effectiveness ratings of the managers. The number of subordinate surveys completed per manager ranged from 1 (for 3% of the managers) to 14 (for less than 1% of the managers). The average number of subordinate surveys per manager was 4. The number of peer surveys completed per manager ranged from 1 (for 3% of the managers) to 22 (for less than 1% of the managers). The average number of peer surveys per manager was 5. In total, 1,326 managers had data from subordinates, self, and supervisor, and 1,374 had data from peers, self, and supervisor.

Measures

Benchmarks, a multi-rater feedback instrument that collects ratings on managerial behaviors (Lombardo & McCauley, 1994; McCauley, Lombardo, & Usher, 1989; Zedeck, 1995), was used to collect performance data about managers in this study. Included in the survey are 16 scales (106 items) that measure a variety of managerial strengths and weaknesses in areas such as leading people, building and mending relationships, and acting with flexibility. All items are assessed on a 5-point scale, where 5 is high. For the purposes of this study, the 16 scales were averaged into one measure of overall managerial performance. Prior research has demonstrated high reliability for the scales as well as acceptable validity as a measure of managerial performance (see Lombardo & McCauley, 1994).

Scale reliabilities (Cronbach's alphas) on the Benchmarks scales, calculated for each rater group (i.e., for self, subordinate, and peer ratings), were high for all measures (i.e., ranging from .75 to .93), indicating adequate reliability for use in subsequent analyses (see Lombardo & McCauley, 1994). When the scales were used as items, alpha reliabilities ranged from .89 to .95 (see Brutus, Fleenor, & London, in press).
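
For readers who want to reproduce this kind of reliability check on their own multi-rater data, a minimal sketch of the alpha computation is shown below; the function, the column names, and the small illustrative data set are ours and are not drawn from the Benchmarks instrument.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of item columns (rows = respondents)."""
    items = items.dropna()
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the scale total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point ratings: 6 respondents, 4 items
ratings = pd.DataFrame({
    "item1": [4, 3, 5, 4, 2, 4],
    "item2": [4, 3, 4, 5, 2, 4],
    "item3": [5, 3, 4, 4, 3, 5],
    "item4": [4, 2, 5, 4, 2, 4],
})
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```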

To justify aggregation across raters for a manager, intraclass correlations (ICCs) were computed. ICCs for subordinate ratings on Benchmarks had been computed for a previous comparable sample of managers (see Fleenor et al., 1996) and were found to range from .47 to .70, which was adequate to justify aggregating subordinate ratings (Van Velsor & Leslie, 1991). We computed ICCs for the peer ratings on Benchmarks for two samples of 500 managers in this study and obtained ICCs of a similar range (.43 to .69). These ratings also were considered high enough to justify aggregating peer ratings for managers with multiple raters. Thus, each manager received an aggregated score across subordinates and an aggregated score across peers, as well as a self and a supervisor score.
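
Aggregation checks of this kind can be reproduced with a one-way random-effects ANOVA. The sketch below computes ICC(1) and the reliability of the group mean, ICC(k); the study does not report which variant was used, and the data frame and column names here are hypothetical.

```python
import pandas as pd

def one_way_icc(df: pd.DataFrame, group: str, value: str):
    """ICC(1) and ICC(k) from a one-way random-effects ANOVA.

    group : column identifying the manager being rated
    value : column holding an individual rater's score
    """
    groups = df.groupby(group)[value]
    k = groups.ngroups
    n_j = groups.size().to_numpy(dtype=float)        # raters per manager
    N = n_j.sum()
    grand_mean = df[value].mean()
    group_means = groups.mean()

    ss_between = float((n_j * (group_means.to_numpy() - grand_mean) ** 2).sum())
    ss_within = float(((df[value] - df[group].map(group_means)) ** 2).sum())
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)
    n0 = (N - (n_j ** 2).sum() / N) / (k - 1)        # adjusted average group size

    icc1 = (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
    icc_k = (ms_between - ms_within) / ms_between    # reliability of the group mean
    return icc1, icc_k

# Hypothetical long-format ratings: one row per (manager, rater)
data = pd.DataFrame({
    "manager": ["m1"] * 4 + ["m2"] * 5 + ["m3"] * 3,
    "rating":  [3.8, 4.0, 3.6, 3.9, 3.1, 3.3, 2.9, 3.0, 3.2, 4.4, 4.6, 4.5],
})
print(one_way_icc(data, "manager", "rating"))
```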

The outcome measure used in this study was a measure of managerial effectiveness collected from each manager's supervisor on a 16-item survey.


TABLE 1
Means, Standard Deviations, Alphas, and Correlations for Managerial Performance (MP) and Effectiveness Ratings

Variable                                      Mean   SD    Alpha   SELF     SUB      PEER
Self ratings of MP (SELF)                     3.79   .32   .89
Subordinate ratings of MP (SUB)               3.69   .41   .95     .25***
Peer ratings of MP (PEER)                     3.68   .38   .94     .26***   .50***
Supervisor ratings of effectiveness (EFF)     3.62   .58   .91     .25***   .33***   .31***

*** p < .001

It should be noted that this outcome measure is a perceptual measure and an "other rating." However, the effectiveness measure was collected from a different source on a different instrument than the self- and other ratings of managerial behaviors. More objective measures (i.e., not perception based) would have been desired, but were unavailable.

The effectiveness items assessed elements such as starting a project from scratch, negotiating a major contract, and taking on many additional responsibilities. These items also were assessed on a 5-point scale, where 5 was a high score. (The effectiveness items are included in the Appendix.) Each manager received a score representing the average of the 16 items as a measure of his or her managerial effectiveness.

Results

Means, standard deviations, alpha coefficients, and intercorrelations among the self, subordinate, and peer ratings of managerial performance and the supervisor ratings of effectiveness are presented in Table 1.

To test the functional form of the congruence-outcome relationships, hierarchical regressions were computed whereby supervisory ratings of effectiveness were regressed on self-ratings and other ratings in the first step (Model 1), and the cross-product of self times other ratings, the square of self-ratings, and the square of other ratings were added in the second step (Model 2). A significant increase in R² in Step 2 indicates a nonlinear relationship between effectiveness and self- and other ratings.
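
As an illustration of this two-step procedure, a minimal sketch in Python is shown below (using statsmodels); the data are simulated stand-ins for the self-, other-, and supervisor-effectiveness ratings described above, and the variable names are ours.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated 5-point ratings standing in for the study's variables
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "self_r": rng.normal(3.8, 0.3, n).clip(1, 5),
    "other_r": rng.normal(3.7, 0.4, n).clip(1, 5),
})
df["eff"] = (0.3 * df["self_r"] + 0.4 * df["other_r"]
             + rng.normal(0, 0.5, n)).clip(1, 5)

# Higher-order terms for the quadratic (Model 2) step
df["self_sq"] = df["self_r"] ** 2
df["other_sq"] = df["other_r"] ** 2
df["self_x_other"] = df["self_r"] * df["other_r"]

# Step 1: linear terms only (Model 1)
m1 = smf.ols("eff ~ self_r + other_r", data=df).fit()
# Step 2: add the squared terms and the cross-product (Model 2)
m2 = smf.ols("eff ~ self_r + other_r + self_sq + self_x_other + other_sq",
             data=df).fit()

# F test for the increase in R-squared contributed by the higher-order terms
f_stat, p_value, df_diff = m2.compare_f_test(m1)
print(f"R2 Model 1 = {m1.rsquared:.3f}, R2 Model 2 = {m2.rsquared:.3f}")
print(f"Delta R2 F = {f_stat:.2f}, p = {p_value:.4f}")
```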

Results for supervisory ratings of effectiveness when self- and subordinate ratings were considered are presented in Table 2, and results for effectiveness when self- and peer ratings were considered are presented in Table 3.


TABLE 2
Regressions of Managerial Effectiveness on Self- and Subordinate Ratings

                                   Model 1            Model 2
Variable                           Betas (s.e.)       Betas (s.e.)
Self rating                        .36** (.05)        -1.70* (.75)
Subordinate rating                 .39** (.04)        .83 (.52)
Self × self                                           .28** (.10)
Self × subordinate                                    -.01 (.11)
Subordinate × subordinate                             -.06

R                                  .38                .39
R²                                 .14                .15
F                                  111.29**           46.50**
R² Δ                                                  .01
F Δ                                                   2.97*

* p < .05   ** p < .01

The pattern of results was similar in both cases.

Given the significance of both Model 1 (self- and other ratings) and Model 2 (the full model with all five terms entered), and given the significance of the increase in R² indicating that the higher-order terms account for significant additional variance in effectiveness, Edwards (1993, 1994) suggests interpreting the surface which corresponds to the equations (see Figures 1 and 2). According to Edwards and his colleagues (Edwards & Parry, 1993; Edwards & Van Harrison, 1993), salient features of the surface can be identified in the following manner. First, we let a1 = b1 + b2, and a2 = b3 + b4 + b5, where b1 is the beta for self-ratings, b2 is the beta for other ratings, b3 is the beta for self squared, b4 is the beta for the cross-product of self and other, and b5 is the beta for other squared. If a1 differs from zero, then there is a linear slope along the line of perfect agreement, when self = other (S = O). Further, if a2 is positive, then this surface is curved upward, or is convex, along the S = O line. In our case, for self-subordinate ratings, a modest convex slope was observed (a1 = -.369, p < .07; a2 = .209, p < .10). For self-peer ratings, the line was convex (a1 = -1.306, p < .01; a2 = .277, p < .05). That is, when self- and other ratings are in agreement and high, effectiveness is high. Effectiveness decreases as self- and other ratings agree and become lower, and then increases again slightly when self- and other ratings agree and are very low.


TABLE 3
Regressions of Managerial Effectiveness on Self- and Peer Ratings

                                   Model 1            Model 2
Variable                           Betas (s.e.)       Betas (s.e.)
Self rating                        .29** (.05)        -1.60* (.72)
Peer rating                        .49** (.05)        .29 (.54)
Self × self                                           .30** (.10)
Self × peer                                           -.10 (.13)
Peer × peer                                           .08 (.06)

R                                  .40                .41
R²                                 .16                .17
F                                  131.01**           54.97**
R² Δ                                                  .02
F Δ                                                   3.75*

* p < .05   ** p < .01

We also tested the slopes along the reverse of the S = O line (e.g., when self = 5, other = 1). Here, we let x1 = b1 - b2, and x2 = b3 - b4 + b5. Similar to the interpretations along the S = O line, if x1 differs from zero, there is a linear slope along the reverse S = O line, and if x2 is greater than 0, the surface curves upward along this reverse line. In our case, this line was convex for self-subordinate ratings (x1 = 2.52, p < .01; x2 = .232, p < .07) and self-peer ratings (x1 = 1.89, p < .01; x2 = .484, p < .01). These results indicate that effectiveness is high for severe underestimators and is lower for severe overestimators.

In addition, it is possible to examine some trends in the surface by considering lateral shifts in the surface along the S = O line, perpendicular to the S = O line. This indicates whether the minimum value or lowest effectiveness score is displaced laterally from the S = O line. The magnitude and direction of this shift is determined by the quantity (b2 - b1) / [2(b3 - b4 + b5)]. A positive value indicates a shift toward the region where S > O, and a negative value indicates a shift toward the region where S < O. Further, it is possible to examine rotations in the surface, such that the minimum does not lie along or is not parallel to the S = O line. If b3 is less than b5, the surface rotates clockwise, and if b3 is greater than b5, the surface rotates counterclockwise. The magnitude of the rotation is also determined by b4, with larger rotations for smaller values of b4. In our analyses, b3 was greater than b5, with a fairly small value for b4, indicating a clockwise rotation.


Figure 1: Results for Effectiveness with Self- and Subordinate Ratings

In the present study, for both self-subordinate and self-peer analyses, the value indicating a lateral shift was positive, suggesting that the minimum value of effectiveness occurs in the region where self-ratings are greater than other ratings (overestimators), in the region where self-scores are moderate and other scores are low. Further, b3 was slightly larger than b5, indicating some clockwise rotation such that minimum values are not parallel to the S = O line.
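
The surface features just described can be computed directly from the five Model 2 coefficients. The sketch below is a small helper in the same hypothetical Python setting as the earlier regression example; the coefficients passed in the illustrative call are made up and are not the study's estimates.

```python
def surface_features(b1, b2, b3, b4, b5):
    """Response-surface features of the quadratic model
    Outcome = b0 + b1*S + b2*O + b3*S**2 + b4*S*O + b5*O**2."""
    a1 = b1 + b2                 # slope along the line of perfect agreement (S = O)
    a2 = b3 + b4 + b5            # curvature along that line (positive = convex)
    x1 = b1 - b2                 # slope along the reverse line (e.g., self = 5, other = 1)
    x2 = b3 - b4 + b5            # curvature along the reverse line
    shift = (b2 - b1) / (2 * (b3 - b4 + b5))   # lateral shift of the surface minimum
    rotation_cue = b3 - b5       # the text judges rotation from b3 versus b5 (b4 small)
    return {"a1": a1, "a2": a2, "x1": x1, "x2": x2,
            "lateral_shift": shift, "b3_minus_b5": rotation_cue}

# Illustrative call with made-up coefficients (not the study's estimates)
print(surface_features(b1=-1.2, b2=0.9, b3=0.25, b4=-0.05, b5=0.05))
```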

It should also be noted that lack of agreement between self- and other ratings is represented by movement along the lines perpendicular to the S = O line.


Figure 2: Results for Effectiveness with Self- and Peer Ratings

In other words, given a point along the S = O line (e.g., 3, 3) with a sum of S + O (e.g., 6), the corresponding self- and other points must sum to this S + O value (e.g., self = 4, other = 2; self = 5, other = 1; self = 2, other = 4). Thus, given any point along the S = O line, lack of agreement is represented by movement along lines perpendicular to the S = O line. In our case, we find that for a given point along the S = O line, beginning with the points 3.5 and below, effectiveness increases as S < O (underestimation) and effectiveness decreases as S > O (overestimation).


We are grateful to an anonymous reviewer for clarifying this point.


Beginning at high ranges of S = O (e.g., above 4), effectiveness increases slightly with both overestimation and underestimation, and the increase is slightly greater for overestimators.

In general, these results indicate that effectiveness tends to be greater when both self- and other ratings are high than when they are low. Further, for any given level of ratings, effectiveness tends to be lower when self-ratings are greater than other ratings, particularly at lower levels of ratings.

Discussion

Although earlier work has questioned the relevance of self-other agreement to managerial outcomes of effectiveness or performance, results of the present study lend support for the importance of considering both self-ratings and other ratings. Specifically, some research (Brutus et al., 1996; Fleenor et al., 1996) has suggested that self-other agreement was not relevant; that it was the confounding of agreement group and performance level that allowed agreement group to appear important. These authors have concluded that simultaneous consideration of both self- and other ratings is of little importance; other ratings are most important for explaining managerial outcomes.

Results from the present study clearly show the importance of first conceptualizing the form of the agreement relationship and then using appropriate tests of the hypothesis. Some of the confusion and contradictory findings in previous studies can be explained by the failure to consider the functional form of the agreement relationship and the failure to conduct appropriate tests of the hypotheses implied by the researchers. These methodological issues are important ones. Recent studies have attempted to assess the relevance of self-other agreement using regression analysis, entering self, other, and the self-other interaction terms to predict a criterion, and concluding that if the interaction term was not significant, self-other agreement was not important. Using the cross-product term (self × other) specifies a particular form of the relationship between self-ratings, other ratings, and outcomes. This form of the relationship is not reflective of some of the recent theorizing in the literature (e.g., Atwater & Yammarino, 1997; Fleenor et al., 1996). In fact, the functional form of the relationship between self-ratings, other ratings, and managerial outcomes has not been clearly specified in previous conceptualizations.

The results presented here indicate that the relationship between self-ratings, other ratings, and outcomes may be more complex than earlier conceptualizations of the relationship suggested.


We find that it is important to simultaneously consider self-ratings and other ratings in explaining managerial effectiveness. Further, it is important to consider the magnitude of the ratings and the direction of lack of agreement (i.e., self greater than other vs. self less than other). Our results indicate that effectiveness is highest when both self- and other ratings are high, and when self-ratings are substantially lower than other ratings (severe underestimation). Effectiveness is lowest for overestimators when self-ratings are moderate and subordinate ratings are low. Further, considering the lines representing lack of agreement, movement away from the point of agreement (when this self = other agreement point is moderately high or below) reveals that effectiveness tends to increase for underestimators and decrease for overestimators.

The results from this study are generally consistent with our predictions about the form of the relationship. We hypothesized that along the self-other agreement line, effectiveness would be greater as rating levels increased. This trend was supported, although we found a slight upward curvature in the line such that effectiveness decreased and then had a slight increase when ratings were in agreement at low levels of the rated behaviors. In terms of lack of agreement, we hypothesized that at any given point of agreement, effectiveness would decrease as lack of agreement increased when self-ratings are greater than other ratings (overestimators). Some support was found for this notion at lower levels of the attribute, but not at higher levels. It was also hypothesized that at any given point of agreement, effectiveness would be the same or would decrease only slightly when self-ratings are less than other ratings (underestimators). Our findings suggest that effectiveness increases with underestimation.

It may be that those self-raters who receive high ratings from others but provide still higher self-ratings are effective because their weaknesses have a minor impact on their effectiveness, whether they recognize them or not. However, those who are rated mediocre or low by others, yet higher by themselves, may have critical weaknesses they do not recognize which are negatively impacting their effectiveness. In terms of the underestimators being more effective, it may represent an interest in continually striving to improve and not becoming overconfident or complacent. As an anecdote, we have conducted 360-degree feedback sessions where feedback recipients see their self-ratings and those provided by others. It is always interesting to note that the majority of those with the highest other ratings and lowest self-ratings are very concerned about any area where they were rated low by others. Although they do not see themselves doing that well, they sure want others to see them as a good performer.


Future research is clearly needed to better understand the reasons for relationships between self- and other ratings and effectiveness.

In summary, this study has clearly supported the relevance of considering both self-ratings and other ratings in explaining outcomes such as supervisor ratings of managerial effectiveness. Research should continue to assess those outcomes for which self-other agreement matters and those for which it is unimportant. That is, self-other agreement may be important for some performance and outcome measures, but not for others. For example, our outcome measure was a rating of effectiveness based on supervisors' perceptions. Although the ratings of effectiveness were obtained from a different source on a different survey instrument than the self- and other ratings, both predictors and outcome were based on ratings and thus likely shared some method variance.

It is also possible that self-other agreement is most relevant to outcomes that involve human perceptions and less relevant to more objective measures such as sales volume or meeting productivity goals. For example, if overestimation is considered almost "trait-like" in that it may represent a constellation of individual variables such as arrogance and lack of self-awareness, this rating tendency may be most relevant to outcome measures that reflect interpersonal relationships, such as perceptions of effectiveness. Lack of self-awareness is more likely to impact interpersonal relationships than meeting productivity goals. A hypothetical example follows. Highway patrol officers being considered for promotion are evaluated both on the number of contacts (e.g., arrests, stops) and their leadership potential. Overestimation may not be related to number of contacts, but may be related to judgments of leadership potential.

More research is needed with different types of outcome measures as well as with different constituencies of comparison others. Most research has used subordinates as the comparison others. In this study, both subordinates and peers were used as comparison others. More research is needed to determine the forms of agreement that are appropriate for different comparison groups.

Finally, we should continue to investigate just what being an over- or underestimator means. Is there a personality constellation that this rating style represents, as we suggested above? To what extent does feedback temporarily or permanently alter self-other agreement? If overestimators become in-agreement raters or underestimators, do perceptions of their managerial effectiveness or other outcomes improve? These and other questions should be addressed as we continue to try to understand the relevance of self-other agreement and its influence on individual and organizational outcomes.


REFERENCES

Ashford S. (1989). Self-assessments in organizations: A literature review and integrative model. In Cummings LL, Staw BM (Eds.), Research in organizational behavior (Vol. 11, pp. 133-174). Greenwich, CT: JAI Press.

Atwater L, Roush P, Fischthal A. (1995). The influence of upward feedback on self and follower ratings of leadership. PERSONNEL PSYCHOLOGY, 48, 35-59.

Atwater L, Yammarino F. (1992). Does self-other agreement on leadership perceptions moderate the validity of leadership and performance predictions? PERSONNEL PSYCHOLOGY, 45, 141-164.

Atwater L, Yammarino F. (1997). Self-other rating agreement: A review and model. Research in Personnel and Human Resource Management, 15, 121-174.

Bass B. (1990). Bass and Stogdill's handbook of leadership. New York: Free Press.

Bass B, Yammarino F. (1991). Congruence of self and others' leadership ratings of naval officers for understanding successful performance. Applied Psychology: An International Review, 40, 431-454.

Brutus S, Fleenor J, London M. (in press). Does 360-degree feedback work in different industries? A between-industry comparison of the reliability and validity of multi-source performance ratings. Journal of Management Development.

Brutus S, Fleenor J, Taylor S. (1996, April). Methodological issues in 360-degree feedback research. Paper presented at the Annual Conference of the Society for Industrial and Organizational Psychology, Inc., San Diego.

Edwards JR. (1991). Person-job fit: A conceptual integration, literature review, and methodological critique. In Cooper CL, Robertson IT (Eds.), International Review of Industrial and Organizational Psychology, 6, 283-357.

Edwards JR. (1993). Problems with the use of profile similarity indices in the study of congruence in organizational research. PERSONNEL PSYCHOLOGY, 46, 641-665.

Edwards JR. (1994). The study of congruence in organizational behavior research: Critique and proposed alternative. Organizational Behavior and Human Decision Processes, 58, 683-689.

Edwards JR, Parry M. (1993). On the use of polynomial regression equations as an alternative to difference scores in organizational research. Academy of Management Journal, 36, 1577-1613.

Edwards JR, Van Harrison R. (1993). Job demands and worker health: Three-dimensional reexamination of the relationship between person-environment fit and strain. Journal of Applied Psychology, 78, 628-648.

Fleenor J, McCauley C, Brutus S. (1996). Self-other rating agreement and leader effectiveness. Leadership Quarterly, 7, 487-506.

Flocco E. (1969). An examination of the leader behavior of school business administrators. Dissertation Abstracts International, 30, 84-85.

Harris M, Schaubroeck J. (1988). A meta-analysis of self-supervisor, self-peer, and peer-supervisor ratings. PERSONNEL PSYCHOLOGY, 41, 43-62.

Hoffman G. (1923). An experiment in self-estimation. Journal of Abnormal and Social Psychology, 18, 43-49.

Kulka RA. (1979). Interaction as person-environment fit. In Kahle RA (Ed.), New directions for methodology of behavioral science (pp. 55-71). San Francisco: Jossey-Bass.

Landy F, Farr J. (1980). Performance rating. Psychological Bulletin, 87, 72-107.

Lombardo M, McCauley C. (1994). Benchmarks: A manual and trainer's guide. Greensboro, NC: Center for Creative Leadership.

London M, Smither J. (1995). Can multi-source feedback change perceptions of goal accomplishment, self evaluations, and performance-related outcomes? Theory-based applications and directions for research. PERSONNEL PSYCHOLOGY, 48, 803-839.

London M, Wohlers A. (1991). Agreement between subordinate and self-ratings in upward feedback. PERSONNEL PSYCHOLOGY, 44, 375-390.

Mabe P, West S. (1982). Validity of self-evaluation of ability: A review and meta-analysis. Journal of Applied Psychology, 67, 280-286.

McCauley C, Lombardo M, Usher C. (1989). Diagnosing management development needs: An instrument based on how managers develop. Journal of Management, 15, 389-403.

Nilsen D, Campbell D. (1993). Self-observer rating discrepancies: Once an overrater, always an overrater? Human Resource Management, 32, 265-281.

Nowack K. (1997). Congruence between self-other ratings and assessment center performance. Journal of Social Behavior and Personality, 12(5), 145-166.

Paulhaus D. (1986). Self-deception and impression management in test responses. In Angleitner A, Wiggins J (Eds.), Personality assessment via questionnaires (pp. 143-165). New York: Springer.

Pervin LA. (1978). Theoretical approaches to the analysis of individual-environment interaction. In Pervin LA, Lewis M (Eds.), Perspectives in interactional psychology. New York: Plenum.

Podsakoff P, Organ D. (1986). Self-reports in organizational research: Problems and prospects. Journal of Management, 12, 531-544.

Roush P, Atwater L. (1992). Using the MBTI to understand transformational leadership and self-perception accuracy. Military Psychology, 4, 17-34.

Sackheim H. (1983). Self-deception, self-esteem and depression: The adaptive value of lying to oneself. In Masling J (Ed.), Empirical studies of psychoanalytic theories (pp. 101-157). Hillsdale, NJ: Erlbaum.

Smircich L, Chesser R. (1981). Superiors' and subordinates' perceptions of performance: Beyond disagreement. Academy of Management Journal, 24, 198-205.

Smither J, London M, Vasilopoulos N, Reilly R, Millsap R, Salvemini N. (1995). An examination of the effects of an upward feedback program over time. PERSONNEL PSYCHOLOGY, 48, 1-33.

Taylor S, Brown J. (1988). Illusion and well-being: A social psychological perspective on mental health. Psychological Bulletin, 103, 193-210.

Terborg J. (1981). Interactional psychology and research on human behavior in organizations. Academy of Management Review, 6, 569-576.

Thornton G. (1980). Psychometric properties of self-appraisal of job performance. PERSONNEL PSYCHOLOGY, 33, 262-271.

Van Velsor E, Leslie J. (1991). Feedback to managers, Volume II: A review and comparison of sixteen multi-rater feedback instruments. Greensboro, NC: Center for Creative Leadership.

Van Velsor E, Taylor S, Leslie J. (1992, August). Self-rater agreement, self-awareness and leadership effectiveness. Paper presented at the 100th Annual Convention of the American Psychological Association, Washington, DC.

Van Velsor E, Taylor S, Leslie J. (1993). An examination of the relationships among self-perception accuracy, self-awareness, gender, and leader effectiveness. Human Resource Management, 32, 249-264.

Wohlers A, London M. (1989). Ratings of managerial characteristics: Evaluation difficulty, coworker agreement, and self-awareness. PERSONNEL PSYCHOLOGY, 42, 235-261.

Yammarino F, Atwater L. (1997). Do managers see themselves as others see them? Implications of self-other rating agreement for human resources management. Organizational Dynamics, 25(4), 35-44.

Zedeck S. (1995). Review of Benchmarks. In Conoley J, Impara J (Eds.), The twelfth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements.

APPENDIX

Effectiveness Items

How effectively would this person handle each of the following? 1 = among the worst; 2 = less well than most; 3 = adequately; 4 = better than most; 5 = among the best.

1. turning around an organization or unit in trouble when he/she has little authority
2. starting something from scratch
3. turning around an organization or unit in trouble when he/she has the authority to make people comply
4. working more than 6 months in a foreign country
5. being promoted into an unfamiliar line of business
6. switching from a line job to a staff job
7. a huge leap in responsibility (more people, functions and money)
8. having a significant role in an acquisition
9. having 6 months to close down an operation
10. negotiating a major contract
11. installing a new system on a project
12. serving on a task force to solve a major problem
13. moving laterally into an unfamiliar line of business (no promotion)
14. being promoted in the same function or division (moving a level up)
15. switching from a staff job to a line job
16. being promoted two or more levels