Social perception of male and female computer synthesized speech


John W. Mullennix a,*, Steven E. Stern a,*, Stephen J. Wilson c, Corrie-lynn Dyson b

a Department of Psychology, University of Pittsburgh at Johnstown, Johnstown, PA 15904, USA
b Department of Psychology, Clark University, Worcester, MA, USA
c Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15213, USA

* Corresponding authors. Fax: +1-814-269-2022.
E-mail address: [email protected] (J.W. Mullennix).

Computers in Human Behavior 19 (2003) 407–424
www.elsevier.com/locate/comphumbeh
0747-5632/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.
doi:10.1016/S0747-5632(02)00081-X

Abstract

The present study addressed the issue of whether social perception of human speech and computerized text-to-speech (TTS) is affected by gender of voice and gender of listener. Listeners were presented with a persuasive argument in either male or female human or synthetic voice and were assessed on attitude change and their ratings of various speech qualities. The results indicated that female human speech was rated as preferable to female synthetic speech, and that male synthetic speech was rated as preferable to female synthetic speech. Degree of persuasion did not differ across human and synthetic speech; however, female listeners were persuaded more by the argument than male listeners were. Patterns of ratings across male and female listeners were fairly similar across human and synthetic speech, suggesting that gender stereotyping for human voices and computerized voices may occur in a similar fashion. © 2003 Elsevier Science Ltd. All rights reserved.

Keywords: Synthetic speech; Gender; Text-to-speech (TTS); Persuasion; Social influence

With the proliferation of high-quality computerized text-to-speech (TTS) systems offering options of different voices within each system, some issues arise concerning users’ choice of what synthetic voice they prefer to use in a particular application. Early work on preferences for synthetic speech (Logan & Pisoni, 1986; McHugh, 1976; Nusbaum, Pisoni, & Schwab, 1984; Nye, Ingemann, & Donald, 1975) focused on comparisons of different algorithms or different systems in order to identify the most intelligible and natural sounding system. Later work on preferences has focused on the gender of voices offered within systems. Not surprisingly, Mirenda, Eicher, and Beukelman (1989) found that male and female listeners preferred human male and female voices over synthetic (DECtalk DTC01) male and female voices. However, they also found a tendency for males to be more willing than females to rate a female voice as appropriate for a male speaker. This suggests there may be some cross-gender differences in preferences for particular types of voices, and perhaps some differences in how men and women perceive human and synthetic voices.

The selection of a voice used to convey information through a TTS system is
important. In terms of attributions that people make regarding voice output from a computer, there is evidence suggesting that people apply gender stereotypes to male and female computer voices (Nass, Moon, & Green, 1997). Indeed, Reeves and Nass (1996) report that when listeners are exposed to male and female computerized voices, evaluations from male-voiced computers are viewed as “friendlier” and are taken more seriously than evaluations from female-voiced computers. Also, they showed that both male and female listeners rated a female-voiced computer as more knowledgeable about love and relationships, while a male-voiced computer was rated as more knowledgeable about technical subjects (Reeves & Nass, 1996, p. 164). On the other hand, Mirenda et al. (1989) found that although ratings of gender-appropriateness of voice clearly followed gender identity of the speaker with human voices, the situation was less clear for synthetic voices. Their results suggest that listeners may be making slightly different gender-related attributions to male and female synthetic voices than they do to male and female human voices.

The attributions that listeners make to male and female computer voices may
affect how the listener processes the intended message produced by the TTS system. In particular, if the message is one that is intended to influence the listener, not just inform the listener, then these attributions may have a major impact. In our previous work (Stern, Mullennix, Dyson, & Wilson, 1999; Stern, Mullennix, & Wilson, 2002), we found that listeners presented with a persuasive argument on a topic rated human speech more “favorably” than synthetic speech (that is, ratings related to perceptions about the speaker, message, and effectiveness of the message were more favorable). However, one shortcoming in the studies of Stern et al. (1999, 2002) was that male human and synthetic voices only were used. It remains to be seen whether similar results would be obtained with female human and synthetic voices. As discussed later, there is substantial evidence indicating that social influence is affected by interactions between gender of listener and gender of speaker. What is unknown is whether these interactions hold when listening to synthesized speech. If one believes that synthetic voices are gender stereotyped in the same way as human voices, then one would expect to see similar patterns of results (e.g. in ratings of perceptions of the message and speaker and in persuasion data) for both human and synthetic speech. But if synthetic voices are not gender stereotyped in the same way, different patterns may emerge.

In terms of the relationship between gender and persuasion, researchers (Burgoon, 1974; Eagly, 1978, 1983) have noted gender differences in persuasiveness and susceptibility to persuasion. In particular, there is a tendency for females to be more easily persuaded than males, with this difference minimized if the speaker is a
female. Alternatively, some view this gender difference as a tendency for some males not to be persuaded regardless of the situation (Burgoon & Klingle, 1998). Also, there are a number of cross-gender interactions that affect degree of influence. Carli, LaFleur, and Loeber (1995) found that likeableness was a more important determinant of influence for female speakers than male speakers, while Carli (1990) showed that female speakers who spoke more tentatively were more influential with male listeners and less influential with female listeners than those who spoke assertively. These findings indicate that the gender of both speaker and listener is an important factor in persuasion.

In the present study, the question we address is whether the persuasiveness of an argument produced by a TTS system and the ratings of perceptions given to the speech produced by the system are affected by the gender of the voice output by the system and the gender of the listener. This question has implications for both theoretical and applied issues. If people gender stereotype computerized synthetic speech, it is important to demonstrate this tendency in a situation where listeners receive a plausible argument. If gender effects are similar across human and synthetic speech, then this would support assertions that gender stereotyping of computerized speech is similar to that occurring for human speech, at least in terms of generalizing to situations where persuasive arguments are involved. In terms of real-world applications, TTS systems are becoming widely used, especially as assistive technologies for those who are speech-impaired and/or visually impaired. The TTS system we test here is DECtalk, a relatively inexpensive, high-quality system currently used by many people. If there are differences in the persuasiveness of synthetic speech due to the choice of a male or female voice produced by the system, this has important ramifications for the choices that the end user of the system may wish to make.

To conduct this experiment, a persuasive appeal (adapted from Petty & Cacioppo, 1986) was used to assess the degree to which listeners are persuaded to change their attitudes about a topic. The passage was recorded from male and female human speakers and played back on audiotape, or was output from a computerized TTS system where an option was specified to produce the passage in a male or female synthetic voice. After hearing the passage, listeners rated their perceptions of the speakers, the quality of speech, the message itself, and the effectiveness of the message. We examined the gender of the listener and how this interacted with gender of voice. In terms of expectations for the study, there are many possible patterns of results that could be obtained. We distilled the possibilities down into a set of predictions that we felt had the best empirical precedent and that would best serve the goal of the study. The predictions are as follows:

1. Differences between Human and Synthetic Speech. One goal of the present study was to determine whether the findings from Stern et al. (1999, 2002) would generalize to female speech. This led to the following hypotheses:

Hypothesis 1a. Rated perceptions of the speaker, message, and effectiveness of message will be more favorable for female human speech than for female synthetic speech.

Hypothesis 1b. Female human speech will be more persuasive than female synthetic speech.

2. Differences between Male and Female Synthetic Speech. Despite attempts to increase the quality of female voice generated from synthesis-by-rule (e.g. Klatt & Klatt, 1990), it is generally acknowledged that the higher fundamental frequency and more diffuse formant structures typical of human female voice lead to difficulties in generating a high quality synthetic female voice (Karlsson, 1991). It may be that female synthetic voice sounds more “unnatural” than male synthetic voice, leading to differences in how listeners rate certain speech qualities (e.g. softness, squeakiness, slowness, etc.). This leads to the following hypotheses concerning differences between male and female synthetic speech:

Hypothesis 2a. Rated perceptions of speech qualities will be more favorable for male synthetic speech than for female synthetic speech.

Hypothesis 2b. Male synthetic speech will be more persuasive than female synthetic speech.

3. Gender Stereotyping Effects. Past research on gender and persuasion (Burgoon, 1974; Eagly, 1978, 1983) suggests that females are persuaded more than males, with this difference minimized if the speaker is female. If synthetic voices are gender-stereotyped in the same manner as human voices, we would expect the same pattern of results regardless of whether the speech is human or synthetic. This leads to two hypotheses concerning gender:

Hypothesis 3a. Female listeners will rate male voices more favorably than male listeners, while there will be little difference between males and females in rating female voices. This pattern of ratings will be similar across human and synthetic speech.

Hypothesis 3b. Females will be more persuadable than males, regardless of the type of speech (human or synthetic).

1. Method

1.1. Participants

There were 195 (96 male and 99 female) participants.¹ All participants were students enrolled in introductory psychology classes at the University of Pittsburgh at Johnstown and were given course credit for their participation. The participants were all US citizens.

¹ The sample size for this study (approximately 200 participants) was close to the sample size used in our previous studies (Stern et al., 1999, 2002). In our prior work, the sample size used was sufficient to detect the small to medium size effects we were interested in. This logic is consistent with power analyses (Cohen, 1988) indicating that for r=0.20, a sample size of 200 provides 0.64 power. On the other hand, many of our present analyses examined only one half of the sample, with a corresponding decrease in power.

1.2. Materials and apparatus

The natural human speech message was presented via audiotape through a Sony K870ES tape deck. The length of the passage was 5 min 10 s for the male passages (averaged across five male speakers), 5 min 2 s for the female passages (averaged across five female speakers), 5 min 0 s for the male/synthetic passage, and 5 min 3 s for the female/synthetic passage. The synthetic speech was presented via DECtalk Express V2.4C, a commercially available, high-quality TTS system. Participants listened to the messages through a set of commercial quality headphones.

1.2.1. Stimulus materials

The persuasive argument was a passage in favor of university wide comprehensive exams adapted from models of strong arguments provided by Petty and Cacioppo (1986). The human speech message was recorded by five different male and five different female speakers, in order to obtain a representative sample of speaking styles. The speakers were local college students who had no professional speaking experience. We chose these particular speakers in order to emulate as closely as possible the way in which a persuasive speech on this topic would be recorded and used in the local college environment. For the synthetic voices, one male voice (“Paul”) and one female voice (“Betty”) were selected from the DECtalk TTS system. The default values in the DECtalk system for speech rate and all other voice-related parameters were used.

1.2.2. Dependent measures

To measure the participants’ perceptions of speech quality, attitudes toward the message, and attitudes toward the speaker, we used a series of three questionnaires containing a total of 25 semantic differential items developed by Leathers (1997) and Lucia (1998). The questionnaires are found in Appendix A. The items examining perceptions of speech qualities consisted of seven scales (loud voice–soft spoken voice, deep voiced–squeaky voiced, fast speaking–slow speaking, heavy accent–faint accent, talked too long–didn’t talk long enough, heavy nasality–faint nasality, monotone–lively).

The items examining perceptions of the message consisted of six scales (i.e. stimulating–boring, vague–specific, unsupported–supported, complex–simple, convincing–unconvincing, and uninteresting–interesting). The items examining perceptions of the speaker consisted of 12 scales (i.e. incompetent–competent, honest–dishonest, unassertive–assertive, uninformed–informed, untrustworthy–trustworthy, timid–bold, unintelligent–intelligent, straightforward–evasive, active–inactive, qualified–unqualified, sincere–insincere, meek–forceful). To rate perceptions of the effectiveness of the argument, we administered a series of nine-point semantic differential scales used by Baker and Petty (1994). This questionnaire consisted of six scales (i.e. bad–good, foolish–wise, negative–positive, beneficial–harmful, effective–ineffective, and convincing–unconvincing; see Appendix A). Because of the large number of variables in the scales used to assess perceptions of the speaker and perceptions of the message, these two measures were collapsed into five factors and four factors, respectively.²

Finally, the degree of persuasion evoked by the argument was measured through the use of a pre-test/post-test attitudinal measure modeled after Rosselli, Skelly, and Mackie’s (1995) Initial Attitude Questionnaire (see Appendix B). Our measure consisted of 12 items rated on a seven-point Likert-type scale (1=Disagree completely, 7=Agree completely). Three items were designed to measure attitudes relevant to the stimulus argument about comprehensive exams, and the other nine items were related to three control topics that measured attitudes toward animal rights, environmentalism and a proposed tuition raise. Each set of three attitude items was summed to produce a potential range of scores from 3 to 21.

The three control topics were chosen to assess whether listening to the stimulus argument would lead to a general rise in favorable attitudes. The questions on tuition raise, in particular, would alert us to the possibility that the persuasive appeal on comprehensive exams would affect attitudes toward another campus-related issue. As we discover later, the comprehensive exam argument produced a significantly greater attitude change than did the three control topics.³
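The scoring of the attitude measure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring procedure: the paper does not give the order of the 12 items, so the item-to-topic mapping `TOPIC_ITEMS` and the function names are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical item layout: which of the 12 items belong to each topic.
# The paper does not report the item order, so these indices are assumptions.
TOPIC_ITEMS: Dict[str, List[int]] = {
    "comprehensive_exams": [0, 1, 2],
    "tuition": [3, 4, 5],
    "animal_rights": [6, 7, 8],
    "environment": [9, 10, 11],
}


def topic_scores(responses: List[int]) -> Dict[str, int]:
    """Sum the three 7-point items for each topic (possible range 3 to 21)."""
    if len(responses) != 12:
        raise ValueError("expected 12 item responses")
    if any(r < 1 or r > 7 for r in responses):
        raise ValueError("each response must be between 1 and 7")
    return {
        topic: sum(responses[i] for i in items)
        for topic, items in TOPIC_ITEMS.items()
    }


def attitude_change(pre: List[int], post: List[int]) -> Dict[str, int]:
    """Posttest minus pretest score for each topic."""
    pre_scores = topic_scores(pre)
    post_scores = topic_scores(post)
    return {topic: post_scores[topic] - pre_scores[topic] for topic in pre_scores}
```

A participant who answers 4 on every item at pretest and 6 on the three exam items at posttest would show a change of +6 on the comprehensive exam topic and 0 on the control topics.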

1.3. Design and procedure

The experimenter randomly assigned participants to listen to the persuasive message in one of the four experimental conditions: human speech/male voice, human speech/female voice, synthetic speech/male voice, synthetic speech/female voice. For the human speech conditions, listeners were alternated through the ten different speakers. Prior to listening to the message, participants completed the attitude measure as a pretest.

Participants were told that the experiment concerned the topic of comprehensive exams in college and that they would listen carefully to a passage and answer some questions about it afterwards. Before listening to the passage, listeners in all conditions heard a test message during which they were permitted to adjust the volume level of the headphones to a comfortable listening level. After this, the persuasive passage was played, with the listener signaling the experimenter when the passage was finished.

After completion of the persuasive passage, participants again completed the attitude measure administered at pretest, this time as the posttest. Then, participants completed the measures described above assessing perceptions of speech qualities, perceptions of the message, perceptions of the speaker, and effectiveness of the message. At the conclusion of the experimental sessions, the experimenters debriefed each participant about the purpose of the study.

² The measures relevant to attitudes toward the speaker and the message were subjected to exploratory principal components factor analyses (Varimax rotation), retaining items with factor loadings above 0.5. The factor analysis of items related to speaker yielded five factors: knowledgeable, truthful, involved, powerful, and accurate. The factor analysis of items related to message yielded four factors: captivating, clear, convincing, and simple (cf. Stern et al., 1999).

³ As shown in Table 2, mean ratings of attitudes toward animal rights, environment, and tuition issues changed little from pretest to posttest, but mean ratings of attitudes towards comprehensive exams appeared to change (M=13.2 vs. M=16.2). To support this observation, a weighted contrast was conducted on the persuasion data, with the pretest/posttest difference in the comprehensive exam data weighted (+3) against the pretest/posttest difference in the data for the other three control topics (−1 for each topic). The results showed that there was a greater increase in scores from pretest to posttest for the questions regarding the comprehensive exams in comparison to the control questions in the other three topics, Fcontrast(1, 561)=382.9, P<0.001, r=0.82.
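The weighting scheme of the contrast on the persuasion data (+3 for the comprehensive exam topic, −1 for each of the three control topics) can be illustrated with a short Python sketch. It computes only the contrast value, not the F test, because the error term is not reported in the text. The function name `contrast_value` is hypothetical, and the example differences are the Female/Human pretest-to-posttest differences as listed in Table 2.

```python
from typing import Sequence


def contrast_value(means: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of means; a valid contrast needs weights that sum to zero."""
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * m for w, m in zip(weights, means))


# Weights as described in the text: exam topic against the three control topics.
WEIGHTS = [3, -1, -1, -1]

# Female/Human pre-to-post differences from Table 2, in the order
# comprehensive exams, tuition, animal rights, environment.
female_human_diffs = [3.2, 1.3, 0.1, -0.2]

value = contrast_value(female_human_diffs, WEIGHTS)
print(round(value, 1))  # 3*3.2 - 1.3 - 0.1 + 0.2 = 8.4
```

A value of zero would mean that attitudes toward comprehensive exams changed no more than the average of the control topics; a positive value indicates a larger shift on the exam topic.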

2. Results

The data from the scales used to assess perceptions of speech qualities, attitudes toward the message, speaker, and effectiveness of the message were analyzed using a series of 2×2×2 (listener gender × voice gender × speech type) factorial ANOVAs. Speech type refers to whether the speech was human or synthetic.

2.1. Female human voice vs. female synthetic voice

In Table 1, the rating data are displayed as a function of speech type and voice gender conditions. A series of contrast t-tests was conducted to examine whether differences existed between the female human and female synthetic voices for the various rating measures, as collapsed into factors via factor analysis. Contrast ts were calculated by comparing the difference between the mean ratings of the two voices (female human and female synthetic). The mean square error (MSE) and degrees of freedom (d.f.) from the factorial ANOVA were used (Rosenthal & Rosnow, 1991). Table 1 provides the t values, significance levels, and effect sizes (reported as r).

Consistent with Hypothesis 1a, the female human speakers were rated as more truthful and more involved than the female synthetic speaker. The message produced by the female human speakers was rated as more convincing, good, and positive than that produced by the female synthetic speaker, while the message produced by the female synthetic speaker was rated as more harmful and more ineffective. In terms of speech quality, the female human speakers were rated as slower, less nasal, and livelier than the female synthetic speaker, while the female synthetic voice was rated as squeakier and more accented than the female human voices.

To examine Hypothesis 1b, we compared the persuasiveness of female human speech to female synthetic speech by comparing the pretest/posttest differences on the attitude measure for questions related to the persuasive appeal (comprehensive exams). The persuasion data are shown in Table 2. A weighted contrast (Rosenthal & Rosnow, 1985) comparing the mean difference in ratings for female human voice (mean difference=3.2) to female synthetic voice (mean difference=2.8) demonstrated some support for Hypothesis 1b, that female human voice was more persuasive than female synthetic voice, F(1, 561)=3.4, P<0.10, r=0.13.
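The contrast t and its conversion to the effect size r can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' analysis code: the MSE and cell sizes are not reported, so the arguments to `contrast_t` in any real use would have to come from the ANOVA output, and the error degrees of freedom used here (195 participants minus 8 design cells = 187) are an assumption. With that assumption, the standard conversion r = sqrt(t² / (t² + df)) reproduces effect sizes listed in Table 1.

```python
import math


def contrast_t(mean_a: float, mean_b: float, mse: float, n_a: int, n_b: int) -> float:
    """Contrast t for two cell means, using the ANOVA mean square error."""
    return (mean_a - mean_b) / math.sqrt(mse * (1.0 / n_a + 1.0 / n_b))


def r_from_t(t: float, df: float) -> float:
    """Effect size r from a contrast t and its error degrees of freedom."""
    return math.sqrt(t * t / (t * t + df))


# Assumed error df: 195 participants minus the 8 cells of the 2x2x2 design.
ASSUMED_DF = 195 - 8

print(round(r_from_t(2.22, ASSUMED_DF), 2))  # 0.16, the r listed for t = 2.22 in Table 1
print(round(r_from_t(6.17, ASSUMED_DF), 2))  # 0.41, the r listed for t = 6.17 in Table 1
```

Because r depends on the degrees of freedom as well as on t, the same t value yields a larger r in analyses that use only half of the sample.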

2.2. Male synthetic voice vs. female synthetic voice

To examine whether there were differences in perceptions of male and female synthetic voices, another series of contrast t-tests was conducted. Table 1 provides the t values, significance levels, and effect sizes for the comparison of male synthetic voice to female synthetic voice. Support for Hypothesis 2a was evidenced by the fact that female synthetic voice was rated as less powerful, softer, squeakier, and faster than the male synthetic voice. In addition, the message produced by the male synthetic voice was rated as better and more positive than the message produced by the female synthetic voice. There was also a marginal trend for the female voice to be rated as more “accented” than the male voice.

To examine Hypothesis 2b, we compared the mean difference in pretest/posttest ratings (see Table 2) for male synthetic speech (mean difference=3.1) to female synthetic voice (mean difference=2.5), via a weighted contrast similar to that used to examine Hypothesis 1b. The results supported Hypothesis 2b, F(1, 561)=10.8, P<0.01, r=0.23, with male synthetic speech being more persuasive than female synthetic speech.

Table 1
Rating data displayed as a function of voice gender and speech type (human or synthetic)

                 Male/    Female/  Male/    Female/  Male/Synth vs.    Female/Human vs.
Measure          Human    Human    Synth    Synth    Female/Synth      Female/Synth
                                                     t        r        t        r

Perceptions of speaker
Knowledgeable    14.84    15.44    14.89    14.75    0.23     0.02     1.06     0.07
Truthful         10.19    10.83    9.79     10.07    0.64     0.05     1.71**   0.12b
Involved         13.48    14.64    13.98    13.42    0.88     0.06     1.82**   0.13b
Powerful         7.84     8.58     9.63     8.61     2.22**   0.16b    0.02     0.00
Accurate         11.88    11.82    11.84    11.34    1.24     0.09     1.18     0.09

Perceptions of message
Captivating      8.39     9.23     9.76     8.64     1.12     0.08     0.57     0.04
Clear            11.94    12.04    11.94    11.66    0.69     0.05     0.95     0.07
Convincing       4.69     5.10     4.66     4.39     0.78     0.06     1.69**   0.12b
Simple           4.55     4.31     4.06     4.27     0.77     0.06     0.11     0.01

Effectiveness of message
Good             6.51     6.84     6.60     6.24     1.37**   0.10b    2.19**   0.16b
Wise             6.47     6.80     6.50     6.45     0.18     0.01     1.20     0.09
Positive         6.71     7.29     6.98     6.62     1.25**   0.09     2.28**   0.17b
Harmful          2.98     2.78     2.90     3.20     1.16     0.08     1.56**   0.11b
Unconvincing     3.31     2.99     3.34     3.61     0.78     0.06     1.69**   0.12b
Ineffective      3.47     2.85     3.49     3.63     0.30     0.02     1.68**   0.12+

Perceptions of speech qualities
Soft             4.21     3.63     3.15     3.69     2.24**   0.16b    0.25     0.02
Squeaky          3.41     4.01     3.43     4.65     5.54**   0.37a    2.87**   0.21b
Slow             4.32     4.04     3.74     3.01     2.72**   0.19b    3.76**   0.26b
Unaccented       5.43     5.43     4.19     3.72     1.56*    0.11b    5.51**   0.37a
Not long         3.51     3.36     2.92     3.21     1.18     0.09     0.63     0.05
Less nasal       4.28     4.96     3.45     3.05     1.19     0.09     5.66**   0.38a
Lively           2.69     3.60     1.69     1.83     0.50     0.04     6.17**   0.41a

Power determinations are from Cohen’s (1988) power tables.
a Denotes power level >0.60, <0.80.
b Denotes power level >0.25, <0.60.
* Denotes a marginally significant contrast at P<0.10.
** Denotes a significant contrast at P<0.05.

2.3. Gender stereotyping effects

To illustrate the next set of findings, the same rating data are now presented in Table 3 by gender of listener, voice gender, and type of speech (human or synthetic). To examine Hypothesis 3a (female listeners will rate male voices more favorably than male listeners, while there will be little difference between males and females in rating female voices), two series of contrast ts were conducted. These tests took the difference between male and female listeners on how they rated male voices and compared this to the difference between male and female listeners on how they rated female voices. Separate analyses were run on the human speech and the synthetic speech in order to assess whether human voice gender and synthetic voice gender affected participants’ ratings in the same way.

The results of these analyses are shown in Table 3. There were several cases where female listeners rated male voices more favorably than male listeners. As one example, female listeners rated male human and synthetic voices as more convincing than male listeners did. Overall, there were 16 measures (out of 22 rating measures) where the pattern of ratings was similar for male and female listeners for both human and synthetic speech, two cases where a significant difference between male and female listeners was found for human speech but not for synthetic speech, and four cases where a significant difference between male and female listeners was observed for synthetic speech but not human speech. These results provide support for Hypothesis 3a and support for the idea that human and synthetic voices are gender stereotyped in approximately the same fashion.

In terms of Hypothesis 3b, whether females are more persuadable than males, a contrast test was conducted on the comprehensive exam argument data for listener gender. In support of Hypothesis 3b, the results showed that females were persuaded by the comprehensive exam argument more than males were, Fcontrast(1, 561)=6.3, P<0.05, r=0.18. In addition, another test was conducted examining persuasion across male and female listeners for the male voice only. The results of the test were not significant, indicating that the degree of persuasion induced by the male voice did not differ across listener gender.

Table 2
Pretest and posttest attitudes as a function of voice condition, pretest vs. posttest, and topic of argument

                  Topic
                  Comprehensive        Tuition              Animal Rights        Environment
                  Pre    Post   Diff   Pre    Post   Diff   Pre    Post   Diff   Pre    Post   Diff
Male/Human        9.2    16.4   +7.2   9.8    11.9   +2.1   9.2    9.5    +0.3   13.4   12.5   −0.9
Female/Human      13.2   16.4   +3.2   10.3   11.6   +1.3   9.4    9.5    +0.1   12.4   12.2   −0.2
Male/Synthetic    13.6   16.8   +3.2   10.3   11.9   +1.6   10.3   10.4   +0.1   12.4   12.6   +0.2
Female/Synthetic  12.7   15.2   +2.5   8.9    10.5   +1.6   9.9    9.7    −0.2   13.0   12.8   −0.2
Means:            12.2   16.2   +4.0   9.8    11.5   +1.7   9.7    9.8    +0.1   12.8   12.5   −0.3
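The comparison used for Hypothesis 3a is a difference of differences: the listener-gender gap for male voices minus the listener-gender gap for female voices. The Python sketch below shows only that arithmetic, using the “Convincing” means for human voices from Table 3. The function names are hypothetical, and the significance test itself is not reproduced because it requires the ANOVA error term, which is not reported.

```python
from typing import Tuple

# Each cell pair is (male listener mean, female listener mean).
CellPair = Tuple[float, float]


def listener_gap(cells: CellPair) -> float:
    """Female listener mean minus male listener mean for one voice gender."""
    male_listener, female_listener = cells
    return female_listener - male_listener


def interaction_contrast(male_voice: CellPair, female_voice: CellPair) -> float:
    """Listener-gender gap for male voices minus the gap for female voices."""
    return listener_gap(male_voice) - listener_gap(female_voice)


# "Convincing" ratings of human voices, taken from Table 3.
convincing_human = interaction_contrast(
    male_voice=(4.50, 4.88),    # male human voice
    female_voice=(5.19, 4.83),  # female human voice
)
print(round(convincing_human, 2))  # (4.88 - 4.50) - (4.83 - 5.19) = 0.74
```

A positive value means that female listeners favored the male voice over male listeners by more than they did for the female voice, which is the pattern Hypothesis 3a predicts.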

2.4. Overall differences between human and synthetic speech

Overall, we also obtained rating data that, to some degree, overlapped with thefindings of Stern et al. (1999, 2002). In Table 4, mean ratings for perceptions ofhuman and synthetic speech are displayed, collapsed over listener gender and voicegender variables. The ANOVA showed that human speech was rated as significantly

Table 3
Rating data displayed as a function of voice gender, listener gender, and type of speech (human or synthetic)

                Human voice comparisons                               Synthetic voice comparisons
                Male human        Female human      Contrast          Male synthetic    Female synthetic  Contrast
                voice             voice                               voice             voice
Measure         Male     Female   Male     Female   t        r        Male     Female   Male     Female   t        r
                listener listener listener listener                   listener listener listener listener

Perceptions of speaker
Knowledgeable   14.68    15.00    15.18    15.71    0.32     0.03     14.12    15.65    14.70    14.80    2.22**   0.22b
Truthful        9.58     10.8     10.55    11.12    1.47*    0.15     9.16     10.42    9.74     10.40    1.37*    0.14
Involved        13.08    13.88    14.32    14.96    0.24     0.02     13.08    14.86    13.43    13.40    2.78**   0.27b
Powerful        7.56     8.12     8.45     8.71     0.64     0.07     9.56     9.69     8.74     8.48     0.84     0.08
Accurate        11.68    12.08    11.59    12.04    0.13     0.01     11.52    12.15    10.91    11.76    0.56     0.06

Perceptions of message
Captivating     7.61     9.16     8.95     9.50     0.98     0.10     8.60     10.92    8.52     8.76     2.06**   0.21b
Clear           11.73    12.16    12.04    12.04    1.18     0.12     11.56    12.32    11.09    12.24    1.08     0.11
Convincing      4.50     4.88     5.19     4.83     2.12**   0.21b    4.20     5.12     4.30     4.48     2.14**   0.21b
Simple          4.35     4.76     4.45     4.17     2.50**   0.25b    4.12     4.00     4.35     4.20     0.11     0.01

Effectiveness of message
Good            6.27     6.76     6.71     6.96     0.90     0.09     6.20     7.00     6.00     6.48     1.22     0.12
Wise            6.31     6.64     6.81     6.79     1.26     0.13     6.24     6.77     6.30     6.60     0.84     0.08
Positive        6.50     6.92     7.33     7.25     1.73**   0.18     6.52     7.44     6.52     6.72     2.52**   0.25b
Harmful         3.54     2.42     2.86     2.71     3.59**   0.35a    3.08     2.73     3.00     3.40     2.80**   0.27b
Unconvincing    3.50     3.12     2.81     3.17     2.12**            3.80     2.88     3.70     3.52     2.14**   0.21b
Ineffective     3.69     3.24     2.67     3.04     1.87**   0.19     3.48     3.50     3.78     3.48     0.74     0.07

Perceptions of speech qualities
Soft            4.35     4.08     3.73     3.54     0.33     0.03     3.04     3.27     3.65     3.72     0.68     0.07
Squeaky         3.54     3.28     4.27     3.75     1.17     0.12     3.40     3.46     4.65     4.64     0.32     0.03
Slow            4.19     4.44     4.09     4.00     1.27     0.13     3.76     3.73     2.87     3.16     1.21     0.12
Unaccented      5.50     5.36     5.32     5.54     1.19     0.12     4.08     4.31     3.43     4.00     1.13     0.11
Not long        3.54     3.48     3.18     3.54     1.38*    0.14     2.72     3.11     3.22     3.20     1.67**   0.17
Less nasal      4.12     4.44     4.95     4.96     0.94     0.10     3.48     3.42     2.83     3.28     1.56*    0.16
Lively          2.65     2.72     3.45     3.75     0.81     0.08     1.64     1.73     1.70     1.96     0.60     0.06

Power determinations are from Cohen’s (1988) power tables.
a Denotes power level >0.60, <0.80.
b Denotes power level >0.25, <0.60.
* Denotes a marginally significant contrast at P<0.10.
** Denotes a significant contrast at P<0.05.

416 J.W. Mullennix et al. / Computers in Human Behavior 19 (2003) 407–424

softer, slower, less accented, less lengthy, less nasal, and livelier than synthetic speech, and human speech was rated as marginally more truthful than synthetic speech. Synthetic speech was rated as significantly squeakier and more powerful than human speech. Thus, particularly when considering ratings related to speech qualities, these findings are similar to those we have observed in previous studies.
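As a rough arithmetic check on how the collapsed ratings relate to the cell means, the sketch below averages the four voice-by-listener cell means from Table 3 for three speaker measures, so that the results can be set beside the corresponding Table 4 entries. It assumes that the collapsed values behave like unweighted means of the four cells; cell sizes are not given in this section, so small discrepancies would be expected if the groups were unequal. The names `table3_cells` and `collapse` are illustrative only.

```python
# Cell means copied from Table 3, in the order:
# male voice/male listener, male voice/female listener,
# female voice/male listener, female voice/female listener.
table3_cells = {
    "Knowledgeable": {
        "human": [14.68, 15.00, 15.18, 15.71],
        "synthetic": [14.12, 15.65, 14.70, 14.80],
    },
    "Truthful": {
        "human": [9.58, 10.8, 10.55, 11.12],
        "synthetic": [9.16, 10.42, 9.74, 10.40],
    },
    "Powerful": {
        "human": [7.56, 8.12, 8.45, 8.71],
        "synthetic": [9.56, 9.69, 8.74, 8.48],
    },
}


def collapse(cell_means):
    """Unweighted mean of the cell means (assumption: equal cell sizes)."""
    return sum(cell_means) / len(cell_means)


# Collapse over voice gender and listener gender for each speech type.
collapsed = {
    measure: {speech_type: collapse(cells) for speech_type, cells in by_type.items()}
    for measure, by_type in table3_cells.items()
}

for measure, by_type in collapsed.items():
    print(f"{measure}: human = {by_type['human']:.2f}, "
          f"synthetic = {by_type['synthetic']:.2f}")
```

Comparing the printed values with the Table 4 rows for the same measures shows how closely the unweighted-mean assumption holds.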

3. Discussion

One important focus of the present study was to compare female human speech to female synthetic speech. In our previous work, we had studied male voices only. Our

Table 4
Mean ratings for human speech vs. synthetic speech for perceptions of speaker, message, effectiveness of message, and speech qualities

                    Speech type
Measure             Human     Synthetic

Perceptions of speaker
Knowledgeable       15.14     14.82
Truthful*           10.51     9.93
Involved            14.06     13.70
Powerful**          8.21      9.12
Accurate            11.85     11.59

Perceptions of message
Captivating         8.81      9.20
Clear               11.99     11.80
Convincing          3.15      3.47
Simple              4.43      4.17

Effectiveness of message
Good                6.67      6.42
Wise                6.64      6.48
Positive            7.00      6.80
Harmful             2.88      3.05
Unconvincing        3.15      3.47
Ineffective         3.16      3.56

Perceptions of speech qualities
Soft**              3.92      3.42
Squeaky**           3.71      4.04
Slow**              4.18      3.38
Unaccented**        5.43      3.96
Not long**          3.43      3.06
Less nasal**        4.62      3.25
Lively**            3.14      1.76

* Denotes a marginally significant main effect at P<0.10.
** Denotes a significant main effect at P<0.05.


prediction that ratings would differ across female human and female synthetic speech was borne out (see Table 1). We found that listeners focused on differences in the quality of speech, with the female synthetic voice receiving ratings indicating that listeners found the synthetic voice less pleasing (e.g. as evidenced by being rated as "squeakier", "more accented", "more nasal", etc.). In addition, the female human speakers were given ratings indicating that listeners believed that they were more truthful and more involved than the synthetic speaker, and that the message uttered by the human speaker was more convincing, more positive, etc. We also found that, as indexed by attitude change on the comprehensive exam argument, the female human voice was more persuasive than the female synthetic voice. Overall, these results replicate and extend our previous findings based on male voices (Stern et al., 1999, 2002), showing that fairly similar patterns of results are obtained using female voices.

Another focus was on comparisons between male and female synthetic voices. As expected, we found some differences between these two speakers (see Table 1). The male synthetic speaker was rated as more powerful, softer, less squeaky and as speaking more slowly than the female synthetic speaker. These particular differences are perceptual in nature, and are most likely due to differences in synthesis quality between the male and female voices. However, we also found that the message produced by the male synthetic voice was rated more favorably (e.g. as good and more positive) and was more persuasive, in terms of the persuasive appeal, than that produced by the female synthetic voice. Thus, the differences between the voices are not completely due to perceptual factors, as higher-level perceptions of the message and persuasiveness are also affected.

We also examined two hypotheses related to the issue of listener gender and gender stereotyping. When examining persuasiveness, we found that females were more persuaded by the comprehensive exam argument than males. This pattern of data may be related to the observation that females are generally more susceptible to persuasion than males in certain situations (Burgoon, 1974; Eagly, 1978, 1983). However, we found no evidence that voice gender affected degree of persuasion.

On the other hand, analysis of the rating data did reveal some cross-gender effects. We found that female listeners rated male voices more favorably than male listeners did in a few instances (see Table 3). More importantly, the pattern of ratings for male and female listeners across human and synthetic speech was fairly similar, with a few minor differences. As noted earlier, the pattern of results was similar for 16 out of 22 measures across human and synthetic speech. This provides evidence that gender stereotyping, in terms of how male and female listeners treat a persuasive argument, is similar for messages produced by male and female human speakers and for messages produced by male and female synthetic speakers. This result converges nicely with other research suggesting that people apply gender stereotypes to male and female voices output over computer (Nass et al., 1997).

And finally, we also note that the overall differences between human and synthetic speech in the rating data (when collapsed over voice gender and listener gender) resemble our previous findings (Stern et al., 1999, 2002). The rating data indicate that listeners view human and synthetic speech quite differently (see Table 4).


In summary, our findings suggest that there exist some differences and some similarities between male and female human speech and male and female synthetic TTS speech, in terms of social perception and social influence amongst American college students. It appears that, when listening to a persuasive appeal, female human speech is preferable to female synthetic speech and male synthetic speech is preferable to female synthetic speech. Although we found some evidence for cross-gender interactions between speaker and listener, it may be worthwhile in future work to examine gender of speaker and listener using other types of verbal materials where social influence could be assessed. As well, it would be useful to examine varieties of male and female TTS voices in systems where the synthesis quality of male and female voices may be improved.

Although the technology behind TTS is advancing rapidly, with better and more natural-sounding computerized speech soon to be available, the question of whether we gender stereotype speech emanating from a computer is still a viable and important issue. We obtained some results in the present study that are consistent with the idea that human and synthetic speech are gender stereotyped in a similar fashion. However, although we believe we have sufficient power in our sample size and analyses to support this idea, some of the effect sizes we obtained were small. As a result, the one caveat to this work is whether the effects we have observed in the laboratory are robust and will transfer over to real-world applications. This question is one that can be answered in the future by more extensive testing with different TTS systems in different contexts and different computer-mediated communication situations.
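The effect sizes mentioned here are the r values reported beside each contrast t in Table 3. A standard conversion in contrast analysis is r = sqrt(t^2 / (t^2 + df)), and the sketch below applies it to two of the tabled t values. The error degrees of freedom are not stated in this section, so `df_error = 97` is purely a hypothetical placeholder, and the function name `contrast_r` is likewise an assumption made for illustration rather than the authors' procedure.

```python
import math


def contrast_r(t, df_error):
    """Effect size r for a contrast, using r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df_error))


# Hypothetical placeholder: the error df is not reported in this section.
df_error = 97

# Contrast t values taken from Table 3.
examples = [
    ("Knowledgeable, synthetic voices", 2.22),
    ("Harmful, human voices", 3.59),
]

for label, t in examples:
    print(f"{label}: t = {t:.2f}, r = {contrast_r(t, df_error):.2f}")
```

Because r depends on the degrees of freedom as well as on t, the same t yields a smaller r in a larger sample, which is one reason small effect sizes can accompany statistically significant contrasts.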

Acknowledgements

This research was funded by a grant from the University of Pittsburgh Central Research Development Fund. The authors acknowledge Marissa Andolina, Heather Clark, Melissa Guntrum, and Erin King for their assistance on this project.

Appendix A

In order to evaluate the effectiveness of the argument, please rate your attitude towards the argument using the rating scales below.

Bad         |--|--|--|--|--|--|--|--|  Good
Foolish     |--|--|--|--|--|--|--|--|  Wise
Negative    |--|--|--|--|--|--|--|--|  Positive
Beneficial  |--|--|--|--|--|--|--|--|  Harmful
Convincing  |--|--|--|--|--|--|--|--|  Unconvincing
Effective   |--|--|--|--|--|--|--|--|  Ineffective


Directions: Rate the MESSAGE itself, regardless of your opinion on the topic, on the following dimensions by circling the number that corresponds with your answer. Example:

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Flamboyant      Flamboyant      Flamboyant                      Conservative    Conservative    Conservative

*1              2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Stimulating     Stimulating     Stimulating                     Boring          Boring          Boring

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Vague           Vague           Vague                           Specific        Specific        Specific

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Unsupported     Unsupported     Unsupported                     Supported       Supported       Supported

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Complex         Complex         Complex                         Simple          Simple          Simple

*1              2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Convincing      Convincing      Convincing                      Unconvincing    Unconvincing    Unconvincing

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Uninteresting   Uninteresting   Uninteresting                   Interesting     Interesting     Interesting

Note: * denotes reverse scored item (asterisks and this note not included on instrument distributed to subjects).

Directions: Rate the SPEAKER on the following dimensions by circling the number that corresponds with your choice.

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Unintelligent   Unintelligent   Unintelligent                   Intelligent     Intelligent     Intelligent

*1              2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Straightforward Straightforward Straightforward                 Evasive         Evasive         Evasive

*1              2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Active          Active          Active                          Inactive        Inactive        Inactive


*1              2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Qualified       Qualified       Qualified                       Unqualified     Unqualified     Unqualified

*1              2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Sincere         Sincere         Sincere                         Insincere       Insincere       Insincere

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Meek            Meek            Meek                            Forceful        Forceful        Forceful

Note: * denotes reverse scored item (asterisks and this note not included on instrument distributed to subjects).

Directions: Rate the SPEAKER on the following dimensions by circling the number that corresponds with your choice.

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Incompetent     Incompetent     Incompetent                     Competent       Competent       Competent

*1              2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Honest          Honest          Honest                          Dishonest       Dishonest       Dishonest

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Unassertive     Unassertive     Unassertive                     Assertive       Assertive       Assertive

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Uninformed      Uninformed      Uninformed                      Informed        Informed        Informed

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Untrustworthy   Untrustworthy   Untrustworthy                   Trustworthy     Trustworthy     Trustworthy

1               2               3               4               5               6               7
Completely      Somewhat        Slightly        Neutral         Slightly        Somewhat        Completely
Timid           Timid           Timid                           Bold            Bold            Bold

Note: * denotes reverse scored item (asterisks and this note not included on instrument distributed to subjects).


Directions: Rate the SPEAKER on the following dimensions by circling the number that corresponds with your choice.

Loud Voice         1  2  3  4  5  6  7    Soft-Spoken Voice
Deep Voiced        1  2  3  4  5  6  7    Squeaky Voiced
Fast Speaking      1  2  3  4  5  6  7    Slow Speaking
Heavy Accent       1  2  3  4  5  6  7    Faint Accent
Talked Too Long    1  2  3  4  5  6  7    Didn’t Talk Long Enough
Heavy Nasality     1  2  3  4  5  6  7    Faint Nasality
Monotone           1  2  3  4  5  6  7    Lively

Appendix B

Directions: Rate the extent to which you agree with the following statements on a 7-point scale by writing your answer in the space provided.

1            2            3            4            5            6            7
Disagree     Disagree     Disagree     Neutral      Agree        Agree        Agree
Completely   Somewhat     Slightly                  Slightly     Somewhat     Completely

_____ 1) The use of animals for research purposes is inhumane and morally unjustified.

_____ 2) The proper disposal of industrial toxic waste is one of the most serious problems facing our country.

_____ 3) A 5 percent raise in tuition would be an unfair burden on the students who are attending the university.*

_____ 4) Required comprehensive exams before college graduation, in a student’s major, can benefit both the student and the university through increased corporate and individual donations.

_____ 5) Animal experimentation is an essential tool for scientific and medical research.*

_____ 6) The "greenhouse effect" is not as serious as the media would have us believe.*


_____ 7) A 5 percent raise in tuition could substantially raise the quality of the education at a university.

_____ 8) Required comprehensive exams before college graduation, in a student’s major, are a waste of time and money for the student and the university.*

_____ 9) Research involving animal subjects may some day be instrumental in saving the life of your child or the child of someone close to you.*

_____ 10) Oil drilling off the coast of California should not be allowed under any circumstances.

_____ 11) The income raised by a 5 percent tuition hike could raise the quality of life for the students who are there.

_____ 12) Students attending universities that require comprehensive exams have higher chances of getting better paying jobs.

Note: * denotes reverse scored item (asterisks and this note not included on instrument distributed to subjects).
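Reverse-scored items on a 7-point scale are conventionally recoded as 8 minus the response, so that higher scores point in the same direction on every item. The sketch below applies that recoding to the starred items in Appendix B and then sums the three comprehensive exam statements (items 4, 8, and 12). The summing step, the example responses, and the names `REVERSED`, `EXAM_ITEMS`, and `score_attitudes` are assumptions made for illustration; the exact scoring used in the study is not restated in this appendix.

```python
SCALE_MAX = 7                 # responses run from 1 to 7
REVERSED = {3, 5, 6, 8, 9}    # starred (reverse scored) items in Appendix B
EXAM_ITEMS = (4, 8, 12)       # comprehensive exam statements


def recode(item, response):
    """Return the response, flipped to 8 - response for reverse-scored items."""
    if not 1 <= response <= SCALE_MAX:
        raise ValueError(f"item {item}: response {response} is outside 1..{SCALE_MAX}")
    return SCALE_MAX + 1 - response if item in REVERSED else response


def score_attitudes(responses):
    """Recode all items and sum the exam items (summing is an assumption)."""
    recoded = {item: recode(item, value) for item, value in responses.items()}
    exam_total = sum(recoded[item] for item in EXAM_ITEMS)
    return recoded, exam_total


# Hypothetical responses from one listener, keyed by item number.
example = {1: 4, 2: 6, 3: 5, 4: 6, 5: 3, 6: 2, 7: 4, 8: 2, 9: 5, 10: 4, 11: 3, 12: 5}
recoded, exam_total = score_attitudes(example)
print("Recoded item 8:", recoded[8])
print("Comprehensive exam attitude total:", exam_total)
```

In this made-up example, the response of 2 to item 8 (exams are "a waste of time and money") becomes 6 after recoding, so it counts toward a favorable attitude in the same way as agreement with items 4 and 12.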

References

Baker, S. M., & Petty, R. E. (1994). Majority and minority influence: source position imbalance as a determinant of message scrutiny. Journal of Personality and Social Psychology, 67, 5–19.

Burgoon, M. (1974). Approaching speech/communication. New York: Holt, Rinehart, & Winston.

Burgoon, M., & Klingle, R. S. (1998). Gender differences in being influential and/or influenced: a challenge to prior explanations. In D. J. Canary, & K. Dindia (Eds.), Sex differences and similarities in communication (pp. 257–285). Mahwah, NJ: Lawrence Erlbaum Associates.

Carli, L. L. (1990). Gender, language, and influence. Journal of Personality and Social Psychology, 59, 941–951.

Carli, L. L., LaFleur, S. J., & Loeber, C. C. (1995). Nonverbal behavior, gender, and influence. Journal of Personality and Social Psychology, 68, 1030–1041.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed). Hillsdale, NJ: Lawrence Erlbaum Associates.

Eagly, A. H. (1978). Sex differences in influenceability. Psychological Bulletin, 85, 86–116.

Eagly, A. H. (1983). Gender and social influence: a social psychological analysis. American Psychologist, 38, 971–981.

Karlsson, I. (1991). Female voices in speech synthesis. Journal of Phonetics, 19, 111–120.

Klatt, D. H., & Klatt, L. C. (1990). Analysis, synthesis, and perception of voice quality variations among female and male talkers. Journal of the Acoustical Society of America, 87, 820–857.

Leathers, D. G. (1997). Successful nonverbal communication: principles and applications (3rd ed). Boston: Allyn & Bacon.

Logan, J. S., & Pisoni, D. B. (1986). Preference judgements comparing different synthetic voices. Research on Speech Perception Progress Report No. 12 (pp. 263–290). Bloomington, IN: Indiana University.

Lucia, V. C. (1998). The effects of speech rate and speaker-listener congruence on persuasion. Unpublished master’s thesis, Wayne State University, Detroit, MI.

McHugh, A. (1976). Listener preference and comprehension tests of stress algorithms for a text-to-phonetic speech synthesis program. Naval Research Laboratory Report 8015.


Mirenda, P., Eicher, D., & Beukelman, D. R. (1989). Synthetic and natural speech preferences of male and female listeners in four age groups. Journal of Speech and Hearing Research, 32, 175–183.

Nass, C., Moon, Y., & Green, N. (1997). Are machines gender neutral? Gender-stereotypic responses to computers with voices. Journal of Applied Social Psychology, 27, 864–876.

Nusbaum, H. C., Pisoni, D. B., & Schwab, E. C. (1984). Subjective evaluation of synthetic speech: measuring preference, naturalness, and acceptability. Research on Speech Perception, Progress Report No. 10 (pp. 391–408). Bloomington, IN: Indiana University.

Nye, P. W., Ingemann, F., & Donald, L. (1975). Synthetic speech comprehension: a comparison of listener performances with and preferences among different speech forms. Status Report on Speech Perception SR-41. Haskins Laboratories.

Petty, R. E., & Cacioppo, J. T. (1986). Communication and persuasion. New York: Springer Verlag.

Reeves, B., & Nass, C. (1996). The media equation: how people treat computers, television, and new media like real people and places. Cambridge, UK: Cambridge University Press.

Rosenthal, R., & Rosnow, R. L. (1985). Contrast analysis: focused comparisons in the analysis of variance. Cambridge, UK: Cambridge University Press.

Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research: methods and data analysis (2nd ed). New York: McGraw-Hill.

Rosselli, F., Skelly, J. J., & Mackie, D. M. (1995). Processing rational and emotional messages: the cognitive and affective mediation of persuasion. Journal of Experimental Social Psychology, 31, 163–190.

Stern, S. E., Mullennix, J. W., Dyson, C., & Wilson, S. J. (1999). The persuasiveness of synthetic speech versus human speech. Human Factors, 41, 588–595.

Stern, S. E., Mullennix, J. W., & Wilson, S. J. (2002). Effects of perceived disability on persuasiveness of computer synthesized speech. Journal of Applied Psychology, 87, 411–417.
