
Bilingual Computerized Screening 1

Running Head: BILINGUAL COMPUTERIZED SCREENING OF DEPRESSION

Bilingual Telephone-Assisted Computerized Speech Recognition Assessment: Is a Voice-Activated Computer Program a Culturally and Linguistically Appropriate Tool for Screening Depression in English and Spanish?

Gerardo M. González
Craig R. Costello
Tammi R. La Tourette
Leslie K. Joyce
Mario Valenzuela

Psychology Program
California State University, San Marcos

Key Words: Depression screening; Spanish language; Latinos; Speech recognition; Telephone interviewing; CES-D


Send all correspondence to G.M. González, Psychology Program, California State University, San Marcos, CA 92096. Telephone (619) 750-4094, Telefax (619) 750-4111, E-mail: [email protected]


Abstract

The purpose of this study was to evaluate an automated voice-interactive program for screening depression in English and Spanish. The Center for Epidemiological Studies - Depression scale (CES-D) was administered in two interview formats: a speech recognition program presented by cellular telephone and a face-to-face method. In a single-session counterbalanced design, 32 English- and 23 Spanish-speakers completed randomly ordered administrations of the two CES-D methods, the Beck Depression Inventory (BDI), and the Short Acculturation Scale (SAS). There was strong evidence that the two CES-D methods were psychometrically equivalent, reliable, and valid in both languages. The two methods were highly rated by both language groups. The Spanish-speakers did not display a preference for either method, but the English-speakers preferred the face-to-face method. The results also suggested that verbal response latency was positively correlated with depression scores. Lastly, the Spanish-speakers' acculturation levels were not correlated with depression scores. Differences in age, education, and income between the language groups were confounded by unequal sample sizes. The findings generally supported the viability of the automated CES-D as a culturally and linguistically appropriate tool for screening depression in English and Spanish. Furthermore, the analysis of respondent voice characteristics shows promise as a method for screening depression in both languages.


Bilingual Telephone-Assisted Computerized Speech Recognition Assessment: Is a Voice-Activated Computer Program a Culturally and Linguistically Appropriate Tool for Screening Depression in English and Spanish?

The purpose of this study was to test a bilingual computerized speech recognition method for screening depression symptoms using a cellular telephone. The aim was to evaluate the computerized speech-responsive program as a culturally and linguistically appropriate tool to detect depression in English and Spanish. The program was assessed for psychometric equivalence, reliability, validity, feasibility of administration, and respondent acceptability. A major feature of the study was the examination of the relationships of verbal response time and acculturation to depression levels. The long-term goal of computerized speech recognition screening is not to replace personnel, but to provide assessment services in the absence or unavailability of mental health care staff, particularly in busy primary care settings. Ultimately, the program is intended to facilitate accessibility to depression prevention interventions through the early detection of symptoms.

The underdetection and insufficient treatment of clinical depression pose a major health concern. About 75% of clinically depressed individuals visit non-psychiatric medical care providers for relief of their somatic symptoms (Robins et al., 1985; Shapiro et al., 1984). Many of these persons are first treated at primary care settings, where up to 30% of the patients may be depressed (Broadhead, Clapp-Channing, Finch, & Copeland, 1989). In actuality, only half of the primary care cases are accurately identified for depression by medical personnel (Katon, 1987; Pérez-Stable, Miranda, Muñoz, & Ying, 1990). High patient volume, understaffing, and time constraints at primary medical facilities hinder adequate screening of depression. As a result, the underrecognition of depression leads to inappropriate treatment and unnecessary health care costs (Greenberg, Stiglin, Finkelstein, & Berndt, 1993). Underserved groups, such as Spanish-speaking populations, are particularly at risk for undertreatment. For example, an Epidemiological Catchment Area study found that only 11% of Mexican Americans who met the criteria for clinical depression (compared to 22% of non-Hispanic Whites) sought a mental health care provider for treatment (Hough et al., 1987).

Spanish-speaking patients generally lack access to linguistically and culturally compatible mental health services (Organista, Muñoz, & González, 1994). Nationally, there are about 500 to 600 appropriately trained bilingual mental health professionals who can provide Spanish-speaking clinical services (Muñoz & Ying, 1993). However, the increasing gap between the Spanish-speaking population and bilingual mental health professionals-in-training makes it unlikely that ample Spanish-speaking clinicians will be available in the near future (Bernal & Castro, 1994). Thus, alternative strategies for delivering mental health services to Spanish-speaking communities are imperative.


Recent advances in computerized technology offer alternative screening methods for individuals not reliably assessed with standard paper-and-pencil questionnaires. Such is the case for the disabled (Noyes, Haigh, & Starr, 1989), persons who cannot read or write, and non-English-speakers (Starkweather & Muñoz, 1989). These populations are less likely to utilize mental health services because of written assessment or language barriers. For these populations, computerized technology has the potential to minimize the obstacles that contribute to the underidentification of depression.

Computer-assisted Psychological Assessment

The literature provides abundant evidence to support the strong psychometric properties and respondent acceptability of numerous computer-assisted assessment methods (Lukin, Dowd, Plake, & Kraft, 1985; Rozensky, Honor, Rasinski, Tovian, & Herz, 1986). Past studies found that several computer-assisted depression assessment methods were psychometrically reliable and valid measures (Greist & Klein, 1980; Helzer, Robins, Croughan, & Ratcliff, 1981; Kobak, Reynolds, Rosenfeld, & Greist, 1990; Steer, Rissmiller, Ranier, & Beck, 1994). Research also revealed that many depressed patients preferred computer interactive interviewing over face-to-face interviewing (Carr, Ghosh, & Ancill, 1983; Lucas, Mullin, Luna, & McInroy, 1977). In addition, computerized interviewing may increase respondent self-disclosure in cases of discomfort with revealing sensitive issues, such as suicidal thoughts, to an interviewer (Moore, Summer, & Bloor, 1984).


Computer-assisted psychological assessment also offers major advantages in test administration, time efficiency, accuracy, and cost effectiveness (Butcher, 1987). Structured computerized interviewing improves the quality, quantity, and integrity of clinical data by accurately transcribing, scoring, and storing patient responses, standardizing administration procedures, and minimizing errors attributable to human oversight (Erdman, Klein, & Greist, 1985). Although an open-ended face-to-face interview gathers crucial clinical data, a clinician may inadvertently omit up to 35% of meaningful interview items (Climent, Plutchik, & Estrada, 1975; Simmons & Miller, 1971).

Computerized objective techniques also provide viable approaches to improve clinical assessment. Greist, Klein, Erdman, and Jefferson (1983) found that a computerized actuarial model interview was more accurate in predicting suicide attempts than clinicians who knew the patients. "Response latency," or response time to presented items, is also meaningful data (Slack, 1971; Stout, 1981). Research suggests that depressed individuals display significantly different verbal response latencies compared to non-depressed persons (Mandal, Srivastava, & Singh, 1990; Stassen, Bomben, & Günther, 1991). Response latencies may be related to psychomotor agitation or retardation and can play an important role in assessing depression (Nisonne, 1988). Computer technology serves as an effective psychological assessment tool, but it has not been widely tested across multiple cultures and languages. Computerized speech recognition, on the other hand, can be used to verbally administer mental health assessments in any language.

Computerized Speech Recognition

In sum, a computerized speech recognition program administers a discrete choice questionnaire by presenting an item (visually on a computer screen or verbally by a pre-recorded prompt), recognizing a spoken response, and scoring the response. A pioneering study by Richards, Fine, Wilson, and Rogers (1983) tested a voice recognition system for administering the Minnesota Multiphasic Personality Inventory (MMPI) to 32 disabled patients with limited hand function. The system visually displayed the MMPI items on a monitor, recognized the patient's verbal response, and generated a profile. The results indicated there were no significant differences between the MMPI profiles produced by the computerized speech recognition and paper-and-pencil methods.

Based on the potential of speech recognition technology and the imminent need for alternative depression screening methods in English- and Spanish-speaking communities, the primary author and colleagues developed several bilingual computerized speech recognition applications to screen depression. The following pilot studies evaluated the feasibility of voice-activated programs to screen depression in English- and Spanish-speaking samples.

Muñoz, González, and Starkweather (1995) developed a bilingual speech recognition IBM-compatible prototype. The purpose of the study was to test the program with public sector primary care patients. The patients completed randomly ordered paper-and-pencil and speech recognition forms of the 20-item Center for Epidemiological Studies - Depression scale (CES-D). A counterbalanced experimental design with 38 English- and Spanish-speaking depressed patients was conducted in an immediate test-retest sequence. For both language groups, the findings suggested that the two CES-D methods had similar total score means and variances, high ranked-order correlations, and high coefficient alpha reliability estimates. Although a majority of the patients in both groups were computer novices, the English-speakers preferred the computerized method because of perceptions that it was easier to use, captivating, and presented a feeling of personal interaction. Spanish-speakers, however, had no significant preference.

González (1993a) tested a Macintosh speech recognition CES-D prototype with English-speaking adults in a university setting. A total of 68 participants completed both paper-and-pencil and speech recognition methods of the CES-D and a computer anxiety scale in a single-session counterbalanced study. The results suggested that the two CES-D methods displayed equivalent means and variances, strong alternate forms reliabilities, high internal consistency estimates, and correlated equivalently with the computer anxiety scale. There was no significant preference for either CES-D method (González, Spiteri, & Knowlton, 1995).

In another study by González (1993b), a speech recognition CES-D program presented by cellular telephone was tested with English- and Spanish-speaking populations. The purpose of the study was to evaluate the feasibility of the cellular telephone computer program for screening depression in general community settings. The advantages of a cellular telephone interview were portability; simplifying the program interface; reducing respondent anxiety by removing a visible computer terminal; and generating access for individuals not likely to utilize assessment services. In addition, Latinos may be more willing to provide accurate answers to sensitive items in a telephone survey than in a face-to-face interview (Marín, Pérez-Stable, & Marín, 1989). Thirty Spanish- and 22 English-speakers completed both computer-telephone and face-to-face versions of the CES-D and a verbal depression checklist in immediate test-retest counterbalanced order. To simplify the presentation of the completely audio computer-telephone program, the standard CES-D discrete choices (less than 1 day, 1 to 2 days, 3 to 4 days, and 5 to 7 days), indicating the frequency of symptoms during the past week, were modified to a specific number of days (0 to 7) format.

The results suggested that for both language groups, the 0 to 7 days format significantly elevated the total scores, but total score variances did not differ for either method. In both samples, the CES-D modes yielded high internal consistency estimates, high alternate form reliabilities, and similarly high correlations to the depression checklist. Both groups reported high positive ratings for the two methods. The Spanish-speakers did not have a preference for either method, but English-speakers preferred the computer-telephone method because it was more personable (González, Costello, Valenzuela, Chaidez, & Nuñez-Alvarez, 1995).

These pilot studies provided evidence that computerized speech recognition programs were generally reliable, valid, and equivalent to standard methods. The purpose of the current study was to evaluate the voice-activated computer program as a culturally and linguistically appropriate tool for screening depression in English and Spanish. Equivalent face-to-face and cellular telephone speech recognition methods of a depression scale were compared. Analyses included examining the psychometric properties, feasibility of administration, and respondent acceptability of both methods. A major feature of this study was the recording of verbal response latency times. The aim was to assess the relationship of verbal response time to depression levels. An exploration of the association of acculturation to depression was undertaken for the Spanish-speaking sample.

Our research hypotheses applied to both CES-D methods by language.

(a) The two methods would be equivalent in total score means and variances.
(b) The two methods would demonstrate high reliability estimates.
(c) The two methods would yield similar validity coefficients.
(d) Respondents would report high acceptability for both methods.
(e) Response latencies for both methods would be related to depression levels.

Method

Instruments

Verbal instructions and structured interview items developed for this study were translated by a bilingual instructor with certification in Spanish/English translation. Published instruments, such as the CES-D, BDI, and SAS, were presented in standardized translated form.

The Structured Interview Form: This form was developed for this study to structure the interviewer's verbal presentation of items and to record observations and participant reactions. The interviewers presented demographic items, such as participant gender, ethnicity, age, income, education, and computer experience. The interviewer noted observed and verbally expressed participant reactions to the computer-telephone and face-to-face CES-D methods. The interviewer also recorded participants' acceptability ratings of each CES-D method, queried participants for a CES-D method preference, and inquired about their reasons for the preference.

Short Acculturation Scale: The Short Acculturation Scale (SAS) for Hispanics was developed by Marín, Sabogal, Marín, Otero-Sabogal, and Pérez-Stable (1989). The SAS consists of four items that assess primary (dominant) language use in general, at home, for thinking, and for speaking with friends. Participants rated themselves along a 5-point continuum (only Spanish, Spanish better than English, both equally, English better than Spanish, and only English). An average score of 2.99 or above indicates a high level of acculturation. The SAS has demonstrated strong reliability and validity coefficients (.40 to .76). The interviewer also asked each participant to self-identify ethnicity, family ancestry, and length of U.S. residency.
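The SAS scoring rule described above can be illustrated with a minimal sketch. It assumes the five response options are coded 1 (only Spanish) through 5 (only English), which is consistent with the score range reported later but is not spelled out here; the function and constant names are illustrative.

```python
from statistics import mean

# Cutoff stated in the text: an average of 2.99 or above is "high" acculturation.
SAS_HIGH_CUTOFF = 2.99


def score_sas(ratings):
    """Average the four language-use ratings and flag high acculturation.

    Assumes each rating is coded 1 (only Spanish) to 5 (only English).
    Returns (average, is_high).
    """
    if len(ratings) != 4 or any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("expected four ratings between 1 and 5")
    average = mean(ratings)
    return average, average >= SAS_HIGH_CUTOFF
```

For example, ratings of 1, 1, 2, and 2 average 1.5, which falls below the cutoff.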

Center for Epidemiological Studies-Depression Scale: The CES-D is a 20-item self-report scale designed to assess depression symptoms (Radloff, 1977). The scale includes four reverse-scored items phrased in a non-depressive direction. The CES-D has an item response format with four possible choices (less than 1 day = 0, 1 to 2 days = 1, 3 to 4 days = 2, and 5 to 7 days = 3) to indicate frequency of symptoms during the preceding week. The total score varies from 0 to 60. In the general population, a score of 16 or greater suggests a high level of depressive symptoms (Comstock & Helsing, 1976; Weissman, Sholomskas, Pottenger, Prusoff, & Locke, 1977). The CES-D demonstrates strong inter-item reliability (.80 to .90) and validity estimates (above .80) with English- and Spanish-speaking psychiatric and nonclinical populations (Moscicki, Locke, Rae, & Boyd, 1989; Roberts, 1980). In this study, the CES-D was administered in two formats: (a) face-to-face and (b) cellular telephone-assisted speech recognition form. The computer method was completely audio since it was presented by cellular telephone. It was necessary to generate simpler and more intuitive verbal item choices for the participant to easily recall. Thus, the CES-D response format was converted from the standard four discrete choices to the specific number of days (0 = none and 7 = every day) that the respondent reported symptoms during the past week.
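The standard CES-D scoring rule described above can be illustrated with a minimal sketch. It assumes the four reverse-scored items are items 4, 8, 12, and 16 (their usual positions in the scale; this section does not list them), and it covers only the standard 0 to 3 response format, because the text does not state how the 0 to 7 day responses were totaled. All names are illustrative.

```python
# Assumption: 1-based positions of the four reverse-scored items.
# The section says there are four such items but does not list them.
REVERSE_ITEMS = {4, 8, 12, 16}

# Cutoff stated in the text for a high level of depressive symptoms.
CESD_CUTOFF = 16


def score_cesd(responses):
    """Total a standard-format CES-D (20 items, each coded 0 to 3).

    Reverse-scored items contribute 3 minus the response.
    Returns (total, at_or_above_cutoff); the total ranges from 0 to 60.
    """
    if len(responses) != 20 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected 20 responses coded 0, 1, 2, or 3")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number in REVERSE_ITEMS:
            total += 3 - response
        else:
            total += response
    return total, total >= CESD_CUTOFF
```

Under these assumptions, a respondent who answers 0 to every item still scores 12, because each of the four reverse-scored items contributes 3.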

Beck Depression Inventory: The BDI is a self-report instrument designed to assess the severity of depression symptoms (Beck, Rush, Shaw, & Emery, 1979). The BDI contains 21 sets of four depression-related statements rated on a scale from 0 to 3. The total score ranges from 0 to 63. Scores from 0 to 9 are described as "minimal;" 10 to 16 as "mild;" 17 to 29 as "moderate;" and 30 or greater as "severe" (Beck & Steer, 1993). The BDI is widely used in psychiatric and nonclinical populations of numerous languages and demonstrates strong internal consistency reliability (above .80) and concurrent validity (.55 to .75) coefficients (Beck, Steer, & Garbin, 1988). The BDI served as the validity criterion in the present study.

Procedures

Five interviewers (three monolingual English and two bilingual English/Spanish) recruited participants at two primary care medical facilities and one social service agency in the north San Diego county area. A single interviewer targeted a particular language group, approached a potential participant, explained the nature of the study, and sought voluntary participation. If the recruit expressed interest in the study, then the interviewer led the participant to a private setting, presented further expectations, and obtained informed written consent. Recruits were explicitly informed that they would not receive compensation for their participation. During the interview, the participant first answered demographic questions and the Short Acculturation Scale. Afterwards, the participant completed the CES-D twice in a randomly ordered sequence using the computer-telephone or face-to-face format. During the administration, the interviewer recorded observed and verbally expressed participant reactions and prompted the participant, as necessary. Verbal response times were recorded by the interviewer using a stopwatch in the face-to-face method and timed by the computer program in the computer-telephone method. After completing each method, the participant was asked to express initial reactions to the method and to rate each on a scale from 1 (very negative) to 10 (very positive). In addition, if the two modes of administration differed, then the participant was queried for a method preference and reasons for the preference. The participant then completed the BDI in paper-and-pencil form. One English-speaking participant was unable to read, thus the interviewer read the items and recorded the participant's responses. Before ending the session, the researcher debriefed each participant and made a referral to counseling services if the BDI score was above 17 (moderate to severe depression levels).

Cellular Telephone Computerized Speech Recognition Program Procedures

A Macintosh Centris 650 with 12 Megabytes (MB) random access memory (RAM), 240 MB hard disk storage, CD-ROM, and ImageWriter printer, located at a secure university facility accessible only to the research team, supported the program. The telephone-assisted speech recognition program was a HyperCard 2.1 stack (Claris, 1991) integrated with the Voice Navigator Classic 2.3.2 speaker-dependent speech recognition and telephony (telephone-able) application (Articulate Systems, 1993). A Motorola TVS200 Transportable cellular telephone with 3 watts of power and 45 minutes of continuous talk time provided the communication interface. The computer program verbally administered the CES-D over the telephone by playing prompts recorded with a male voice and employing speech recognition to interact with the respondent. The program presented a three- to four-minute training segment to generate a "template" of the respondent's speech characteristics for each discrete CES-D choice. The program stored and used the template to match the participant's spoken responses to the CES-D items. If the participant was assigned to complete the computer-telephone method twice, then the program skipped the template-training on the second administration and directly proceeded to the CES-D items.

The following is a summary of the entire computer-telephone interview procedure. Prior to calling the computer, the interviewer presented the respondent with verbal instructions for completing the computer-telephone method. When the respondent was ready to begin, the interviewer called the computer. The program answered with a male voice and prompted the interviewer to enter the respondent's three-digit identification number, language, and sequence (template-training or items-only) using assigned touch-tone digits. After entering the data, the interviewer immediately handed the telephone to the respondent. The program presented a greeting in the participant's language, gave the respondent explicit instructions for template-training, and prompted the respondent to repeat each phrase in three distinct tones (high, medium, and low). The program used the three repetitions to build and store an "average" template of the respondent's voice characteristics for saying each phrase.

After template-training, the program presented the CES-D items. The program instructed the respondent to verbally respond to each CES-D item according to "the actual number of days that he or she felt like the statement during the past seven days, 0 = none and 7 = every day." Although the respondent was instructed to use a spoken response, the option to press the corresponding touch-tone digit was presented in case voice recognition difficulties arose. The respondent could say "What" or press the "*" button to repeat the CES-D item, and say "Help" or press the "#" button to repeat the initial instructions and the current item. The program presented a CES-D item and waited for the participant's response for up to one minute. The program recorded the verbal response latency for each item, measured from the end of the item prompt until the first utterance was "recognized." The program also computed the accuracy level for matching a spoken response with the template. An 85% confidence level for recognition accuracy was necessary to minimize the incorrect matching of responses because of a poorly developed template. If a response was recognized at the 85% or greater level, then the computer beeped once and presented the next item. If there was less than 85% recognition accuracy, then the program would say "Pardon" and wait for the respondent to repeat the utterance. If the interviewer observed a respondent experiencing excessive recognition difficulty, then the interviewer prompted the respondent to use the touch-tone button. During the interaction, the program recorded all "recognized" responses, saved the accuracy level, and stored the number of times an item was repeated when "Help" or "What" was expressed. Upon completing the 20 CES-D items, the program thanked the respondent, requested that the interviewer be advised, and hung up. The program scored the responses, saved the data, and printed the results at the university facility accessible only to the research team.

Design

The study employed a single-session counterbalanced 2 x 4 (x 2) Language x Order x Time experimental design. The two CES-D methods were administered in random order, resulting in four possible immediate test-retest sequences: (a) face-to-face first, computer-telephone second, (b) computer-telephone first, face-to-face second, (c) face-to-face twice, or (d) computer-telephone twice. The English-speakers were distributed equally among the random sequences (8 per cell). However, since fewer Spanish-speakers were recruited, the random sequences were not counterbalanced for the Spanish language group. For example, the face-to-face method preceded the telephone screening for 7 Spanish-speaking respondents, while 6 responded to the telephone method prior to the face-to-face method. Six of the Spanish-speakers completed the face-to-face method twice and 4 completed the telephone method twice. Psychometric analyses of the computer-telephone and face-to-face methods examined the equivalence of total score means and variances as well as reliability estimates for alternate forms and internal consistency. The BDI served as the independent validity criterion for the CES-D methods. Participant acceptability included acceptance ratings of each CES-D method and participant preference and comfortability between the two methods. Finally, the relationships of verbal response times and acculturation levels to depression scores were evaluated.
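As a compact restatement of the per-item logic in the computer-telephone procedure described earlier (accept a response recognized at 85% or greater, otherwise say "Pardon" and wait; "What" repeats the item; "Help" repeats the instructions and the item), the following is a minimal sketch. It is not the HyperCard or Voice Navigator implementation: the `play` and `listen` callbacks, the `ItemResult` record, and the "<beep>" marker are hypothetical stand-ins, and the one-minute timeout and the touch-tone fallback are omitted.

```python
from dataclasses import dataclass

ACCEPT_CONFIDENCE = 0.85                  # threshold stated in the text
VALID_DAYS = {str(d) for d in range(8)}   # spoken answers "0" through "7"


@dataclass
class ItemResult:
    days: int          # accepted answer, 0 to 7 days
    latency_s: float   # latency of the accepted utterance, in seconds
    confidence: float  # recognition confidence of the accepted utterance
    repeats: int       # number of times "What" or "Help" was used


def administer_item(item_text, instructions, play, listen):
    """Present one item and loop until an answer is accepted.

    Hypothetical interface: play(text) speaks a prompt; listen() returns
    (word, confidence, latency_s) for the respondent's next utterance.
    """
    repeats = 0
    play(item_text)
    while True:
        word, confidence, latency_s = listen()
        if word == "What":
            repeats += 1
            play(item_text)               # repeat the current item
        elif word == "Help":
            repeats += 1
            play(instructions)            # repeat the instructions,
            play(item_text)               # then the current item
        elif word in VALID_DAYS and confidence >= ACCEPT_CONFIDENCE:
            play("<beep>")                # single beep, then move on
            return ItemResult(int(word), latency_s, confidence, repeats)
        else:
            play("Pardon")                # below threshold: ask again
```

Passing the prompt player and the recognizer in as callbacks keeps the sketch independent of any particular speech recognition product, so the loop can be exercised with scripted utterances.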

Results

Our analyses examined the feasibility of recruitment, sample characteristics, psychometric properties (equivalence of means and variances, reliability, and validity), respondent acceptability (acceptance ratings, method preference, and comfortability with method), feasibility of administration (respondent behavioral and verbal reactions), and respondent speech behavior (verbal response latency and voice recognition accuracy levels) for both CES-D methods.

Sample

Of the 212 persons (68% were English-speakers) asked to participate, 154 declined (73% English-speakers) and 58 accepted (55% English-speakers). Three Spanish-speaking participants accepted but did not initiate the interviews and three initiated but did not complete the interviews. The final sample utilized for analyses (N = 55) consisted of 32 English- (66% male) and 23 Spanish-speaking participants (52% male). Self-identified ethnicity of the Spanish-speaking sample revealed that all were of Mexican ancestry. Race and ethnicity among the English-speakers were European American (66%), African American (6%), American Indian (6%), Latino American (6%), and Other (16%). Thus, the English-speaking sample was composed predominantly of White males and the Spanish-speaking sample was entirely of Mexican heritage, balanced for gender, but fewer in number.

Participants' ages ranged from 17 to 74 (Spanish-speaking: M = 31.17, SD = 9.21; English-speaking: M = 45.06, SD = 14.40). To account for the significant differences in variances between the two groups, a "separate variance" independent samples t test computation was employed. Results demonstrated significant age differences between the two groups, t (52.41) = 4.36, p < .001, two-tailed. Formal education varied from 2 to 18 years (Spanish-speaking: M = 9.43, SD = 4.13; English-speaking: M = 13.38, SD = 2.08). A separate variance t test revealed that educational levels were significantly different, t (30) = 4.21, p < .001, two-tailed. Reported total family annual income ranged from 0 to $87,000 (Spanish-speaking: M = $10,435.16, SD = $4,504.45; English-speaking: M = $24,326.61, SD = $25,339.50). Analysis of family income using a separate variance t test displayed significant differences, t (33.03) = 2.98, p < .005, two-tailed. Computer experience was based on a self-reported rating scale ranging from 0 (novice) to 5 (proficient). Both groups reported a low level of computer experience (Spanish-speaking: M = 1.17, SD = 1.47; English-speaking: M = 1.63, SD = 1.70), but levels were not significantly different, t (53) = 1.03, two-tailed. The Spanish-speaking sample was younger, had less formal education, and reported lower incomes. However, the significant dissimilarities in age, education, and income were confounded by the small and unequal sample sizes. Thus, these differences between the language groups were interpreted with caution. The sample demographic characteristics are summarized in Table 1.
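The "separate variance" t tests reported above can be checked from the summary statistics alone. The following is a minimal sketch, assuming the test is the Welch formulation with Welch-Satterthwaite degrees of freedom (the text says only "separate variance") and using the group sizes given in the Sample section (32 English- and 23 Spanish-speakers). The function name is illustrative.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Separate-variance (Welch) t statistic and degrees of freedom.

    Works from group means, standard deviations, and sizes, so no raw
    data are needed. Returns (t, df).
    """
    var_of_mean1 = sd1 ** 2 / n1   # squared standard error, group 1
    var_of_mean2 = sd2 ** 2 / n2   # squared standard error, group 2
    t = (mean1 - mean2) / math.sqrt(var_of_mean1 + var_of_mean2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (var_of_mean1 + var_of_mean2) ** 2 / (
        var_of_mean1 ** 2 / (n1 - 1) + var_of_mean2 ** 2 / (n2 - 1)
    )
    return t, df


# Age: English-speaking (group 1) versus Spanish-speaking (group 2)
t_age, df_age = welch_t(45.06, 14.40, 32, 31.17, 9.21, 23)
```

With the age statistics, the sketch gives t of about 4.36 on about 52.41 degrees of freedom, in line with the reported values; the education statistics likewise give about 4.22 on about 30 degrees of freedom.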

Insert Table 1 about here

Psychometric Properties

Table 2 summarizes the total score means and variances of the first administration of both CES-D methods by language group. A two-between, one-within Group Order x Language (x Time) Multivariate Analysis of Variance (MANOVA) analyzed CES-D total scores. Since the assumption for homogeneity of dispersion matrices was violated, Box's M = 46.84; F (21, 3343) = 1.86, p < .01, suggesting some unequal CES-D total score variances, a Huynh-Feldt epsilon correction was employed (Norusis, 1992). The MANOVA did not find any main effects for group order, F (3, 45) = 1.18; language, F (1, 45) = 1.32; or time, F (1, 45) = 2.42. However, there was a significant interaction between time and language on CES-D totals, F (1, 45) = 4.68, p < .04. The results suggested that there were no order effects or language differences. However, the interaction indicated that English-speakers' scores decreased and Spanish-speakers' scores increased across time, regardless of method.

Inter-item consistency analyses performed on both CES-D methods yielded high coefficient estimates for both groups. Immediate test-retest reliability coefficients for each CES-D method were also high, although the cell sizes were quite small for both groups. Alternate form reliabilities were assessed between the face-to-face and computer-telephone methods. These Pearson coefficients were high for both groups, r (14) = .97, p < .001, one-tailed, for English-speakers, and r (7) = .83, p < .003, one-tailed, for the Spanish-speakers. These results lent support for the strong psychometric reliability of the two CES-D methods.

To assess the criterion validity of the two CES-D methods, Pearson correlations were computed between the BDI scores and both CES-D methods within each language. Strong significant coefficients were obtained for the two CES-D methods in both groups. An analysis of the non-independent correlations between the two CES-D methods yielded no significant differences for the English-speakers, t (13) = 1.30, two-tailed, or the Spanish-speakers, t (6) = .73, two-tailed. The similarly high coefficients provided evidence of the psychometric validity of the two methods. Table 2 displays the reliability and validity coefficients of both methods by language.
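The text does not name the test used to compare the two non-independent validity coefficients. One common choice whose degrees of freedom (n - 3) match those reported is Williams' modification of Hotelling's t for two correlations that share a variable. The following is a minimal sketch under that assumption. The function name and the two BDI correlations passed in are hypothetical placeholders; only the between-method correlation of .97 and n = 16 come from the English-speaking results above.

```python
import math

def williams_t(r12, r13, r23, n):
    """Williams' t for comparing two dependent correlations that share variable 1.

    r12: correlation of the shared variable (e.g., BDI) with measure 2
    r13: correlation of the shared variable with measure 3
    r23: correlation between measures 2 and 3
    n:   number of respondents with all three scores
    Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r23) ** 3
    t = (r12 - r13) * math.sqrt((n - 1) * (1 + r23) / denom)
    return t, n - 3

# Hypothetical validity coefficients (NOT the values in Table 2);
# r23 = .97 and n = 16 are taken from the English-speaking group.
t_value, df = williams_t(r12=0.80, r13=0.75, r23=0.97, n=16)
print(f"t({df}) = {t_value:.2f}")
```

With n = 16 the function returns 13 degrees of freedom, and with n = 9 it returns 6, which agrees with the degrees of freedom reported for the two language groups.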

The relationship of the Spanish-speakers' acculturation levels to depression scores was explored. The relationship was not evaluated for the English-speakers because it was a predominantly non-Hispanic White sample. Acculturation scores ranged from 1 to 3.75 (M = 1.58, SD = 0.82), indicating a low level of acculturation. No significant correlations emerged between acculturation and CES-D total scores for the face-to-face method, r (17) = -.03, two-tailed, or the computer-telephone method, r (13) = -.28, two-tailed. Acculturation levels and BDI scores were also not significantly correlated, r (17) = -.09, two-tailed. Thus, there was no significant association between acculturation levels and depression symptoms in our sample. Table 2 summarizes the results.

Insert Table 2 about here

Acceptability

English-speakers reported a mean acceptance rating of 8.13 (SD = 1.93) for the first administered face-to-face CES-D and 7.50 (SD = 2.67) for the first computer-telephone screening. Spanish-speakers reported a mean rating of 7.30 (SD = 2.01) for the first face-to-face administration and 7.30 (SD = 1.70) for the first computer-telephone interview. A two-between, one-within Language x Group Order (x Time) MANOVA was performed on the CES-D ratings. Once again, a Huynh-Feldt correction was computed because of heterogeneous dispersion matrices, Box's M = 32.81; F (15, 5907) = 1.88, p < .021. The results suggested that there were no main effects for group order, F (3, 44) = 1.44; language, F (1, 44) = 1.01; and time, F (1, 44) = 2.11. Again, there were no order effects. The two groups highly rated both methods and these ratings did not significantly differ by language. See Table 3.

Spanish-speakers who completed both CES-D methods reported no significant preference for either the face-to-face or computer-telephone method, χ² (1, N = 12) = 1.33. However, English-speakers significantly preferred the face-to-face method, χ² (1, N = 16) = 6.25, p < .012. English-speakers reported primary reasons for preferring the face-to-face method as "more comfortable" (31%) and "personable" (31%) and a major secondary reason as "more interactive" (46%). Participants were also asked to indicate between the different methods, "with which were you most comfortable saying a personal response." Spanish-speakers reported no difference in comfort between the face-to-face and computer-telephone method, χ² (1, N = 12) = 1.33. English-speakers, however, reported more comfort with the face-to-face method, χ² (1, N = 16) = 4.00, p < .05. A secondary correlational analysis indicated that computer experience was not related to method preference for the Spanish-speakers, r (10) = .14, two-tailed, or the English-speakers, r (14) = -.05, two-tailed. Thus, when both methods were presented to the same respondent, Spanish-speakers did not report a preference for either method. However, the English-speakers preferred and were more comfortable with the face-to-face method.

Feasibility of Administration

The ease of administration for both CES-D methods was assessed, with a particular focus on the computer-telephone program. The interviewer recorded participant behavioral and verbally expressed reactions to each method. Similarly noted reactions were grouped into general categories. Idiosyncratic responses that were very dissimilar were grouped under the category "Other." Sessions with no apparent observed or expressed concerns were designated as "None." Among English-speakers, there were no outstanding observed or verbally expressed participant concerns with the first face-to-face method. For the first computer-telephone interviews in the English-speaking group, a prominent category of interviewer observations was "the participant's voice differed from template training" (37%, n = 16), meaning that the tone of the participant's voice changed during the course of the interview. Otherwise, no observed concerns (31%) were noted. For the first face-to-face interviews among Spanish-speakers, "None" was the most cited behavioral observation (69%, n = 13), and most participants also voiced no concerns (54%). Overall, there were no major trends in the participants' reactions to the first computer-telephone interviews. About 31% of the English- and half of the Spanish-speaking participants' verbal reactions in the computer-telephone interviews were categorized as "Other." The idiosyncratic responses were unrelated between the English- and Spanish-speakers. The results suggested that there was not a single common or prevailing concern with the computer-telephone program.

To assess the equivalence of administration times, the computer-telephone and face-to-face modes were analyzed using two procedures: (a) comparing the total interview times of both methods, including the voice template-training times for the computer-telephone method; and (b) comparing the total interview times minus the template-training times (presentation of the items only). Our analyses focused on the first administration times for both methods since the second administration time was confounded by the participant's prior exposure to the CES-D items. In addition, we did not compare administration times between language groups because it generally took longer (or shorter) to verbally present equivalent instructions or items in one language compared to the other. Thus, our analyses focused on within-language data. Among English-speakers, the mean total administration time for the face-to-face method (286.56 seconds, SD = 106.64) was less than that for the computer-telephone method with training time (425.13 seconds, SD = 85.81). However, the items-only mean administration time for the computer-telephone method without template-training was 220.31 seconds (SD = 84.32). The mean total administration time for the face-to-face method among Spanish-speakers (300.15 seconds, SD = 85.60) was also less than that for the computer-telephone method (508.70 seconds, SD = 107.39). The items-only mean administration time for the computer-telephone method was 290.40 seconds (SD = 79.21). Overall, the total computer-telephone interviews took significantly longer to complete than first face-to-face interviews for English- and Spanish-speakers, F (1, 30) = 16.40, p < .0003; and F (1, 21) = 26.92, p < .0001, respectively. However, when comparing items-only computer-telephone times with face-to-face total times, there were no significant differences between methods for English- and Spanish-speakers, F (1, 30) = 3.80 and F (1, 21) = .08, respectively. The results supported the equivalence of administration times for the computer-telephone and face-to-face methods when template-training time was eliminated. See Table 3.
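The F ratios for administration time can be checked from the reported means and standard deviations alone. The sketch below computes a one-way between-groups F from summary statistics. The group sizes (16 and 16 for English-speakers, 13 and 10 for Spanish-speakers) are not stated alongside these tests; they are inferred here from the reported degrees of freedom and the cell sizes mentioned for the first administrations, and the function name is ours.

```python
def f_from_summary(groups):
    """One-way between-groups F ratio from (n, mean, sd) summaries.

    groups: list of (n, mean, sd) tuples, one per independent group.
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd**2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Group sizes below are inferred assumptions, not values reported with the tests.
english_total = f_from_summary([(16, 286.56, 106.64), (16, 425.13, 85.81)])
english_items = f_from_summary([(16, 286.56, 106.64), (16, 220.31, 84.32)])
spanish_total = f_from_summary([(13, 300.15, 85.60), (10, 508.70, 107.39)])
spanish_items = f_from_summary([(13, 300.15, 85.60), (10, 290.40, 79.21)])

for label, (f_ratio, df1, df2) in [
    ("English, total time", english_total),
    ("English, items only", english_items),
    ("Spanish, total time", spanish_total),
    ("Spanish, items only", spanish_items),
]:
    print(f"{label}: F({df1}, {df2}) = {f_ratio:.2f}")
```

Under these assumed group sizes the computed ratios agree with the reported values to within rounding of the published means and standard deviations.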

Analyses of the ease of administration for the computer-telephone program included the number of interviewer interventions and participants' use of touch-tone rather than voice responses. On occasion, the interviewers had to stop the interview because of a "frozen" program resulting from a poorly trained respondent voice template. The number of times that interviewers had to intervene and redial the phone to reconnect with the computer program ("callbacks") was recorded. Out of 53 computer-telephone administrations, 13 required a single callback (25%); no session had more than one callback. Among the 33 English language computer-telephone sessions, seven (21%) had callbacks. Of the 20 Spanish language computer-telephone sessions, six (30%) had callbacks.
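As a quick arithmetic check, the callback percentages follow directly from the counts given in this paragraph. The snippet below is only an illustration; the dictionary and helper names are ours.

```python
# Callback counts and session totals as reported in the text.
callbacks = {
    "All sessions": (13, 53),
    "English": (7, 33),
    "Spanish": (6, 20),
}

def percent(count, total):
    """Return count / total as a whole-number percentage."""
    return round(100 * count / total)

for label, (count, total) in callbacks.items():
    print(f"{label}: {count}/{total} = {percent(count, total)}%")

# The language-specific counts should add up to the overall figures.
assert callbacks["English"][0] + callbacks["Spanish"][0] == callbacks["All sessions"][0]
assert callbacks["English"][1] + callbacks["Spanish"][1] == callbacks["All sessions"][1]
```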

Some participants experienced excessive voice recognition difficulty in answering an item. For example, the respondent voiced the response at least three times, but could not get recognition from the program. As a result, respondents would be prompted to use the telephone touch-tone keys for a response. The number of times that participants reverted to telephone touch-tone, rather than voice, during the computer-telephone administration was recorded. Twelve English- (36%) and nine Spanish-speaking (45%) participants reverted to touch-tone. Seven English- (21%) and five Spanish-speaking (25%) participants used the touch-tone between one and three times. Five English- (15%) and four Spanish-speaking (20%) participants used the touch-tone between four and seven times. Essentially, the computer-telephone interview had minor limitations: infrequent callbacks and respondents occasionally reverting to touch-tone rather than voice. These limitations seem largely attributable to poor voice recognition based on a poorly developed voice template.

Speech Behavior

A major objective was to assess the relationship of verbal response time to depression scores. Participants' response latencies to the first presentation of each CES-D item during the first administration of each method were recorded. The average response latency for English-speaking participants on the face-to-face method was 2.16 seconds (SD = .69) and on the computer-telephone method was 2.59 seconds (SD = .94). The Spanish-speakers' average response latency for the face-to-face method was 2.34 seconds (SD = .75) and for the computer-telephone method was 3.03 seconds (SD = 1.30). The average response latencies were analyzed using a 2 x 2 (Method x Language) ANOVA. The computer-telephone method had a significantly higher average response latency than the face-to-face method, F (1, 50) = 4.66, p < .04. However, there was no significant main effect for language group, F (1, 50) = 1.39, and no significant interaction, F (1, 50) = .26. Since there were no significant language differences, the average response latencies of the two groups were combined. The average response latency for the entire sample was 2.44 seconds (SD = .71) in the face-to-face method and 2.76 seconds (SD = 1.09) in the computer-telephone method. Pearson correlation analysis revealed, irrespective of language, that average response latencies were positively related to CES-D total scores for the face-to-face method, r (26) = .38, p < .05, two-tailed, and the computer-telephone mode, r (24) = .45, p < .03, two-tailed. The results suggested that verbal response latency was significantly higher for persons with high depression symptom levels for both methods. The computer-telephone method also displayed significantly higher latencies for both groups. See Table 3.
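The significance of each latency correlation follows from converting r to a t statistic, t = r * sqrt(df / (1 - r^2)), using the degrees of freedom shown in parentheses. The sketch below is illustrative only: the helper names are ours, and the two-tailed p value is obtained by numerically integrating the Student t density with Simpson's rule, not by whatever routine the original statistical package used.

```python
import math

def r_to_t(r, df):
    """Convert a Pearson r with df = N - 2 into the equivalent t statistic."""
    return r * math.sqrt(df / (1 - r * r))

def t_density(x, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * area

for label, r, df in [("face-to-face", 0.38, 26), ("computer-telephone", 0.45, 24)]:
    t = r_to_t(r, df)
    print(f"{label}: r({df}) = {r:.2f}, t = {t:.2f}, p = {two_tailed_p(t, df):.3f}")
```

Both computed p values fall under the thresholds reported in the text (below .05 and below .03, respectively).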

Insert Table 3 about here

A secondary analysis of speech behavior was the recognition accuracy of the computer-telephone program. The accuracy level reflected the ease (or difficulty) with which the computer program recognized the participant's spoken response. The program calculated a recognition accuracy score for matching a participant's spoken response to the stored voice template. The accuracy score ranged from 0% (a complete non-match) to 100% (a perfect match). The recognition accuracy scores were based on averages of the first presentation of each CES-D item during the first computer-telephone administration. The accuracy average was 86% (SD = 6.90) for English-speaking participants and 85% (SD = 6.60) for Spanish-speakers, indicating high accuracy for matching spoken responses in both groups. Independent t-test analysis revealed that the accuracy averages for the two language groups were not significantly different, t (30) = .38, two-tailed. Since there were no significant language differences, the recognition accuracies of both groups were combined to evaluate their relationship to depression scores. Pearson correlation analysis revealed, irrespective of language, that the recognition accuracy averages were inversely related to depression scores for the CES-D, r (30) = -.37, p < .04, two-tailed, and BDI, r (27) = -.47, p < .02, two-tailed. Thus, lower recognition accuracy was associated with higher depression levels. In other words, a symptomatic respondent tended to experience more difficulty with achieving good voice recognition.

Discussion

The purpose of this exploratory study was to test a cellular telephone-assisted computerized speech recognition program for screening depression symptoms in English and Spanish. We evaluated the computer-telephone program through comparisons to an equivalent face-to-face method. We analyzed the psychometric properties, respondent acceptability, response latency times, and feasibility of administration for both methods. We expected to make some general conclusions about the practicality of the computer-telephone program as a culturally and linguistically appropriate tool for screening depression in English and Spanish.

As expected, the computer-telephone CES-D method demonstrated strong psychometric properties for reliability estimates, validity coefficients, and equivalency of total score means. Although both groups reported low computer experience, both highly rated the computer-telephone and face-to-face methods. Computer experience was not related to method preference, but surprisingly, English-speakers preferred the face-to-face method over the computer-telephone. The group's preference related to feelings of personability and comfort. This contradicted earlier studies that consistently found English-speakers preferring the speech recognition methods (González et al., 1995). We might expect that Latinos would be more likely to prefer the face-to-face method because of the cultural emphasis on "personalismo" (Marín & Marín, 1991). In our study, the Spanish-speaking sample reported low acculturation levels, yet found the computer-telephone method as acceptable as the face-to-face interview. Obviously, computerized interviewing is not for everyone since many prefer social contact over non-human interaction.

In general, our findings were consistent with the literature on the equivalency and positive acceptability of computer-assisted psychological assessment (Donnelly, Rosenberg, & Fleeson, 1970; González, 1995; Honaker, 1988). Admittedly, the computer-telephone method did take significantly longer to complete because of template-training; however, administration times for presenting the CES-D items only were comparable to the face-to-face interview. The results of this study supported the strong potential of the computer-telephone program to serve as a viable culturally and linguistically appropriate tool for screening depression in English and Spanish.

Limitations

There were some clear limitations in our exploratory study. Small and self-selected sampling from a few field settings limited the generalizability of the findings. Significant language group differences in age, education, and income, together with unequal sample sizes, confounded our comparisons. Controlling for these demographic variables would minimize the confounds in future studies. Another important issue is the need for a closer examination of gender issues. Men and women may communicate and respond differently during verbal interaction and measurement (Tannen, 1990; Tavris, 1992). In this study, participants responded to the same pre-recorded male voice on the computer program. Yet, there is evidence that the willingness of men and women to self-disclose may significantly differ depending on the ethnicity and gender of the interviewer (Snell, Miller, Belk, & García-Falconi, 1989). Future computer programs need to present male and female voices and assess if there are differences in self-disclosure.

In this study, there were some instances of unequal total score variances for the two CES-D methods in both languages. The variability may be related to extreme response styles (the tendency to choose the extreme ends of a response scale), which may occur among Latinos (Marín, Gamba, & Marín, 1992), alterations in participants' responses because items are verbally presented (Most, 1987), or measurement error (Shavelson, Webb, & Rowley, 1989). In-depth analyses of these issues need to be considered in future studies. Lastly, our statistical approach of confirming the null hypothesis to establish the equivalency of methods, as in many tests of equivalence, was problematic since the alternative hypothesis is usually proposed for confirmation in hypothesis testing. Rogers, Howard, and Vessey (1993) and Goldstein (1994) recommend assessing confidence intervals for estimating equivalency of scores. Wilson, Genco, and Yager (1985) suggest employing generalizability theory to partition the sources of variance and measurement error as a procedure for determining equivalency. These statistical analyses will be employed in subsequent studies.

Future Directions

The norms and cut off scores of the modified computer-telephone CES-D require further study. The CES-D shows evidence of strong sensitivity for detecting actual cases of clinical depression, but poor specificity, producing false positives, since it taps into other forms of distress, such as anxiety disorders (Fechner-Bates, Coyne, & Schwenk, 1994). In our study, the participants were not screened for other mental health problems. It would be appropriate to rule out other psychiatric and medical disorders in order to evaluate the usefulness of the CES-D. Schulberg et al. (1985) argue that a CES-D score of 27 is a more accurate cut off for identifying high depression levels in primary care populations because of the co-morbidity of affective and physical disorders. In addition, some studies suggest that the factor structure of the CES-D may vary across gender and ethnicity among Latinos (García & Marks, 1989; Guarnaccia, Angel, & Worobey, 1989). In our study, we modified the standard CES-D response format by doubling the number of possible choices. The change likely contributed a positive "mean shift" or constant increment that elevated the total scores (González, 1995; Hofer & Green, 1985). Thus, exploring the restandardization of norms and cut point scores for the speech recognition CES-D in English- and Spanish-speaking populations would be appropriate.
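To make the trade-off between sensitivity and specificity at different cut off scores concrete, the following sketch tallies both quantities for a screening score against a diagnostic criterion. Everything in it is an illustrative assumption: the scores and diagnoses are invented toy values, not data from this study, the function name is ours, and treating scores at or above the cut off as screen-positive is a convention we assume here.

```python
def sensitivity_specificity(scores, is_case, cutoff):
    """Return (sensitivity, specificity) for a 'score >= cutoff' screening rule.

    scores:  screening totals, one per respondent
    is_case: True where a diagnostic interview confirmed clinical depression
    """
    pairs = list(zip(scores, is_case))
    true_pos = sum(1 for s, case in pairs if case and s >= cutoff)
    false_neg = sum(1 for s, case in pairs if case and s < cutoff)
    true_neg = sum(1 for s, case in pairs if not case and s < cutoff)
    false_pos = sum(1 for s, case in pairs if not case and s >= cutoff)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical toy data (NOT from the study): ten respondents.
toy_scores = [5, 12, 18, 20, 22, 25, 28, 30, 35, 40]
toy_cases = [False, False, False, False, False, True, False, True, True, True]

for cutoff in (16, 27):
    sens, spec = sensitivity_specificity(toy_scores, toy_cases, cutoff)
    print(f"cut off {cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy example, raising the cut off trades some sensitivity for better specificity, which is the pattern that motivates re-examining cut point scores for the modified instrument.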

Current speech-recognition technology demonstrates both significant constraints and advantages. For example, changes in a respondent's tone, pitch, or inflection result in decreased accuracy for "speaker-dependent" speech-recognition systems which require template-training, such as our present computer-telephone program (Noyes, Haigh, & Starr, 1989). In our study, about one-fourth of the respondents experienced some level of difficulty with our speaker-dependent computer program which required interviewer prompting or intervention. In addition, a noticeable proportion of the respondents reverted to touch-tone rather than voice responses. These limitations were largely attributable to difficulties with the template-training segment of the program, which resulted in a poorly developed voice template and subsequently impaired voice recognition accuracy.

In other developments, new "speaker-independent" continuous speech recognition technology offers promising advancements (Kloosterman, 1994). Speaker-independent systems based on basic sounds and syntax of a particular language do not require template-training. The systems are adaptable to continuous speech and natural pauses in language. As we saw in our study, the elimination of template-training would improve the flexibility and time efficiency of administration for a variety of multilingual populations.

Our findings compel further research on the speech behavior of adults suspected of having depression. The relationship of response latency to depression levels was an important finding. Longer response latency may serve as a useful clinical marker because it is mediated by cognitive, affective, and physiological factors stemming from psychomotor retardation (Vanger, Summerfield, Rosen, & Watson, 1992). In our study, speech recognition accuracy also served as a general indicator of depression level for both language groups. Respondents who had higher symptom levels tended to have greater difficulty with voice recognition because of poor template-training. Therefore, the closer examination of human-computer verbal interaction may yield meaningful clinical data. Finally, the assessment of speech characteristics can contribute to the objective measure of depression (Kuny & Stassen, 1993; Scherer & Zei, 1988). Spectral analyses of speech characteristics found voice fundamental frequency, frequency variation, and speech rates to be moderate predictors of depression (Breznitz, 1992; Darby, Simmons, & Berger, 1984; Hargreaves & Starkweather, 1964; Nilsonne, Sundberg, Ternstrom, & Askenfelt, 1988). Exploring the analyses of speech behavior is critical toward developing objective multilingual methods of assessing depression that extend beyond the limitations of self-report; interviewer, respondent, and cultural biases; and the poor specificity of current screening instruments.

Ultimately, the aim of developing bilingual computerized speech recognition programs is to employ them for screening in randomized controlled depression prevention research trials with English- and Spanish-speaking populations. Individuals with high depression levels, but not clinically depressed according to diagnostic interviews, are suitable candidates for prevention interventions (Muñoz, 1993). A culturally and linguistically reliable and valid computerized speech screener can be an effective tool for identifying persons with significant depression symptoms across diverse populations. Accurately detecting more symptomatic cases earlier and providing prevention services will help reduce the huge social pain and high economic expense attributable to the misdiagnosis and inappropriate treatment of depression (Greenberg et al., 1993; Stoudemire, Frank, Kamlet, & Hedemark, 1987). Computerized screening and depression prevention strategies are crucial toward increasing accessibility to mental health services and minimizing the occurrence of clinical depression in multilingual and multicultural populations.

Author's Note

The primary author acknowledges the support of the CSUSM Social and Behavioral Research Institute and the CSUSM Faculty Affirmative Action grant for developing and testing the computer program. I thank Francisco Moreno for his contributions to the data collection and manuscript.

References

Articulate Systems. (1993). Voice Navigator 2.3.2. [Computer program] Cambridge, MA: Articulate Systems.

Beck, A.T., Rush, A.J., Shaw, B.F., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford Press.

Beck, A.T. & Steer, R.A. (1993). Beck Depression Inventory Manual. Orlando, FL: The Psychological Corporation.

Beck, A.T., Steer, R.A., & Garbin, M. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 77-100.

Bernal, M.E. & Castro, F.G. (1994). Are clinical psychologists prepared for service and research with ethnic minorities? American Psychologist, 49, 797-805.

Breznitz, Z. (1992). Verbal indicators of depression. The Journal of General Psychology, 119, 351-363.

Broadhead, W.E., Clapp-Channing, N.E., Finch, J.N., & Copeland, J.A. (1989). Effects of medical illness and somatic symptoms on treatment of depression in a family residency practice. General Hospital Psychiatry, 11, 194-200.

Butcher, J.N. (1987). Computerized psychological assessment: A practitioner's guide. New York: Basic Books.

Carr, A.C., Ghosh, A., & Ancill, R.J. (1983). Can a computer take a psychiatric history? Psychological Medicine, 13, 151-158.

Claris. (1991). HyperCard 2.1. [Computer program] Santa Clara, CA: Claris.

Climent, C.E., Plutchik, R., & Estrada, H. (1975). A comparison of traditional and symptom checklist-based histories. American Journal of Psychiatry, 132, 450-453.

Comstock, G.W. & Helsing, K.J. (1976). Symptoms of depression in two communities. Psychological Medicine, 6, 551-563.

Darby, J.K., Simmons, N., & Berger, P.A. (1984). Speech and voice parameters of depression: A pilot study. Journal of Communication Disorders, 17, 75-85.

Donnelly, J., Rosenberg, M., & Fleeson, W.P. (1970). The evolution of mental status - past and future. American Journal of Psychiatry, 126, 997-1002.

Erdman, P.E., Klein, M.H. & Greist, J.H. (1985). Direct patient interviewing. Journal of Consulting and Clinical Psychology, 53, 760-773.

Fechner-Bates, S., Coyne, J.C., & Schwenk, T.L. (1994). The relationship of self-reported distress to depressive disorders and other psychopathology. Journal of Consulting and Clinical Psychology, 62, 550-559.

García, M. & Marks, G. (1989). Depressive symptomatology among Mexican American adults: An examination of the CES-D scale. Psychiatry Research, 27, 137-148.

Goldstein, R. (1994). Equivalency testing. Stata Technical Bulletin, STB-17, 13-18.

González, G.M., Costello, C.R., Valenzuela, M., Chaidez, B., & Nuñez-Alvarez, A. (1995). Bilingual computerized speech-recognition screening for clinical depression: Evaluating a cellular telephone prototype. Behavior Research Methods, Instruments, & Computers, 27, 476-482.

González, G.M., Spiteri, C.B., & Knowlton, J. (1995). A computerized speech recognition pilot study for screening depressive symptoms. Computers in Human Behavior, 11, 85-93.

González, G.M. (1993a). Computerized speech recognition in psychological assessment: A Macintosh prototype for screening depressive symptoms. Behavior Research Methods, Instruments, & Computers, 25, 301-303.

González, G.M. (1993b). A computerized speech recognition telephone application for screening clinical depression. Proceedings of the 17th Annual Symposium on Computer Applications in Medical Care (p. 936), Washington, DC.

Greenberg, P.E., Stiglin, L.E., Finkelstein, S.N., & Berndt, E.R. (1993). The economic burden of clinical depression. Journal of Clinical Psychiatry, 54, 405-418.

Greist, J.H. & Klein, M.H. (1980). Computer programs for patients, clinicians, and researchers in psychiatry. In J.B. Sidowski, J.H. Johnson, & T.A. Williams (Eds.), Technology in mental health care delivery systems (pp. 161-182). Norwood, NJ: Ablex.

Greist, J.H., Klein, M.H., Erdman, H.P., & Jefferson, J.W. (1983). Clinical computer applications in mental health. Journal of Medical Systems, 7, 175-185.

Guarnaccia, P.J., Angel, R., & Worobey, J.L. (1989). The factor structure of the CES-D in the Hispanic Health and Nutrition Examination Survey: The influences of ethnicity, gender, and language. Social Science & Medicine, 29, 85-94.

Hargreaves, W.A. & Starkweather, J.A. (1964). Voice quality changes in depression. Language and Speech, 7, 84-88.

Helzer, J.E., Robins, L.N., Croughan, J.L., & Ratcliff, K.S. (1981). National Institute of Mental Health Diagnostic Interview Schedule: Its history, characteristics, and validities. Archives of General Psychiatry, 38, 381-389.

Honaker, L.M. (1988). The equivalency of computerized and conventional MMPI administration: A critical review. Clinical Psychology Review, 8, 561-577.

Hough, R.L., Landsverk, J.A., Karno, M., Burnam, M.A., Timbers, D.M., Escobar, J.I. & Regier, D.A. (1987). Utilization of health and mental health services by Los Angeles Mexican Americans and Non-Hispanic Whites. Archives of General Psychiatry, 44, 702-709.

Katon, W. (1987). The epidemiology of depression in medical care. International Journal of Psychiatry in Medicine, 17, 93-112.

Kloosterman, S.H. (1994). Design and implementation of user-oriented speech recognition interface: The synergy of technology and human factors. Interacting with Computers, 6, 41-60.

Kobak, K.A., Reynolds, W.M., Rosenfeld, R., & Greist, J.H. (1990). Development and validation of a computer-administered version of the Hamilton Depression Rating Scale. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2, 56-63.

Kuny, S. & Stassen, H.H. (1993). Speaking behavior and voice sound characteristics in depressive patients during recovery. Journal of Psychiatric Research, 27, 289-307.

Lucas, R.W., Mullin, P.J., Luna, C.B.X. & McInroy, D.C. (1977). Psychiatrists and a computer as interrogators of patients with alcohol-related illnesses: A comparison. British Journal of Psychiatry, 131, 160-167.

Lukin, M.E., Dowd, E.T., Plake, B.S., & Kraft, R.G. (1985). Comparing computerized versus traditional psychological assessment. Computers in Human Behavior, 1, 49-58.

Mandal, M.K., Srivastava, P., & Singh, S.K. (1990). Paralinguistic characteristics of speech in schizophrenics and depressives. Journal of Psychiatric Research, 74, 191-196.

Marín, G., Gamba, R.J., & Marín, B.V. (1992). Acquiescence and extreme response sets among Hispanics. Journal of Cross-Cultural Psychology, 23, 498-509.

Marín, G. & Marín, B.V. (1991). Research with Hispanic populations. Newbury Park, CA: Sage Publications.

Marín, G., Pérez-Stable, E.J., & Marín, B.V. (1989). Cigarette smoking among San Francisco Hispanics: The role of acculturation and gender. The American Journal of Public Health, 79, 196-198.

Marín, G., Sabogal, F., Marín, B.V., Otero-Sabogal, R., & Pérez-Stable, E.J. (1989). Development of a short acculturation scale for Hispanics. Hispanic Journal of Behavioral Sciences, 9, 183-205.

Moore, N.C., Summer, K.R. & Bloor, R.N. (1984). Do patients like psychometric testing by computer? Journal of Clinical Psychiatry, 40, 875-877.

Moscicki, E.K., Locke, B.Z., Rae, D.S., & Boyd, J.H. (1989). Depressive symptoms among Mexican Americans: The Hispanic health and nutrition examination survey. American Journal of Epidemiology, 120, 348-360.

Most, R. (1987). Levels of error in computerised psychological inventories. Applied Psychology: An International Review, 36, 375-383.

Muñoz, R.F. (1993). The prevention of depression: Current research and practice. Applied & Preventive Psychology, 2, 21-33.

Muñoz, R.F., González, G.M., & Starkweather, J. (1995). Automated screening for depression symptoms: Toward culturally and linguistically appropriate uses of computerized speech recognition. Hispanic Journal of Behavioral Sciences, 17, 194-208.

Muñoz, R.F. & Ying, Y.W. (1993). The prevention of depression: Research and practice. Baltimore, MD: Johns Hopkins University Press.

Nilsonne, Å. (1988). Speech characteristics as indicators of depressive illness. Acta Psychiatrica Scandinavica, 77, 253-263.

Nilsonne, Å., Sundberg, J., Ternstrom, S., & Askenfelt, A. (1988). Measuring the rate of change of voice fundamental frequency in fluent speech during mental depression. Journal of the Acoustical Society of America, 83, 716-728.

Norusis, M.J. (1992). SPSS for Windows: Advanced Statistics Release 5. Chicago, IL: SPSS, Inc.

Noyes, J.M., Haigh, R., & Starr, A.F. (1989). Automatic speech recognition for disabled people. Applied Ergonomics, 20, 293-298.

Organista, K.O., Muñoz, R.F. & González, G.M. (1994). Cognitive-behavioral therapy with low-income medical patients: Utilization and outcome. Cognitive Therapy and Research, 18, 241-259.

Pérez-Stable, E.J., Miranda, J., Muñoz, R.F. & Ying, Y.W. (1990). Depression in medical outpatients: Underrecognition and misdiagnosis. Archives of Internal Medicine, 150, 1083-1088.

Radloff, L.S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401.

Richards, J.S., Fine, P.R., Wilson, T.L., & Rogers, J.T. (1983). A voice-operated method for administering the MMPI. Journal of Personality Assessment, 47, 167-170.

Robins, L.N., Helzer, J.E., Orvaschel, H., Anthony, J.C., Blazer, D.G., Burnam, A. & Burke, J.D., Jr. (1985). Chapter 8. In W.W. Eaton & L.G. Kessler (Eds.), Epidemiological field methods in psychiatry, NIMH Epidemiological Catchment Area program. New York: Academic Press.

Rogers, J.L., Howard, K.I., & Vessey, J.T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113, 553-565.


Rozensky, R.H., Honor, L.F., Rasinski, K., Tovian, S.M., & Herz, G.I. (1986). Paper-and-pencil versus computer-administered MMPIs: A comparison of patients' attitudes. Computers in Human Behavior, 2, 111-116.

Scherer, K.R. & Zei, B. (1988). Vocal indicators of affective disorders. Psychotherapy and Psychosomatics, 49, 179-186.

Schulberg, H.C., Saul, M., McClelland, M., Ganguli, M., Christy, W., & Frank, R. (1985). Assessing depression in primary medical and psychiatric practice. Archives of General Psychiatry, 42, 1164-1170.

Shavelson, R.J., Webb, N.M., & Rowley, G.L. (1989). Generalizability theory. American Psychologist, 44, 922-932.

Simmons, E.M. & Miller, O.W. (1971). Automated patient history-taking. Hospitals, 45, 56-59.

Shapiro, S., Skinner, E.A., Kessler, L.G., Von Korff, M., German, P.S., Tischler, G.L., Leaf, P.J., Benham, L., Cottler, L., & Regier, D.A. (1984). Utilization of health and mental health services: Three Epidemiological Catchment Area sites. Archives of General Psychiatry, 41, 971-978.

Slack, W.V. (1971). Computer based interviewing system dealing with nonverbal behavior as well as keyboard responses. Science, 171, 84-87.

Snell, W.E., Miller, R.S., Belk, S.S., & García-Falconi, R. (1989). Men's and women's emotional disclosures: The impact of disclosure recipient, culture, and the masculine role. Sex Roles, 21, 467-486.


Starkweather, J.A., & Muñoz, R.F. (1989, May). Identification of clinical depression among foreign speakers. Paper presented at the meeting of the American Association for Medical Systems and Informatics, San Francisco, CA.

Stassen, H.H., Bomben, G., & Günther, E. (1991). Speech characteristics in depression. Psychopathology, 24, 88-105.

Steer, R.A., Rissmiller, D.J., Ranieri, W.F., & Beck, A.T. (1994). Use of the computer-administered Beck Depression Inventory and Hopelessness Scale with psychiatric inpatients. Computers in Human Behavior, 10, 223-229.

Stoudemire, A., Frank, R., Kamlet, M., & Hedemark, N. (1987). Depression. In R.W. Amler & H.B. Dull (Eds.), Closing the gap: The burden of unnecessary illness (pp. 65-72). New York: Oxford.

Stout, R.L. (1981). New approaches to the design of computerized interviewing and testing systems. Behavior Research Methods, Instruments, & Computers, 13, 436-442.

Tannen, D. (1990). You just don't understand: Women and men in conversation. New York: William Morrow.

Tavris, C. (1992). The mismeasure of women. New York: Simon & Schuster.

Vanger, P., Summerfield, A.B., Rosen, B.K., & Watson, J.P. (1992). Effects of communication on speech behavior of depressives. Comprehensive Psychiatry, 33, 39-41.

Weissman, M.M., Sholomskas, D., Pottenger, M., Prusoff, B.A., & Locke, B.Z. (1977). Assessing depressive symptoms in five psychiatric populations: A validation study. American Journal of Epidemiology, 106, 203-214.

Wilson, F.R., Genco, K.T., & Yager, G.G. (1985). Assessing the equivalence of paper-and-pencil vs. computerized tests: Demonstration of a promising technology. Computers in Human Behavior, 1, 265-275.


Table 1
Demographic characteristics of the language groups

                                                 M           SD
Age (years)
  English                                    45.06        14.40
  Spanish                                    31.17         9.21
Education (years)
  English                                    13.38         2.08
  Spanish                                     9.43         4.13
Annual total family income ($)
  English                                24,326.61    25,339.50
  Spanish                                10,435.16     4,504.45
Computer experience rating (1 to 5)
  English                                     1.63         1.70
  Spanish                                     1.17         1.47


Table 2
Psychometric properties of the CES-D methods by language

Variables                                    Computer-telephone    Face-to-face
First ordered CES-D total score (M [SD])
  English                                       23.13 [11.85]      21.56 [13.16]
  Spanish                                       31.70 [11.15]      16.76 [10.56]
Internal consistency reliability (α)
  English                                            .89                .90
  Spanish                                            .85                .86
Immediate test-retest reliability (r)
  English                                            .87*               .99**
  Spanish                                            .99**              .98**
Inter-correlation with BDI (r)
  English                                            .92**              .88**
  Spanish                                            .80*               .88**
Correlation with Acculturation scores
  English                                            NE                 NE
  Spanish                                           -.28               -.03

* p < .005, ** p < .001, NE = Not Evaluated


Table 3
Descriptive statistics of the first ordered CES-D methods by language

Variables                                                         M         SD
Computer-telephone acceptance ratings
  English                                                      7.50       2.67
  Spanish                                                      7.30       1.70
Face-to-face acceptance ratings
  English                                                      8.13       1.93
  Spanish                                                      7.30       2.01
Computer-telephone CES-D total time (seconds)
  English                                                    425.13      85.81
  Spanish                                                    508.70     107.39
Face-to-face CES-D time (seconds)
  English                                                    286.56     106.64
  Spanish                                                    300.15      85.60
Computer-telephone CES-D items-only time (seconds)
  English                                                    220.31      84.32
  Spanish                                                    290.40      79.21
Face-to-face CES-D item response latencies (seconds)
  English                                                      2.16        .69
  Spanish                                                      2.34        .75
Computer-telephone CES-D item response latencies (seconds)
  English                                                      2.59        .94
  Spanish                                                      3.03       1.30
