Coding of Speech and Wideband Audio - The Telecom Archive


CODING OF SPEECH AND WIDEBAND AUDIO

Nikil S. Jayant, Victor B. Lawrence, and Dimitrios P. Prezas

Nikil S. Jayant is head of the Signal Processing Research Department of AT&T Bell Laboratories, Murray Hill, New Jersey. Victor B. Lawrence is head of the Data Communications Research Department of AT&T Bell Laboratories, Middletown, New Jersey. The late Dimitrios P. Prezas was a supervisor in the Advanced Services Technology Department of AT&T Bell Laboratories (Indian Hill Park), Naperville, Illinois. Mr. Jayant is responsible for research in signal processing, including coding and communication of speech, image, and wideband audio. He joined the company in 1968 and has a Ph.D. in electrical communications engineering from the Indian Institute of Science (Bangalore, India). Mr. Lawrence is responsible for exploratory development of data communications equipment and services. He joined the company in 1974 and (continued on page 41)

Advances in coding algorithms and digital signal processing have led to sophisticated technologies for speech communication for a variety of applications, as well as to greater flexibilities in the design of ISDN terminals for integrated communication of speech, images, and data. For traditional telephony with a signal bandwidth of 3.2 kHz, the transmission rate for network-quality speech is now down to 16 kb/s. Robust communications-quality speech appropriate for cellular radio has been realized at 8 kb/s. Research attention is shifting toward 4 kb/s, focused on improving speaker identification and the naturalness of coded speech. For wideband audio with a signal bandwidth of 7 kHz, high-quality coding is now possible at 32 kb/s, which implies stereo teleconferencing or dual-language programming over a 64-kb/s channel. Transparent coding of 20-kHz audio has been demonstrated at 128 kb/s, with near-transparent performance at rates as low as 64 kb/s for some classes of signals.

Introduction

This paper is a review of the technology for digital speech coding. First, we discuss traditional telephone speech with a bandwidth of about 3.2 kHz (kilohertz). Then, we turn our attention to higher grade wideband speech with a bandwidth of 7 kHz, and we briefly discuss wideband audio with a bandwidth of 20 kHz.

The bit rate in the digital representation of speech could vary from 2 to 128 kb/s (kilobits per second), depending on the application and on user expectations of signal quality. To describe the performance of a digital coding system, we use several parameters, such as:
- Processing delay
- Tolerance of transmission errors and multiple stages of coding and decoding

AT&T TECHNICAL JOURNAL. SEPTEMBER/OCTOBER 1990


Panel 1. Acronyms and Terms

ADPCM: adaptive-differential pulse-code modulation
AM: amplitude modulation
APC: adaptive-predictive coder
APL: analog private line
ATM: automatic-teller machine
AUDIX: audio-information exchange
CADN: cellular access digital network
CCITT: International Telegraph and Telephone Consultative Committee
CD: compact disk
CELP: code-excited linear prediction
centrex: central exchange; a service provided by the local telephone company that permits any telephone extension within a company to call another extension within the company or dial directly to an outside line
codec: coder-decoder
CPE: customer-premises equipment
CTIA: Cellular Technology Industry Association (North America)
DAM: diagnostic acceptability measure; reflects acceptability of speech communication in a multidimensional sense
DCME: digital circuit-multiplication equipment
DDS: digital data service
DRT: diagnostic rhyme test; a measure of word intelligibility
DSI: digital speech interpolation
DSP: digital signal processor
FM: frequency modulation
FX: foreign exchange
G.711: CCITT standard for PCM at 64 kb/s
G.721: CCITT standard for ADPCM at 32 kb/s
G.722: CCITT standard for 7-kHz audio; a 64-kb/s algorithm for ISDN teleconferencing and loudspeaker telephony
G.723: CCITT standard for ADPCM at 24, 32, and 40 kb/s
G.727: draft CCITT standard for ADPCM at 16, 24, 32, and 40 kb/s
G.764: CCITT standard for packet speech transmission
GSM: Group Speciale Mobile (Europe); standards organization for digital cellular radio
HDTV: high-definition television
IACS: integrated access and cross-connect system
INMARSAT: international maritime satellite
ISDN: Integrated Services Digital Network
ISO: International Organization for Standardization
LPC: linear-predictive coding
LD-CELP: low-delay CELP
MFLOP: 10^6 floating-point arithmetic operations
modem: modulator-demodulator
MOS: mean opinion score; used for evaluating the performance of coding algorithms
MSAT: mobile satellite
MSE: mean squared error
NSA: National Security Agency (U.S.)
PBX: private-branch exchange
PCM: pulse-code modulation
PSTN: public-switched telephone network
SELP: sum-excited linear prediction
SNR: signal-to-noise ratio
STU: secure telephone unit (STU-II or STU-III)
TASI: time-adaptive speech interpolation
vocoder: voice coder

- Ability to handle nonvoice signals, such as voiceband modem waveforms.

However, the most important descriptors of coder performance are the quality of the digitized speech at a target bit rate and the way the quality diminishes with decreasing bit rate.

Measuring Speech Quality. The measurement of speech quality has been a difficult and long-standing problem. In this paper, we use a subjective rating scale of 1 to 5, wherever possible, to quantify the level of digital speech quality. This is the so-called mean opinion score, or MOS scale,1 that is widely used for evaluating coding algorithms for digital telephony. (Panel 1 defines acronyms and terms.)

A score of 4.0 on the MOS scale will signify high quality, or near-transparent coding. Network quality will imply high quality as a necessary condition, but not the only one. It also implies that the speech coder provides further capabilities demanded by the telecommunications network environment.

An MOS of 3.5 will denote communications quality. At this level, speech degradation is easily detectable, but not bad enough to impede natural communication.

[Figure 1 chart not reproduced. Rows shown: digital-coding standards (CCITT 1972, CCITT 1984, CCITT 1991, GSM 1988, CTIA 1989, NSA 1989, NSA 1982); applications (network; mobile radio and voice mail; secure voice); quality ranges (MOS 4.0-4.5, 3.5-4.0, 2.5-3.5); bit-rate axis (64, 32, 16, 8, 4.8, and 2.4 kb/s); MOS scale (5 excellent, 4 good, 3 fair, 2 poor, 1 unacceptable).]

Figure 1. Digital telephony standards, typical applications, and ranges of speech quality (which is expressed using the five-point MOS scale). The frequency range of telephone speech is 200 to 3400 Hz; hence, the speech bandwidth is 3.2 kHz. An MOS score of 4.0 signifies high quality, or near-transparent coding. The coding standards define digital-coding algorithms at the particular bit rate; the 16-kb/s standard is likely to be a hybrid coding algorithm.

Finally, synthetic quality will imply a signal that is characterized by an inadequate level of naturalness and speaker recognizability, although it may have high intelligibility. These deficiencies are usually reflected by an MOS that does not exceed 3.0.
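The quality grades defined above can be illustrated with a short sketch. It averages listener ratings on the 1-to-5 scale and attaches the labels used in this paper (4.0 for high quality, 3.5 for communications quality, 3.0 or less for synthetic quality). The function names and the sample ratings are hypothetical, and the label for scores between 3.0 and 3.5 is an assumption made only so that every score receives a label; a low score by itself does not prove that a coder sounds synthetic.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings given on the 1-to-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-to-5 scale")
    return mean(ratings)


def quality_grade(mos):
    """Map an MOS to the quality labels used in the text."""
    if mos >= 4.0:
        return "high quality"
    if mos >= 3.5:
        return "communications quality"
    if mos <= 3.0:
        return "synthetic quality"
    # Assumption: the text does not name the band between 3.0 and 3.5.
    return "between communications and synthetic quality"


ratings = [4, 5, 4, 3, 4, 4, 5, 4]  # hypothetical listener votes
mos = mean_opinion_score(ratings)
print(round(mos, 3), quality_grade(mos))
```

Network quality is deliberately not a label here, because the text makes it depend on capabilities beyond the MOS alone.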

We will also use the five-point MOS scale in our discussion of coding algorithms for 7-kHz speech.

MOS measurements of speech quality are supplemented, especially in low-bit-rate speech technology, by scores of DRT (diagnostic rhyme test) and DAM (diagnostic acceptability measure). The DRT is a word-intelligibility measure, while the DAM reflects acceptability for speech communication in a broader multidimensional sense.3,4

Digital Coding of Telephone Speech

Figure 1 describes the current state of telephone speech coding in terms of standards activity, bit rate, typical application, and quality of decoded speech. We assume that the frequency range of the input signal is 200 to 3400 Hz (hertz), and that the quality of the output speech is measured on the five-point MOS scale.

The current goals in speech coding include the achievement of near-transparent or transparent quality at 8 kb/s, and robust telecommunications quality at 4.8 kb/s and lower. (By robust, we mean the performance is not degraded drastically across various speech signals, various speakers, and various transmission environments.)

Figure 2 presents a more quantitative description of speech quality as a function of bit rate. The historical progression is from right to left. In the figure, the characteristics depicted as solid curves refer to generic examples of coding algorithms. The dashed curve describes a research goal that is believed to be achievable in that it does not violate fundamental limits in coding capability.

[Figure 2 plot not reproduced. Vertical axis: speech quality on the MOS scale (1 bad, 2 poor, 3 fair, 4 good, 5 excellent). Horizontal axis: bit rate (2, 4, 8, 16, 32, and 64 kb/s). Marked points: G.711, 64-kb/s PCM; G.721, 32-kb/s ADPCM; G.7xy, 16-kb/s coder; vocoder.]

Figure 2. Quality of telephone-bandwidth speech (using the MOS scale) as a function of transmission rate. The signal bandwidth is 3.2 kHz. G.711 and G.721 are existing CCITT digital-coding standards, while G.7xy is the pending CCITT standard. The research goal fits within the constraints of the fundamental limits in coding capability.

Pulse-code modulation (PCM) is the simplest coding system, a memoryless quantizer.

The waveform coder curve is that of a high-complexity algorithm, such as adaptive predictive coding. Waveform coding uses redundancy-removing operations to present a signal of lower energy to the amplitude quantizer, which results in a lower bit rate for a specified level of output-speech quality.
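The effect of a redundancy-removing operation can be seen in a minimal sketch, assuming a fixed first-order predictor applied to unquantized samples. Practical ADPCM and adaptive predictive coders adapt the predictor and run it on reconstructed samples, so this is an illustration of the principle only; the predictor coefficient of 0.9 and the 200-Hz test tone are arbitrary choices.

```python
import math


def first_order_residual(samples, a=0.9):
    """Return e[n] = x[n] - a * x[n-1] for a fixed first-order predictor."""
    prev = 0.0
    out = []
    for x in samples:
        out.append(x - a * prev)
        prev = x
    return out


def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


# A slowly varying test tone: 200 Hz sampled at 8 kHz for 0.1 s.
signal = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
residual = first_order_residual(signal)

# Ratio of input energy to residual energy, in decibels.
gain_db = 10 * math.log10(energy(signal) / energy(residual))
print(f"prediction gain: {gain_db:.1f} dB")
```

Because the residual carries much less energy than the input, a quantizer with fewer levels can represent it at a comparable noise level, which is the source of the bit-rate saving described above.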

The vocoder point represents an algorithm that produces intelligible but synthetic-sounding speech at very low transmission rates by using a highly compact excitation-modulation model (Figure 3a). The synthetic quality in this system is accepted in applications where digital encryption and low transmission rate are of paramount importance. These could include commercial applications, such as banking, but the principal customers, by far, are government and defense agencies.

In Figure 2, the hybrid coder curve describes the performance of a class of algorithms that combine the high-quality potential of waveform coding with the compression efficiency of a model-based vocoder. Here, the idea is to use a time-varying excitation model that is much more sophisticated than that of a traditional vocoder. This model uses waveform-coding principles to compute an excitation that minimizes distortion for every frame [say, 16 ms (milliseconds)] of input speech. (See Figures 3b and 3c.) Hybrid coders for 4.8, 8, and 16 kb/s are discussed later.

The solid dots in Figure 2 refer to coding algorithms that provide high quality at 64, 32, and 16 kb/s. Both PCM at 64 kb/s and adaptive-differential PCM (ADPCM) at 32 kb/s are CCITT standards, as defined in Figure 1, and are called G.711 and G.721, respectively. (CCITT is the International Telegraph and Telephone Consultative Committee.) These algorithms provide network-quality coding.

Currently, the CCITT is considering the definition of a high-quality speech standard at 16 kb/s. The technique is likely to be a hybrid coding algorithm.

Figure 2 shows that our current understanding of coding has not yielded high-quality speech at bit rates below about 8 kb/s, in particular in the important neighborhood of 4 kb/s.


[Figure 3 block diagrams not reproduced. Labels shown: speech; spectral envelope; short-delay correlation filter (three instances); fine structure; voiced or unvoiced switch; long-delay correlation filter (two instances); excitation.]

Figure 3. Models of speech excitation. (a) LPC vocoder and hybrid coders, whose performance is given in Figure 2. (b) Multipulse LPC. This coder uses a more sophisticated time-varying excitation model than a traditional vocoder. The excitation computed minimizes distortion for every frame (e.g., 16 ms) of input speech. (c) Codebook-excited LPC. This coder selects the best excitation vector from a codebook of possible vectors.

Digital Telephony at 4 kb/s

High-quality coding at 4 kb/s is a primary focus in speech research.5-11 Speech coding at bit rates of 4 kb/s is important for:
- Enhancing secure telephony in government and military applications.12 (See the next section.)
- Providing the central capability of future band-efficient systems for digital radio; for example, cellular channels with a user bandwidth of 5 kHz. These are wireless-access channels for moving vehicles in rural, suburban, or urban environments that are served by terrestrial base stations.
- Mobile-satellite (MSAT) communication applications for providing wireless access to moving vehicles in remote areas.
- INMARSAT (International Maritime Satellite) applications with a 6.4-kb/s target for the total transmission rate (i.e., the speech-coder bit rate plus overhead for channel-error protection).
- Storage of coded speech. When coded at 4 kb/s, one hour of spoken material could be stored on a single 16-Mb (megabit) memory chip.
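The storage figure in the last item can be checked with a few lines of arithmetic. This sketch assumes that one megabit means 10^6 bits and ignores any framing or error-protection overhead; the function name is hypothetical.

```python
def storage_megabits(bit_rate_kbps, seconds):
    """Storage needed for coded speech, in megabits (10**6 bits)."""
    return bit_rate_kbps * 1000 * seconds / 1e6


ONE_HOUR = 3600  # seconds

for rate in (64, 32, 16, 8, 4):
    print(f"{rate} kb/s -> {storage_megabits(rate, ONE_HOUR):.1f} Mb per hour")
```

At 4 kb/s the result is 14.4 Mb per hour, which fits within a 16-Mb chip, whereas every higher rate in the loop exceeds it.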

Table I summarizes DRT, DAM, and MOS performances of various speech coders that range from 64 to 2.4 kb/s.1,13,14 The numbers in this table are not the result of a single well-controlled experiment, but are accumulated from various independent sources. As such, the scores in the table should be used for a rough, overall indication of performance rather than for very fine comparisons and judgments.

The code-excited linear prediction (CELP) algorithms in Table I typify the hybrid coders depicted in Figure 2. Recent advances in CELP coding have produced significant improvements relative to 2.4-kb/s vocoding. This is reflected in a standardized, 4.8-kb/s algorithm (Figure 1) with high levels of DRT and DAM.


Table I. DRT, DAM, and MOS Scores for Standard Speech Coders

Coder                     DRT    DAM    MOS
64-kb/s PCM               95     73     4.3
32-kb/s ADPCM             94     68     4.1
16-kb/s LD-CELP           94*    70*    4.0
8-kb/s CELP               93+    68*    3.9
4.8-kb/s CELP             93‡    64     3.2†
2.4-kb/s LPC (vocoder)    90     53     2.5*

* estimates
† upper bound
‡ lower bound
NOTE: See Panel 1 for definitions of acronyms.

However, the MOS quality of 4.8-kb/s speech, as already noted in Figure 2, falls well below the high-quality level of 4.0, which suggests further research is needed at this important bit rate.

Secure Voice: A Case Study. United States government agencies have deployed secure telecommunications over the public-switched telephone network for several years. As modem technology advanced, highly effective digital encryption techniques became available for such applications. Low-bit-rate voice coding was required to take advantage of these sophisticated methods.

In the early 1980s, the Department of Defense introduced the Government Standard LPC voice-coding algorithm at 2.4 kb/s (see Figure 1).15 (LPC is linear predictive coding.) This vocoder featured a simplified source-excitation model that provided fair speech intelligibility. However, the vocoder and its speech lacked naturalness and robustness, exhibited inconsistent performance across the speaker population, and allowed little, if any, speaker recognition. (That is, the listener usually could not identify the person who was speaking.)

This vocoder technology was incorporated in a limited number of bulky and expensive secure-telephone units (called the STU-II). These units also featured a 9.6-kb/s adaptive-predictive coder (APC)16 that was capable of providing near-communications quality. However, network coverage at 9.6 kb/s was inadequate, and the performance of the waveform-type coder dramatically deteriorated when it was operated at and below 4.8 kb/s. Therefore, further research was directed toward improvements at the 2.4-kb/s rate.

This effort resulted in Federal Standard 1015,17 an enhanced version of the old Government Standard LPC. (The 2.4-kb/s LPC system in Table I is the enhanced version.) Primary changes occurred in the voicing and pitch-detection areas,18 with secondary modifications in excitation format and spectral shaping.19 These changes yielded a first-order effect on coder robustness, and marginal improvements in intelligibility, naturalness, and speaker recognition were attained. The result was a sizable net increase in the DAM from about 50 to about 55, which placed the coder at the upper end of the synthetic-quality range.

This enhanced performance was the motivating factor for the introduction of the STU-III program in the mid-1980s. The program, which was driven by the National Security Agency (NSA), provided government support for the development of the next-generation secure-telephone units (called the STU-III) and fostered purchase of the units by various government agencies. Compact and cost-effective compared to their predecessors, the STU-IIIs featured the new 2.4-kb/s standard. AT&T's participation in this program resulted in the development of the Security-Plus terminal, which has been in production for the last three years.

The introduction of CELP coders in the mid-1980s made communications quality feasible at 4.8 kb/s. At the same time, new modem technology permitted wide network coverage at this bit rate. Overall coder robustness, naturalness, and speaker recognition far exceeded those of a 2.4-kb/s system. After its introduction by the Acoustics Research Department at AT&T Bell Laboratories (a division of AT&T), CELP coding attracted the attention of numerous potential users, including the NSA. AT&T's ability to develop organizational synergies, within and outside the corporation:
- Resulted in swift transfer of technology from research to development
- Effectively identified the customer's needs
- Had a timely impact on the NSA's decisions about standardization.

Continued work by AT&T Bell Laboratories focused on improving the computational and performance profile of the new CELP coder. This facilitated the coder's rapid implementation on the digital signal processors (DSPs) available at the time. Fast techniques that expedite searching through codebooks brought about a tenfold reduction in computational load. Constrained excitation11 and fractional pitch-delay tracking techniques contributed to a net DAM gain of about 10 units over Federal Standard 1015. Algorithms that address source and channel noise7,20 increased CELP's robustness for use in real-world applications.

During 1987, the NSA launched a new standardization effort toward the 4.8-kb/s rate. In early 1988, AT&T Bell Laboratories demonstrated the feasibility of the 4.8-kb/s CELP coder, using laboratory prototype hardware that reflected the computational capacity of the voice section of the Security-Plus terminal. In mid-1988, AT&T demonstrated, at the NSA, secure-call completion at 4.8 kb/s using CELP-modified Security-Plus terminals. The 4.8-kb/s standardization process, which had been accelerating, peaked by mid-1989 when the NSA issued the first predraft of Federal Standard 1016 and submitted it to the U.S. Office of Standards for approval.12 (The 4.8-kb/s coder in Table I is this version of the standard.)

Toward the end of 1989, the NSA awarded contracts to vendors for incorporating into security-terminal production a 4.8-kb/s CELP coder that was compatible with Federal Standard 1016. Compatibility requirements permitted shorter codebooks and allowed for optional features to accommodate immediate implementations that used current DSP products. AT&T's early efforts and contributions in the CELP-coding area have facilitated the inclusion of new technology into a preexisting product. AT&T's new version of the Security-Plus terminal, the first of such products to feature a 4.8-kb/s CELP coder, has been in production since August 1990.

[Figure 4 plot not reproduced. Horizontal axis: channel quality.]

Figure 4. Speech quality versus channel quality in cellular telephony, based on semiquantitative estimates. In the first North American digital-cellular standard (i.e., the pending CTIA standard), speech coding is at 8 kb/s.

Digital Telephony at 8 kb/s

Systems for first-generation digital cellular radio use bit rates of about 8 kb/s for speech coding (Figure 1). In North America, the proposed system has a per-user channel bandwidth of 10 kHz and a total transmission rate of about 13 kb/s for speech coding and channel-error protection.7,21,22 The system will eventually replace the current practice of analog FM speech that has a 30-kHz user bandwidth. The digital system provides greater robustness to channel noise and fading, as well as better reuse of individual carrier frequencies. As a result, the improvement in call capacity (number of users) will exceed the factor of 3 implied by the changeover from 30 kHz to 10 kHz in user bandwidth. The net gain is expected to be a factor of 5 to 7.
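The capacity figures can be separated into two parts with a small calculation. The sketch below assumes, purely for illustration, that the net gain is the product of the bandwidth factor and a second factor that stands for better frequency reuse and robustness; the text gives only the bandwidth factor and the net gain, so the second factor is a derived quantity and the function names are hypothetical.

```python
def bandwidth_factor(analog_khz=30.0, digital_khz=10.0):
    """Capacity gain from narrower per-user channels alone."""
    return analog_khz / digital_khz


def implied_other_gain(net_gain, analog_khz=30.0, digital_khz=10.0):
    """Gain left over once the bandwidth factor is divided out.

    Assumption: the two effects combine multiplicatively.
    """
    return net_gain / bandwidth_factor(analog_khz, digital_khz)


for net in (5, 7):
    print(f"net gain {net}: other effects contribute about "
          f"{implied_other_gain(net):.2f}x")
```

Under that assumption, reuse and robustness would account for a further factor of roughly 1.7 to 2.3 on top of the threefold bandwidth gain.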

In North America, cellular telephony is expected to be based on a CELP-coding algorithm, i.e., the pending CTIA standard. (CTIA is the Cellular Technology Industry Association.) Speech quality at 8 kb/s currently falls slightly below the high-quality threshold (MOS = 4.0). However, the communications quality that the 8-kb/s CELP coder provides is adequate for improvements over analog FM telephony, especially at low levels of radio-channel quality [say, a channel signal-to-noise ratio (SNR) below 15 to 18 dB (decibels)]. This is depicted in the impressionistic curves of Figure 4. The communication delay caused by the speech codec (coder-decoder) is expected to be about 40 ms, a value considered acceptable in the cellular application.

For digital cellular radio in Europe, the recommendation of the GSM (Group Speciale Mobile) is also a hybrid coder. It is a regular pulse-excitation algorithm with a bit rate of 13.2 kb/s (out of a total transmission rate of 22.8 kb/s) and a codec delay of 40 ms.23,24 The coding technique is similar to a 9.6-kb/s multipulse excitation coder for the Skyphone airline application. (Skyphone is a registered service mark of British Telecommunications, PLC.)
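The split between speech bits and the rest of the channel in the two cellular systems can be tabulated as follows. The rates are the ones quoted in the text (the North American figures are approximate), and the dictionary keys and function name are hypothetical. The text attributes the North American remainder to channel-error protection; for GSM the remainder is simply the part of the total rate not used by the speech coder.

```python
# (speech-coder rate, total transmission rate), both in kb/s
SYSTEMS = {
    "North American digital cellular (approximate)": (8.0, 13.0),
    "GSM": (13.2, 22.8),
}


def overhead(speech_kbps, total_kbps):
    """Return (overhead in kb/s, overhead as a fraction of the total)."""
    extra = total_kbps - speech_kbps
    return extra, extra / total_kbps


for name, (speech, total) in SYSTEMS.items():
    extra, share = overhead(speech, total)
    print(f"{name}: {extra:.1f} kb/s beyond the speech coder "
          f"({share:.0%} of the channel)")
```

In both cases well over a third of the transmitted bits are not speech-coder output, which reflects how much protection a fading radio channel requires.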

Network-Quality Speech Coding

For ubiquitous application in networks, a speech-coding algorithm has to satisfy several performance criteria, including:
- A level of speech quality that is high enough to withstand multiple stages of coding and decoding
- A processing delay that is low enough to withstand echoes and additional delay components in the network
- The ability to handle nonspeech signals in the telephone band.

PCM and Variable Bit-Rate ADPCM. Algorithms at 64 kb/s (G.711, PCM) and 32 kb/s (G.721, ADPCM) satisfy a broad class of network requirements and are international CCITT standards.25-28 These standards are in widespread use in both public and private speech telecommunications.

The 32-kb/s standard, G.721, is relatively recent (i.e., 1984). An important application of this codec is in digital circuit-multiplication equipment (DCME). Here, the combination of 2:1 compression (from 64 kb/s to 32 kb/s) with exploitation of the so-called 2.5:1 TASI or DSI gain (i.e., the effect of silences in speech) provides effective circuit expansions of 5:1 over traditional telephone systems. (TASI is time-adaptive speech interpolation. The silences in natural speech occur when the talker pauses to breathe or collect his or her thoughts or stops speaking and waits for the other person to begin speaking, and when he or she is listening to the other speaker.)
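The 5:1 figure is the product of the two gains named above, as this minimal sketch shows. The function name and default arguments are hypothetical; the 2.5:1 interpolation gain is the value quoted in the text, not a measured quantity.

```python
def dcme_gain(pcm_kbps=64, coded_kbps=32, interpolation_gain=2.5):
    """Circuit multiplication from coding gain times TASI/DSI gain."""
    compression = pcm_kbps / coded_kbps
    return compression * interpolation_gain


print(dcme_gain())               # 32-kb/s ADPCM with 2.5:1 interpolation
print(dcme_gain(coded_kbps=16))  # the same interpolation gain at 16 kb/s
```

The second call shows how the multiplication would scale if a 16-kb/s coder replaced the 32-kb/s one under the same interpolation assumption.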

The 32-kb/s algorithm has been extended to 24 and 40 kb/s in the G.723 standard. Also, embedded ADPCM29 is a draft CCITT standard, G.727, at the 40, 32, 24, and 16-kb/s rates. G.727 can be used with G.764 for wideband packet network applications, such as in AT&T's integrated access and cross-connect system (IACS).30

Lower transmission rates such as 24 and 16 kb/s, if based on ADPCM, do not provide network-quality coding. However, they permit occasional bursts of heavy telephone traffic to be accommodated, without explicit coordination among all nodes in the path of the call. Adaptive algorithms can be used for postfiltering31 at the receiver to enhance the lower speech quality of the 24- and 16-kb/s systems. However, the use of postfiltering adversely affects the performance of a system that has multiple stages of encoding and decoding.

The higher ADPCM rate of 40 kb/s (from the G.723 standard) provides the capability for transmitting 9.6-kb/s modem waveforms. The simplicity of the G.721 algorithm and related algorithms also makes them attractive for wireless-access applications that require very low transmitter power; for example, in terminals that are within or near a building that has indoor wireless communication.32,33

The 32-kb/s ADPCM algorithm of the G.721 standard is also robust to (i.e., it can tolerate) multiple stages of encoding and decoding, and more robust than 64-kb/s PCM to digital errors in transmission. At a bit-error rate of 1 in 1000, the degradation in speech quality for 32-kb/s ADPCM is graceful. At a bit-error rate of 1 in 100, speech intelligibility is good, although the quality is poor.

[Figure 5 block diagrams not reproduced. Panel (a) labels: input speech; buffer and analysis; side information; excitation vector quantizer codebook; pitch synthesis filter; LPC synthesis filter; perceptual weighting; minimum MSE; vector quantizer index; a waveform marked with two 20-ms segments and the current speech vector to be coded. Panel (b) labels: input speech; excitation vector quantizer codebook; gain adaptation; LPC synthesis filter; predictor adaptation; perceptual weighting; minimum MSE; vector quantizer index.]

Low-Delay Speech Coding at 16 kb/s. Currently, the CCITT is considering the definition of a low-delay network-quality speech standard at 16 kb/s (Figure 1). Possible applications include DCME, ISDN transmission, packetized speech, cordless telephones, and speech for videophone service. (ISDN is the Integrated Services Digital Network.)

Figure 5b is the block diagram of a backward-adaptive CELP coder proposed by AT&T for this CCITT standard.34,35 In this system, the only source of encoding delay is the forward adaptation of the shape of the excitation signal. This delay comes from selecting the best excitation vector from a rich codebook of possible excitation vectors, each a sample vector of length 5 (0.625 ms). Figure 5a is the more traditional, fully forward-adaptive CELP system.

Figure 5. Code-excited linear prediction. The waveform's 20-ms segments are used to perform speech analysis to provide LPC filter coefficients. (a) Conventional or fully forward-adaptive CELP; uses the 20-ms segment on the right in the waveform. (b) Backward-adaptive system for low-delay CELP, the proposed CCITT standard at 16 kb/s; uses the left 20-ms segment of the waveform. In this system, the forward adaptation of the shape of the excitation signal is the only source of encoding delay.

[Figure 6 chart not reproduced. Horizontal axis: audio bandwidth (Hz), with ticks at 10, 20, 50, 200, 3400, 7000, 15,000, and 20,000. Bandwidth grades shown: telephone, AM radio, FM radio, compact disk. Digital-coding standards shown: CCITT 1972 (64 kb/s), CCITT 1984 (32 kb/s), CCITT 1991 (16 kb/s), CCITT 1987 (64 kb/s), ISO 1990 (128 or 96 kb/s).]

Figure 6. Four grades of audio-signal bandwidth and corresponding standards for digital coding. Audio is stored on today's compact disk at a bit rate of about 700 kb/s per sound channel, but the emerging ISO standard calls for a single-channel bit rate of 96 or 128 kb/s. The CCITT's 1991 standard will define low-delay, network-quality speech.

An important challenge for the proposed algorithm (Figure 5b) is to combine high quality with low processing delay. The delay requirement means that the time-varying model for the speech-spectral envelope has to be estimated in a backward-adaptive mode, using a history of already quantized speech. For forward spectral estimation, tens of milliseconds of input speech are buffered.5,6,22 In the backward-adaptive mode, the challenge then is to realize adequate spectral estimation even though quantization noise is present in the past speech samples used for backward spectral analysis.
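The codebook selection step, in which the coder picks the excitation vector whose synthesized output best matches the current five-sample speech vector, can be sketched as follows. This is a simplified illustration and not the proposed standard: it omits perceptual weighting, gain adaptation, and the backward adaptation of the predictor, and it uses a random 128-entry codebook and fixed two-tap filter coefficients that are purely hypothetical.

```python
import random

VECTOR_LEN = 5  # five samples, i.e. 0.625 ms at an 8-kHz sampling rate


def synthesize(excitation, lpc, state):
    """All-pole synthesis: y[n] = x[n] + sum_k lpc[k] * y[n-1-k]."""
    history = list(state)  # past outputs, most recent first
    out = []
    for x in excitation:
        y = x + sum(a * h for a, h in zip(lpc, history))
        out.append(y)
        history = [y] + history[:-1]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def search_codebook(target, codebook, lpc, state):
    """Return (index, gain, squared error) of the best excitation vector."""
    # Remove the filter's zero-input response so the rest is linear in gain.
    zero_input = synthesize([0.0] * len(target), lpc, state)
    goal = [t - z for t, z in zip(target, zero_input)]
    zero_state = [0.0] * len(lpc)
    best = (None, 0.0, float("inf"))
    for index, vector in enumerate(codebook):
        shaped = synthesize(vector, lpc, zero_state)
        energy = dot(shaped, shaped)
        if energy == 0.0:
            continue
        correlation = dot(goal, shaped)
        gain = correlation / energy            # least-squares optimal gain
        error = dot(goal, goal) - gain * correlation
        if error < best[2]:
            best = (index, gain, error)
    return best


rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(VECTOR_LEN)]
            for _ in range(128)]               # hypothetical codebook
lpc = [1.2, -0.5]    # hypothetical stable synthesis-filter coefficients
state = [0.3, -0.1]  # last two decoded samples, most recent first

# Build a target the codebook can match exactly, then search for it.
target = synthesize([2.0 * s for s in codebook[17]], lpc, state)
index, gain, error = search_codebook(target, codebook, lpc, state)
print(index, round(gain, 3))
```

Only the winning index needs to be transmitted for each vector in a scheme of this kind, and in a backward-adaptive coder the filter coefficients would be recomputed at both ends from previously decoded speech rather than sent as side information.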

The algorithm is complex, with the codebook search as the single, most demanding component. A 25-MFLOP processor with advanced memory capabilities (i.e., the AT&T WE® DSP32C processor) is available, which permits a single-chip (half-duplex) implementation of the coder. Currently, a full-duplex coder requires a two-chip implementation, but prospects for a single-chip implementation with nearly equal speech quality are good. (MFLOP stands for 10^6 floating-point arithmetic operations. In full duplex transmissions, data is transmitted and received simultaneously. With half duplex, data can be transmitted and received, but only in one direction at a time.)

Also of interest is the possibility of:
- Integer-point processing, using the AT&T WE DSP16A processor.
- Extending the low-delay property to lower bit rates, such as 8 kb/s, while maintaining the communications quality offered by traditional high-delay coders at those bit rates. (Digital telephony at 8 kb/s was discussed earlier.)

Applications of 16- and 8-kb/s Coders in CPE. The need for digitized, low-bit-rate voice products in customer-premises equipment (CPE) is expected to increase, as will the use of digital transmission facilities for integrated voice and data services. The use of low-bit-rate voice will also grow because of the demand for store-and-forward voice mail and for voice-security applications. Speech coders at 16 and 8 kb/s are prime candidates for CPE applications.

Intelligent T1 multiplexers. Today, several vendors are offering intelligent T1 or fractional-T1 multiplexers for large, corporate, T-carrier trunk-based networks. These networks are complete telecommunications systems that carry both voice and data traffic. For these networks, the economies of scale and the dynamic reallocation of bandwidth offer potential cost savings.

In many of these applications, users select 64, 32, 24, or 16 kb/s as the bit rate of the voice circuits, according to cost-performance tradeoffs.

When the voice signal is compressed to 16 kb/s, the T1 voice-channel capacity (which originally was 24 channels) is increased to 96 channels, and extra bandwidth is available for data and image applications. More sophisticated T1 multiplexers double the voice-channel capacity by using digital speech interpolation (DSI), which removes the silent pauses in speech. When DSI is used with 16-kb/s compressed speech, T1 multiplexers can offer 192 or more voice circuits over a T1 link, with minimal voice degradation. Soon, the use of 8-kb/s coding techniques will double the capacity of the T1 links to 384 voice circuits or more.
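The channel counts quoted above follow from simple arithmetic on the T1 voice payload of 24 channels at 64 kb/s. The following sketch reproduces them; the function name and the fixed DSI gain of 2 (taken from the statement that DSI doubles capacity) are illustrative assumptions, not part of any multiplexer specification.

```python
T1_PCM_CHANNELS = 24   # voice channels on a T1 link at 64 kb/s each
PCM_RATE_KBPS = 64


def voice_circuits(coder_rate_kbps, dsi=False, dsi_gain=2):
    """Voice circuits that fit in the T1 voice payload at a given coder rate.

    dsi_gain is an assumed constant; real DSI gain depends on talker activity.
    """
    payload_kbps = T1_PCM_CHANNELS * PCM_RATE_KBPS  # 1536 kb/s
    circuits = payload_kbps // coder_rate_kbps
    return circuits * dsi_gain if dsi else circuits


CAPACITY_16 = voice_circuits(16)                # 16-kb/s coding, no DSI
CAPACITY_16_DSI = voice_circuits(16, dsi=True)  # 16-kb/s coding with DSI
CAPACITY_8_DSI = voice_circuits(8, dsi=True)    # 8-kb/s coding with DSI
```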

Most corporate, private, T-carrier trunk-based networks are PBX-to-PBX connections. Therefore, a voice-coding algorithm that performs well in asynchronous tandem applications is essential. In addition, the desirability of avoiding echo cancellation suggests the use of a low-delay coding algorithm.

Compressed voice over APL and DDS circuits. Today, some CPE products multiplex voice and data over leased, digital-data-service (DDS) lines or analog, private lines (APLs) to smaller locations that cannot justify the capacity or cost of T1 circuits.

The DDS systems can be configured to provide 56-kb/s service for multiple 8-kb/s voice channels and 16-kb/s data channels between PBX, centrex, or FX locations. These multiplexers are especially needed in international circuits. Because such circuits are expensive, users normally like to multiplex as many voice connections as possible onto a single circuit [for example, five voice channels plus one data channel; i.e., (5 x 8 kb/s) + (1 x 16 kb/s) = 56 kb/s]. For international applications that use satellite links, the delay and echo characteristics associated with the links make a low-delay, high-quality, compressed-voice algorithm highly desirable.
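The five-plus-one configuration is one of several ways to fill a 56-kb/s DDS line with 8-kb/s voice channels and 16-kb/s data channels. As a small illustration, the sketch below enumerates every combination that uses the line exactly; the helper name `channel_plans` is hypothetical, and framing or signaling overhead is assumed to be zero.

```python
def channel_plans(link_kbps, voice_kbps, data_kbps):
    """All (voice_channels, data_channels) pairs that fill the link exactly."""
    plans = []
    for data in range(link_kbps // data_kbps + 1):
        remaining = link_kbps - data * data_kbps
        if remaining % voice_kbps == 0:
            plans.append((remaining // voice_kbps, data))
    return plans


# 56-kb/s DDS line, 8-kb/s voice, 16-kb/s data
PLANS = channel_plans(56, 8, 16)
```

The plan (5, 1) in the resulting list is the example given in the text.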

Another application for low-bit-rate speech is for automatic-teller machines (ATMs). In this application, a high-speed, 19.2-kb/s APL circuit connects each ATM to a central site. Besides voice, both data and still-frame images can be multiplexed onto a single 19.2-kb/s circuit.

Store-and-forward voice mail. Other applications of high-quality, low-bit-rate speech coding at 16 kb/s (and perhaps, in the future, at 8 kb/s) are the call-answer and store-and-forward voice mail features offered in many PBXs today.

Voice messages received at 64 kb/s can be compressed to lower bit rates for efficient storage. In addition, customers can record and send messages to one or more recipients who are connected to the network of PBXs. Although there is adequate bandwidth to support 64-kb/s voice transmission within the premises, the need for compressed voice at 16 kb/s or below arises because of limitations in storage capacity.
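The storage argument can be made concrete with a short calculation. The sketch below assumes that 1 kb/s means 1000 bits per second and ignores any file or message overhead; the function name is illustrative.

```python
def storage_bytes(rate_kbps, seconds):
    """Bytes needed to store speech coded at rate_kbps for the given duration."""
    return rate_kbps * 1000 * seconds // 8


ONE_MINUTE = 60
BYTES_AT_64 = storage_bytes(64, ONE_MINUTE)  # uncompressed 64-kb/s voice
BYTES_AT_16 = storage_bytes(16, ONE_MINUTE)  # 16-kb/s coded voice
BYTES_AT_8 = storage_bytes(8, ONE_MINUTE)    # 8-kb/s coded voice
```

Under these assumptions, a one-minute message occupies 480,000 bytes at 64 kb/s but only 120,000 bytes at 16 kb/s, a fourfold saving.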

AT&T's system for AUDIX Voice Mail is typical of these store-and-forward services. (AUDIX stands for audio-information exchange.)

Digital Coding of Wideband Speech and Audio

Figures 1 and 2 referred specifically to telephone-band speech. In Figure 2, the achievement of higher quality at a given bit rate implied reduced speech distortion, without any change of bandwidth. But what effect does changing the signal bandwidth have on speech quality and intelligibility?

Figure 6 defines four commonly understood grades of audio bandwidth. If the audio signal is speech instead of music, the perceived gains in quality are, perhaps, greatest when one progresses from the telephone level to the commentary, or AM-radio, level. The gains in quality are in terms of increased intelligibility, naturalness, and speaker recognition. Low-frequency enhancement (i.e., 50 to 200 Hz) contributes to increased naturalness and speaker presence, and high-frequency enhancement (i.e., 3400 to 7000 Hz) provides greater intelligibility and fricative differentiation (for example, s versus f).

In the rest of this section, we describe high-quality compression of wideband audio and ISDN applications of digital audio. We also discuss a CCITT coding standard for 7-kHz audio and a 20-kHz audio standard that is being defined by the ISO (International Organization for Standardization).

Figure 7. Block diagram of a two-band subband coder for 64-kb/s coding of 7-kHz audio [37], the basis for the G.722 standard. The low- and high-frequency subbands are quantized using 6 and 2 bits per sample, respectively. The analysis and synthesis filters produce a communication delay of about 3 ms.

[Figure 7 diagram. Transmit side: audio-signal input (14 bits, 16 kHz), transmit quadrature mirror filters, higher-subband ADPCM encoder (16 kb/s) and lower-subband ADPCM encoder (48 kb/s), multiplexer, data-insertion device (auxiliary data-channel input: 0, 8, or 16 kb/s), 64-kb/s output. Receive side: input, data-extraction device (determines mode; auxiliary data-channel output and mode indication: 0, 8, or 16 kb/s), demultiplexer, higher-subband ADPCM decoder (16 kb/s) and lower-subband ADPCM decoder (48 kb/s, 3 variants), receive quadrature mirror filters, audio-signal output.]

The naturalness of wideband speech is a significant feature for extended telecommunications processes, such as audio teleconferencing and program broadcasting. Basic-rate ISDN provides a natural framework for a 64-kb/s algorithm to encode wideband audio for such applications. [Basic-rate ISDN provides two 64-kb/s circuit-switched channels (bearer channels for the customer's voice, data, or video) and one 16-kb/s packet-switched channel (data channel for the network's information).] The digital connectivity afforded by ISDN [36] has prompted a worldwide revisiting of audio-transmission quality. In particular, end-to-end digital connectivity has made possible the inclusion of low frequencies down to 50 Hz in the transmitted audio band [37].

Coding of 7-kHz Audio. The CCITT standard for 7-kHz audio (G.722) is a 64-kb/s algorithm developed primarily for ISDN teleconferencing and loudspeaker telephony. Because of the 64-kb/s capability, a single "voice-grade" channel on a digital or analog, public-switched telephone network (PSTN) can transport a commentary-quality sound program over any distance and yield a broadcast-grade voice program at the receiving end.

The G.722 coding algorithm is based on a two-band subband coder, with ADPCM coding of each subband (Figure 7) [37,38]. The low- and high-frequency subbands are quantized using 6 and 2 bits per sample, respectively. The filter banks that are used for analysis and synthesis produce a communication delay of about 3 ms. This low delay turns out to be a desirable feature because of the expected interconnections of G.722 with narrowband links. For these interconnections, uncanceled echoes could pose a problem if compounded by codec delay. (In isolation, digital wideband links do not have two-wire/four-wire hybrids and the resulting uncanceled echoes.)
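The structure of a two-band split, and the way the subband bit allocations add up to 64 kb/s, can be sketched as follows. This is an illustration under stated assumptions, not the G.722 filter bank: the 24-tap quadrature mirror filters are replaced here by the simplest possible two-tap (Haar) pair, and the ADPCM quantizers are omitted, so that the perfect-reconstruction property of the split is easy to see. All function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000        # input sampling rate for 7-kHz audio
LOW_BITS, HIGH_BITS = 6, 2     # bits per sample in each subband


def analysis(x):
    """Split an even-length sequence into low and high subbands.

    Each subband runs at half the input rate (8 kHz here).
    Assumed stand-in: two-tap Haar filters, not the 24-tap G.722 QMFs.
    """
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def synthesis(low, high):
    """Recombine the two subbands into a full-rate sequence."""
    out = []
    for lo, hi in zip(low, high):
        out.extend([lo + hi, lo - hi])
    return out


def subband_rates_kbps():
    """Bit rate of each subband: bits per sample times the subband rate."""
    band_rate_hz = SAMPLE_RATE_HZ // 2
    return (LOW_BITS * band_rate_hz // 1000, HIGH_BITS * band_rate_hz // 1000)


samples = [3, 1, 4, 1, 5, 9, 2, 6]
low_band, high_band = analysis(samples)
rebuilt = synthesis(low_band, high_band)
LOW_KBPS, HIGH_KBPS = subband_rates_kbps()
```

The two rates computed here, 48 and 16 kb/s, match the lower- and higher-subband rates of Figure 7 and sum to the 64-kb/s channel rate.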

The 64-kb/s algorithm can tolerate random error rates of about 1 in 10,000 and four tandem stages of repeated encoding and decoding. The simplicity of the quantizing, predicting, and filtering (24-tap) algorithms permits a single-chip, fixed-point implementation on the DSP16A processor.

Figure 8. Quality of 7-kHz digital audio as a function of bit rate in the G.722 algorithm [37]. Signal bandwidth is 7 kHz. The G.722 points are for the two-band subband coder (Figure 7), which uses ADPCM coding for each subband. The PCM points are supplied for comparison and represent a 15-bit audio input sampled at 16 kHz. Again, the research goal is realistic.

[Figure 8 plot. Vertical axis: MOS (5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = bad). Horizontal axis: bit rate (kb/s), marked at 48, 56, and 64. Plotted: 240-kb/s PCM (open circles), G.722 64-kb/s SB-ADPCM (filled circles), and the research goal for 7-kHz speech.]

ISDN applications suggest that the audio-coding algorithms be operated at slightly lower bit rates. Here, the use of an embedded coding technique for ADPCM permits operation of the low-frequency subband at one of three quantizing rates (i.e., 6, 5, or 4 bits per sample), with graceful degradation of quality. The corresponding audio rates are 64, 56, and 48 kb/s. For the 56- and 48-kb/s rates, capacities of 8 and 16 kb/s are available for simultaneous data transmission over the 64-kb/s basic-rate channel.
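The three operating modes can be tabulated directly from the bit allocations. The sketch below assumes an 8-kHz sample rate in each subband and a fixed 2 bits per sample in the high band, as described above. The helper `drop_lsbs` is a hypothetical illustration of the embedded idea, in which a coarser codeword is obtained by discarding least-significant bits of the 6-bit low-band codeword; it is not a description of the actual G.722 quantizer tables.

```python
BAND_RATE_KHZ = 8    # sample rate of each subband (half of 16 kHz)
HIGH_BITS = 2        # high-band bits per sample, fixed in all modes
CHANNEL_KBPS = 64    # basic-rate bearer channel


def mode(low_bits):
    """Audio rate and leftover data capacity for a low-band bit allocation."""
    audio_kbps = (low_bits + HIGH_BITS) * BAND_RATE_KHZ
    return {"audio_kbps": audio_kbps, "data_kbps": CHANNEL_KBPS - audio_kbps}


def drop_lsbs(codeword6, keep_bits):
    """Illustrative embedded truncation: keep the top keep_bits of 6 bits."""
    return codeword6 >> (6 - keep_bits)


MODES = {bits: mode(bits) for bits in (6, 5, 4)}
COARSE = drop_lsbs(0b101101, 4)   # 6-bit codeword reduced to 4 bits
```

Because the coarse codeword is a prefix of the fine one, the bits freed in each low-band sample (one or two bits at 8 kHz) supply the 8 or 16 kb/s of auxiliary data capacity.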

Figure 8 shows audio quality on the MOS scale for speech and music material at rates of 64, 56, and 48 kb/s. For comparison, we also show the performance of 240-kb/s linear PCM (i.e., a 15-bit audio input that was sampled at 16 kHz). In another comparison that involves G.722, G.721, and 128-kb/s PCM (16 kHz x 8 bits per sample), the G.722 algorithm at 64 kb/s was shown to have an equivalent SNR gain of 13 dB over the G.721 algorithm [38,39]. Of this SNR gain, 6 dB can be attributed to increased input bandwidth.
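The PCM reference rates used in these comparisons are simply the sampling rate multiplied by the word length. A brief check, with an illustrative function name:

```python
def pcm_rate_kbps(sample_rate_khz, bits_per_sample):
    """Bit rate of PCM: samples per second times bits per sample."""
    return sample_rate_khz * bits_per_sample


LINEAR_PCM_KBPS = pcm_rate_kbps(16, 15)    # 15-bit input sampled at 16 kHz
EIGHT_BIT_PCM_KBPS = pcm_rate_kbps(16, 8)  # 16 kHz x 8 bits per sample
RATE_REDUCTION = LINEAR_PCM_KBPS / 64      # 240-kb/s PCM versus 64-kb/s G.722
```

On these figures, G.722 at 64 kb/s operates at a little over one quarter of the 240-kb/s linear-PCM rate.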

Figure 8 also shows the current research goal for the coding of 7-kHz audio, a goal that is believed to be realistic. One implication of this goal is the possibility of coding 7-kHz audio at 32 kb/s with high quality (MOS = 4.0). This will permit the transmission of two bilingual or stereo wideband channels at 64 kb/s. For stereo, the use of cross-channel correlations can provide a further increase of capability. For example, a bandwidth greater than 7 kHz could be accommodated in the 64-kb/s system.

There are at least two approaches to the problem of high-quality coding of audio at 32 kb/s:
- Linear-prediction approach, exemplified by CELP
- Frequency-domain technique of transform or subband coding.

For both, the attainment of high audio quality will depend on the use of perceptual tuning of the algorithm to provide effective shaping of the quantization noise. The CELP technique offers the additional possibility of low-delay coding through backward adaptation, as illustrated in Figure 5b.

Figure 9. Threshold of just-noticeable distortion as a function of frequency for an illustrative audio signal (a trumpet) [40]. Research on perception will play an increasing role by enhancing our understanding of how to mask noise, especially in the time domain.

[Figure 9 plot. Curve: just-noticeable distortion. Horizontal axis: frequency (kHz), 0 to 16.]

Coding of 20-kHz Audio. Although a bandwidth of 7 kHz provides very natural reproduction of speech, 20 kHz is a well-accepted bandwidth standard for more general classes of audio, including vocal and instrumental music.

The ISO is committed to the standardization (in the 1990 to 1991 time frame) of a low-bit-rate coding algorithm for 20-kHz audio. Applications for low-bit-rate wideband speech include electronic publishing, travel and guidance, teleteaching, multilocation games, multimedia memoranda, and database storage. Another major application for 20-kHz digital audio is in advanced television systems, such as high-definition television (HDTV).

On current compact disks (CDs), audio is stored at a bit rate of about 700 kb/s per sound channel (i.e., 16-bit PCM coding of 44.1-kHz sampled signals). However, the emerging ISO standard calls for a single-channel bit rate of 96 or 128 kb/s. Production of high-quality audio at these very low rates calls for a new generation of coding algorithms. These algorithms will achieve coding gains by removing signal redundancy. But these gains must be augmented by the liberties permitted by the human auditory process, as predicted by sophisticated models of just-noticeable distortion (Figures 9 and 10) [40,41].
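To see how demanding the target rates are, it helps to work out the compression ratio relative to CD audio. The sketch below assumes that the coded signal keeps the 44.1-kHz sampling rate when the average number of bits per sample is computed; that assumption, and the variable names, are ours rather than part of the emerging standard.

```python
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16

# Per-channel CD rate in bits per second ("about 700 kb/s" in the text)
CD_RATE_BPS = CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE


def compression_ratio(target_kbps):
    """How many times smaller the target rate is than the CD rate."""
    return CD_RATE_BPS / (target_kbps * 1000)


def bits_per_sample(target_kbps):
    """Average bits per sample, assuming the 44.1-kHz rate is retained."""
    return target_kbps * 1000 / CD_SAMPLE_RATE_HZ


RATIO_128 = compression_ratio(128)
RATIO_96 = compression_ratio(96)
BITS_128 = bits_per_sample(128)
BITS_96 = bits_per_sample(96)
```

Under these assumptions, the coder has fewer than three bits per sample on average at 128 kb/s and barely more than two at 96 kb/s, which is why redundancy removal alone is not expected to suffice.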

Future Trends

Sophisticated algorithms for coding will lead to transmission techniques that do not permit quantization noise to limit speech quality. In addition, the notion of enhancing speech quality by using greater input bandwidth will become more pervasive. Coding systems in the 8- to 64-kb/s range will thus provide graceful flexibilities in terms of selected bandwidth and special features, such as stereo separation in teleconferencing. Advances in coding will be supported by new technologies for wideband transducers, noise-canceling systems for audio pickup, and autodirective microphone arrays [42,43].

As coding algorithms become increasingly efficient and approach fundamental capabilities, research on perception will play an increasing role by enhancing our understanding of noise masking, especially in the time domain.

Advances in signal-processor technology will continue to support increasingly complex algorithms for coding and decoding. The synergistic working of coding theory, perception science, and signal processing will bring sophisticated speech technology to the human listener in affordable forms.

Figure 10. Perceptual coding of wideband audio; block diagram of a perceptual frequency-domain coder [41]. The dashed lines identify an option for using left-right channel correlations to increase efficiency in stereo coding. For high-quality audio at low bit rates, the liberties permitted by human perception must augment the coding gains achieved from removing signal redundancy.

[Figure 10 diagram. Block labels: left signal, right signal, analysis filter bank, frequency analysis, quantization, noiseless coding, bit-stream control, channel. Dashed lines: optional (stereo coding).]

Acknowledgment

We thank the following colleagues for reviewing an earlier version of this paper: B. S. Atal, K. H. Brandenburg, J.-H. Chen, R. V. Cox, J. D. Johnston, W. B. Kleijn, D. J. Krasinski, P. Noll, M. H. Sherif, and Y. Shoham.

References

1. W. R. Daumer, "Subjective Evaluation of Several Efficient Speech Coders," IEEE Transactions on Communications, Vol. 30, No. 4, April 1982, pp. 655-662.

2. N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice Hall, Englewood Cliffs, New Jersey, 1984.

3. W. D. Voiers, "Diagnostic Evaluation of Speech Intelligibility," Speech Intelligibility and Speaker Recognition, M. E. Hawley (ed.), Dowden Hutchinson Ross, Stroudsburg, Pennsylvania, 1977.

4. W. D. Voiers, "Diagnostic Acceptability Measure for Speech Communication Systems," ICASSP '77, IEEE International Conference on Acoustics, Speech, and Signal Processing, Hartford, Connecticut, May 9 to 11, 1977, IEEE, New York, May 1977, pp. 204-207.

5. B. S. Atal and M. R. Schroeder, "Stochastic Coding of Speech at Very Low Bit Rates," Links for the Future: Science, Systems and Services for Communications, Proceedings of the International Conference on Communications, Amsterdam, The Netherlands, P. Dewilde and C. A. May (eds.), North-Holland, New York, May 1984, pp. 1610-1613.

6. B. S. Atal, "High-quality speech at low bit rates: Multi-pulse and stochastically excited linear predictive coders," ICASSP '86, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Tokyo, Japan, April 7 to 11, 1986, Vol. III, IEEE, New York, April 1986, pp. 1681-1684.

7. R. V. Cox, W. B. Kleijn, and P. Kroon, "Robust CELP Coders for Noisy Backgrounds and Noisy Channels," ICASSP '89, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Glasgow, Scotland, May 23 to 26, 1989, Vol. II, IEEE, New York, May 1989, pp. 739-742.

8. N. S. Jayant and J.-H. Chen, "Speech Coding with Time-Varying Bit Allocations to Excitation and LPC Parameters," ICASSP '89, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Glasgow, Scotland, May 23 to 26, 1989, Vol. I, IEEE, New York, May 1989, pp. 65-68.

9. W. B. Kleijn, D. J. Krasinski, and R. H. Ketchum, "Improved Speech Quality and Efficient Vector Quantization in SELP," ICASSP '88, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, New York, April 11 to 14, 1988, Vol. I, IEEE, New York, April 1988, pp. 155-158.

10. P. Kroon and B. S. Atal, "Pitch Predictors with High Temporal Resolution," ICASSP '90, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Albuquerque, New Mexico, April 3 to 6, 1990, Vol. II, IEEE, New York, April 1990, pp. 661-664.

11. Y. Shoham, "Constrained Excitation CELP Coding," IEEE Workshop on Speech Coding for Telecommunications, Vancouver, British Columbia, Canada, September 1989, p. 65.

12. Telecommunications: Analog to Digital Conversion of Radio Voice by 4800 bit/sec. Code Excited Linear Prediction (CELP), FED-STD-1016, Second Draft, Office of Technology and Standards, National Communications System, Washington, D.C., November 1989.

13. D. P. Kemp, R. A. Sueda, and T. E. Tremain, "An Evaluation of 4800 bps Voice Coders," ICASSP '89, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Glasgow, Scotland, May 23 to 26, 1989, Vol. I, IEEE, New York, May 1989, pp. 200-203.

14. V. C. Welch, T. E. Tremain, and J. P. Campbell, "A Comparison of U.S. Government Standard Voice Coders," MILCOM 89, Bridging the Gap: Interoperability, Survivability, Security, Conference record, IEEE Military Communications Conference, Boston, Massachusetts, October 15 to 18, 1989, Vol. 1, IEEE, New York, October 1989, pp. 269-273.

15. T. E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10," Speech Technology, Vol. 1, No. 2, April 1982, pp. 40-49.

16. B. S. Atal and M. R. Schroeder, "Adaptive Predictive Coding of Speech Signals," The Bell System Technical Journal, Vol. 49, No. 8, October 1970, pp. 1973-1986.

17. Telecommunications: Analog to Digital Conversion of Voice by 2,400 Bit/sec. Linear Predictive Coding, FED-STD-1015, Office of Technology and Standards, National Communications System, Washington, D.C., March 1983.

18. G. S. Kang and S. S. Everett, "Improvement of the Excitation Source in the Narrow-Band Linear Prediction Vocoder," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-33, No. 2, April 1985, pp. 317-386.

19. G. S. Kang and S. S. Everett, "Improvement of the Narrowband Linear Predictive Coder, Part 2: Synthesis Improvements," Report 8799, Naval Research Laboratory, Washington, D.C., June 1984.

20. J. R. B. De Marca, N. Farvardin, N. S. Jayant, and Y. Shoham, "Robust Vector Quantization for Noisy Channels," Proceedings of the M-SAT Conference, Jet Propulsion Laboratories, Pasadena, California, May 1988, pp. 515-520.

21. E. S. K. Chien, D. J. Goodman, and J. E. Russell, "Cellular Access Digital Network (CADN): Wireless Access to Networks of the Future," IEEE Communications Magazine, Vol. 25, No. 6, June 1987, pp. 22-27.

22. I. A. Gerson and M. A. Jasiuk, "Vector Sum Excited Linear Prediction (VSELP)," IEEE Workshop on Speech Coding for Telecommunications, Vancouver, British Columbia, Canada, September 1989, pp. 66-69.

23. P. Kroon, E. F. Deprettere, and R. J. Sluyter, "Regular-Pulse Excitation - A Novel Approach to Effective and Efficient Multipulse Coding of Speech," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-34, No. 5, October 1986, pp. 1054-1063.

24. P. Vary, K. Hellwig, R. Hofmann, R. J. Sluyter, C. Galand, and M. Rosso, "Speech Codec for the European Mobile Radio System," ICASSP '88, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, New York, April 11 to 14, 1988, Vol. I, IEEE, New York, April 1988, pp. 227-230.

25. M. Taka and X. Maitre, "CCITT Standardizing Activities on Speech coding," ICASSP '86, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Tokyo, Japan, April 7 to 11, 1986, Vol. II, IEEE, New York, April 1986, pp. 817-820.

26. "G.721 - 32 kbits/s Adaptive Differential Pulse Code Modulation (ADPCM)," Report R57, Part II, CCITT Study Group XVIII, IXth Plenary Assembly, Melbourne, Australia, 1988.

27. "Recommendation G.727 - 5, 4, 3, 2 bit per Sample Embedded ADPCM," CCITT Study Group XV, 1990.

28. "Recommendation G.764 - Packet Voice," CCITT Study Group XVIII, 1990.

29. M. H. Sherif, D. O. Bowker, G. Bertocci, B. A. Orford, and G. A. Mariano, "Overview of CCITT/ANSI Embedded ADPCM Algorithms," ICC '90, IEEE International Conference on Communications, Atlanta, Georgia, April 16 to 19, 1990, IEEE Communications Society, New York, April 1990, pp. 1014-1018.

30. M. K. Verma, D. Prezas, T. L. Russell, M. H. Sherif, and R. Thorkildsen, "Novel Applications of Speech Processing in AT&T Network Systems Products," AT&T Technical Journal, Vol. 69, No. 5, September/October 1990, pp. 77-86.

31. V. Ramamoorthy, N. S. Jayant, R. V. Cox, and M. M. Sondhi, "Enhancement of ADPCM Speech Coding with Backward-Adaptive Algorithms for Postfiltering and Noise Feedback," IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988, pp. 364-382.

32. D. C. Cox, "Portable Digital Radio Communications - An Approach to Tetherless Access," IEEE Communications Magazine, Vol. 27, No. 7, July 1989, pp. 30-40.

33. R. Steele, "The Cellular Environment of Lightweight Handheld Portables," IEEE Communications Magazine, Vol. 27, No. 7, July 1989, pp. 20-29.

34. J.-H. Chen, "A Robust Low Delay CELP Speech Coder at 16 kbps," Communications Technology for the 1990s and Beyond, Globecom '89, 8th IEEE Global Telecommunications Conference, Dallas, Texas, November 27 to 30, 1989, IEEE, New York, 1989, pp. 3411-3415.

35. J.-H. Chen, "High Quality 16 kbps speech coding with a one-way delay less than 2 ms," ICASSP '90, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Albuquerque, New Mexico, April 3 to 6, 1990, Vol. I, IEEE, New York, April 1990, pp. 453-456.

36. T. Irmer, "An Idea Turns Into a Reality - CCITT Activities on the Way to ISDN," IEEE Journal on Selected Areas in Communications, May 1986, pp. 316-319.

37. P. Mermelstein, "G.722, A New CCITT Coding Standard for Digital Transmission of Wideband Audio Signals," IEEE Communications Magazine, Vol. 26, No. 1, January 1988, pp. 8-15.

38. "G.722 - 7 kHz Audio Coding Within 64 kbits/s," Report R57, Part II, CCITT Study Group XVIII, IXth Plenary Assembly, Melbourne, Australia, 1988.

39. G. Modena, A. Coleman, P. Usai, and P. Coverdale, "Subjective performance evaluation of the 7 kHz Audio Coder," CSELT Technical Report (Centro Studi e Laboratori Telecomunicazioni, Turin, Italy), Vol. 15, No. 2, March 1987, pp. 171-176.

40. J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE Journal on Selected Areas in Communications, February 1988, pp. 314-323.

41. J. D. Johnston and K. H. Brandenburg, "Sound Coding Algorithm," MPEG-891-148, Report of ISO-IEC/JTC1/SC2/WG8 committee meeting, Stockholm, Sweden, June 1989.

42. J. L. Flanagan, J. D. Johnston, R. Zahn, and G. Elko, "Computer-steered microphone arrays for sound transduction in large rooms," Journal of the Acoustical Society of America, Vol. 78, No. 5, November 1985, pp. 1508-1518.

43. M. M. Sondhi and G. W. Elko, "Adaptive Optimization of Microphone Arrays under a Nonlinear Constraint," ICASSP '86, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Tokyo, Japan, April 7 to 11, 1986, Vol. II, IEEE, New York, April 1986, pp. 19.9.1-19.9.4.

Biographies (continued)

has a B.Sc. in electrical engineering from London University (England), a D.I.C. from Imperial College of Science and Technology (London, England), and a Ph.D. in electrical engineering from London University. Mr. Prezas was responsible for development of speech coding and automatic speech recognition technologies for intelligent network applications and secure voice equipment. He joined the company in 1979 and had a B.S. in physics from the University of Athens, Greece, and an M.S. and Ph.D. in electrical engineering from the Illinois Institute of Technology in Chicago.

(Manuscript received June 14, 1990)

AT&T TECHNICAL JOURNAL • SEPTEMBER/OCTOBER 1990