Coding of Speech and Wideband Audio - The Telecom Archive

CODING OF SPEECH AND WIDEBAND AUDIO

Nikil S. Jayant, Victor B. Lawrence, and Dimitrios P. Prezas

Nikil S. Jayant is head of the Signal Processing Research Department of AT&T Bell Laboratories, Murray Hill, New Jersey. Victor B. Lawrence is head of the Data Communications Research Department of AT&T Bell Laboratories, Middletown, New Jersey. The late Dimitrios P. Prezas was a supervisor in the Advanced Services Technology Department of AT&T Bell Laboratories (Indian Hill Park), Naperville, Illinois. Mr. Jayant is responsible for research in signal processing, including coding and communication of speech, image, and wideband audio. He joined the company in 1968 and has a Ph.D. in electrical communications engineering from the Indian Institute of Science (Bangalore, India). Mr. Lawrence is responsible for exploratory development of data communications equipment and services. He joined the company in 1974 and (continued on page 41)

Advances in coding algorithms and digital signal processing have led to sophisticated technologies for speech communication in a variety of applications, as well as to greater flexibility in the design of ISDN terminals for integrated communication of speech, images, and data. For traditional telephony with a signal bandwidth of 3.2 kHz, the transmission rate for network-quality speech is now down to 16 kb/s. Robust communications-quality speech appropriate for cellular radio has been realized at 8 kb/s. Research attention is shifting toward 4 kb/s, focused on improving speaker identification and the naturalness of coded speech. For wideband audio with a signal bandwidth of 7 kHz, high-quality coding is now possible at 32 kb/s, which permits stereo teleconferencing or dual-language programming over a single 64-kb/s channel. Transparent coding of 20-kHz audio has been demonstrated at 128 kb/s, with near-transparent performance at rates as low as 64 kb/s for some classes of signals.

Introduction

This paper is a review of the technology for digital speech coding. First, we discuss traditional telephone speech with a bandwidth of about 3.2 kHz (kilohertz). Then, we turn our attention to higher-grade wideband speech with a bandwidth of 7 kHz, and we briefly discuss wideband audio with a bandwidth of 20 kHz.

The bit rate in the digital representation of speech can vary from 2 to 128 kb/s (kilobits per second), depending on the application and on user expectations of signal quality. To describe the performance of a digital coding system, we use several parameters, such as:
- Processing delay
- Tolerance of transmission errors and of multiple stages of coding and decoding
- Ability to handle nonvoice signals, such as voiceband modem waveforms

AT&T TECHNICAL JOURNAL. SEPTEMBER/OCTOBER 1990

Panel 1. Acronyms and Terms

ADPCM: adaptive-differential pulse-code modulation
AM: amplitude modulation
APC: adaptive-predictive coder
APL: analog private line
ATM: automatic-teller machine
AUDIX: audio-information exchange
CADN: cellular access digital network
CCITT: International Telegraph and Telephone Consultative Committee
CD: compact disk
CELP: code-excited linear prediction
centrex: central exchange; a service provided by the local telephone company that permits any telephone extension within a company to call another extension within the company or dial directly to an outside line
codec: coder-decoder
CPE: customer-premises equipment
CTIA: Cellular Technology Industry Association (North America)
DAM: diagnostic acceptability measure; reflects acceptability of speech communication in a multidimensional sense
DCME: digital circuit-multiplication equipment
DDS: digital data service
DRT: diagnostic rhyme test; a measure of word intelligibility
DSI: digital speech interpolation
DSP: digital signal processor
FM: frequency modulation
FX: foreign exchange
G.711: CCITT standard for PCM at 64 kb/s
G.721: CCITT standard for ADPCM at 32 kb/s
G.722: CCITT standard for 7-kHz audio; a 64-kb/s algorithm for ISDN teleconferencing and loudspeaker telephony
G.723: CCITT standard for ADPCM at 24, 32, and 40 kb/s
G.727: draft CCITT standard for ADPCM at 16, 24, 32, and 40 kb/s
G.764: CCITT standard for packet speech transmission
GSM: Group Speciale Mobile (Europe); standards organization for digital cellular radio
HDTV: high-definition television
IACS: integrated access and cross-connect system
INMARSAT: international maritime satellite
ISDN: Integrated Services Digital Network
ISO: International Organization for Standardization
LPC: linear-predictive coding
LD-CELP: low-delay CELP
MFLOP: 10⁶ floating-point arithmetic operations
modem: modulator-demodulator
MOS: mean opinion score; used for evaluating the performance of coding algorithms
MSAT: mobile satellite
MSE: mean squared error
NSA: National Security Agency (U.S.)
PBX: private-branch exchange
PCM: pulse-code modulation
PSTN: public-switched telephone network
SELP: sum-excited linear prediction
SNR: signal-to-noise ratio
STU: secure telephone unit (STU-II or STU-III)
TASI: time-adaptive speech interpolation
vocoder: voice coder

However, the most important descriptors of coder performance are the quality of the digitized speech at a target bit rate and the way the quality diminishes with decreasing bit rate.

Measuring Speech Quality. The measurement of speech quality has been a difficult and long-standing problem. In this paper, we use a subjective rating scale of 1 to 5, wherever possible, to quantify the level of digital speech quality. This is the so-called mean opinion score, or MOS, scale,1 which is widely used for evaluating coding algorithms for digital telephony. (Panel 1 defines acronyms and terms.)

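A score of 4.0 on the MOS scale will signify high quality, or near-transparent coding. Network quality will imply high quality as a necessary condition, but not the only one. It also implies that the speech coder provides further capabilities demanded by the telecommunications network environment, such as low processing delay, tolerance of multiple stages of coding and decoding, and the ability to handle nonspeech signals in the telephone band.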

An MOS of 3.5 will denote communications quality. At this level, speech degradation is easily detectable, but not bad enough to impede natural communication.

Finally, synthetic quality will imply a signal that is characterized by an inadequate level of naturalness and speaker recognizability, although it may have high intelligibility. These deficiencies are usually reflected by an MOS that does not exceed 3.0.

[Figure 1: digital-coding standards (CCITT 1972, CCITT 1984, CCITT 1991, GSM 1988, CTIA 1989, NSA 1989, and NSA 1982) arranged by bit rate (64, 32, 16, 8, 4.8, and 2.4 kb/s), with typical applications (network, mobile radio, secure voice, voice mail) and quality ranges of MOS 4.0 to 4.5, 3.5 to 4.0, and 2.5 to 3.5 on the five-point MOS scale (5 excellent, 4 good, 3 fair, 2 poor, 1 unacceptable).]

Figure 1. Digital telephony standards, typical applications, and ranges of speech quality (which is expressed using the five-point MOS scale). The frequency range of telephone speech is 200 to 3400 Hz; hence, the speech bandwidth is 3.2 kHz. An MOS score of 4.0 signifies high quality, or near-transparent coding. The coding standards define digital coding algorithms at the particular bit rate; the 16-kb/s standard is likely to be a hybrid coding algorithm.

We will also use the five-point MOS scale in our discussion of coding algorithms for 7-kHz speech.

MOS measurements of speech quality are supplemented, especially in low-bit-rate speech technology, by scores of DRT (diagnostic rhyme test) and DAM (diagnostic acceptability measure). The DRT is a word-intelligibility measure, while the DAM reflects acceptability for speech communication in a broader, multidimensional sense.3,4
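As a rough illustration of how an MOS is formed and read against the quality grades defined above, the sketch below averages listener ratings and labels the result. The cutoffs (high at 4.0 and above, communications at 3.5 and above, synthetic at 3.0 and below) are our reading of the text, and the listener ratings and function names are hypothetical; real MOS tests also control for listener panels, test material, and reference conditions, which this ignores.

```python
def mos(ratings):
    """Mean opinion score: the average of listener ratings on the 1-to-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if r not in (1, 2, 3, 4, 5):
            raise ValueError("each rating must be an integer from 1 to 5")
    return sum(ratings) / len(ratings)

def quality_grade(score):
    """Label an MOS with the grades used in this paper (cutoffs as read from the text)."""
    if score >= 4.0:
        return "high"            # near-transparent coding
    if score >= 3.5:
        return "communications"  # degradation audible but not disruptive
    if score <= 3.0:
        return "synthetic"
    return "unclassified"        # the text names no grade between 3.0 and 3.5

panel = [5, 4, 4, 3, 4, 5, 4, 3]  # hypothetical ratings from eight listeners
score = mos(panel)
print(score, quality_grade(score))
```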

Digital Coding of Telephone Speech

Figure 1 describes the current state of telephone speech coding in terms of standards activity, bit rate, typical application, and quality of decoded speech. We assume that the frequency range of the input signal is 200 to 3400 Hz (hertz), and that the quality of the output speech is measured on the five-point MOS scale.

The current goals in speech coding include the achievement of near-transparent or transparent quality at 8 kb/s, and robust telecommunications quality at 4.8 kb/s and lower. (By robust, we mean that performance is not degraded drastically across various speech signals, various speakers, and various transmission environments.)

Figure 2 presents a more quantitative description of speech quality as a function of bit rate. The historical progression is from right to left. In the figure, the characteristics depicted as solid curves refer to generic examples of coding algorithms. The dashed curve describes a research goal that is believed to be achievable in that it does not violate fundamental limits in coding capability.

[Figure 2: speech quality on the MOS scale (5 excellent, 4 good, 3 fair, 2 poor, 1 bad) versus bit rate (64, 32, 16, 8, 4, and 2 kb/s), with curves for waveform and hybrid coders, a dashed research-goal curve, points for G.711 (64-kb/s PCM), G.721 (32-kb/s ADPCM), and G.7xy (16-kb/s coder), and a vocoder point.]

Figure 2. Quality of telephone-bandwidth speech (using the MOS scale) as a function of transmission rate. The signal bandwidth is 3.2 kHz. G.711 and G.721 are existing CCITT digital coding standards, while G.7xy is the pending CCITT standard. The research goal fits within the constraints of the fundamental limits in coding capability.

Pulse-code modulation (PCM) is the simplest coding system, a memoryless quantizer. The waveform coder curve is that of a high-complexity algorithm, such as adaptive predictive coding. Waveform coding uses redundancy-removing operations to present a signal of lower energy to the amplitude quantizer, which results in a lower bit rate for a specified level of output-speech quality.
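The redundancy-removing step described above can be illustrated with the simplest possible predictor. The sketch below is an assumption-laden toy, not adaptive predictive coding as deployed: it fits a single fixed first-order predictor to a synthetic, strongly correlated signal (a stand-in for speech) and reports the prediction gain, that is, how much lower the residual energy is than the signal energy. It then converts that gain to an approximate bit saving using the familiar rule of about 6 dB of SNR per quantizer bit. The function names are ours.

```python
import math
import random

def first_order_prediction_gain(x):
    """Fit x_hat[n] = a * x[n-1] and return (a, prediction gain).

    a = r(1) / r(0) minimizes the energy of the residual e[n] = x[n] - a * x[n-1];
    the prediction gain is signal energy divided by residual energy.
    """
    r0 = sum(v * v for v in x)
    r1 = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    a = r1 / r0
    residual_energy = sum((x[n] - a * x[n - 1]) ** 2 for n in range(1, len(x)))
    signal_energy = sum(v * v for v in x[1:])
    return a, signal_energy / residual_energy

def bits_saved_per_sample(gain):
    """Each quantizer bit is worth about 6 dB of SNR (a factor of 4 in noise power),
    so a prediction gain G lowers the bit rate by about 0.5 * log2(G) bits per sample."""
    return 0.5 * math.log2(gain)

# Hypothetical stand-in for speech: a strongly correlated first-order autoregressive signal.
random.seed(1)
x = [0.0]
for _ in range(20_000):
    x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))

a, gain = first_order_prediction_gain(x)
print(f"a = {a:.3f}, gain = {10 * math.log10(gain):.1f} dB, "
      f"saving = {bits_saved_per_sample(gain):.2f} bits/sample")
```

For this synthetic signal, theory predicts a gain of about 1/(1 - 0.9²) ≈ 5.3, or roughly 1.2 bits per sample saved at equal quality. Real coders use higher-order predictors that adapt to the changing speech spectrum.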

The vocoder point represents an algorithm that produces intelligible but synthetic-sounding speech at very low transmission rates by using a highly compact excitation-modulation model (Figure 3a). The synthetic quality in this system is accepted in applications where digital encryption and low transmission rate are of paramount importance. These could include commercial applications, such as banking, but the principal customers, by far, are government and defense agencies.

In Figure 2, the hybrid coder curve describes the performance of a class of algorithms that combine the high-quality potential of waveform coding with the compression efficiency of a model-based vocoder. Here, the idea is to use a time-varying excitation model that is much more sophisticated than that of a traditional vocoder. This model uses waveform-coding principles to compute an excitation that minimizes distortion for every frame [say, 16 ms (milliseconds)] of input speech. (See Figures 3b and 3c.) Hybrid coders for 4.8, 8, and 16 kb/s are discussed later.
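The per-frame excitation search just described (often called analysis by synthesis) can be sketched as follows. This is a simplified illustration, not the CELP algorithm of any particular standard: it assumes a fixed, known all-pole synthesis filter, zero initial filter memory, no perceptual weighting, and a hypothetical random codebook. For each candidate excitation it synthesizes speech, chooses the least-squares gain, and keeps the candidate with the smallest squared error against the target frame. The function names and filter coefficients are ours.

```python
import random

def synthesize(excitation, lpc):
    """All-pole synthesis filter: y[n] = e[n] + sum_k lpc[k] * y[n-1-k], zero initial state."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, coeff in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += coeff * y[n - 1 - k]
        y.append(acc)
    return y

def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) of the codevector whose synthesized, gain-scaled
    output is closest to the target frame in the squared-error sense."""
    best = None
    for index, vector in enumerate(codebook):
        y = synthesize(vector, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

# Hypothetical demo: 16 random codevectors of 40 samples (5 ms at 8 kHz).
random.seed(0)
codebook = [[random.gauss(0.0, 1.0) for _ in range(40)] for _ in range(16)]
lpc = [0.9, -0.2]  # hypothetical stable second-order predictor
target = [0.7 * v for v in synthesize(codebook[5], lpc)]
index, gain, error = search_codebook(target, codebook, lpc)
print(index, round(gain, 6))
```

Because the target here was built from codevector 5 scaled by 0.7, the search should recover that index and gain. An exhaustive search like this is costly, which is why fast codebook-search techniques matter in practice.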

The solid dots in Figure 2 refer to coding algorithms that provide high quality at 64, 32, and 16 kb/s. Both PCM at 64 kb/s and adaptive-differential PCM (ADPCM) at 32 kb/s are CCITT standards, as defined in Figure 1, and are called G.711 and G.721, respectively. (CCITT is the International Telegraph and Telephone Consultative Committee.) These algorithms provide network-quality coding.

Currently, the CCITT is considering the definition of a high-quality speech standard at 16 kb/s. The technique is likely to be a hybrid coding algorithm.

Figure 2 shows that our current understanding of coding has not yielded high-quality speech at bit rates below about 8 kb/s, in particular in the important neighborhood of 4 kb/s.


Figure 3. Models of speech excitation. (a) LPC vocoder and hybrid coders, whose performance is given in Figure 2. (b) Multipulse LPC. This coder uses a more sophisticated time-varying excitation model than a traditional vocoder. The excitation computed minimizes distortion for every frame (e.g., 16 ms) of input speech. (c) Codebook-excited LPC. This coder selects the best excitation vector from a codebook of possible vectors.

[Figure 3 block diagrams: in each panel an excitation drives a short-delay correlation filter (the spectral envelope) to produce speech; in (a) the excitation is chosen by a voiced/unvoiced switch, and in (b) and (c) it also passes through a long-delay correlation filter (the fine structure).]

Digital Telephony at 4 kb/s

High-quality coding at 4 kb/s is a primary focus in speech research.5-11 Speech coding at bit rates of 4 kb/s is important for:
- Enhancing secure telephony in government and military applications.12 (See the next section.)
- Providing the central capability of future band-efficient systems for digital radio; for example, cellular channels with a user bandwidth of 5 kHz. These are wireless-access channels for moving vehicles in rural, suburban, or urban environments that are served by terrestrial base stations.
- Mobile-satellite (MSAT) communication applications for providing wireless access to moving vehicles in remote areas.
- INMARSAT (International Maritime Satellite) applications with a 6.4-kb/s target for the total transmission rate (i.e., the speech-coder bit rate plus overhead for channel-error protection).
- Storage of coded speech. When coded at 4 kb/s, one hour of spoken material could be stored on a single 16-Mb (megabit) memory chip.
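The storage claim in the last item is simple arithmetic, sketched below. We assume a "16-Mb" chip holds 16 × 2^20 bits; reading it as 16 × 10^6 bits changes nothing for the 4-kb/s case (14.4 Mb fits either way) but raises the 64-kb/s PCM comparison from 14 to 15 chips. The helper names are hypothetical.

```python
import math

CHIP_BITS = 16 * 2**20  # assumption: "16-Mb" read as 16 x 2^20 bits
ONE_HOUR = 3600         # seconds

def speech_bits(rate_bps, seconds):
    """Bits needed to store `seconds` of speech coded at `rate_bps`."""
    return rate_bps * seconds

def chips_needed(rate_bps, seconds, chip_bits=CHIP_BITS):
    """Whole memory chips required to hold the coded speech."""
    return math.ceil(speech_bits(rate_bps, seconds) / chip_bits)

print(speech_bits(4_000, ONE_HOUR))    # 14400000
print(chips_needed(4_000, ONE_HOUR))   # 1
print(chips_needed(64_000, ONE_HOUR))  # 14 (64-kb/s PCM, for comparison)
```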

Table I summarizes DRT, DAM, and MOS performances of various speech coders that range from 64 to 2.4 kb/s.1,13,14 The numbers in this table are not the result of a single well-controlled experiment, but are accumulated from various independent sources. As such, the scores in the table should be used for a rough, overall indication of performance rather than for very fine comparisons and judgments.

The code-excited linear prediction (CELP) algorithms in Table I typify the hybrid coders depicted in Figure 2. Recent advances in CELP coding have produced significant improvements relative to 2.4-kb/s vocoding. This is reflected in a standardized 4.8-kb/s algorithm (Figure 1) with high levels of DRT and DAM.


Table I. DRT, DAM, and MOS Scores for Standard Speech Coders

                              Score
Coder                     DRT    DAM    MOS
64-kb/s PCM               95     73     4.3
32-kb/s ADPCM             94     68     4.1
16-kb/s LD-CELP           94*    70*    4.0
8-kb/s CELP               93†    68*    3.9
4.8-kb/s CELP             93‡    64     3.2†
2.4-kb/s LPC (vocoder)    90     53     2.5*

* estimate; † upper bound; ‡ lower bound
NOTE: See Panel 1 for definitions of acronyms.

However, the MOS quality of 4.8-kb/s speech, as already noted in Figure 2, falls well below the high-quality level of 4.0, which suggests that further research is needed at this important bit rate.

Secure Voice: A Case Study. United States government agencies have deployed secure telecommunications over the public-switched telephone network for several years. As modem technology advanced, highly effective digital encryption techniques became available for such applications. Low-bit-rate voice coding was required to take advantage of these sophisticated methods.

In the early 1980s, the Department of Defense introduced the Government Standard LPC voice-coding algorithm at 2.4 kb/s (see Figure 1).15 (LPC is linear predictive coding.) This vocoder featured a simplified source-excitation model that provided fair speech intelligibility. However, the vocoder and its speech lacked naturalness and robustness, exhibited inconsistent performance across the speaker population, and allowed little, if any, speaker recognition. (That is, the listener usually could not identify the person who was speaking.)

This vocoder technology was incorporated in a limited number of bulky and expensive secure-telephone units (called the STU-II). These units also featured a 9.6-kb/s adaptive-predictive coder (APC)16 that was capable of providing near-communications quality. However, network coverage at 9.6 kb/s was inadequate, and the performance of the waveform-type coder deteriorated dramatically when it was operated at and below 4.8 kb/s. Therefore, further research was directed toward improvements at the 2.4-kb/s rate.

This effort resulted in Federal Standard 1015,17 an enhanced version of the old Government Standard LPC. (The 2.4-kb/s LPC system in Table I is the enhanced version.) Primary changes occurred in the voicing and pitch-detection areas,18 with secondary modifications in excitation format and spectral shaping.19 These changes yielded a first-order effect on coder robustness, and marginal improvements in intelligibility, naturalness, and speaker recognition were attained. The result was a sizable net increase in the DAM from about 50 to about 55, which placed the coder at the upper end of the synthetic-quality range.

This enhanced performance was the motivating factor for the introduction of the STU-III program in the mid-1980s. The program, which was driven by the National Security Agency (NSA), provided government support for the development of the next-generation secure-telephone units (called the STU-III) and fostered purchase of the units by various government agencies. Compact and cost-effective compared to their predecessors, the STU-IIIs featured the new 2.4-kb/s standard. AT&T's participation in this program resulted in the development of the Security-Plus terminal, which has been in production for the last three years.

The introduction of CELP coders in the mid-1980s made communications quality feasible at 4.8 kb/s. At the same time, new modem technology permitted wide network coverage at this bit rate. Overall coder robustness, naturalness, and speaker recognition far exceeded those of a 2.4-kb/s system. After its introduction by the Acoustics Research Department at AT&T Bell Laboratories (a division of AT&T), CELP coding attracted the attention of numerous potential users, including the NSA. AT&T's ability to develop organizational synergies, within and outside the corporation:
- Resulted in swift transfer of technology from research to development
- Effectively identified the customer's needs
- Had a timely impact on the NSA's decisions about standardization

Continued work by AT&T Bell Laboratories focused on improving the computational and performance profile of the new CELP coder. This facilitated the coder's rapid implementation on the digital signal processors (DSPs) available at the time. Fast techniques that expedite searching through codebooks brought about a tenfold reduction in computational load. Constrained excitation11 and fractional pitch-delay tracking techniques contributed to a net DAM gain of about 10 units over Federal Standard 1015. Algorithms that address source and channel noise7,20 increased CELP's robustness for use in real-world applications.

During 1987, the NSA launched a new standardization effort toward the 4.8-kb/s rate. In early 1988, AT&T Bell Laboratories demonstrated the feasibility of the 4.8-kb/s CELP coder, using laboratory prototype hardware that reflected the computational capacity of the voice section of the Security-Plus terminal. In mid-1988, AT&T demonstrated, at the NSA, secure-call completion at 4.8 kb/s using CELP-modified Security-Plus terminals. The 4.8-kb/s standardization process, which had been accelerating, peaked by mid-1989 when the NSA issued the first predraft of Federal Standard 1016 and submitted it to the U.S. Office of Standards for approval.12 (The 4.8-kb/s coder in Table I is this version of the standard.)

Toward the end of 1989, the NSA awarded contracts to vendors for incorporating into security-terminal production a 4.8-kb/s CELP coder that was compatible with Federal Standard 1016. Compatibility requirements permitted shorter codebooks and allowed for optional features to accommodate immediate implementations that used current DSP products. AT&T's early efforts and contributions in the CELP-coding area have facilitated the inclusion of new technology into a preexisting product.

[Figure 4: speech quality plotted against channel quality.]

Figure 4. Speech quality versus channel quality in cellular telephony, based on semiquantitative estimates. In the first North American digital-cellular standard (i.e., the pending CTIA standard), speech coding is at 8 kb/s.

AT&T's new version of the Security-Plus terminal, the first of such products to feature a 4.8-kb/s CELP coder, has been in production since August 1990.

Digital Telephony at 8 kb/s

Systems for first-generation digital cellular radio use bit rates of about 8 kb/s for speech coding (Figure 1). In North America, the proposed system has a per-user channel bandwidth of 10 kHz and a total transmission rate of about 13 kb/s for speech coding and channel-error protection.7,21,22 The system will eventually replace the current practice of analog FM speech that has a 30-kHz user bandwidth. The digital system provides greater robustness to channel noise and fading, as well as better reuse of individual carrier frequencies. As a result, the improvement in call capacity (number of users) will exceed the factor of 3 implied by the changeover from 30 kHz to 10 kHz in user bandwidth. The net gain is expected to be a factor of 5 to 7.

In North America, cellular telephony is expected to be based on a CELP-coding algorithm, i.e., the pending CTIA standard. (CTIA is the Cellular Technology Industry Association.) Speech quality at 8 kb/s currently falls slightly below the high-quality threshold (MOS = 4.0). However, the communications quality that the 8-kb/s CELP coder provides is adequate for improvements over analog FM telephony, especially at low levels of radio-channel quality [say, a channel signal-to-noise ratio (SNR) below 15 to 18 dB (decibels)]. This is depicted in the impressionistic curves of Figure 4. The communication delay caused by the speech codec (coder-decoder) is expected to be about 40 ms, a value considered acceptable in the cellular application.

For digital cellular radio in Europe, the recommendation of the GSM (Group Speciale Mobile) is also a hybrid coder. It is a regular pulse-excitation algorithm with a bit rate of 13.2 kb/s (out of a total transmission rate of 22.8 kb/s) and a codec delay of 40 ms.23,24 The coding technique is similar to a 9.6-kb/s multipulse-excitation coder for the Skyphone airline application. (Skyphone is a registered service mark of British Telecommunications, PLC.)

Network-Quality Speech Coding

For ubiquitous application in networks, a speech-coding algorithm has to satisfy several performance criteria, including:
- A level of speech quality that is high enough to withstand multiple stages of coding and decoding
- A processing delay that is low enough to withstand echoes and additional delay components in the network
- The ability to handle nonspeech signals in the telephone band

PCM and Variable Bit-Rate ADPCM. Algorithms at 64 kb/s (G.711, PCM) and 32 kb/s (G.721, ADPCM) satisfy a broad class of network requirements and are international CCITT standards.25-28 These standards are in widespread use in both public and private speech telecommunications.
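The 64-kb/s rate of G.711 PCM comes from 8000 samples per second at 8 bits per sample, with the 8 bits spent on a logarithmic (companded) scale so that quiet and loud speech see similar relative error. The sketch below illustrates that idea using the continuous mu-law curve with mu = 255 followed by a uniform 8-bit quantizer. It is not bit-exact G.711, which specifies a segmented, piecewise-linear approximation (mu-law or A-law) with its own code assignments; the helper names are ours.

```python
import math

MU = 255.0           # mu-law compression parameter
SAMPLE_RATE = 8_000  # samples per second for telephone-band speech
BITS_PER_SAMPLE = 8

def mu_compress(x):
    """Continuous mu-law characteristic, mapping x in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize(y, bits=BITS_PER_SAMPLE):
    """Uniform quantizer on [-1, 1] with 2**bits cells; returns the cell midpoint."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((y + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step

def pcm_round_trip(x):
    """Compress, quantize to 8 bits, and expand one sample."""
    return mu_expand(quantize(mu_compress(x)))

print(SAMPLE_RATE * BITS_PER_SAMPLE)  # 64000 b/s
for x in (0.01, 0.1, 0.5, 0.9):
    err = abs(pcm_round_trip(x) - x) / x
    print(f"x = {x:<4}  relative error = {err:.2%}")
```

The relative error stays in the low single-digit percent range across a 90:1 span of amplitudes, which is the point of companding; a uniform 8-bit quantizer would have a relative error roughly proportional to 1/x.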

The 32-kb/s standard, G.721, is relatively recent (i.e., 1984). An important application of this codec is in digital circuit-multiplication equipment (DCME). Here, the combination of 2:1 compression (from 64 kb/s to 32 kb/s) with exploitation of the so-called 2.5:1 TASI or DSI gain (i.e., the effect of silences in speech) provides effective circuit expansions of 5:1 over traditional telephone systems. (TASI is time-adaptive speech interpolation. The silences in natural speech occur when the talker pauses to breathe or collect his or her thoughts, or stops speaking and waits for the other person to begin speaking, and when he or she is listening to the other speaker.)
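The 5:1 figure is the product of the two gains, treated as independent. A 2.5:1 interpolation gain corresponds, on average, to each talker occupying a channel about 40 percent of the time (1/2.5 = 0.4); this reading is our simplification, since the gain actually realized also depends on how many channels are pooled.

```latex
\text{expansion} = \underbrace{\frac{64\ \text{kb/s}}{32\ \text{kb/s}}}_{\text{ADPCM}} \times \underbrace{2.5}_{\text{TASI/DSI}} = 2 \times 2.5 = 5
```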

The 32-kb/s algorithm has been extended to 24 and 40 kb/s in the G.723 standard. Also, embedded ADPCM29 is a draft CCITT standard, G.727, at the 40-, 32-, 24-, and 16-kb/s rates. G.727 can be used with G.764 for wideband packet network applications, such as in AT&T's integrated access and cross-connect system (IACS).30

Lower transmission rates such as 24 and 16 kb/s, if based on ADPCM, do not provide network-quality coding. However, they permit occasional bursts of heavy telephone traffic to be accommodated without explicit coordination among all nodes in the path of the call. Adaptive algorithms can be used for postfiltering at the receiver to enhance the lower speech quality of the 24- and 16-kb/s systems. However, the use of postfiltering adversely affects the performance of a system that has multiple stages of encoding and decoding.

The higher ADPCM rate of 40 kb/s (from the G.723 standard) provides the capability for transmitting 9.6-kb/s modem waveforms. The simplicity of the G.721 algorithm and related algorithms also makes them attractive for wireless-access applications that require very low transmitter power; for example, in terminals that are within or near a building that has indoor wireless communication.32,33

The 32-kb/s ADPCM algorithm of the G.721 standard is also robust to (i.e., it can tolerate) multiple stages of encoding and decoding, and more robust than 64-kb/s PCM to digital errors in transmission. At a bit-error rate of 1 in 1000, the degradation in speech quality for 32-kb/s ADPCM is graceful. At a bit-error rate of 1 in 100, speech intelligibility is good, although the quality is poor.

[Figure 5 block diagrams: (a) forward-adaptive CELP, in which a buffer-and-analysis stage operating on 20-ms segments of input speech supplies side information to LPC and pitch synthesis filters driven by an excitation-vector quantizer codebook, and the vector-quantizer index is chosen by a minimum-MSE search after perceptual weighting; (b) backward-adaptive CELP, in which predictor adaptation and gain adaptation replace the buffer-and-analysis stage ahead of the LPC synthesis filter.]

Low-Delay Speech Coding at 16 kb/s. Currently, the CCITT is considering the definition of a low-delay network-quality speech standard at 16 kb/s (Figure 1). Possible applications include DCME, ISDN transmission, packetized speech, cordless telephones, and speech for videophone service. (ISDN is the Integrated Services Digital Network.)

Figure 5b is the block diagram of a backward-adaptive CELP coder proposed by AT&T for this CCITT standard.34,35 In this system, the only source of encoding delay is the forward adaptation of the shape of the excitation signal. This delay comes from selecting the best excitation vector from a rich codebook of possible excitation vectors, each a sample vector of length 5 (0.625 ms at the 8-kHz telephone sampling rate; at 16 kb/s, that is a budget of 10 bits per vector). Figure 5a is the more traditional, fully forward-adaptive CELP system.

Figure 5. Code-excited linear prediction. The waveform's 20-ms segments are used to perform speech analysis to provide LPC filter coefficients. (a) Conventional or fully forward-adaptive CELP; uses the 20-ms segment on the right in the waveform. (b) Backward-adaptive system for low-delay CELP, the proposed CCITT standard at 16 kb/s; uses the left 20-ms segment of the waveform. In this system, the forward adaptation of the shape of the excitation signal is the only source of encoding delay.

[Figure 6: audio bandwidth (Hz) for four grades of signal: telephone (200 to 3400 Hz), AM radio (50 to 7000 Hz), FM radio (up to 15,000 Hz), and compact disk (up to 20,000 Hz), annotated with digital-coding standards CCITT 1972 (64 kb/s), CCITT 1984 (32 kb/s), and CCITT 1991 (16 kb/s) for telephone speech, CCITT 1987 (64 kb/s) for 7-kHz audio, and ISO 1990 (128 or 96 kb/s) for 20-kHz audio.]

Figure 6. Four grades of audio-signal bandwidth and corresponding standards for digital coding. Audio is stored on today's compact disk at a bit rate of about 700 kb/s per sound channel, but the emerging ISO standard calls for a single-channel bit rate of 96 or 128 kb/s. The CCITT's 1991 standard will define low-delay, network-quality speech.

An important challenge for the proposed algorithm (Figure 5b) is to combine high quality with low processing delay. The delay requirement means that the time-varying model for the speech-spectral envelope has to be estimated in a backward-adaptive mode, using a history of already quantized speech. For forward spectral estimation, tens of milliseconds of input speech are buffered.5,6,22 In the backward-adaptive mode, the challenge then is to realize adequate spectral estimation even though quantization noise is present in the past speech samples used for backward spectral analysis.
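Backward adaptation, as described above, means the predictor is computed only from samples the decoder already has, so encoder and decoder can run the identical computation and no filter coefficients need to be transmitted. The sketch below shows one common way to do this, the autocorrelation method with the Levinson-Durbin recursion, applied to a window of past decoded samples. It is an illustration under stated assumptions, not the proposed CCITT algorithm: the predictor order, window length, rectangular window, and test signal are all hypothetical, and the rounding step only mimics quantization noise in the decoded history.

```python
import random

def autocorrelation(x, order):
    """r[k] = sum_n x[n] * x[n-k] for k = 0..order (autocorrelation method)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] of x_hat[n] = sum_k a[k] * x[n-k].

    Returns (coefficients, final prediction-error energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err

def backward_adapt(decoded, order, window):
    """Estimate the predictor from the last `window` already-decoded samples only,
    so the decoder can repeat the same computation without side information."""
    recent = decoded[-window:]
    r = autocorrelation(recent, order)
    if r[0] == 0.0:
        return [0.0] * order
    return levinson_durbin(r, order)[0]

# Hypothetical demo: a second-order autoregressive signal, coarsely rounded
# to mimic quantization noise in the decoded history.
random.seed(2)
x = [0.0, 0.0]
for _ in range(5_000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + random.gauss(0.0, 1.0))
decoded = [round(v * 8) / 8 for v in x]
coefficients = backward_adapt(decoded, order=2, window=4_000)
print([round(c, 2) for c in coefficients])  # should lie near the generating values 1.3 and -0.6
```

With the mild quantization used here the estimate stays close to the true predictor; the harder problem the text identifies is keeping the estimate adequate when the quantization noise in the decoded history is much larger, as it is at low bit rates.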

The algorithm is complex, with the codebook search as the single most demanding component. A 25-MFLOP processor with advanced memory capabilities (i.e., the AT&T WE® DSP32C processor) is available, which permits a single-chip (half-duplex) implementation of the coder. Currently, a full-duplex coder requires a two-chip implementation, but prospects for a single-chip implementation with nearly equal speech quality are good. (MFLOP stands for 10⁶ floating-point arithmetic operations. In full-duplex transmission, data is transmitted and received simultaneously. With half duplex, data can be transmitted and received, but only in one direction at a time.)

Also of interest are the possibilities of:
- Fixed-point (integer) processing, using the AT&T WE DSP16A processor
- Extending the low-delay property to lower bit rates, such as 8 kb/s, while maintaining the communications quality offered by traditional high-delay coders at those bit rates. (Digital telephony at 8 kb/s was discussed earlier.)

Applications of 16- and 8-kb/s Coders in CPE. The need for digitized, low-bit-rate voice products in customer-premises equipment (CPE) is expected to increase, as will the use of digital transmission facilities for integrated voice and data services. The use of low-bit-rate voice will also grow because of the demand for store-and-forward voice mail and for voice-security applications. Speech coders at 16 and 8 kb/s are prime candidates for CPE applications.

Intelligent T1 multiplexers. Today, several vendors are offering intelligent T1 or fractional-T1 multiplexers for large, corporate, T-carrier trunk-based networks. These networks are complete telecommunications systems that carry both voice and data traffic. For these networks, the economies of scale and the dynamic reallocation of bandwidth offer potential cost savings.

In many of these applications, users select 64, 32, 24, or 16 kb/s as the bit rate of the voice circuits, according to cost-performance tradeoffs.

When the voice signal is compressed to 16 kb/s, the T1 voice-channel capacity (which originally was 24 channels) is increased to 96 channels, and extra bandwidth is available for data and image applications. More sophisticated T1 multiplexers double the voice-channel capacity by using digital speech interpolation (DSI), which removes the silent pauses in speech. When DSI is used with 16-kb/s compressed speech, T1 multiplexers can offer 192 or more voice circuits over a T1 link, with minimal voice degradation. Soon, the use of 8-kb/s coding techniques will double the capacity of the T1 links to 384 voice circuits or more.
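The channel counts above follow from dividing the T1 payload by the coder rate. The sketch assumes the payload is the 24 original 64-kb/s channels (1,536 kb/s of the 1.544-Mb/s line rate, the remainder being framing) and models the DSI doubling as a simple factor of 2; signaling overhead and the statistics of speech activity are ignored, and the function name is hypothetical.

```python
T1_PAYLOAD_BPS = 24 * 64_000  # 1,536,000 b/s: the 24 original 64-kb/s PCM channels

def t1_voice_circuits(coder_rate_bps, dsi_gain=1):
    """Voice circuits carried on one T1 payload at a given coder rate.

    dsi_gain is a hypothetical parameter: 2 models the doubling from digital
    speech interpolation described in the text."""
    return (T1_PAYLOAD_BPS // coder_rate_bps) * dsi_gain

for rate, dsi in [(64_000, 1), (16_000, 1), (16_000, 2), (8_000, 2)]:
    print(f"{rate // 1000:>2} kb/s, DSI gain {dsi}: {t1_voice_circuits(rate, dsi)} circuits")
```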

Most corporate, private, T-carrier trunk-based networks are PBX-to-PBX connections. Therefore, a voice-coding algorithm that performs well in asynchronous tandem applications is essential. In addition, the desirability of avoiding echo cancellation suggests the use of a low-delay coding algorithm.

Compressed voice over APL and DDS circuits. Today, some CPE products multiplex voice and data over leased digital-data-service (DDS) lines or analog private lines (APLs) to smaller locations that cannot justify the capacity or cost of T1 circuits.

The DDS systems can be configured to provide 56-kb/s service for multiple 8-kb/s voice channels and 16-kb/s data channels between PBX, centrex, or FX locations. These multiplexers are especially needed in international circuits. Because such circuits are expensive, users normally like to multiplex as many voice connections as possible onto a single circuit [for example, five voice channels plus one data channel; i.e., (5 × 8 kb/s) + (1 × 16 kb/s) = 56 kb/s]. For international applications that use satellite links, the delay and echo characteristics associated with the links make a low-delay, high-quality, compressed-voice algorithm highly desirable.

Another application for low-bit-rate speech is in automatic-teller machines (ATMs). In this application, a high-speed, 19.2-kb/s APL circuit connects each ATM to a central site. Besides voice, both data and still-frame images can be multiplexed onto a single 19.2-kb/s circuit.

Store-and-forward voice mail. Other applications of high-quality, low-bit-rate speech coding at 16 kb/s (and perhaps, in the future, at 8 kb/s) are the call-answer and store-and-forward voice mail features offered in many PBXs today.

Voice messages received at 64 kb/s can be compressed to lower bit rates for efficient storage. In addition, customers can record and send messages to one or more recipients who are connected to the network of PBXs. Although there is adequate bandwidth to support 64-kb/s voice transmission within the premises, the need for compressed voice at 16 kb/s or below arises because of limited storage capacity.

AT&T's system for AUDIX Voice Mail is typical of these store-and-forward services. (AUDIX stands for audio-information exchange.)

Digital Coding of Wideband Speech and Audio

Figures 1 and 2 referred specifically to telephone-band speech. In Figure 2, the achievement of higher quality at a given bit rate implied reduced speech distortion, without any change of bandwidth. But what effect does changing the signal bandwidth have on speech quality and intelligibility?

Figure 6 defines four commonly understood grades of audio bandwidth. If the audio signal is speech instead of music, the perceived gains in quality are, perhaps, greatest when one progresses from the telephone level to the commentary, or AM-radio, level. The gains in quality are in terms of increased intelligibility, naturalness, and speaker recognition. Low-frequency enhancement (i.e., 50 to 200 Hz) contributes to increased naturalness and speaker presence, and high-frequency enhancement (i.e., 3400 to 7000 Hz) provides greater intelligibility and fricative differentiation (for example, s versus f).
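One way to compare the two enhancements is to measure them in octaves rather than hertz. The sketch below does this under one assumption: the telephone passband is taken as roughly 200 to 3400 Hz and the wideband passband as 50 to 7000 Hz. Both ranges are inferred from the enhancement ranges quoted above, not read from Figure 6.

```python
import math


def octaves(f_low_hz, f_high_hz):
    """Width of a frequency range in octaves (doublings of frequency)."""
    return math.log2(f_high_hz / f_low_hz)


# Band edges assumed from the enhancement ranges quoted in the text
print(f"low-frequency extension, 50-200 Hz:     {octaves(50, 200):.2f} octaves")
print(f"high-frequency extension, 3400-7000 Hz: {octaves(3400, 7000):.2f} octaves")
print(f"telephone band, 200-3400 Hz:            {octaves(200, 3400):.2f} octaves")
print(f"wideband, 50-7000 Hz:                   {octaves(50, 7000):.2f} octaves")
```

On this roughly logarithmic scale, the 150-Hz low-frequency extension adds about two octaves. The 3600-Hz high-frequency extension adds only about one.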

In the rest of this section, we describe high-quality compression of wideband audio and ISDN applications of digital audio. We also discuss a CCITT coding standard for 7-kHz audio and a 20-kHz audio standard that is being defined by the ISO (International Organization for Standardization).

AT&T TECHNICAL JOURNAL. SEPTEMBER/OCTOBER 1990

[Figure 7 block diagram. On the transmit side, the audio input (14 bits, 16 kHz) passes through transmit quadrature mirror filters into a higher-subband ADPCM encoder (16 kb/s) and a lower-subband ADPCM encoder (48 kb/s). It then passes through a multiplexer and a data-insertion device (auxiliary data-channel input of 0, 8, or 16 kb/s) to the 64-kb/s output. The receive side mirrors this with a data-extraction device that determines the mode, a demultiplexer, higher- and lower-subband ADPCM decoders (the lower in 3 variants), and receive quadrature mirror filters. It provides a mode indication and an auxiliary data-channel output of 0, 8, or 16 kb/s.]

Figure 7. Block diagram of a two-band subband coder for 64-kb/s coding of 7-kHz audio [37], the basis for the G.722 standard. The low- and high-frequency subbands are quantized using 6 and 2 bits per sample, respectively. The analysis and synthesis filters produce a communication delay of about 3 ms.

The naturalness of wideband speech is a significant feature for extended telecommunications processes, such as audio teleconferencing and program broadcasting. Basic-rate ISDN provides a natural framework for a 64-kb/s algorithm to encode wideband audio for such applications. [Basic-rate ISDN provides two 64-kb/s circuit-switched channels (bearer channels for the customer's voice, data, or video) and one 16-kb/s packet-switched channel (data channel for the network's information).] The digital connectivity afforded by ISDN [36] has prompted a worldwide revisiting of audio-transmission quality. In particular, end-to-end digital connectivity has made possible the inclusion of low frequencies down to 50 Hz in the transmitted audio band [37].

Coding of 7-kHz Audio. The CCITT standard for 7-kHz audio (G.722) is a 64-kb/s algorithm developed primarily for ISDN teleconferencing and loudspeaker telephony. Because of the 64-kb/s capability, a single "voice-grade" channel on a digital or analog public switched telephone network (PSTN) can transport a commentary-quality sound program over any distance and yield a broadcast-grade voice program at the receiving end.

The G.722 coding algorithm is based on a two-band subband coder, with ADPCM coding of each subband (Figure 7) [37, 38]. The low- and high-frequency subbands are quantized using 6 and 2 bits per sample, respectively. The filter banks used for analysis and synthesis produce a communication delay of about 3 ms. This low delay turns out to be desirable because G.722 is expected to be interconnected with narrowband links. On those connections, uncanceled echoes could pose a problem if compounded by codec delay. (In isolation, digital wideband links do not have two-wire/four-wire hybrids and the resulting uncanceled echoes.)
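The sketch below illustrates the two-band structure and its bit budget. A 16-kHz input is split into two subbands, each decimated to 8 kHz, so 6 and 2 bits per sample give 48 and 16 kb/s. As a labeled simplification, it uses the 2-tap Haar filter pair in place of G.722's 24-tap quadrature mirror filters. It also omits the ADPCM coding of each subband, so it demonstrates only the band split, the rate arithmetic, and perfect reconstruction.

```python
import math

INPUT_RATE_HZ = 16_000                # 7-kHz audio sampled at 16 kHz
SUBBAND_RATE_HZ = INPUT_RATE_HZ // 2  # each subband is decimated by 2
C = 1 / math.sqrt(2)


def subband_kbps(bits_per_sample):
    """Bit rate of one subband quantized at the given bits/sample."""
    return bits_per_sample * SUBBAND_RATE_HZ / 1000


def analysis(x):
    """Split an even-length signal into low and high subbands at half rate.

    Uses the 2-tap Haar pair, a stand-in for G.722's 24-tap QMF filters.
    """
    low = [C * (x[2 * n] + x[2 * n + 1]) for n in range(len(x) // 2)]
    high = [C * (x[2 * n] - x[2 * n + 1]) for n in range(len(x) // 2)]
    return low, high


def synthesis(low, high):
    """Recombine the two subbands into a full-rate signal."""
    y = []
    for lo, hi in zip(low, high):
        y.append(C * (lo + hi))
        y.append(C * (lo - hi))
    return y


print(subband_kbps(6), "+", subband_kbps(2), "=",
      subband_kbps(6) + subband_kbps(2), "kb/s")  # 48.0 + 16.0 = 64.0 kb/s
x = [math.sin(0.3 * n) for n in range(16)]
low, high = analysis(x)
err = max(abs(a - b) for a, b in zip(x, synthesis(low, high)))
print(f"reconstruction error without quantization: {err:.1e}")
```

The Haar pair separates the bands poorly. A practical codec therefore uses longer filters, such as the 24-tap filters mentioned later in this section, at the cost of a few milliseconds of delay.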

The 64-kb/s algorithm can tolerate random error rates of about 1 in 10,000 and four tandem stages of repeated encoding and decoding. The simplicity of the quantizing, predicting, and filtering (24-tap) algorithms permits a single-chip, fixed-point implementation on the DSP16A processor.

[Figure 8 plot: MOS (1 = Bad, 2 = Poor, 3 = Fair, 4 = Good, 5 = Excellent) versus bit rate (48, 56, and 64 kb/s) for G.722 64-kb/s SB-ADPCM, with 240-kb/s PCM for comparison and the research goal for 7-kHz speech.]

Figure 8. Quality of 7-kHz digital audio as a function of bit rate in the G.722 algorithm [37]. Signal bandwidth is 7 kHz. The G.722 points are for the two-band subband coder (Figure 7), which uses ADPCM coding for each subband. The PCM points are supplied for comparison and represent a 15-bit audio input sampled at 16 kHz. Again, the research goal is realistic.

ISDN applications suggest that the audio-coding algorithms be operated at slightly lower bit rates. Here, the use of an embedded coding technique for ADPCM permits operation of the low-frequency subband at one of three quantizing rates (i.e., 6, 5, or 4 bits per sample), with graceful degradation of quality. The corresponding audio rates are 64, 56, and 48 kb/s. For the 56- and 48-kb/s rates, capacities of 8 and 16 kb/s are available for simultaneous data transmission over the 64-kb/s basic-rate channel.
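Because each subband runs at 8 kHz, every bit dropped from the low-frequency quantizer frees 8 kb/s for data. The sketch below reproduces the three modes listed above. It models only the rate bookkeeping, not the embedded ADPCM quantizer itself, and its names are illustrative.

```python
SUBBAND_RATE_HZ = 8_000   # each G.722 subband runs at half of 16 kHz
HIGH_BAND_BITS = 2        # high subband always uses 2 bits/sample
CHANNEL_KBPS = 64         # one 64-kb/s basic-rate channel


def g722_mode(low_band_bits):
    """Return (audio kb/s, auxiliary data kb/s) for a low-band quantizer size."""
    audio_kbps = (low_band_bits + HIGH_BAND_BITS) * SUBBAND_RATE_HZ // 1000
    return audio_kbps, CHANNEL_KBPS - audio_kbps


for bits in (6, 5, 4):
    audio, data = g722_mode(bits)
    print(f"{bits} bits/sample -> audio {audio} kb/s, data {data} kb/s")
```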

Figure 8 shows audio quality on the MOS scale for speech and music material at rates of 64, 56, and 48 kb/s. For comparison, we also show the performance of 240-kb/s linear PCM (i.e., a 15-bit audio input that was sampled at 16 kHz). In another comparison that involves G.722, G.721, and 128-kb/s PCM (16 kHz × 8 bits per sample), the G.722 algorithm at 64 kb/s was shown to have an equivalent SNR gain of 13 dB over the G.721 algorithm [38, 39]. Of this SNR gain, 6 dB can be attributed to increased input bandwidth.
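Expressed as power ratios, the decibel figures above decompose as follows. A 13-dB gain is roughly a twentyfold improvement in signal-to-noise power ratio. The 7-dB term is simply the difference between the two quoted numbers; the text does not assign it to a specific cause.

$$
\begin{aligned}
\Delta\mathrm{SNR} &= 13\ \mathrm{dB} = \underbrace{6\ \mathrm{dB}}_{\text{wider input band}} + \underbrace{7\ \mathrm{dB}}_{\text{remainder}},\\
10^{13/10} &\approx 20 \approx 10^{6/10} \times 10^{7/10} \approx 4 \times 5 .
\end{aligned}
$$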

Figure 8 also shows the current research goal for the coding of 7-kHz audio, a goal that is believed to be realistic. One implication of this goal is the possibility of coding 7-kHz audio at 32 kb/s with high quality (MOS = 4.0). This will permit the transmission of two bilingual or stereo wideband channels at 64 kb/s. For stereo, the use of cross-channel correlations can provide a further increase of capability. For example, a bandwidth greater than 7 kHz could be accommodated in the 64-kb/s system.

There are at least two approaches to the problem of high-quality coding of audio at 32 kb/s:

- A linear-prediction approach, exemplified by CELP
- A frequency-domain technique of transform or subband coding

[Figure 9 plot: threshold of just-noticeable distortion versus frequency (kHz) for an illustrative audio signal.]

Figure 9. Threshold of just-noticeable distortion as a function of frequency for an illustrative audio signal (a trumpet) [40]. Research on perception will play an increasing role by enhancing our understanding of how to mask noise, especially in the time domain.

For both, the attainment of high audio quality will depend on the use of perceptual tuning of the algorithm to provide effective shaping of the quantization noise. The CELP technique offers the additional possibility of low-delay coding through backward adaptation, as illustrated in Figure 5b.
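In CELP coders, one widely used form of this noise shaping is a perceptual weighting filter W(z) = A(z)/A(z/γ). Here A(z) is the LPC inverse filter and γ < 1 widens the formant bandwidths of the denominator. Minimizing the weighted error lets more quantization noise sit under the spectral peaks, where it is better masked. The article does not specify this filter: the form, the function name, and the value γ = 0.8 are assumptions for illustration. The sketch only filters a signal and does not perform a codebook search.

```python
def perceptual_weighting(x, a, gamma=0.8):
    """Filter x through W(z) = A(z) / A(z/gamma) (direct-form recursion).

    A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p is the LPC inverse filter.
    A(z/gamma) scales the k-th coefficient by gamma**k.
    """
    p = len(a)
    den = [a[k] * gamma ** (k + 1) for k in range(p)]  # coefficients of A(z/gamma)
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(p):
            if n - k - 1 >= 0:
                # numerator taps act on past inputs, denominator taps on past outputs
                acc += a[k] * x[n - k - 1] - den[k] * y[n - k - 1]
        y.append(acc)
    return y


# Impulse response for a hypothetical first-order predictor
print([round(v, 4) for v in perceptual_weighting([1.0, 0.0, 0.0, 0.0], [-0.9])])
```

With γ = 1, the numerator and denominator cancel and the filter passes the signal unchanged. Lowering γ strengthens the noise shaping.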

Coding of 20-kHz Audio. Although a bandwidth of 7 kHz provides very natural reproduction of speech, 20 kHz is a well-accepted bandwidth standard for more general classes of audio, including vocal and instrumental music.

The ISO is committed to the standardization (in the 1990 to 1991 time frame) of a low-bit-rate coding algorithm for 20-kHz audio. Applications for low-bit-rate wideband speech include electronic publishing, travel and guidance, teleteaching, multilocation games, multimedia memoranda, and database storage. Another major application for 20-kHz digital audio is in advanced television systems, such as high-definition television (HDTV).

On current compact disks (CDs), audio is stored at a bit rate of about 700 kb/s per sound channel (i.e., 16-bit PCM coding of 44.1-kHz sampled signals). However, the emerging ISO standard calls for a single-channel bit rate of 96 or 128 kb/s. Production of high-quality audio at these very low rates calls for a new generation of coding algorithms. These algorithms will achieve coding gains by removing signal redundancy. But these gains must be augmented by the liberties permitted by the human auditory process, as predicted by sophisticated models of just-noticeable distortion (Figures 9 and 10) [40, 41].
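The PCM rates quoted in this section all follow from sample rate × bits per sample. The sketch below reproduces them and computes the compression ratio that the proposed 128- and 96-kb/s rates imply relative to CD audio. The helper name is illustrative.

```python
def pcm_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of linear PCM in kb/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


cd_kbps = pcm_kbps(44_100, 16)                                   # CD audio, one channel
print(f"CD, per channel: {cd_kbps:.1f} kb/s")                    # 705.6
print(f"7-kHz reference PCM: {pcm_kbps(16_000, 15):.0f} kb/s")   # 240 (Figure 8)
print(f"8-bit PCM at 16 kHz: {pcm_kbps(16_000, 8):.0f} kb/s")    # 128
for target in (128, 96):
    print(f"{target} kb/s -> compression ratio {cd_kbps / target:.2f}")
```

The ISO targets thus require roughly 5.5:1 to 7.4:1 compression of CD-quality audio. That is far more than redundancy removal alone typically delivers, which is why the text stresses perceptual models.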

Future Trends

Sophisticated algorithms for coding will lead to transmission techniques that do not permit quantization noise to limit speech quality. In addition, the notion of enhancing speech quality by using greater input bandwidth will become more pervasive. Coding systems in the 8- to 64-kb/s range will thus provide graceful flexibility in terms of selected bandwidth and special features, such as stereo separation in teleconferencing. Advances in coding will be supported by new technologies for wideband transducers, noise-canceling systems for audio pickup, and autodirective microphone arrays [42, 43].

[Figure 10 block diagram: the left and right signals each pass through frequency analysis (analysis filter bank), quantization, noiseless coding, and bit-stream control to the channel. A dashed path marks optional stereo coding.]

As coding algorithms become increasingly efficient and approach fundamental capabilities, research on perception will play an increasing role by enhancing our understanding of noise masking, especially in the time domain.

Advances in signal-processor technology will continue to support increasingly complex algorithms for coding and decoding. The synergistic working of coding theory, perception science, and signal processing will bring sophisticated speech technology to the human listener in affordable forms.

Figure 10. Perceptual coding of wideband audio; block diagram of a perceptual frequency-domain coder [41]. The dashed lines identify an option for using left-right channel correlations to increase efficiency in stereo coding. For high-quality audio at low bit rates, the liberties permitted by human perception must augment the coding gains achieved from removing signal redundancy.

Acknowledgment

We thank the following colleagues for reviewing an earlier version of this paper: B. S. Atal, K. H. Brandenburg, J.-H. Chen, R. V. Cox, J. D. Johnston, W. B. Kleijn, D. J. Krasinski, P. Noll, M. H. Sherif, and Y. Shoham.

References

1. W. R. Daumer, "Subjective Evaluation of Several Efficient Speech Coders," IEEE Transactions on Communications, Vol. 30, No. 4, April 1982, pp. 655-662.

2. N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice Hall, Englewood Cliffs, New Jersey, 1984.

3. W. D. Voiers, "Diagnostic Evaluation of Speech Intelligibility," Speech Intelligibility and Speaker Recognition, M. E. Hawley (ed.), Dowden Hutchinson Ross, Stroudsburg, Pennsylvania, 1977.

4. W. D. Voiers, "Diagnostic Acceptability Measure for Speech Communication Systems," ICASSP '77, IEEE International Conference on Acoustics, Speech, and Signal Processing, Hartford, Connecticut, May 9 to 11, 1977, IEEE, New York, May 1977, pp. 204-207.

5. B. S. Atal and M. R. Schroeder, "Stochastic Coding of Speech at Very Low Bit Rates," Links for the Future: Science, Systems and Services for Communications, Proceedings of the International Conference on Communications, Amsterdam, The Netherlands, P. Dewilde and C. A. May (eds.), North-Holland, New York, May 1984, pp. 1610-1613.

6. B. S. Atal, "High-quality speech at low bit rates: Multi-pulse and stochastically excited linear predictive coders," ICASSP '86, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Tokyo, Japan, April 7 to 11, 1986, Vol. III, IEEE, New York, April 1986, pp. 1681-1684.

7. R. V. Cox, W. B. Kleijn, and P. Kroon, "Robust CELP Coders for Noisy Backgrounds and Noisy Channels," ICASSP '89, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Glasgow, Scotland, May 23 to 26, 1989, Vol. II, IEEE, New York, May 1989, pp. 739-742.

8. N. S. Jayant and J.-H. Chen, "Speech Coding with Time-Varying Bit Allocations to Excitation and LPC Parameters," ICASSP '89, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Glasgow, Scotland, May 23 to 26, 1989, Vol. I, IEEE, New York, May 1989, pp. 65-68.

9. W. B. Kleijn, D. J. Krasinski, and R. H. Ketchum, "Improved Speech Quality and Efficient Vector Quantization in SELP," ICASSP '88, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, New York, April 11 to 14, 1988, Vol. I, IEEE, New York, April 1988, pp. 155-158.

10. P. Kroon and B. S. Atal, "Pitch Predictors with High Temporal Resolution," ICASSP '90, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Albuquerque, New Mexico, April 3 to 6, 1990, Vol. II, IEEE, New York, April 1990, pp. 661-664.

11. Y. Shoham, "Constrained Excitation CELP Coding," IEEE Workshop on Speech Coding for Telecommunications, Vancouver, British Columbia, Canada, September 1989, p. 65.

12. Telecommunications: Analog to Digital Conversion of Radio Voice by 4800 bit/sec. Code Excited Linear Prediction (CELP), FED-STD-1016, Second Draft, Office of Technology and Standards, National Communications System, Washington, D.C., November 1989.

13. D. P. Kemp, R. A. Sueda, and T. E. Tremain, "An Evaluation of 4800 bps Voice Coders," ICASSP '89, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Glasgow, Scotland, May 23 to 26, 1989, Vol. I, IEEE, New York, May 1989, pp. 200-203.

14. V. C. Welch, T. E. Tremain, and J. P. Campbell, "A Comparison of U.S. Government Standard Voice Coders," MILCOM 89, Bridging the Gap: Interoperability, Survivability, Security, Conference Record, IEEE Military Communications Conference, Boston, Massachusetts, October 15 to 18, 1989, Vol. 1, IEEE, New York, October 1989, pp. 269-273.

15. T. E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10," Speech Technology, Vol. 1, No. 2, April 1982, pp. 40-49.

16. B. S. Atal and M. R. Schroeder, "Adaptive Predictive Coding of Speech Signals," The Bell System Technical Journal, Vol. 49, No. 8, October 1970, pp. 1973-1986.

17. Telecommunications: Analog to Digital Conversion of Voice by 2,400 Bit/sec. Linear Predictive Coding, FED-STD-1015, Office of Technology and Standards, National Communications System, Washington, D.C., March 1983.

18. G. S. Kang and S. S. Everett, "Improvement of the Excitation Source in the Narrow-Band Linear Prediction Vocoder," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-33, No. 2, April 1985, pp. 317-386.

19. G. S. Kang and S. S. Everett, "Improvement of the Narrowband Linear Predictive Coder, Part 2: Synthesis Improvements," Report 8799, Naval Research Laboratory, Washington, D.C., June 1984.

20. J. R. B. De Marca, N. Farvardin, N. S. Jayant, and Y. Shoham, "Robust Vector Quantization for Noisy Channels," Proceedings of the M-SAT Conference, Jet Propulsion Laboratories, Pasadena, California, May 1988, pp. 515-520.

21. E. S. K. Chien, D. J. Goodman, and J. E. Russell, "Cellular Access Digital Network (CADN): Wireless Access to Networks of the Future," IEEE Communications Magazine, Vol. 25, No. 6, June 1987, pp. 22-27.

22. I. A. Gerson and M. A. Jasiuk, "Vector Sum Excited Linear Prediction (VSELP)," IEEE Workshop on Speech Coding for Telecommunications, Vancouver, British Columbia, Canada, September 1989, pp. 66-69.

23. P. Kroon, E. F. Deprettere, and R. J. Sluyter, "Regular-Pulse Excitation: A Novel Approach to Effective and Efficient Multipulse Coding of Speech," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-34, No. 5, October 1986, pp. 1054-1063.

24. P. Vary, K. Hellwig, R. Hofmann, R. J. Sluyter, C. Galand, and M. Rosso, "Speech Codec for the European Mobile Radio System," ICASSP '88, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, New York, April 11 to 14, 1988, Vol. I, IEEE, New York, April 1988, pp. 227-230.

25. M. Taka and X. Maitre, "CCITT Standardizing Activities on Speech Coding," ICASSP '86, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Tokyo, Japan, April 7 to 11, 1986, Vol. II, IEEE, New York, April 1986, pp. 817-820.

26. "G.721: 32 kbits/s Adaptive Differential Pulse Code Modulation (ADPCM)," Report R57, Part II, CCITT Study Group XVIII, IXth Plenary Assembly, Melbourne, Australia, 1988.

27. "Recommendation G.727: 5, 4, 3, 2 bit per Sample Embedded ADPCM," CCITT Study Group XV, 1990.

28. "Recommendation G.764: Packet Voice," CCITT Study Group XVIII, 1990.

29. M. H. Sherif, D. O. Bowker, G. Bertocci, B. A. Orford, and G. A. Mariano, "Overview of CCITT/ANSI Embedded ADPCM Algorithms," ICC '90, IEEE International Conference on Communications, Atlanta, Georgia, April 16 to 19, 1990, IEEE Communications Society, New York, April 1990, pp. 1014-1018.

30. M. K. Verma, D. Prezas, T. L. Russell, M. H. Sherif, and R. Thorkildsen, "Novel Applications of Speech Processing in AT&T Network Systems Products," AT&T Technical Journal, Vol. 69, No. 5, September/October 1990, pp. 77-86.

31. V. Ramamoorthy, N. S. Jayant, R. V. Cox, and M. M. Sondhi, "Enhancement of ADPCM Speech Coding with Backward-Adaptive Algorithms for Postfiltering and Noise Feedback," IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988, pp. 364-382.

32. D. C. Cox, "Portable Digital Radio Communications: An Approach to Tetherless Access," IEEE Communications Magazine, Vol. 27, No. 7, July 1989, pp. 30-40.

33. R. Steele, "The Cellular Environment of Lightweight Handheld Portables," IEEE Communications Magazine, Vol. 27, No. 7, July 1989, pp. 20-29.

34. J.-H. Chen, "A Robust Low Delay CELP Speech Coder at 16 kbps," Communications Technology for the 1990s and Beyond, Globecom '89, 8th IEEE Global Telecommunications Conference, Dallas, Texas, November 27 to 30, 1989, IEEE, New York, 1989, pp. 3411-3415.

35. J.-H. Chen, "High Quality 16 kbps speech coding with a one-way delay less than 2 ms," ICASSP '90, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Albuquerque, New Mexico, April 3 to 6, 1990, Vol. I, IEEE, New York, April 1990, pp. 453-456.

36. T. Irmer, "An Idea Turns Into a Reality: CCITT Activities on the Way to ISDN," IEEE Journal on Selected Areas in Communications, May 1986, pp. 316-319.

37. P. Mermelstein, "G.722, A New CCITT Coding Standard for Digital Transmission of Wideband Audio Signals," IEEE Communications Magazine, Vol. 26, No. 1, January 1988, pp. 8-15.

38. "G.722: 7 kHz Audio Coding Within 64 kbits/s," Report R57, Part II, CCITT Study Group XVIII, IXth Plenary Assembly, Melbourne, Australia, 1988.

39. G. Modena, A. Coleman, P. Usai, and P. Coverdale, "Subjective performance evaluation of the 7 kHz Audio Coder," CSELT Technical Report (Centro Studi e Laboratori Telecomunicazioni, Turin, Italy), Vol. 15, No. 2, March 1987, pp. 171-176.

40. J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE Journal on Selected Areas in Communications, February 1988, pp. 314-323.

41. J. D. Johnston and K. H. Brandenburg, "Sound Coding Algorithm," MPEG-891-148, Report of ISO-IEC/JTC1/SC2/WG8 committee meeting, Stockholm, Sweden, June 1989.

42. J. L. Flanagan, J. D. Johnston, R. Zahn, and G. Elko, "Computer-steered microphone arrays for sound transduction in large rooms," Journal of the Acoustical Society of America, Vol. 78, No. 5, November 1985, pp. 1508-1518.

43. M. M. Sondhi and G. W. Elko, "Adaptive Optimization of Microphone Arrays under a Nonlinear Constraint," ICASSP '86, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Tokyo, Japan, April 7 to 11, 1986, Vol. II, IEEE, New York, April 1986, pp. 19.9.1-19.9.4.

Biographies (continued)

Mr. Lawrence has a B.Sc. in electrical engineering from London University (England), a D.I.C. from Imperial College of Science and Technology (London, England), and a Ph.D. in electrical engineering from London University. Mr. Prezas was responsible for development of speech coding and automatic speech recognition technologies for intelligent network applications and secure voice equipment. He joined the company in 1979 and had a B.S. in physics from the University of Athens, Greece, and an M.S. and Ph.D. in electrical engineering from the Illinois Institute of Technology in Chicago.

(Manuscript received June 14, 1990)
