Estimation and Evaluation of Reduced Length Equalization Filters for Binaural Sound Reproduction


Esben Theill Christiansen, Jakob Sandholt Klemmensen, Michael Mørkeberg Løngaa, Daniel Klokmose Nielsen, Christian Have Pedersen, Andreas Popp, and Søren Birk Sørensen

Group 741, Aalborg University, 2004

Abstract: The objective of this study was to estimate two equalization filters that flatten the frequency response of the reproduction chain used for binaural sound reproduction. These filters should be of a lower order than the reciprocal of the headPhone Transfer Function (PTF), which is the optimum filter. The PTFs were obtained from measurements on two headphones, the Beyerdynamic DT990 Pro (DT990) and the Monacor MD-300 (MD300). These measurements were performed using the Maximum Length Sequence System Analyzer. Three different model structures and four different estimation methods were used to estimate the parameters of the equalization filters. The chosen models were ARX, ARMAX and OE; the parameter estimation methods chosen were PEM-LS, PEM-WLS, PEM-RLS and Steiglitz-McBride. The estimated filters for the DT990 were further evaluated by conducting a 3 Alternatives Forced Choice listening test. The four estimated equalization filters had orders that were significantly lower than the optimum equalization filter. For the DT990 the order was reduced to 24 and 40; for the MD300 the order was reduced to 35 and 48. Both filters for the DT990 were found using the ARMAX model and the PEM-RLS estimation method. The filters for the MD300 were found using the OE model and the Steiglitz-McBride estimation method. The listening test conducted showed no audible difference.

Index Terms: Headphone Transfer Function, Headphone Equalization, Binaural Sound Reproduction.

I. INTRODUCTION

The use of 3D sound technology is gaining ground both on the consumer market and in industry. Recently, the gaming industry has started to implement 3D sound via headphones in its newest computer games. This makes it possible to locate events not only by their visual appearance but also by the sense of hearing.

The main idea of 3D sound is to reproduce sound with respect to spatial features, which are information about the direction of sound created by the direct sound and its reflections from the surrounding environment. The human brain extracts the spatial features by using several cues, e.g. coloration of sound and differences between the two ears. These differences can occur in time, phase, and amplitude [1].

In order to reproduce the spatial features, recordings must be performed with two microphones placed in the ears of a person or on an artificial head. Reproducing these sounds correctly at the eardrum of a listener is referred to as Binaural Technique.

Optimum reproduction of spatial features is established with a flat frequency response, measured for the whole transmission chain. The transmission chain is depicted in figure 1 and consists of the two recording microphones, an amplifier, an equalizer, and the headPhone Transfer Function (PTF). The PTF is the transfer function measured from the terminals on the headphone to the entrance of the blocked ear canal. This transfer function (TF) is primarily influenced by the anatomical shape of the listener's ear and the electro-acoustical properties of the headphone.

[Figure 1: block diagram with the blocks Amplifier, Equalizer, and PTF.]

Fig. 1. The transmission chain in a binaural reproduction system. The reproduction system consists of the amplifier, the equalizer, and the PTF and is also referred to as the reproduction chain.

The equalizer is intended to compensate for the influences of the transmission chain, so that a flat frequency response is obtained. This is accomplished by determining the TF for the transmission chain. The TF is then inverted, and the result is referred to in this study as the optimum equalization filter. In the transmission chain, it can be assumed that the only variable part is the PTF. The microphones are only used during the recording and will therefore be the same every time. The amplifier is not necessarily the same during recording and playback but is assumed to be flat in the audible frequency range.

In addition, Møller et al. [2] state that individual equalization for each test person gives the best performance of binaural reproduction. However, they note that equalization based on an average of the PTF for each headphone may give acceptable results. Toft et al. [3] support this approach by confirming that it is possible to achieve good results with an average equalization of each headphone model. Toft et al. also conclude that it is not possible to create an applicable general equalization filter that covers different headphone models. The optimum equalizing filter in this study is therefore based on the average PTF of a headphone model.

Equalization filters can be implemented between e.g. an amplifier and the headphone. Such a filter ensures a flat frequency response for the reproduction chain. This knowledge can be used in general synthesis of binaural signals without prior information about the audio playback equipment used. Equalization filters can be implemented digitally, and because of computational limitations it is desirable to keep the filter order as low as possible.

Reduction of filter order (i.e. length of digital filters) is investigated by Nielsen et al. [4], and it is concluded that it is possible to design lower order filters with satisfactory performance. However, it is not investigated whether there is an audible difference between the optimum equalization filter and a lower order implementable filter.

The purpose of this study is to investigate if there is an audible difference between the optimum equalizing filter of the PTF and two lower order filters. These two filters are chosen among several candidates found by parametric estimation methods with regard to minimizing the order of the filters. These filters will be experimentally evaluated by conducting listening tests on third party subjects.

II. METHODS

This section describes the steps followed throughout the study, starting with the measurements of the PTFs, followed by the preprocessing of the data. Subsequently, a model is inferred and the parameters are estimated. Finally the listening test is described. This test verifies whether audible differences are present.

A. Acquisition of data

Preliminary measurements were carried out on 5 group members (aged 23-28). The PTFs were not directly measured but were derived by preprocessing the measured impulse responses. The impulse responses were measured at the blocked ear canal by using the Maximum-Length Sequence System Analyser (MLSSA), which for this purpose had the following settings:

• Sampling frequency: fs = 48 kHz
• Antialiasing filter: 8th order Chebyshev, fc = 20 kHz
• Sequence length: 4095 samples
• 16 x concurrent preaveraging

The selected preaveraging improved the signal-to-noise ratio by 12 dB [5]. Measuring impulse responses with the MLSSA system required additional equipment; the setup is shown in figure 2. The measurement setup was placed so that a minimum distance of 1 m from the headphone to the floor, ceiling, table etc. was assured. The minimum distance implied that reflections from the surroundings did not occur before 6 ms had passed.
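The two figures quoted above can be checked with a short calculation. This is a minimal sketch under two assumptions that are not stated in the text: the noise is uncorrelated between the averaged periods, and the speed of sound is 343 m/s. The function names are illustrative only.

```python
import math

def averaging_snr_gain_db(num_averages: int) -> float:
    """SNR gain from averaging, assuming noise that is uncorrelated between periods."""
    return 10.0 * math.log10(num_averages)

def first_reflection_delay_ms(min_distance_m: float,
                              speed_of_sound_m_s: float = 343.0) -> float:
    """Extra travel time of a reflection that goes to a surface and back."""
    return 2.0 * min_distance_m / speed_of_sound_m_s * 1000.0

snr_gain = averaging_snr_gain_db(16)            # 16 x preaveraging
reflection_delay = first_reflection_delay_ms(1.0)  # 1 m minimum distance
print(f"SNR gain: {snr_gain:.2f} dB")
print(f"First reflection after about {reflection_delay:.2f} ms")
```

Under these assumptions the gain is about 12 dB and the first reflection arrives after roughly 5.8 ms, which is consistent with the rounded 6 ms stated above.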

The measurements took place in room B4-107 at the Department of Acoustics at Aalborg University. The microphone used for the measurements was a Sennheiser KE-4-211-2 electret microphone (assumed flat in the interval 100 Hz - 10 kHz [2]). This microphone was mounted in an earplug from EAR and placed in line with the entrance of the ear canal of the subject. Two different headphones were used for the measurements:

1) Beyerdynamic DT990 Professional (DT990) with a frequency range from 5 Hz - 35 kHz, reported by the manufacturer.

2) Monacor MD-300 (MD300) with a frequency range from 20 Hz - 18 kHz, reported by the manufacturer.

The headphones were put in place by the test person. Measurements were repeated three times, and the headphones were taken off and put back on between measurements. Both channels of the headphone were measured, but with the DT990 the microphone was only placed in the left ear. The measurement on the right side was then carried out by putting the right cup on the left ear. This simplification of the measurements was made because it was assumed that the ears were totally symmetrical. Putting the right cup on the left ear was not possible with the MD300; therefore, the microphone was placed in the right ear for measurements of the right channel of the MD300.

[Figure 2: block diagram of the measurement setup. Labelled blocks: MLSSA system; 48 kHz clock generator; Behringer HA4400 headphone amplifier; headphones (Monacor MD-300, Beyerdynamic DT990); Sennheiser KE-4-211-2 microphone; microphone power supply and preamplifier; Bruel & Kjaer measuring amplifier Type 2636.]

Fig. 2. Measurement setup for measurements of the impulse response from the input terminals of the headphone to the blocked ear canal. The microphone was mounted in an earplug and placed in the ear of the test subject.

After the measurements were performed, it was necessary to preprocess the data to find the averaged PTF.

B. Preprocessing

Preprocessing was accomplished in several steps in order to find the optimum equalization filter.

Before transforming the impulse responses into the frequency domain, the responses were truncated to 256 samples, corresponding to the first 5.33 ms of the impulse response. This truncation gave a frequency resolution of 187.5 Hz and could be performed under the assumption that the remaining part of the impulse response was due to reflections from the surroundings.

The truncated response was transformed into the frequency domain by a 2048-point Fast Fourier Transform and formed the PTF. This frequency spectrum contained components from the MLSSA system and the microphone amplifier. These components were therefore removed from the PTF by finding the TF for the MLSSA system and the amplifier, and subsequently dividing the PTF by this TF. The microphone was assumed flat; hence the PTF was only divided by the sensitivity of the microphone.
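The relation between the truncation length, the frequency resolution, and the longer FFT can be sketched as follows. The helper names and the placeholder impulse response are hypothetical; only the sampling frequency, the 256-sample truncation, and the 2048-point transform length come from the text.

```python
SAMPLE_RATE_HZ = 48_000
TRUNCATED_LENGTH = 256
FFT_LENGTH = 2048

def window_duration_ms(num_samples: int, sample_rate_hz: float) -> float:
    """Duration covered by num_samples at the given sampling frequency."""
    return 1000.0 * num_samples / sample_rate_hz

def frequency_spacing_hz(num_points: int, sample_rate_hz: float) -> float:
    """Spacing between DFT bins for a transform of num_points points."""
    return sample_rate_hz / num_points

def truncate_and_zero_pad(impulse_response, keep: int, fft_length: int):
    """Keep the first `keep` samples and pad with zeros up to fft_length."""
    kept = list(impulse_response[:keep])
    return kept + [0.0] * (fft_length - len(kept))

# Placeholder data standing in for a measured 4095-sample impulse response.
measured = [1.0] + [0.0] * 4094
padded = truncate_and_zero_pad(measured, TRUNCATED_LENGTH, FFT_LENGTH)

duration_ms = window_duration_ms(TRUNCATED_LENGTH, SAMPLE_RATE_HZ)
resolution_hz = frequency_spacing_hz(TRUNCATED_LENGTH, SAMPLE_RATE_HZ)
bin_spacing_hz = frequency_spacing_hz(FFT_LENGTH, SAMPLE_RATE_HZ)
```

Zero padding to 2048 points gives a finer bin spacing (about 23.4 Hz) by interpolating the spectrum, but the resolution set by the 256 retained samples stays at 187.5 Hz.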

Averaging of the PTFs was performed over both channels on a sound level basis [2]. Averaging over both channels is valid due to the symmetrical properties of the ear and the headphone.

TABLE I

MODEL STRUCTURES AND THEIR CORRESPONDING POLYNOMIALS. THE RIGHTMOST COLUMN LISTS THE PARAMETRIC ESTIMATION METHODS EVALUATED FOR EACH MODEL.

| Structure | Description | A(q) | B(q) | C(q) | F(q) | Estimation method(s) |
|-----------|-------------|------|------|------|------|----------------------|
| ARX | AutoRegressive with eXogenous input | ≠ 1 | ≠ 0 | 1 | 1 | PEM-LS, PEM-WLS |
| ARMAX | AutoRegressive Moving Average with eXogenous input | ≠ 1 | ≠ 0 | ≠ 1 | 1 | PEM-RLS |
| OE | Output Error | 1 | ≠ 0 | 1 | ≠ 1 | STMCB, PEM-RLS |

The optimum equalization filter was found by inverting the averaged PTF. Before inverting the PTF it was separated into a minimum phase part and an all-pass part. The all-pass part was excluded as proposed by Minnaar [6]. The TF of the equalization filter was then found by inverting the minimum phase part of the averaged PTF, which assured that the equalization filter was stable [7].
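Why inverting only the minimum phase part gives a stable filter can be illustrated with a toy example. The two first-order filters below are hypothetical and are not taken from the measurements; they have the same magnitude response, but only the minimum phase one has a decaying inverse.

```python
def inverse_fir_impulse_response(h, num_samples):
    """Impulse response g of 1/H(z) for an FIR filter h.

    Uses the recursion g[n] = (delta[n] - sum_{k>=1} h[k] * g[n-k]) / h[0].
    """
    g = []
    for n in range(num_samples):
        acc = 1.0 if n == 0 else 0.0
        for k in range(1, min(n, len(h) - 1) + 1):
            acc -= h[k] * g[n - k]
        g.append(acc / h[0])
    return g

minimum_phase = [1.0, 0.5]   # zero at z = -0.5, inside the unit circle
maximum_phase = [0.5, 1.0]   # zero at z = -2, same magnitude response

g_min = inverse_fir_impulse_response(minimum_phase, 40)
g_max = inverse_fir_impulse_response(maximum_phase, 40)

print(abs(g_min[-1]))  # decays towards zero: the inverse is stable
print(abs(g_max[-1]))  # grows without bound: the causal inverse is unstable
```

The zero inside the unit circle becomes a pole inside the unit circle when the filter is inverted, so the inverse decays; the zero outside the unit circle becomes an unstable pole.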

The PTF is naturally damped at low and high frequencies; thus the equalization filter will have the opposite effect and amplify signals at these frequencies. The equalization filter was therefore bandwidth limited with a Butterworth bandpass filter in order to protect the headphone.

C. Parametric models

The obtained optimum equalization filter will be of high order. The primary aim of this study was to estimate reduced length versions of this filter. A model based approach was chosen for this purpose. Thus the optimum equalization filter was fitted to a set of parametric models using several parameter estimation methods. The general model structure is [8]:

$$A(q)\,y(t) = \frac{B(q)}{F(q)}\,u(t) + C(q)\,e(t), \tag{1}$$

where y(t) is the output, u(t) is the input, and e(t) is the noise in the system represented as a zero-mean Gaussian process. A(q), B(q), C(q), and F(q) are all polynomials of q, where q is a time shift operator. These polynomials are

$$
\begin{aligned}
A(q) &= 1 + a_1 q^{-1} + \dots + a_{n_a} q^{-n_a} \\
B(q) &= 0 + b_1 q^{-1} + \dots + b_{n_b} q^{-n_b} \\
C(q) &= 1 + c_1 q^{-1} + \dots + c_{n_c} q^{-n_c} \\
F(q) &= 1 + f_1 q^{-1} + \dots + f_{n_f} q^{-n_f},
\end{aligned}
$$

where n_a, n_b, n_c, and n_f are the orders of A(q), B(q), C(q), and F(q) respectively.

From the general model structure three combinations were chosen; table I lists these combinations. The ARX model was chosen because it is linear in its parameters, so the estimate can be obtained by simple linear regression. The ARX model is described as

$$A(q)\,y(t) = B(q)\,u(t) + e(t). \tag{2}$$

The ARMAX model was selected for its additional modeling of the noise,

$$A(q)\,y(t) = B(q)\,u(t) + C(q)\,e(t). \tag{3}$$

Finally the OE model was chosen. It models the dynamics of the system separately, without spending parameters on a noise model,

$$y(t) = \frac{B(q)}{F(q)}\,u(t) + e(t). \tag{4}$$

The ARMAX and OE models are nonlinear in the polynomial coefficients, which gives a more complicated nonlinear regression. However, the ability to handle fluctuations in the frequency response is improved.

D. Parameter Estimation

The polynomials of the selected models were estimated by the following set of estimation methods [8]:

• Prediction Error Method (PEM)-Least Squares (PEM-LS)
• PEM-Weighted Least Squares (PEM-WLS)
• PEM-Recursive Least Squares (PEM-RLS)
• Steiglitz-McBride Method (STMCB)

The PEM was the general principle in the estimation of all models. PEM consists of an estimation by minimizing the sum of squared prediction errors, denoted as the performance function:

$$V_N(\theta, Z^N) = \frac{1}{2N} \sum_{t=0}^{N-1} \bigl(y(t) - \hat{y}(t, \theta)\bigr)^2, \tag{5}$$

where θ is a parameter vector that comprises the polynomial coefficients, ŷ(t, θ) is the one-step predictor, and N is the sample size. Finding the parameters by minimization of (5) with respect to θ is the principle of the Prediction Error Method,

$$\theta_o = \arg\min_{\theta} V_N(\theta, Z^N), \tag{6}$$

where θ_o is the optimum parameter vector for the data set Z^N.

The PEM-LS was used for the ARX model. The minimization problem in (6) could be solved analytically by linear regression. Hence the solution for PEM-LS was given as:

$$\theta_{LS} = \left[\sum_{t=1}^{N} \varphi(t)\,\varphi(t)^{T}\right]^{-1} \sum_{t=1}^{N} \varphi(t)\,y(t), \tag{7}$$

where θ_LS is the parameter vector, and φ(t) is the associated data vector. The format of the parameter vector, θ_LS, is

$$\theta_{LS} = [a_1, \dots, a_{n_a},\; b_1, \dots, b_{n_b}]^{T},$$

and the format of the data vector, φ(t), is

$$\varphi(t) = [-y(t-1), \dots, -y(t-n_a),\; u(t-1), \dots, u(t-n_b)]^{T}.$$
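The closed-form solution (7) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the sum starts at the first sample for which all lagged values exist, and the test data come from a made-up noise-free first-order system.

```python
import random

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def arx_least_squares(u, y, na, nb):
    """Return [a1..a_na, b1..b_nb] from the normal equations in (7)."""
    dim = na + nb
    gram = [[0.0] * dim for _ in range(dim)]   # sum of phi(t) phi(t)^T
    cross = [0.0] * dim                        # sum of phi(t) y(t)
    for t in range(max(na, nb), len(y)):
        phi = ([-y[t - i] for i in range(1, na + 1)]
               + [u[t - i] for i in range(1, nb + 1)])
        for r in range(dim):
            cross[r] += phi[r] * y[t]
            for c in range(dim):
                gram[r][c] += phi[r] * phi[c]
    return solve_linear(gram, cross)

# Hypothetical noise-free system: y(t) - 0.7 y(t-1) = 0.5 u(t-1),
# i.e. a1 = -0.7 and b1 = 0.5.
rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y = [0.0]
for t in range(1, len(u)):
    y.append(0.7 * y[t - 1] + 0.5 * u[t - 1])

theta = arx_least_squares(u, y, na=1, nb=1)
print(theta)  # close to [-0.7, 0.5]
```

Because the data are noise-free and generated by an ARX system of the same order, the regression recovers the coefficients up to rounding error.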

The PEM-WLS method was also used for the ARX model. The method was similar to PEM-LS, but introduced time weighting of the prediction errors. It was then investigated whether time weighting performed better than the PEM-LS method. PEM-WLS has the solution

$$\theta_{WLS} = \left[\sum_{t=1}^{N} \beta(N, t)\,\varphi(t)\,\varphi(t)^{T}\right]^{-1} \sum_{t=1}^{N} \beta(N, t)\,\varphi(t)\,y(t), \tag{8}$$

where β is the weighting function, implemented as β(N, t) = λ^(N−t), with 0 < λ ≤ 1. The PEM-LS solution is recovered when λ = 1.
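The effect of the weighting function can be seen by listing the weights for a short record. This is a small sketch with a hypothetical function name; the values of N and λ are chosen only for illustration.

```python
def forgetting_weights(num_samples: int, forgetting_factor: float):
    """beta(N, t) = lambda ** (N - t) for t = 1, ..., N."""
    return [forgetting_factor ** (num_samples - t)
            for t in range(1, num_samples + 1)]

weights = forgetting_weights(5, 0.9)       # oldest sample first
unit_weights = forgetting_weights(5, 1.0)  # lambda = 1: ordinary least squares
print(weights)
print(unit_weights)
```

The most recent sample always has weight 1, older samples are discounted geometrically, and λ = 1 gives equal weights, which is the PEM-LS case.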

The PEM-RLS method was used for the ARMAX and the OE models. Since these models have nonlinear properties, a numerical approach to the minimization problem was chosen. The minimization method was implemented as a quasi-Newton algorithm that was obtained by using the Levenberg-Marquardt search direction [9]. Hence, the parameter vector was updated iteratively as:

$$\theta^{(k+1)}_{RLS} = \theta^{(k)}_{RLS} - R^{-1}\,\frac{\partial}{\partial\theta} V_N\bigl(\theta^{(k)}_{RLS}, Z^N\bigr), \tag{9}$$

$$R = \frac{\partial^{2}}{\partial\theta^{2}} V_N\bigl(\theta^{(k)}_{RLS}, Z^N\bigr) + \delta I,$$

where k denotes the kth iteration, R is the regularized Hessian that determines the search direction, and δ is a scalar adjusted iteratively. The initial values of θ were found using a covariance method, the Prony estimate [10].
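The update in (9) can be illustrated for a single parameter, where R reduces to the second derivative plus δ. The sketch below is only a toy: the cost function is made up, the rule for adjusting δ (divide by 10 after an accepted step, multiply by 10 after a rejected one) is one common choice and is an assumption, and the function name is hypothetical.

```python
import math

def levenberg_marquardt_scalar(cost, gradient, hessian, theta,
                               delta=1.0, max_iterations=100, tolerance=1e-12):
    """Damped Newton iteration theta <- theta - gradient / (hessian + delta)."""
    for _ in range(max_iterations):
        r = hessian(theta) + delta            # scalar version of R in (9)
        candidate = theta - gradient(theta) / r
        if cost(candidate) < cost(theta):
            theta = candidate                 # accept the step
            delta /= 10.0                     # behave more like Newton
        else:
            delta *= 10.0                     # behave more like gradient descent
        if abs(gradient(theta)) < tolerance:
            break
    return theta

# Toy cost with a minimum at theta = sqrt(2).
cost = lambda th: 0.5 * (th * th - 2.0) ** 2
gradient = lambda th: 2.0 * th * (th * th - 2.0)
hessian = lambda th: 6.0 * th * th - 4.0

estimate = levenberg_marquardt_scalar(cost, gradient, hessian, theta=3.0)
print(estimate, math.sqrt(2.0))
```

A large δ shortens the step and keeps the iteration safe far from the minimum, while a small δ gives nearly pure Newton steps and fast convergence close to it.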

The STMCB method was evaluated as an alternative parameter estimation method for the OE model. This method approximates the OE model by an ARX model in order to find the polynomials of the OE model. The STMCB method was initialized by estimating F(q) with a Prony estimate [10]. The estimated polynomials were denoted F^(i)(q) and B^(i)(q) respectively. The STMCB method was implemented in the following steps:

1: Prefilter the data with F^(i)(q),

$$y_F(t) = \frac{1}{F^{(i)}(q)}\,y(t), \qquad u_F(t) = \frac{1}{F^{(i)}(q)}\,u(t).$$

2: By linear regression (7) solve:

$$F^{(i+1)}(q)\,y_F(t) = B^{(i+1)}(q)\,u_F(t) + e(t),$$

where i denotes the ith iteration. Steps 1 and 2 were repeated until the parameters converged. If convergence was not obtained after a maximum number of iterations, the parameters were rejected. As the PEM-RLS requires computation of the derivatives, the STMCB is a more computationally efficient method to estimate the parameters of the OE model.
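The two steps above can be sketched in plain Python. This is a hedged illustration rather than the implementation used in the study: all names are hypothetical, the iteration starts from F(q) = 1 instead of a Prony estimate, and the data come from a made-up first-order system with white noise added to the output.

```python
import random

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def arx_least_squares(u, y, na, nb):
    """Linear regression (7): returns [a1..a_na, b1..b_nb]."""
    dim = na + nb
    gram = [[0.0] * dim for _ in range(dim)]
    cross = [0.0] * dim
    for t in range(max(na, nb), len(y)):
        phi = ([-y[t - i] for i in range(1, na + 1)]
               + [u[t - i] for i in range(1, nb + 1)])
        for r in range(dim):
            cross[r] += phi[r] * y[t]
            for c in range(dim):
                gram[r][c] += phi[r] * phi[c]
    return solve_linear(gram, cross)

def all_pole_filter(f_coeffs, signal):
    """Filter signal through 1/F(q), with F(q) = 1 + f1 q^-1 + ..."""
    out = []
    for t, value in enumerate(signal):
        acc = value
        for k, fk in enumerate(f_coeffs, start=1):
            if t - k >= 0:
                acc -= fk * out[t - k]
        out.append(acc)
    return out

def steiglitz_mcbride(u, y, nb, nf, max_iterations=100, tolerance=1e-6):
    """Return ([f1..f_nf, b1..b_nb], converged)."""
    f_coeffs = [0.0] * nf      # F^(0)(q) = 1 (assumption; the text uses Prony)
    theta = None
    for _ in range(max_iterations):
        u_f = all_pole_filter(f_coeffs, u)                 # step 1
        y_f = all_pole_filter(f_coeffs, y)
        new_theta = arx_least_squares(u_f, y_f, nf, nb)    # step 2
        converged = (theta is not None and
                     max(abs(p - q) for p, q in zip(new_theta, theta)) < tolerance)
        theta, f_coeffs = new_theta, new_theta[:nf]
        if converged:
            return theta, True
    return theta, False        # the caller would reject these parameters

# Hypothetical OE system: b1 = 0.5, f1 = -0.7, white noise on the output.
rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(2000)]
clean = all_pole_filter([-0.7], [0.0] + [0.5 * x for x in u[:-1]])
y = [c + rng.gauss(0.0, 0.1) for c in clean]

theta, converged = steiglitz_mcbride(u, y, nb=1, nf=1)
print(theta, converged)
```

Each pass only needs two filtering operations and one linear regression, which is the reason the method is cheaper than a derivative-based search.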

1) Selection of equalization filters: Two equalization filters were selected among the investigated models and methods. The investigated models were applied to the optimum equalization filter, and an iterative search included all combinations of polynomial orders up to 65 for each model and estimation method. The search resulted in several candidates for equalization filters. Among these candidates, the frequency deviation from the optimum frequency response was the main selection criterion. The frequency deviation was calculated by:

$$N(f) = \frac{|\hat{H}(f)|}{|H(f)|}, \tag{10}$$

where Ĥ(f) was the estimate of the optimum equalizing filter H(f). A frequency deviation of maximum ±1 dB from a flat frequency response is not audible, according to Moore [11]. This limit was applied to the first of the two filters, and the lowest order filter that satisfied the stated criterion within the frequency band from 50 Hz to 20 kHz was selected for the listening test. A lower filter order of the estimated filter was expected when increasing the maximum allowed deviation. Therefore, a different deviation limit of ±2 dB was chosen for the second filter. However, this filter should not result in audible differences either, as also stated by Moore. Hence, the filters were selected from the following criteria, with N(f) expressed in dB:

DT990:

$$\text{Filter 1:}\quad |N(f)| \le 1\ \mathrm{dB}, \qquad 50\ \mathrm{Hz} \le f \le 20\ \mathrm{kHz} \tag{11}$$

$$\text{Filter 2:}\quad |N(f)| \le 2\ \mathrm{dB}, \qquad 50\ \mathrm{Hz} \le f \le 20\ \mathrm{kHz} \tag{12}$$

MD300:

$$\text{Filter 3:}\quad |N(f)| \le 1\ \mathrm{dB}, \qquad 50\ \mathrm{Hz} \le f \le 20\ \mathrm{kHz} \tag{13}$$

$$\text{Filter 4:}\quad |N(f)| \le 2\ \mathrm{dB}, \qquad 50\ \mathrm{Hz} \le f \le 20\ \mathrm{kHz} \tag{14}$$

and were evaluated in the listening test.
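Checking a candidate against criteria of the form (11) to (14) amounts to evaluating (10) in dB on a frequency grid. The sketch below assumes filters given as coefficient lists in powers of z^-1 and a logarithmic grid of 200 points; both filters and all names are hypothetical.

```python
import cmath
import math

def frequency_response(b, a, freq_hz, sample_rate_hz):
    """H(e^{jw}) of B(z)/A(z), coefficients in ascending powers of z^-1."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    numerator = sum(c * z_inv ** k for k, c in enumerate(b))
    denominator = sum(c * z_inv ** k for k, c in enumerate(a))
    return numerator / denominator

def max_deviation_db(estimate, optimum, freqs_hz, sample_rate_hz):
    """Largest |20 log10 N(f)| over the grid, with N(f) as in (10)."""
    worst = 0.0
    for f in freqs_hz:
        ratio = (abs(frequency_response(*estimate, f, sample_rate_hz))
                 / abs(frequency_response(*optimum, f, sample_rate_hz)))
        worst = max(worst, abs(20.0 * math.log10(ratio)))
    return worst

SAMPLE_RATE_HZ = 48_000
# Logarithmic grid from 50 Hz to 20 kHz.
freqs = [50.0 * (20_000.0 / 50.0) ** (i / 199) for i in range(200)]

optimum = ([1.0, 0.5], [1.0])     # made-up reference filter (b, a)
candidate = ([1.05, 0.5], [1.0])  # made-up lower order estimate (b, a)

deviation = max_deviation_db(candidate, optimum, freqs, SAMPLE_RATE_HZ)
meets_1_db_limit = deviation <= 1.0
print(f"max deviation {deviation:.2f} dB, within 1 dB: {meets_1_db_limit}")
```

In a search over model orders, the same check would be run for every candidate and the lowest order filter passing the limit would be kept.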

E. Listening test

The listening test was conducted to investigate if there was an audible difference between the optimum equalizing filter and two lower order filters. The filters were only tested for the DT990.

The test was carried out on 6 subjects, 5 males and 1 female (aged 23-31). All the test subjects were chosen among students. To ensure that the test subjects had normal hearing, an audiometry test was performed. Test subjects with a hearing loss greater than 20 dB HL (Hearing Level) were rejected [11].

The room used for the test was a small cabin designed primarily for listening experiments. The test took place in room B5-104 at the Department of Acoustics at Aalborg University. The equipment used for the test was:

• Audiometer - Madsen Electronics Orbiter 922, Version 2
• 4-channel Headphone Amplifier - Behringer HA 4400
• CD player - Marantz Compact Disc player CD-32
• Headphone - Beyerdynamic DT 990 Professional

The test method used was a 3 Alternatives Forced Choice (3AFC) test. The test subject was presented with 3 stimuli, of which 2 were identical and 1 was different, and had to choose the stimulus that differed from the others.

The stimuli used were generated from 2 sound sources played from 3 different directions, giving a total of 6 sounds. The directions were artificially created by convolution of the sound sources with the Head Related Transfer Functions (HRTF) for the given directions. The HRTFs were provided by the Department of Acoustics at Aalborg University. The 6 sounds are listed below.

• Pink noise: {−44°, 0°, 44°} azimuth, 0° elevation
• Music: {−44°, 0°, 44°} azimuth, 0° elevation

When one sound was presented, a total of 24 stimuli were given, resulting in 8 answers from the test subject. Six of these answers were from the comparison between the optimum equalizing filter and the lower order filter; two were from the comparison between the optimum equalizing filter and a low-pass filtered version of the sound. The low-pass filtered sounds were added to keep the test subject motivated.

[Figure 3: two panels titled "Averaged PTF for DT 990" and "Averaged PTF for MD-300". Horizontal axis: Frequency [kHz], 0.1 to 10. Vertical axis: dB re. 1 Pa/V, −40 to 20.]

Fig. 3. Average across the PTFs for the DT990 and MD300. The grey lines represent the individual PTFs. The black lines represent the average PTF.

Two separate tests were conducted, one for each of the lower order filters. For each test a total of 288 answers were given. Of these, 72 were not included in the test statistics; this was done to ensure that the answers given for the low-pass filtered sounds had no influence on the test outcome.
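The answer counts follow directly from the test design described above. The short calculation below only restates that arithmetic; the constant names are illustrative.

```python
SUBJECTS = 6
SOUNDS = 6                    # 2 sound sources x 3 directions
STIMULI_PER_SOUND = 24
STIMULI_PER_TRIAL = 3         # 3AFC: three stimuli give one answer
CONTROL_ANSWERS_PER_SOUND = 2 # low-pass filtered comparisons

answers_per_sound = STIMULI_PER_SOUND // STIMULI_PER_TRIAL
total_answers = SUBJECTS * SOUNDS * answers_per_sound
control_answers = SUBJECTS * SOUNDS * CONTROL_ANSWERS_PER_SOUND
scored_answers = total_answers - control_answers

print(answers_per_sound, total_answers, control_answers, scored_answers)
```

This reproduces the 8 answers per sound, the 288 answers per test, and the 72 excluded answers, leaving 216 answers for the statistics of each test.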

III. RESULTS

In this section the obtained results are presented. First the obtained PTFs are presented, then the parameter models and the estimation methods are examined. Finally, the results of the listening test are presented.

A. Headphone transfer functions (PTFs)

The obtained PTFs and the averaged PTFs for the DT990 and MD300 are shown in figure 3. The figure shows that the individual PTFs follow the average PTF for the lower frequencies up to approximately 7 kHz for the DT990 and 6 kHz for the MD300. For higher frequencies, the individual PTFs vary significantly. This corresponds to the measurements performed by Møller et al. [2].

B. Parameter estimation results

The estimated parameter models were examined for both headphones. This was done in order to find filters that fulfilled the conditions previously listed in (11) to (14).

1) DT990: The parameter estimation for the DT990 with the ARX model did not produce any results with a frequency deviation less than ±1 dB. Neither did any of the estimation methods for the OE model, and therefore these models were rejected. Only the ARMAX model using the PEM-RLS method produced candidates with a frequency deviation less than ±1 dB. For a frequency deviation less than ±2 dB all the estimation methods produced valid results. The lowest order filter was found using the PEM-RLS method on the ARMAX model. The orders of the selected filters are listed in table II.

TABLE II

LOWEST FILTER ORDER FOUND FOR DT990

| Filter | Model | Estimation method | Order [na nb nc] | Error |
|--------|-------|-------------------|------------------|-------|
| 1 | ARMAX | PEM-RLS | [16 24 14] | ±1 dB |
| 2 | ARMAX | PEM-RLS | [2 22 0] | ±2 dB |

The first filter selected for the DT990 is compared to the optimum equalization filter in figure 4. In figure 4(a) a frequency and phase deviation plot is depicted. Figure 4(b) shows the impulse response plot for both Filter 1 and the optimum equalization filter for the DT990. This plot shows no visible difference between the two responses.

The comparison of Filter 2 and the optimum equalization filter is shown in figure 5. In figure 5(a) a frequency and phase deviation plot is depicted. It is seen that even though the frequency deviation limit was set to ±2 dB, the lowest order filter only had a maximum deviation of ±1.7 dB. Figure 5(b) shows that deviations in the impulse response appeared between Filter 2 and the optimum equalization filter. The deviations were present after approximately 25 samples had passed and were caused by the frequency deviations in the lower frequencies.

2) MD300: The examination of the estimation methods gave considerably different results for the MD300 headphone. The results showed that only the OE model produced valid estimation results for Filter 3 and Filter 4. Therefore, the ARX and ARMAX models were rejected. The orders of the selected filters for the MD300 are listed in table III.

TABLE III

LOWEST FILTER ORDER FOUND FOR MD300

| Filter | Model | Estimation method | Order [nb nf] | Error |
|--------|-------|-------------------|---------------|-------|
| 3 | OE | STMCB | [15 33] | ±1 dB |
| 4 | OE | STMCB | [17 18] | ±2 dB |

[Figure 4(a): "DT990 Error plot: ARMAX estimated with PEM-RLS, order [16 24 14]". Two panels versus Frequency [kHz]: phase error ∠N(f) = ∠Ĥ(f) − ∠H(f) in Angle [deg.], and N(f) = |Ĥ(f)|/|H(f)| in Amplitude [dB] with the 1 dB margin marked.]

(a) Frequency deviation and phase error plots for Filter 1

[Figure 4(b): "DT990: ARMAX estimated with PEM-RLS, order [16 24 14]". Impulse responses ĥ(n) and h(n), Amplitude [.] versus Sample (0 to 100).]

(b) Impulse response of Filter 1 and the optimum filter response

Fig. 4. Filter 1: Equalization filter for DT990 estimated with an ARMAX model with a PEM-RLS method. The filter order was [16 24 14].

[Figure 5(a): "DT990 Error plot: ARMAX estimated with PEM-RLS, order [2 22 0]". Two panels versus Frequency [kHz]: phase error ∠N(f) = ∠Ĥ(f) − ∠H(f) in Angle [deg.], and N(f) = |Ĥ(f)|/|H(f)| in Amplitude [dB] with the 2 dB margin marked.]

(a) Frequency deviation and phase error plots for Filter 2

[Figure 5(b): "DT990: ARMAX estimated with PEM-RLS, order [2 22 0]". Impulse responses ĥ(n) and h(n), Amplitude [.] versus Sample (0 to 100).]

(b) Impulse response of Filter 2 and the optimum filter response

Fig. 5. Filter 2: Equalization filter for DT990 estimated with an ARMAX model with a PEM-RLS method. The filter order was [2 22 0].

The optimum equalization filter is compared to Filter 3 in figure 6. In figure 6(a) a frequency and phase deviation plot is depicted. As seen in figure 6(b), the estimated impulse response followed the impulse response of the optimum equalization filter, i.e. the inverse of the averaged PTF.

The last filter selected for the MD300 is shown in figure 7. Figure 7(a) depicts the frequency and phase deviation between the optimum equalization filter and Filter 4. It is seen that even though the frequency deviation limit was set to ±2 dB, the lowest order filter only had a maximum deviation of about ±1.8 dB. The estimated impulse response of Filter 4 is shown in figure 7(b).

C. Listening test

The results of the listening test for both filters are shown in table IV. The results showed no audible differences from the optimum filters with a confidence level of 95%.

TABLE IV

RESULTS OF THE LISTENING TEST. FILTERS VS. CORRECT ANSWERS

| Filter | Correct answers [%] | Confidence interval [%] |
|--------|---------------------|-------------------------|
| 1 | 30.6 | [27.0; 39.6] |
| 2 | 33.3 | [27.0; 39.6] |
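The text does not state how the confidence interval was computed. One reading that reproduces the tabulated numbers is a normal approximation around the 3AFC chance level of 1/3, assuming 216 scored answers per test (288 minus the 72 excluded). The sketch below is that assumption made explicit, with hypothetical names.

```python
import math

def chance_interval_percent(num_answers: int, chance: float = 1.0 / 3.0,
                            z: float = 1.96):
    """95 % normal-approximation interval around the chance level, in percent."""
    half_width = z * math.sqrt(chance * (1.0 - chance) / num_answers)
    return 100.0 * (chance - half_width), 100.0 * (chance + half_width)

low, high = chance_interval_percent(216)
print(f"[{low:.1f}; {high:.1f}]")

# Both observed proportions fall inside the interval around chance.
filter_1_inside = low <= 30.6 <= high
filter_2_inside = low <= 33.3 <= high
```

Under this reading, a proportion of correct answers inside the interval cannot be distinguished from guessing at the 95% level, which matches the conclusion that no audible difference was found.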

IV. DISCUSSION

We have investigated the possibility of reducing the lengths of equalization filters for binaural reproduction. The optimum equalization filters for the DT990 and MD300 have been successfully reduced by applying common parametric estimation techniques. The obtained filters were of order 24 to 48. For each headphone the two lowest order filters with an estimation error less than ±1 dB and less than ±2 dB respectively were chosen. The selected filters for the DT990 were compared with the optimum equalization filter in a 3AFC test. No audible differences were observed by the six test subjects. In general the human ear cannot detect relative frequency deviations within ±1 dB. This research has shown that deviations less than ±1.7 dB were not audible in this case either, as stated by Moore [11].

[Figure 6(a): "MD300 Error plot: OE estimated with STMCB, order [15 33]". Two panels versus Frequency [kHz]: phase error ∠N(f) = ∠Ĥ(f) − ∠H(f) in Angle [deg.], and N(f) = |Ĥ(f)|/|H(f)| in Amplitude [dB] with the 1 dB margin marked.]

(a) Frequency deviation and phase error plots for Filter 3

[Figure 6(b): "MD300: OE estimated with STMCB, order [15 33]". Impulse responses ĥ(n) and h(n), Amplitude [.] versus Sample (0 to 100).]

(b) Impulse response of Filter 3 and the optimum filter response

Fig. 6. Filter 3: Equalization filter for MD300 estimated with an OE model with a STMCB method. The filter order was [15 33].

[Figure 7(a): "MD300 Error plot: OE estimated with STMCB, order [17 18]". Two panels versus Frequency [kHz]: phase error ∠N(f) = ∠Ĥ(f) − ∠H(f) in Angle [deg.], and N(f) = |Ĥ(f)|/|H(f)| in Amplitude [dB] with the 2 dB margin marked.]

(a) Frequency deviation and phase error plots for Filter 4

[Figure 7(b): "MD300: OE estimated with STMCB, order [17 18]". Impulse responses ĥ(n) and h(n), Amplitude [.] versus Sample (0 to 100).]

(b) Impulse response of Filter 4 and the optimum filter response

Fig. 7. Filter 4: Equalization filter for MD300 estimated with an OE model with a STMCB method. The filter order was [17 18].

The impulse response of the optimum equalizing filter was used to estimate the parameters. This approach seems to favor higher frequency components, and therefore resulted in a higher rejection ratio than expected. A better pre-filtering in relation to the parameter estimation is expected to result in an even lower filter order.

The investigation presented in this article reinforces the indications made by Møller et al. [2], who note that equalization based on average PTFs for each headphone may give acceptable results. Toft et al. [3] conclude that it is possible to achieve good results with an average equalization of each headphone, which this article supports as well. Nielsen et al. [4] also estimated lower order filters. Our research verifies the applicability of such a reduced order filter through listening tests.

A further reduction in filter order seems possible and is worth investigating, since verification of both filters indicated no audible difference. Such an investigation will ultimately lead to the threshold value of audibility. Further verification of the obtained filters with respect to determining the directional deviations is suggested.

REFERENCES

[1] H. Møller, “Fundamentals of binaural technology,” Applied Acoustics, vol. 36, pp. 171–218, 1992.

[2] H. Møller, D. Hammershøj, C. B. Jensen, and M. F. Sørensen, “Transfer characteristics of headphones measured on human ears,” J. Audio Eng. Soc., vol. 43, no. 4, pp. 203–217, April 1995.

[3] J. C. Toft, S. Pedersen, A. Kristensen, and T. Sørensen, “Equalization of headphones for use with 3D sound,” 15th SEMCON 2002, 2002.

[4] H. A. Nielsen, K. R. Pedersen, R. M. M. Petersen, K. H. Sørensen, and M. Sørensen, “Evaluation and comparison of methods for real time equalization of stereo headphones for binaural sound reproduction,” 16th SEMCON, 19th December 2003, 2003.

[5] D. D. Rife and J. Vanderkooy, “Transfer-function measurement with maximum-length sequences,” J. Audio Eng. Soc., vol. 37, no. 6, pp. 419–444, June 1989.

[6] P. Minnaar, “Simulating an acoustical environment with binaural technology,” Ph.D. thesis, 2001.

[7] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 1st ed. Prentice Hall, 1989.

[8] L. Ljung, System Identification: Theory for the User, 2nd ed. Prentice Hall, 1999.

[9] W. Murray, Numerical Methods for Unconstrained Optimization, 1st ed. Academic Press, 1972.

[10] C. W. Therrien, Discrete Random Signals and Statistical Signal Processing, 1st ed. Prentice Hall, 1992.

[11] B. C. J. Moore, An Introduction to the Psychology of Hearing, 4th ed. Academic Press, 1997.