Reproduced sound in rooms: scaling the experience between large and small rooms
Theo Stojanov Dec. 8th, 2006
The focus of this presentation is sound reproduction in small rooms, and more specifically the
reproduction of large room acoustics in a small space. The goal will be to identify a number of issues
that relate to our understanding of large room acoustics, and how these translate to the playback
environment.
Over a century after the invention of the microphone1, sound recording has become the principal means
of enjoying music. The listening experience has moved away from live performances with real
instruments to listening rooms with loudspeakers, often intended to play back not only music, but also
film sound, and potentially be a part of immersive communication installations. Is the experience so
close to the original as to cause a general shift of interest from live to recorded music? While this is a
topic better addressed by sociologists, great progress continues to be made by audio researchers aiming
to diminish the difference between live and recorded sound, to immerse, to blur the line between reality
and illusion.
In a typical listening environment there will be significant interaction between the monitoring system
and the room. Subjective evaluations improve the understanding of that interaction, which may prove
necessary in achieving a better blend between recorded and playback acoustics; to reconcile the illusion
of a great symphonic performance in a concert hall to the reality that effectively the performance now
takes place in a typical listening room. How will listeners qualify the contribution of the monitoring
environment on the original program material? Where do recorded acoustics begin to break down under
the influence of the room and where are they helped by it?
1. The invention of the microphone in the late 1800s allowed for acoustic sound power to be transduced into electrical voltage and then converted back into sound waves via a loudspeaker, and first served a purpose in early telecommunications.
There is little argument that even the finest present-day recordings auditioned in state-of-the-art
listening rooms are only a reminder of live performances in real halls. Sound captured in a multitude
of venues through a variety of techniques, equipment, and post-production treatment, reappears at the
loudspeaker as an interpretation of its source by the collective effort of everyone involved in the
recording: it is therefore merely a subjective impression. Yet there are a number of outstanding
recordings, indicating that music and our perception of it endures through the aleatory conditions of
recording productions and communicates successfully with the listener.
There is no recipe for success, but part of what makes such recordings special is an Ideal towards
which recording engineers should strive; the closer it is approached, the better the chance that the
recording will be well received. In a paper outlining the current state of affairs in research on listening
environments Floyd Toole lists a number of important factors: timbral accuracy, impression of
direction, distance, and spatial information. These should be delivered with minimal contribution
caused by the listening room; loudspeakers and room should be neutral conveyors to the artistic
experience [1].
Among the many types of recorded material, Classical Music requires the greatest amount of
transparency upon playback. Any recording will ultimately be heard in a much smaller room than the
one where the original performance took place; therefore the recording engineer is required to
successfully capture an impression of the space for stereo or multichannel reproduction. To this end,
some familiarity with concert hall acoustics is essential.
Reflections and diffuse field
Acoustical aesthetics and listening conditions for classical music were established long before the birth
of sound recording. Concert halls should ensure a pleasant musical experience to all members of a large
audience. In such spaces listeners beyond the front couple of rows are in a predominantly reverberant
sound field and experience mostly reflections.
The acoustic architect strives to minimize absorption in order to strengthen the energy from unamplified
instruments and voices via reflections, because a reflective sound field ensures the
distribution of that energy to all seats in the house. Hence the fundamental theory of concert hall
acoustics is that the sound field throughout a reverberant space should be homogeneous (the same
everywhere in space), and isotropic (with sound energy arriving at every point equally from all
directions) [1]. Such a field is referred to as diffuse.
A perfectly diffuse field is a theoretical ideal, and cannot be achieved due to sound absorption at the
boundaries, by the air, by the audience, etc. But since it has a positive effect on music, the architect's
objective is to generate as much of it as is tastefully acceptable. The measure of the length of time
during which a diffuse field contributes to the sound is Reverberation Time2 (RT), and is perhaps the
most important measure of performance spaces.
Fig. 1: Direct sound, critical distance, and absorption. (After [1]).
This is a look at what happens in reverberant vs. free field environments. First, there is sound in
free field (solid line descending diagonally), whose intensity falls to one quarter (a drop of about
6 dB) each time the distance from the source doubles (inverse square law); here the radiated energy
is lost to the surrounding air and never returns to the listener. In large environments such as
auditoriums or concert halls, that same loss has a much reduced effect due to reverberation, which
conserves and prolongs sound energy (dashed curves).
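As a numerical sketch of the free-field line in Fig. 1, the inverse square law can be evaluated directly. The function names below are illustrative only, and the calculation assumes an ideal point source with no reflections and no frequency-dependent air absorption.

```python
import math


def intensity_ratio(r_ref: float, r: float) -> float:
    """Intensity at distance r relative to the intensity at r_ref (inverse square law)."""
    return (r_ref / r) ** 2


def level_change_db(r_ref: float, r: float) -> float:
    """Change in sound level, in dB, when moving from r_ref to r in free field."""
    return 10.0 * math.log10(intensity_ratio(r_ref, r))


# Each doubling of distance quarters the intensity, i.e. about -6 dB.
for distance_m in (1.0, 2.0, 4.0, 8.0):
    ratio = intensity_ratio(1.0, distance_m)
    change = level_change_db(1.0, distance_m)
    print(f"{distance_m:4.1f} m: intensity x {ratio:.4f}, level {change:+.1f} dB")
```

In a reverberant hall the measured level would flatten out beyond some distance instead of continuing along this line, which is what the dashed curves in the figure represent.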
The ability to recreate the impression of being in a real acoustical space is the most significant argument
for multichannel3 audio over stereo. The engineer now has greater control over the sound canvas,
listener envelopment (LEV), and apparent source width (ASW)4 [1]. Listeners seem to appreciate a
soundstage that extends beyond the physical arrangement of the loudspeakers. In classical recording
practice this immersive impression is created by capturing important binaural cues, or the effect can be
simulated by artificial means.
2. Acousticians refer to RT60, the length of time it takes for the sound level to decay by 60 dB after the source stops, by which point the diffuse field has effectively fallen below the listener's perception.
3. Throughout the paper the term multichannel will refer to 5 playback channels or up, although in the strictest sense “multi” implies any count greater than one.
4. Apparent source width (ASW) is a psychoacoustic term related to visual vs. audible impression of a sound source. For example in a live hall an orchestra will sound larger than the visual spread of the performers. Lateral reflections in the reproduction space will have an effect on ASW.
A sound in free field forces the observer to listen along a single axis: that of the direct sound. A sound
in a room on the other hand puts the HRTF to work, helping the observer not only to localize the sound
source, but to better perceive the subtle resonances of that source that give it its distinct timbre, which
results from spatial averaging of many reflections arriving at the ears from many angles [1].5
To summarize, the presence of reflections is a desirable quality. Humans enjoy music in enclosed spaces
rather than outdoors, and prefer live acoustics to the lack of sound identity in anechoic spaces6. Hence, the
binaural information imparted by reflections is of the utmost importance. Just what reflections are preferred (at
what levels, and from which direction) will be seen shortly.
From Large towards Small
From the discussion so far, it can be seen that the one predominant acoustic phenomenon that dictates a
sense of space is the reflection. In large spaces we speak of a collection of reflections and their
duration; what happens as spaces get smaller?
A space is geometrically characterized by its volume and its surface area. As spaces get smaller their
volume shrinks more rapidly than their surface area. For a room with dimensions w = 5.3 m, l = 6.3 m,
h = 3.7 m vs. a hall ten times as large in every dimension we get:
Area A = 2wl + 2lh + 2wh, where w, l, and h are the lengths of the three sides
Volume V = w * l * h
room: A = 152.62 m², V = 123.543 m³
hall: A = 15 262 m², V = 123 543 m³
5. Outdoors, where there are few binaural cues to aid localization, the visual sense takes over. An example often given in film sound is the tiger scene from Apocalypse Now, where Academy Award-winning mixer Walter Murch effectively uses the lack of ambient sound from the front speakers to focus the viewer's visual attention to the screen.
6. A word often used to qualify absorptive or anechoic environments is “dead,” accurately reflecting in language the emotional association invoked.
The volume of the hall has increased a thousandfold while the surface area has increased only a
hundredfold. This means that a large space will have a relatively small surface area for its volume
compared to a small one. Since virtually all sound absorption occurs at the boundaries, sound absorbing
material placed on boundary surfaces is much more effective in a small room than a large one. The mean
free path7 of a sound is then ten to a hundred times greater for large rooms. Or to put it another way,
a panel of sound absorbing material will be encountered by a sound wave ten to a hundred times as
often [2] in a small space. The smaller the room, the less reverberant it becomes (more absorption
going on). In small spaces such as control or listening rooms the absorption is already such that an
observer will detect no reverberant sound field at all (although there will still be early reflections).8
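The room and hall comparison can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the mean free path uses the common statistical-acoustics estimate 4V/S, which is an assumption brought in here and not a formula given in the text.

```python
def surface_area(w: float, l: float, h: float) -> float:
    """Total boundary area of a rectangular room, in square metres."""
    return 2.0 * (w * l + l * h + w * h)


def volume(w: float, l: float, h: float) -> float:
    """Volume of a rectangular room, in cubic metres."""
    return w * l * h


def mean_free_path(w: float, l: float, h: float) -> float:
    """Average distance between reflections, using the 4V/S estimate (assumed)."""
    return 4.0 * volume(w, l, h) / surface_area(w, l, h)


room = (5.3, 6.3, 3.7)
hall = tuple(10.0 * side for side in room)  # ten times larger in every dimension

for name, dims in (("room", room), ("hall", hall)):
    print(
        f"{name}: A = {surface_area(*dims):.2f} m^2, "
        f"V = {volume(*dims):.3f} m^3, "
        f"mean free path = {mean_free_path(*dims):.2f} m"
    )
```

For a pure tenfold scaling the mean free path grows tenfold; larger ratios require a greater difference in size between the two spaces.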
Reverberation Time (RT)
In Sabine's RT formula for large spaces, the audience is always treated as a layer. This is because the
ceiling height is great compared to width and length, and it would be statistically justifiable to assign an
absorption coefficient to the audience plus floor. Sabine's mathematics however are based on statistical
analysis9, and small spaces can not generate enough sound energy to justify a statistical approach: a
group of listeners in a control room for example can not be treated as a layer because of the sheer
volume of space they occupy.
A small room on the other hand is characterized by a lower ceiling height relative to length and width,
significant areas of absorption on one or more of the boundary surfaces (carpets, drapes, other acoustic
treatment), large absorbing and/or scattering objects (mixing desk, furniture). Though it is possible to
calculate RT for small rooms, it will not offer very useful information. Average measurements across
a number of listening rooms can give better information, as shown by Devantier in [3].
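For reference, Sabine's formula in its usual metric form is RT60 = 0.161 V / A, where A is the sum of each surface area multiplied by its absorption coefficient. The sketch below applies it to the small room used earlier. The absorption coefficients are invented placeholder values, not measurements, and as argued above the number it produces for a room this size should not be read as a perceptually meaningful reverberation time.

```python
def sabine_rt60(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Sabine reverberation time in seconds.

    surfaces is a list of (area in m^2, absorption coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


w, l, h = 5.3, 6.3, 3.7
room_volume = w * l * h

# Hypothetical absorption coefficients, chosen only for illustration.
room_surfaces = [
    (w * l, 0.30),                # carpeted floor (assumed)
    (w * l, 0.10),                # ceiling (assumed)
    (2.0 * (l * h + w * h), 0.10) # four walls (assumed)
]

print(f"Sabine RT60 estimate: {sabine_rt60(room_volume, room_surfaces):.2f} s")
```

The formula treats every surface as a uniform statistical absorber, which is exactly the assumption that breaks down when furniture, a mixing desk, and the listeners themselves fill a large share of the volume.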
7. Mean free path: the average distance that a sound wave travels before it strikes a surface.
8. It can be proven mathematically that an enclosure of any size will generate a sound field akin to what is known as diffuse or reverberant in large rooms (provided wavelengths are smaller than the fundamental resonances of the space), but for this discussion of acoustics we will consider only variables that are perceptually significant.
9. Benade's document From Instrument to Ear in a Room: Direct or via Recording sheds important light on the topic discussed in this paper, including a detailed mathematical analysis of Sabine's formula. Discussion on it is however not included in the present version of this paper. Another important publication to be considered is found in Ch. 5 of Handbook for sound engineers, Glen M. Ballou, ed., Small Room Acoustics by Doug Jones.
Reverberation time is a property of the room alone, and is measured using an omni-directional sound
source in order to reach all of the boundaries equally and thus set the stage for statistical analysis.
Furthermore, the mathematical formulas for RT assume that the boundaries consist of reflection and
absorption, and the central volume of the space is empty (to allow for the build-up of the diffuse field).
In reality however, the directivity of loudspeakers in control and listening rooms produces reflection
patterns and decays that traditional RT math does not account for.
RT and frequency dependence
In large spaces RT can be used as a measure of the suitability of the venue for music, commonly
focusing on mid-frequency reverberation time and variations of RT with frequency, because the
architect strives to avoid acoustical treatment that would alter the spectral balance of the diffuse
reverberant field.
But then, what is heard in small rooms is dictated primarily by the loudspeakers, and then by some
early reflections. Anything else is far below detection, and traditional RT calculations will not reveal
any applicable information. Any suitability of the space for music is dependent only partially on the
room itself, and largely on the playback system.
So what are small rooms?
Acoustically “small” means:
• significant absorption at room boundaries
• sound absorbing and scattering objects
• likely to have low ceilings (compared to auditoriums)
• concepts developed for large spaces apply only partially, or not at all (RT, critical distance)
Schroeder frequency and room modes
The Schroeder frequency is the frequency above which the spacing between room modes becomes so
small (that is, the modal density becomes so high) that the modes overlap and are no longer seen as
individual resonant peaks. Modal density increases with frequency. Although they play an important
part in room acoustics, room modes are not directly
responsible for the recreation of space, and will be kept outside the scope of this paper.
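Although modes are outside the scope of the paper, a rough sense of where the transition lies can be had from the commonly quoted approximation f = 2000 * sqrt(RT60 / V), with RT60 in seconds and V in cubic metres. The sketch below is illustrative only: the formula is brought in from general room acoustics and is not given in the text, and the reverberation times are assumed values.

```python
import math


def schroeder_frequency(rt60_s: float, volume_m3: float) -> float:
    """Approximate Schroeder (transition) frequency in Hz."""
    if rt60_s <= 0.0 or volume_m3 <= 0.0:
        raise ValueError("rt60_s and volume_m3 must be positive")
    return 2000.0 * math.sqrt(rt60_s / volume_m3)


# Volumes from the room/hall comparison; RT60 values are assumptions.
examples = {
    "listening room": (0.4, 123.543),
    "concert hall": (2.0, 123543.0),
}

for name, (rt60_s, volume_m3) in examples.items():
    print(f"{name}: about {schroeder_frequency(rt60_s, volume_m3):.0f} Hz")
```

With these assumed values the transition in the small room falls well inside the musical range, while in the hall it falls below the audible band, which is why modal behaviour is mostly a small-room concern.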
Sound fields 10
Fig. 2: The frequency response of a loudspeaker in an anechoic room (solid line) vs. that of the same loudspeaker in a listening room (dotted line). Source: [3].
This figure shows the frequency spectrum of an enclosed space divided into two sound fields. The
point of separation (indicated by the arrow) is known as the Schroeder frequency for large spaces. It is
less clearly determined for small rooms [1] but exists nonetheless. In this particular graph we see the
anechoic response of a loudspeaker (solid line) superimposed over the response of that same speaker in
a real room.
The upper portion of the curve is of concern to spatial imagery reproduction. In small rooms, above the
transition frequency the sound field is dominated by strong individual early reflections (much stronger
than in large spaces where reflections are a random and diffuse whole).11
10. Not to be confused with Ambisonics and the SoundField microphone, the term sound field is used in this paper to indicate an area of the frequency response of a room.
11. First reflections will be stronger in small rooms because the listener is much closer to a reflecting surface, and will thus experience a greater intensity of the reflection.
Evaluating reflections
Arguments have been made by control-room designers that early reflections caused by the monitoring
environment must be eliminated to allow the purity of the recording to come out. All the same,
experiments done by Olive and Toole (among others) show that lateral reflections generally have
neutral to beneficial effects on program material, and where the response of a room is concerned other
issues need to be addressed more urgently than reflections (such as room modes, for example).
In a small room the listener is much closer to reflective surfaces than in a concert hall, and will
always be within reach of several strong early reflections. In [3] Devantier looks at the relative
levels and delays of the most important reflections in typical domestic listening rooms. Table 1
below shows the resulting data, and Fig. 3 puts it in context with other studies done on lateral
reflections.
Path                      Average delay   Delta     Attenuation
Direct sound              9.6 ms
Vertical reflections
  Floor bounce            11.3 ms         1.8 ms    -1.5 dB
  Ceiling bounce          14.5 ms         4.9 ms    -3.6 dB
Horizontal reflections
  Smallest angle          18.9 ms         9.3 ms    -5.7 dB
  Second angle            22.2 ms         12.6 ms   -6.6 dB
  Third angle             18.7 ms         9.1 ms    -5.5 dB
  Largest angle           14 ms           4.4 ms    -3.3 dB
Table 1: Average time delay and attenuation of direct sound and early reflections [3]
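To give the delays in Table 1 a physical scale, they can be converted into path lengths. The sketch below assumes that each average delay is a time of flight from loudspeaker to listener and that sound travels at 343 m/s; neither assumption is stated in [3] as quoted here. It also prints the level drop expected from the longer path alone, which ignores surface absorption and loudspeaker directivity and therefore need not match the measured attenuation.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C

DIRECT_DELAY_MS = 9.6
REFLECTION_DELAYS_MS = {  # average delays from Table 1
    "floor bounce": 11.3,
    "ceiling bounce": 14.5,
    "smallest angle": 18.9,
    "second angle": 22.2,
    "third angle": 18.7,
    "largest angle": 14.0,
}


def path_length_m(delay_ms: float) -> float:
    """Distance travelled by sound in delay_ms milliseconds."""
    return delay_ms / 1000.0 * SPEED_OF_SOUND_M_S


def spreading_loss_db(direct_delay_ms: float, reflected_delay_ms: float) -> float:
    """Level of a reflection relative to the direct sound from path length alone."""
    return 20.0 * math.log10(direct_delay_ms / reflected_delay_ms)


print(f"direct sound: {path_length_m(DIRECT_DELAY_MS):.2f} m")
for name, delay_ms in REFLECTION_DELAYS_MS.items():
    extra_m = path_length_m(delay_ms) - path_length_m(DIRECT_DELAY_MS)
    loss_db = spreading_loss_db(DIRECT_DELAY_MS, delay_ms)
    print(f"{name}: {path_length_m(delay_ms):.2f} m "
          f"(+{extra_m:.2f} m), spreading only {loss_db:.1f} dB")
```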
Figure 3 shows detection thresholds for single lateral reflections. The point of this diagram is to
show how single reflections affect the original sound. The lowest line (connected solid points) is
where reflections are barely audible. As they increase in level, they begin to affect the ASW of the
image, until at the very top (solid curve) we perceive not a reflection but a distinct second image
(echo) of the original. Devantier's data in that context clearly indicate that the six primary
reflections in small rooms have a very perceptible effect.
Fig. 3: The first six reflections in a typical room superimposed against other related studies on the perceptual effects of reflections. Source: [1]
There is a point up to which the acoustics of small rooms are benign to the signal, but determining the
right balance is still a topic of investigation. Understandably, a tiled shower room is not the ideal
listening environment; but then in the range between that and an anechoic chamber, what qualifies
desirable listening room acoustics? Is it a general question of absorption coefficients, or the
presence/absence/strength/direction of arrival of specific reflections and if so, which ones?
In reference [3] it has been observed that reflections originating directly from the front or rear of the
listener (off of a front / back wall, floor, or ceiling) are least flattering to a musical program or speech.
By contrast, the ear-brain mechanism appreciates lateral reflections from the sides which introduce
what has been qualified as a more pleasant spreading of the musical image, or improved intelligibility
as related to speech.
Fig. 4: Inter-Aural Cross Correlation and listener preference. Source: [1]
This is better illustrated by figure 4 above: it shows an experiment in which a listener's preference
increases as Inter-Aural Cross Correlation decreases: the more differences there are between the ears,
the more pleasant the listening experience.
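Inter-Aural Cross Correlation is usually computed as the largest absolute value of the normalized cross-correlation between the two ear signals for lags within about 1 ms. The sketch below follows that general definition; the function name, the 48 kHz sample rate, and the noise test signals are assumptions made for illustration, and real measurements use binaural impulse responses.

```python
import math
import random


def iacc(left, right, sample_rate: float, max_lag_ms: float = 1.0) -> float:
    """Maximum absolute normalized cross-correlation within +/- max_lag_ms."""
    n = min(len(left), len(right))
    left, right = left[:n], right[:n]
    energy = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if energy == 0.0:
        return 0.0
    max_lag = int(round(max_lag_ms * 1e-3 * sample_rate))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        best = max(best, abs(acc) / energy)
    return best


rng = random.Random(0)  # fixed seed so the demonstration is repeatable
ear_a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
ear_b = [rng.gauss(0.0, 1.0) for _ in range(1000)]

print(f"identical ear signals:   IACC = {iacc(ear_a, ear_a, 48000.0):.2f}")
print(f"independent ear signals: IACC = {iacc(ear_a, ear_b, 48000.0):.2f}")
```

Identical signals at both ears give the highest possible value, while independent signals give a value near zero; lateral reflections move a listening room toward the lower end of that range.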
In Conclusion
When discussing what spatial aspects of recorded sound combined with those of the playback
environment are desirable, it is important to consider the simple process of listening. In film, we speak
of suspension of disbelief: the willingness of viewers to overlook the limitations of a medium, so that
these do not interfere with the illusion; to provisionally suspend their judgment in exchange for the
promise of entertainment. Referring back to the idea mentioned at the beginning of the presentation,
that in spite of the greatest care taken to preserve the integrity of the original signal we are still
experiencing an impression, we see that the listener voluntarily agrees to accept the fact that what is
heard is a recording, and makes an effort to see past the reproduction system to the time and place
where the living performance took place.
By merely walking into a listening environment a visual inspection of the surroundings tells us
immediately that we in fact are not in a concert hall: an initial bias that what we are about to hear is an
illusion. Another indication is the acoustical property of the space: it will not sound like a concert hall.
Still, when we listen to a symphonic recording we experience the acoustics of the actual hall, and so
well is space reproduced that we can in fact recognize certain qualities of a particular venue.12
As engineers strive to improve sonic immersion, it is important to address the dichotomy between the
acoustic worlds of the performance and playback spaces, and one idea to be considered is that the
listening environment needs to have some influence over the playback material in order to improve the
fusion between the two; thus, it may be easier to accept the illusion.13 Balancing that blend of acoustical
identities is a topic of current research, for the preferences of listeners need to be taken into account not
only where treatment of the monitoring environments is concerned but also to evaluate the successful
imparting of spatial information from various microphone techniques. The roles of different channels
will thus also become more clear, contributing to the discussion on how much independence is required
for each. While it is only an account of some aspects of reproduced sound in rooms, this presentation
aimed to identify some directions of research based on what has been studied so far, and other related
threads of information have been mentioned throughout the text. In the words of BBC engineer James
Moir [1], “if a room requires extensive treatment for stereophonic listening there is something wrong
with the stereophonic equipment or the recording. The better the stereophonic reproduction system, the
less trouble we have with room acoustics.” It remains to be seen if this belief carries over to
multichannel sound.
12. Olive and Toole's JAES paper The Detection of Reflections in Typical Rooms contains important information about listener discrimination between spectra that are changing (music material) and spectra that are flat (recording space, loudspeaker, listening environment). This is directly relevant to the present discussion, but will not be examined in the present version of this paper.
13. George Massenburg's room at Blackbird Studios in Nashville is intended to be “a room without walls” but with an acoustic identity: a perfectly diffuse small room without any early reflections. This is one way of integrating recorded sound with playback acoustics, but is mentioned only as an experimental example because the spaces discussed in this presentation are of a more traditional design.
REFERENCES
[1] F.E. Toole, Loudspeakers and Rooms for Sound Reproduction – A Scientific Review, Journal of
the Audio Engineering Society, Vol. 54, No. 6 (2006 June)
[2] E. R. Geddes, Premium Home Theater: Design and Construction (GedLee LLC, Novi, MI,
2002), gedlee.com
[3] A. Devantier, Characterizing the Amplitude Response of Loudspeaker Systems, presented at the
113th Convention of the Audio Engineering Society, paper 5638 (2002)