Conducting a Virtual Orchestra


The Virtual Orchestra is a multimodal system architecture based on complex 3D sound environment processing and a compact user interface. The system architecture supports high-performance sound content for VR applications, and the user controls the performance with a magnetically tracked handheld device.

One of the most exciting and rich experiences in the real world is music and all that surrounds it. A music performance, be it a classical music concert performed by a symphonic orchestra or a rock concert, always conveys a range of experiences. Music stimulates multiple senses at once. We hear and feel the sound vibrations in our bodies, and watching the performance complements the experience. As a result, we end up jumping, dancing, or discreetly following the rhythm with our arms or feet.

The objective of our work is to give music amateurs the opportunity to conduct a group of musicians and produce a new kind of multimedia experience. To explore this idea, we developed a semi-immersive virtual reality (VR) system, based on a large projection screen, designed to create an entertaining multimedia experience for nonexpert users. We decided to implement a music performance simulator, a VR system where several virtual characters play different musical instruments and adapt their performance according to the user’s instructions.

Although we pay special attention to the realism of the scenario’s visual representation, animating the musicians, and the music they produce (3D sound rendering), the conducting interface doesn’t try to accurately reproduce the complex interaction of conducting an orchestra. This is left as future work. We provide an interface that partially captures the expressivity of users’ gestures without forcing them to perform a specific conducting pattern.

Development challenges

The participant in our VR system conducts a virtual orchestra performing in a 3D concert hall. Several autonomous virtual humans play musical pieces codified as MIDI files. The user conducts the performance by deciding on the music score to interpret and modifying in real time the orchestra’s layout as well as the performance’s tempo and dynamics (volume). The conductor and audience can view and listen to the music ensemble from any position in the 3D environment, with dynamic acoustic effects. The user doesn’t need to be a professional to conduct this orchestra because all the functionalities and options are available through a simple interface: a magnetically tracked handheld device.

We faced several challenges in achieving a reasonable level of immersion for the users (the conductor and spectators) in such an environment. In response, we designed and implemented solutions to three main problems: 3D sound rendering, the user interface, and the virtual environment and musicians. (See the “Related Work” sidebar for other research in these areas.)

3D sound

The purpose of a VR system is to reflect the real world to the human senses as faithfully as possible. To achieve a sense of presence, VR systems must provide content such as graphics, natural user interfaces, and sound. Three-dimensional sound processing is essential for VR worlds.

The goal of 3D sound rendering in our work was to reproduce the musical instruments’ sound and compute the acoustic effects due to the scenario’s physical characteristics. (We explore this element in more detail in the “3D sound rendering” section to follow.)

User interface

The user interface in a virtual environment (VE) is a key component to creating a sense of presence. Our user interface problem involved how to direct the performance (tempo and dynamics) of the virtual musicians and control the characteristics of their environment. We aimed at maximizing the sense of presence by eliminating complex or cumbersome devices such as head-mounted displays (HMDs), data gloves, and similar peripherals, avoiding the classical VR approach. We chose the alternative of a semi-immersive scenario. Our VE is rendered on a large projection screen where several users can experience the simulation. The immersion was enhanced by 3D images that users view with 3D glasses.

Our objective was to create a fully customizable application where system developers and first-time users could modify, at any time, most of the environment settings such as the camera point of view, the musicians’ position, the piece to play, and so forth.



Sebastien Schertenleib, Mario Gutiérrez, Frédéric Vexo, and Daniel Thalmann

Swiss Federal Institute of Technology in Lausanne



Related Work

Implementing an interactive virtual orchestra is a multidisciplinary project involving different research areas. This is an overview of the state of the art on the main research topics involved in our work.

3D sound

When we ask people about their expectations of a VR simulation, they say that at first they feel attracted by all the visual effects and fancy interaction devices: head-mounted displays (HMDs), data gloves, and so forth. Sound effects are rarely mentioned as being the most interesting aspect of the experience, but in fact they make a big difference between a simple 3D application and an immersive experience.1 For a realistic VR application, the sound system is more important than we might think because

❚ we use sound to gather information about our surrounding environment,2

❚ rich sound environments help us pay attention to details,3 and

❚ the human brain responds to changes in sound (such as tempo and loudness) and reacts immediately to any sound event; audition produces more instinctive responses than visual artifacts.4

In a VR environment, the auditory experience differs from the visual one. The user is already prepared to see nonreal images (habituation to TV and cinema).5 Yet audio is different; experiments have shown that poor sound cues can confuse users and make them ignore any sound in the next few minutes (see http://www.creativelabs.com/).6 A bad or irrelevant sound system reduces the attention to and believability of a simulation as much as a good sound system increases it. Because a listener’s expectations about the surrounding world will be directly affected by the sound processing, we need to filter the sound sources appropriately.

Many VR applications don’t implement 3D sound properly, if they use 3D sound processing at all. Generally, VR engines tend to be graphics-centered systems, built on top of the rendering pipeline. However, with current graphics processing unit (GPU) systems, more CPU time is available for computing other components such as physics and 3D sound.1 Some studies by Nvidia and the Microsoft developers’ team conducted during the conception of the Xbox’s audio hardware7,8 showed that the sound environment is so natural for humans that simulating it in a VR system gives impressive results, making the virtual characters’ behaviors more believable for the user.

During the last few years, the synthesis and processing of 3D sound in VR applications haven’t received enough consideration. We can explain this with the following reasons:

❚ Lack of space to physically locate the required speaker configurations.

❚ The idea that a VR system’s sound component didn’t have the highest priority.

❚ The hardware required to compute real-time high-quality content on multiple speakers was expensive or nonexistent.

User interface

There have been several attempts to reproduce the experience of conducting an orchestra. One of the most remarkable is the work presented at Siggraph 99 by Jakub Segen and his colleagues.9 They proposed a visual interface based on optical tracking of conductor gestures (hand and baton tracking). The user controlled the beat, tempo, and volume of the music performed by a MIDI sequencer. They achieved immersion with a projection screen displaying a group of virtual dancers who followed the music.

The Personal Orchestra, a system implemented at the House of Music Vienna,10 is a more recent effort to provide a virtual conductor application. The interface is an infrared-tracked baton to control in real time the tempo and volume of prerecorded music and video from a real orchestra.

In both cases, the main interaction device is the conductor’s baton, tracked by image segmentation in the case of the visual interface, or by infrared signals emitted by the commercially available Buchla Lightning II conductor’s baton11 used in the Personal Orchestra. Another similar device, the Radio Baton,12 tracks with radio-frequency signals. Researchers have developed other similar devices, some of which are more complex and detect not only the baton’s position but also the pressure on its handle; these are complemented by additional trackers to acquire the user’s gestures and capture their expressivity.13,14 Our work doesn’t intend to advance the state of the art on conducting interfaces such as the Radio Baton or the work of Marrin,14 which usually require professional skills and are designed for expert users. The goal of our interface was to acquire simple parameters, such as the amplitude and frequency of arm gestures, and to provide additional functionalities to interact with the environment.

The research we’ve seen doesn’t provide an interface for customizing the environment or choosing a piece to play. The Personal Orchestra10 shows video from a real orchestra, but the repertoire is limited to the preprocessed recordings. The visual interface of Segen and his colleagues9 is more flexible because it uses MIDI files, but it has no representation of the musicians (dancers only).

Usually, these systems focus on providing a conducting interface; the scene and orchestra are preset and can’t be edited by the user. The Personal Orchestra lets the user select a piece to play and gives some additional information, but the user can’t modify any other parameter.

We decided to experiment with PDAs as conducting interfaces. Early research on PDAs and VR includes the work of Ken Watsen and his colleagues.15 They implemented a basic interface for 2D interactions in a CAVE-like system. Recent research has focused on implementing 2D GUIs for basic interaction: camera control, scene management, and interaction between multiple users.16–18

In the Touch-Space system,19 a PDA is optically tracked and used to display virtual objects superimposed on the real world. This work shows that a PDA can serve as an efficient tool to reinforce human–human contact in a mixed-reality environment. This is an important point to consider, because apart from immersing the users into the simulation, we want them to share their impressions and enjoy the virtual performance together.

Virtual environment and musicians

One of the first examples of virtual musicians was presented in the Electric Garden at Siggraph 97 by the DIVA group.20 This demo showed four virtual musicians: 3D human-like articulated models following a real conductor. The musicians’ movements were determined by inverse kinematics calculation from a MIDI file and a set of constraint rules for each instrument. The animation was coordinated with a synchronization signal generated by a tempo-tracking module.

A more recent effort aimed at creating autonomous virtual musicians is the work of Martijn Kragtwijk and his colleagues.21 The authors presented a set of algorithms to generate 3D animated musicians based on an abstract representation of the musical piece (MIDI files). They implemented a virtual drummer displaying realistic animations of the limbs and sticks before the contact moments. Instead of using motion capture or inverse kinematics to define the required poses for a given drum-kit configuration, the authors implemented a GUI-based pose editor to set them manually. The virtual drummer was able to execute only precalculated music scores, and there was no way to modify or conduct its performance in real time.

Some of the research work at VRlab-EPFL has focused on the problem of virtual humans’ autonomy, including creating autonomous musicians. Recent research has led to defining a model to represent the content of virtual scenes and focus on the interactions between virtual humans and virtual objects.22 Researchers have demonstrated this by developing an autonomous virtual pianist (virtual agent) who can interpret virtual scorebooks.23 The scorebook contains musical data, extracted and preprocessed from a MIDI file, that the virtual pianist can use as input for a learning phase (used to precalculate the pianist’s gestures) and a subsequent playing phase. The musical input data from the scorebook is also used to synthesize the piano animation and corresponding sound production.

References
1. A. Rollings and E. Adams, “Construction and Management Scene,” On Game Design, Pearson Education, 2003.
2. G. Lecky-Thompson, “Infinite Game Universe,” Level Design, Terrain, and Sound, vol. 2, Charles River Media, 2002.
3. C. Crawford, On Game Design, New Riders, 2003.
4. C. Carollo, Sound Propagation in 3D Environments, Ion Storm, 2002.
5. Facts and Figures about Our TV Habits, TV-Turnoff Network, 2003.
6. J. Blauert, Spatial Hearing: The Psychoacoustics of Human Sound Localization, revised ed., MIT Press, 1997.
7. M. Deloura, Programming Gems 3, Charles River Media, 2002.
8. M. O’Donnel, Producing Audio for Halo, MS Press, 2002.
9. J. Segen, J. Gluckman, and S. Kumar, “Visual Interface for Conducting Virtual Orchestra,” Proc. Int’l Conf. Pattern Recognition, vol. 1, IEEE CS Press, 2000.
10. J.O. Borchers, W. Samminger, and M. Muhlhauser, “Personal Orchestra: Conducting Audio/Video Music Recordings,” Proc. Int’l Conf. Web Delivering of Music (Wedelmusic), 2002, p. 93.
11. R. Rich, “Buchla Lightning II,” Electronic Musician, vol. 12, no. 8, 1996.
12. M. Mathews, “The Conductor Program and Mechanical Baton,” Current Directions in Computer Music Research, M.V. Mathews and J.R. Pierce, eds., MIT Press, 1991.
13. T. Marrin, “Possibilities for the Digital Baton as a General-Purpose Gestural Interface,” Proc. ACM Conf. Human Factors in Computing Systems (CHI 97), 1997.
14. T. Marrin, Inside the Conductor’s Jacket: Analysis, Interpretation and Musical Synthesis of Expressive Gesture, PhD thesis, MIT Media Lab, 2000.
15. K. Watsen, D.P. Darken, and W.V. Capps, “A Handheld Computer as an Interaction Device to a Virtual Environment,” Proc. 3rd Int’l Immersive Projection Technology Workshop (IPTW ’99), 1999.
16. E. Farella et al., “Multi-client Cooperation and Wireless PDA Interaction in Immersive Virtual Environment,” Proc. Euromedia Conf., 2003.
17. E. Farella et al., “Using Palmtop Computers and Immersive Virtual Reality for Cooperative Archaeological Analysis: The Appian Way Case Study,” Proc. Int’l Conf. Virtual Systems and Multimedia, Int’l Soc. on Virtual Systems and MultiMedia, 2002.
18. L. Hill and C. Cruz-Neira, “Palmtop Interaction Methods for Immersive Projection Technology Systems,” Proc. Int’l Immersive Projection Technology Workshop, 2000.
19. A. Cheok et al., “Touch-Space: Mixed Reality Game Space Based on Ubiquitous, Tangible, and Social Computing,” Personal and Ubiquitous Computing, vol. 6, no. 5–6, Jan. 2002, pp. 430-442.
20. R. Hänninen, L. Savioja, and T. Takala, “Virtual Concert Performance: Synthetic Animated Musicians Playing in an Acoustically Simulated Room,” Proc. Int’l Computer Music Conf. (poster), 1996.
21. M. Kragtwijk, A. Nijholt, and J. Zwiers, “Implementation of a 3D Virtual Drummer,” Computer Animation and Simulation, N. Magnenat-Thalmann and D. Thalmann, eds., Springer-Verlag Wien, 2001, pp. 15-26.
22. J. Esmerado, A Model of Interaction between Virtual Humans and Objects: Application to Virtual Musicians, PhD thesis, No. 2502, EPFL, 2001.
23. J. Esmerado et al., “Interaction in Virtual Worlds: Application to Music Performers,” Proc. Computer Graphics Int’l Conf. (CGI), 2002.

Moreover, we decided to separate the interface from the simulation screen to free the scene from overlaying menus or other GUI components. At the same time, both controls (the environment settings and the conducting interface) were to remain equally accessible. In this context, the user would be able to play the roles of both the artistic director and conductor.

We looked for an interface that remained intuitive enough while letting users execute various tasks through a single interaction device. Our search led us to an area that researchers have just started to explore in the context of VR applications: handheld personal digital assistants (PDAs).

Using the PDA not as a complement, but as the main interface to the VE, let us implement a shared simulation where one of the users acts as the scene or artistic director and conductor while others can enjoy the experience as spectators of the virtual concert.

Virtual environment and musicians

Besides realistic 3D sound and an intuitive, yet flexible interface, the virtual environment representation constitutes a key simulator component. A convincing animation of autonomous virtual musicians obeying the conductor in real time contributes to the experience, making it more attractive and believable.

One problem with the virtual environment and musicians involves making the virtual entities visually appealing for believability and full user immersion. The virtual musicians had to perform like real musicians, which called for autonomous animation.

We adapted a virtual agent model that lets us edit the performance parameters in real time, in particular the tempo and dynamics. Following the same model and design patterns, we implemented a new musician, a virtual flutist. We use both of them in our simulation.

VR system architecture

We implemented the virtual orchestra conducting application using a component-based framework developed at VRlab. (See related literature for a deeper overview of this system.1) The system’s software architecture is based on a distributed application model with TCP/IP as the main communication channel bridging the server and client (see Figure 1). Consequently, the software is divided into two main executable modules. The PC server consists of multiple software components that handle communication; rendering of 3D scenes and sound; processing of inputs from the external VR devices (handheld and magnetic tracker); and animation of the 3D models, in particular complex virtual human models with skeleton animation and skin and clothes deformation. We can modify the interactive scenario’s state with Python scripts at runtime.

Figure 1. Architecture of the virtual orchestra VR system based on component patterns.


The client (a handheld device) is responsible for acquiring the user’s interaction with the system through its integrated 2D GUI and the attached sensor (a magnetic tracker). The magnetic tracker information is analyzed by another PC (a gestures analyzer) acting as a second client.

We had a twofold rationale for choosing the distributed application model: first, the desired system functionalities; second, the technical requirements during system development, namely the need to distribute the data processing and component development.
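The article describes the handheld client, the gestures analyzer, and the PC server exchanging events and remote method invocations over TCP/IP, but it doesn’t give a wire format. Purely as an illustrative sketch (the message type, field names, and length-prefixed framing below are assumptions, not the actual VHD++ protocol), a conducting update could be packed for the socket like this:

```cpp
// Hypothetical sketch: packing a conducting update that the PDA client might
// send to the VR server over TCP. Field names and layout are assumptions,
// not the protocol described in the article.
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

enum class MsgType : uint8_t { TempoChange = 1, DynamicsChange = 2, LayoutChange = 3 };

struct ConductingUpdate {
    MsgType type;
    uint8_t value;      // e.g., index into {largo, adagio, default, allegro, presto}
    float   timestamp;  // client-side time, useful for ordering on the server
};

// Serialize into a length-prefixed byte buffer suitable for a TCP stream.
std::vector<uint8_t> pack(const ConductingUpdate& m) {
    std::vector<uint8_t> buf(sizeof(uint32_t) + sizeof(ConductingUpdate));
    uint32_t payload = sizeof(ConductingUpdate);
    std::memcpy(buf.data(), &payload, sizeof(payload));
    std::memcpy(buf.data() + sizeof(payload), &m, sizeof(m));
    return buf;
}

int main() {
    ConductingUpdate msg{MsgType::TempoChange, 3 /* allegro */, 12.5f};
    std::vector<uint8_t> wire = pack(msg);
    std::printf("packed %zu bytes for the server\n", wire.size());
    // In the real system this buffer would be written to the TCP socket
    // bridging the handheld client and the PC server.
    return 0;
}
```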

3D sound rendering

The most important properties of a sound source from a computational point of view2,3 are the

❚ source’s 3D position and orientation in the world;

❚ cone of propagation;

❚ distance between the listener and source;

❚ Doppler effect;

❚ volume and frequency; and

❚ occlusion, obstruction, and exclusion.



When filtering the different sound sources based on these settings, the direction parameter is the most important property to consider; without it, the system’s rendering process is severely limited. In our implementation, we represent the sound propagation with a cone, which gives us the flexibility to specify different filters for each sound source (see Figure 2).
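The article specifies the cone model of Figure 2a but not its attenuation formula. The following is a minimal sketch under assumed conventions: full gain inside an inner angle, a reduced gain beyond an outer angle, and a linear blend across the transitional volume; the names and numbers are illustrative only.

```cpp
// Sketch of a propagation-cone gain (assumed formula, not the authors' filter).
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static float len(Vec3 a)         { return std::sqrt(dot(a, a)); }

// Gain in [outsideGain, 1] depending on the angle between the cone axis and
// the direction from the source to the listener.
float coneGain(Vec3 axis, Vec3 toListener,
               float innerAngleDeg, float outerAngleDeg, float outsideGain) {
    float cosA  = dot(axis, toListener) / (len(axis) * len(toListener));
    float angle = std::acos(cosA) * 180.0f / 3.14159265f;
    if (angle <= innerAngleDeg) return 1.0f;          // inside cone
    if (angle >= outerAngleDeg) return outsideGain;   // outside cone
    float t = (angle - innerAngleDeg) / (outerAngleDeg - innerAngleDeg);
    return 1.0f + t * (outsideGain - 1.0f);           // transitional volume
}

int main() {
    Vec3 axis{0, 0, 1};          // cone pointing along +z
    Vec3 listener{0.5f, 0, 1};   // slightly off-axis
    std::printf("gain = %.2f\n", coneGain(axis, listener, 20.0f, 60.0f, 0.25f));
    return 0;
}
```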

When obstacles exist between the listener and a sound source, we must calculate all attenuations due to occlusion, obstruction, and exclusion situations, depending on the 3D environment’s configuration. In general, obstructions occur when both the sound source and listener are in the same room. Exclusions occur when the rooms aren’t completely separated; otherwise, we have an occlusion.2

Figure 2. (a) Sound source propagation cone model. (b) Sound source attenuation due to reflections and reverberation models.
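As a rough sketch of that room-based distinction (the structure and flag names are assumptions, not the authors’ data model), the three cases could be separated like this:

```cpp
// Obstruction: source and listener share a room but an obstacle blocks the
// direct path. Exclusion: different rooms with a partial opening between
// them. Occlusion: different rooms that are completely separated.
#include <cstdio>

enum class Attenuation { None, Obstruction, Exclusion, Occlusion };

struct SourceListenerPair {
    int  sourceRoom;
    int  listenerRoom;
    bool directPathBlocked;   // an obstacle sits on the straight line
    bool roomsPartiallyOpen;  // e.g., a doorway connects the two rooms
};

Attenuation classify(const SourceListenerPair& p) {
    if (p.sourceRoom == p.listenerRoom)
        return p.directPathBlocked ? Attenuation::Obstruction : Attenuation::None;
    return p.roomsPartiallyOpen ? Attenuation::Exclusion : Attenuation::Occlusion;
}

int main() {
    SourceListenerPair p{0, 1, true, true};
    std::printf("case = %d (2 = exclusion)\n", static_cast<int>(classify(p)));
    return 0;
}
```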

3D environments



The complexity of 3D sound processing depends on the environment architecture. We set up some predefined environments for common places such as a church, bathroom, or auditorium. Designers can define the shape of their sound environments by making small modifications, for example by defining the room’s size.

From a behavioral point of view, the air absorption factor acts on the compression of the sound wave, which can distort the original sound source. Obstacles and walls influence the sound propagation and modify it by adding reverberations, reflections, and delays (see Figure 2). Moreover, to increase efficiency, we combine different components to give the system additional information. The rendering pipeline contains knowledge about the environment that we can extract from the binary space partition (BSP) tree4 or the Cosmo 3D scene graph (http://www.cai.com/cosmo/); our viewer is based on the latter library. This information can then be used by the physics or sound components for collision or obstacle detection, among other operations.
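To make the preset idea concrete, here is a hypothetical sketch of environment presets whose fields mirror Figure 2b (reflection delay, reverb delay, decay time) together with a room-size scaling hook; the parameter values are illustrative guesses, not the authors’ settings.

```cpp
// Assumed acoustic presets for the predefined environments named in the text.
#include <cstdio>
#include <map>
#include <string>

struct ReverbPreset {
    float reflectionsDelayMs;  // time to the first reflections
    float reverbDelayMs;       // time to the late reverb tail
    float decayTimeS;          // how long the tail takes to fade
};

std::map<std::string, ReverbPreset> makePresets() {
    return {
        {"church",     {30.0f, 60.0f, 6.0f}},   // large, very reverberant
        {"bathroom",   {5.0f,  10.0f, 1.2f}},   // small, hard surfaces
        {"auditorium", {20.0f, 40.0f, 2.5f}},   // the concert-hall default
    };
}

// Designers tweak a preset by scaling it with the room size, echoing the
// article's "defining the room's size" customization.
ReverbPreset scaleToRoom(ReverbPreset p, float roomScale) {
    p.reflectionsDelayMs *= roomScale;
    p.reverbDelayMs      *= roomScale;
    p.decayTimeS         *= roomScale;
    return p;
}

int main() {
    auto presets = makePresets();
    ReverbPreset hall = scaleToRoom(presets["auditorium"], 1.5f);
    std::printf("decay time: %.1f s\n", hall.decayTimeS);
    return 0;
}
```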

Managing complex environments

To compute all attenuations from the initial position and orientation of a sound source and the listener, we need to find the path followed by the wave as it interacts with obstacles and walls. For this, we use a customized version of the A* algorithm5 (http://www.ia-depot.com), which gives us the different points and angles for the objects (see Figure 3). Then we can calculate all attenuations and effects to be set for filtering the wave. However, due to time constraints, we can’t perform intensive computations for all sources, so our simplified version allows some errors but guarantees smooth performance. In other words, we don’t recompute every sound source at each animation frame; we check only whether any drastic change occurs between two consecutive steps. Otherwise, we compute extrapolations using the previous states of the sound source properties.
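A minimal sketch of that update policy follows, assuming a simplified per-source state and an arbitrary change threshold; the real system works on full 3D states with its customized A* search, for which the stub below only stands in.

```cpp
// Recompute a source only on drastic change; otherwise extrapolate.
#include <cmath>
#include <cstdio>

struct SourceState {
    float position;     // 1D for brevity; the real system is 3D
    float attenuation;  // last computed attenuation for this source
};

bool drasticChange(const SourceState& prev, const SourceState& curr,
                   float threshold = 0.5f) {
    return std::fabs(curr.position - prev.position) > threshold;
}

float expensivePathSearch(float position) {
    // Stand-in for the customized A* search over obstacles and walls.
    return 1.0f / (1.0f + std::fabs(position));
}

float updateAttenuation(const SourceState& prevPrev, const SourceState& prev,
                        SourceState& curr) {
    if (drasticChange(prev, curr)) {
        curr.attenuation = expensivePathSearch(curr.position);  // full recompute
    } else {
        // Cheap linear extrapolation from the two previous attenuation values.
        curr.attenuation = prev.attenuation + (prev.attenuation - prevPrev.attenuation);
    }
    return curr.attenuation;
}

int main() {
    SourceState a{0.0f, 1.0f}, b{0.1f, 0.95f}, c{0.15f, 0.0f};
    std::printf("attenuation = %.2f\n", updateAttenuation(a, b, c));
    return 0;
}
```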

Such operations are excellent candidates for execution on a multiprocessing architecture. To take advantage of particular systems such as PCs with dual or hyper-threading processors,6,7 we implemented our system using the OpenMP API (http://www.openmp.org), which increases the number of sound sources we can process in a single frame based on the number of logical CPUs available. Based on the system events, we analyze the data for controlling the human manager, which is responsible for synchronizing the sound processing with the human animation (see Figure 3).
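For illustration, a sound-source loop parallelized with OpenMP might look like the sketch below; the per-frame budget heuristic tied to omp_get_max_threads() is an assumption, not the authors’ exact ratio.

```cpp
// Distribute per-source filtering over the available logical CPUs with
// OpenMP. Compile with an OpenMP-enabled compiler (e.g., -fopenmp).
#include <cstdio>
#include <vector>
#include <omp.h>

struct SoundSource { float position; float attenuation; };

float computeFilter(const SoundSource& s) {
    // Stand-in for the per-source attenuation/effect computation.
    return 1.0f / (1.0f + s.position * s.position);
}

int main() {
    std::vector<SoundSource> sources(64, SoundSource{1.0f, 0.0f});

    // Assumed heuristic: process more sources per frame when more logical
    // CPUs are available.
    int budget = static_cast<int>(sources.size()) * omp_get_max_threads() / 4;
    if (budget > static_cast<int>(sources.size()))
        budget = static_cast<int>(sources.size());

    #pragma omp parallel for
    for (int i = 0; i < budget; ++i)
        sources[i].attenuation = computeFilter(sources[i]);

    std::printf("processed %d sources on %d threads\n",
                budget, omp_get_max_threads());
    return 0;
}
```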


Figure 3. Example of a sound path depending on external events that control the synchronization between the sound rendering and human animation.

Playing dynamic music

To play music that can be modified in real time, we must be able to record the same track with different transitions and then use this database to choose which sample needs to be played depending on the current events. In our current implementation, we rely on Gigabase (http://www.ispras.ru/~knizhnik/gigabase.html), an object-relational database management system. The idea was to provide a database table based on a bidimensional array. One axis represents the different ways we can play a single chunk of data (such as a modification in dynamics), and the second axis represents the time. This required establishing some connections between the visual aspects of the virtual musician and the sound rendering.8
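The sketch below shows the shape of such a bidimensional lookup, with a plain in-memory table standing in for the Gigabase store; the enum, class, and sample names are illustrative.

```cpp
// In-memory stand-in for the bidimensional sample table: one axis is the
// rendition of a chunk (here, its dynamics), the other is time.
#include <cstdio>
#include <string>
#include <vector>

enum Dynamics { Pianissimo, Piano, Default, Forte, Fortissimo, DynamicsCount };

class SampleTable {
public:
    explicit SampleTable(int chunks)
        : table_(DynamicsCount, std::vector<std::string>(chunks)) {}

    void set(Dynamics d, int chunk, std::string sampleId) {
        table_[d][chunk] = std::move(sampleId);
    }

    // Pick the sample for the current time slot, given the dynamics the
    // conductor is currently asking for.
    const std::string& pick(Dynamics d, int chunk) const {
        return table_[d][chunk];
    }

private:
    std::vector<std::vector<std::string>> table_;  // [rendition][time]
};

int main() {
    SampleTable score(2);
    score.set(Piano, 0, "bar01_piano.wav");
    score.set(Forte, 0, "bar01_forte.wav");
    std::printf("playing %s\n", score.pick(Forte, 0).c_str());
    return 0;
}
```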

Of course, this approach depends on the quality and quantity of the primitive sound samples. As we want to extend the database without any new recording session, we used MIDI files. Thus, we can manipulate them directly within the soundboard for greater efficiency (see http://www.microsoft.com/directx).

Synchronizing animation with sound propagation

Real-time systems must react fast enough to events to maintain immersion. This is particularly true with sound rendering, which is deeply affected by any processing delay. In the case of the virtual orchestra, we need to synchronize the animation of the musicians with the corresponding sound. Due to real-time restrictions, it’s impossible to predict state transitions, and thus we can’t prepare the right keyframe at a certain time. Carefully managing sound and animation databases is crucial to reducing latency.9

The processing is based on Joaquim Esmerado’s work.8 We extended his work by adding more flexibility and independence to the musician attributes while relying on current hardware.

Our system relies on preordering data using statistical information based on Markov chains. Researchers have done similar work in persistent worlds,10 but they need to quickly transfer small amounts of information through a network from a complex database. In our case, we store the data locally but must provide close to instantaneous access to the current state, depending on external events. This involves managing a fast-access, multiple-level database. In our implementation, the high level consists of direct interaction with the real database image by making requests to the hard drive that stores the data. However, as soon as the user selects a musical score, we can load the full table into main system memory.

Previous work11 has demonstrated that a latency of 50 ms or less is adequate for human beings, and 100 ms is the absolute limit for sound propagation. These constraints prevented us from modifying the score directly. Because we’re working with a fixed music score, we approached the problem from the opposite side.12 Instead of trying to follow the animation with the sound processing, we used the sound information to select and blend the right animation. A selected note within a music score triggers the events allowing the manipulation of the different bones of a specified musician’s skeleton (see H-Anim, http://www.h-anim.org). Thus, because the system is driven by sound events (see Figure 3), we can avoid delays in the sound processing. This approach degrades the animation’s smoothness a bit, which is an acceptable loss, because human senses are more affected by sound distortion than by visual artifacts.
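A minimal sketch of that event flow, with assumed callback names: note events coming from the score drive the sound playback immediately and, through the same dispatch, the corresponding bone animation.

```cpp
// Sound-event-driven flow: audio is never waiting on the renderer.
#include <cstdio>
#include <functional>
#include <vector>

struct NoteEvent { int pitch; float velocity; float time; };

class NoteDispatcher {
public:
    using Handler = std::function<void(const NoteEvent&)>;

    void subscribe(Handler h) { handlers_.push_back(std::move(h)); }

    // Called by the sound engine at the moment a note is played.
    void emit(const NoteEvent& e) {
        for (auto& h : handlers_) h(e);
    }

private:
    std::vector<Handler> handlers_;
};

int main() {
    NoteDispatcher dispatcher;

    // The sound component plays the sample immediately...
    dispatcher.subscribe([](const NoteEvent& e) {
        std::printf("audio: play pitch %d at %.2f s\n", e.pitch, e.time);
    });
    // ...and the animation component blends toward the matching keyframe,
    // possibly slightly late, which the article accepts as a small visual cost.
    dispatcher.subscribe([](const NoteEvent& e) {
        std::printf("animation: move flutist's fingers for pitch %d\n", e.pitch);
    });

    dispatcher.emit({72, 0.8f, 1.5f});
    return 0;
}
```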

Case study

In our case study, we focused on the data management required to maintain the synchronization of sound and animation, which is critical when providing a believable simulation of a virtual orchestra. Figure 4 shows our application running.

Our interface has two input channels: the 2D GUI on the PDA screen and a magnetic tracker attached to it (see Figure 5). From the 2D GUI, users have a simplified view of the scene where they can edit each musician’s position and orientation. If the user modifies the conductor’s position and orientation, the scene’s point of view on the projection screen is modified to match the virtual conductor’s new position and orientation. The corresponding modifications in the sound environment are computed and applied by the internal sound component unit.


Figure 4. Autonomous flutist and the system installation with a large projection screen and 5.1 Dolby digital speakers.

Additionally, the PDA lets users play and pause the execution as well as select the scorebook the orchestra will interpret.

The orchestra direction window is one of two alternatives available for modifying the performance’s tempo (largo, adagio, default, allegro, presto) and dynamics (pianissimo, piano, default, forte, fortissimo). The second method uses gesture tracking. The user can choose to work with the sliders or, in the case of the tempo, tap rhythmically with the stylus on the conducting window. The tapping frequency is mapped to one of the available tempo options. The application running on the PDA performs this calculation. If the user stops tapping, the last tempo detected is applied and the PDA sends the information to the main system.
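As an illustration of the tap-to-tempo mapping (the BPM boundaries are invented for the sketch, not taken from the article), the PDA-side calculation could look like this:

```cpp
// Map stylus-tap timing to one of the five tempo options.
#include <cstdio>
#include <string>
#include <vector>

std::string tempoFromTaps(const std::vector<float>& tapTimesSec) {
    if (tapTimesSec.size() < 2) return "default";
    // Average interval between consecutive taps -> beats per minute.
    float interval = (tapTimesSec.back() - tapTimesSec.front()) /
                     (tapTimesSec.size() - 1);
    float bpm = 60.0f / interval;
    if (bpm < 60)  return "largo";
    if (bpm < 80)  return "adagio";
    if (bpm < 110) return "default";
    if (bpm < 140) return "allegro";
    return "presto";
}

int main() {
    // Taps roughly every 0.45 s -> about 132 BPM -> allegro.
    std::vector<float> taps = {0.0f, 0.45f, 0.92f, 1.36f};
    std::printf("tempo: %s\n", tempoFromTaps(taps).c_str());
    return 0;
}
```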

The GUI-based tempo and dynamics control doesn’t allow for interpreting the user’s emotions expressed through arm gestures, the typical conducting gestures. To overcome this problem, we attached a magnetic tracker to the PDA. The magnetic sensor lets us acquire the amplitude and frequency of the gestures performed while the user holds the handheld, computed as a function of the rotation angles measured by the tracker to represent its orientation. The frequency is mapped to one of the possible values for the tempo, while the gesture’s average amplitude affects the dynamics (see Figure 5).

We calculate the orientation values’ amplitude and frequency in real time. The orientation values measured by the magnetic tracker are divided into five regions corresponding to the possible values for the dynamics; the frequency is calculated as a function of the gesture’s acceleration. The tracker (Ascension Flock of Birds; http://www.ascension-tech.com) is sampled at 60 Hz over a TCP connection. The information is processed by a program running on a dedicated PC (the gestures analyzer) that sends the current tempo and dynamics values through the network (TCP sockets).

Figure 5. (a) The handheld interface and (b) the interpretation of the data from the magnetic tracker. The blue dots represent each musician’s position, and the green dot shows the conductor.
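The sketch below illustrates how the gestures analyzer might map tracker data to the five dynamics regions and the tempo options; the thresholds, region boundaries, and the synthetic 60 Hz samples are assumptions.

```cpp
// Map gesture amplitude to one of five dynamics regions and gesture
// frequency to a tempo option.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

const char* dynamicsFromAmplitude(float amplitudeDeg) {
    // Five assumed regions over the measured orientation range.
    if (amplitudeDeg < 10) return "pianissimo";
    if (amplitudeDeg < 25) return "piano";
    if (amplitudeDeg < 45) return "default";
    if (amplitudeDeg < 70) return "forte";
    return "fortissimo";
}

const char* tempoFromFrequency(float beatsPerSecond) {
    if (beatsPerSecond < 1.0f) return "largo";
    if (beatsPerSecond < 1.4f) return "adagio";
    if (beatsPerSecond < 1.9f) return "default";
    if (beatsPerSecond < 2.4f) return "allegro";
    return "presto";
}

int main() {
    // Fake one second of 60 Hz pitch-angle samples: a 2 Hz swing of +/-30 deg.
    std::vector<float> pitchDeg;
    for (int i = 0; i < 60; ++i)
        pitchDeg.push_back(30.0f * std::sin(2.0f * 3.14159f * 2.0f * i / 60.0f));

    float amplitude = 0.0f;
    for (float a : pitchDeg) amplitude = std::max(amplitude, std::fabs(a));

    std::printf("dynamics: %s, tempo: %s\n",
                dynamicsFromAmplitude(amplitude), tempoFromFrequency(2.0f));
    return 0;
}
```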

Conclusions

We’re still at the beginning of the research phase, in particular concerning the methods to better acquire the user’s expressivity in a simple, noncumbersome way. In the future, we plan to perform a formal evaluation of the efficiency and user acceptance of the user interface and the immersion the system achieves. We must also carry out further research concerning the synthesis of autonomous musicians, the synchronization of music and animation, and the naturalness of the gestures performed by the virtual characters; our virtual musicians still perform robotic-type movements.

However, we can affirm that combining 3D rendering with high-quality 3D sound and handheld interface devices in VR applications produces an entertaining multimedia experience. It opens possibilities for novel interaction paradigms and other research areas.

References
1. M. Ponder et al., “VHD++ Development Framework: Towards Extendible Component Based VR/AR Simulation Engine Featuring Advanced Virtual Character Technologies,” Proc. Computer Graphics Int’l (CGI), 2003.
2. R. Altman, ed., Sound Theory Sound Practice, Routledge, 1992.
3. J.R. Boer, Game Audio Programming, Charles River Media, 2002.
4. T. Akenine-Möller and E. Haines, Real-Time Rendering, A.K. Peters, 2002.
5. S. Rabin, AI Game Programming Wisdom, Charles River Media, 2002.
6. R. Gerber, Advanced OpenMP Programming, Intel, 2003.
7. A. Binstock, “Multithreading, Hyper-Threading, Multiprocessing: Now, What’s the Difference?,” Pacific Data Works, Intel Developer Services, 2003.
8. J. Esmerado, A Model of Interaction between Virtual Humans and Objects: Application to Virtual Musicians, PhD thesis, No. 2502, EPFL, 2001.
9. Y. Arafa et al., “Two Approaches to Scripting Character Animation,” Proc. Workshop Embodied Conversational Agents: Let’s Specify and Evaluate Them!, AAMAS, 2002, p. 35.
10. K. Wilson, Game Object Structure, GDC, 2003.
11. M. O’Donnel, Producing Audio for Halo, MS Press, 2002.
12. D. Sonnenschein, Sound Design, Michael Wiese Productions, 2001.


Sebastien Schertenleib is a research assistant and PhD student in the VRlab at EPFL. His research interests include VR system architectures for continuous VEs. He is also involved in reconstructing ancient heritage through augmented reality simulations. He has an MS in computer science from the Swiss Federal Institute of Technology in Lausanne.

Mario Gutierrez is a research assistant and PhD student in the VRlab at EPFL. His research focuses on Internet and mobile applications, interaction techniques for VEs, and autonomous animation of virtual humans. He has an MS in computer science from the Monterrey Institute of Technology, Toluca Campus, Mexico.

Frederic Vexo is a senior researcher and project leader of the VRlab at EPFL. His research interests include virtual reality, human–computer interaction, virtual music performance, and standards for multimedia content representation. He has a PhD from Reims University.

Daniel Thalmann is a professor and director of the VRlab at EPFL. His research interests include virtual humans, computer animation, and networked VEs. He has a PhD in computer science from the University of Geneva and an Honorary Doctorate from University Paul-Sabatier in France.

Contact the authors at {Sebastien.Schertenleib, Mario.Gutierrez, Frederic.Vexo, Daniel.Thalmann}@epfl.ch.
