How gesture recognition can be implemented as an aid for electroacoustic composition:
with a specific focus on the Leap Motion Device
Jonathan Higgins
Submitted in partial fulfilment of the requirements for the degree of BMus
Department of Music
University of Sheffield
England
August 2015
Acknowledgements
Firstly I would like to thank my dissertation supervisor Adrian Moore for all the excellent help and
advice he has provided whilst I have been working on this dissertation (as well as most of my
other work). The research areas that he suggested have significantly shaped the direction of this
project and I am extremely grateful for all of his help.
I would also like to thank my partner Mabel, for her help, patience and cups of tea; my friends Alex
and Jay, for their help with troubleshooting bugs in the jh.leap tools; and finally, my parents, for
their support and regular gifts of Domino's pizza.
Contents
1 Introduction 4
1.1 Gesture Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Input and Computer Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Musical Applications of Gesture Recognition 7
2.1 Early Applications of Gesture Recognition . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Contemporary Applications of Gesture Recognition . . . . . . . . . . . . . . . . . . 9
2.2.1 The Wiimote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 The Kinect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3 The Leap Motion Device 12
3.1 Construction and Vision Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Musical Applications of the Leap Motion Device . . . . . . . . . . . . . . . . . . . . 15
4 jh.leap 17
4.1 The Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.1.1 jh.leap main . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.1.2 jh.leap sample player . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.1.3 jh.leap reverb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.1.4 jh.leap tremolo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.1.5 jh.leap pan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 The Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5 Conclusion 23
Bibliography 25
Chapter 1
Introduction
There are countless peripherals available to musicians that offer methods of human-computer inter-
action beyond the mouse and keyboard; however, few of these are available to consumers and fewer
still attain widespread adoption (Collins 2010, p. 198). As well as peripherals specifically designed
for musical purposes, such as MIDI keyboards and drum pads, many other peripherals either offer
a more general method of interaction, for example the Leap Motion Device (Garber 2013), or are
repurposed for use within a musical environment; examples of this include the PlayStation Move
(Pearse 2011) and the Wacom Graphics Tablet (Moore 2008).
Emmerson (2000, p. 209) explains that approaches to human-computer interaction within musical
applications can be placed into two main groups: “devices which track and measure physical ac-
tion”; and devices “which analyse the sound produced in performance [...] for the control of sound
production or processing”. The first of these groups has been extensively implemented since the
first developments in electronic music (ibid., p. 209). Examples of this vary from the simple MIDI
keyboard through to complex data gloves (Fischman 2013). The second of Emmerson’s groups
requires significantly higher processing power and more complex software in order to be successful.
Because of this its use is significantly less widespread; one example of this method of interaction is
the OMax system developed at IRCAM (Assayag, Bloch, and Chemillier 2006).
1.1 Gesture Recognition
Gesture recognition is a method of human-computer interaction which fits into the first of Em-
merson’s approaches to interaction. As the name suggests, gestural recognition utilises sensors to
identify and track the gestures and motions of the user. These can be full body movements or
be limited to movements of a hand or finger. For this paper I will be focusing on hand gesture
recognition. A gesture can be defined as a movement that imparts meaning; gestures differ from purely
functional movements in that they also carry information. As Badi and Hussein (2014, p. 871) explain,
the motion of steering a car and the motion of describing a circular object with your hands are extremely similar.
Steering a car serves only to fulfil a functional purpose whereas motions describing a circular object
contain information about the size of the object.
The variety of gestures the human hand can create is vast; these gestures can be sorted into four main
categories: conversational gestures (gestures which function alongside speech), controlling gestures
(gestures such as pointing or the orientation of hands in a 3D space), manipulative gestures (gestures
which interact with real or virtual objects) and communicative gestures (such as sign language) (Wu
and Huang 1999). Musical applications of gestural recognition usually focus on utilising controlling
and manipulative gestures as these are both the easiest to detect and the most consistent between
users; the applications of pairing these two groups of gestures have been widely explored and are
increasingly expanding (Al-Rajab 2008, p. 12). By being less user dependent than conversational
and communicative gestures, controlling and manipulative gestures are not as prone to variation and
co-articulation, issues that often make gestural recognition difficult (Dix et al. 2004, p. 383).
Dix et al. (ibid., p. 88) suggested in 2004 that the “rich multi-dimensional input” provided by
gestural recognition devices was a solution in search of a problem, as most users do not require such
a comprehensive form of data input and those that do cannot afford it. However, with advances
in technology, consumer-grade electronics capable of detailed gesture recognition are becoming
increasingly affordable and widespread. Because of this, software designers are beginning to
take advantage of the new levels of interaction available to them (Zeng 2012, p. 4).
1.2 Input and Computer Vision
There are two main hardware approaches to relaying information about hand movement to a com-
puter. The first utilises a peripheral that is worn on the hand called a dataglove. There are several
approaches to the construction of a dataglove and each approach employs different sensors to detect
motion. The most common method uses fibre-optic cables attached to the fingers of a lycra glove.
When a finger bends, light leaks from the bend in the fibre-optic cable; the glove detects these
fluctuations in light intensity and relays this information back to the computer. The computer uses
this information to map the movement of the hand (one example application of this approach is the
Lady’s Glove developed for Laetitia Sonami (Bongers 2000, p. 482)) (Dix et al. 2004, p. 88). The
second approach uses computer vision.
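The fibre-optic sensing principle described above can be illustrated with a toy calibration. This is only a sketch: the linear intensity-to-bend mapping and the calibration values are assumptions for illustration, not the behaviour of any real glove.

```python
def bend_angle(intensity, straight=1.0, full_bend=0.35, max_angle=90.0):
    """Estimate a finger's bend angle (degrees) from received light intensity.

    Assumes, purely for illustration, that intensity falls linearly from
    `straight` (finger flat) to `full_bend` (finger fully curled).
    """
    frac = (straight - intensity) / (straight - full_bend)
    frac = min(max(frac, 0.0), 1.0)  # clamp to the calibrated range
    return frac * max_angle
```

A real glove would calibrate the flat and fully-bent intensity values per finger and per user before mapping readings to joint angles.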
Computer vision is the “acquisition and processing of visual information” by a computer; this infor-
mation can then be used by the computer for a variety of different applications (Badi and Hussein
2014, p. 876). Approaches to detecting hands through computer vision are numerous. They can
vary from obvious methods such as detecting infrared, through to the more obscure, such as detect-
ing which pixels of a video-feed are skin by analysing the colour of each pixel (Forsyth and Ponce
2003, p. 591). As well as the ability to detect hands, computer vision also allows us to track the
hand’s movements in space. Depth can be tracked in a variety of ways. The most common method
uses the same process our eyes and brain use to detect depth. Using two cameras mounted next to
each other, the computer can compare the two image streams. By knowing the physical relationship
between the two cameras (i.e. how far apart they are), the computer can analyse the difference in
the two images to create a strong sense of depth; this method is known as stereoscopic vision (or
simply stereo vision) (Forsyth and Ponce 2003, p. 321). Another method of measuring depth is an
approach called structured light sensing. Structured light sensors project a known pattern onto an
unknown surface; by analysing the deformation of the pattern the sensor is able to determine the
three dimensional shape of the unknown surface (Weichert et al. 2013, p. 6381).
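The stereo vision principle reduces to a simple relation for rectified cameras: depth is inversely proportional to the disparity between the two images. The following is a minimal sketch under a pinhole-camera assumption; the numbers in the test values are illustrative, not taken from any real device.

```python
def stereo_depth(focal_px, baseline_mm, x_left_px, x_right_px):
    """Depth from the horizontal disparity between two rectified images:
    Z = f * B / d, where f is the focal length in pixels, B the distance
    between the two cameras, and d the disparity in pixels."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("point must appear further left in the left image")
    return focal_px * baseline_mm / disparity
```

The further away a point is, the smaller the disparity, which is why stereo systems lose precision with distance.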
Chapter 2
Musical Applications of Gesture
Recognition
In this chapter I will be examining previous applications of gesture recognition within a musical
environment. I will be looking at how approaches to gestural input and the technology required to
successfully carry it out have developed over the last century and will present two case studies of
contemporary applications of gesture recognition.
2.1 Early Applications of Gesture Recognition
Although often considered cutting edge technology, gesture recognition has a rich history within electronic music; the Theremin, patented in 1928, is the earliest example of gesture being used as a means of interaction within electronic music (Bongers 2000, p. 481). The Theremin works using a pair of slightly detuned oscillators that broadcast extremely high frequencies (>1,000,000 Hz) over a radio antenna (Theremin and Petrishev 1996, p. 50). Moving a hand nearer to the antenna causes a change in capacitance as the hand and performer ground some of the broadcast signal. The Theremin measures these changes and uses them to control pitch and volume (ibid., p. 50). Whilst developing his device Leon Theremin stated:
“I believe that the movement of the hand in space, unencumbered by pressing a string or bow, is capable of performing in an extremely delicate manner. If the instrument were able to produce sounds by responding readily to the free movement of the hands in space, it would have an advantage over traditional instruments.” (ibid., pp. 49-50)
This approach to instrument design was unique for its time and inspires many contemporary developments
in gestural control. Although basic in its ability to recognise gesture, the system was nonetheless
revolutionary, both for being the first implementation of gestural control and for being a
proof of concept for the worth of gesture recognition.
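The heterodyning principle behind the Theremin can be sketched numerically: the hand's capacitance detunes one of two LC oscillators, and the audible pitch is the difference between them. The component values used in testing are invented for illustration; f = 1/(2π√(LC)) is the standard LC resonance formula.

```python
import math

def oscillator_freq(inductance, capacitance):
    """Resonant frequency of an LC oscillator: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))

def audible_pitch(f_fixed, inductance, c_circuit, c_hand):
    """Heterodyne output: the hand adds capacitance in parallel with the
    variable oscillator's tank circuit, lowering its frequency; the audible
    pitch is the difference between the two oscillator frequencies."""
    f_variable = oscillator_freq(inductance, c_circuit + c_hand)
    return abs(f_fixed - f_variable)
```

Because both oscillators run above 1 MHz, even the tiny capacitance of an approaching hand shifts the difference tone across the whole audible range.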
After the release of the Theremin in the 1920s it wasn’t until the late 1970s that gesture recognition
began to be explored further. Development of the Sayre Glove in 1979 kick-started the field of gesture
recognition as a means of human-computer interaction (Sturman and Zelter 1994, p. 32). As this field
began to expand, musicians inevitably began to adopt and develop gesture recognition technology for
their own means, producing pioneering research in computer music and human-computer interaction
(for example: Buxton et al. 1979). Michel Waisvisz’s The Hands was one of a number of dataglove
systems designed specifically for musical applications in the 1980s (Roads 1996, pp. 630-635).
Developed at STEIM, The Hands was the result of Waisvisz’s experiments with instrument design for
the improved performance of electronic music (Bongers 2000, p. 482). The Hands contained several
different methods of interaction, some were common in many non-wearable musical peripherals -
such as buttons and pressure sensors - however, The Hands expanded on this existing technology by
incorporating mercury tilt switches and ultrasonic sensors to enable aspects of gesture recognition
(Bongers 2007, pp. 11-12). Other dataglove devices that were developed for musical applications in
the 1980s and early 1990s include the Lady’s Glove (discussed in section 1.2) and modified versions
of the Mattel Power Glove, a dataglove released as a game controller for the Nintendo Entertainment
System (ibid., pp. 12-13).
Alongside the developments in dataglove technology, during the 1980s and 1990s musicians were also
looking at alternative input methods for gesture recognition. One of these methods was a peripheral
called an electronic conductor’s baton. Several different electronic conductor’s batons were produced
during this time, employing a variety of different sensors to detect the motions of a conductor. These
batons allowed real time control of a synthesised or acousmatic composition (Roads 1996, p. 654).
Some batons like the various MIDI Batons developed at Queen’s University, Canada, employed
hardware sensors like accelerometers within the baton to detect the movements of the conductor.
This information could then be relayed to a computer for analysis and detection of gestural content
(Keane and Gross 1989; Keane and Wood 1991). Another method for tracking the baton employed
rudimentary computer vision systems that allowed the computer to watch the gestures produced by
the conductor. Many of the computer vision systems utilised cameras fitted with infrared filters to
track an infrared light on the end of the baton (Morita, Hashimoto, and Ohteru 1991, p. 47; Marrin
et al. 1999). By pairing the baton with a dataglove, the system developed by Morita, Hashimoto, and
Ohteru (1991) allowed the conductor to relay further information to the computer, such as changes
in dynamic. Conducting falls into the category of communicative gestures (more: section 1.1). Like
spoken language, communicative gestures are user dependent; as voice recognition software often
struggles to decipher various accents, so too does gesture recognition software struggle to decipher
each user’s ‘gestural accent’. Many of these batons were hampered by their inability to work precisely
when changing between conductors, and because of this many systems faced a trade-off between
stability and sensitivity (Keane and Gross 1989, p. 153; Morita, Hashimoto, and Ohteru 1991,
p. 52). This inconsistency, coupled with the expense of producing a baton, is possibly why these
electronic batons have become less popular in recent years.
Advancements in computer vision in the 1980s and 1990s were beginning to allow musicians to interact
with computers without the need to hold or wear peripherals. The Infrared-based MIDI Event
Generator designed by Genovese et al. (1991) was a gestural controller that operated using infrared
to track objects moving in space above the controller. As we will see in chapter 3, this device
utilises a similar hardware approach to computer vision as the Leap Motion device. By using four
groups of infrared transmitters and receivers mounted in a 25cm square, the Infrared-based MIDI
Event Generator could track movements in a pyramid space above the device (Genovese et al. 1991,
pp. 2-3). By transmitting infrared light in a pyramid, any object that moved into the pyramid would
reflect the infrared light back towards the device. These reflections were then picked up by the
infrared receivers, which allowed the device to ‘see’ (ibid., p. 3). This data was then analysed and
output as MIDI allowing the device to control any MIDI capable hardware (ibid., pp. 5-6). Another
early approach to computer vision used sonar. By utilising ultrasonic signals, Chabot (1990, p. 20)
was able to create a live electronic music performance system capable of computer-vision-based
gesture recognition. Interestingly, Chabot (ibid., p. 27) emphasised the importance of good software
to accompany new hardware. He states: “We have seen too many quick hacks jeopardizing the
use of great hardware gesture controllers”. This emphasis on software is missing from many papers
written on musical human-computer interaction in this era. As we move into the next section we will
explore applications of consumer-grade electronics and how innovation has begun to shift towards
software developments.
2.2 Contemporary Applications of Gesture Recognition
With gesture recognition technology becoming more affordable and widespread, applications of ges-
ture recognition are gradually becoming increasingly popular amongst musicians. To see this, one
need only look at the number of papers submitted to NIME (New Interfaces for Musical Expres-
sion) on gesture recognition in recent years (NIME 2015). Although extensive research into the
development of new gestural devices specifically for musical applications still exists, there has been
a significant shift in recent years towards utilising consumer grade gesture recognition devices for
musical applications. This has been prompted by the widespread adoption of gesture recognition
within console gaming as a means of interaction through peripherals such as the Wiimote and Kinect
(Collins 2010, p. 197).
2.2.1 The Wiimote
The Nintendo Wii games console was released in 2006 and to date has sold over 101 million units
worldwide (Nintendo 2015, p. 2). The controller that came with the Nintendo Wii (the Wiimote)
was a wireless hand-held device that tracked its position in space, as well as acceleration on the X,
Y, and Z axes, and the controller’s rotation, pitch and yaw (Wingrave et al. 2010, p. 72). One of
the two primary sensors that the Wiimote employs utilises a separate sensor bar which contains a
number of infrared LEDs at fixed widths (ibid., p. 74). The Wiimote contains an infrared camera
which when pointed at the sensor bar allows the Wiimote to calculate its relative position in space
from the sensor bar (ibid., p. 74). The second primary sensor is a three axis accelerometer similar
to those found in mobile phones (ibid., p. 75). The Wiimote is not without its drawbacks, however.
By utilising an accelerometer rather than a gyroscope, the controller has no gravitational bearing
and because of this its measurements are notoriously approximate (Pearse 2011, p. 126).
One of the most popular applications of the Wiimote is to trigger sound files like a virtual drumstick
(Collins 2010, p. 197). By utilising a peak detection algorithm on the output of the accelerometer it
is possible to determine when a drumming-like motion has been made (Kiefer, Collins, and Fitzpatrick
2008). By utilising both the sensor bar and the Wiimote, Miller and Hammond (2010) were able to
create a virtual instrument that mimicked the violin or cello. The performer plays the instrument
by pressing buttons on the Wiimote to determine finger positions and uses the sensor bar as a
bow (ibid.). This approach was interesting as it required the performer to hold the Wiimote still
and to move the sensor bar rather than the intended configuration where the sensor bar stays in a
fixed position (ibid., p. 497). Peng and Gerhard (2009) used the Wiimote to create an electronic
conductor’s baton by attaching an infrared LED to the end of a conventional conductor’s baton.
Using the infrared camera in the Wiimote to track the motion of the baton they were able to create
a low cost alternative to other computer-based conducting systems. This system works using the
same principle as the batons developed by Morita, Hashimoto, and Ohteru (1991) and Marrin et al.
(1999) which were discussed in section 2.1.
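The peak-detection step described above can be approximated by a very small algorithm: flag samples that exceed a threshold and are also local maxima of the acceleration magnitude. This is a generic sketch, not the implementation used by Kiefer, Collins, and Fitzpatrick; the threshold value is illustrative.

```python
def detect_hits(accel, threshold=2.0):
    """Return the indices of drum 'hits' in a stream of acceleration
    magnitudes: samples above `threshold` that are also local maxima."""
    hits = []
    for i in range(1, len(accel) - 1):
        if accel[i] >= threshold and accel[i] > accel[i - 1] and accel[i] >= accel[i + 1]:
            hits.append(i)
    return hits
```

Each returned index would trigger a sample; a real implementation would also debounce the output so that a single strike cannot fire twice.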
The Wiimote introduced many people to gesture recognition and its adoption by consumers has
had a profound effect on the field of human-computer interaction (Wingrave et al. 2010, p. 71).
However, the Wiimote as a musical controller is limited by its hardware and the approximate data
that it outputs. This means that very few useful applications have been developed for it. Rapid
advancements in gesture recognition technology since the Wiimote’s release mean that consumers
have access to more accurate hardware, such as the Kinect. Because of this the Wiimote has been
all but forgotten by most musicians in the last few years.
2.2.2 The Kinect
Another game controller which allows gestural input is the Kinect sensor. Released by Microsoft in
2010, the Kinect was designed for use with the Xbox 360 and Xbox One games consoles. By 2013
Microsoft had sold in excess of 24 million Kinect sensors (Microsoft 2013). The Kinect features a
full RGB camera as well as an infrared projector, infrared camera and a four-microphone array (Zeng
2012, p. 4). To detect depth the Kinect sensor utilises structured light sensing (Weichert et al.
2013, p. 6381). By projecting a constant pattern of infrared laser dots onto a scene and recording
the results with the infrared camera, the device is able to compare the resulting pattern of dots to
a reference pattern. By analysing the displacement between the two images the Kinect device can
use this information to estimate depth (Khoshelham and Elberink 2012, p. 1438). Using either the
drivers provided by the OpenKinect project or the Kinect SDK it is possible to connect the Kinect
sensor to a computer and utilise its data output for functions other than gaming (OpenKinect 2015;
Microsoft 2015).
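The depth estimation described above can be written as a single relation between a dot's measured displacement and its depth. The formula below follows Khoshelham and Elberink's derivation as I read it; treat it as a sketch, and the parameter values in the example as placeholders rather than calibrated Kinect constants.

```python
def kinect_depth(z_ref_mm, focal_px, baseline_mm, disparity_px):
    """Depth of a projected dot from its displacement against the
    reference pattern: Z = Z0 / (1 + (Z0 / (f * b)) * d), where Z0 is the
    reference-plane distance, f the IR camera focal length, b the
    projector-camera baseline and d the measured disparity."""
    return z_ref_mm / (1.0 + (z_ref_mm / (focal_px * baseline_mm)) * disparity_px)
```

A dot with zero displacement lies on the reference plane; the larger the displacement, the further the surface deviates from it.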
Since its release there have been many projects that utilised the Kinect sensor for gesture controlled
musical applications. One such project is the Crossole system developed by Senturk et al. (2012).
Crossole is a meta-instrument which utilises gesture recognition as its main method of interaction. A
meta-instrument implements a “one-to-many mapping between a musician’s gestures and the sound
so that a musician may perform music in a high level instead of playing note by note” (ibid., p. 1).
Using the Kinect’s depth data the system tracks the position of both the user’s hands. The user can
then perform a variety of controlling gestures (pointing, swiping) and manipulative gestures (drag
and drop) (ibid., p. 3). These gestures allow the user to control a myriad of parameters including
tempo and dynamic, as well as controlling melody sequencing and chord changes (ibid., pp. 2-3).
Another musical application of the Kinect sensor is the USSS-Kinect environment and the USSS-
KDiffuse tool designed by Pearse (2011). The USSS-Kinect environment allows the user to place
virtual spheres that can be touched, intersected or moved to control a musical parameter. What
this controls can be defined by the user as the data is sent out via Open Sound Control (OSC), a
data transfer protocol with advantages over MIDI control messages (ibid., p. 128). OSC employs
networking technology to transfer data for sound control between devices; unlike MIDI the data
values and naming scheme are user defined allowing for complete control over the output (Wright
2002). One example application of how the USSS-Kinect environment can be employed is the
USSS-KDiffuse. A four-by-eight matrix of spheres in the USSS-Kinect environment controls a four-
in eight-out diffusion system: each row of the matrix represents a sound source and
each column represents a different speaker routing (Pearse 2011, pp. 128-129). The system allows
the user to interact with diffusion in a unique way and produce spatial effects that would have been
extremely difficult, if not impossible, with traditional diffusion setups.
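OSC's encoding is simple enough to construct by hand, which illustrates why its named, user-defined addresses are more flexible than MIDI's fixed controller numbers. The sketch below packs a single-float OSC 1.0 message; the address `/pan` is an arbitrary example, not an address used by USSS-KDiffuse.

```python
import struct

def osc_message(address, value):
    """Pack a one-float OSC message: the address pattern and the type-tag
    string (",f") are NUL-terminated and padded to 4-byte boundaries, then
    the float is appended as big-endian IEEE 754 (OSC 1.0 encoding)."""
    def padded(s):
        b = s.encode("ascii") + b"\x00"
        return b + b"\x00" * (-len(b) % 4)
    return padded(address) + padded(",f") + struct.pack(">f", value)
```

The resulting bytes would normally be sent over UDP; the receiver routes the message purely by its address string, so both ends are free to agree on any naming scheme.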
The Kinect sensor allows for significantly more precise input than the Wiimote but costs nearly four
times as much. The wealth of depth information output by the Kinect allows for detailed interaction
that before its release was unheard of in consumer grade electronics. The Kinect’s skeletal tracking
is limited and is unable to track fingers. The release of the v2 Kinect and SDK in late 2014
introduced the ability to track more joints, including thumbs, which allows for more detailed methods
of gestural interaction. It will be interesting to see how in the coming years musicians utilise these
new features.
Chapter 3
The Leap Motion Device
In the past decade console gaming has caused a significant rise in the popularity of gesture recogni-
tion. Because of this, manufacturers have begun to produce gesture recognition hardware designed
specifically for interfacing with computers. Released in July 2013, the Leap Motion was one of the
first of these peripherals to come to market. In this chapter I will be looking at how the Leap Motion
device functions and how it has been utilised in musical applications.
3.1 Construction and Vision Processes
Figure 3.1: Construction of the Leap Motion Device (Weichert et al. 2013, p. 6382)
There is no official documentation on how the Leap Motion functions. However, by analysing the
construction of the device and patents filed by Leap Motion, Inc. and David Holz (the original
inventor of the Leap Motion device), we can theorise on how the Leap Motion operates. As can be
seen from figure 3.1 the Leap Motion’s hardware is surprisingly simple, formed only of three infrared
LEDs and a pair of infrared cameras. The peripheral is a little larger than a matchbox and is
designed to sit in front of a computer. It connects to the computer via a micro-USB 3.0 cable. The
Leap Motion’s effective range is approximately 25 to 600 millimetres above the device and the field
of view is an inverted pyramid centred on the device (Guna et al. 2014, p. 3705). It is probable that
the Leap Motion utilises edge detection to distinguish between objects and their background. Edge
detection is prone to inaccuracies due to changes in lighting conditions. The three infrared LEDs
likely serve to remedy this as they can be used to increase the contrast of the image as outlined by
D. Holz and Yang (2014).
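Edge detection in its simplest form marks pixels where intensity changes sharply between neighbours, which is why contrast matters: brightening the subject with IR illumination widens exactly these gradients. The following is a deliberately minimal sketch with an illustrative threshold, not the Leap Motion's actual algorithm.

```python
def edge_mask(image, threshold=50):
    """Mark pixels whose horizontal intensity gradient exceeds `threshold`.
    `image` is a list of rows of greyscale values (0-255); a real system
    would use a 2D operator such as Sobel rather than this 1D difference."""
    return [
        [abs(row[x + 1] - row[x]) > threshold for x in range(len(row) - 1)]
        for x_row, row in enumerate(image)
        for row in [row]
    ]
```

Under dim or uneven lighting the gradient between hand and background shrinks below the threshold, which is exactly the failure mode the infrared LEDs mitigate.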
Because the Leap Motion contains two cameras, Guna et al. (2014, p. 3705) theorise that the device
utilises the stereo vision principle to detect motion and depth; although this is a reasonable theory,
patents filed by Leap Motion, Inc. suggest that this is incorrect. “Systems and methods for capturing
motion in three-dimensional space” is a patent granted to David Holz and Leap Motion, Inc. for
a method of detecting the shape and motion of a three dimensional object, utilising two or more
cameras as well as one or more light sources (David Holz 2014b). The patent suggests that once
the Leap Motion’s software has applied edge detection to locate objects it then analyses the objects
for changes in light intensity and uses this to create a series of ellipse-shaped 2D cross-sections of
the object. These 2D cross-sections can then be pieced together to create a 3D image of the object
(figure 3.2) (ibid., p. 3). By using two cameras instead of one, an object can be viewed from at
least four different vantage points. The increased vantage points help to capture a full picture of
the changes in light intensity across the surface of the object. This in turn increases the accuracy
of each 2D cross-section (ibid., pp. 5-6). It is discussed in the patent that the accuracy of this
process can be increased by using multiple light sources each tuned to emit different wavelengths.
Each wavelength of light can be identified individually on the object to speed up the process of
determining spatial position (ibid., p. 2). As the Leap Motion employs three different infrared LEDs
it is possible that this process is being implemented.
Figure 3.2: Graphic illustration of hand model generated using 2D cross-sections (David Holz 2014b)
Looking closely at figure 3.1, small pieces of plastic can be seen partially obscuring the outer two
IR LEDs. It is possible that these allow the device to determine the orientation of an object in a
manner similar to the process employed by the Kinect sensor (structured light sensing). David Holz’s
system for determining the orientation of an object in space utilises a partially obscured light source
to create a shadow line (D. Holz 2014). The system captures images of objects intersecting this line
and by analysing how the shadow displays on the object, information about the orientation of the
object can be established (ibid., p. 4). This is another process that is probably being implemented.
This assumption is supported by the fact that the Leap Motion informs the user of fingerprints
or smudges on the glass, a process that is outlined in the patent as a useful byproduct of utilising
shadow lines for motion tracking (ibid., p. 1).
Another process developed by Holz (also probably implemented in the Leap Motion) utilises inter-
lacing and alternating light sources to remove noise from the image and reduce latency (David Holz
2014a). By alternating between two light sources for each video frame captured by a motion sensor
it is possible to compare the two images and remove noise from the image (ibid., pp. 1-2). The two
light sources available to the Leap Motion device are the infrared LEDs and ambient room lighting.
Interlacing can be used to reduce the latency caused by having to constantly perform analysis on
alternating images. By only transporting half the lines of an image (alternating between odd and
even lines) to the readout circuitry it is possible to increase the frame rate of a video at the expense
of halving the resolution of each image (ibid., p. 1). Despite utilising USB 3.0 technology the Leap
Motion is backwards compatible with USB 2.0 devices. If the Leap Motion does utilise interlacing
it would be particularly useful when dealing with the relatively low bandwidth of USB 2.0. When in
use the Leap Motion visualiser reports that the device outputs in excess of 100 frames per second.
This high frame rate is partially responsible for the Leap Motion’s precise computer vision.
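Interlacing as described in the patent can be sketched in a few lines: each captured frame contributes only alternate rows (even rows one frame, odd rows the next), halving the data per frame so the frame rate can double over the same bandwidth. This is a generic illustration, not the Leap Motion's actual readout circuitry.

```python
def interlace(frames):
    """Reduce each frame to a half-height 'field': even-numbered rows from
    even frames, odd-numbered rows from odd frames. Each field carries half
    the data of a full frame, trading vertical resolution for frame rate."""
    return [
        [row for j, row in enumerate(frame) if j % 2 == i % 2]
        for i, frame in enumerate(frames)
    ]
```

Pairs of consecutive fields can later be recombined (or interpolated) to approximate full-resolution frames, which is the trade-off discussed above for USB 2.0 bandwidth.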
Although relatively rudimentary individually, in combination these software processes allow the
device to gather a large amount of data from its field of view. These innovations enable the Leap
Motion to produce accurate results with a simple hardware set-up.
3.2 Accuracy
Testing the accuracy of the Leap Motion device is difficult as it is designed to specifically recognise
hands and therefore conventional sensor testing methods for accuracy need to be modified in order
to be utilised. Despite this, some studies into the accuracy and robustness of the device’s motion
sensing capabilities have been carried out. Upon its release the Leap Motion was purported to have
sub-millimetre accuracy down to 0.01mm (Garber 2013, p. 23). However, proof of these impressive
claims was not provided by Leap Motion, Inc. Guna et al. (2014, p. 3202) found that the Leap
Motion is indeed capable of sub-millimetre accuracy, but noted that the performance of the device
was inconsistent particularly at distances of more than 250mm above the controller. Weichert et al.
(2013, p. 6391) found the Leap Motion to have an average accuracy of 0.7mm for static discrete data
input and an average accuracy of 1.2mm for continuous data input. Weichert et al. (ibid., p. 6387)
also found the device to be inconsistent across the axes. The x axis significantly outperformed the y
and z axes in all of their accuracy and repeatability tests. Although these findings are not consistent
with the purported 0.01mm accuracy, it is important to note that the human hand is generally only
capable of a maximum accuracy of between 0.2mm and 1.1mm, so an average of 0.7mm still allows
for extremely detailed input (Weichert et al. 2013, p. 6383).
3.3 Musical Applications of the Leap Motion Device
Since the release of the Leap Motion device many musical applications have been created for it.
These applications vary from note based compositional environments such as The BigBang Rubette,
through to versatile MIDI and OSC generators such as GECO (Tormoen, Thalmann, and Mazzola
2014; GECO 2015). This section will analyse various existing approaches for utilising the Leap
Motion device in musical applications, as well as identifying the advantages and drawbacks of these
implementations of gestural control.
Arguably one of the most popular musical applications available for the Leap Motion is GECO
(ibid.). By converting simple gestures into discrete MIDI or OSC messages, GECO allows the user
to control any software or hardware that accepts these protocols. Despite not directly manipulating
or generating sound, GECO can be extremely useful to musicians looking to incorporate the rich
multidimensional input of the Leap Motion device into their existing workflow. Its simple interface
and setup allow musicians to utilise the Leap Motion device easily, without needing to understand
the complexities of programming gesture recognition. However, this simplicity of use has drawbacks:
GECO recognises only very simple gestures, and these mostly consist of plain motion tracking. By
focusing on the position of the hands rather than interpreting what the hands are doing, the software
utilises only the bare minimum of the data that gesture recognition is capable of producing. Despite
its rudimentary gesture recognition capabilities, musicians have begun to develop tools designed
specifically to work with GECO. Such development is aided by GECO's ability to save custom
presets into a file that developers can distribute, allowing fast setup for the end user. The Greap
project developed by Konstantinos Vasilakos (2015) utilises GECO as a link between a SuperCollider
patch and the Leap Motion device, as there is currently no way to interact natively with the Leap
Motion from within SuperCollider. GECO provides a convenient workaround, allowing musicians to
utilise software with which they are familiar whilst gaining the benefits of gestural input.
Tekh Tonic, developed by Ethno Tekh, is also a MIDI and OSC generator. However, unlike GECO it
utilises manipulative gestures and physics simulations to create a sophisticated gestural interaction
environment (EthnoTekh 2015). Despite the more advanced gesture recognition provided by Tekh
Tonic, it has not achieved the same level of success as GECO. This can likely be attributed to its
more complex setup, as well as to flaws in the design of the software. The majority of physics
simulations utilised by Tekh Tonic take place inside a cuboid space. However, as the field of view of
the Leap Motion is an inverted pyramid, interaction in the extremities of the simulation is unreliable
and at times impossible (Guna et al. 2014, p. 3705). Additionally, many of the simulations rely heavily
on the y-axis for interaction and, as discussed in section 3.2, the y-axis is the most inaccurate and
unreliable (Weichert et al. 2013, p. 6387). This reliance on the y-axis, coupled with poor interaction
at the extremities of the simulation, means that Tekh Tonic often produces undesired results. This
is unfortunate, as when the physics simulations work successfully they provide a unique and intuitive
method of interaction with many creative possibilities.
Hantrakul and Kaczmarek (2014, p. 648) discuss a variety of musical applications for the Leap Motion
device in their paper “Implementations of the Leap Motion in sound synthesis, effects modulation
and assistive performance tools”. Their research focuses on utilising the Leap Motion for the live
performance of electronic music. In all of their projects, Hantrakul and Kaczmarek (ibid., p. 649)
used the aka.leapmotion object for Max MSP to interface with the Leap Motion and built a patch
to extract and interpret the desired data from the device (Akamatsu 2014; Cycling74 2015). From
this patch Hantrakul and Kaczmarek (2014) developed several tools for live performance, including
a system similar to GECO that interfaces with Ableton Live. Of particular note is their granular
synthesis tool, which utilises a mixture of controlling and manipulative gestures.
Grains are triggered by depressing a virtual piano key with the left hand, whilst other parameters
are mapped to the x, y and z positions of both of the user's hands. This allows the user to manipulate
a multitude of parameters at once in a manner that would be extremely difficult, if not impossible,
with conventional hardware (Hantrakul 2014). Unfortunately the software's hand identification
is extremely rudimentary: the hand with the smallest x coordinate is assigned as the left hand and,
conversely, the hand with the largest x coordinate as the right hand (Hantrakul and Kaczmarek 2014,
p. 649). Due to this unsophisticated method of hand detection, certain combinations of parameters
within the software are impossible to produce, as they require the left and right hands to cross over,
whereupon the software immediately reassigns them. However, the authors do discuss the possibility
of implementing the new Skeletal Tracking API in the future. It will be interesting to see how their
projects develop once improved hand detection is in place.
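The x-ordering scheme described above, and the crossover flaw it introduces, can be sketched in a few lines of Python. The hand-data dictionaries here are a hypothetical simplification for illustration, not the authors' Max patch:

```python
def assign_hands(hands):
    """Naive left/right assignment: the hand with the smaller x
    coordinate is labelled left, the one with the larger x, right."""
    ordered = sorted(hands, key=lambda h: h["x"])
    return {"left": ordered[0], "right": ordered[-1]}

# Two hands in their usual positions:
a = {"id": 1, "x": -80.0}   # physically the user's left hand
b = {"id": 2, "x": 120.0}   # physically the user's right hand
assert assign_hands([a, b])["left"]["id"] == 1

# Once the hands cross over, the labels swap immediately,
# which is why crossed-hand parameter combinations are impossible:
a["x"], b["x"] = 120.0, -80.0
assert assign_hands([a, b])["left"]["id"] == 2
```

The Skeletal Tracking API avoids this problem because it maintains a persistent identity for each hand rather than re-deriving handedness from position every frame.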
BigBang Rubette is an extension for Rubato Composer, a visual music programming environment
that utilises various mathematical theories for the composition and analysis of music. Designed to
bring real-time interaction and gestural control to Rubato Composer, BigBang Rubette was originally
controlled with a mouse (Thalmann and Mazzola 2008). It is currently being expanded to include
gestural input from the Leap Motion device (Tormoen, Thalmann, and Mazzola 2014). Working
with synthesised sound, BigBang Rubette allows the user to compose music by “creating and
manipulating Score-based denotators” (similar to MIDI data) in a virtual three-dimensional space
(ibid., p. 208). Unfortunately BigBang Rubette only allows for composition with synthesised sound,
and this limits its use as a compositional tool for many musicians.
Chapter 4
jh.leap
jh.leap is a suite of bespoke compositional tools that utilise the Leap Motion device. Built in
the visual music programming software Pure Data, the tools use the leapmotion object by
Chikashi Miyama (2013) to interface with the Leap Motion device. Designed for the manipulation
of audio, jh.leap provides a variety of audio processing tools, as well as utilities for implementing the
Leap Motion device in any Pure Data patch. These tools were built as an aid to my composition
and in support of this dissertation.
The tools can be used individually or connected to other tools (Pure Data objects/patches) to
create an effects chain. All of the audio processing tools contain motion capture systems that allow
the user to record and automate gestural input, making it possible to utilise more than one tool
at once. Only a very basic understanding of Pure Data is required to use jh.leap, and every tool
within the suite comes with a detailed help file explaining how it works, making it accessible to both
beginners and advanced users. jh.leap utilises a variety of controlling and manipulative gestures,
creating a rich multidimensional environment for sound manipulation that is designed to be intuitive
and to promote creativity. Where possible, gestures have been specifically chosen to mimic the sounds
produced, thus creating a sense of causation. The tools can be downloaded from Appendix 1;
alternatively, the most up-to-date version can be found on GitHub (Higgins 2015).
4.1 The Tools
4.1.1 jh.leap main
To function correctly, all of the jh.leap tools require jh.leap main to be running. jh.leap main does not
process any audio; instead it acts as a bridge between the Leap Motion device and the rest of the
tools, allowing them to communicate with each other. It also presents some useful information to the
user regarding the Leap Motion's field of view and the CPU load of Pure Data. A visualiser of the
Leap Motion's field of view can be toggled from jh.leap main; it opens in a new GEM (Graphics
Environment for Multimedia) window (figure 4.2).
Figure 4.1: jh.leap main and its options panel
Visual feedback allows the user increased precision
when interacting with the Leap Motion, as it is clear where the user's hands are within its field of view.
The visualiser is customisable and its options are displayed with the “Opt” toggle. The ability to
change the window size, as well as what is rendered, means the visualiser can be used effectively on
almost any size of screen and run on computers with low processing power.
Figure 4.2: jh.leap main’s visualiser displaying two hands
4.1.2 jh.leap sample player
jh.leap sample player is the main method of sound file playback provided by the tools. Inspired by
the granular synthesis tool discussed in section 3.3, jh.leap sample player utilises the manipulative
gesture of pressing a virtual piano key to trigger a sound file (Hantrakul and Kaczmarek 2014). The
motion of depressing a finger is detected by measuring velocity: once a finger's negative velocity
along the y-axis crosses a set threshold, a “bang” is output (a “bang” is the message Pure Data
uses as a trigger). A sound file can be loaded for each finger by clicking the corresponding “Open sf” button.
Figure 4.3: jh.leap sample player tool
Once all sound files are loaded, the user can play back a sound file by depressing a finger in the air.
Although similar in movement to playing a physical MIDI keyboard, this gesture is much freer; because
of this it is significantly easier to create natural-sounding rhythms as well as rapid passages. The
sample player works particularly well with micro-sounds, which can be played back rapidly to create
various compound gestures. The tool also works well with longer drone-based sounds (particularly
pitched material): different layers can be overlapped subtly over time to create gradually evolving
textures (when working with long textures the “Stop all” button is particularly useful when you
accidentally trigger a 40-minute sound file). The record and loop functions (which also feature on
all the other tools) allow gestures to be recorded and played back. This can be used to create a
loop, or to change the sounds whilst keeping the gestures the same.
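The velocity-threshold detection behind the virtual key press can be sketched as follows. The threshold value and the per-frame velocities are illustrative assumptions, not the values used in the patch:

```python
def press_detector(threshold=-300.0):
    """Return a per-finger detector in the spirit of jh.leap sample
    player: a 'bang' (True) is emitted on the frame where the finger's
    downward (negative y) velocity first crosses the threshold."""
    below = False
    def step(vy):
        nonlocal below
        fired = vy < threshold and not below   # fire once per crossing
        below = vy < threshold
        return fired
    return step

detect = press_detector()
# y-axis velocities (mm/s) for one finger over successive frames:
bangs = [detect(vy) for vy in [-50, -200, -400, -500, -100, -350]]
# bangs == [False, False, True, False, False, True]
```

Latching on the threshold crossing, rather than outputting on every frame below it, prevents a single key press from retriggering the sound file while the finger is still moving down.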
4.1.3 jh.leap reverb
Figure 4.4: jh.leap reverb tool
jh.leap reverb utilises a variety of manipulative gestures to change the parameters of the effect. To
control the reverb, users manipulate a virtual space in which their hands are the room and the Leap
Motion device is the sound source. Room size is controlled by how far apart the user's hands are on
the x-axis; wet/dry is controlled by how far the user's hands are from the Leap Motion device along
the y-axis; damping is mapped to the z-axis; and finally, freeze is toggled on or off by clenching a fist,
as if to catch the sound. The ability to control so many parameters at once allows the composer to
focus on the composition itself, rather than on automating a variety of parameters so that
multiple events can happen at the same time. jh.leap reverb is also capable of motion capture. This
can be toggled for all parameters or for selected parameters, giving the composer complete control
over the manipulation.
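A minimal sketch of this two-handed mapping, assuming illustrative ranges for the Leap Motion's millimetre coordinates (the patch's actual scaling is not documented here):

```python
def clamp01(v):
    """Constrain a value to the normalised 0..1 parameter range."""
    return max(0.0, min(1.0, v))

def reverb_params(left, right, fist_closed):
    """Map two hand positions (mm) to reverb parameters in the spirit
    of jh.leap reverb. The divisors and offsets are assumptions chosen
    to normalise a plausible interaction volume."""
    # x separation of the hands -> room size
    room_size = clamp01(abs(right["x"] - left["x"]) / 400.0)
    # mean height above the device (y) -> wet/dry balance
    wet_dry = clamp01(((left["y"] + right["y"]) / 2.0) / 500.0)
    # mean depth (z) -> damping
    damping = clamp01((((left["z"] + right["z"]) / 2.0) + 200.0) / 400.0)
    return {"room_size": room_size, "wet_dry": wet_dry,
            "damping": damping, "freeze": bool(fist_closed)}

# Hands 300mm apart, 250mm above the device:
params = reverb_params({"x": -150.0, "y": 250.0, "z": 0.0},
                       {"x": 150.0, "y": 250.0, "z": 0.0}, False)
```

Because every parameter is derived from the same pair of hand positions, one continuous movement changes several parameters simultaneously, which is what lets multiple events happen at once without drawn automation.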
4.1.4 jh.leap tremolo
Figure 4.5: jh.leap tremolo tool
Similar to the reverb, jh.leap tremolo utilises manipulative gestures that mimic the process taking
place in order to control the parameters of the tool. The user shapes a virtual sine wave to vary the
amount of modulation the effect applies to the input sound. The depth of the tremolo is controlled
by how far apart the hands are on the y-axis; frequency is controlled by how far apart they are on
the x-axis; and the smoothness of the wave is shaped by rotating the hands (a flat palm being closer
to a square wave and a 45-degree angle closer to a sine wave). Palm rotation is not measured directly
by the Leap Motion device; instead it is calculated by measuring the difference along the y-axis between
the thumb and the little finger. The ability to control the tremolo in this way makes it an extremely
expressive tool, capable of both subtle undulations and harsh rhythmic cuts, as well as rapid, organic
transitions between the two. Like jh.leap reverb, this tool is capable of
motion capture on individual parameters.
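The thumb-to-little-finger estimate of palm rotation can be sketched as follows; the assumed hand span and the square-to-sine blend are illustrative, not the patch's exact mapping:

```python
import math

def palm_rotation(thumb_y, pinky_y, hand_span=80.0):
    """Estimate palm roll (degrees) from the y difference between the
    thumb and little finger, as jh.leap tremolo does in place of a
    direct rotation reading. hand_span (mm) is an assumed value."""
    return math.degrees(math.atan2(thumb_y - pinky_y, hand_span))

def wave_shape(rotation_deg):
    """Blend factor between square (flat palm, 0 degrees) and sine
    (45-degree tilt), suitable for crossfading two oscillator tables."""
    return min(abs(rotation_deg), 45.0) / 45.0  # 0.0 = square, 1.0 = sine

# Flat palm -> square-like wave; thumb 80mm higher -> 45 degrees, sine-like:
flat = wave_shape(palm_rotation(200.0, 200.0))   # 0.0
tilted = wave_shape(palm_rotation(280.0, 200.0)) # 1.0
```

Deriving rotation from two fingertip heights is a pragmatic workaround for the lack of a palm-orientation reading in the pre-skeletal API, at the cost of failing when either finger is occluded.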
4.1.5 jh.leap pan
The final tool currently in the jh.leap suite is jh.leap pan, a multichannel panner with cus-
tomisable speaker layouts. jh.leap pan has presets for stereo, quadraphonic, 5.1 and eight-channel
setups; however, it is possible to use the tool with any speaker setup of up to eight channels by
moving speakers to custom positions with the “Speaker X/Y” sliders. Unlike the other tools in the
suite, jh.leap pan uses only one hand as input. It tracks the position of the hand along the x- and
z-axes to control panning position, and the user can clench a fist to make the sound wiggle along
the x-axis.
Figure 4.6: jh.leap pan tool set up for eight-channel output
Currently the tool only accepts a mono input; plans to expand it to accept stereo
input and utilise both of the user's hands are in development. Like the other tools, it is
also possible to record automation. This is particularly useful for panning patterns that can
loop seamlessly, such as circular motions. The tool also has a toggle for random panning, which
makes a sound jump around in space. This is a quick way to create interesting spatialisation whilst
composing active gestural passages of music.
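One plausible way to realise such a panner is inverse-distance amplitude panning over arbitrary speaker positions. This sketch is an assumption about the approach, not jh.leap pan's actual pan law:

```python
import math

def pan_gains(hand_x, hand_z, speakers):
    """Compute one gain per speaker from a hand position on the x/z
    plane. Speakers nearer the hand receive more level; gains are
    normalised for constant power across any layout of up to 8 channels."""
    weights = []
    for sx, sz in speakers:
        d = math.hypot(hand_x - sx, hand_z - sz)
        weights.append(1.0 / max(d, 1e-6))        # nearer speaker = louder
    norm = math.sqrt(sum(w * w for w in weights))  # equal-power normalisation
    return [w / norm for w in weights]

# Quadraphonic layout (x, z), hand centred between all four speakers:
quad = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
gains = pan_gains(0.0, 0.0, quad)  # all four gains equal
```

Because the gains depend only on the speaker coordinates, the same function serves the stereo, quadraphonic, 5.1 and custom layouts simply by passing a different list of positions.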
4.2 The Utilities
As well as the audio processing tools listed above, the jh.leap suite also includes a variety of practical
tools for interfacing with the Leap Motion device. These allow the user to quickly implement
gestural control into any Pure Data patch. The utilities recognise a variety of gestures: from
simple controlling gestures, such as tracking hand positions (jh.leap hand); through to more complex
gestures such as clenching a fist (jh.leap fists), swiping a hand (jh.leap swipe) or depressing a finger
(jh.leap keyboard). Each utility comes with a detailed help file (see figure 4.7). These follow a
similar layout to the default Pure Data help files, providing information regarding what the object
does as well as the inlets, outlets and creation arguments of the utility. The ability to quickly view
this information helps to speed up the process of implementing the Leap Motion into a patch, making
it accessible to users who have never worked with gesture recognition before.
Chapter 5
Conclusion
This dissertation has explored the rich history of gesture recognition within electronic music. The
Theremin (1928) is the earliest example of gestural control being implemented within electronic
music (Bongers 2000, p. 481). Since then, advances in computer technology have allowed musicians
to explore the possibilities of gesture recognition in depth, developing a variety of tools to track
gesture. Due to constraints in computer vision technology, the majority of early musical applications
of gesture recognition relied on wearable or handheld peripherals to track motion. Examples
of these devices include data gloves, such as Michel Waisvisz's The Hands and the Lady's Glove,
as well as electronic conductor's batons, a variety of which were developed at Queen's University,
Canada (Roads 1996, pp. 630–635; Bongers 2000, p. 482; Keane and Gross 1989). Advances in
computer vision in the late 1990s and early 2000s began to allow gesture recognition to move
beyond wearable peripherals. However, gesture recognition long remained too expensive to
receive widespread adoption.
Innovations in console gaming over the last decade have brought gesture recognition into the main-
stream. Gaming companies produced low-cost peripherals for gestural control that were unparalleled
by previous consumer-grade electronics. Peripherals such as the Wiimote and the Kinect have been
repurposed in a variety of ways to produce cost-effective musical applications of gesture recognition.
Systems utilising these peripherals, such as the USSS-Kinect, provide methods of interaction with
musical applications that until recent years were unattainable for many outside of research
institutions (Pearse 2011).
Released in 2013, the Leap Motion device was designed specifically to interface with a computer.
Capable of tracking hands to within approximately 0.7mm, the Leap Motion brought unprecedented
accuracy to the consumer price bracket (Weichert et al. 2013). Rather than relying on expensive
hardware, the Leap Motion likely achieves such accurate results through several innovative software
processes. Several musical applications have been developed for the Leap Motion, including the popular
GECO software, which converts gestures into MIDI and OSC messages (GECO 2015). Throughout
this dissertation I have attempted to identify the processes employed by the Leap Motion device,
providing the first detailed analysis of the patents filed by Leap Motion, Inc.
The jh.leap tool suite provides a variety of compositional tools that utilise gesture recognition.
Although currently limited in the number of tools available, the suite serves as a useful proof of
concept for the merits of gesture recognition as an aid to electroacoustic composition. I have
utilised the tools myself when composing, most notably in my piece Digital Spaces. When using
the jh.leap suite, I have found that the tools allow me to quickly create interesting and organic gestural
material, particularly when working with micro-sounds. The ability to manipulate and shape audio
using only your hands brings an element of tangibility to the compositional process; this sense of
physically manipulating sound is often absent when composing using only a mouse and keyboard.
Although some other compositional tools that utilise the Leap Motion do exist, jh.leap is, to the
author's knowledge, the first compositional environment for the manipulation of audio to utilise the
Leap Motion device.
I intend to continue the development of the jh.leap suite and currently have plans to implement a
granulation tool, as well as a third-order ambisonic panner. To increase the accuracy of the tools, I
am currently in the process of updating Chikashi Miyama's leapmotion object to work with the new
Skeletal Tracking API (Miyama 2013). When complete, this will allow the tools to better keep track
of finger locations (even when they are occluded by the rest of the hand), as well as to identify which
hand is the left and which the right. Beyond this, I would like to research how other peripherals for
gestural control could be incorporated into the suite, and to compare the effect each peripheral has on the
compositional process. Finally, research into providing mid-air haptic feedback for gestural control
using the Leap Motion, called Ultrahaptics, is currently being developed at the University of Bristol
(Carter et al. 2013). When this hardware becomes available, it will be interesting to investigate how
haptic feedback affects gesture recognition as a compositional tool.
Bibliography
Akamatsu, Masayuki (2014). aka.leapmotion. url: http://akamatsu.org/aka/max/objects/.
Assayag, G, G Bloch, and M Chemillier (2006). “OMAX-OFON”. In: Sound and Music Computing
(SMC) 2006. Marseille.
Badi, Haitham Sabah and Sabah Hussein (2014). “Hand posture and gesture recognition technol-
ogy”. In: Neural Computing and Applications 25.3-4, pp. 871–878.
Bongers, A.J (2000). “Interaction in multimedia art”. In: Knowledge-Based Systems 13.7-8, pp. 479–
485.
– (2007). “Electronic Musical Instruments: Experiences of a New Luthier”. In: Leonardo Music
Journal 17, pp. 9–16.
Buxton, William et al. (1979). “The Evolution of the SSSP Score Editing Tools”. In: Computer
Music Journal 3.4, pp. 14–25.
Carter, Tom et al. (2013). “Ultrahaptics: Multi-Point Mid-Air Haptic Feedback for Touch Surfaces”.
In: UIST’13.
Chabot, Xavier (1990). “Gesture Interfaces and a Software Toolkit for Performance with Electronics”.
In: Computer Music Journal 14.2, pp. 15–27.
Collins, Nick (2010). Introduction to Computer Music. Hoboken: John Wiley & Sons Inc.
Cycling74 (2015). Cycling 74 MAX MSP. url: https://cycling74.com/.
Dix, Alan et al. (2004). Human-Computer Interaction (3rd Edition). Essex, England: Pearson Prentice-
Hall.
Emmerson, Simon (2000). Music, Electronic Media and Culture. Aldershot: Ashgate.
EthnoTekh (2015). Tekh Tonic. url: http://www.ethnotekh.com/software/tekh-tonic/.
Fischman, Rajmil (2013). “A Manual Actions Expressive System (MAES)”. In: Organised Sound
18.03.
Forsyth, David and Jean Ponce (2003). Computer Vision: a modern approach. Upper Saddle River,
N.J.: Prentice Hall.
Garber, Lee (2013). “Gestural Technology: Moving Interfaces in a New Direction [Technology
News]”. In: Computer 46.10.
GECO (2015). url: http://uwyn.com/geco/.
Genovese, V et al. (1991). “Infrared-Based MIDI Event Generator”. In: Proceedings of the Interna-
tional Workshop on Man-Machine Interaction in Live Performance. Computer Music Department
of CNUCE/CNR. Pisa, pp. 1–8.
Guna, Joze et al. (2014). “An Analysis of the Precision and Reliability of the Leap Motion Sensor
and Its Suitability for Static and Dynamic Tracking”. In: Sensors 14.2.
Hantrakul, Lamtharn (2014). Linked Media For ICMC 2014. url: http://lh-hantrakul.com/
2014/04/15/linked-media-for-icmc-2014/.
Hantrakul, Lamtharn and Konrad Kaczmarek (2014). “Implementations of the Leap Motion in sound
synthesis, effects modulation and assistive performance tools”. In: Proceedings ICMC SMC 2014.
Athens, Greece, pp. 648–653.
Higgins, Jonathan (2015). jh.leap tools. url: https://github.com/j-p-higgins/jh.leap_
tools.
Holz, D. (2014). Determining the orientation of objects in space. US Patent App. 14/094,645. url:
https://www.google.com/patents/US20140267774.
Holz, D. and H. Yang (2014). Enhanced contrast for object detection and characterization by optical
imaging. US Patent 8,693,731. url: https://www.google.com/patents/US8693731.
Holz, David (2014a). Object detection and tracking with reduced error due to background illumina-
tion. US Patent App. 14/075,927. url: https://www.google.com/patents/US20140125815.
– (2014b). Systems and methods for capturing motion in three-dimensional space. US Patent
8,638,989. url: https://www.google.com/patents/US8638989.
Keane, David and Peter Gross (1989). “The MIDI Baton”. In: Proceedings of the International
Computer Music Conference 1989, pp. 151–154.
Keane, David and Kevin Wood (1991). “The MIDI Baton III”. In: Proceedings of the International
Computer Music Conference 1991, pp. 541–544.
Khoshelham, Kourosh and Sander Oude Elberink (2012). “Accuracy and Resolution of Kinect Depth
Data for Indoor Mapping Applications”. In: Sensors 12.12.
Kiefer, Chris, Nick Collins, and Geraldine Fitzpatrick (2008). “Evaluating the Wiimote as a Musical
Controller”. In: Proceedings of the International Computer Music Conference 2008. Belfast.
Marrin, T. et al. (1999). Apparatus for controlling continuous behavior through hand and arm
gestures. US Patent 5,875,257. url: http://www.google.co.uk/patents/US5875257.
Microsoft (2013). Xbox Execs Talk Momentum and the Future of TV. url: http : / / news .
microsoft.com/2013/02/11/xbox-execs-talk-momentum-and-the-future-of-tv/.
– (2015). Kinect SDK. url: http://www.microsoft.com/en-us/kinectforwindows/.
Miller, Jace and Tracy Hammond (2010). “Wiiolin: a virtual instrument using the Wii remote”. In:
Proceedings of the 2010 Conference on New Interfaces for Musical Expression. Sydney.
Miyama, Chikashi (2013). Leapmotion PD Object. url: http://puredatajapan.info/?page_
id=1514.
Moore, Adrian (2008). “Fracturing the Acousmatic: Merging Improvisation with Disassembled Acous-
matic Music”.
Morita, Hideyuki, Shuji Hashimoto, and Sadamu Ohteru (1991). “A Computer Music System that
Follows a Human Conductor”. In: IEEE Computer 24.7, pp. 44–53.
NIME (2015). url: http://www.nime.org/?s=gesture+recognition.
Nintendo (2015). Consolidated Sales Transition by Region 2015. Sales Report.
OpenKinect (2015). url: https://github.com/OpenKinect/libfreenect.
Pearse, Stephen (2011). “Gestural Mappings: Towards the Creation of a Three Dimensional Com-
positon Environment”. In: Proceedings of the International Computer Music Conference 2011.
University of Huddersfield. Huddersfield, UK, pp. 126–129.
Peng, Lijuan and David Gerhard (2009). “A Wii-based gestural interface for computer conducting
systems”. In: Proceedings of the 2009 Conference on New Interfaces for Musical Expression.
Pittsburgh, PA, United States.
Al-Rajab, Moaath (2008). “Hand Gesture Recognition for Multimedia Applications”. PhD thesis.
University of Leeds.
Roads, Curtis (1996). The Computer Music Tutorial. United States: MIT Press.
Senturk, Sertan et al. (2012). “Crossole: A Gestural Interface for Composition, Improvisation and
Performance using Kinect”. In: Proceedings of the 2012 Conference on New Interfaces for Musical
Expression. Ann Arbor, Michigan.
Sturman, David and David Zelter (1994). “A survey of glove-based input”. In: Computer Graphics
and Applications, IEEE 14.1, pp. 30–39.
Thalmann, Florian and Guerino Mazzola (2008). “The Bigbang Rubette: Gestural Music Composition
With Rubato Composer”. In: Proceedings of the International Computer Music Conference 2008.
Belfast.
Theremin, Leon and Oleg Petrishev (1996). “The Design of a Musical Instrument Based on Cathode
Relays”. In: Leonardo Music Journal 6, pp. 49–50.
Tormoen, Daniel, Florian Thalmann, and Guerino Mazzola (2014). “The Composing Hand: Musical
Creation with Leap Motion and the BigBang Rubette”. In: Proceedings of the International Confer-
ence on New Interfaces for Musical Expression. London, United Kingdom: Goldsmiths, University
of London, pp. 207–212.
Vasilakos, Konstantinos (2015). Greap 1.0v. url: https://github.com/KonVas/Greap.
Weichert, Frank et al. (2013). “Analysis of the Accuracy and Robustness of the Leap Motion Con-
troller”. In: Sensors 13.5. Images reproduced under the terms and conditions of the Creative
Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Wingrave, Chadwick et al. (2010). “The Wiimote and Beyond: Spatially Convenient Devices for 3D
User Interfaces”. In: IEEE Computer Graphics and Applications 30.2.
Wright, Matthew (2002). Open Sound Control 1.0 Specification. url: http://opensoundcontrol.
org/spec-1_0.
Wu, Ying and Thomas Huang (1999). “Human hand modeling, analysis and animation in the context
of HCI”. In: Image Processing, 1999. ICIP 99. Proceedings. Vol. 3. Kobe, pp. 6–10.
Zeng, Wenjun (2012). “Microsoft Kinect Sensor and Its Effect”. In: IEEE Multimedia 19.2.