Dance, Dance Evolution: Accelerometer Sensor Networks as Input to Video Games

HAVE 2007 - IEEE International Workshop on Haptic Audio Visual Environments and their Applications, Ottawa, Canada, 12-14 October 2007


Nick Crampton, Kaitlyn Fox, Hannah Johnston, Anthony Whitehead

School of Information Technology, Carleton University

1125 Colonel By Drive, Ottawa, ON, Canada K1S 5B6

{ncrampto, kfox, hjohnsto, awhitehe}@connect.carleton.ca

978-1-4244-1571-7/07/$25.00 ©2007 IEEE

Abstract - We have created and tested a wearable sensor network that detects a user's body position as input for video game applications. It is envisioned to take video game experiences such as Dance Dance Revolution to a whole new level, replacing the binary foot-pad with a more immersive, full-body input system. We describe the design and functionality of the sensor network and experiment with Mahalanobis distance as a nearest-neighbour means of classification. Results from our experiments with distance threshold levels, combined data sets and the effects of practice on user success rates are discussed.

Keywords - sensor networks, video games, human-computer interaction, accelerometer, pose detection

I. INTRODUCTION

As the demand for innovative video game control increases, a focus on producing better graphics may no longer be enough to attract today's discerning consumers. Dynamic input systems are proving popular with users, with Nintendo's Wii outselling Microsoft's Xbox360 2:1 and Sony's PS3 4:1 [1]. This new form of interaction has allowed users to become more engaged in the game. But with these advances, new issues have arisen within the design aspects of game creation. Game difficulty is now affected not only by timing (e.g., when you press the buttons), but also by body movement and control: not just when you do it, but how you do it.

Given the acceptance of the Wii, the next logical progression is to create a multi-sensor network to allow for an even more immersive gaming experience. Such a network allows users to replicate elaborate dance poses, replacing the typical binary button press. Although it is desirable to have a system that easily interprets human input, it is also expected that users must develop the game-specific skills with practice.

Accelerometers have inherent limitations when it comes to detecting motion. With only three pieces of information supplied from tri-axis accelerometers (x, y, z), it is impossible to know the exact rotation and position of the accelerometer. In this respect, an accelerometer cannot act as a replacement for a gyroscope and cannot accurately detect movement by itself. Despite these limitations, this work shows that it is still possible to recognize and detect full body poses using several sensors.

This paper examines prior research into the field of sensor networks as input devices. It builds upon previous research conducted by Anthony Whitehead, which will be published for the Future Play 2007 conference proceedings in November [2]. While his research focused primarily on recognition methods and accelerometer function, this paper analyses the collection and interpretation of accelerometer data. This paper also looks at the creation of a network of four accelerometers for recording and detecting full-body poses using Mahalanobis distance. Experiments were conducted to determine the effect of multiple accelerometers on pose accuracy, differences in difficulty between poses, appropriate thresholds for detection, the effect of recording time, and the difference in recognition rates between individual and combined data sets.

A summary of our findings is then presented to support the use of accelerometer-based sensor networks as input in video games. The paper concludes with a brief look at future research opportunities.

II. BACKGROUND

Although there has been much prior research into the area of accelerometers as input devices [1,3,4,5,6], little has been found in the way of creating full-body sensor networks for use in video game applications.

The existing work found in the field of human-input devices can be grouped into three categories: handheld controllers, multi-sensor activity and pose recognition, and video game control.

Handheld Controllers. Currently the best known handheld inertial-sensing device, Nintendo's Wii remote, utilizes a tri-axis accelerometer and a gyroscope to detect motion for use in games. It also uses infrared and Bluetooth technology in conjunction with a sensor bar to track the position of the remote in relation to the screen [3]. Since only one accelerometer is used in one inertial sensor location, the basic motions it detects can easily be cheated with partial movement. For example, swinging a sword can be replaced by quick flicks of the wrist.

Microsoft has developed a device called the XWand, which integrates a wide variety of different kinds of sensors including a magnetometer, a gyroscope, a microcontroller, an IR LED sensor, and one two-axis accelerometer [4]. The purpose of this apparatus is to detect basic motions that would control a multitude of household electronics through pointing and gesturing. However, the majority of the motions that are currently recognized by the device can be easily replicated using a single tri-axis accelerometer, making the inclusion of the additional sensors seem redundant.

Others have used accelerometers to create simple handheld units to replace input from joysticks, button presses and mouse movements [5,6]. Similar to the Wii remote, these controllers are limited in their capabilities by only using a single location accelerometer for input.

Multi-Sensor Activity and Pose Recognition. Current research has focused more on the use of larger sensor-networks for activity recognition, such as distinguishing between walking and sitting, than for their use in video games [7]. Many of these systems use accelerometers for detecting motion instead of static poses. This use of accelerometers can be prone to error if not used in conjunction with other types of sensors like gyroscopes. Acceleration due to gravity and centripetal acceleration will both have an effect on the raw values obtained from the chip, making it difficult to know its exact orientation while in motion. These discrepancies will lead to large errors in distance calculations, making it necessary to constantly recalibrate the sensors [8].

Some research has been done to find optimal placement for multiple accelerometers on the body to most accurately recognize activities. Research suggests that using multiple accelerometers for motion recognition can help to distinguish one activity from another and ensure that the gesture is performed as intended [7]. However, [...] these methods are computationally intense.

Wireless Motion Bands are sensor units that each combine an accelerometer, a tri-axis gyro, and a magnetometer, designed to be placed on any limb and capture motion patterns. Although mainly tested in applications analyzing physical activities like running and Tai Chi, preliminary tests have used a single sensor unit to control a character in a snowboarding simulation [9]. Unfortunately, due to the complexity of the included components on each Motion Band, creating a full-body sensor network would be too costly at the present time for the average consumer.

The Acceleration Sensing Glove (ASG) is a glove consisting of six biaxial accelerometers, which acts as a portable replacement for a computer keyboard [10]. Users replicate predefined hand positions that each translate to a different character or keyboard command. Though it currently makes use of twenty-eight different commands (twenty-six letters, space and delete), the system has the theoretical potential of containing up to 4000 predefined poses. Dynamic hand gestures could be recognized, but for reasons of simplicity, they have not yet been explored.

Video Game Control. The majority of development in the field of human-input devices for video games makes use of sensors other than accelerometers [11]. Two of the most popular human kinetic games on the market are Dance Dance Revolution (DDR) and Guitar Hero. Although they seem more interactive than standard games, they still make use of binary input, similar to button presses.

ParaParaParadise is another dance-oriented game that uses motion sensors in an octagonal ring above the user to detect various arm movements. Although the input system differs more from keyboard input than does DDR, it is still quite limited in its range of detectable movements.

EyeToy is a camera-based motion recognition system developed for Playstation 2. Its camera allows users to literally appear in games. However, being camera based, it inherits the same limitations of other video-based detection systems, such as lighting dependencies and occlusion issues.

III. DESIGN AND IMPLEMENTATION

A. The Sensor Network

Fig. 1 shows a tri-axis accelerometer placed into a protective casing, consisting of a plastic shell with a foam interior; this serves the dual purpose of protecting the chips and connections from wear, while minimizing unwanted movement. The plastic capsules also provide a way to fasten the accelerometers to limbs by clamping on to adjustable straps.

Fig. 1. Plastic capsules used to encase accelerometers with straps.

The suit (or sensor network) consists of 4 straps, created using elastic, hook & loop tape and metal loops for fastening, each of which is threaded through a plastic capsule. Each capsule contains two layers of foam that sandwich a 2g tri-axis accelerometer, ensuring it remains secure in position. The USB cable for each accelerometer is threaded through a hole at the end of the capsule, plugging into a USB hub that clips onto the back of the pants. The hub is connected to the computer using a USB extension cable.

B. Accelerometer Placement

The nature of the poses dictates the placement of accelerometers on the body. It is necessary to get position and orientation information of both arms and legs, so straps are placed in areas to prevent unwanted shifting and slipping. Straps on the legs are held in place by knee caps, and straps on the wrists are held in place by the wrist bone and increasing diameter of the arm (see Fig. 2). Adding more [...]


C. Pose Detection

Several phases are required to capture and detect poses for use in a game context. As with any classification technique, we rely on the basic training and recognition phases [12].

Training Phase. The purpose of the Training Phase is to create a raw data set of a given number of poses for an individual. To record a pose, the subject performs each pose five times. 500 readings are taken each time, over a two second period, with a three second break between; this yields 2500 readings. A reading consists of the x, y, and z accelerations for each accelerometer in the sensor network. We consider a data set for training to come from a single sample user. If multiple data sets are to be used for detection, the raw data from each pose must be combined before calculations are performed.

Fig. 2. The accelerometer sensor-network.

To convert a set of raw data into a recognizable pose data set, two basic calculations must be performed. The mean acceleration value and the standard deviation for each axis of each accelerometer for each pose must be determined. The acceleration values and standard deviations for each axis of each accelerometer for each pose are stored in a pose bank used for subsequent recognition tasks.

Recognition Phase. Acceleration values from the suit are read in continuously and compared to the poses stored in the pose bank. We use a minimum mean distance rule classifier as our recognition system based on previous experiments [2]. It characterizes each category by the mean and standard deviations of the components of its training feature vectors. The distance between an unknown sample M (input move) and the mean of the features of class m (trained pose), d(M, m), is then computed. The unknown sample is then assigned to class m* (recognized as move m*) for which such distance d(M, m) is minimum. Formally:

$$m^{*} = \arg\min_{m_i} d(M, m_i), \qquad i = 1, 2, \ldots, C \tag{1}$$

and d(M, m_i) is the Mahalanobis distance metric [2]. It is based on correlations between variables by which different patterns can be identified and analysed. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant.

$$d(p, q) = \sqrt{\sum_{i=1}^{N} \frac{(p_i - q_i)^2}{\sigma_i^2}} \tag{2}$$

where $\sigma_i$ is the standard deviation of the data over the sample set and the sum runs over the N acceleration components of a reading. If the distance falls within an acceptable threshold, the pose is deemed "detected".

IV. EXPERIMENTAL RESULTS

A series of experiments was performed to examine the capabilities and limitations of our sensor network, specifically to determine the feasibility of applying similar principles in a video game context. Participants were asked to perform multiple trials consisting of eight distinct poses, shown in Fig. 3. Tests were conducted in a supervised lab setting with a consistent level of instruction and coaching given by the trainer.

Fig. 3. The eight poses used in the experiment, inspired by Michael Jackson's Thriller.

A. Effect of Multiple Accelerometers on Pose Accuracy

Table 1. Mahalanobis Distance for Pose 3 (see Fig. 3)

Number of Accelerometers Used    Mahalanobis Distance    Average Distance Per Sensor
4                                2.676948                0.669236
8                                7.269316                0.9086645

This experiment independently validates the work of [7], which suggests that the more accelerometers there are in a sensor network, the more accurately poses can be differentiated. Table 1 shows that increasing the number of sensors makes more demands on the user to properly replicate the pose, since each added sensor contributes to the overall distance from the optimal pose. This is reflected by the higher average distance per sensor.

B. Learning Curve

The learning curve experiment was conducted to examine the effects of repetition on the successful recognition rate of poses. Our hypothesis was that with practice, any user would be able to achieve full recognition for every pose in a trial. In a video game context, we also want the players to progressively succeed with practice. Seven participants performed the eight poses set out by the pose bank illustrated in Fig. 3 sequentially, up to fifteen trials, stopping if they successfully executed all poses in one trial. It is important to note that this experiment was completed with a training sample of one.

The resulting trend lines, illustrated in Fig. 4, support our hypothesis in that, given enough repetition, any user can improve. However, users will not necessarily improve at the same rate. It is interesting to note that over time, participants' improvement would plateau as fatigue set in. Further experimentation is required to determine the optimal number of trials to maximize success. Although there was observed improvement overall, certain poses were problematic for most subjects, which led to further pose-based analysis discussed next.

Fig. 4. Trend lines for the learning curves of seven subjects.

C. Pose Difficulty

While performing the Learning Curve experiment, it was noted that certain poses were consistently problematic for many subjects. Training data from a single individual was tested by four subjects over fifteen trials using a low recognition threshold in order to determine the level of difficulty among the eight poses.

The graph in Fig. 5 illustrates a visible difference in the recognition rates for the poses. For example, Pose 4 had an overall recognition rate of 18.3%, whereas Pose 6 had a rate of 66.7%. As shown in Fig. 3, Pose 4 required balancing on one foot, similar to Pose 6; however, there was an aspect of ambiguity to the angles of the other three limbs, whereas Pose 6 had more rigid positions at 90° and 180° angles. Pose 4 presented further problems for many participants due to physical differences, like leg length, resulting in different angles created by the user's body.

Fig. 5. Graph of pose recognition averages by four subjects, over fifteen trials using a training data set size of one.

Although it is desirable for the population to be capable of performing every pose, a difference in complexity among the poses can prove to be beneficial in some aspects of game development, allowing for an added level of difficulty.

These results suggest a need for a larger training data set for the poses in order to accommodate a wider variety of body types and physical limitations. This is not an unexpected result; however, it shows that a large training sample is not necessarily required to create games that are playable by a range of people. Larger training data sets are discussed later on.

D. Threshold

It is unreasonable to expect a user to be able to achieve a "perfect" pose, in other words achieve a recognition distance of zero, even if they are a part of the training sample. Simple variations are inevitable due to core mechanics of the human musculature system. Therefore, an allowable range, defined by a threshold, is experimentally determined to allow reasonably similar poses to be detected while avoiding false positives. The threshold for an individual's data set was determined by continually increasing the allowable Mahalanobis distance after each trial until the lowest possible value was reached where all of a user's poses could be recognised (Fig. 6). It should be noted that the threshold found for one individual's data set will not necessarily be the same as another's since their standard deviations will vary based on how they perform the pose in the training phase and how steady they are during recording.

Fig. 6. The success rates of a user with a single data set at various thresholds.

Once an acceptable threshold of 14.4 was found, a confusion matrix (Table 2) was created to determine the likelihood of false positives among other poses. For this particular experiment, poses were inspired by Michael Jackson's Thriller, and selected to ensure differences in limb orientation. The results indicate little possibility for the occurrence of false positives.

Table 2. Pose Confusion Matrix for Individual Data (columns: Pose to Detect)

     1         2         3         4           5         6           7         8
1    [?]       [?]       [?]       [?]         [?]       [?]         [?]       [?]
2    44.729    17.343    40.511    44.732      86.571    34.710      47.305    79.724
3    [?]       [?]       [?]       [?]         [?]       [?]         [?]       [?]
4    46.490    110.294   50.742    13.212(?)   48.137    55.676      61.594    163.215(?)
5    [?]       [?]       [?]       [?]         [?]       [?]         [?]       [?]
6    67.150    76.294    74.614    66.155      92.945    11.573(?)   65.587    127.252(?)
7    77.809    93.010    54.775    66.293      104.825   68.028      12.067    124.497
8    [?]       [?]       [?]       [?]         [?]       [?]         [?]       [?]

[?] marks entries that are illegible in the source; (?) marks entries whose digits are uncertain in the source.

To address the issue of the threshold's dependence on the individual being in the training data, seven sets were combined, thereby generalizing the pose bank due to a larger training sample. Once combined, the standard deviations within the set increased, which in turn lowers the distances during the recognition phase. This required lowering the threshold to prevent the occurrence of false positives. As Table 2 shows, the threshold could easily be set as high as twenty-five without encountering false positives. However, as the threshold increases, a user can be less precise in his or her pose while it is still successfully detected. This phenomenon, in fact, results more from the training data accounting for differences in body sizes, shapes and flexibilities than from a deficiency in classification methodology. Another confusion matrix (Table 3) was created to find an appropriate threshold for the seven sets of data that were combined as training data; the chosen threshold was four.

Table 3. Pose Confusion Matrix for Combined Data of Seven Subjects (columns: Pose to Detect)

     1         2         3         4           5         6           7           8
1    3.996     16.386    12.308    12.074      20.178    25.487      29.183      38.874
2    15.878    3.104     14.394    22.878      33.016    33.768      20.393      5.631(?)
3    [?]       27.239    2.192     [?]         22.355    [?]         34.821(?)   [?]
4    10.918    32.047    16.059    2.868       16.341    45.845      39.553      43.530
5    9.704     37.074    12.302    11.682      3.773     48.108      47.171      25.971
6    22.948    21.116    27.062    23.684(?)   45.247    3.315(?)    [?]         [?]
7    43.606    27.261    16.094    30.934      43.458    35.509      3.212       60.027
8    12.896    28.030    8.702     18.970      29.038    39.525      36.946      3.992

[?] marks entries that are illegible in the source; (?) marks entries whose digits are uncertain in the source.

As can be seen in both confusion matrices, the lowest distance in each column correctly corresponds to the pose that was to be detected; the other distances were approximately an order of magnitude higher, presenting minimal risk of encountering false positives.

E. Training and Standard Deviation

The more variation allowed in the pose during training, the higher one would expect the standard deviations to be. Consequently, the pose becomes easier to detect. If the deviation gets too high, more error and thus less precise poses will be tolerated. Conversely, if the standard deviation is too low, it will be exceptionally difficult to detect the pose.

Fig. 7. The effect of recording time on the standard deviations of Pose 3 (see Fig. 3).

Standard deviations vary based on several factors: the person recording the pose (size, shape, athletic background), the complexity of the pose, and the recording time. If the person recording the pose is unable to consistently maintain a pose, the standard deviation may be unusually high, whereas if they remain perfectly still, it will be low. For this reason, in our experiments, the subjects perform each pose five times, with breaks in between to ensure some variation. Similarly, poses that require balancing often result in higher deviations. The faster the recording of the 500 samples, the smaller the standard deviation, as Fig. 7 shows. This tells us that the training phase requires some significant thought and supervision throughout the entire process.

F. Training Sample Size

It became apparent throughout the earlier experiments that even with practice, some participants were unable to successfully replicate certain poses from a single individual's training data set. Physical limitations can make it impossible for some subjects to get their body into the exact same position as the person who trained the data set. These restrictions include things like muscle size, limb length, proportions, and overall body type. Combining multiple people's data sets would make it possible for a wider variety of users to achieve higher recognition rates. To test this, we collected training data from seven individuals of varying heights and shapes. When data sets are combined, standard deviations increase, and we would expect them to continue to do so until there are enough people in the sample to give an accurate representation of the population in general. As a result, the threshold needs to decrease accordingly. The thresholds used in the experiments were chosen by creating confusion matrices of multiple subjects using self-trained data and combined data sets. It was possible to determine the lowest values for thresholds that would still yield 100% recognition rates for the users.

Using these thresholds, an experiment was run with one subject testing against another subject's data and a combined set of seven other individuals' data. The recognition rates are shown below in Table 4.

Table 4. Effect of Combined Data on Recognition

Number of Data Sets Used    Threshold    Successful Recognition
[?]                         [?]          [?]

[?] marks entries that are illegible in the source, where the table body survives only as "7 1 8735".

These results support our hypothesis that increasing the size of the training set will improve the success rate of a random individual. In the game design context, we see that success rates increase dramatically with the number of training samples. However, a small set of samples is still effective, making the method feasible for even the smallest group of game designers.

V. CONCLUSION AND DISCUSSION

In this work we have shown that sensor networks of accelerometers are a viable input system for control in video games. By only using accelerometers, we have created a relatively inexpensive suit still capable of accurate, full-body pose detection. Though a new user to the system may not initially achieve 100% pose recognition, it has been shown that with practice, improvement will be seen. This effect is desirable in game development to help maintain user interest over time. Furthermore, recognition rates continue to improve as a larger and more diverse sample of training data is used. Game complexity can be naturally introduced by playing with the threshold value for recognition, allowing easy, normal, hard, and outrageous levels of difficulty.

Future research will investigate the optimal size of a training sample necessary to accurately represent the population. We will continue to expand our sensor network and its applications by adding more accelerometers to the suit as well as other possible sensors. We are also hoping to explore more dynamic gesture recognition.

ACKNOWLEDGEMENT

This work is supported in part by the NSERC Undergraduate Student Research Awards. We would also like to thank the gracious volunteers who participated in our experiments.

REFERENCES

[1] NPD Group Market Research, Port Washington, N.Y., 2007.

[2] A. Whitehead, N. Crampton, K. Fox and H. Johnston, "Sensor Networks as Video Game Input Devices," in ACM Proceedings Futureplay Conference, Toronto, Canada, 2007.

[3] IGN Staff, "Wii Best of E3 2006 Awards," May 19, 2006, http://wii.ign.com/articles/709/709244p1.html. (Accessed July 27, 2007)

[4] D. H. Wilson and A. Wilson, "Gesture recognition using the XWand," Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-04-57, 2004.

[5] J. Payne, P. Keir, J. Elgoyhen, M. McLundie, M. Naef, M. Horner and P. Anderson, "Gameplay issues in the design of spatial 3D gestures for video games," in CHI '06 extended abstracts on Human factors in computing systems, 2006, pp. 1217-1222.

[6] P. Baudisch, M. Sinclair and A. Wilson, "Soap: how to make a mouse work in mid-air," in CHI '07 extended abstracts on Human factors in computing systems, 2007, pp. 1935-1940.

[7] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration data," in Pervasive Computing: Second International Conference, 2004, pp. 1-17.

[8] D. Ashbrook, T. Westeyn and T. Starner, "Dancing in the streets: smartphones and gaming," presented at Workshop on Ubiquitous Entertainment and Games at the 7th International Conference on Ubiquitous Computing, Tokyo, Japan, 2005.

[9] K. Laurila, T. Pylvanainen, S. Silanto and A. Virolainen, "Wireless motion bands," presented at UbiComp '05 Workshop on Ubiquitous computing to support monitoring, measuring, and motivating exercise, Tokyo, Japan, 2005.

[10] J. K. Perng, B. Fisher, S. Hollar and K. S. J. Pister, "Acceleration Sensing Glove (ASG)," in Proc. of the Third International Symposium on Wearable Computers (ISWC'99), San Francisco, 1999, pp. 178-180.

[11] J. R. Parker, "Human motion as input and control in kinetic games," in Proceedings of the 2006 Future Play Conference, 2006, pp. 1-7.

[12] Richard O. Duda and Peter E. Hart, Pattern Classification. Wiley Interscience, 1973.
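The training and recognition phases described under Pose Detection can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the pose names, the toy readings, the use of the population standard deviation and the small `eps` guard against a zero deviation are all assumptions made for illustration. Each reading is a flat list of accelerations (x, y, z for every accelerometer); the pose bank stores one mean and one standard deviation per axis, and recognition applies the minimum-distance rule of Equation (1) with the deviation-scaled distance of Equation (2) and a threshold.

```python
import math
import statistics


def build_pose_bank(training):
    """Training phase: store per-axis mean and standard deviation per pose.

    training maps a pose name to a list of readings; each reading is a flat
    list of accelerations (x, y, z for every accelerometer in the network).
    """
    bank = {}
    for pose, readings in training.items():
        columns = list(zip(*readings))  # one column per axis
        means = [statistics.fmean(c) for c in columns]
        # Assumption: population standard deviation; the paper does not say.
        stdevs = [statistics.pstdev(c) for c in columns]
        bank[pose] = (means, stdevs)
    return bank


def distance(sample, means, stdevs, eps=1e-9):
    """Equation (2): per-axis differences scaled by the standard deviation.

    eps is an assumption that avoids division by zero for a constant axis.
    """
    return math.sqrt(sum(((p - q) / max(s, eps)) ** 2
                         for p, q, s in zip(sample, means, stdevs)))


def classify(sample, bank, threshold):
    """Equation (1): choose the pose with minimum distance.

    Returns (pose, distance); pose is None when even the nearest pose lies
    outside the threshold, i.e. nothing is deemed "detected".
    """
    best = min(bank, key=lambda pose: distance(sample, *bank[pose]))
    d = distance(sample, *bank[best])
    return (best if d <= threshold else None), d


# Toy data: a single tri-axis accelerometer and two hypothetical poses.
training = {
    "arms_up": [[0.0, 1.0, 0.1], [0.2, 0.8, -0.1], [-0.2, 1.2, 0.0]],
    "arms_out": [[1.0, 0.0, 0.1], [0.8, 0.2, -0.1], [1.2, -0.2, 0.0]],
}
bank = build_pose_bank(training)
pose, d = classify([0.1, 0.9, 0.0], bank, threshold=2.0)
print(pose, round(d, 3))
```

With these toy numbers the sample is close to the stored "arms_up" means, so it is detected, while a reading halfway between the two poses falls outside the threshold and is rejected.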



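The Threshold and Training Sample Size subsections make two quantitative points: pooling raw readings from several subjects raises the standard deviations, and a workable threshold lies between the largest correct-pose distance and the smallest wrong-pose distance in a confusion matrix. The Python sketch below illustrates both under stated assumptions; the helper names, the single-axis toy readings and the 3x3 confusion matrix are hypothetical and are not data from the experiments.

```python
import statistics


def axis_stats(readings):
    """Per-axis mean and population standard deviation of raw readings."""
    columns = list(zip(*readings))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return means, stdevs


def combine(data_sets):
    """Pool the raw readings of several subjects before computing statistics."""
    return [reading for readings in data_sets for reading in readings]


def lowest_full_recognition_threshold(own_pose_distances):
    """Smallest threshold at which every correctly performed pose is detected."""
    return max(own_pose_distances)


def smallest_wrong_pose_distance(confusion):
    """Smallest off-diagonal entry; thresholds below it avoid false positives."""
    n = len(confusion)
    return min(confusion[i][j] for i in range(n) for j in range(n) if i != j)


# Two hypothetical subjects holding the same pose slightly differently
# (one axis only, to keep the arithmetic visible).
subject_1 = [[0.9], [1.0], [1.1]]
subject_2 = [[1.3], [1.4], [1.5]]
_, sd_1 = axis_stats(subject_1)
_, sd_2 = axis_stats(subject_2)
_, sd_pooled = axis_stats(combine([subject_1, subject_2]))

# Illustrative confusion matrix: rows are bank poses, columns are the poses
# performed, entries are distances.
confusion = [
    [3.0, 20.0, 15.0],
    [18.0, 2.5, 22.0],
    [12.0, 25.0, 3.5],
]
threshold = lowest_full_recognition_threshold(
    [confusion[i][i] for i in range(len(confusion))])
limit = smallest_wrong_pose_distance(confusion)
print(sd_1[0] < sd_pooled[0], threshold, limit)
```

In this toy case the pooled deviation exceeds either subject's own deviation, which shrinks the scaled distances and is why a combined pose bank calls for a lower threshold; any threshold from 3.5 up to (but not including) 12.0 would detect every pose without a false positive.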