Markov Modeling of a Tennis Point Played

Andrei Loukianov 1 and Vladimir Ejov
1 University of South Australia
2 University of South Australia

(Received xxx 2006, accepted xxx 2006, will be set by the editor)

Abstract. We propose a new model, analytical approach and numerical technique to determine the winning probability of ‘a tennis point played’ on serve/return. The fundamental idea is to include sport-specific competition situations in the stochastic analysis of tennis. The potential of Hawk-Eye™ technology can be realised through the model and utilised in the development of tennis players at any level of skill.

Keywords: Markov chain, hot spots, transition matrix, stationary probability, conditional probabilities

1. Introduction

What method can be used to recreate a player’s technical profile from match statistics? How can a player’s psychological profile be partially recreated from tournament statistics?

Statistical analysis of tennis has covered game modeling (see Kemeny and Snell [8]), match outcome prediction (see Barnett [1]), new scoring systems (see Pollard [13]), definition of the most important points (see Morris [11]), properties of point distributions (see Klaassen [9]), rating systems (see Clarke and Dyte [3]), expected length of games in a match (see Jackson [7]), effect of new balls (see Norton and Clarke [12]), impact of court surface (see Cross [4]) and many other aspects of the contest.

Researchers have been able to predict outcomes and give insight into the game. However, we believe players need rather specific knowledge of their personal technique and behavior to achieve a competitive advantage in a game affected by environmental and psychological factors.

Corresponding author. Tel.: +61-8-8302-3343; fax: +61-8-8302-5785. E-mail address: [email protected]

ISSN 1750-9823 (print)
International Journal of Sports Science and Engineering

Vol. xx (2007) No. xx, pp. xxx-xxx

(Will be set by the publisher)

A few years ago the Hawk-Eye Officiating System was introduced at major tennis tournaments to reduce the margin for error in referees' decisions. The system collects an enormous amount of ball-tracking data, which can be transformed into the XYZ coordinates of each ball landing.

We propose an analytical approach and a stochastic model that are able to produce a sequence of specific actions and decisions to be made by a player in conditions defined by a coach. We also illustrate that the decisive winning factor in a tennis match is the ability of a player to produce, with high probability, successful winners or approach shots from well inside the court.

The concept of Markov dependency was published by the Russian mathematician A.A. Markov in 1906 [10]. Extensive studies of the structure and properties of Markovian models can be found, for example, in Borovkov [5] and Meyn and Tweedie [12]. We provide only a few mathematical definitions to introduce the theoretical potential to sport science professionals.

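The discrete-time process $\{X_n,\ n = 0, 1, 2, \dots\}$ is called a Markov chain if for all states $i, j, i_0, \dots, i_{n-1}$ and all $n \ge 0$ the following is true:

$$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i) = p_{ij}.$$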

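The quantity $p_{ij} = P(X_{n+1} = j \mid X_n = i)$ is called the state transition probability: the conditional probability that the process will be in state $j$ at time $n+1$, immediately after the next transition, given that it is in state $i$ at time $n$ (see Ibe [5]). For a chain with $N$ states, the numbers $p_{ij}$ can be arranged in a transition probability matrix

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}.$$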

It is a stochastic matrix because for any row $i$, $\sum_{j} p_{ij} = 1$. Any row of a stochastic matrix is a probability vector, which is a synonym for a discrete probability distribution (see Iosifescu [6]).

The chain is homogeneous in time, because the same transition matrix P is used to represent the transitions from time 1 to time 2, from time 2 to time 3, and so on (see Berchtold [2]). The property is valid for all $n$, so that

$$P(X_{n+1} = j \mid X_n = i) = P(X_1 = j \mid X_0 = i) = p_{ij}.$$

For certain types of Markov chain (MC), after a number of transitions the values of the $n$-step transition matrix are approximately the same from transition to transition. If this is the case, the MC has reached steady state.
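The steady-state idea can be illustrated with a short numerical sketch. The code below is only an illustration under stated assumptions: the 3-state matrix `P_demo` is hypothetical and not taken from the paper, and the function names are ours. It checks that a matrix is row-stochastic and then repeatedly applies the transition matrix to a starting distribution until the distribution stops changing.

```python
def is_stochastic(P, tol=1e-9):
    """True if all entries are non-negative and every row sums to 1."""
    return all(
        all(x >= -tol for x in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )


def step(pi, P):
    """One transition of the chain: returns the row vector pi * P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi * P from the uniform vector until it stops changing."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(pi, P)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("no convergence; the chain may be periodic or reducible")


# Hypothetical 3-state chain, for illustration only (not data from the paper).
P_demo = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]
pi_demo = stationary_distribution(P_demo)
```

Simple iteration is used here because it mirrors the description of a chain settling down after a number of transitions; solving the linear system $\pi P = \pi$ directly would be an equally valid choice.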

Our quantitative analysis is effectively an analysis of the transition matrix and its properties for a homogeneous MC in stochastic games.

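2. Analysis and Interpretation

The Markov model of a tennis game was developed by Morris [11] to analyse how the value of a point depends on the current score. The probability $P(a, b)$ that player A wins the game when the point score is $(a, b)$ is given by

$$P(a, b) = p\,P(a+1, b) + (1 - p)\,P(a, b+1),$$

where $p$ is the probability of player A winning a point and the boundary values are

$$P(a, b) = 1 \text{ if } a = 4,\ b \le 2; \qquad P(a, b) = 0 \text{ if } b = 4,\ a \le 2.$$

For example, we obtain the probability of A winning from the score (40:40), that is $(3, 3)$:

$$P(3, 3) = p^2 + 2p(1 - p)\,P(3, 3),$$

hence

$$P(3, 3) = \frac{p^2}{1 - 2p(1 - p)}.$$

We can calculate the probabilities from each score line by backward recursion, for example for (0:15) and (0:0):

$$P(0, 1) = p\,P(1, 1) + (1 - p)\,P(0, 2), \qquad P(0, 0) = p\,P(1, 0) + (1 - p)\,P(0, 1).$$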
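The score-line calculation can be sketched in a few lines of code. This is a minimal illustration, assuming the usual encoding of point scores as 0, 1, 2, 3 for 0, 15, 30, 40 and the standard closed form for deuce; the function name `game_win_probability` is ours and not from the paper.

```python
from functools import lru_cache


def game_win_probability(p, a=0, b=0):
    """Probability that player A wins the game from point score (a, b).

    p is A's probability of winning a single point. Scores are encoded as
    0, 1, 2, 3 for 0, 15, 30, 40; (3, 3) stands for deuce.
    """
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def P(a, b):
        if a == 3 and b == 3:
            # Deuce: P = p^2 + 2pq * P, solved for P.
            return p * p / (1.0 - 2.0 * p * q)
        if a == 4:
            return 1.0  # A has won the game
        if b == 4:
            return 0.0  # A has lost the game
        # A wins the next point with probability p, loses it with q.
        return p * P(a + 1, b) + q * P(a, b + 1)

    return P(a, b)
```

For evenly matched players, `game_win_probability(0.5)` evaluates to 0.5, as symmetry requires; any other value of p can be passed in the same way, together with a starting score such as `(0, 1)` for (0:15).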

The model requires the probability of winning a single point as an input for further analysis. Barnett [1] used the model to predict tennis game and match outcomes, assuming the probability of player A winning a point from various score lines is given and fixed. However, we can demonstrate that p changes. The same player (see Rafael Nadal’s match statistics at the US Open 2010) produced a statistical probability of winning a point on serve of 0.752 in one match (vs. Mikhail Youzhny) and 0.6772 in another (vs. Novak Djokovic). The probability is calculated by the formula

$$p = s\,w_1 + (1 - s)\,w_2, \qquad (1)$$

where

$s$ is ‘1st Serve %’,

$w_1$ is ‘Winning % on 1st Serve’,

$w_2$ is ‘Winning % on 2nd Serve’.
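Assuming formula (1) is the usual weighted average of first-serve and second-serve outcomes (first-serve percentage times winning percentage on first serve, plus the remaining share times winning percentage on second serve), it can be sketched as follows. The function name and the input figures are hypothetical and are not the statistics of the matches discussed above.

```python
def point_win_on_serve(first_in, win_first, win_second):
    """Weighted average of serve outcomes; all inputs are fractions in [0, 1].

    first_in   -- share of first serves that go in ('1st Serve %')
    win_first  -- share of points won when the first serve goes in
    win_second -- share of points won on second serve
    """
    for x in (first_in, win_first, win_second):
        if not 0.0 <= x <= 1.0:
            raise ValueError("inputs must be fractions between 0 and 1")
    return first_in * win_first + (1.0 - first_in) * win_second


# Hypothetical figures: 65% first serves in, 75% won on first, 50% on second.
p_demo = point_win_on_serve(0.65, 0.75, 0.50)
```

Because the three inputs vary from match to match, the resulting p varies as well, which is the point made in the text.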

Fig. 1: Nadal vs Youzhny and Nadal vs Djokovic.

3. Model

We propose a method to calculate the probability p of winning a point on serve for a given player A against a given opponent B with no historical data provided. The tennis court is represented by a number of most targeted spots as MC states; for example, player A hitting the ball from spot $i$ to spot $j$ produces the sample transition matrix $P = (p_{ij})$, where $p_{ij}$ is the sample probability of the ball going from spot $i$ to spot $j$.

Fig 2: Tennis court representation of ‘Serving’, ‘Receiving’, and ‘Spotting’.

4. Method

Our steps are the following:
1. Represent the field by a set of most targeted areas (hot spots).
2. Trace a single point played by a given player hitting a certain spot from another spot.
3. Build a player profile module that incorporates ‘Spotting’, an interpretation of a sequence of ‘Serving’ and ‘Receiving’.
4. Build an MC transition matrix for ‘A server and B receiver’ and for ‘B server and A receiver’.
5. Calculate the stationary probability distribution.
6. Calculate the probability of winning a point on serve from two components: ‘winning in four or more shots’ and ‘winning in three or fewer shots’.
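Steps 1 to 4 above can be sketched as a small routine that turns traced points into a transition matrix. This is a minimal illustration only: the hot-spot labels, the traced sequences and the function name are hypothetical, and estimating each transition probability as a row-normalised count is our assumption rather than a procedure stated in the paper.

```python
from collections import defaultdict


def transition_matrix(sequences, states):
    """Row-normalised counts of observed moves between consecutive states."""
    known = set(states)
    counts = {s: defaultdict(int) for s in states}
    for seq in sequences:
        for label in seq:
            if label not in known:
                raise ValueError(f"unknown hot spot: {label!r}")
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    P = []
    for s in states:
        total = sum(counts[s].values())
        if total == 0:
            # Assumption: a spot with no observed shots is treated as absorbing.
            P.append([1.0 if t == s else 0.0 for t in states])
        else:
            P.append([counts[s][t] / total for t in states])
    return P


# Hypothetical hot-spot labels and traced points, for illustration only.
spots = ["A_base", "A_mid", "B_base", "B_mid"]
points = [
    ["A_base", "B_base", "A_base", "B_mid"],
    ["A_base", "B_mid", "A_mid", "B_base"],
    ["A_base", "B_base", "A_mid", "B_base"],
]
P_spots = transition_matrix(points, spots)
```

Each row of `P_spots` is a probability vector over the landing spots reached from one hitting spot, so the result is a stochastic matrix that can be passed to a stationary-distribution routine for step 5.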

5. Algorithm

We are able to show our solution for the calculation of the probability p of ‘a tennis point played’.

We conjecture that the latter can be approximated by

,

where is the stationary probability distribution of the ‘Spotting’ matrix (A marks the column of the ‘Server’), is an indexed ‘hot spot’, and is the row of initial probabilities after the 3rd shot.

,

where event X is ‘2nd shot was not played’, event Y is ‘3rd shot was not played’, and event Z is ‘4th shot was not played’.

6. Numerical Example

We analysed the first set of the match between Roger Federer and Rafael Nadal at the Australian Open 2009. Federer lost the set 5:7 on his serve. The match statistics are as follows:

The set was played in 35 points. Federer served 2 double faults and 3 aces. Nadal made 6 errors and 1 winner on return of serve. Federer made 1 winner and 3 errors playing Nadal’s return.

The value of p calculated from the official match statistics by formula (1) is 0.48.

We then calculate the probability of Federer winning a point on serve from the model, with every point traced by ball landing position.

The two numbers match (0.48 vs. 0.48). The possible deviations will be estimated in future work.

Fig. 3: the row of initial probabilities after the 3rd shot.

A is Federer, B is Nadal, and column X marks the acting Server and acting Receiver. The sum of probabilities is greater than 1 due to rounding in matrix multiplication.

Fig. 4: ‘Spotting’ P transition matrix.

Fig. 5: ‘Spotting’ P transition matrix for Server ‘S’.

Fig. 6: ‘Spotting’ P transition matrix for Receiver ‘R’.

Fig. 7: P stationary probability distribution for Federer to Nadal.

Fig. 8: P stationary probability distribution for Nadal to Federer.

7. Top up -

What would happen if Federer played better from the center of the court? We change the transition probabilities in row to top up column , produce a modified stationary matrix and repeat our calculations following the steps presented earlier.
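A ‘top up’ of this kind can be sketched as a row edit of the transition matrix. The sketch below rests on an assumption of ours: that topping up means raising one entry of a row to a target value (such as the 0.6 quoted in the figure captions) and rescaling the remaining entries of that row so it still sums to 1. The function name and the demonstration matrix are hypothetical.

```python
def top_up(P, row, col, target):
    """Copy of P with P[row][col] set to target and the rest of the row rescaled."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a probability")
    old = P[row]
    rest = sum(old) - old[col]
    new_row = []
    for j, x in enumerate(old):
        if j == col:
            new_row.append(target)
        elif rest > 0:
            # Keep the other entries in their original proportions.
            new_row.append(x * (1.0 - target) / rest)
        else:
            # Assumption: spread the remainder evenly if all other entries are zero.
            new_row.append((1.0 - target) / (len(old) - 1))
    return [new_row if i == row else list(r) for i, r in enumerate(P)]


# Hypothetical 3-state matrix, for illustration only (not data from the paper).
P_before = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]
P_after = top_up(P_before, row=0, col=2, target=0.6)
```

The function returns a new matrix and leaves the original untouched, so the baseline and the topped-up scenario can be compared side by side once their stationary distributions are recomputed.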

Fig. 9: to top up.

Fig 10: P stationary distribution for Federer to Nadal top up.

Fig. 11: P stationary distribution for Nadal to Federer top up.

We can conclude that Federer would most likely have won the set (0.6 vs. 0.48 calculated p) if he had improved his winning from the center of the court.

8. Top up -

What would happen if Federer won more with his forehand from the baseline? We change the transition probabilities in row to top up column. Then we produce a modified stationary matrix and repeat our calculations following the steps presented earlier.

Fig. 12: to top up.

Fig. 13: P stationary distribution for Federer to Nadal top up.

Fig. 14: P stationary distribution for Nadal to Federer top up.

We can see that the chances of winning are hardly improved (0.5 vs. 0.48 calculated p).

9. Top up -

What would happen if Federer simply played a better forehand from his baseline diagonally to Nadal’s baseline? We change the transition probabilities in row to top up column. Then we produce a modified stationary matrix and repeat our calculations following the steps presented earlier.

Fig. 15: to top up.

Fig. 16: P stationary distribution for Federer to Nadal - top up 0.6 from .

Fig. 17: P stationary distribution for Nadal to Federer - top up 0.6 from .

We can see that the chances of winning are not improved at all (0.48 vs. 0.48 calculated p).

10. Top up - ideal play from the center of the court

Suppose Federer plays conservatively from the baseline and perfectly from the center of the court.

Fig. 18: P ideal game for Federer to Nadal.

Fig. 19: P stationary distribution for Federer to Nadal modified to ideal play from center.

Fig. 20: P stationary distribution for Nadal to Federer modified to ideal play from center.

The calculations suggest that Federer would have won the set (0.66 vs. 0.48 calculated p).

11. Conclusion

It is clear that players compete to dominate the area inside the court and to force the winning combination of shots with the best available technique.

The potential of Hawk-Eye™ technology can be realised through the model and utilised in the development of tennis players at any level of skill.

Fig. 21: Top up 0.6 summary analyses.

Fig. 22: Serve by Federer vs. receive by Nadal correlation.

Fig. 23: Serve vs. serve/scored correlation by Federer.

Fig. 24: Receive vs. receive/scored correlation by Nadal.

The modeling and numerical technique can be applied to skill improvement and to the quantification of game trends related to player development. We recommend structuring the training process so that considerable time and effort is devoted to expanding the player’s spectrum of winning shots from close range. Targeted practice, subject to the availability of the opponent’s profile, is also recommended.

12. References

[1] T. Barnett, A. Brown, and S. Clarke. Developing a Model that Reflects Outcomes of Tennis Matches. Proc. of the 8th Australasian Conference on Mathematics and Computers in Sport, pp. 1-11, 2006.
[2] A. Berchtold. Markov chains computation for homogeneous and non-homogeneous data. Journal of Statistical Software, 6(2001)3:1-82.
[3] S. Clarke and D. Dyte. Using official ratings to simulate major tennis tournaments. International Transactions in Operational Research, 7(2000):585–594.
[4] R. Cross. Measurements of the horizontal coefficient of restitution for a superball and a tennis ball. Am. J. Phys., 70(2002)5:482–489.
[5] O. Ibe. Markov Processes for Stochastic Modeling. Elsevier, 2009.
[6] M. Iosifescu. Finite Markov Processes and Their Applications. John Wiley and Sons, 1980.
[7] D. Jackson. Index betting on sports. The Statistician, 43(1994)2:309–315.
[8] J. Kemeny and J. Snell. Finite Markov Chains. Springer–Verlag, 1960.
[9] F. Klaassen and J. Magnus. Are points in tennis independent and identically distributed? Evidence from a dynamic binary panel data model. Journal of the American Statistical Association, 96(2001):500–509.
[10] A. Markov. Extension of the Law of Large Numbers to Dependent Variables. Kazanskii University, 1906.
[11] C. Morris. The Most Important Points in Tennis. In Optimal Strategies in Sports, S. Ladany and R. Machol eds. Amsterdam: North–Holland, pp. 131-140, 1977.
[12] P. Norton and S.R. Clarke. Serving up Some Grand Slam Tennis Statistics. Proc. of the 6MCS, pp. 202–209, 2002.
[13] G. Pollard and K. Noble. The Characteristics of Some New Scoring Systems in Tennis. Proc. of the 6MCS, pp. 221–226, 2002.