FORMULATING OPTIMAL BETTING STRATEGIES
THROUGH A STATISTICAL PREDICTION OF THE
OUTCOME AND MARGIN OF VICTORY OF NFL GAMES
A THESIS
Presented to the University Honors Program
California State University, Long Beach
In Partial Fulfillment
of the Requirements for the
University Honors Program Certificate
Ranil Weerackoon
Spring 2015
I, THE UNDERSIGNED MEMBER OF THE COMMITTEE,
HAVE APPROVED THIS THESIS
FORMULATING OPTIMAL BETTING STRATEGIES
THROUGH A STATISTICAL PREDICTION OF THE
OUTCOME AND MARGIN OF VICTORY OF NFL GAMES
BY
Ranil Weerackoon
____________________________________________________________
Olga Korosteleva, Ph.D. (Thesis Advisor) Department of Mathematics and Statistics
California State University, Long Beach
Spring 2015
ABSTRACT
Billions of dollars are bet on sports each year, with no sport receiving more wagers than
football. The purpose of this thesis is twofold: to figure out which football team to bet on and
then how much to bet on that team. In this thesis, we construct a model that can be used to
predict the outcome of NFL games as well as the margin of victory. Tested over the course of the
2014 NFL season, the model constructs a rating for each NFL team by using variables such as a
team's strength of schedule and adjusted margin of victory. After calculating the probability of a
team winning as well as its predicted margin of victory, a half Kelly staking strategy is used to
determine the optimal fraction of one's bankroll that should be used for betting on the outcome
of games. Considering only games with an estimated positive expected value, our model
accurately predicts 70.4% of games and 55% of games against the Vegas point spread and
returns an overall profit.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1. INTRODUCTION
   Sports Betting and Terminology
   Strategies Associated With Betting
      Fixed Staking
      Percentage Staking
      Kelly Staking
2. STRATEGIES USED FOR PREDICTIVE PURPOSES
3. DATA DESCRIPTION
   Strength of Schedule (SOS) and Simple Rating System
   Adjusted Margin of Victory (AMOV)
   Team Rating
4. METHODOLOGY
5. RESULTS
   Statistical Model
   Betting Strategies
6. DISCUSSION
REFERENCES
LIST OF TABLES
1. List of NFL Statistics
2. List of Variables used each week in Statistical Calculations
LIST OF FIGURES
1. Comparison of Statistical and Intuitive Analyses
2. Accuracy in Predicting the Outcome of Games
3. Betting Profits by Week
CHAPTER 1
INTRODUCTION
Sports betting has grown into a marketable industry for bookmakers. Perhaps no sport has
generated greater profits for bookmakers than football: a staggering $1.34 billion was bet
on football games alone in the state of Nevada in 2011 (Spear, 2013). However, while casinos
and Vegas have hauled in exorbitant profits, American gamblers lost $119 billion in 2013 betting
on sports (Aziz, 2014).
Unlike traditional gambling, in which the probability of an event occurring is known, the
probability of an outcome in sports can only be estimated. The purpose of this research study is
to ascertain ways to bet efficiently on NFL games. Football was chosen as the sport of interest
due to the high popularity of the game and because an overwhelming 65% of sports betting is
directed at football alone (McIntyre & Sauter, 2011). This study aims to identify a model that
will predict both the outcome and the point differential of a game so that one will have a higher
probability of making a profit by wagering on football. In addition to providing an analysis as to
which team to bet on in a particular game, this thesis also illustrates an optimal strategy as to
how much to bet on a particular team in order to maximize profits.
Sports Betting and Terminology
This paper will focus on two different forms of betting. The first is predicting the
outcome of a game outright (i.e., betting on which team will win). For this form of betting,
bookmakers post money odds on each team for each individual game. The bookmaker
will denote one team as the favorite and assign it lower money odds, while the team designated
as the underdog will receive higher odds. For instance, an odds line of -340 on the favorite and
+280 on the underdog indicates that a bettor would make a profit of $100 per $340 bet when the
favored team wins the game. If a bet is placed on the underdog, a bettor would earn $280 per
$100 bet when the underdog wins. In other words, the negative odds line (e.g., -340) indicates
how much the bettor would need to bet on the favorite to win $100, while the positive odds line
(e.g., +280) indicates the winnings on a $100 bet in the event the underdog wins the game. As is
clear from the money odds, a bet on both teams results in a profit for the bookmaker. In the
example above, if a bettor were to place $340 on the favorite and $100 on the underdog, the
bettor would lose $340 if the favored team lost and win back only $280 from the underdog bet,
resulting in a net profit of $60 for the bookmaker. If the favorite won the game, the bettor
would earn no profit (the $100 lost on the underdog would negate the $100 earned from the
favored team winning). These odds are thus biased towards bookmakers to ensure they have an
edge over the bettor. The exception occurs when a football game ends in a tie, in which case the
money bet is returned to the bettor and no profit is earned by either the bookmaker or the
bettor. However, these instances are extremely rare in the NFL, as only five games out of 4,345
(0.12%) ended in a tie from the start of the 1998 season through the conclusion of the 2014
season.
The probability of a team winning a game that is implied by the bookmaker's odds can be
calculated by taking the quotient of the amount bet and the potential amount received in revenue
(the stake plus the profit). For example, the aforementioned -340 odds used for the favorite
indicate that the favorite has a 340/440 = 77.3% chance of winning the game, while the +280 odds
used for the underdog indicate that the underdog has a 100/380 = 26.3% chance of emerging
victorious. The probabilities sum to 103.6%; the excess over 100% accounts for the vigorish (also
known as the vig or the overround) used by the bookmaker to create an edge over the bettor and
thus ensure a profit for the bookmaker, granted the game does not end in a tie. The probability of
a team winning after accounting for the vigorish can be calculated by taking the implied
probability and dividing it by the overall sum that includes the vigorish (Sillanpää & Heino,
2013). Using the example from above, the favored team has a 0.773/1.036 × 100% = 74.6% chance of
winning. Similarly, the underdog's probability of winning is 0.263/1.036 × 100% = 25.4%. The odds
are designed such that the expected value for the bettor is always negative and bets on the
favorite assume greater risk. From the example above, the expected value of a $100 bet placed on
the underdog is (-100)(0.746) + (280)(0.254) = -$3.48, while the expected value of a $340 bet
placed on the favorite is (100)(0.746) + (-340)(0.254) = -$11.76.
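The conversion from money odds to probabilities and expected values described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the thesis's model; the function names are hypothetical, and the probabilities are left unrounded, so the expected values differ from the rounded figures in the text by a cent or two.

```python
def implied_probability(american_odds):
    """Win probability implied by an American money line (vigorish included)."""
    if american_odds < 0:
        stake = -american_odds          # e.g. -340: risk $340 to win $100
        return stake / (stake + 100)
    return 100 / (american_odds + 100)  # e.g. +280: risk $100 to win $280


def remove_vig(p_favorite, p_underdog):
    """Rescale the two implied probabilities so that they sum to one."""
    total = p_favorite + p_underdog
    return p_favorite / total, p_underdog / total


def expected_value(p_win, profit, stake):
    """Expected profit of a bet that wins `profit` or loses `stake`."""
    return p_win * profit - (1 - p_win) * stake


raw_fav = implied_probability(-340)   # about 0.773
raw_dog = implied_probability(280)    # about 0.263
p_fav, p_dog = remove_vig(raw_fav, raw_dog)
ev_fav = expected_value(p_fav, profit=100, stake=340)
ev_dog = expected_value(p_dog, profit=280, stake=100)
```

Both expected values come out negative, which is the bookmaker's edge expressed from the bettor's side.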
The second form of betting involves the point spread, or the line, for games. The point
spread is devised such that the team favored to win the game is assigned a handicap. For
instance, a bet on a team that is designated as a seven point favorite (i.e., a line of -7)
requires not only that the team win the game, but that it win by more than seven points in order
for the bettor to make a profit. In other words, the margin of victory for the favored team must
be greater than seven points. If the margin of victory is identical to the point spread, the
result is denoted a push and no money changes hands (i.e., no profit is attained by either the
bookmaker or the bettor). To avoid a push, half point spreads (e.g., a line of 7.5) are often
devised so that a win or a loss on a bet is guaranteed. If the favored team wins by a margin
greater than the line, or if the underdog "wins" the game after the handicap is applied to the
final score, the team is denoted as having covered the spread.
Most bookmakers use 11-10 odds for betting on a point spread (Winston, 2012). More
specifically, an $11 bet on a team would result in a profit of $10 if the team covers the spread.
For example, in Super Bowl XLVIII, the Denver Broncos were a 2.5 point favorite over the
Seattle Seahawks. Bettors who bet on the Broncos -2.5 would receive a $10 profit per $11
bet only if the Broncos won by three or more points. Bettors who bet on the Seahawks +2.5 would
win $10 per $11 bet if the Seahawks lost by fewer than three points or if the Seahawks won the
game. Thus, if an $11 bet is placed on each team, the bookmaker receives a total of $22 and only
has to pay out $21 to the winning bettor (the $11 used to bet and the $10 profit), guaranteeing
the bookmaker a profit of $1 (Winston, 2012). The bookmaker is therefore assured a
1/22 = 4.5% profit, which accounts for the bookmaker's vigorish. If a bettor has a probability
\(p\) of winning a bet, then the bettor will break even when betting on the spread when
\(10p - 11(1 - p) = 0\), indicating that \(p = 11/21 = 52.4\%\). Therefore, a bettor must win more
than 52.4% of the time when betting on the point spread in order to make a profit.
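As a quick check of the break-even calculation, the sketch below solves p × win − (1 − p) × risk = 0 for p. The function name and its parameters are illustrative assumptions rather than notation from the cited sources.

```python
def break_even_probability(risk, win):
    """Win probability p at which p * win - (1 - p) * risk equals zero."""
    return risk / (risk + win)


# Standard 11-10 point spread pricing: risk $11 to win $10.
p_star = break_even_probability(risk=11, win=10)

# Bookmaker's guaranteed margin when one $11 bet is taken on each side:
# $22 collected, $21 paid out to the winner.
vig_share = (22 - 21) / 22
```

Any pricing other than 11-10 can be checked the same way by changing `risk` and `win`.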
A common misconception about sports betting is that the point spread is devised to attract an
equal amount of money on both sides to ensure the bookmakers' 4.5% profit. However, Steven
Levitt showed that bettors have a propensity to bet on favorites even though home favorites and
away favorites cover the spread only 49.1% and 47.8% of the time, respectively, while home
underdogs and away underdogs cover the spread 57.7% and 50.4% of the time, respectively (as
cited in Winston, 2012). Thus, bookmakers can make a greater profit by increasing the point
spread due to bettors' inclination to place money on the favorite. For example, assume the true
point spread is Team A -6 over Team B (i.e., Team A is a six point favorite). The bookmaker may
opt to increase the point spread to seven because of the bias towards the favored team, thus
resulting in a situation where Team A no longer has a 50% chance of covering the spread. This
gives the bookmaker an opportunity to earn a profit exceeding 4.5%. In Levitt's sample, bettors
won 49.45% of their bets and lost 50.55% of the time, thus creating an expected profit of
0.4945(-10) + 0.5055(11) = $0.6155, or 61.55 cents, for the bookmaker per $10 bet. This exceeds
the 0.5(-10) + 0.5(11) = $0.50, or 50 cents, that bookmakers would make if bettors won 50% of the
time, a 61.55/50 - 1 = 23% increase in profit for the bookmaker (as cited in Winston, 2012).
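The bookmaker's expected profit per bet in this comparison is simple enough to verify directly. The snippet below is a minimal sketch of that arithmetic under 11-10 pricing; the function name is hypothetical.

```python
def bookmaker_profit_per_bet(bettor_win_rate, risk=11.0, win=10.0):
    """Expected bookmaker profit on one bet priced at risk-to-win odds."""
    # The bookmaker keeps `risk` when the bettor loses and pays `win` otherwise.
    return (1 - bettor_win_rate) * risk - bettor_win_rate * win


levitt = bookmaker_profit_per_bet(0.4945)    # bettors win 49.45% of bets
balanced = bookmaker_profit_per_bet(0.50)    # bettors win half of their bets
relative_increase = levitt / balanced - 1    # extra profit from the shaded line
```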
Nevertheless, bookmakers constantly adjust the lines for games to take into account the
public perception of the teams and to ensure that an overwhelming number of bets is not placed
on one team. For example, assume Team A starts off as a three point favorite over Team B. If the
public believes Team A is more than three points better than Team B and a large influx of bets
comes in on Team A, the bookmaker may opt to change the line to designate Team A as a five
point favorite. While the people who have already placed bets on Team A as a three point
favorite retain that point spread, future bettors are now forced to bet with the new point spread. If
an overwhelming number of bets is then placed on Team B (now a five point underdog), the
bookmaker may again adjust the line to Team A -4 to create a situation where the amount of
money placed is more evenly distributed.
While the bookmaker will seek to inflate the line so that it is biased towards the favorite
to increase the probability of a profit, the bookmaker also needs to make sure there are enough
bets placed on the underdog to reduce the chance of a loss should the favorite cover the spread.
For instance, in the event the favorite covers the spread when a total of $770 is bet on the
underdog and a total of $1320 is bet on the favorite, the bookmaker would retain the $770 bet on
the underdog but would have to pay bettors on the favorite the $10 profit per $11 bet
($1320 × 10/11 = $1200), for a net result of $770 - $1200 = -$430, i.e., a loss of $430. Thus,
the bookmaker would look to increase the point spread in this scenario to encourage more bets on
the underdog and ensure a profit regardless of the outcome.
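The bookmaker's exposure in this scenario can be worked out for both possible results. The sketch below assumes standard 11-10 pricing on both sides and uses a hypothetical helper name; it is meant only to make the arithmetic explicit.

```python
def bookmaker_net(handle_on_winner, handle_on_loser, risk=11.0, win=10.0):
    """Bookmaker's net result once the game is settled.

    Losing stakes are kept in full; winning bettors are paid `win` in
    profit for every `risk` they wagered.
    """
    payout = handle_on_winner * win / risk
    return handle_on_loser - payout


# $1320 on the favorite, $770 on the underdog, as in the text.
if_favorite_covers = bookmaker_net(handle_on_winner=1320, handle_on_loser=770)
if_underdog_covers = bookmaker_net(handle_on_winner=770, handle_on_loser=1320)
```

With this split the bookmaker loses $430 if the favorite covers and gains $620 if the underdog covers, which is why a more balanced handle is attractive to the bookmaker.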
This line adjustment on the part of the bookmaker in turn creates an inherent advantage
for the bettor. As the bookmaker sets the point spread to anticipate the behavior of bettors
rather than to provide the most accurate forecast of the true point spread, it is possible for the
bettor to formulate profitable betting strategies (Warner, 2010). Even though bookmakers design
odds and lines to make successful bets difficult in the long run, this bias can be exploited if
bets are placed according to the hidden true point spread of the game rather than the line
established by bookmakers.
While there are many other forms of betting (e.g., parlays, progressive parlays, and teasers),
this thesis will focus exclusively on betting on the outcome of games and the point spread
as these are the most common forms of betting.
Strategies Associated with Betting
While it is clearly important to determine whom to bet on, it is imperative to ascertain how
much to bet in order to maximize profits. For instance, it may not be advantageous to bet
the same amount on a team that has an estimated 60% chance of winning as on a team
that has an estimated 80% chance of winning. In situations where there is a positive expected
value, it may initially seem that the goal of the bettor should be to maximize the expected value
of the amount he/she possesses. However, the expected payoff is only maximized when the bettor
gambles her/his entire bankroll (Moya, 2012). This is disadvantageous as the probability of
losing the entire bankroll approaches one as the number of bets grows, thus resulting in
bankruptcy for the bettor. Conversely, a bettor may choose to minimize the risk of total ruin by
minimizing the amount bet on each game. However, this too is unfavorable in that it leads to
smaller expected profits (Moya, 2012). Thus, staking plans are used to maximize profit, minimize
losses, and minimize the overall variance of one's bankroll. A discussion of various staking
plans follows.
Fixed Staking
Fixed, or level, staking assigns a certain value for each bet (Moya, 2012). In other words,
the same amount is placed on each bet, regardless of the odds or percentage of winning the bet.
This staking plan has obvious drawbacks in that the betting amount does not change when the
bankroll changes and the odds of success are not taken into account. Thus, there is a greater risk
of bankruptcy when the current bankroll is smaller than the initial bankroll as the staking plan
encompasses a greater proportion of the bank. Additionally, the opportunities to maximize
profits are mitigated when the current bankroll has expanded from its initial state, as a smaller
proportion of the bank is being used for betting purposes. Despite its inherent flaws, level staking
is used as the basis for other staking plans.
Percentage Staking
Percentage staking assigns a certain percentage of the current bankroll for each bet
(Moya, 2012). Thus, percentage staking addresses the weaknesses of fixed staking in that it calls
for larger bets when the current bankroll is larger than the initial bankroll, allowing for greater
profits, and for smaller bets when the current bankroll has fallen below the initial
bankroll, thus avoiding a further precipitous decline. The main drawback of percentage staking
is that it takes longer to recover losses than fixed staking, as a smaller bet is assigned after a
loss. However, this form of staking is theoretically safer and sacrifices short-term profit for a
longer-term approach (Moya, 2012).
Kelly Staking
Kelly staking assigns a variable percentage of the bankroll to each bet. Rather than
maximizing the expected value of each bet, John Kelly posited that the expected log of the
payoff should be maximized, as this maximizes the growth rate of the bankroll (Moya, 2012). For a
given estimated probability of winning a bet, the optimal fraction to bet is:
\[ f = \frac{p \times \text{WIN} - q \times \text{LOSE}}{\text{WIN} \times \text{LOSE}} = \frac{p}{\text{LOSE}} - \frac{q}{\text{WIN}} \tag{1} \]
where \(f\) = the fraction of the bankroll that should be used to bet, WIN = profit made per $1 bet,
LOSE = loss per $1 bet, \(p\) = probability of winning the bet, and \(q = 1 - p\) = probability of
losing the bet (Winston, 2012). Thus, the Kelly betting strategy calls for larger bets when the
probability of the team winning or the WIN odds are high. Conversely, if a team's probability of
winning is low or if the LOSE odds are high, the amount bet should be lowered. Given the
probability of a team winning, the optimal betting fraction can be used to calculate the log of
the expected growth rate of one's bankroll, as depicted in Eq. (2):
\[ \ln(\text{expected growth rate}) = p \times \ln(1 + \text{WIN} \times f) + q \times \ln(1 - \text{LOSE} \times f) \tag{2} \]
Eq. (1) and (2) can be used to determine the most efficacious fraction of a bankroll to bet on
both the outcome of a game and the point spread. For example, assume that a team has a 70% chance
of covering the spread, where one earns $1 for a successful $1 bet and loses $1.10 per $1 losing
bet. Then the Kelly betting strategy would suggest betting
\(f = \frac{(0.7 \times 1) - (0.3 \times 1.1)}{1 \times 1.1} = 33.64\%\) of the amount of money one
has available to bet. The log of one's bankroll would then be expected to grow by
\(0.7 \times \ln(1 + 1 \times 0.3364) + 0.3 \times \ln(1 - 1.1 \times 0.3364) = 0.0644\) per bet,
indicating that the expected growth rate is \(e^{0.0644} = 1.06647\), or a 6.647% increase in
one's bankroll per bet if one wins 70% of her/his bets.
The biggest weakness of the Kelly approach is that, unlike in traditional gambling, the true
probability of success is unknown and can only be estimated. Thus, there is no guarantee that a
maximum profit will be attained (Sillanpää & Heino, 2013). Perhaps more significantly, the
Kelly approach calls for large bets where the estimated probability of success is high. If a bet
on a statistically far superior team fails (not an atypical occurrence in the NFL due to
injuries, overall parity, and upsets), a large proportion of the bankroll is lost on that one bet,
thus resulting in a high risk of bankruptcy.
The half Kelly staking plan, in which only half of the optimal Kelly fraction is bet, has
often been suggested to address this shortcoming. This method achieves roughly 75% of the growth
rate of the full Kelly staking plan but significantly reduces the probability of a damaging loss
and thus decreases the risk of bankruptcy (Moya, 2012). Because the randomness of NFL games
prevents an exact probability of success from being calculated, this thesis adopts the half Kelly
staking plan as the optimal betting strategy for maximizing profits while also minimizing
potential losses.
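To make Eq. (1), Eq. (2), and the half Kelly plan concrete, the following sketch reproduces the 70% example from the Kelly staking discussion. The function names are illustrative assumptions, and the code is only a check of the formulas, not the procedure used to place the bets in this thesis.

```python
import math


def kelly_fraction(p, win, lose):
    """Eq. (1): optimal fraction of the bankroll to stake."""
    q = 1 - p
    return (p * win - q * lose) / (win * lose)


def log_growth(p, win, lose, f):
    """Eq. (2): expected log growth of the bankroll per bet at fraction f."""
    return p * math.log(1 + win * f) + (1 - p) * math.log(1 - lose * f)


p, win, lose = 0.70, 1.0, 1.1

full_kelly = kelly_fraction(p, win, lose)       # about 0.3364
half_kelly = full_kelly / 2                     # the plan adopted here

growth_full = log_growth(p, win, lose, full_kelly)
growth_half = log_growth(p, win, lose, half_kelly)
growth_ratio = growth_half / growth_full        # share of full Kelly growth kept
```

For this particular example the half Kelly stake keeps a little under three quarters of the full Kelly log growth while committing only half as much of the bankroll to the bet.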
CHAPTER 2
STRATEGIES USED FOR PREDICTIVE PURPOSES
In order to accurately predict the outcome of games and the margin of victory, a separate
statistical model must be built for each. For predicting the outcome of games, where the
dependent variable is binary as it only has two outcomes (win or loss, excluding games that end
in a tie), a logistic regression analysis can be performed (Bi & Jeske, 2010). For predicting the
point spread, a multiple linear regression analysis is preferred as the dependent variable
(margin of victory) is a continuous variable. The results from these models will be compared to
other statistical models, such as the NFL Elo ratings designed by Nate Silver and picks from the
Microsoft Cortana system.
This study will also make use of predictions based on intuition in order to see how
an intuitive analysis compares with a statistical analysis. As there are numerous variables in
football that are difficult to quantify and account for (e.g., injuries, momentum, weather), there
may be some benefit to going with intuition. Thus, a comparison between an intuitive perspective
and what the statistics suggest regarding the game's outcome or point spread may be noteworthy.
Along with the picks of 13 NFL experts and analysts from ESPN, we will use our own intuitive
analysis as a means of comparison to the statistical model.
CHAPTER 3
DATA DESCRIPTION
The data used for this research include information from weeks 5-17 of the 2014 NFL
season, a sample size of 195 games. The first four weeks of the season (61 games) are used
as the training data set to predict the outcome of the games in week five. The data accumulated
from the first five weeks are then used to predict the outcomes in week six. The process
continues iteratively so that outcomes from the last week of the NFL season (week 17) are
predicted using data from the first 16 weeks (240 games). The exclusion of the first four weeks
for predictive purposes mirrors the approach of Warner's statistical formulation (Warner, 2010).
A number of variables exist that can be used for predictive purposes. The model developed in
this thesis incorporates some basic NFL statistics, such as points per game and turnovers, as well
as a few more complex analytical features, described in detail below.
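The week-by-week scheme described above is an expanding-window split: each week is predicted from all earlier weeks. A minimal sketch of how such splits could be enumerated is shown below; the function name and its defaults are assumptions made for illustration.

```python
def expanding_window_splits(first_test_week=5, last_week=17):
    """Yield (training_weeks, test_week) pairs for week-by-week prediction."""
    for test_week in range(first_test_week, last_week + 1):
        # Every week before the test week is available for training.
        yield list(range(1, test_week)), test_week


splits = list(expanding_window_splits())
first_train, first_test = splits[0]    # weeks 1-4 predict week 5
last_train, last_test = splits[-1]     # weeks 1-16 predict week 17
```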
Strength of Schedule (SOS) and Simple Rating System (SRS)
The strength of schedule feature is designed to take into account the quality of the opponents
a team has faced. Each team is assigned an unknown rating, thus forming a system of 32 equations
with 32 unknowns overall, with each unknown representing one of the 32 teams in the
NFL. The simple rating system is the sum of a team's average strength of schedule and the
team's average margin of victory. For instance, if Team A plays Team B, Team C, and Team D
and has an average margin of victory of five points per game, then Team A's SRS can be
denoted as
\[ \text{Team A} = 5 + \frac{1}{3}\left(\text{Team B} + \text{Team C} + \text{Team D}\right). \]
In order to determine Team B's SRS, a separate equation needs to be set up for Team B based on
Team B's strength of schedule. This simple formulation provides an intuitive approach to
designing a team's rating. For instance, an average team would have a rating of zero, and a team
that has faced an average strength of schedule would have a rating exactly equal to its average
margin of victory. The major benefit of this system is that the ratings are easy to interpret;
for example, a team with an SRS of 5 is five points better than an average team while a team with
an SRS of -3 is three points worse than an average team.
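One way to see how the SRS equations fit together is to solve them for a toy league. The sketch below uses a damped fixed-point iteration instead of solving the linear system directly; that choice, the function name, and the three-team schedule are all assumptions made for illustration and are not the procedure or data used in this thesis.

```python
def simple_rating_system(games, iterations=200):
    """Approximate SRS ratings from (team, opponent, margin) game records.

    Each rating satisfies: rating = average margin + average opponent rating.
    """
    margins, opponents = {}, {}
    for team, opp, margin in games:
        # Record the game from both teams' points of view.
        for t, o, m in ((team, opp, margin), (opp, team, -margin)):
            margins.setdefault(t, []).append(m)
            opponents.setdefault(t, []).append(o)

    avg_margin = {t: sum(ms) / len(ms) for t, ms in margins.items()}
    ratings = dict.fromkeys(avg_margin, 0.0)
    for _ in range(iterations):
        updated = {}
        for t in ratings:
            sos = sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            # Damping (averaging old and new values) keeps the iteration stable.
            updated[t] = 0.5 * ratings[t] + 0.5 * (avg_margin[t] + sos)
        ratings = updated
    return ratings


# Hypothetical round robin: A beat B by 6, A beat C by 3, B beat C by 3.
toy_games = [("A", "B", 6), ("A", "C", 3), ("B", "C", 3)]
toy_ratings = simple_rating_system(toy_games)
```

In this toy league Team A rates three points above average, and the ratings sum to zero, matching the interpretation that an average team has a rating of zero.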
Adjusted Margin of Victory (AMOV)
Margin of victory can often be viewed as more informative in ranking a team's true
performance than winning percentage. However, one of the main drawbacks of simply using
margin of victory is that blowouts can oftentimes overstate a team's rating. For instance, the
difference between winning a game by 20 points and winning by 5 points (a 15 point difference) is
significant in that the former represents a convincing win while the latter indicates a close
win. However, if a team wins a game by 35 points as compared to 20 points (still a 15 point
difference), the difference is negligible as both games represent decisive victories. To
compensate for this, Keener (1993) adjusted the margin of victory in the following fashion:
\[ r_{ij} = \frac{S_{ij} + 1}{S_{ij} + S_{ji} + 2} \tag{3} \]
where \(r_{ij}\) depicts Team \(i\)'s adjusted margin of victory (AMOV) against Team \(j\),
\(S_{ij}\) represents the number of points scored by Team \(i\) against Team \(j\), and \(S_{ji}\)
represents the number of points scored by Team \(j\) against Team \(i\). The 1 and 2 in the
numerator and denominator, respectively, are included to ensure that a team doesn't take all the
credit in the case of a shutout. This approach ensures that a team's adjusted margin of victory
for a given game falls between 0 and 1. Warner (2010) developed a similar system, but in a manner
such that the AMOV was not distributed in a linear fashion:
\[ r_{ij} = \text{AMOV} = 0.5 + 0.5\,\operatorname{sgn}\!\left(\frac{S_{ij}+1}{S_{ij}+S_{ji}+2} - \frac{1}{2}\right)\sqrt{\left|\,2\left(\frac{S_{ij}+1}{S_{ij}+S_{ji}+2}\right) - 1\,\right|} \tag{4} \]
where
\[ \operatorname{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases} \]
denotes the standard sign function.
The use of the square root function allows \(r_{ij}\) to approach 0 or 1 quickly to compensate for
blowout wins. One of the drawbacks of Warner's method for adjusting the margin of victory is
that it can rank a close win in a low-scoring game above a bigger win in a high-scoring game. For
example, a 6-3 win by Team A would be considered a close win, but the
AMOV \(= 0.5 + 0.5\sqrt{\left|2\left(\frac{6+1}{6+3+2}\right) - 1\right|} = 0.76\), thus
indicating a rather significant win for Team A. However, a 38-24 win would be considered by many
to be a more impressive win, and yet the
AMOV \(= 0.5 + 0.5\sqrt{\left|2\left(\frac{38+1}{38+24+2}\right) - 1\right|} = 0.73\). This thesis
compensates for this by taking into account both the margin of victory and the AMOV, as described
in the section below. The model used in this thesis also simplifies Warner's formulation such
that the adjusted margin of victory falls between -1 and 1 rather than 0 and 1 to create a
significant difference between a win and a loss, as depicted in Eq. 5:
\[ r_{ij} = \operatorname{sgn}\!\left(\frac{S_{ij}+1}{S_{ij}+S_{ji}+2} - \frac{1}{2}\right)\sqrt{\left|\,2\left(\frac{S_{ij}+1}{S_{ij}+S_{ji}+2}\right) - 1\,\right|} \tag{5} \]
The AMOV is kept between -1 and 1 so that a loss produces a negative AMOV while a win is
designated by a positive AMOV, as opposed to the formulation designed by Warner in which a
loss has an AMOV less than 0.5 and a win produces an AMOV greater than 0.5. The approach
used in this thesis is designed in a manner such that for each game, the loser's AMOV is the
negative of the winner's AMOV.
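The three adjusted margin of victory formulas can be compared side by side with a short sketch. The function names below are hypothetical, and the signed version follows the description of Eq. 5 given above, in which the loser's value is the negative of the winner's.

```python
import math


def keener_share(points_for, points_against):
    """Eq. 3: Keener's smoothed share of the points scored in a game."""
    return (points_for + 1) / (points_for + points_against + 2)


def sign(x):
    """Standard sign function: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def warner_amov(points_for, points_against):
    """Eq. 4: Warner's nonlinear AMOV on the 0-to-1 scale."""
    share = keener_share(points_for, points_against)
    return 0.5 + 0.5 * sign(share - 0.5) * math.sqrt(abs(2 * share - 1))


def signed_amov(points_for, points_against):
    """Eq. 5: AMOV rescaled to run from -1 (loss) to 1 (win)."""
    share = keener_share(points_for, points_against)
    return sign(share - 0.5) * math.sqrt(abs(2 * share - 1))


close_low_scoring = warner_amov(6, 3)      # the 6-3 example
bigger_high_scoring = warner_amov(38, 24)  # the 38-24 example
winner, loser = signed_amov(6, 3), signed_amov(3, 6)
```

The first two values reproduce the ordering discussed in the text, where the 6-3 win is rated above the 38-24 win.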
Team Rating
Each team's rating is composed of the sum of the simple rating system (which takes into
account strength of schedule) and a team's average adjusted margin of victory. Since the AMOV
falls between -1 and 1, the simple rating system was adjusted to fall between 0 and 1, as depicted
in Eq. 6:
\[ s_i = \frac{SRS_i - \min_j(SRS_j) + \text{[illegible]}}{\max_j(SRS_j) - \min_j(SRS_j) + \text{[illegible]}} \tag{6} \]
where \(s_i\) is a team's adjusted SRS, \(SRS_i\) is a team's SRS, and \(SRS_j\) is the rating for
the \(j\)th team in the NFL that Team \(i\) has faced. In using the SRS rather than SOS, the actual
margin of victory is still taken into account to ensure that low scoring wins don't bolster a
team's rating (as described in the above section), but is combined with the AMOV to ensure that
blowouts don't overstate or understate a team's rating. Thus, a team's rating, \(r_i\), can be
denoted as:
\[ r_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \left(r_{ij} + s_j\right) \tag{7} \]
where \(n_i\) is the number of games Team \(i\) has played up to the point in the season the
rating is calculated. This rating system ensures that a big win against a mediocre team doesn't
boost a team's rating as much as a smaller win against a good team.
Vegas point spreads and other statistical models often take into account home-field
advantage by giving the home team 2.5 or 3 points. For example, if Team A has an SRS of 5 and
Team B has an SRS of 3, it can be interpreted that Team A is two points better than Team B on a
neutral field. When Team A hosts Team B, Vegas will give Team A three points to account for
home-field advantage, and thus Team A becomes a five point favorite. If Team B is playing at
home, Team B becomes a one point favorite. This system has significant flaws as it assumes every
team benefits by about three points from playing at home. However, some teams may have a
greater home-field advantage and/or play better at home than an average team. Conversely, some
teams may not have as decisive a home-field advantage, or a team might play particularly well
on the road, thus not justifying the approach of giving the home team an additional three points.
To compensate for this, a home/road adjusted rating is composed for each team by calculating
the team's rating in Eq. 7 for (1) home games only and (2) road games only. This provides a
home rating and a road rating for each team. Thus, a home adjusted rating can be developed for
each team by placing greater weight (i.e., three times as much) on home games than on road games:
\[ r_{i,\text{home\_adj}} = \frac{3\,r_{i,\text{home}} + r_{i,\text{road}}}{4} \tag{8} \]
Similarly, a team's road adjusted rating can be devised by placing three times as much weight on
road games as on home games:
\[ r_{i,\text{road\_adj}} = \frac{3\,r_{i,\text{road}} + r_{i,\text{home}}}{4} \tag{9} \]
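The team rating in Eq. 7 and the home/road weighting in Eqs. 8 and 9 can be sketched as follows. The per-game numbers are made up for illustration, the function names are hypothetical, and each opponent's adjusted SRS is taken as a given input rather than computed.

```python
def team_rating(games):
    """Eq. 7: mean of (AMOV + opponent's adjusted SRS) over a team's games."""
    return sum(amov + opp_adj_srs for amov, opp_adj_srs in games) / len(games)


def location_adjusted(primary, secondary, weight=3):
    """Eqs. 8-9: weight the primary venue `weight` times the other venue."""
    return (weight * primary + secondary) / (weight + 1)


# Hypothetical (AMOV, opponent adjusted SRS) pairs for one team.
home_games = [(0.52, 0.60), (-0.30, 0.80)]
road_games = [(0.10, 0.40), (-0.47, 0.50)]

r_home = team_rating(home_games)
r_road = team_rating(road_games)
home_adjusted = location_adjusted(r_home, r_road)   # used when hosting
road_adjusted = location_adjusted(r_road, r_home)   # used when visiting
```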
The complete set of variables considered in this model is listed in Table 1.
TABLE 1: List of NFL statistics
Statistics
1. Win percentage
2. Points per game
3. Points per game allowed
4. Turnovers (giveaways) per game
5. Defensive turnovers (takeaways) per game
6. Net passing yards per attempt (PYA)
7. Defensive net passing yards per attempt (DPYA)
8. Home-field advantage
9. Strength of Schedule (SOS)
10. Simple Rating System (SRS)
11. Adjusted Simple Rating System (ASRS)
12. Adjusted Margin of Victory (AMOV)
13. Team Rating
14. Home/Road Adjusted Rating
In a vein similar to the neural network system developed by Khan, the underdogβs statistics
are subtracted from the designated favorite in order to simplify each variable (Khan, 2013). In
other words, if Team A is favored and averages 25 points per game and Team B averages 22
points per game, the data entry for the teamsβ difference in points per game is 25-22=3,
indicating that Team Aβs offensive output is three points per game better than Team B. The
analyses will reveal whether these variables are significant predictors in determining the results
of a game and the point spread.
For games that were played in London, the difference in each teamβs rating was used in place
of a home/road adjusted rating. Since the designated home team isnβt playing in its home
stadium, it did not warrant using the home/road adjusted rating statistic.
CHAPTER 4
METHODOLOGY
The odds of a team winning as well as the point spread were gathered from Football Locks, a
website that accrues the average Vegas team odds and point spreads for each game. In predicting
the outcome of each game for the upcoming week, data is accumulated for the variables listed in
Table 1 for each team from all the previous weeks. A stepwise selection procedure was run prior
to a logistic regression analysis to determine which subset of variables should be used in the final
model. As a result, it is possible that different variables may be used to predict the outcome of
games each week as the significance of a particular variable in the final model can change with
the data accumulated over the previous weeks. Similarly, a stepwise selection procedure was
conducted to determine the subset of variables used in a multiple regression analysis to forecast
the margin of victory. Table 2 details the variables used in both the logistic regression and
multiple regression analyses for each week of the season.
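Once a logistic regression has been fitted on the selected variables, the probability that the favorite wins follows from the logistic function. The sketch below shows only this prediction step, not the stepwise selection or the fitting itself. The intercept, coefficients, and feature values are made up for illustration and are not the fitted values from the thesis.

```python
import math


def win_probability(intercept: float, coefficients: dict, features: dict) -> float:
    """Apply the logistic function to a linear combination of the selected differences."""
    linear = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for a week in which two variables were selected.
coefficients = {"win_pct_diff": 1.2, "home_road_adj_rating_diff": 0.15}
game = {"win_pct_diff": 0.25, "home_road_adj_rating_diff": 4.0}

probability = win_probability(0.1, coefficients, game)
print(round(probability, 3))  # 0.731
```

Because the selected subset can change from week to week, the coefficient dictionary would be rebuilt each week from the data accumulated so far.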
With regard to predicting the outcome of games, a call for a bet on a team will be placed
when there is a positive expectation of winning (i.e., the probability of the team winning calculated
through the model multiplied by the odds for a win is greater than the probability of a loss
multiplied by the odds for a loss). The fraction of a bankroll to be used for betting purposes was
ascertained using the half Kelly staking approach. Because there were between 12 and 16 games
per week, the sum of the half Kelly fractions could exceed one (e.g., if the optimal fractions for two
games sum to 1.08, then 1.08 > 1 = 100% of the bankroll would be required). To
adjust for this, each game's half Kelly fraction was multiplied by the total bankroll and divided by
the sum of the half Kelly fractions to determine
the amount of money to be bet on the game, as shown in Eq. 10:
$$K_i^{*} = \frac{B \, K_i}{\sum_{j=1}^{n} K_j} \qquad (10)$$
where $K_i^{*}$ represents the amount of money to bet on game $i$, $B$ is the total bankroll for that week,
and $K_i$ is the half Kelly fraction. For the purposes of this thesis, a bankroll of $20n was
estimated per week, where n represents the number of games for that week. If $K_i^{*} < 5$, then
$K_i^{*}$ was rounded up to 5 since a minimum $5 bet is generally required when betting. Additionally,
$K_i^{*}$ was rounded to the nearest dollar for each game to simplify betting procedures.
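The staking procedure described above can be sketched in Python. Several assumptions are made here: the Kelly fraction is the standard form (bp - q) / b, where b is the profit per dollar staked, which this section does not restate. Each bet is taken to be the weekly bankroll scaled by that game's share of the summed half Kelly fractions. The probabilities and odds are hypothetical.

```python
import math


def half_kelly_fraction(p_win: float, net_odds: float) -> float:
    """Half of the standard Kelly fraction; zero when there is no positive expectation."""
    q_loss = 1.0 - p_win
    expected_profit = net_odds * p_win - q_loss  # expected profit per $1 staked
    if expected_profit <= 0:
        return 0.0  # negative or zero expected value: no bet advised
    return 0.5 * expected_profit / net_odds


def weekly_stakes(fractions, games_in_week, per_game_bankroll=20, minimum_bet=5):
    """Scale the half Kelly fractions to the weekly bankroll, then apply the $5 floor."""
    bankroll = per_game_bankroll * games_in_week
    total = sum(fractions)
    stakes = []
    for fraction in fractions:
        if fraction <= 0 or total <= 0:
            stakes.append(0)
            continue
        amount = max(bankroll * fraction / total, minimum_bet)
        stakes.append(math.floor(amount + 0.5))  # round half up to the nearest dollar
    return stakes


# Hypothetical week of three games: (model win probability, profit per $1 staked).
games = [(0.70, 0.50), (0.60, 0.90), (0.55, 0.70)]
fractions = [half_kelly_fraction(p, b) for p, b in games]
stakes = weekly_stakes(fractions, games_in_week=len(games))
print(stakes)  # [23, 37, 0]
```

The third game receives no stake because its expected value is negative, so the $60 weekly bankroll is split between the two remaining games in proportion to their half Kelly fractions.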
As it pertains to predicting the point spread, a call for a bet on a team will be determined
when the predicted margin of victory is significantly higher than the bookmaker's point spread (a
difference of more than one point from the Vegas spread). When the probability of a team
winning or the predicted margin of victory is not significantly higher, it will be advised that no
bet be placed. Once again, $20 was used as the average bankroll for each game.
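The one-point rule for spread bets might look like the sketch below. It assumes that both margins are expressed from the favorite's point of view and that a predicted margin more than one point below the spread triggers a bet on the underdog; the thesis text states only that a difference of more than one point is required. The function name and sample numbers are hypothetical.

```python
def spread_bet(predicted_margin: float, vegas_spread: float, threshold: float = 1.0) -> str:
    """Decide which side to bet against the spread, or advise no bet."""
    difference = predicted_margin - vegas_spread
    if difference > threshold:
        return "favorite"   # model expects the favorite to cover comfortably
    if difference < -threshold:
        return "underdog"   # model expects a closer game than the spread implies
    return "no bet"         # prediction too close to the Vegas spread


print(spread_bet(7.5, 4.0))  # favorite
print(spread_bet(3.0, 3.5))  # no bet
print(spread_bet(1.0, 6.5))  # underdog
```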
TABLE 2: List of variables used each week in statistical calculations
Week | Predicting Outcome of Games | Predicting Margin of Victory
5 | Win Percentage + Home/Road Adj. Rating | SRS + Home/Road Adj. Rating
6 | Home/Road Adj. Rating | Home/Road Adj. Rating
7 | Win Percentage + Home/Road Adj. Rating | SRS + Home/Road Adj. Rating
8 | Win Percentage + AMOV + Home/Road Adj. Rating | Points per game + Points per game allowed + AMOV + Home/Road Adj. Rating
9 | Home/Road Adj. Rating | SRS + Team Rating + Home/Road Adj. Rating
10 | Win Percentage + Home/Road Adj. Rating | Home/Road Adj. Rating
11 | Win Percentage + Home/Road Adj. Rating | Home/Road Adj. Rating
12 | Home/Road Adj. Rating | Home/Road Adj. Rating
13 | Win Percentage + Home/Road Adj. Rating | Home/Road Adj. Rating
14 | Home/Road Adj. Rating | Home/Road Adj. Rating
15 | Home/Road Adj. Rating | Home/Road Adj. Rating
16 | Win Percentage + AMOV + Home/Road Adj. Rating | SRS + Team Rating + Home/Road Adj. Rating
17 | Home/Road Adj. Rating | Home/Road Adj. Rating
CHAPTER 5
RESULTS
Statistical Model
A Hosmer and Lemeshow goodness-of-fit test was conducted to assess whether the logistic
regression model fit the data adequately (Magel & Childress, 2011). The logistic equation was
deemed appropriate with a p-value > 0.6 for all 12 weeks used in the analysis.
Overall, the model predicted 130 out of the 194 games (67.0%) successfully, with one game
excluded due to it ending in a tie. However, when removing games where there was a negative
expected value and thus no bet was advised, the model predicted 112 out of 159 games correctly
(70.4%). This was comparable to other statistical models, such as the ELO system and the
Microsoft Cortana system, which both had a record of 135-59 (69.6%), over the same time
period used by the model developed in this thesis. The ELO ranking system went 176-79
(69.0%) over the course of the entire season, slightly higher than the 169-86 (66.3%) record
compiled by the Cortana system.
The statistical analysis was also compared to intuitive analyses from 13 NFL experts from
ESPN. The 13 experts compiled an average record of 131-63 (67.5%) after the fourth week of
the season, with a 166-89 (65.1%) record over the course of the entire season, comparable again
to the records compiled by the statistical analyses. To estimate how an average NFL fan would
fare predicting games solely on instinct, we also recorded our own intuitive analysis over the
course of the season and finished with a 126-68 (64.9%) record. A graphical representation of
the records for each system can be found in Figure 1.
FIGURE 1: Comparison of statistical and intuitive analyses. The analyses are compared for
predicting game outcomes between Weeks 5-17 of the 2014 NFL season. The probability
corresponding to the ESPN experts is calculated from the average of 13 ESPN experts.
With regard to the point spread, no bet was advised on 24 games due to a similar point
spread predicted by Vegas bookmakers and our model. This resulted in a 94-77 (55.0%) overall
record against the spread, which is slightly, but not significantly, higher than the 52.4% needed
to break even when betting against the spread (p = 0.25). The ELO ranking system was the only
other system that predicts against the spread, and it compiled a comparable 104-75 (58.1%)
record between weeks 5 and 17 and a 129-108 (54.4%) record overall.
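The break-even threshold and the significance check can be reproduced approximately as follows. Two assumptions are made: spread bets are priced at the conventional -110 (risk $110 to win $100), which is consistent with the 52.4% figure, and the comparison is a one-sided test using the normal approximation to the binomial. The thesis does not state which test produced p = 0.25, so this is a plausibility check rather than the author's procedure.

```python
import math


def break_even_rate(risk: float = 110.0, win: float = 100.0) -> float:
    """Win rate at which risking `risk` to win `win` has zero expected profit."""
    return risk / (risk + win)


def one_sided_p_value(wins: int, games: int, null_rate: float) -> float:
    """Upper-tail p-value for an observed win rate under a normal approximation."""
    observed = wins / games
    standard_error = math.sqrt(null_rate * (1.0 - null_rate) / games)
    z = (observed - null_rate) / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2.0))


threshold = break_even_rate()
p_value = one_sided_p_value(wins=94, games=171, null_rate=threshold)
print(round(threshold, 3))  # 0.524
print(round(p_value, 2))    # 0.25
```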
We were unable to conduct an intuitive analysis using the ESPN experts as they did not pick
against the point spread. However, our own intuitive analysis led to a 98-95 (50.8%) record
between weeks 5 and 17 while we registered a mark of 130-123 (51.4%) over the entire season,
both of which fall below the threshold to break even.
[Figure 1 chart: prediction accuracy (vertical axis, 62% to 71%) by system (horizontal axis): Model, Model (Betting), ELO, Cortana, ESPN Experts, Intuitive analysis]
FIGURE 2: Accuracy in predicting the outcome of games. The graph depicts the accuracy of
predicting games using our statistical model and when using the model only for games with a
positive expectation of winning (games in which betting is advised). Aside from Week 5 and
Week 16, our model performed better when recommending betting only on games with a
positive expectation as compared to betting on every game.
Betting Strategies
Using the half-Kelly staking strategy and assuming a bankroll of $20 per game, a bankroll of
195 games × $20 = $3900 is required over the course of the season when betting on the outcome
of a game (an actual total of $3924 was used for betting on the outcome of games due to
rounding to the nearest dollar for each game). Overall, the half Kelly staking strategy resulted in
a profit of $107.25 over the season, a 2.73% return on investment. A fixed staking approach
where exactly $20 was bet on each game (and thus a total of $3900 on the season) accumulated a
profit of $154.83, a 3.97% return on investment. Surprisingly, the fixed staking approach also
recorded profits in 10 of the 13 weeks, as opposed to a profit return in only eight out of the 13
weeks when using the half Kelly staking approach.
[Figure 2 chart: accuracy percentage (vertical axis, 0 to 1) by week (horizontal axis, Weeks 5 to 17); series: Our Model, Our Model for Positive Expectation]
FIGURE 3: Betting profits by week. The betting profits assume a bankroll of $20n each for
betting on the outcomes and for betting on the point spread, where n is the number of games per
week.
Since the point spread model is not associated with a probability and was used for betting
only when there was more than a one-point differential between our model and the Vegas spread, no
optimal fraction was needed. A $20 bet against the spread on each of the 171 games
($3420 on the season) resulted in a total profit of $186, a 5.44% return on
investment. Figure 3 contains the complete breakdown of the profit accumulated
from betting on both the outcome and point spread by week. For the season, betting on both the
outcome and point spread resulted in a total profit of $107.25 + $186 = $293.25, or a 3.99%
yield rate.
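The return-on-investment figures reported in this chapter can be checked with a short calculation. The profits and amounts staked are those stated in the text; the helper name is hypothetical.

```python
def roi_percent(profit: float, staked: float) -> float:
    """Return on investment as a percentage of the total amount staked."""
    return 100.0 * profit / staked


half_kelly = roi_percent(107.25, 3924)                  # outcome bets, half Kelly staking
fixed = roi_percent(154.83, 3900)                       # outcome bets, fixed $20 stakes
spread = roi_percent(186.00, 3420)                      # bets against the spread
combined = roi_percent(107.25 + 186.00, 3924 + 3420)    # both bet types together

for label, value in [("half Kelly", half_kelly), ("fixed", fixed),
                     ("spread", spread), ("combined", combined)]:
    print(f"{label}: {value:.2f}%")
```

The combined yield divides the total profit of $293.25 by the $7344 staked across both bet types.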
[Figure 3 chart: profit in dollars (vertical axis, -200.00 to 200.00) by week (horizontal axis, Weeks 5 to 17); series: Outcomes, Point Spread]
CHAPTER 6
DISCUSSION
Formulating a successful betting approach against established bookmakers has proven to be a
difficult endeavor. Previous studies, such as the one conducted by Warner, were not able to
develop a model that had a sustainable level of success against Vegas odds. For instance, even
though Warner's initial feature set included a wide array of variables, such as a ranking system
whereby the margin of victory was modified as well as temperature difference to determine if a
team's level of play was affected by the temperature at game time, his model only had a 64.36%
success rate in predicting the outcome of games and a 50.90% success rate against the spread
during the course of the 2008 and 2009 seasons (Warner, 2010). Both of these percentages were
below the threshold needed to make money when betting against Vegas.
This model was able to build on the ranking system for each team by taking into account the
adjusted margin of victory as well as a team's strength of schedule. The calculations in this thesis
correctly predicted 67.0% of games and 55.0% of games against the spread. However, a sample size of
one season is particularly small, and it is further reduced by the fact that multiple games had to
be excluded because of the similarity between the predictions of the model in this thesis and those
made by Vegas.
One of the significant drawbacks to the model is its inability to take into account a player's
value or contribution to a team's success. In other words, an injury to a star player on a football
team would dramatically affect the odds of a team winning a game as well as the point spread in
Vegas but would be unaccounted for in this model. This has frequently been a problem in other
statistical models as Vegas has the advantage of heuristics and other real life information that
cannot be explained by statistics only. Thus, investigation into a manner of rating a team's
players to account for injuries could prove beneficial, such as the implementation of a rating that
incorporates wins above replacement for each player by examining the difference in a team's
performance with a starter as compared to the use of an average-level replacement player
(Hughes, Koedel, & Price, 2014). A ranking system for each individual player in the NFL is
outside the scope of this thesis. However, accounting for each player's individual impact on a
team's overall performance can help in modifying the rating system when a certain player is
unable to play due to an injury or suspension.
While the model had some success in its statistical methodology, which in turn led to a
successful return on investment on both the outcome of games (2.73%) and margin of victory
(5.44%), the appropriateness of a half Kelly staking strategy for predicting games requires
additional testing because a fixed staking strategy resulted in a greater profit (3.97% return on
investment). The failure of the half Kelly approach was largely due to the fact that huge losses
were accrued when large bets were placed on heavy favorites and those favorites were upset.
Because only small profits are attained when heavy favorites win and large losses ensue when
those favorites lose, any one game that results in an upset when bets are solely placed on
favorites can result in an overall deficit. Since upsets are commonplace in the NFL due to overall
league parity and other factors (e.g., injuries), a fixed staking strategy has some merit
and some balance between the two staking strategies might prove more beneficial. While the half
Kelly staking approach was used to predict games, only a fixed staking approach was used when
predicting margin of victory. It may prove advantageous to determine if there are optimal betting
strategies that can be used when betting against the spread.
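The asymmetry between small wins and large losses on heavy favorites can be made concrete with a short calculation. The -400 moneyline and the $100 stake are hypothetical, and the American odds convention is assumed for illustration; this section does not state the odds format used in the thesis.

```python
def profit_if_win(stake: float, american_odds: int) -> float:
    """Profit on a winning bet at American odds (negative odds denote a favorite)."""
    if american_odds < 0:
        return stake * 100.0 / abs(american_odds)
    return stake * american_odds / 100.0


stake = 100.0
odds = -400  # hypothetical heavy favorite

win_profit = profit_if_win(stake, odds)        # profit when the favorite wins
wins_to_offset_one_upset = stake / win_profit  # wins needed to recover one lost stake

print(win_profit)                # 25.0
print(wins_to_offset_one_upset)  # 4.0
```

At these odds, a single upset erases the profit from four winning bets of the same size, which is why large Kelly stakes on heavy favorites can dominate a week's result.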
Additionally, further investigation is warranted into the possible use of the predictive model
for postseason play. This thesis covered regular season games only due to other circumstances
that can affect postseason games. Nevertheless, a test trial was conducted for the 2014 NFL
playoffs and the model recorded a 7-4 overall record in predicting games as well as a 6-4 record
against the spread. When excluding playoff games where no bet was advised because of a
negative expected value in profit, the model registered a 4-1 record against the spread. However,
this is an incredibly small sample size for playoff games and thus an examination into how this
model can predict playoff games across multiple seasons is needed before any definitive
conclusions can be made.
This study provides an innovative examination into how to construct a team rating while
taking into consideration strength of schedule and an adjusted margin of victory in addition to
accounting for home-field advantage. Though this predictive model achieved a certain level of
success in predicting the outcome of games and margin of victory and attained an overall profit
from the half Kelly staking approach, further research needs to be conducted to determine if both
the model and betting strategies are sustainable over time.
REFERENCES
Aziz, J. (2014). How did Americans manage to lose $119 billion gambling last year? The Week.
Bi, Y., & Jeske, D. R. (2010). The efficiency of logistic regression compared to normal
discriminant analysis using class-conditional classification noise. Journal of Multivariate
Analysis, 1622-1637.
Hughes, A., Koedel, C., & Price, J. (2014). Positional WAR in the National Football League.
Department of Economics, University of Missouri-Columbia.
Keener, J. (1993). The Perron-Frobenius theorem and the ranking of football teams. SIAM
Review, 80-93.
Khan, J. (2013). Neural Network Prediction of NFL Football Games.
Magel, R., & Childress, G. (2011). Examining the outcome effects of the turnover margin in
professional football. International Journal of Sports Science and Engineering, 147-152.
McIntyre, D., & Sauter, M. (2011). The major league teams Americans bet on the most. 24/7
Wall St.
Moya, F. E. (2012). Statistical Methodology for Profitable Sports Gambling. Simon Fraser
University.
Sillanpää, V., & Heino, O. (2013). Forecasting Football Match Results - A Study on Modeling
Principles and Efficiency of Fixed-odds Betting Markets in Football. Aalto University
School of Business.
Spear, G. (2013). Think sports gambling isn't big money? Wanna bet? NBC News.
Warner, J. (2010). Predicting Margin of Victory in NFL Games: Machine Learning vs. the Las
Vegas Line.
Winston, W. (2012). Mathletics: How Gamblers, Managers, and Sports Enthusiasts Use
Mathematics in Baseball, Basketball, and Football. Princeton University Press.