Quantitative Research (QR) #1: Stratagem Football Model

Introduction

This document contains a short description of the football models that Stratagem supply in the StrataBet platform.

Pre-match final score

The pre-match (or dead-ball) model is based on the well-known Dixon and Coles [1] final score model for football matches. In this model the number of goals scored by each team is assumed to follow a Poisson distribution, whose single parameter can be interpreted as the expected number of goals for that team. For a particular match, the expected goals of each team are built from a combination of other values:

  • Home team expected goals
      ◦ Home team attacking strength
      ◦ Away team defensive strength
      ◦ Home team goals mean
  • Away team expected goals
      ◦ Away team attacking strength
      ◦ Home team defensive strength
      ◦ Away team goals mean

In a basic version of this model the home and away goals are treated as independent. This is not an accurate reflection of reality and tends to under-estimate the number of draws observed in football. To account for this, Dixon and Coles introduce an additional dependence parameter that modifies the probabilities of the low-scoring results (0-0, 1-0, 0-1 and 1-1), and hence the probability of a draw, given values of the home and away expected goals. Stratagem take this a step further and introduce additional, similarly structured parameters that account for bias across a greater range of common football scores.
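As an illustration of the structure described above, here is a minimal Python sketch of the original Dixon and Coles score probabilities. The multiplicative form of `expected_goals` follows Dixon and Coles [1]; the exact way Stratagem combine the values is not stated here, so treat it as an assumption. The function names and the example parameter values are hypothetical, and the additional Stratagem dependence parameters are not included.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_goals(attack, defence, goals_mean):
    """Assumed multiplicative combination, as in Dixon and Coles [1]."""
    return attack * defence * goals_mean


def dependence(x, y, lam, mu, rho):
    """Dixon-Coles adjustment for the four low-scoring results.

    x, y are home and away goals; lam, mu are home and away expected goals.
    """
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_matrix(lam, mu, rho, max_goals=10):
    """matrix[x][y] = probability of a final score of x-y (home-away)."""
    return [
        [
            dependence(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            for y in range(max_goals + 1)
        ]
        for x in range(max_goals + 1)
    ]
```

With a negative `rho` the 0-0 and 1-1 cells are inflated and the 1-0 and 0-1 cells are deflated, which raises the draw probability relative to the independent model while leaving the total probability unchanged.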

The parameters in the above models (team strengths, goal means, dependence) are fitted to historical football score data. In the Stratagem models the parameters are chosen separately for each country, but within each country data from multiple leagues is used (the number of leagues depends on the data available from that country). Additionally, more recent (and thus more relevant) data is weighted more highly when computing these values. The method and scale of this time weighting are chosen by optimizing the predictive accuracy of the model. For the Stratagem models we use five years of historical data.
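The time weighting can be sketched as a weighted log-likelihood. The exponential (half-life) form below is the one used by Dixon and Coles [1] and is only an assumption here, since the method Stratagem arrived at is not specified; the default half-life is a placeholder, not a fitted value, and the dependence term is left out for brevity.

```python
import math


def time_weight(age_days, half_life_days=365.0):
    """Weight of a match played age_days ago (assumed exponential decay)."""
    return 0.5 ** (age_days / half_life_days)


def weighted_log_likelihood(matches, half_life_days=365.0):
    """Time-weighted Poisson log-likelihood.

    matches: iterable of (age_days, home_goals, away_goals, lam, mu), where
    lam and mu are the model's expected goals for the home and away team.
    """
    total = 0.0
    for age_days, x, y, lam, mu in matches:
        log_lik = (
            x * math.log(lam) - lam - math.lgamma(x + 1)
            + y * math.log(mu) - mu - math.lgamma(y + 1)
        )
        total += time_weight(age_days, half_life_days) * log_lik
    return total
```

Fitting then amounts to choosing the parameters that maximise this quantity, and the half-life itself can be tuned by checking predictive accuracy on matches that were not used in the fit.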

Once such a model has been defined and the parameters have been fitted, the probability of any score in any match between two teams can be estimated. This allows predictions to be made on any quantity relating to the final score of the match, such as the expected number of total goals or the fair price of a home win. In the Stratagem model these predictions are made for future matches after each match day, so that every prediction is always based on the most up-to-date data available.
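To make the step from score probabilities to predictions concrete, the sketch below summarises a score matrix into match outcome probabilities, expected total goals and fair decimal odds. For self-containment it builds the matrix from independent Poisson goals; any matrix of score probabilities could be passed in instead. All function names and example values are hypothetical.

```python
import math


def independent_score_matrix(lam, mu, max_goals=10):
    """matrix[x][y] = P(home scores x, away scores y), independent Poisson."""
    def pmf(k, mean):
        return math.exp(-mean) * mean ** k / math.factorial(k)

    return [
        [pmf(x, lam) * pmf(y, mu) for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]


def match_summary(matrix):
    """Aggregate a score matrix into outcome probabilities and total goals."""
    summary = {"home": 0.0, "draw": 0.0, "away": 0.0, "total_goals": 0.0}
    for x, row in enumerate(matrix):
        for y, p in enumerate(row):
            if x > y:
                summary["home"] += p
            elif x == y:
                summary["draw"] += p
            else:
                summary["away"] += p
            summary["total_goals"] += (x + y) * p
    return summary


def fair_odds(probability):
    """Fair decimal odds (no bookmaker margin) for an event."""
    return 1.0 / probability


summary = match_summary(independent_score_matrix(1.6, 1.1))
home_price = fair_odds(summary["home"])
```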


Further modifications of the model include:
  • Division of goal mean parameters into competition and country to reflect the different scoring rates/home advantages across leagues
  • Separate discounting to allow team parameters to change at a different rate to the country/league parameters
  • A connected model of shots on target so that the shots on target recorded in a match will impact the attack and defence strength of the teams
  • Additional down-weighting of historical fixtures at the end of the season that have little impact on the final league standings
  • More advanced handling of teams that move between divisions to account for imperfect separation of domestic leagues
  • Truncation of goal difference at extreme levels to avoid one-sided games having a disproportionate impact on the team strengths

Competition

The competition model is an extension of the pre-match final score model. The final score model is used to estimate the probability of each score in each of the remaining fixtures in the competition. The remainder of the season is then simulated many times with the score in each match determined at random in proportion to the pre-match model probabilities. Each simulation yields a final table for the competition and by combining the results of all the simulations we can estimate quantities such as:
  • Probability of a team finishing in a range of positions (e.g. relegation places)
  • Expected points of a team
  • Expected goal difference of a team
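The simulation loop can be sketched as follows. This is a simplified illustration, not Stratagem's implementation: scores are drawn from independent Poisson probabilities rather than the full pre-match model, ties in the table are broken on goal difference only, and the team names, fixture format and function names are hypothetical.

```python
import math
import random


def score_distribution(lam, mu, max_goals=8):
    """List every score up to max_goals with its (independent Poisson) probability."""
    scores, weights = [], []
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = (math.exp(-lam) * lam ** x / math.factorial(x)
                 * math.exp(-mu) * mu ** y / math.factorial(y))
            scores.append((x, y))
            weights.append(p)
    return scores, weights


def simulate_season(points, goal_diff, fixtures, n_sims=2000, seed=0):
    """Simulate the remaining fixtures n_sims times.

    points, goal_diff: current table as {team: value}.
    fixtures: list of (home, away, home_expected_goals, away_expected_goals).
    Returns expected points, expected goal difference and P(finishing first).
    """
    rng = random.Random(seed)
    prepared = [(h, a) + score_distribution(lam, mu) for h, a, lam, mu in fixtures]
    teams = list(points)
    exp_points = {t: 0.0 for t in teams}
    exp_gd = {t: 0.0 for t in teams}
    p_first = {t: 0.0 for t in teams}
    for _ in range(n_sims):
        pts, gd = dict(points), dict(goal_diff)
        for home, away, scores, weights in prepared:
            # Draw one score in proportion to the model probabilities.
            x, y = rng.choices(scores, weights=weights)[0]
            gd[home] += x - y
            gd[away] += y - x
            if x > y:
                pts[home] += 3
            elif x < y:
                pts[away] += 3
            else:
                pts[home] += 1
                pts[away] += 1
        table = sorted(teams, key=lambda t: (pts[t], gd[t]), reverse=True)
        p_first[table[0]] += 1.0 / n_sims
        for t in teams:
            exp_points[t] += pts[t] / n_sims
            exp_gd[t] += gd[t] / n_sims
    return exp_points, exp_gd, p_first
```

The probability of finishing in any other range of positions (for example the relegation places) can be collected in the same way by counting how often each team lands in that range of `table`.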

In-play final score

The in-play model is based on the model described by Dixon and Robinson [2]. This model assumes that goals are scored via an inhomogeneous Poisson process: each team has a goal scoring rate, and this rate evolves over the course of the match. Changes in the goal scoring rate of both teams are caused by:
  • Change in score
  • Passage of time (to capture scoring rates being higher as the match progresses)
  • Specific times (to capture injury-time goals being recorded at the end of each half and a depression in scoring rate at the start of each half)
  • Red cards (an addition in Stratagem’s in-play model)

The original version of this model fits these rate-changing parameters from historical data, alongside team strength and mean goal parameters similar to those of the pre-match model. Stratagem’s model takes a different approach and uses a pre-defined goal expectation for each team as an additional input. The idea is that there is a view on the match at kick-off, and the model then evolves based only on what happens in the match (and on what has happened over the course of previous matches).

These pre-defined goal expectations are calculated from the market where possible. This is achieved by taking market odds at kick-off and reverse engineering a model similar to the pre-match model described above: we go from predictions (odds) to home and away expected goals instead of the other way round.
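One way to sketch this reverse engineering is shown below. It assumes independent Poisson goals, uses the home win and over 2.5 goals markets as inputs, and removes the bookmaker margin by simple normalisation; all three choices are assumptions made for illustration, as are the function names. Because the over 2.5 probability depends only on the total expected goals, and the home win probability increases with the home team's share of that total, two bisection searches are enough.

```python
import math


def pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


def remove_margin(odds):
    """Turn decimal odds for a full market into probabilities summing to 1."""
    inverse = [1.0 / o for o in odds]
    total = sum(inverse)
    return [i / total for i in inverse]


def prob_over_2_5(total_mean):
    """P(total goals >= 3) when the total is Poisson with mean total_mean."""
    return 1.0 - sum(pmf(k, total_mean) for k in range(3))


def prob_home_win(lam, mu, max_goals=15):
    return sum(
        pmf(x, lam) * pmf(y, mu)
        for x in range(max_goals + 1)
        for y in range(x)
    )


def bisect(f, target, lo, hi, iterations=60):
    """Solve f(v) = target for an increasing function f on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def implied_expected_goals(p_home, p_over):
    """Home and away expected goals matching the two market probabilities."""
    total = bisect(prob_over_2_5, p_over, 0.05, 10.0)
    share = bisect(
        lambda s: prob_home_win(s * total, (1.0 - s) * total), p_home, 0.01, 0.99
    )
    return share * total, (1.0 - share) * total
```

For example, `p_home` could be the first entry of `remove_margin` applied to the home, draw and away odds, and `p_over` the first entry of `remove_margin` applied to the over and under 2.5 goals odds.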

The predictions from this model for a specific fixture are found by simulating the match many times. In each simulation, goals and red cards are generated according to the model and the scoring rates are updated accordingly. Each simulation yields a final score, and by combining the results of all simulations we get an estimate of the probability of each score. As in the pre-match model, we can then estimate expected goals for both teams or the fair odds of any goal-based betting market. This method can also be applied at any game state (match time, score, red cards), so the predictions are updated live as the match progresses.
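The simulation described above can be sketched in discrete time, one minute per step. Every multiplier in this example (time trend, score effect, red card effect, red card frequency) is a hypothetical placeholder rather than a fitted value, the end-of-half and start-of-half effects are omitted, and allowing at most one goal per team per minute is an approximation to the continuous-time process.

```python
import random
from collections import Counter

# Hypothetical placeholder, not a fitted value.
RED_CARD_PROB_PER_MINUTE = 0.001


def rate_multiplier(minute, own_goals, opp_goals, own_reds, opp_reds):
    """Scale a team's base scoring rate for the current game state."""
    # Scoring rate rises through the match; averages 1.0 over 90 minutes.
    time_effect = 0.8 + 0.4 * (minute + 0.5) / 90.0
    if own_goals < opp_goals:
        score_effect = 1.10   # assumed: trailing teams score slightly more
    elif own_goals > opp_goals:
        score_effect = 0.95   # assumed: leading teams score slightly less
    else:
        score_effect = 1.0
    card_effect = (0.7 ** own_reds) * (1.2 ** opp_reds)
    return time_effect * score_effect * card_effect


def simulate_final_scores(lam_home, lam_away, minute=0, score=(0, 0),
                          reds=(0, 0), n_sims=5000, seed=0):
    """Estimate final score probabilities from any game state.

    lam_home, lam_away: pre-defined full-match goal expectations at kick-off.
    """
    rng = random.Random(seed)
    finals = Counter()
    for _ in range(n_sims):
        home_goals, away_goals = score
        home_reds, away_reds = reds
        for t in range(minute, 90):
            p_home = lam_home / 90.0 * rate_multiplier(
                t, home_goals, away_goals, home_reds, away_reds)
            p_away = lam_away / 90.0 * rate_multiplier(
                t, away_goals, home_goals, away_reds, home_reds)
            if rng.random() < p_home:
                home_goals += 1
            if rng.random() < p_away:
                away_goals += 1
            if rng.random() < RED_CARD_PROB_PER_MINUTE:
                home_reds += 1
            if rng.random() < RED_CARD_PROB_PER_MINUTE:
                away_reds += 1
        finals[(home_goals, away_goals)] += 1
    return {s: count / n_sims for s, count in finals.items()}
```

Calling `simulate_final_scores(1.5, 1.1, minute=60, score=(1, 0), reds=(0, 1))` re-runs the simulation from that game state, which is how live updating would work in this sketch.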

References

[1] M. J. Dixon and S. G. Coles, 1997, Modelling Association Football Scores and Inefficiencies in the Football Betting Market, Applied Statistics, 46(2), 265-280
[2] M. J. Dixon and M. E. Robinson, 1998, A Birth Process Model for Association Football Matches, Journal of the Royal Statistical Society: Series D (The Statistician), 47(3), 523-538
