Quantitative Research (QR) #1: Stratagem Football Model

Introduction

This document contains a short description of the football models that Stratagem supply in the StrataBet platform.

Pre-match final score

The pre-match (or dead-ball) model is based on the well-known Dixon and Coles [1] final score model for football matches. In this model the number of goals scored by each team is assumed to follow a Poisson distribution, whose single parameter can be interpreted as the expected number of goals for that team. For a particular match, each team's expected goals are derived from a combination of other values:

  • Home team expected goals
      • Home team attacking strength
      • Away team defensive strength
      • Home team goals mean
  • Away team expected goals
      • Away team attacking strength
      • Home team defensive strength
      • Away team goals mean
In a basic version of this model the home and away goals are treated as independent. This is not an accurate reflection of reality and tends to under-estimate the number of draws observed in football. To account for this, Dixon and Coles introduce an additional dependence parameter that modifies the probability of a draw given the home and away expected goals. Stratagem take this a step further and introduce additional, similarly structured parameters that account for bias across a greater range of common football scores.
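The structure described above can be sketched in a few lines of Python. This is a minimal illustration of the published Dixon and Coles form, not Stratagem's implementation: the multiplicative combination in `expected_goals`, the strength values and the value of `rho` are all assumptions chosen for the example, and Stratagem's additional dependence parameters are not shown.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson variable with this mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_goals(attack, opponent_defence, goals_mean):
    # Assumption: strengths combine multiplicatively, as in Dixon and Coles.
    return attack * opponent_defence * goals_mean


def dependence_factor(x, y, lam, mu, rho):
    """Dixon-Coles adjustment for the scores 0-0, 0-1, 1-0 and 1-1.

    x, y are home and away goals; lam, mu are home and away expected goals.
    """
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_matrix(lam, mu, rho=0.0, max_goals=10):
    """matrix[x][y] = probability of a final score of x-y (home-away)."""
    return [
        [dependence_factor(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
         for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]


def draw_probability(matrix):
    return sum(matrix[k][k] for k in range(len(matrix)))


# Hypothetical strengths and goal means, for illustration only.
home_xg = expected_goals(1.2, 0.9, 1.5)
away_xg = expected_goals(1.0, 1.1, 1.1)
independent = score_matrix(home_xg, away_xg, rho=0.0)
adjusted = score_matrix(home_xg, away_xg, rho=-0.1)
print(draw_probability(independent), draw_probability(adjusted))
```

With a negative `rho`, probability moves from 1-0 and 0-1 to 0-0 and 1-1, so the draw probability rises relative to the independent model while the matrix still sums to one.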

The parameters in the above models (team strengths, goal means, dependence) are fitted to historical football score data. In the Stratagem models the parameters are chosen separately for each country, but within each country data from multiple leagues is used (the number of leagues depending on the data available from that country). Additionally, more recent (and thus more relevant) data is weighted more highly when computing these values. The method and scale of this time weighting are chosen by optimizing the predictive accuracy of the model. For the Stratagem models we use five years of historical data.
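As a rough illustration of time weighting, the sketch below uses the exponential down-weighting proposed by Dixon and Coles. The exact weighting method used by Stratagem is not stated here, so the functional form, the decay rate `xi` and the match data are assumptions. It relies on the fact that, for a single Poisson mean, the weighted maximum-likelihood estimate is the weighted average of the observed goals.

```python
import math

FIVE_YEARS_DAYS = 5 * 365  # history window mentioned in the text


def time_weight(days_ago, xi):
    """Exponential down-weighting: older matches count for less (assumed form)."""
    return math.exp(-xi * days_ago)


def weighted_goal_mean(matches, xi):
    """Weighted Poisson estimate of a goal mean.

    matches: list of (days_ago, goals) pairs; matches older than the
    five-year window are dropped before weighting.
    """
    used = [(d, g) for d, g in matches if d <= FIVE_YEARS_DAYS]
    total_weight = sum(time_weight(d, xi) for d, _ in used)
    return sum(time_weight(d, xi) * g for d, g in used) / total_weight


# Hypothetical history: (days since the match, goals scored).
history = [(7, 3), (14, 2), (400, 0), (800, 1), (3000, 5)]
flat = weighted_goal_mean(history, xi=0.0)            # every match counts equally
recent_heavy = weighted_goal_mean(history, xi=0.005)  # recent matches dominate
print(flat, recent_heavy)
```

In a full model `xi` would itself be tuned, for example by scoring out-of-sample predictions for a range of candidate values and keeping the best one.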

Once such a model has been defined and the parameters have been fitted, the probability of any score in any match between two teams can be estimated. This allows predictions to be made on any quantity relating to the final score of the match, such as the expected number of total goals in the match or the fair price of a home win. In the Stratagem model these predictions are made for future matches after each match day, so that every prediction is always based on the most up-to-date data available.
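To make the step from score probabilities to market quantities concrete, here is a small sketch. For brevity it builds the score matrix from independent Poisson distributions (no dependence adjustment), and the expected goals of 1.6 and 1.1 are hypothetical inputs rather than model output.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


def score_matrix(home_xg, away_xg, max_goals=10):
    """matrix[x][y] = probability of x home goals and y away goals (independent Poisson)."""
    return [
        [poisson_pmf(x, home_xg) * poisson_pmf(y, away_xg) for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]


def summarise(matrix):
    """Aggregate score probabilities into match-result and goals quantities."""
    cells = [(x, y, p) for x, row in enumerate(matrix) for y, p in enumerate(row)]
    home = sum(p for x, y, p in cells if x > y)
    draw = sum(p for x, y, p in cells if x == y)
    away = sum(p for x, y, p in cells if x < y)
    return {
        "home": home,
        "draw": draw,
        "away": away,
        "fair_home_price": 1.0 / home,  # fair decimal odds = 1 / probability
        "expected_total_goals": sum((x + y) * p for x, y, p in cells),
    }


summary = summarise(score_matrix(1.6, 1.1))
print(summary)
```

Any other score-based market (correct score, over/under, handicaps) can be priced the same way by summing the relevant cells of the matrix.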


Further modifications of the model include:
  • Division of goal mean parameters into competition and country to reflect the different scoring rates/home advantages across leagues
  • Separate discounting to allow team parameters to change at a different rate to the country/league parameters
  • A connected model of shots on target so that the shots on target recorded in a match will impact the attack and defence strength of the teams
  • Additional down-weighting of historical fixtures at the end of the season that have little impact on the final league standings
  • More advanced handling of teams that move between divisions to account for imperfect separation of domestic leagues
  • Truncation of goal difference at extreme levels to avoid one-sided games having a disproportionate impact on the team strengths

Competition

The competition model is an extension of the pre-match final score model. The final score model is used to estimate the probability of each score in each of the remaining fixtures in the competition. The remainder of the season is then simulated many times with the score in each match determined at random in proportion to the pre-match model probabilities. Each simulation yields a final table for the competition and by combining the results of all the simulations we can estimate quantities such as:
  • Probability of a team finishing in a range of positions (e.g. relegation places)
  • Expected points of a team
  • Expected goal difference of a team
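The simulation loop described above might look like the following sketch. The teams, points and expected goals are hypothetical, scores are drawn from independent Poisson distributions rather than the full pre-match model, and tie-breaking uses only goal difference over the simulated fixtures followed by a random draw.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    goals, product = 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


def simulate_season(points, fixtures, places=2, n_sims=2000, seed=1):
    """Return (probability of finishing in the top `places`, expected final points)."""
    rng = random.Random(seed)
    top_counts = {team: 0 for team in points}
    points_sum = {team: 0.0 for team in points}
    for _ in range(n_sims):
        pts = dict(points)
        goal_diff = {team: 0 for team in points}  # simplification: starts at zero
        for home, away, home_xg, away_xg in fixtures:
            hg = sample_poisson(rng, home_xg)
            ag = sample_poisson(rng, away_xg)
            goal_diff[home] += hg - ag
            goal_diff[away] += ag - hg
            if hg > ag:
                pts[home] += 3
            elif ag > hg:
                pts[away] += 3
            else:
                pts[home] += 1
                pts[away] += 1
        # Rank by points, then goal difference, then a random tie-break.
        table = sorted(pts, key=lambda t: (pts[t], goal_diff[t], rng.random()), reverse=True)
        for team in table[:places]:
            top_counts[team] += 1
        for team in pts:
            points_sum[team] += pts[team]
    top_prob = {t: c / n_sims for t, c in top_counts.items()}
    expected_points = {t: s / n_sims for t, s in points_sum.items()}
    return top_prob, expected_points


# Hypothetical table and remaining fixtures: (home, away, home xG, away xG).
current_points = {"A": 60, "B": 58, "C": 57, "D": 40}
remaining = [("A", "B", 1.4, 1.2), ("C", "D", 1.8, 0.8),
             ("B", "C", 1.3, 1.3), ("D", "A", 0.9, 1.7)]
top2, exp_pts = simulate_season(current_points, remaining)
print(top2, exp_pts)
```

The same loop yields relegation probabilities or expected goal difference by changing what is counted after each simulated table is built.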

In-play final score

The in-play model is based on the model described by Dixon and Robinson [2]. This model assumes that goals are scored via an inhomogeneous Poisson process; that is, there is a goal-scoring rate for each team and this rate evolves over the course of the match. Changes in the goal-scoring rate of both teams are caused by:
  • Change in score
  • Passage of time (to capture scoring rates being higher as the match progresses)
  • Specific times (to capture the recording of goals at the end of each half and a depression in scoring rate at the start of each half)
  • Red cards (an addition in Stratagem’s in-play model)

The original version of this model fits these rate-changing parameters from historical data alongside team strength and mean goal parameters similar to those of the pre-match model. Stratagem’s model takes a different approach, using a pre-defined goal expectation for both teams as an additional input. The idea is that there is a view on the match at kick-off, and the model then evolves based only on what happens in the match (and on what has happened in the course of previous matches).

These pre-defined goal expectations are calculated from the market where possible. This is achieved by taking market odds at kick-off and reverse-engineering a model similar to the pre-match model described above, so that we go from predictions (odds) to home and away expected goals instead of the other way round.
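One way to picture this inversion is the sketch below. It assumes independent Poisson goals and two hypothetical market inputs, a home-win probability and an over 2.5 goals probability, both already converted from odds with the bookmaker margin removed. The choice of markets and the bisection search are illustrative assumptions, not a description of Stratagem's method.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_over_2_5(total_xg):
    """P(total goals >= 3); the sum of two independent Poissons is Poisson."""
    return 1.0 - sum(poisson_pmf(k, total_xg) for k in range(3))


def prob_home_win(home_xg, away_xg, max_goals=15):
    return sum(poisson_pmf(x, home_xg) * poisson_pmf(y, away_xg)
               for x in range(1, max_goals + 1) for y in range(x))


def solve_increasing(fn, target, lo, hi, iters=60):
    """Bisection for fn(value) = target, where fn is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def implied_expected_goals(p_home, p_over):
    # Step 1: the over/under probability pins down the total expected goals.
    total = solve_increasing(prob_over_2_5, p_over, 0.01, 8.0)
    # Step 2: the home-win probability pins down the home share of that total.
    share = solve_increasing(
        lambda s: prob_home_win(s * total, (1.0 - s) * total), p_home, 0.001, 0.999)
    return share * total, (1.0 - share) * total


# Round trip: start from known expected goals, derive "market" probabilities,
# then recover the expected goals from those probabilities.
p_home = prob_home_win(1.6, 1.1)
p_over = prob_over_2_5(1.6 + 1.1)
recovered_home, recovered_away = implied_expected_goals(p_home, p_over)
print(recovered_home, recovered_away)
```

Splitting the problem into a total and a home share keeps each search one-dimensional and monotone, which is why simple bisection is enough here.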

The predictions from this model for a specific fixture are found by simulating the match many times. In each simulation, goals and red cards are generated according to the model and the scoring rates are updated accordingly. Each simulation yields a final score, and by combining the results of all simulations we get an estimate of the probability of each score and, as in the pre-match model, can estimate expected goals for both teams or the fair odds of any goal-based betting market. This method can also be applied at any game state (match time, score, red cards), so the predictions are updated live as the match progresses.
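A heavily simplified sketch of such a simulation is given below. It approximates the inhomogeneous Poisson process with one trial per minute, and every multiplier (time drift, end-of-half effect, score effect, red-card effect) and the red-card rate are hypothetical placeholders rather than fitted values. No normalisation is applied, so the simulated goal totals will not exactly match the pre-match expectations.

```python
import random
from collections import Counter


def rate_multiplier(minute, own_lead, own_reds, opp_reds):
    """Hypothetical adjustments to a team's base scoring rate."""
    m = 1.0 + 0.3 * minute / 90.0   # scoring rate drifts up with time
    if minute in (44, 89):          # goals recorded at the end of each half
        m *= 2.0
    if own_lead < 0:                # trailing teams push forward
        m *= 1.10
    elif own_lead > 0:
        m *= 0.95
    return m * (0.7 ** own_reds) * (1.2 ** opp_reds)


def simulate_from_state(home_xg, away_xg, minute=0, score=(0, 0), reds=(0, 0),
                        n_sims=2000, red_rate=0.001, seed=1):
    """Estimate final-score probabilities from a given game state."""
    rng = random.Random(seed)
    results = Counter()
    for _ in range(n_sims):
        h, a = score
        h_red, a_red = reds
        for t in range(minute, 90):
            if rng.random() < red_rate:
                h_red += 1
            if rng.random() < red_rate:
                a_red += 1
            if rng.random() < home_xg / 90.0 * rate_multiplier(t, h - a, h_red, a_red):
                h += 1
            if rng.random() < away_xg / 90.0 * rate_multiplier(t, a - h, a_red, h_red):
                a += 1
        results[(h, a)] += 1
    return {final: count / n_sims for final, count in results.items()}


def home_win_probability(score_probs):
    return sum(p for (h, a), p in score_probs.items() if h > a)


kickoff = simulate_from_state(1.6, 1.1)
late_lead = simulate_from_state(1.6, 1.1, minute=80, score=(2, 0))
print(home_win_probability(kickoff), home_win_probability(late_lead))
```

Calling `simulate_from_state` again with the current minute, score and red cards is what allows the predictions to be refreshed as the match progresses.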

References

[1] M. J. Dixon and S. G. Coles, 1997, Modelling Association Football Scores and Inefficiencies in the Football Betting Market, Applied Statistics, 46(2), 265-280
[2] M. J. Dixon and M. E. Robinson, 1998, A Birth Process Model for Association Football Matches, Journal of the Royal Statistical Society: Series D (The Statistician), 47(3), 523-538

Trade Idea #1: Lay Liverpool for Top 4 at 3.1 or better

Position

Lay Liverpool for Top 4 at 3.1 or better (= 32% chance of finishing 4th or better).
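The 32% figure follows from the standard conversion between decimal odds and implied probability. The short sketch below shows that conversion together with the basic payoff of a lay bet; the stake of 100 is a hypothetical example and exchange commission is ignored.

```python
def implied_probability(decimal_odds):
    """Probability implied by decimal odds, ignoring any margin."""
    return 1.0 / decimal_odds


def lay_outcomes(decimal_odds, backers_stake):
    """Payoff of laying an outcome: win the backer's stake if the outcome
    does not happen, pay out stake * (odds - 1) if it does."""
    return {
        "profit_if_outcome_fails": backers_stake,
        "loss_if_outcome_happens": backers_stake * (decimal_odds - 1.0),
    }


top4_probability = implied_probability(3.1)  # about 0.32
example_lay = lay_outcomes(3.1, 100)         # hypothetical stake of 100
print(round(top4_probability, 3), example_lay)
```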

Thesis

Our analysts’ forecast was that Liverpool’s price to finish in the Top 4 was too short. The rationale was their poor performance in 2014/15, a complete change in coaching staff over the summer, a questionable transfer policy that did not seem to address the squad’s primary needs, an incredibly difficult run of early league fixtures and a taxing Europa League campaign to contend with throughout the year. With all of these question marks around the club, and with manager Brendan Rodgers under pressure from the outset, the decision was made to lay Liverpool at the earliest opportunity, when the price was at its shortest.

Plan

We set an entry around 2.8-3, with a stop loss at 2 and an upside of around 10, making the trade a more than 11/1 risk/reward (lose 1 unit of risk, or make 11 units if you get out at that point).

Timing

An opportunity was identified after unconvincing 1-0 wins in the opening two games and a spirited 0-0 draw at Arsenal, which led the market to overrate Liverpool much more than it had before the season began. The price was ~2.5 here, while it was ~4.5 before the opening weekend.

Quantitative Analysis / Keys to Performance

We compared Liverpool this year to:

  • Teams that finished in the top 4 in the last few seasons
  • Teams from this season that show comparable performance but trade at a very different price
  • Liverpool’s own performance last year, when they finished 6th

Liverpool’s output compared to the last two 4th-placed teams:

Comparison of 2014/15 Liverpool to 2015/16 Crystal Palace, Swansea and Tottenham:

Our view is that Liverpool are mispriced and should be trading closer to 10 given their recent performance. For comparison, Tottenham can currently be taken at 13.

The data from the first 4 rounds shows that Liverpool are underperforming in attack and having to work very hard defensively to keep clean sheets, which is not typically the nature of a successful team. Our proprietary analyst data ranks them 18/20 for Attacking Efficiency and 8/20 for Defensive Efficiency, while our Fair Score Model believes their current position of 7th is inflated. Based on their creation and concession of Great and Good Chances, in addition to some other key factors, 11th is actually their “fair” ranking at present.

This season’s distribution of goals. Liverpool stand 4th from left with an average of just 0.5 goals.

Comparison of Goals

  • Liverpool have scored 0.5 goals per game on average in their fixtures this year.
  • The 4th-placed team in 2014/2015 (Man Utd) averaged 1.6 goals per game, and the 4th-placed team in 2013/2014 (Arsenal) averaged 1.9.
  • Liverpool would need to score 1.79 goals (or more) per game on average for the remainder of the season to match what the 4th-placed teams scored on average in the last 2 seasons.
  • They would need to score more on average than Arsenal, Man City or Man Utd did last season. We think that is unlikely because the attacking unit has not shown any significant improvement.

We compare Liverpool to the Man Utd, Arsenal and Man City sides of 2014/2015, and to Swansea and Arsenal this year…

Analysis to past years

On a 10-year basis, the average number of points needed to finish 4th is 68.8, a points-per-game (PPG) rate of 1.8, with the spread to first being 0.5 and 0.2 to in PPGs and 0.13 to 5th.

Liverpool currently have 7 points, so to reach that average they need approximately 62 more points, implying about 1.82 PPG. Roughly, that translates to a record of 17 wins, 10 draws and only 7 losses in the next 34 games as the break-even rate (17*3 + 10*1 + 0 + the existing 7 ≈ 68). We think that is a very tall order for any team, particularly one that has not played the stronger, higher-scoring teams so far this year.
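The break-even arithmetic can be checked with a few lines of Python. The inputs are the figures quoted in the text; the 38-game season and 4 games played are inferred from the "next 34 games", and working from the unrounded 61.8 points gives a required rate of about 1.82 PPG.

```python
AVERAGE_POINTS_FOR_4TH = 68.8   # 10-year average quoted in the text
CURRENT_POINTS = 7
SEASON_GAMES = 38               # inferred: 4 played + 34 remaining
GAMES_PLAYED = 4

games_remaining = SEASON_GAMES - GAMES_PLAYED
points_needed = AVERAGE_POINTS_FOR_4TH - CURRENT_POINTS
required_ppg = points_needed / games_remaining

# Break-even record suggested in the text: 17 wins, 10 draws, 7 losses.
wins, draws, losses = 17, 10, 7
final_points = CURRENT_POINTS + 3 * wins + 1 * draws

print(games_remaining, round(points_needed, 1), round(required_ppg, 2), final_points)
```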

Risk Factors (Potential for the trade to go against us)

  1. Liverpool score more than 2 goals on average over the next 7-8 fixtures
  2. For this to happen there would need to be a meaningful increase in the following stats:
  • Goals Scored
  • Chances Created
  • Attacking Efficiency