There are not many things I disagree with my colleague Dave Willoughby about, but for a few years now we have had very different thoughts about the value of asking our analysts to collect “key entries”. With this in mind, and with the deadline for a first blog submission of the season rapidly approaching, I turned to our new football data scientist Sagar Jilka to help me prove Dave wrong once and for all.
In pursuit of this noble end I will use this piece to investigate just how well average key entries per team per game compare with other key performance indicators, such as chances and goals. I will also look into whether key entries have value when used alone, or if they need to be paired with other metrics in order to give any real insight into team performance.
However, let’s just begin with some background on how we have collected key entries to date.
Before November of last year Stratagem’s version of a key entry was an instance where an attacking team achieved possession of the ball in the final third of the pitch. This was tracked only once at the first point of contact with the ball in the final third, with “right”, “centre”, “left” locations and “pass”, “run”, “turnover” methods attached. For all intents and purposes “right” and “left” simply captured the spaces outside of the width of the 18-yard box, while the methods of entry explained how the ball crossed the imaginary “line” of the final third in the first place.
It is important to note here that we would not track the ball crossing back and forth over this imaginary line if a team was in the build-up phase, in order to avoid multiple entries during one period of sustained possession. Our belief was that counting every crossing would further inflate the figures of already strong possession-based teams like Arsenal and Barcelona, which would in turn cause our analysts and models to overrate them.
After a concession to Dave over the value of these final third entries, since November 2015 we have instead defined a key entry as an instance where the attacking team achieves possession of the ball in the last 18 yards of the pitch. The main reason for this move was to reduce the time burden on our data collection analysts, but we also felt that it would strip a significant amount of “noise” out of the metric, as ultimately you do not want to reward a team for just getting the ball into the final third time and time again.
The 18-yard entries function in much the same way as the final third entries did before them, with locations collected in the same way but now labelled as right/box/left and the pass/run/turnover methods retained. However, we were forced to add “shot” as an entry method in order to cover off situations where a player had an attempt from further than 18 yards that was recovered by a teammate on a rebound, or deflected out for a corner or throw-in.
Naturally, the average number of key entries collected across the board dropped significantly after this change, but we felt that the data set we ended up with would still be a lot more useful than the one we had before it. I could begrudgingly see Dave’s point on the old entries, which was that they were not useful enough versus the classic total possession statistic to justify the amount of time spent collecting them (at times you could get a game with over 150 individual entries).
So now that you are fully up to speed on the background, it is time to find out whether the move to 18-yard entries has been as successful as hoped.
To start with a decent benchmark, I investigated data from the 2015/16 English Premier League season to see how the league table would have ended up if the teams had been ranked on goal difference instead of points:
As you would expect, the correlation of goal difference and total points is strong and there are just 15 total positional differences when the whole league is taken into account. Stoke City are responsible for the biggest variance in outcome based on goal difference and points, finishing six places worse off when measured against how many they score and let in.
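The comparison method used throughout this piece can be sketched in a few lines of code: rank the teams by two different metrics and total up how far each team moves between the two tables. The snippet below is a minimal illustration with made-up team records, not the real 2015/16 figures, and it simplifies tie-breaking (a real league table breaks ties on goal difference and then goals scored).

```python
# Rank teams by two metrics and sum the absolute position changes.
# Team records are illustrative only.

def positions(table, key):
    """Return {team: position} when sorted by `key`, descending (1 = top)."""
    ordered = sorted(table, key=lambda row: row[key], reverse=True)
    return {row["team"]: pos for pos, row in enumerate(ordered, start=1)}

def total_positional_differences(table, key_a, key_b):
    """Sum of absolute position changes between the two rankings."""
    pos_a = positions(table, key_a)
    pos_b = positions(table, key_b)
    return sum(abs(pos_a[t] - pos_b[t]) for t in pos_a)

toy_table = [
    {"team": "A", "points": 81, "goal_diff": 32},
    {"team": "B", "points": 71, "goal_diff": 40},
    {"team": "C", "points": 66, "goal_diff": 14},
    {"team": "D", "points": 51, "goal_diff": -9},
]

# A and B swap places between the two rankings, so the total is 2.
print(total_positional_differences(toy_table, "points", "goal_diff"))
```

The same function works for any pair of metrics, so swapping `"goal_diff"` for a great chance or key entry differential column reproduces the later comparisons in this article.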
At the high end of the table things look pretty good, with the winners, top four and top eight all being intact and showing relatively little variance, while at the low end the bottom two stay exactly as they are. In this scenario Bournemouth would have been relegated instead of Newcastle, which would have made me a very happy man indeed.
To go deeper, I looked at data from the same season to see how the table would have finished if the teams had been ranked on great chance difference instead of points. As a reminder, in 2015/16 our great chances had an average conversion rate of 45% across the board and represented situations where the attacking player would have been expected to score:
As one might expect, the overall correlation begins to weaken a little when we use this metric, with 21 total positional differences now present. Still, great chances prove to be a very good measure of team performance, with only a quartet of significant outliers. Crystal Palace would have benefitted greatly from this sort of ranking by jumping six positions, but Manchester United (-5), Stoke City (-5) and Swansea City (-6) in particular would have been much worse off.
Stoke appearing in both lists is quite telling, especially when their performance at the beginning of this season is taken into account, because it seems they ended up significantly better off in the table than their performance metrics should have allowed. If you followed our articles last season you may remember one on Jack Butland’s amazing great chance stopping abilities, which we believed was largely responsible for the overachievement of Mark Hughes’ side in the actual table.
Away from the outliers things look generally stable across the board, with perfect placement of the top three and bottom two being a notable thing to highlight. Interestingly, Southampton would have been worthy of a Champions League berth if rated on their chance differential, while this time Swansea would have been for the drop instead of Newcastle. I’m beginning to think we were hard done by…
Finally, it was time to look at how key entry differential would have made the final league table look (again using data from the 2015/16 English Premier League):
The first obvious thing to note is a further weakening of the correlation, with a total of 39 positional differences now present, versus 21 for great chances and 15 for goals previously. The second is that we have gone from having three teams perfectly placed with goals, to five teams perfectly placed with great chances, to just one team (Everton) being perfectly placed when key entry differential is compared to points.
There is one club that sticks out like a sore thumb here, and to the surprise of absolutely nobody that club is Leicester City. The 2015/16 English Premier League Champions would have finished an enormous 12 places worse off if they were evaluated on their key entry differential instead of points won. After all of the column inches dedicated to Leicester in the analytics field over the last twelve months I would assume that by now you understand why this is the case, but for safety’s sake I will attempt a quick explanation:
For much of last season Leicester were able to deploy a reactionary, counter-attacking approach that relied heavily upon inviting territorial pressure and limiting their opponents’ quality chance creation. This was combined with a supreme ability to burst forward on breaks that led to a large number of high quality goalscoring opportunities and ultimately made them a very difficult proposition. Even when teams began to modify the way they approached Claudio Ranieri’s men after Christmas, the defensive element of their success continued and they began to add some crucial 1-0 wins to their repertoire.
Due to their “different” way of achieving results, Leicester were something of a puzzle for people reliant upon expected goals models last season, and thankfully it also took the betting market a long time to adjust to their quality. In addition to that, it seemed that a lot of people simply had a hard time accepting that a team could post exceptional numbers for an entire season, especially when failing to account for Ranieri’s ability to field an incredibly stable starting eleven over a significant period of time. The old “reversion to the mean” argument was trotted out plenty, and some may be pointing to Leicester’s start to this season as proof of it, ignoring elements such as Kante departing for Chelsea, his replacement Mendy succumbing to injury almost immediately, a general “after the Lord Mayor’s show” impact on overall motivation and psychology, and the burden of extra football due to their Champions League participation (where they seem to be doing alright, funnily enough).
We will return to Leicester shortly, though, as there is still more to discuss in the latest table.
Bournemouth would have benefitted more than anybody else from the key entry differential method of evaluation, climbing nine places to a Europa League spot in seventh. Chelsea, Norwich and our old friends Stoke also raise some eyebrows, with the former pair jumping seven places apiece – Chelsea finishing in third and Norwich well clear of relegation. Stoke are seven positions worse off, however, falling dangerously close to the relegation spots. At this stage of the article I begin to wonder whether we missed a trick in not laying Stoke for top ten before the start of the season…
Other interesting quirks of the key entries table see Manchester City crowned as champions, Manchester United finishing runners-up and Liverpool rounding off the top four. Arsenal surprisingly finished outside of the top four AND ended up behind Tottenham, while the woeful Aston Villa survived at the expense of Watford and Sunderland. Finally, Newcastle were rock bottom by a considerable distance, so maybe we weren’t hard done by after all…
Based on these simple comparisons of goals and points, great chances and points and key entries and points it is quite clear to me that key entries have some value as a metric to independently measure team performance. However, it is also abundantly clear that they are much less valuable than goals and great chances when used alone.
Although this is no great surprise, it is not going to do me many favours in my argument with Dave and so I am not satisfied to leave things there. My next step was to see if key entries could be combined with those two other metrics to create something more powerful. I was specifically keen to try and “solve” the Leicester problem that arose in the last table, which got me thinking about “efficiency”.
This was a concept that came up in a few of my posts last season, as it became quite obvious early on that there were some metrics that showed why Leicester were far and away the best side in the division. So at Stratagem we now utilise “attacking efficiency” and “defensive efficiency” metrics to see how effective a team is at both ends of the field. The simple calculation we came up with is to take key entries (for) and divide them by goals (for) or great chances (for) when considering the attack. If thinking about the defence it is just a case of taking key entries (against) and dividing them by goals (against) or great chances (against).
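The calculation described above can be sketched as a pair of one-line functions. The input numbers below are hypothetical, chosen only to show the shape of the output; lower is better for the attacking figure (fewer key entries needed per goal), while higher is better for the defensive one (opponents need more key entries per goal).

```python
# A minimal sketch of the attacking/defensive efficiency calculation.
# All input figures are made up for illustration.

def attacking_efficiency(key_entries_for, scored):
    """Key entries needed per goal (or great chance) scored - lower is better."""
    return key_entries_for / scored

def defensive_efficiency(key_entries_against, conceded):
    """Key entries opponents needed per goal (or great chance) conceded - higher is better."""
    return key_entries_against / conceded

print(attacking_efficiency(1000, 50))   # 20.0 key entries per goal scored
print(defensive_efficiency(1290, 30))   # 43.0 key entries per goal conceded
```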
Let’s start with a look at attacking and defensive efficiency using goals:
Unsurprisingly, Leicester rank first in both the attacking and defensive categories, as they needed just 20 key entries for every goal scored while their opponents needed 43 key entries for every goal they managed against them. The defensive number in particular is quite remarkable, sitting 5 key entries per goal higher than the equivalent figures for both Arsenal and Tottenham.
The presence of a number of lesser lights in the upper echelons of the attacking efficiency table is no great shock, with West Ham known for having “high shot numbers” last season and Sunderland possessing the evergreen Jermain Defoe. Everton’s attacking unit had so much quality that it still made the most of good situations more often than not, with Roberto Martinez’s issues typically being at the other end of the pitch and through the middle. There is a glut of generally strong performing teams immediately below the top four places, but one of the bigger sides worth highlighting is Manchester United. Louis van Gaal’s love of possession above all else meant that United were the third most inefficient team when attacking last season, taking almost twice as many key entries as Leicester to score a goal.
On the other side of the coin, Van Gaal did make United fairly organised and difficult to score against, and they only trailed the actual top three in terms of their defensive efficiency. Southampton were again good across the board, making the excellent Ronald Koeman’s move to Everton all the harder to stomach for Saints fans. West Brom were able to make up for their inefficiency in attack by being hard to score against, which is no great surprise under the leadership of Tony Pulis. Finally, Bournemouth being inefficient at both ends of the pitch is easily explainable when looking back at the original three tables posted, as they scored infrequently, conceded frequently and had many more key entries for than against.
In order to combine the attacking and defensive numbers into a single measure, it seemed easiest to take an average of a team’s positions in both categories and build a ranking to compare against the actual league table from 2015/16. For instance, as Leicester were ranked first in attacking efficiency and first in defensive efficiency, this is simply a case of doing the following calculation:
(1 + 1) / 2 = 1
This puts Leicester at the top of the pile, with Aston Villa again at the bottom due to being ranked 20th in the attacking category and 19th in the defensive category for an overall average of (20 + 19) / 2 = 19.5 (the lowest average score of all 20 teams).
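The combined ranking can be sketched as follows. Leicester’s and Villa’s ranks match the figures quoted above; the other two teams are hypothetical placeholders to show the sorting step.

```python
# Average each team's attacking and defensive efficiency ranks,
# then sort ascending (lower average rank = better combined ranking).
# "Team X" and "Team Y" are illustrative placeholders.

efficiency_ranks = {
    # team: (attacking rank, defensive rank)
    "Leicester":   (1, 1),
    "Aston Villa": (20, 19),
    "Team X":      (4, 9),
    "Team Y":      (12, 3),
}

average_rank = {
    team: (att + dfn) / 2 for team, (att, dfn) in efficiency_ranks.items()
}

combined_table = sorted(average_rank.items(), key=lambda kv: kv[1])
for team, avg in combined_table:
    print(f"{team}: {avg}")
```

Sorting the averages puts Leicester first on 1.0 and Aston Villa last on 19.5, matching the calculation above.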
The interesting thing from the combined table is that it shows better overall correlation than key entry differential, with a total of 33 positional differences present. There are three teams judged to be in exactly the “right” positions, with the biggest outlier being Sunderland sitting nine places above where they actually finished. Their efficiency in attack in particular is what kept them up, as they attacked infrequently but could rely on scoring thanks to Defoe’s superior finishing. There is a lot of variance in the middle of the table, as tends to be the way, but the bottom two and top three remain intact.
Ultimately this method of evaluating teams shows an improvement over using key entries alone to judge performance, plus it also gives a great indication of why certain teams were able to confound expected goals models over a sustained period of time.
To round off, let’s see how attacking and defensive efficiency looks when using great chances:
The methodology used here is exactly the same, and while the total number of positional differences is identical to what it was using key entry differential (39), I am happier with how the extreme parts of the table look in this iteration.
There are still some huge moves visible, with strong teams like Manchester City (-11) and Manchester United (-12) being punished for having a lot of territorial pressure without converting it into incredibly regular chances to score. At the other end of the spectrum Crystal Palace (+10) make a huge charge based on their great chance efficiency numbers, while Newcastle (+8) again manage to avoid the drop, this time finishing in an improbable tenth!
As we have gone further down the rabbit hole it has become clearer and clearer that there is currently no “catch-all” metric to explain every element of team performance in football. I would contend that key entries maintain huge value when used in conjunction with other key performance indicators, but will accept that alone they do not provide enough information to be of high value.
Essentially, a combination of the most proven measures of performance, like goals scored and conceded and chances created and allowed, plus territorial information like key entries or possession, looks like the best recipe for getting an accurate read on an entire league. In my opinion it all comes down to the fact that some teams will be good at some things and other teams will be good at other things. Scoring and preventing goals is still the be-all and end-all, so we need to utilise every metric at our disposal to predict a team’s capacity at both ends of the pitch.
Rich Huggan (@AnalysisRich)
NB: I would be keen to know your opinions on the value of key entries as a standalone metric and/or as something to be combined with other performance indicators. Do you agree with me, or do you think Dave is right? Could we be using key entries in a better way? Should we get rid of them altogether? Could we replace them with something else? Let me know on my Twitter account (posted above) or at email@example.com.
Finally, I would like to send an alert to any budding data scientists out there who are frustrated with a lack of football data access. We are looking to expand upon the partnerships we have with @AnalyticsFC and @zorba138 in order to expose more people to our unique StrataData offering. We offer full API access in return for regular written content using elements of the data itself, which can be posted on external websites or any form of social media.
If you would be interested in working with us, then please get in touch via one of the suggested methods above.