My colleague Rich Huggan recently blogged on the value of collecting key entries, specifically looking at how they can be used in conjunction with other metrics to evaluate teams. In his post he mentioned how we don’t see eye-to-eye on key entries, so I wanted to use this piece as my response.
Firstly, I will begin by saying that a lot of the data we collect is outstanding and can provide a true edge, be it on trading, evaluating teams or coming up with innovative ways to look at things often missed by the casual football fan. The key area of what we store is our chance data, which is very granular and, in my opinion, one of the best pieces of information that can be gathered from a game of football. Our corner information is also of an extremely high standard, because we track how and where corners are won, before looking for areas where this can be exploited in the market.
Really it is just key entries I’m not so fond of, but before I get into why I’ll look at one statement that Rich made in his blog:
“I could begrudgingly see Dave’s point on the old entries, which was that they were not useful enough versus the classic total possession statistic to justify the amount of time spent collecting them.”
Let’s be clear, I don’t like possession as a statistic. In fact, other than as a very arbitrary measure of who had more of the ball in a game I tend to believe that it’s almost useless. I’ll refer to a recent example that Oliver Gage (Performance Analyst at Houston Dynamo) recounted on his blog:
Football media use the possession metric extensively, generally to try and explain which team should have won a game, but what does it actually show? That one team had more of the ball than the other? It very rarely goes deep enough to tell us which areas of the pitch a team had the ball in, or how the possession numbers changed depending on game state. At times you might be lucky enough to get a passing mention of a lack of possession being a deliberate tactic by a team to suit their counter-attacking approach.
As we all know there are many ways to play football and not everybody can be Barcelona, so hopefully you can see why I’m not a fan of the base possession statistic. The way we currently collect key entries is an advancement of this, but I wanted to do some digging around to see if there are better methods available that provide similar information.
The Four Four Two Stats Zone site was my first port of call, and while they don’t have information on the variety of leagues we cover at StrataBet they do have the “big five”, as well as the MLS. I wanted to look at Tottenham versus Manchester City from the past round of Premier League games, which was Guardiola’s first loss as Man City manager.
Looking at the StrataData revealed the following:
So it looks like this was generally quite a tight game with few quality openings of note and lots of poor chances (shots with a conversion rate of around 2%). Then there are key entries, with a total of 46 between the two teams. The average of both teams combined for all games covered since the 1st of July is 46.72, so this is typical of the type of game we would normally see.
Then, looking at Stats Zone told an interesting story:
These images clearly show that Tottenham frequently used both wide areas to push into the final 18 yards, while Man City used the left and focused much more on playing directly into the box.
The Key Entry breakdown on StrataBet can give this information in addition to a breakdown by 15-minute segment of when these entries happened. The table below shows that things were generally quite even until Man City pushed more in the final 15 minutes as they chased the game:
This is all well and good for looking stylistically at what a game is like, but does it give any information that can’t be found elsewhere? My opinion is that it doesn’t.
Let me give another example of a game I personally collected the data on, which was Vålerenga versus Aalesund from the Norwegian Eliteserien.
This one had a lot of key entries, 59 in total, and almost flagged up as being two standard deviations away from the mean, which is usually when I look deeper at a game to see the reasons behind a high or low number of entries.
Although the game was fast paced, it was heavily tilted due to Aalesund scoring in the first minute of play. They then allowed Vålerenga to control the ball as they sat deep and played on the counter. The match summary and StrataData show this in great detail, with Vålerenga having more entries in each segment than Aalesund. In fact, in the final 15 minutes Aalesund didn’t get into the Vålerenga 18-yard zone at all:
Is this interesting? Yes, I’ll give you that. Is it useful? Hmm, perhaps to some extent yes. See, I’m not totally against key entries, I would just prefer them to reflect phases of the game and how teams build their attacks, rather than just counting instances when a team breaks through a certain line in possession of the ball. There are plenty of occasions when they go backwards after getting into the final 18 yards in a wide area and when the attack comes to nothing.
However, I will say that looking deeper into key entries can bring out some useful information, with one notable area being the use of dominant sides of play and style of entry. In a way this is similar to the pass maps detailed above, but with some more granular data we can look deeper into the information to see if a team favours attacking down a particular side over a series of games.
Taking the sole game in the example above we can see that Tottenham slightly favoured using the right side and using passes, while Manchester City used central entries much more and almost exclusively passed the ball rather than running with it. This does give us an extra layer of information, as the above images only show passes into the attacking third and so lose out on the extra information we capture with runs, shots and turnovers.
The chart below shows these figures as percentages of each team’s total entries, so we can get a better idea of the ratio they used in each instance:
One thing that is slightly surprising is that neither team had a turnover in the 18-yard zone. Given that both teams employ high pressing, I would have expected at least a couple of these. Shots from outside the 18-yard area that force corners were also minor factors, with both teams having just one each.
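For anyone wanting to reproduce this kind of breakdown, here is a minimal sketch of how the percentages could be computed from a list of entry records. The record layout (a `side` and a `kind` field) and the sample entries are my own invention for illustration. They are not the StrataData format, nor the actual figures for either team.

```python
from collections import Counter


def entry_shares(entries, field):
    """Percentage share of each value of `field` among a team's entries."""
    counts = Counter(entry[field] for entry in entries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


# Invented records for one team, purely for illustration.
example_entries = [
    {"side": "right", "kind": "pass"},
    {"side": "right", "kind": "run"},
    {"side": "centre", "kind": "pass"},
    {"side": "left", "kind": "pass"},
    {"side": "right", "kind": "pass"},
    {"side": "centre", "kind": "shot"},
    {"side": "left", "kind": "run"},
    {"side": "right", "kind": "pass"},
]

print(entry_shares(example_entries, "side"))
print(entry_shares(example_entries, "kind"))
```

Working in shares rather than raw counts is what makes two teams with different totals comparable in the first place.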
Overall this is a good indicator of style, although in one game it can be misconstrued. However, as the data builds up over a period of time we can see some patterns beginning to form.
I wouldn’t generally like to use a pie chart to visualise the data but in this case it gives a simple and clear pattern:
The chart above shows that Arsenal prefer to play very centrally, with almost half of their 203 entries being made into the box rather than down the channels. In fact, they have the lowest ratio of right-sided entries in the league, perhaps due to the lack of a natural wide player in that area. While it is by no means as simple as this, the chart does show that if teams played an extra man in the central area they would make it difficult for Arsenal to penetrate in the way they prefer.
In a similar fashion, the graphic below shows Chelsea’s favoured type of entry in comparison to Burnley:
The chart shows that Burnley use passing into the final 18 yards much more than Chelsea do. Antonio Conte’s ball carriers are clearly much better at making runs into the last area of the pitch than Burnley’s, and though the information here won’t tell you how to stop them, it does at least give you an indication of how they will try to play.
Generally we would expect key entries to be balanced in a game between two equal teams, and if one team dominates they would likely record a lot more key entries as the weaker team records fewer. This means we can usually take an average point and determine if a game had more or fewer key entries than we would typically expect.
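As a rough sketch of that check, the snippet below flags games whose combined entry count sits at least two standard deviations from the mean, which is the threshold I mentioned using earlier. The function name and the sample totals are hypothetical and made up for illustration. A real run would use the recorded totals for a competition.

```python
from statistics import mean, stdev


def flag_unusual_games(totals, threshold=2.0):
    """Return (index, total, z) for games whose combined key-entry count
    is at least `threshold` sample standard deviations from the mean."""
    mu = mean(totals)
    sigma = stdev(totals)
    flagged = []
    for i, total in enumerate(totals):
        z = (total - mu) / sigma
        if abs(z) >= threshold:
            flagged.append((i, total, round(z, 2)))
    return flagged


# Made-up combined totals for ten games, not StrataData figures.
sample_totals = [46, 44, 48, 47, 45, 49, 43, 46, 47, 59]
print(flag_unusual_games(sample_totals))
```

In a sample this small the outlier itself inflates the standard deviation, so on real data it would make sense to take the mean and deviation from a full season rather than a handful of games.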
Looking across all of the competitions we cover, there is actually very little difference between them. The lowest average is in the Austrian Bundesliga (43.1 per game), while the highest is in the Greek Super League (49.9 per game). This is a big surprise as the Super League is a competition notorious for a lack of goals (averaging just 2.35 per game so far this season). Again, this leads me to believe that there is quite a low correlation between goals and key entries, as otherwise there is no reason why Greece would be the highest.
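Comparing league averages only hints at this, so one way to test the impression would be to compute a correlation coefficient over individual games. The sketch below writes out Pearson's r by hand. The function name and the per-game numbers are invented for illustration and say nothing about what the real data would show.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented per-game values: total goals and total key entries.
goals = [2, 0, 3, 1, 4, 2, 1, 3]
entries = [47, 51, 44, 49, 46, 43, 52, 45]
print(round(pearson(goals, entries), 2))
```

A value near zero would back up the low-correlation reading, while a value near 1 or -1 would suggest the league averages are hiding a relationship.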
One thing to note is that in all cases across all competitions the home team averages more key entries than the away team. This varies by amount but Spain, MLS and the Champions League all show significantly more entries for the home team on average, as shown in the graphic below:
Another interesting thing to look at here is whether the margin of victory usually correlates with an increase in key entries. To investigate this I split the wins by difference in goals in home and away games:
This data does throw up something rather surprising. While bigger home team wins usually correlate with a higher number of key entries for the winning team (though it is far from linear), the same cannot be said of large away wins. In fact, in the recent French Ligue 1 game between Metz and Monaco, which Monaco won 7-0, Metz actually had three more key entries!
At the ends of the scale the sample sizes are much smaller and prone to big fluctuations, but it does seem like there remains an emphasis on home teams having more key entries. This could be because there remains a pressure to go forward no matter the score, while losses and draws are more tolerated away from home.
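The split itself is simple to sketch. The snippet below groups decided games by winning venue and goal margin, then reports the winner's average key entries alongside the number of games in each group, since the small samples at the extremes matter. The field names and the games are hypothetical and invented for illustration.

```python
from collections import defaultdict


def entries_by_margin(games):
    """Average key entries for the winning side, grouped by (venue, margin).
    Draws are skipped because there is no winner to measure."""
    buckets = defaultdict(list)
    for g in games:
        margin = g["home_goals"] - g["away_goals"]
        if margin == 0:
            continue
        if margin > 0:
            buckets[("home", margin)].append(g["home_entries"])
        else:
            buckets[("away", -margin)].append(g["away_entries"])
    return {
        key: {"games": len(v), "avg_winner_entries": round(sum(v) / len(v), 1)}
        for key, v in sorted(buckets.items())
    }


# Invented results, not taken from any real fixture.
example_games = [
    {"home_goals": 2, "away_goals": 0, "home_entries": 28, "away_entries": 20},
    {"home_goals": 1, "away_goals": 0, "home_entries": 25, "away_entries": 22},
    {"home_goals": 0, "away_goals": 1, "home_entries": 24, "away_entries": 21},
    {"home_goals": 3, "away_goals": 0, "home_entries": 31, "away_entries": 15},
    {"home_goals": 1, "away_goals": 1, "home_entries": 23, "away_entries": 23},
    {"home_goals": 2, "away_goals": 1, "home_entries": 27, "away_entries": 23},
]

for key, summary in entries_by_margin(example_games).items():
    print(key, summary)
```

Keeping the game count next to each average makes it obvious when a surprising figure rests on one or two matches.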
Game state is also likely to play a big part in these, with away teams more likely to drop deep and invite pressure when they lead, whereas home teams would be expected to continue to attack. Indeed, this is something that I might investigate in a future blog.
One final thing to look at here, though, is the impact of artificial surfaces. We have recorded 189 games since the 1st of July that have been played on an artificial pitch in the competitions we cover. These are most frequently found in Norway, Sweden and the USA. Analysts in these leagues usually talk about a faster pace of game due to the surface, but does this show true in the data?
In most cases the answer is “no”. The only league to show some correlation is Norway, which has an average of three entries per team more on artificial pitches than on grass. There are negligible differences in the Netherlands, Sweden and the USA, while the limited number of games means that it is likely the style of play of teams in those leagues that drives the entries rather than the surfaces. However, one interesting factor (admittedly over a small sample size) is in Scotland, where Kilmarnock and Hamilton both play on artificial pitches and away sides have notably higher entries in games against them than in other cases. This is likely to be due to Kilmarnock and Hamilton being thought of as the weakest teams in the division, meaning that teams are more likely to attack them.
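The surface comparison boils down to a per-league difference in average entries per team. Here is a minimal sketch, with hypothetical field names and invented games in placeholder leagues, so none of the output reflects real competitions.

```python
from collections import defaultdict
from statistics import mean


def surface_gap(games):
    """Per league: mean entries per team on artificial pitches minus the
    same figure on grass. Leagues missing either surface are left out."""
    by_key = defaultdict(list)
    for g in games:
        per_team = (g["home_entries"] + g["away_entries"]) / 2
        by_key[(g["league"], g["surface"])].append(per_team)
    gaps = {}
    for league in sorted({lg for lg, _ in by_key}):
        artificial = by_key.get((league, "artificial"))
        grass = by_key.get((league, "grass"))
        if artificial and grass:
            gaps[league] = round(mean(artificial) - mean(grass), 2)
    return gaps


# Invented games in placeholder leagues, for illustration only.
example_games = [
    {"league": "League A", "surface": "artificial", "home_entries": 26, "away_entries": 24},
    {"league": "League A", "surface": "artificial", "home_entries": 27, "away_entries": 25},
    {"league": "League A", "surface": "grass", "home_entries": 22, "away_entries": 22},
    {"league": "League A", "surface": "grass", "home_entries": 24, "away_entries": 22},
    {"league": "League B", "surface": "artificial", "home_entries": 23, "away_entries": 21},
    {"league": "League B", "surface": "grass", "home_entries": 22, "away_entries": 22},
    {"league": "League C", "surface": "grass", "home_entries": 25, "away_entries": 20},
]

print(surface_gap(example_games))
```

A plain difference in means cannot separate the pitch from the teams who play on it, which is exactly the Kilmarnock and Hamilton problem described above.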
Looking back at Rich’s blog post and his pros for key entries, I do agree with several points he puts forward, and going deeper into the data has convinced me that they can be useful, to some extent. I can see the value of key entries being combined with other pieces of information to build up a picture, and you certainly can get a feel of a team’s style by looking at this data.
My argument is this, though: if you can find this information elsewhere relatively easily, should we not just collect something else instead?
I genuinely believe that you could put our Chance and Corner Data up against anything in the market due to the depth of collection we have on both metrics. We can expand this to other areas, with one in particular that I would personally be keen to explore more revolving around the oft-overlooked defensive side of the game. Can we gather information telling us how a team defends and whether this will be affected if a player is unavailable?
On the attacking side, would it not just be better to see how teams transition? Do they use a slow build up or a quick counter attack? Do they look for one particular individual to break the lines or do they rely on both full backs for penetration? This information could replace key entries and help to determine how reliant a team is on individuals or specific units. This could be very useful information to price teams more accurately if a certain player or group of players is out injured.
Similarly, would we be better gathering information on the oft-unrewarded actions of players who don’t touch the ball? Currently most data from other providers only looks at on-ball actions, so we could again have an edge in this area. Unfortunately due to time restrictions it is unlikely we’d be able to look at any of these other actions while we keep collecting key entries.
Ultimately what previously made me question the collection of key entries was its lack of value as the headline statistic we often present it as. However, when broken down further I can certainly see how key entries can be useful and appreciate that if we go even deeper we could match teams up to see potential areas of strength and weakness. For example, it would be hugely useful to know that a team uses the right side more frequently than the centre or left, and whether their opposition concede more of their chances from their left. These days any little edge like this could be all that is needed to get ahead in the betting market for certain matches in certain leagues, which is what it’s all about at the end of the day.
Dave Willoughby (@donceno)
As Rich mentioned, we would be keen to know your opinions on the value of key entries as a standalone metric and/or as something to be combined with other performance indicators. Do you agree more with me, or do you think Rich is right? Could we be utilising key entries in a better way? Should we get rid of them altogether and replace them with something else? Let me know on my Twitter account (posted above) or by contacting me at email@example.com.
Finally, the offer remains for any budding data scientists out there who are frustrated with a lack of football data access. We are looking to expand upon the partnerships we have with @AnalyticsFC and @zorba138 in order to expose more people to our unique StrataData offering. We offer full API access in return for regular written content using elements of the data itself, which can be posted on external websites or on any form of social media.
If you would be interested in working with us, then please get in touch via one of the suggested methods above.