Last week there was a lot of Tweeting from members of the analytical community in relation to the mainstream media’s use of clichés when it comes to goalscoring chances. It started with Saturday’s Match of the Day broadcast, where Jermaine Jenas and Alan Shearer bemoaned Dele Alli’s finishing ability, using lines like “he’s got to score that” and “he needs to work on his finishing”.
This prompted an interesting back-and-forth started by Ted Knutson (@Mixedknuts) about what changes could be made to the format to make it more useful for the viewing audience, without going the whole way and employing expected goals graphics. Before going any further here, you should read his post on StatsBomb if you haven’t already, as it really is very good and I agree with a lot of what he says.
Of course, I do have some sympathy for the pundits in this instance, which isn’t often the case, as many are just not worth the studio space for the level of insight that they provide. It often just comes down to semantics when they say a player “should” score what appears to be a presentable chance. Fundamentally they’re probably right, because the player SHOULD score, but by looking at the data we know that players often don’t.
It all boils down to the fact that every individual scoring chance has a different conversion rate, and these are what many expected goals models are built on.
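To make that idea concrete, here is a minimal sketch in Python of how per-chance conversion rates add up to an expected goals total. The function name and the example probabilities are purely illustrative assumptions, not figures or code from any particular model.

```python
def expected_goals(chance_probabilities):
    """Sum per-chance conversion probabilities into an expected goals total.

    Each value is the estimated probability (0 to 1) that a single chance
    results in a goal. This is an illustrative helper, not a real model.
    """
    for p in chance_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return sum(chance_probabilities)


# Made-up probabilities for three chances in one match (illustration only).
match_chances = [0.40, 0.20, 0.05]
print(round(expected_goals(match_chances), 2))
```

The point of the sketch is simply that a chance rated at 0.40 is still missed more often than it is scored, which is why “he’s got to score that” does not always hold up against the data.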
At Stratagem we capture data on every single chance across every single game of the 21 club competitions that we currently cover.
But what makes what we do really useful, and arguably better than what everybody else does?
I believe that it’s because context is key in every chance we rate.
In Ted’s post he provides data to back up why Theo Walcott’s chance in the 56th minute against Swansea City should have been scored around 20% of the time, but this is where I start to disagree. He himself acknowledges that the data is limited, using information on kicked shots not assisted by crosses to back this up. Quite frankly this is great stuff and once again I agree that this is much better than what was available even just two years ago. However, the lack of tracking information for every player on the pitch means that chances like this one, where Walcott was completely unmarked seven yards out, are thrown in with others where the striker is under severe pressure or has several defenders crowding him, trying to block the shot.
At Stratagem we use data combined with context to rate every chance. Attackers taking shots under heavy pressure, or shooting after a pass or cross was played slightly behind them, will see the chance rating decreased, while the opposite is true if the player has the time to pick his spot unhindered by factors like these.
In this particular instance Walcott is under no defensive pressure, the ball drops perfectly for him and he even has a teammate almost directly in front of the keeper, clearly hampering Fabianski’s line of sight to the ball. Ultimately, the chance is missed due to poor shot quality, which is something we capture as a separate measure to our initial chance rating to better explain the outcome.
While I do see Ted’s points and believe he has broken the situation down very well, by taking all the factors into account we would rate this as a Great Chance. This means that it is a chance with a conversion rate of around 40%, which is around the mark that Ted started from.
Wait a minute… “Great Chance”… What does that mean?
At the beginning of the 2016/17 season we decided to move to using six chance categories instead of three to give us better resolution during data collection. These are classified as follows:
Superb Chance
This is a situation that a player would always be expected to score from, for example a relatively uncontested shot or a free header into an open goal.
Great Chance
This is a situation that a player would generally be expected to score from, for example a one-on-one with the goalkeeper or a free header that needs to beat the goalkeeper, attempted from 12 yards or less.
Very Good Chance
This is a situation that a player could score from, for example a one-on-one with the goalkeeper from a tight angle or a contested header that needs to beat the goalkeeper, attempted from 12 yards or less.
Good Chance
This is a situation that a player could score from but would not necessarily be expected to, for example a shot taken from a tight angle inside of the box or a heavily contested header that needs to beat the goalkeeper, attempted from 12 yards or less.
Fairly Good Chance
This is a situation that a player would not generally be expected to score from but could, for example a speculative long-range shot with a clear sight of goal or a contested header that needs to beat the goalkeeper, attempted from more than 12 yards.
Poor Chance
This is a situation that a player would not generally be expected to score from, for example a speculative long-range shot with a diminished sight of goal or a heavily contested header that needs to beat the goalkeeper, attempted from more than 12 yards.
While the names are arbitrary, each category has a distinct conversion rate that remains linear across all of the competitions we cover. The names are distinctive and allow our analysts to discuss chances easily without referring directly to their conversion rates.
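For anyone wondering how a conversion rate per category could be checked against collected data, here is a rough Python sketch. The record layout (a category name paired with a scored flag) and the sample rows are hypothetical and invented for illustration; they are not Stratagem’s data format or real conversion rates.

```python
from collections import defaultdict


def conversion_rates(chances):
    """Return goals divided by chances for each category.

    `chances` is an iterable of (category, scored) pairs, where `scored`
    is True if the chance ended in a goal. Hypothetical layout.
    """
    totals = defaultdict(int)
    goals = defaultdict(int)
    for category, scored in chances:
        totals[category] += 1
        goals[category] += int(scored)
    return {category: goals[category] / totals[category] for category in totals}


# Invented sample rows, for illustration only (not real rates).
sample = [
    ("Great Chance", True),
    ("Great Chance", False),
    ("Great Chance", True),
    ("Great Chance", False),
    ("Great Chance", False),
    ("Very Good Chance", False),
    ("Very Good Chance", True),
    ("Very Good Chance", False),
    ("Very Good Chance", False),
]

for category, rate in conversion_rates(sample).items():
    print(f"{category}: {rate:.0%}")
```

Run per competition, a calculation along these lines is one way to see whether a category’s rate holds steady from league to league.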
Every goal scored in these competitions undergoes quality control by at least two other members of staff, while we constantly evaluate all chances to ensure that our analysts are consistent in how they see situations. We always ask the guys to be subjective when judging chances, but we always tell them we want them to be the same kind of subjective.
In addition to this we regularly provide clips and ask our analysts to give their views on a multitude of differing situations, before offering extensive feedback so that they can apply it to chances that appear in future games they are assigned to watch. Naturally, we appreciate that we live or die by the quality of our analysts and our subsequent checking measures, but we wouldn’t have it any other way.
We now have around 15,000 games in our database with this sort of data attached, and so have the ability to drill down into different on-pitch scenarios to ensure that we keep improving our methods. For example, using Sagar Jilka’s recent data on free kicks we have upgraded the chance rating on direct free kicks in those locations that Sagar proved to be more likely to result in goals.
I would be remiss if I did not mention that all of this data is available by signing up to StrataBet, and that you can claim a free 15-day trial to gain access. Our offer also still stands for any budding data scientists out there who are frustrated with a lack of football data access. We are looking to expand upon the partnerships we have with @AnalyticsFC and @zorba138 in order to expose more people to our unique offering. We can give full API access in return for regular written content using elements of the data itself, which can be posted on external websites or on any form of social media.
If you would be interested in working with us, then please get in touch with me on Twitter (username below).
In the meantime, thanks for reading!
Dave Willoughby (@donceno)