Understanding new developments in shot-based metrics
Most regular followers of statistical science in sports know that when it comes to analytics in “team invasion sports” like hockey and football, shot data forms the foundation of much of the work being done today.
The reason is fairly straightforward: while goals are relatively rare and unevenly distributed, shots are plentiful and tend to follow a nice bell curve. With shots we have a lot of evenly-distributed data to work with. We know that, statistically speaking (at least in the Premier League), if you tend to take more shots than you allow, chances are you will finish higher up the table (see Corsi in hockey, Total Shots Ratio in football). We know that shot conversion rates are highly subject to luck, and can (crudely) measure that too, with PDO. We can also break down shots by location, type, direction and so on, and take basic conversion averages, which gives us the Expected Goals metric. As both Wayne Gretzky and Johan Cruyff are alleged to have said, “You can’t score if you don’t shoot.”
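For readers who like to see the arithmetic, here is a minimal sketch of the two simplest metrics mentioned above. It assumes the common definitions: Total Shots Ratio as a team's share of all shots in its matches, and PDO as shooting percentage plus save percentage on shots on target, scaled so that a league-average team sits around 1000. The function names and the sample numbers are my own illustrations, not real data.

```python
def total_shots_ratio(shots_for, shots_against):
    """Share of all shots in a team's matches that the team itself took."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


def pdo(goals_for, sot_for, goals_against, sot_against):
    """Shooting percentage plus save percentage, scaled by 1000.

    Assumes both percentages are measured on shots on target (sot).
    """
    if sot_for == 0 or sot_against == 0:
        raise ValueError("need at least one shot on target at each end")
    shooting_pct = goals_for / sot_for
    save_pct = 1 - goals_against / sot_against
    return 1000 * (shooting_pct + save_pct)


# Made-up season totals for a hypothetical team.
tsr = total_shots_ratio(shots_for=540, shots_against=460)
luck = pdo(goals_for=60, sot_for=200, goals_against=45, sot_against=150)
print(f"TSR: {tsr:.3f}  PDO: {luck:.0f}")
```

A TSR above 0.5 means the team outshoots its opponents. A PDO well above or below 1000 is usually read as a sign that results have been flattered or punished by conversion luck.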
This is why, generally speaking, developments in shot-based metrics are very important. So it wasn’t a surprise that, earlier this month, noted baseball stats analyst Tom Tango sent ripples through the hockey analytics world by proposing a weighted shot metric. Essentially, Tango assigned values or weights to specific types of shots, settling on 1.0 for a goal and 0.2 for a non-goal shot. Tango demonstrated that his shot metric had a better correlation to goal difference than the widely favoured Corsi metric. He called it the Tango. You can read all about it here.
If you’re already wondering, “Are weighted shots more effective in predicting outcomes in football?”, then the answer is “possibly.” Noted analyst Mark Taylor immediately applied Tango to football and noted a higher correlation than unweighted shots ratios. This is a very interesting development.
While some welcomed the statistic, it was also met with some skepticism in hockey circles. The reason is that Tango appears to violate the Golden Rule of shot data by introducing elements of shot quality into measurements of shot quantity. If you give greater weight to shots that are goals, you’re implying that shot-to-goal conversion is a repeatable effect, rather than something that is largely the result of random variation. As Nick Mercadante wrote in his Blueshirt Banter post in response to Tango:
When analyzing the data, shot attempt differential [Corsi] simply evens out the craziness of sh% in smaller sample sizes a lot quicker and handles the unpredictable (often lucky/random) nature of scoring chances and goal-scoring plays better. It thus makes sense to look at total shot differential as opposed to goal differential. Tom Tango and several of his analytically inclined readers who commented on his article may not have been aware of this of course. But the presumptuous and dismissive nature of his article certainly rubbed some folks, who had spent time and effort to uncover that, the wrong way.
Moreover, Mercadante makes the point that weighting for goals, which as we know are comparatively rare, is mostly pointless as they are “washed out” by the overall shot data. There are other problems too. For one, it doesn’t parse out score effects (how teams behave when in the lead or trailing), or the effect of goalkeepers or shooters.
Yet when it comes to Tango and weighting shots, I think this is where we get into a big fault line between the soccer and hockey analytics worlds.
For one, in hockey, it’s been established for a little while now that knowing the quality of shots doesn’t really tell you much more than knowing the quantity of shots. Analyst Ben Pugsley pointed out to me that back in 2012, noted hockey analyst Eric T compared “scoring chances” differentials (which are based on average conversion rate by shot location, similar to ExpG) with shot differentials. Essentially, the two are closely correlated, meaning that the difference in quality of shots taken and conceded is pretty much the same as the difference in total shots taken and conceded. This isn’t that surprising to me though; most hockey teams, even the bad ones, will only shoot if they think they have a good chance at scoring.
I’m not certain that’s the same in football, necessarily, and that’s why I wonder whether Tango would perform any better at predicting goal differential in football than Expected Goals (ExpG), or better than James Grayson’s Team Rating method. After all, ExpG, which calculates the average likelihood that a certain type of shot will be a goal, is designed to take into account that not all shots are created equal, and correlates fairly well with goal difference on its own.
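As a rough illustration of the idea behind Expected Goals, the sketch below takes basic conversion averages per shot category from a history of shots, then sums those averages over a new set of shots. The category labels, function names and sample shots are hypothetical, and real ExpG models use far richer inputs than a single category label.

```python
from collections import Counter


def conversion_rates(historical_shots):
    """Average conversion rate per category.

    historical_shots is an iterable of (category, was_goal) pairs.
    """
    attempts = Counter()
    goals = Counter()
    for category, was_goal in historical_shots:
        attempts[category] += 1
        if was_goal:
            goals[category] += 1
    return {category: goals[category] / attempts[category] for category in attempts}


def expected_goals(shot_categories, rates):
    """Sum of the historical conversion rates of the shots taken."""
    return sum(rates[category] for category in shot_categories)


# Made-up history: 1 goal from 4 shots inside the box, 1 from 10 outside it.
history = ([("inside_box", True)] + [("inside_box", False)] * 3
           + [("outside_box", True)] + [("outside_box", False)] * 9)
rates = conversion_rates(history)

# A hypothetical match in which a team takes two shots inside and one outside.
print(round(expected_goals(["inside_box", "inside_box", "outside_box"], rates), 2))
```

Two teams with the same shot count can end up with very different expected goal totals under this approach, which is the sense in which not all shots are created equal.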
In the end, shots and goals are, to me, telling symptoms of optimal attacking situations, the very stuff of football, and something more complex than merely taking shots. In fact, these situations aren’t always fully captured by shots alone. I think Max Odemheimer over at StatsBomb made this case well recently with regard to Manchester United’s now famous “outlier” season in 2012-13, their last under Sir Alex Ferguson. Ferguson’s team was noted for scoring far more goals than shot-based models like TSR and ExpG predicted. Odemheimer makes the case that United crossed from more deadly positions that year, but that danger didn’t register in traditional shot-based models like ExpG because many of those crosses simply didn’t result in shots.
This to me is something far more inherent to the nature of football than to hockey, and it makes perfect intuitive sense. Think of how many times a team rolls a dangerous ball across the face of goal with an attacking player narrowly missing the final, deadly touch. Those are dangerous situations, but they are not recorded as shots. But if you recreate those moments more often than your opponents, even with a lower rate of shots, chances are you will score more goals. What matters is the volume, quality, and ultimate repeatability of attacking situations, both in creation and prevention.
I think we need to keep this at the forefront of our minds when talking about football analytics, in order to see clearly what these regressions mean. Often we get mesmerized by the data sets, even something as simple as shots and goals. You can get hung up on certain things (“shot percentages are completely random!”) which can blind you to the obvious in football: that the rarity of goals does not make them random, that not all shots are created equal, and that dangerous attacking movements need not always result in a shot.