Get the latest news and insights from 21st Club

In defence of one of the most controversial stats in football analytics

 

[Screenshot: the start of the exchange on PDO discussed below]

The above is the start of a brief exchange between two respected football analysts (in which, full disclosure, I briefly took part) over PDO, a popular ice hockey stat that in recent years has come into use in some football analytics circles.

Before I get into that further, a quick PDO primer. The stat, whose name isn't an acronym but the online moniker of its inventor, is simply a team's save percentage plus its shot percentage, or more properly 10 × (Sh% + Sv%).
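To make the formula concrete, here is a minimal Python sketch. It assumes, as in the Hull example below, that both percentages are measured on shots on target: shot percentage is goals scored per shot on target, and save percentage is the share of opponents' shots on target that don't go in. The function name and arguments are illustrative, not taken from any existing tool.

```python
def pdo(goals_for: int, sot_for: int, goals_against: int, sot_against: int) -> float:
    """PDO = 10 x (Sh% + Sv%), with both rates measured on shots on target."""
    # Shot percentage: share of the team's shots on target that became goals.
    sh_pct = 100.0 * goals_for / sot_for
    # Save percentage: share of opponents' shots on target that were kept out.
    sv_pct = 100.0 * (1.0 - goals_against / sot_against)
    return 10.0 * (sh_pct + sv_pct)


# Hypothetical season: 9 goals from 30 shots on target, 6 conceded from 30 faced.
print(round(pdo(9, 30, 6, 30)))  # 10 x (30 + 80)
```

Across a whole league, every goal scored is a goal conceded and every shot on target taken is one faced by another team, so league-wide shot and save percentages add up to 100%. That is why PDO centres on 1000.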

Why is it useful? Simply because the statistic tends to regress quickly to the mean. A high or low PDO may help explain a club's good or bad performances in the short term, but it will usually regress toward the mean over time. To help understand why that's important in football statistics, we can use a very simple example.

Consider Hull City. They've played nine games of the 38-game Premier League season. In that time, the team has been consistently outshot, with the league's lowest Total Shots Ratio, and has also posted the league's lowest Expected Goal Ratio (they concede far more dangerous chances than they create). We know, within reason, that these factors tend to correlate well with goal differential and total points. On these metrics alone, Hull fans should have some cause for concern.
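Both ratios are usually expressed as a team's share of the total: Total Shots Ratio is shots for divided by shots for plus shots against, and the Expected Goal Ratio is assumed here to work the same way with expected goals. A short Python sketch follows, with invented numbers rather than Hull's actual figures; anything below 0.5 means a side is being outshot (or out-chanced).

```python
def share_of_total(for_value: float, against_value: float) -> float:
    """A team's share of a quantity: for / (for + against). 0.5 means parity."""
    total = for_value + against_value
    if total == 0:
        raise ValueError("no events recorded for either side")
    return for_value / total


def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    return share_of_total(shots_for, shots_against)


def expected_goals_ratio(xg_for: float, xg_against: float) -> float:
    # Assumption: defined analogously to TSR, using expected goals instead of shots.
    return share_of_total(xg_for, xg_against)


# Invented nine-game totals for a side that is being outshot and out-chanced.
print(total_shots_ratio(90, 135))       # 90 / 225
print(expected_goals_ratio(8.0, 16.0))  # 8 / 24
```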

You wouldn't know that from the Premier League table alone, however. Hull have earned a decent 2-5-2 record to start the season and are currently in 10th place. Hull City fans should have reason to feel confident heading into November and December, right?

Enter PDO. A quick glance at Hull's stats shows a 71% save percentage and a 41% shot-on-target conversion rate. Add them together (and multiply by 10 if you're a purist) and, allowing for rounding in those percentages, you get 1112. In PDO terms, where the mean falls roughly in the 980-1020 range, that's fairly high. Chances are that number will come down over time, meaning Hull City will likely post results more in line with their underlying key performance indicators as the season progresses.

This is of course a convenient, clear example, but it gets to the core of how PDO tends to be used. So why would such an ostensibly effective statistic be so controversial?

For one, as Ted Knutson eloquently notes above, it "smooshes" two different statistics together. That would be fine if they regressed at an equal rate, but they don't: James Grayson's 2011 post notes that save percentage regresses at a higher rate than shot percentage. There may be a number of reasons for that, including the influence of shot quality (i.e., taking the same number of shots from more dangerous positions). And, as Dan Altman wrote back in 2013, some strikers do in fact convert a higher percentage of their shots as a rule. Even so, none of this changes PDO's overall rate of regression.

However, there's another key aspect of PDO that I've been purposely dancing around, and it involves the 'L' word. When we say something regresses quickly to the mean, we're essentially saying it varies a lot from game to game. Some days it's high, some days it's low. The temptation, therefore, is to compare team PDO to a coin toss. If you get five heads in a row on your first five flips, you might think the universe is trying to tell you something. But after a hundred or a thousand more tosses, you will almost certainly move closer to 50% heads, 50% tails. Your "lucky" streak in the early going will eventually be swamped by everything that follows.

In other words, the sequence of coin flips is entirely random, and your chances of getting several heads or tails in a row come down entirely to (here it comes) luck.
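The coin-toss analogy is easy to check with a quick simulation. This Python sketch (the function names, seed and flip counts are arbitrary choices for illustration) starts with five straight heads and then keeps flipping a fair coin. The five extra heads are never "paid back" by a run of tails; they simply matter less and less as the total number of flips grows, which is why the share of heads drifts toward 50%.

```python
import random


def heads_share_after_streak(extra_flips: int, streak: int = 5, seed: int = 0) -> float:
    """Share of heads after an opening streak of heads plus `extra_flips` fair flips."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = streak  # the early "lucky" streak
    heads += sum(rng.random() < 0.5 for _ in range(extra_flips))
    return heads / (streak + extra_flips)


def expected_heads_share(extra_flips: int, streak: int = 5) -> float:
    """On average later flips are half heads, so the streak is diluted, not reversed."""
    return (streak + extra_flips / 2) / (streak + extra_flips)


for n in (0, 10, 100, 1_000, 100_000):
    print(n, round(heads_share_after_streak(n), 3), round(expected_heads_share(n), 3))
```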

So are shot and save percentages similar to random coin flips?

We already know the answer: not exactly. Skill does play a small yet reliable role in both. But Dan Altman, in the same post linked above, goes further. He lays out a set of plausible explanations for why both shot and save percentage would regress each year. Critically, they are anything but random.

Perhaps, Altman surmises, save percentage is affected by shot percentage, so that as one goes up, say with more players pushing forward, the other comes down as more space is left at the back, a bit like robbing Peter to pay Paul. Or perhaps it involves the rarity of elite footballers, who will post a very high shot percentage for a single season before either losing form or moving on.

Now, as far as I know, these are untested theories. However, Altman makes a key point:

Imperfect information is likely a critical component of luck’s role in PDO. Yet for precisely this reason, the rise of soccer analytics may reduce PDO’s regression to the mean. The more data we have and the savvier we are in using them, the more predictable players’ performances will be. Every team will be able to identify the top strikers and goalkeepers early on in their careers. The only question will be who can pay them the most.

I will confess, I have yet to see any PDO regressions from recent seasons which would suggest the metric is any more repeatable than when Grayson first did his work three years ago. But the point remains: regression to the mean may not always indicate “blind, random luck”, and shot and save percentages may show more signal overall (after accounting for differences in team spending) as analysts learn more about what affects them from game to game. Certainly applying Contextual Intelligence will be instrumental in that regard.

The question I will ask here is: for the time being, does it matter? Should we discard PDO as a statistic based on philosophical concerns over whether it is well and truly “random noise” without a hidden cause?

I think it was Nassim Nicholas Taleb who dismissed concerns over his definition of the word 'random' in The Black Swan, writing to the effect that something can still be effectively random even if we haven't yet determined a cause for the variation. Right now, a good part of shot and save percentages are effectively random.

What matters is whether PDO works for whatever you're trying to use it for. Grayson, for example, has included it in his Team Rating model. Alternatively, you might use PDO in an article to warn readers delighted by the good form of Hull City that the party will one day come to an end. If the rate of regression remains stable, PDO is simply what it says on the tin, nothing more. I wouldn't use it yet to make general claims about the random nature of shots and saves in football, or to argue that Hull WILL, with 100% certainty, regress directly toward the mean, as if all teams finish the season smack at 1000. Sports statistics is about probabilities and tendencies rather than flag-in-the-sand predictions of impending doom.
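One way to picture "regressing toward the mean" without claiming every team ends the season at 1000 is a simple shrinkage estimate: project a team's future PDO as the mean plus some fraction of its current distance from the mean. The fraction (called `repeatability` here) is a hypothetical input, not a figure from Grayson's work; 0 would mean pure luck and full regression, 1 would mean no regression at all.

```python
def projected_pdo(observed: float, repeatability: float, mean: float = 1000.0) -> float:
    """Shrink an observed PDO part of the way back toward the mean.

    repeatability is a hypothetical share (0 to 1) of the observed deviation
    expected to persist; 0 means full regression, 1 means none.
    """
    if not 0.0 <= repeatability <= 1.0:
        raise ValueError("repeatability must lie between 0 and 1")
    return mean + repeatability * (observed - mean)


# Hull's 1112 under a few illustrative (not estimated) repeatability values.
for r in (0.0, 0.25, 0.5, 1.0):
    print(r, projected_pdo(1112, r))
```

With a repeatability of 0.25, for instance, the projection is 1000 + 0.25 × 112 = 1028: still above average, just less so. That is a tendency toward the mean rather than 100% regression to it.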

Does the use of PDO prevent further attempts to look into factors that may influence team shot and save percentages? No. Are there more effective, elegant gauges of luck in team performance, like comparing expected goals (ExpG) to actual goals, or shot on target percentage for/against differentials? Maybe. Has PDO been misused by people who equate a tendency to regress toward the mean with 100% regression to the mean? Perhaps, but PDO can't be faulted for that.

PDO is a dumb number. It’s one tool of many. It may be replaced by better, publicly available and just as easy-to-use tools. But in the same way you can do horrific things with a nail gun (happy Halloween!), so too can you wreak havoc with a simplistic, one-size-fits-all approach to PDO.

About Richard Whittall

