Trouble ahead in football analytics?

The public football analytics scene has enjoyed a fruitful four or so years now. The work of a few talented analysts is garnering some mainstream media attention, and a few are being hired by clubs either as consultants or full-time staff.

As with any cultural “scene”, however, the analytics movement will be prone to a few existential challenges over time, some of which will at best slow progress for a few months, and at worst derail work in the field for years. Whilst I think the work of public statistical analysis in football is as good as ever, it’s best to identify the key problems it will face in the near future.

1. Stratification

A few years ago, when there was only a handful of independent stats analysts self-publishing serious work, the football analytics scene was manageable. All one needed to participate was a Twitter account and some time to read. While this often came at the expense of excluding some pioneering work locked away in academic journals, the independent analysts’ posts came at a rate that allowed for discussion, feedback and thoughtful responses. A lot of progress was made, especially in developing various Expected Goals models.

Now, as some analysts either get hired or move on to work for publications that may prefer not to publish linear regression graphs and standard deviations (and as several newer voices enter the fray), the old model doesn’t work as well. While there are a few notable analytics “hub” sites like Statsbomb, these alone can’t provide the same sense of community as, say, a well-curated Twitter feed. Something new may have to emerge: a forum like a subreddit, a regular publication, or a permanent, more off-the-grid hangout on a site like Slack. The issue here will be gaining a critical mass of voices, and avoiding duplication and competition. It will also require time and dedication for little financial return.

2. Loss of key influencers

This is obviously closely related to number 1, but deserves a little unpacking. When a once-influential analyst goes silent for whatever reason (a busy day job, being hired by a club, etc.), their stellar work isn’t the only thing to disappear. Some of the best stats writers act as catalysts that move the entire field forward. These are the people who will politely but firmly argue their position with ready-at-hand tables and graphs to prove the point. They can often spark a community-wide discussion, and gradually the whole edifice makes a leap forward.

This kind of change is often cyclical; losing even one influencer can slow progress for a while. Losing three or four in the span of several months—not a far-fetched scenario—can be devastating.

3. Resting on a few established metrics

With the success of both Total Shots Ratio (TSR) and Expected Goals (ExpG) in their various iterations, there is a sense that a lot of the work in the analytics field right now is either a) using one of them to prove a point about a current team or player, or b) adding as many variables as possible (game states, shot type, etc.) to one of them to see if its correlation with goal difference or points slightly improves. Many might be inclined to beg off the latter pursuit entirely in the absence of sophisticated X,Y data, or because they’ve reached the limits of reasonable standard deviation.
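For readers unfamiliar with the first of these metrics, TSR is commonly defined as a team’s share of all the shots taken in its matches: shots for divided by shots for plus shots against. Below is a minimal Python sketch of that calculation. The function name, the team names and the shot counts are all made up for illustration and are not taken from any real data set.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """Return a team's share of all shots taken in its matches (0.0 to 1.0)."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("TSR is undefined when no shots were taken")
    return shots_for / total


# Hypothetical season totals: (shots for, shots against). Not real data.
season_shots = {
    "Team A": (540, 410),
    "Team B": (450, 450),
    "Team C": (380, 520),
}

for team, (shots_for, shots_against) in season_shots.items():
    print(f"{team}: TSR = {total_shots_ratio(shots_for, shots_against):.3f}")
```

A TSR above 0.5 means a team outshoots its opponents overall. The metric says nothing about shot quality, which is the gap Expected Goals models try to fill.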

I think part of the problem is a lack of discussion of the kind of problems stats analysis may be best suited to address in football. It’s as if analysts collectively sought to answer a single question, one perhaps most pressing for bettors, i.e.: What is the best way to predict how well a team will do over the course of a league season?

Some enterprising minds might do well to spend some time identifying common footballing questions for which data analysis might provide a partial answer, such as how to judge one league against another, or how much of a role individual coaching can play in improving performance. Sometimes this requires a bit of ingenuity, or it may require more data than is currently available, which brings us to…

4. Data chill

With so much public interest in sports statistics, more and more sports data companies will emerge and competition will increase. Additionally, a push to legalize sports gambling in the United States and Canada could open up lucrative new markets, and there may be pressure to increase the price of even basic game stats like shots, passes etc. There is no guarantee the current popular sources for much of the data that fuels basic independent analysis will remain available in the future, and friendly requests to use cited numbers in the interests of research may not always be granted.

Unless some kind soul pulls a Charles Reep and exhaustively records their own game statistics (which itself may not be on sure legal footing), a future push by providers to make public data private will almost certainly severely limit, if not destroy, the online football analytics community.

Overcoming these challenges will take the conscious effort of a community that has, until now, been fairly hands-off. But it is essential to ensure the work of the past four years becomes a foundation, not an aberration.

