How to forecast the World Cup
Predicting the outcome of the World Cup is difficult. International matches are relatively infrequent, and squads change constantly from one match to the next. At 21st Club, we use two different approaches to forecast international competitions.
Our League of Nations model uses results from national-team matches, including tournaments, qualifiers and friendlies, to estimate how good every national team in the world is. This makes it possible to compare teams across confederations. Peru have never played Denmark, but we have an idea of how good Peru are relative to Brazil, and how good Denmark are relative to Portugal. If competitive and friendly matches tell us how good Brazil are relative to Portugal, we can infer the relative quality of Peru and Denmark.
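The chain of inference from Peru to Brazil to Portugal to Denmark can be illustrated with a simple least-squares ("Massey-style") rating, in which each match asks for rating[a] − rating[b] to be close to the goal margin. This is a minimal sketch under that assumption, not the League of Nations model itself, whose details aren't described here. The functions `massey_ratings` and `solve` are hypothetical helpers, and the three results are invented purely for illustration.

```python
from itertools import chain


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        rest = sum(M[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (M[row][n] - rest) / M[row][row]
    return x


def massey_ratings(matches):
    """Least-squares ratings such that rating[a] - rating[b] ≈ goal margin.

    matches: list of (team_a, team_b, goals_a, goals_b) tuples.
    Returns a dict of ratings that sum to zero.
    """
    teams = sorted(set(chain.from_iterable((a, b) for a, b, _, _ in matches)))
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    # Normal equations: games played on the diagonal, -1 per meeting off it,
    # and each team's total goal difference on the right-hand side.
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for a, b, goals_a, goals_b in matches:
        i, j = idx[a], idx[b]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += goals_a - goals_b
        p[j] += goals_b - goals_a
    # Ratings are only defined up to a constant, so replace the last (redundant)
    # equation with the constraint that all ratings sum to zero.
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(M, p)))


# Hypothetical results, invented purely to show the chain of inference.
matches = [
    ("Peru", "Brazil", 0, 1),
    ("Brazil", "Portugal", 2, 1),
    ("Portugal", "Denmark", 1, 0),
]
ratings = massey_ratings(matches)
print(ratings["Peru"] - ratings["Denmark"])  # → 1.0
```

Peru and Denmark never meet in this toy data, yet the sketch still rates Peru one goal better, purely through the linking results. This only works when the network of matches is connected, which is why friendlies and inter-confederation tournaments matter: they are the links between otherwise separate groups of teams.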
Our Player Contribution model condenses many disparate sources of information into attack and defence ratings for every player in the world, which can then be used to predict match outcomes. Factors that inform the model include the strength of the clubs the player has played for, his utilisation, his goal and assist tallies, his position, and his effect on goals scored and conceded while he is on the pitch. A team is, in the first instance, a collection of 11 players, and if we know how good they are, we can make a rough prediction of how good the team is likely to be.
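The step from player ratings to a match prediction can be sketched as follows, under loudly stated assumptions: each starter's attack and defence ratings simply add up to team ratings on a log scale, and each side's goals follow an independent Poisson distribution. This is a common textbook simplification, not necessarily what the Player Contribution model does. The `Player` class, `match_probabilities` and the `base_goals=1.3` league-average scoring rate are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    attack: float   # hypothetical: contribution to goals scored (log scale)
    defence: float  # hypothetical: contribution to goals prevented (log scale)


def team_strength(lineup):
    """Treat a team as the sum of its starters' attack and defence ratings."""
    return sum(p.attack for p in lineup), sum(p.defence for p in lineup)


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_probabilities(lineup_a, lineup_b, base_goals=1.3, max_goals=10):
    """Return (win, draw, loss) probabilities for team A under a Poisson model."""
    att_a, def_a = team_strength(lineup_a)
    att_b, def_b = team_strength(lineup_b)
    # Expected goals rise with your attack and fall with the opponent's defence.
    lam_a = base_goals * math.exp(att_a - def_b)
    lam_b = base_goals * math.exp(att_b - def_a)
    win = draw = loss = 0.0
    for i in range(max_goals + 1):      # goals for A (scores above max_goals ignored)
        for j in range(max_goals + 1):  # goals for B
            p = poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical lineups: an average XI and an XI slightly better in every position.
average_xi = [Player(f"avg{i}", 0.0, 0.0) for i in range(11)]
stronger_xi = [Player(f"str{i}", 0.05, 0.02) for i in range(11)]
print(match_probabilities(stronger_xi, average_xi))
```

Summing the ratings is what lets this kind of prediction react immediately to an injury or a squad change: swap one `Player` for another and the probabilities move. It is also why such a model, on its own, has no notion of how well those 11 players play together.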
These two ways of forecasting the World Cup each have strengths and weaknesses. A prediction based solely on team performance will be slow to adapt to injuries to key players and to squad changes. A player-based approach will miss the role of team cohesion and tactical organisation, which can make a team better or worse than the sum of its parts. We have therefore combined the two approaches to produce our final predictions for the 2018 World Cup.
An ensemble of forecasts tends to outperform any individual forecast because combining different approaches guards against overconfidence, the biggest danger when making predictions. To understand that danger, take the example of Peru. Our team-based ratings have Peru as the 10th-best team in the tournament. They did extremely well in the South American qualifiers: their goal difference of +1 was very close to Argentina’s (+3). However, their squad is made up of relative unknowns, at least compared with some of the other favourites for the tournament. They have a single player from Europe’s big five leagues, Watford’s André Carrillo, who is not even a starter. Predictions based only on team performance would be overconfident about Peru’s chances, whereas predictions based only on lineups would be overconfident that they don’t stand a chance. By averaging different methods, we reach a more nuanced conclusion.
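Averaging two forecasts can be illustrated with a "linear pool", a weighted average of win/draw/loss probabilities. This is a minimal sketch assuming equal weights; the actual weighting we use isn't stated here. The probabilities for the Peru match are invented for illustration and are not our published forecasts. `linear_pool` and `log_loss` are hypothetical helpers.

```python
import math


def linear_pool(forecasts, weights=None):
    """Combine (win, draw, loss) probability forecasts by weighted averaging."""
    if weights is None:
        weights = [1.0 / len(forecasts)] * len(forecasts)
    pooled = [sum(w * f[k] for w, f in zip(weights, forecasts)) for k in range(3)]
    total = sum(pooled)  # renormalise in case the weights don't sum exactly to 1
    return tuple(p / total for p in pooled)


def log_loss(forecast, outcome):
    """Penalty once the result is known (lower is better): 0 = win, 1 = draw, 2 = loss."""
    return -math.log(forecast[outcome])


# Hypothetical forecasts for a Peru match, invented for illustration only.
team_based = (0.55, 0.25, 0.20)    # leans on strong qualifying results
player_based = (0.15, 0.25, 0.60)  # leans on a squad of relative unknowns
pooled = linear_pool([team_based, player_based])
print([round(p, 2) for p in pooled])  # → [0.35, 0.25, 0.4]

# Worst-case penalty for each forecast across the three possible results.
for name, f in [("team", team_based), ("player", player_based), ("pooled", pooled)]:
    print(name, round(max(log_loss(f, o) for o in range(3)), 3))
```

The pooled forecast has the smallest worst-case penalty of the three, because neither extreme view dominates it. More generally, since −log is convex, the log loss of an equally weighted average is never worse than the average log loss of its members, whatever the result turns out to be.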
This approach applies to predicting any sort of event. For instance, football clubs might want to know whether a given transfer will succeed or fail. They could rely on scouting reports, on the league the player is coming from, or on his technical data; however, they would be making a big mistake if their analysis were based on a single source rather than on a combination of many different sources of information. In football, as in life, everyone can benefit from an additional point of view.