Here is what we learned from Artificial Intelligence's failure to predict the results of the 2018 FIFA World Cup in Russia.
The 2018 FIFA World Cup in Russia ended on Sunday, July 15th, with France as the champion, followed by Croatia and Belgium. As with the 2014 World Cup, many researchers tried to predict the outcome of the tournament in advance. This time, researchers and scientists turned to Artificial Intelligence (AI) and statistics to predict the outcomes of all 64 matches.
Artificial Intelligence (AI) has attracted a lot of attention recently and is often described as the technology of the future. Nowadays, AI is becoming part of every large and medium-sized business. But how reliable is it? In this article, I use the 2018 World Cup as a sample use case to examine how well AI performed at predicting the results. Whether or not you are an expert in AI, I have tried to keep this article as simple and understandable as possible.
There are different approaches to predicting the results of the FIFA World Cup. One approach is to simulate every single match as a paired comparison based on the teams' capabilities and winning odds. Zeileis, Leitner, and Hornik (2018) used this technique and predicted that Brazil would win the 2018 World Cup with a probability of 16.6%, followed by Germany (15.8%) and Spain (12.5%).
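To make the idea of a paired comparison concrete, here is a minimal Python sketch of the Bradley-Terry model, a standard paired-comparison model in which team A beats team B with probability ability_A / (ability_A + ability_B). The ability values are hypothetical placeholders, and the sketch does not reproduce how Zeileis, Leitner, and Hornik estimated their team abilities; treat it as an illustration of the idea, not their method.

```python
# Minimal Bradley-Terry paired-comparison sketch.
# The abilities used below are hypothetical placeholders, not estimates
# from Zeileis, Leitner, and Hornik (2018).


def win_probability(ability_a: float, ability_b: float) -> float:
    """Probability that team A beats team B under the Bradley-Terry model."""
    if ability_a <= 0 or ability_b <= 0:
        raise ValueError("abilities must be positive")
    return ability_a / (ability_a + ability_b)


def pairwise_matrix(abilities: dict[str, float]) -> dict[tuple[str, str], float]:
    """Win probability for every ordered pair of distinct teams."""
    return {
        (a, b): win_probability(abilities[a], abilities[b])
        for a in abilities
        for b in abilities
        if a != b
    }


if __name__ == "__main__":
    # Hypothetical abilities, for illustration only.
    abilities = {"Team A": 3.0, "Team B": 2.8, "Team C": 2.4}
    for (a, b), p in sorted(pairwise_matrix(abilities).items()):
        print(f"P({a} beats {b}) = {p:.3f}")
```

Because this basic form of the model never produces a draw, it maps directly onto knockout matches; simulating group-stage games would need an extension that allows draws.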
The Swiss bank UBS also predicted the same three teams as the top three, but in a different order: Germany (24.0%) as the champion, followed by Brazil (19.8%) and Spain (16.1%). Their model was based on four factors: 1) the Elo rating; 2) the teams' performances in the qualifiers preceding the World Cup; 3) the teams' success in previous World Cup tournaments; and 4) home advantage. The model was calibrated on the results of the last five tournaments, and 10,000 Monte Carlo simulations were run to determine the winning probabilities.
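The Monte Carlo part of this approach can be sketched in a few lines. The example below is a minimal sketch under stated assumptions: it uses only the standard Elo expected-score formula, 1 / (1 + 10^((R_B - R_A) / 400)), treats that score as a win probability (ignoring draws), and plays a hypothetical four-team knockout bracket 10,000 times. The team names and ratings are made up, and UBS's other three factors are omitted, so this is not the UBS model.

```python
import random
from collections import Counter


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play_match(team_a: str, team_b: str, ratings: dict[str, float], rng: random.Random) -> str:
    """Return the winner, treating the Elo expected score as a win probability (no draws)."""
    p_a = elo_expected_score(ratings[team_a], ratings[team_b])
    return team_a if rng.random() < p_a else team_b


def simulate_knockout(bracket: list[str], ratings: dict[str, float], rng: random.Random) -> str:
    """Play a single-elimination bracket whose length is a power of two; return the champion."""
    teams = list(bracket)
    while len(teams) > 1:
        # Adjacent teams meet; winners advance to the next round.
        teams = [play_match(teams[i], teams[i + 1], ratings, rng) for i in range(0, len(teams), 2)]
    return teams[0]


def champion_probabilities(
    bracket: list[str], ratings: dict[str, float], n_sims: int = 10_000, seed: int = 0
) -> dict[str, float]:
    """Estimate each team's title probability from n_sims simulated tournaments."""
    rng = random.Random(seed)  # seeded so the estimates are reproducible
    titles = Counter(simulate_knockout(bracket, ratings, rng) for _ in range(n_sims))
    return {team: titles[team] / n_sims for team in bracket}


if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    ratings = {"Team A": 2100, "Team B": 2050, "Team C": 2000, "Team D": 1950}
    bracket = ["Team A", "Team D", "Team B", "Team C"]
    for team, p in sorted(champion_probabilities(bracket, ratings).items(), key=lambda kv: -kv[1]):
        print(f"{team}: {p:.1%}")
```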
On June 8, 2018, four researchers (A. Groll et al.) from Technical University of Dortmund (Germany), Ghent University (Belgium), and Technical University of Munich (Germany) published a paper on arXiv predicting the results of the 2018 World Cup using Random Forest, a well-known machine learning algorithm, combined with Poisson ranking methods. The paper appeared online only days before the opening match between Russia and Saudi Arabia on June 14. They used a dataset covering all matches of the last four FIFA World Cups (2002 to 2014) and predicted Spain as the champion, followed by Germany and Brazil.
All three studies came up with the same top three teams, Spain, Germany, and Brazil, in different orders. Although they used different methods, data, and features, they arrived at very similar results. Now that the World Cup is over, we can see that all of these models failed: none of the three favorites even reached the semi-finals.
Of these studies, the methodology of A. Groll et al. is my favorite. First, they used a good data source. Second, they considered many features and parameters for training. Third, they employed the Random Forest algorithm. In the rest of this article, I discuss its data features, its error, and the reasons for its failure.
A. Groll et al. considered various features related to each team, such as 1) economic factors (GDP per capita, population); 2) sportive factors (ODDSET probability, FIFA ranking); 3) home advantage (host, continent, confederation); 4) team structure (maximum number of teammates from the same club in each squad, average age, number of Champions League players, number of legionnaires, i.e. players employed by clubs abroad); and 5) coach factors (age, tenure, nationality). In total, they had 16 features for each team and each World Cup.
As mentioned earlier, they used Random Forest, one of the best-known algorithms in machine learning. It builds an ensemble of decision trees and has performed well in many classification and regression tasks. They also used Poisson models to rank the teams based on their current abilities.
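A common way to turn predicted goal rates into match outcomes is to assume each team's goals follow an independent Poisson distribution. The sketch below assumes exactly that, with hypothetical expected-goal values standing in for what a trained forest might output; it illustrates the Poisson step in general and does not reproduce the exact pipeline of Groll et al.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def match_outcome_probabilities(
    lam_a: float, lam_b: float, max_goals: int = 15
) -> tuple[float, float, float]:
    """Return (P(A wins), P(draw), P(B wins)) for independent Poisson goal counts.

    Scores above max_goals are ignored; their probability is negligible
    for typical football scoring rates.
    """
    pa = [poisson_pmf(k, lam_a) for k in range(max_goals + 1)]
    pb = [poisson_pmf(k, lam_b) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for goals_a, prob_a in enumerate(pa):
        for goals_b, prob_b in enumerate(pb):
            joint = prob_a * prob_b  # independence assumption
            if goals_a > goals_b:
                win += joint
            elif goals_a == goals_b:
                draw += joint
            else:
                loss += joint
    return win, draw, loss


if __name__ == "__main__":
    # Hypothetical expected goals, e.g. as a trained model might predict.
    win, draw, loss = match_outcome_probabilities(1.8, 0.9)
    print(f"A wins {win:.1%}, draw {draw:.1%}, B wins {loss:.1%}")
```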
After simulating the tournament 100,000 times, the model predicted Spain as the most likely champion, reaching the final with a 28.9% chance, followed by Germany (26.3%) and Brazil (21.9%).
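It is worth checking how much of these percentages is simulation noise. A probability estimated from n independent simulated tournaments has a binomial standard error of sqrt(p(1 - p) / n). The sketch below applies that textbook formula to the 28.9% figure; it assumes the simulated tournaments are independent.

```python
import math


def monte_carlo_standard_error(p_hat: float, n_sims: int) -> float:
    """Binomial standard error of a probability estimated from n_sims independent runs."""
    if not 0.0 <= p_hat <= 1.0 or n_sims <= 0:
        raise ValueError("p_hat must be in [0, 1] and n_sims must be positive")
    return math.sqrt(p_hat * (1.0 - p_hat) / n_sims)


if __name__ == "__main__":
    se = monte_carlo_standard_error(0.289, 100_000)  # Spain's 28.9% from the article
    print(f"standard error: {se:.4f}")  # about 0.0014, i.e. 0.14 percentage points
```

With 100,000 runs the simulation noise is only about 0.14 percentage points, so the gap between Spain (28.9%) and Germany (26.3%) is not a sampling artifact. Running more simulations cannot fix the prediction error discussed below, because that error comes from the model, not from the number of runs.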
As the 2018 World Cup showed, neither of the two top-predicted teams reached the quarter-finals, let alone the final; only Brazil, the third favorite, made it to the quarter-finals. Based on the actual results of the World Cup and the model's predictions, I calculated the model's Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
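For reference, with n teams, predicted ranks p_i, and actual ranks a_i, RMSE = sqrt((1/n) * sum((p_i - a_i)^2)) and MAE = (1/n) * sum(|p_i - a_i|). The minimal sketch below computes both; the four rankings in it are made-up values for illustration, not the data behind the article's calculation.

```python
import math


def _check_lengths(predicted: list[float], actual: list[float]) -> None:
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean square error: sqrt(mean((predicted - actual) ** 2))."""
    _check_lengths(predicted, actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted: list[float], actual: list[float]) -> float:
    """Mean absolute error: mean(|predicted - actual|)."""
    _check_lengths(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Hypothetical ranks for four teams (1 = champion), not the article's data.
    predicted_rank = [1, 2, 3, 4]
    actual_rank = [4, 1, 2, 3]
    print(f"RMSE = {rmse(predicted_rank, actual_rank):.3f}")  # 1.732
    print(f"MAE  = {mae(predicted_rank, actual_rank):.3f}")  # 1.500
```

RMSE squares each error before averaging, so it penalizes a few large misses (such as a favorite going out in the group stage) more heavily than MAE does.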
These two metrics measure the model's error, that is, how far its predicted team rankings deviate from the actual rankings overall. Both RMSE and MAE are high, which makes the model unreliable: despite 16 features and a relatively large dataset (the past four World Cups), machine learning (Random Forest in particular) still failed to predict the results reliably. In this World Cup, Russia, Japan, and Iran played significantly better than predicted, while Germany was eliminated in the group stage.
Why did AI fail?
In machine learning (supervised learning, in this case), it is very important to have proper data for training and modeling. In this case, however, despite proper data (16 cleaned features), a relatively large dataset (the past four World Cups), and good algorithms with the right parameters, the trained model failed badly. The reason for this failure lies in the nature of what we are predicting.
Like any other human activity, a World Cup match depends on far too many factors (not just 16), both before and during the match (at least 90 minutes of play), many of which are hidden or confounding variables that are never measured. To predict the results correctly, every single minute of each match would have to be simulated. The state of the match at each minute (or second) depends on how the game has unfolded so far; one way to model this is as a Markov chain, in which each state is generated from the state immediately before it. A single incorrectly simulated state can easily make the outcomes of all subsequent states unreliable.
Besides these internal factors, the result of a football match can also be significantly influenced by external factors, such as an unfair referee, the weather, the political situation, or even players' personal problems. These important features are usually very difficult to measure and collect. In addition, there is always an element of chance and uncertainty, for instance a critical mistake or an own goal, that cannot easily be predicted.
In a nutshell, stochastic and dynamic environments such as the FIFA World Cup, or human activities in general, are areas where today's AI technology does not perform very well. This is a good reminder that we must be very careful about applying AI in similarly dynamic fields. Moreover, with very complex data structures, it can be very difficult to audit trained models for potential bias. Bias in AI can easily lead to discriminatory decisions against a particular group, and deploying such systems as the sole decision maker may cause huge problems for both individuals and companies. Governments and companies should therefore use AI in stochastic and dynamic environments only as a supplementary decision-making tool.
Zeileis, Leitner, and Hornik (2018). Working Paper 2018-09, Working Papers in Economics and Statistics, Research Platform Empirical and Experimental Economics, Universität Innsbruck.