EECS349

Information

Team Members: Matthew Duque and Tavasya Agarwal
Contact: [email protected]
Course: EECS349, Northwestern University

RandomForest decision tree implementation with fewer attributes (left) and more attributes (right). As the comparison shows, adding attributes produced a more accurate model.

Abstract

The machine learning task our team chose to undertake is to find out what makes a soccer player's market value go up. We aim to build a model that, given a player's soccer statistics, predicts whether that player's value is likely to increase. This model is particularly relevant in today's sporting climate: because massive amounts of money are being pumped into sports, and into soccer in particular, it is extremely hard for teams with fewer resources to compete with their richer counterparts. A way of identifying players whom the market currently undervalues would therefore be an extremely useful tool, letting such teams spend less while buying more quality.

The first approach we took was to implement decision trees. Given a range of statistics describing a player's performance in a given season, we tried to predict whether that player's value would increase or decrease. We also tried a linear regressor and neural networks to improve the accuracy of our predictions, evaluating them with 10-fold cross-validation. We found that minutes played was the strongest indicator of a player's value increasing in the future, which makes intuitive sense!
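Ten-fold cross-validation splits the data into ten folds, trains a fresh model on nine of them, tests it on the held-out fold, and averages the ten scores, so every example is scored exactly once by a model that never saw it during training. The sketch below is a minimal, dependency-free Python illustration of that procedure; the function names and the `fit_predict(X_train, y_train, X_test)` callback are hypothetical and are not taken from the project's code.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_indices(n: int, k: int = 10, seed: Optional[int] = None) -> List[Split]:
    """Return k (train, test) index splits; every index lands in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    if seed is not None:
        # Shuffle once so folds do not inherit any ordering in the source data.
        random.Random(seed).shuffle(order)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits: List[Split] = []
    start = 0
    for size in sizes:
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        splits.append((train, test))
        start += size
    return splits


def cross_val_accuracy(fit_predict: Callable, X: Sequence, y: Sequence,
                       k: int = 10, seed: Optional[int] = 0) -> float:
    """Mean held-out accuracy over k folds.

    fit_predict(X_train, y_train, X_test) is a hypothetical interface: it must
    train a fresh model and return one predicted label per row of X_test.
    """
    fold_scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        fold_scores.append(correct / len(test))
    return sum(fold_scores) / len(fold_scores)
```

In practice, library routines such as scikit-learn's `KFold` and `cross_val_score` do the same job; the point here is only to show what the "10-fold" evaluation computes.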

Unsurprisingly, given how many teams and individuals have attempted to tackle this problem, it proved very difficult to predict with a high success rate which players' values would rise and which would not. However, we did achieve an accuracy higher than the ZeroR baseline, and found that the decision tree implementations and the linear regressor worked best on our data. Our biggest gains came from changing which features of the data we included, and we suspect that this is where the most success can be found in the future.
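ZeroR is the standard floor for a classifier: it ignores every feature and always predicts the most frequent class in the training data, so its accuracy simply equals the share of that class in the test set. Beating it is the minimum evidence that a model has learned something from the statistics. A minimal Python sketch follows; the function names and the "up"/"down" labels are made up for illustration and are not the project's data.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def zero_r_fit(y_train: Sequence[Hashable]) -> Hashable:
    """ZeroR 'training': ignore every feature and keep the most frequent label."""
    return Counter(y_train).most_common(1)[0][0]


def zero_r_predict(majority: Hashable, n_rows: int) -> List[Hashable]:
    """Predict the stored majority label for every test row."""
    return [majority] * n_rows


def accuracy(preds: Sequence[Hashable], y_true: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


# Toy, made-up labels: "up" = value increased, "down" = it did not.
y_train = ["up", "down", "down", "up", "down"]
y_test = ["down", "up", "down", "down"]
majority = zero_r_fit(y_train)
baseline = accuracy(zero_r_predict(majority, len(y_test)), y_test)
print(majority, baseline)  # prints: down 0.75
```

Any model evaluated on the same test set needs an accuracy above `baseline` before its predictions say more than "most players' values go the same way".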

Moneyball, Machine Learning, and Soccer