Sports Data Mining
Sports Data Mining has experienced rapid growth in recent years. The field began with fantasy league players and sporting enthusiasts seeking an edge in predictions, and tools and techniques were soon developed to better measure both player and team performance. These new methods of performance measurement are starting to get the attention of major sports franchises, including baseball’s Boston Red Sox and Oakland Athletics as well as soccer’s AC Milan. Before the advent of data mining, sports organizations relied almost exclusively on human expertise. It was believed that domain experts (coaches, managers and scouts) could effectively convert their collected data into usable knowledge. As the types of data collected grew in scope, these organizations sought more practical methods to make sense of what they had. This led first to the addition of in-house statisticians to create better measures of performance and better decision-making criteria. The second step was to extract valuable knowledge using data mining techniques.
Sports organizations are sitting on a wealth of data and need ways to harness it. Our Sports Data Mining research highlights current measurement inadequacies and showcases data mining, web mining, web mashup, and cloud computing techniques that make better use of collected sports data. Properly leveraging Sports Data Mining techniques can result in better team performance by matching players to certain situations, identifying individual player contributions, evaluating the tendencies of the opposition, and exploiting any weaknesses.
Incredible amounts of data exist across all domains of sports. This data can come in the form of individual player performance, coaching or managerial decisions, game-based events, and how well the team functions together. The challenge is not how to collect the data, but deciding what data should be collected and how to make the best use of it. By finding the right ways to make sense of data and turning it into actionable knowledge, sports organizations have the potential to secure a competitive advantage over their peers. This knowledge-seeking approach can be applied throughout the entire organization. From players improving their game-time performance using video analysis techniques, to scouts using statistical analysis and projection techniques to identify which talent will provide the biggest impact, data mining is quickly becoming an integral part of the sports decision-making landscape. Managers and coaches can likewise use machine learning and simulation techniques to find optimal strategies for an entire upcoming season.
The first part of the problem is to identify the metrics of performance. Many existing sports metrics can be easily misused or, worse, do not measure performance in the context of scoring more points than the opponent, which is the ultimate goal of any sports organization. Because such statistics are often problematic as performance measures, it can be important for coaches, players, managers, and owners to consider newer sport-specific statistics that take point-scoring behavior into account in performance assessment.
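As one illustration of a statistic tied directly to point scoring, the sketch below computes the Pythagorean expectation, a widely used estimate of winning percentage from points scored and points allowed. It is not taken from the research described here. The function name, the default exponent of 2 (the classic baseball form), and the sample totals are assumptions chosen for illustration only.

```python
def pythagorean_expectation(points_for: float,
                            points_against: float,
                            exponent: float = 2.0) -> float:
    """Estimate winning percentage from points scored and points allowed.

    Uses the classic form PF^k / (PF^k + PA^k). The exponent k = 2 is the
    traditional baseball value; other sports typically use other exponents,
    so treat the default as an assumption.
    """
    if points_for < 0 or points_against < 0:
        raise ValueError("point totals must be non-negative")
    if points_for == 0 and points_against == 0:
        # No scoring information at all: fall back to an even split.
        return 0.5
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


if __name__ == "__main__":
    # Hypothetical season totals, made up for illustration.
    expected = pythagorean_expectation(800, 700)
    print(f"Expected winning percentage: {expected:.3f}")
```

A team whose actual record differs sharply from this estimate may have been lucky or unlucky in close games, which is the kind of context a raw win-loss record does not show.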
The second part of the problem is to find interesting patterns within the data. These patterns could reveal the trends and tendencies of opposing players and teams, signal the onset of injury through the monitoring of workout performance, or support sport-related predictions based on historical data. Knowing which machine learning and simulation techniques can be applied will be an important part of the solution.
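A very simple form of tendency analysis is to count how often an opponent chooses each action in each game situation. The sketch below is a minimal illustration under stated assumptions, not a method from this research: the function name `tendencies`, the (situation, action) record format, and the sample plays are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tendencies(plays: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Return, for each situation, the share of times each action was chosen.

    `plays` is an iterable of (situation, action) pairs, a hypothetical
    record format used only for this example.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for situation, action in plays:
        counts[situation][action] += 1

    shares: Dict[str, Dict[str, float]] = {}
    for situation, action_counts in counts.items():
        total = sum(action_counts.values())
        shares[situation] = {a: n / total for a, n in action_counts.items()}
    return shares


if __name__ == "__main__":
    # Made-up historical plays for an opposing team (illustrative only).
    history = [
        ("3rd_and_short", "run"),
        ("3rd_and_short", "run"),
        ("3rd_and_short", "pass"),
        ("3rd_and_long", "pass"),
    ]
    for situation, shares in tendencies(history).items():
        print(situation, shares)
```

Frequency tables like this are only a starting point. The machine learning and simulation techniques mentioned above would typically build on richer features and far larger histories.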
Professional sports organizations can be multi-million dollar enterprises, with millions of dollars riding on a single decision. With this amount of capital at stake, just one bad or misguided decision can set an organization back by several years. With so much risk and such a critical need to make good decisions, the sports industry is an attractive environment for data mining applications.
- R. Schumaker, O. Solieman, and H. Chen, Sports Data Mining, Springer, 2010.
- R. Schumaker, O. Solieman, and H. Chen, “Sports Knowledge Management and Data Mining,” Annual Review of Information Science and Technology (ARIST), Volume 44, 2009.
- H. Chen, X. Li, M. Chau, Y. Ho, and C. Tseng, “Using Open Web APIs in Teaching Web Mining,” IEEE Transactions on Education, Volume 52, Number 4, Pages 482-490, 2009.
- O. Solieman, “Data Mining in Sports: A Research Overview,” Master's thesis, Dept. of Management Information Systems, The University of Arizona, Tucson, 2006.
- H. Chen, P. Buntin, L. She, S. Sutjahjo, C. Sommer, and D. Neely, “Expert Prediction, Symbolic Learning, and Neural Networks: An Experiment on Greyhound Racing,” IEEE Expert, Volume 9, Number 6, Pages 21-27, December 1994.
The following selected projects were implemented in Dr. Chen's Web Computing and Mining class. Detailed project reports, slides, and screenshots are included.
100Yards.com is a one-stop information site where football enthusiasts can find everything related to the sport: game information, statistics, player and team details, and videos and photos of the players, teams, and stadiums. It also provides RSS feeds with the latest relevant news for football enthusiasts. The website can also be helpful for statisticians and game followers. (Chandrasekhar Manda, Rahul Chahar, Karishma Khalsa). [Slides] [Final Report]
BetSmart is a football prediction and betting engine based on Google App Engine, BetExplorer, Italia, Google Translate, YouTube, Flickr, Weka, and Neuroph. (Ximing Yu, Ying Jin, Cai Chen). [Slides] [Final Report]
CricWeb.com is a one-stop site for all cricket enthusiasts. CricWeb targets cricket lovers who browse the internet for cricket-related information and interact with other people online. It provides a rich variety of multimedia content, a combination that is not offered by any of its competitors. The system includes: Cricket News from Major Web Sites (RSS); Fixtures & Live Scores (RSS); Player Profiles; Google Map & Weather Information; Cricket Highlights, Videos, Pictures; Blogs; CricWeb Prediction System; Players Clustering; Cricket Shopping. (Sriram Srinivasan, Kalpesh Jain, Shankar Venkataraman). [Slides] [Final Report]
MLB Predictor is a smart web portal that gives baseball fans a one-stop information website for baseball games, player information, and game prediction. MLB Predictor draws on the following web services and social media: Amazon Web Services, YouTube, the Flickr photo search API, Kayak, StubHub, WeatherBug, Facebook, Google Maps, the Google Charts API, and RSS feeds fetching news from Yahoo and other sports-related websites. (Yang Kenneth Jiang, Priya Matai, Kunal Shah, Michael Zolli). [Slides] [Final Report]
XTREME F1 is a one-stop portal for all F1 fans, which includes: F1 News, Images, Trivia, Games, Videos; Player, Team and Track Profiles; Upcoming Season Races; Discussion Forums; a one-stop shop for F1-associated products; and a Prediction System to predict race outcomes. (Parag Bhalerao, Deepali Muddebihal, Vinay Kabde, Shruti Khanna). [Slides] [Final Report]