Tennis Gold Medal Predictions
Our mission at Invrea is to make machine learning easy to use and accessible to everyone. Our Scenarios plugin can be used to reason about current events and everyday decisions. To demonstrate this, we replicated the Olympic tennis draw in a spreadsheet. Using the plugin, we predicted each ATP tour player's probability of winning based on his ATP points and ranking. This post details how we did it in Excel, gives a brief introduction to the Invrea Scenarios plugin, and tells you who we think you should put your money on this year. You can download the alpha version of Scenarios for free and follow along as the tournament continues.
Invrea Scenarios Demo for Rio Gold Medal Predictions
Our predictions for Gold medalist given all match results from round 1
Made using the Invrea Scenarios Excel Plugin
In the Olympic men's tennis tournament there are 64 players, each of whom faces off one-on-one against another player. The winner continues to the next round and the loser is eliminated. This continues until the final, whose winner takes the Olympic gold medal. The question is, who is most likely to win gold? It would be nice to get a probability for each player in the draw. Tennis has a lot of uncertainty in it. Just because Murray is ranked higher than Nishikori doesn't guarantee that Murray will prevail. As in any sport, there are upsets and surprises around every corner.
Luckily, machine learning was built to handle this uncertainty. With it, you can encode some randomness when deciding whether Murray or Nishikori will win: Murray gets a slightly larger probability because his track record is better, but it stays fairly plausible that Nishikori could win. And with Invrea Scenarios, you don't need a PhD in statistics or any coding skills to make this sort of prediction yourself. You just need Excel.
Here's a spreadsheet we've created to figure out the odds of each player winning in Rio. The workbook has two sheets. The first lists each player, his ATP score, and the log of the score (taking the log keeps the numbers on a manageable scale). We added a little randomness to each score, since a single number can't perfectly describe a player. For example, Del Potro has only 410 ATP points, but that is mainly because of limited playing time; his past record makes us think he's a better player than his score indicates. The randomness helps account for subtleties like this.
The second sheet contains the tournament draw. Column C is just the first-round draw that you can find on the official Olympics webpage. But you can see that we've also filled out the rest of the rounds all the way to the victor. How did we do that? The trick is that we didn't really: the values there are placeholders. If you refresh the spreadsheet (by pressing F9), you can see that all the players in round 2 and later change. In other words, the cells that determine who moves on to the next round are random!
But they are random according to a rule. Imagine that you pick a random value close to Player A's ATP score and another random value close to Player B's ATP score. Sometimes these values will be lower than the real score, and sometimes they will be higher. The rule is that whoever has the higher random value wins. This way, a higher ATP score does give you a higher chance of beating your opponent, but you could still have a bad day or pick up an injury and draw a low random value. From there, things just repeat: round 3 applies the same rule to the players who made it past round 2, and so on. That's why, if you refresh the spreadsheet many times, different people end up winning the tournament.
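The rule above can be sketched outside Excel in a few lines of Python. This is only an illustration, not the workbook's actual formulas: the player labels and point values are made up, `noise_sd` is a hypothetical noise level, and we assume a fresh random value is drawn around the log of the ATP score for every match.

```python
import math
import random

def play_match(player_a, player_b, points, noise_sd, rng):
    """Draw one random value near each player's log score; the higher value wins."""
    value_a = rng.gauss(math.log(points[player_a]), noise_sd)
    value_b = rng.gauss(math.log(points[player_b]), noise_sd)
    return player_a if value_a > value_b else player_b

def play_tournament(draw, points, noise_sd, rng):
    """Play a single-elimination draw (length must be a power of two) to its winner."""
    players = list(draw)
    while len(players) > 1:
        # Adjacent entries in the list face each other, as in a bracket.
        players = [
            play_match(players[i], players[i + 1], points, noise_sd, rng)
            for i in range(0, len(players), 2)
        ]
    return players[0]

# Hypothetical four-player draw with made-up point totals.
example_draw = ["A", "B", "C", "D"]
example_points = {"A": 4000, "B": 1000, "C": 2000, "D": 500}

if __name__ == "__main__":
    rng = random.Random()
    # Each call is like pressing F9 once: a new random tournament.
    for _ in range(5):
        print(play_tournament(example_draw, example_points, noise_sd=0.5, rng=rng))
```

Running it several times should show the strongest player winning most often, with occasional upsets, which is the behaviour described above.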
Invrea Scenarios lets you define these random cells using functions like GAUSSIAN, then generates thousands of scenarios automatically and displays them for you. You can see what the distribution of each random cell looks like: who might win in round 1? Round 2? The semifinals? The final? You can look at the odds for any cell you want.
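To give a feel for what "thousands of scenarios" means, here is a rough Python sketch that replays a small random tournament many times and counts how often each player wins. It is not how the plugin is implemented; the function names, the four-player draw, the point totals, and the noise level are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

def simulate_winner(draw, points, noise_sd, rng):
    """Play one random single-elimination tournament and return its winner."""
    players = list(draw)
    while len(players) > 1:
        next_round = []
        for a, b in zip(players[0::2], players[1::2]):
            value_a = rng.gauss(math.log(points[a]), noise_sd)
            value_b = rng.gauss(math.log(points[b]), noise_sd)
            next_round.append(a if value_a > value_b else b)
        players = next_round
    return players[0]

def win_probabilities(draw, points, noise_sd=0.5, n_scenarios=10_000, seed=0):
    """Estimate each player's chance of winning as the share of scenarios he wins."""
    rng = random.Random(seed)
    counts = Counter(
        simulate_winner(draw, points, noise_sd, rng) for _ in range(n_scenarios)
    )
    return {player: counts[player] / n_scenarios for player in draw}

# Hypothetical draw and made-up point totals.
example_draw = ["A", "B", "C", "D"]
example_points = {"A": 4000, "B": 1000, "C": 2000, "D": 500}

if __name__ == "__main__":
    for player, prob in win_probabilities(example_draw, example_points).items():
        print(f"{player}: {prob:.3f}")
```

The resulting dictionary plays the role of the histogram: one bar per player, with the height given by the estimated probability.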
The histogram you see above shows the predicted probability of each player winning the tournament before any round 1 results are known. The higher the bar, the more likely that player is to win. Just eyeballing it, we see that Djokovic has a pretty good chance. The only others who can really stop him are Murray, Nadal, and Nishikori (Federer is not participating). With this information, you can be more confident when you say you expect Djokovic to win, and it may help you decide who to bet on.
You can actually do even better. Once results come out for a round, you can incorporate who won into the spreadsheet using ACTUAL, a special Invrea function. In essence, it shows you scenarios of who could win in the final (or any round) given that we already know what happened in previous rounds. For example, we extracted all the match results from round 1 (from the official Olympics website). There were some unforeseen outcomes, like Djokovic losing to Del Potro.
After running the plugin with the new data incorporated, we can see that the distribution for who will win the finals has changed: there are fewer players who are in the running for the Gold medal, and Djokovic's probability is now 0 while Murray's, Nishikori's and Nadal's have increased to reflect their victories in round 1. In fact, the distribution of every random cell has changed since the information provided by round 1 results helps our plugin better estimate who is more likely to win in each matchup. Judging from these results, we're now putting our money on Murray or Nishikori. You can follow along as the tournament continues. Once round 2 results come out, add them to the spreadsheet using ACTUALS, and the predictions should get even better.
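We can't show the plugin's internals here, but one simple way to get the same kind of "given what we already know" answer is rejection sampling: generate many scenarios and keep only those whose round 1 matches the real results. The Python sketch below is an illustration under our own assumptions, not the ACTUAL function itself. It assumes each player has a hidden skill drawn once per scenario around the log of his points, plus fresh noise in every match; all names, point totals, and noise levels are hypothetical.

```python
import math
import random
from collections import Counter

def simulate_scenario(draw, points, skill_sd, match_sd, rng):
    """Return (round 1 winners, tournament winner) for one random scenario."""
    # Assumption: one hidden skill per player per scenario, centred on log(points).
    skill = {p: rng.gauss(math.log(points[p]), skill_sd) for p in draw}
    players, round1 = list(draw), None
    while len(players) > 1:
        winners = []
        for a, b in zip(players[0::2], players[1::2]):
            perf_a = rng.gauss(skill[a], match_sd)  # performance on the day
            perf_b = rng.gauss(skill[b], match_sd)
            winners.append(a if perf_a > perf_b else b)
        if round1 is None:
            round1 = tuple(winners)
        players = winners
    return round1, players[0]

def conditional_win_probabilities(draw, points, observed_round1, skill_sd=0.5,
                                  match_sd=0.5, n_scenarios=20_000, seed=0):
    """Estimate win probabilities using only scenarios that match round 1."""
    rng = random.Random(seed)
    observed = tuple(observed_round1)
    counts = Counter()
    for _ in range(n_scenarios):
        round1, winner = simulate_scenario(draw, points, skill_sd, match_sd, rng)
        if round1 == observed:  # discard scenarios that contradict the results
            counts[winner] += 1
    kept = sum(counts.values())
    if kept == 0:
        raise ValueError("no simulated scenario matched the observed results")
    return {player: counts[player] / kept for player in draw}

# Hypothetical draw: favourite "A" is upset by "B" in round 1; "C" beats "D".
example_draw = ["A", "B", "C", "D"]
example_points = {"A": 4000, "B": 1000, "C": 2000, "D": 500}

if __name__ == "__main__":
    result = conditional_win_probabilities(example_draw, example_points, ("B", "C"))
    for player, prob in result.items():
        print(f"{player}: {prob:.3f}")
```

In this toy example the eliminated favourite drops to zero, just as Djokovic does above, and the scenarios that survive tend to be ones where the upset winner's hidden skill was higher than his points suggest. Rejection sampling is only practical for tiny draws like this one; for a full 64-player bracket almost no random scenario would match all 32 first-round results, so a real tool would need a more efficient inference method.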
There's actually a lot you can learn about the matches between round 1 and the final. For example, who is likely to make it into bracket 4's quarterfinal? Or bracket 2's semifinal? Looking at the graphs below, we can get a sense of the answers to each of these questions. (Del Potro proceeds to round 2 with 100% probability because we know for a fact that he won in round 1.)
Invrea Scenarios helps a lot with making this kind of prediction, but it doesn't stop there. The plugin can model uncertainty and make predictions from your assumptions and new data for business decisions, insurance claims, payment plans, and more. If you can model your decision as relationships between cells in an Excel spreadsheet, then it's quite likely that Scenarios can help. The team at Invrea is dedicated to opening this kind of machine learning to every industry possible. If you would like more information, a more detailed demo, or some help setting up a worksheet of your own, we'd love to lend a hand. You can find us at this email.
Also! We are offering the alpha version of Invrea Scenarios for free. You can request a download link here. Next, we are looking to predict which country is going to bring home the most gold medals! Stay tuned.