When it comes to the Olympics, there's a lot of uncertainty. Predicting the number of gold medals per country is a fairly difficult problem. There are some heuristics that serve as a guide, like past Olympic success, domestic wealth, population, or the strength of a country's sports culture. But even those seem over-simplified. Countless additional factors contribute to the performance of each individual athlete, all of which add up to affect a country's gold medal count. At Invrea, we spent some time thinking about how best to model this seemingly bottomless source of uncertainty so that we can make better predictions.

What we came up with is to take advantage of how reliable market prices are. Many online sites handle betting on individual Olympic events, and BetFair is one of the largest. Its market prices for backing or betting against a given athlete or country are fairly robust and probably reflect more factors than I could possibly list. Better yet, these prices adjust to changes in the status quo, like athlete injuries or disqualifications, which keeps them dependable and fair. So we scraped BetFair's prices for every athlete in every public event related to the Olympics and translated the prices into probabilities. By that I mean, the higher the price, the more likely it is that the athlete or country will win. Likewise, we assume that athletes nobody bet on will, in all likelihood, not win. A straightforward linear function maps every price to a probability between 0 and 1.
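The exact mapping isn't spelled out here, so the following is only a minimal sketch of one linear mapping that fits the description: divide each price in an event by the event's total, so the results lie between 0 and 1 and sum to 1. The function name `prices_to_probabilities` and the example prices are hypothetical and are not taken from BetFair or from the spreadsheet.

```python
def prices_to_probabilities(prices):
    """Scale one event's prices so they sum to 1.

    Assumption: a higher price means a more likely winner, as described
    in the text. Proportional scaling is one possible linear mapping,
    not necessarily the one used in the original spreadsheet.
    """
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("at least one competitor needs a positive price")
    return {name: price / total for name, price in prices.items()}


# Hypothetical prices for a single event (illustrative values only).
example_prices = {
    "Athlete A": 60.0,
    "Athlete B": 30.0,
    "Athlete C": 10.0,
    "Athlete D": 0.0,  # never bet on, so treated as unable to win
}
example_probs = prices_to_probabilities(example_prices)
```

Under this assumption, an athlete with no bets gets probability 0, which matches the treatment of unbacked athletes described above.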

Spreadsheet of probabilities and distributions for gold medal counts

We then dumped all these probabilities into an Excel spreadsheet, looked up the home country of every athlete, and for each event created a distribution over which country will win using Invrea Scenarios. Check out the video to see how we did that using a CHOICE function. Finally, we kept track of the number of gold medals won by each country at the top of the spreadsheet. Because the winners are uncertain, every time we recalculate the spreadsheet (by pressing F9), we essentially draw a new winner for every single event.

Using Invrea Scenarios, we can automatically generate thousands of different scenarios in which countries win different numbers of gold medals, based on the prior probabilities we extracted. From these scenarios, we can build a histogram of the number of gold medals each country might win.
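To make the idea concrete outside of a spreadsheet, here is a rough Python sketch of the same process: draw one winner per event, tally golds per country, and repeat many times to build a histogram. It is an illustration of the general Monte Carlo approach, not of how Invrea Scenarios is implemented. The event names, country codes, and probabilities in `EVENTS` are made up.

```python
import random
from collections import Counter

# Hypothetical per-event win probabilities by country (not the real data).
EVENTS = {
    "Event 1": {"USA": 0.5, "AUS": 0.3, "CHN": 0.2},
    "Event 2": {"USA": 0.2, "AUS": 0.2, "CHN": 0.6},
    "Event 3": {"USA": 0.4, "AUS": 0.4, "CHN": 0.2},
}


def draw_scenario(events, rng):
    """Draw one winner per event and return gold medals per country."""
    golds = Counter()
    for probs in events.values():
        countries = list(probs)
        weights = [probs[c] for c in countries]
        winner = rng.choices(countries, weights=weights)[0]
        golds[winner] += 1
    return golds


def medal_histogram(events, country, n_scenarios=10_000, seed=0):
    """Count how often `country` wins 0, 1, 2, ... golds across scenarios."""
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(n_scenarios):
        hist[draw_scenario(events, rng)[country]] += 1
    return hist


hist = medal_histogram(EVENTS, "AUS")
# Average gold count for AUS across all simulated scenarios.
expected = sum(k * n for k, n in hist.items()) / sum(hist.values())
```

Each call to `draw_scenario` plays the role of one spreadsheet recalculation, and `hist` is the histogram of gold medal counts for a single country.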

Example Scenario

(left) without Australia's gold medal, (right) with Australia's gold medal

We can get even more accurate with our predictions. Once results come out and we find out which countries won which events, we can incorporate that data into our spreadsheet. By doing that, Invrea Scenarios is able to learn which scenarios are more likely than others. Having learned that, the distribution of the possible number of medals per country changes to reflect the fact that we now know the results of a few events. Therefore, as the Games continue, if you follow along and add each result to the spreadsheet using the Invrea plugin, you get more and more accurate estimates of gold medal counts.
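One simple way to picture this update, which is not necessarily what Invrea Scenarios does internally: if every event is modeled independently, knowing a result amounts to replacing that event's distribution with a certain winner and recomputing the expected counts. The sketch below assumes that independence. The data in `EVENTS` and the helper names `expected_golds` and `observe` are hypothetical.

```python
# Hypothetical per-event win probabilities by country (not the real data).
EVENTS = {
    "Event 1": {"USA": 0.5, "AUS": 0.3, "CHN": 0.2},
    "Event 2": {"USA": 0.2, "AUS": 0.2, "CHN": 0.6},
}


def expected_golds(events, country):
    """Expected gold count: the sum of the country's win probabilities."""
    return sum(probs.get(country, 0.0) for probs in events.values())


def observe(events, event_name, winner):
    """Return a copy of `events` in which one event has a known winner."""
    if event_name not in events:
        raise KeyError(event_name)
    updated = dict(events)
    updated[event_name] = {winner: 1.0}
    return updated


before_aus = expected_golds(EVENTS, "AUS")   # 0.3 + 0.2
after = observe(EVENTS, "Event 1", "AUS")
after_aus = expected_golds(after, "AUS")     # 1.0 + 0.2
after_usa = expected_golds(after, "USA")     # 0.0 + 0.2
```

In this toy version, the winner's expected count rises and the counts of the countries that lost that event fall, while the other events are unaffected.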

Let's take a look at the expected medal counts for the top 11 countries before and after Australia won the Men's 400m Freestyle. We can see that after adding the result, Australia's expected number of medals increased notably, while every other country's expected count fluctuated to a lesser degree. Again, as more results are added to the spreadsheet, these counts will continue to change in response. That way, these predictions will improve over time.

| Country | Expected Number of Medals (Before) | Expected Number of Medals (After) |
|---|---|---|
| United States | 31.041 | 30.9 |
| China | 25.735 | 25.067 |
| Great Britain | 17.058 | 16.974 |
| Russian Federation | 12.144 | 12.277 |
| Japan | 10.957 | 10.93 |
| Germany | 12.592 | 12.556 |
| Australia | 14.834 | 15.907 |
| Brazil | 5.367 | 5.258 |
| France | 11.322 | 11.499 |
| Italy | 7.253 | 7.219 |
| South Korea | 8.506 | 8.554 |

We can also explore the distributions and expected values for every single cell (Olympic event) in the spreadsheet, giving us probabilities for which countries will win. Below, we see the predictions for Women's Tennis and Women's 50m Freestyle Swimming. Given that the current number 1 ranked female tennis player is from the United States, the probabilities appear believable.

Example Event Predictions

(left) Women's 50m Freestyle, (right) Women's Tennis