Let's first start with an abstract example and then we can apply it to Excel. Generally speaking, the problem Invrea Scenarios helps solve is the following: We have a variable that is unknown. Let's call it x. We also have a variable that we do know. Let's call it y. The goal is to use y to help us learn as much as possible about x.

How do we do that? Well we have to provide two important pieces of information:
  • We need to provide some guess about what values x can possibly take.
  • We also need to provide some information about a relationship between y and x. More specifically, assuming we know x, what values could y possibly take?

Let's actually define these two pieces of information a little more rigorously. The first thing we need, or the guess about what x could be, is called a prior distribution for x, p(x). This prior distribution helps to model what we believe x could be. For example, if we really don't know anything about x except that it could be anywhere from 0 to 100, we would consider a uniform distribution as our prior. In other words, every single value between 0 and 100 is equally likely so every value has the same probability of being chosen. On the other hand, imagine we believe that x should be 100 but we aren't really sure and it's possible that it's 90 or 80 or 101 or 112. The point is we are pretty sure it's close to 100. Here, we might consider a normal (or gaussian) distribution as our prior for x where the peak of it is centered at 100. That way 100 has the highest probability of being chosen but as you move away from 100, the probability of that value drops. As a side note, we can see now that x is a random value, since it can take a lot of different values dependent on what distribution you pick.

(left) Uniform; (right) Normal / Gaussian

The second thing we need, or the values y could take knowing x, is known as the conditional distribution of y given x: p(y|f(x)). Usually, you just write p(y|x). This thing describes the distribution of y given we know the entire distribution of values that x could ever take.

As you can imagine, this assumes that y is dependent on x. If it wasn't, then the fact we know x doesn't matter! But assuming that there is a some relationship between y and x, knowing x could change the values that y could take. The point of the conditional distribution is to describe uncertainty in what y could be even though we know the distribution of x. In a perfect world, what would happen instead is that if we knew that y is precisely f(x), then we calculate the value of unknown variable x by just computing the inverse function of f(x). For example, if we knew that y = f(x) = x + 7 and that our y = 10, then x is derived from the equation 10 = x + 7, thus x = 3. However, in the real world, we never can be that sure of what f(x) really is.

There are always imprecisions to the measurements tools used and human error and a bunch of factors that prevent y from being a deterministic (fixed) value given x. Instead, even if we know the distribution of x, then y itself has a distribution because we aren't really sure. This conditional distribution can also be normal, uniform, poisson, etc. In our applications, we generally use a normal distribution since we can be fairly sure what y is (since as defined above, y is a known variable), just not 100%.

What we interested in knowing is actually p(x|y). This is called the posterior distribution of the unknown variable x given our known variable y. Luckily, there's a pretty famous formula to help us calculate this: it's called Bayes Rule.

FYI, the bottom thing, p(y) is called the marginal distribution. Usually, we don't care about the marginal distribution. We are only interested in the numerator. In other ways we just want p(x|y) to be proportional to p(x)p(y|x).

So what is so special about the posterior distribution? Well, it tells us more information about the unknown variable x taking into account all the information we do know, which is y. This distribution provides an educated and updated guess about what values x could be. In other words, think of this process as similar to the scientific method. You start out with a hypothesis about something, then you conduct experiments and get data, which in turn modifies your hypothesis into something more accurate. Similarly here, we start with a guess for the distribution of x, we see data, y, and we use it to update what we think the distribution of x really is.

Okay, let's look at an example in a spreadsheet. Imagine you have one cell describing your prior for a unknown variable x. The cell contains the formula GAUSSIAN(1, 5), describing a normal distribution centered at 1 with a standard deviation of 5.

Excel Spreadsheet

Now let's use Invrea Scenarios to generate 100,000 scenarios and see what this prior distribution looks like. We should expect a peak around 1.

Invrea Scenarios Excel Ribbon

Indeed, it looks as expected, and we have an expected value for x at 0.997. Now let's add our data, y. This is a known variable, and let's set it to something far away from our prior, like 9. What this is suggesting is that our prior was a bad guess. Let's see if Bayes' rule can help us fix it. In Invrea Scenarios, we use the following formula to define y: ACTUAL(9, "gaussian", x, std), where 9 is the known data point, and the rest of the parameters define the conditional distribution for y given x. Let's rerun 100,000 scenarios and see what the posterior distribution looks like after considering this new data point.

Excel Spreadsheet

Invrea Scenarios Excel Ribbon

As we expected the distribution shifted over closer to 9. The expected value for x is now 8.418. Notice that the new expected value is not exactly 9, which means that although the new data point shifted our beliefs about x a lot, our prior assumptions still have affect. That's important to remember! When using Invrea Scenarios, you should try to define as accurate priors as possible!

If you are well-versed in statistics, you know that for simple examples like this, you can calculate an equation for what the posterior is going to be using probability density functions for Gaussian distributions. But what happens with more complex functions? Like calculating internal rate of return or predicting competition results or insurance rates or market prices? With more complex problems, you run into a wall depending on pure mathematics or simpler simulation techniques. Invrea Scenarios is built on our own custom engine that runs state-of-the-art probabilistic programming techniques in the backend in order to generate these statistically-sound diagrams. You can use any Excel functions you want, even your own user-defined functions, and our statistical inference engine will still be compatible. To the best of our knowledge, our engine is one of the only commericially available products, if not the only, that allows you to provide ACTUALS in Excel that condition prior assumptions. With this ability, you can do some really powerful things. Check out our other blogs for examples. If you're curious, here's how to calculate the posterior distribution for our simple gaussian toy example by hand:

$$x \sim Normal(1, \sqrt{5})$$ $$p(x) = \frac{1}{\sqrt{10 \cdot \pi}} \ e^{ -\frac{(x - 1)^2}{10} }$$ $$\sigma = \sqrt{2}$$ $$y_i | x, \sigma \sim Normal(x, \sigma)$$ $$p(y_i | x, \sigma) = \frac{1}{\sqrt{4 \cdot \pi}} \ e^{ -\frac{(y_i - x)^2}{4} }$$ $$y_1 = 9.0$$ $$p(x | y_1, \sigma) \propto p(x) \cdot p(y_1 | x, \sigma) = \\ = \frac{1}{\sqrt{10 \cdot \pi}} \cdot \frac{1}{\sqrt{4 \cdot \pi}} \ e^{ -\frac{(x - 1)^2}{10} } \cdot e^{ -\frac{(9.0 - x)^2}{4} } \propto \\ \propto e^{ - \left( \frac{(x - 1)^2}{10} + \frac{(9.0 - x)^2}{4} \right) }$$

Want more tutorials? We prepared a second one and you can find the video below and the spreadsheet here. This one is about seven different scientists of different skill levels all trying to figure out the value of a compound X. But the catch is that because they all have different skill levels, some of their measurements are really bad, but some are really good. Check it out to see how you can use Invrea Scenarios to help quantify the uncertainty among the scientists!

Seven Scientists Demo

Invrea Scenarios helps a lot with making these kind of predictions but it doesn't stop there. The plugin can model uncertainty and make predictions given our assumptions and new data for business decisions, insurance claims, consulting cases, etc. If you can model your decision as relationships between cells in an Excel spreadsheet, then it's quite likely that Scenarios can help. The team at Invrea is dedicated to opening this kind of machine learning to every industry possible. If you would like more information, a more detailed demo, or some help setting up a worksheet of your own, we'd love to lend a hand. You can find us at this email.

The alpha version of Invrea Scenarios is free. You can request a download link here.