Let's start with an abstract example and then apply it to Excel. Generally speaking, the problem Invrea Scenarios helps solve is the following: we have a variable that is unknown. Let's call it x. We also have a variable that we do know. Let's call it y. The goal is to use y to learn as much as possible about x.
How do we do that? Well, we have to provide two important pieces of information:
- We need to provide some guess about what values x can possibly take.
- We also need to provide some information about the relationship between y and x. More specifically, assuming we know x, what values could y possibly take?
Let's define these two pieces of information a little more rigorously. The first thing we need, the guess about what x could be, is called a prior distribution for x, written p(x). This prior distribution models what we believe x could be. For example, if we really don't know anything about x except that it could be anywhere from 0 to 100, we would use a uniform distribution as our prior. In other words, every single value between 0 and 100 is equally likely, so every value has the same probability of being chosen. On the other hand, imagine we believe that x should be 100 but we aren't really sure: it could be 90, or 80, or 101, or 112. The point is we are pretty sure it's close to 100. Here, we might choose a normal (or Gaussian) distribution as our prior for x, with its peak centered at 100. That way 100 has the highest probability of being chosen, but as you move away from 100, the probability of that value drops. As a side note, we can see now that x is a random variable, since the values it can take depend on which distribution you pick.
Figure: (left) a uniform prior; (right) a normal (Gaussian) prior.
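To make these two priors concrete, here is a minimal sketch in Python with NumPy (the language choice and the standard deviation of 10 for the normal prior are illustrative assumptions, not part of Scenarios itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform prior: every value between 0 and 100 is equally likely.
x_uniform = rng.uniform(low=0, high=100, size=10_000)

# Normal (Gaussian) prior centered at 100: values near 100 are the
# most likely, and the probability drops off as you move away from it.
# The standard deviation of 10 is an illustrative assumption.
x_normal = rng.normal(loc=100, scale=10, size=10_000)

print(x_uniform.mean())  # roughly 50
print(x_normal.mean())   # roughly 100
```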


The second thing we need, the values y could take knowing x, is known as the conditional distribution of y given x: p(y|x). (You'll sometimes see it written as p(y|f(x)) when y depends on x through some function f.) This describes the distribution of values y could take once we know the particular value of x.
As you can imagine, this assumes that y is dependent on x. If it weren't, then knowing x wouldn't tell us anything! But assuming there is some relationship between y and x, knowing x could change the values that y could take. The point of the conditional distribution is to describe the uncertainty in what y could be even when we know x. In a perfect world, if we knew that y is precisely f(x), we could calculate the value of the unknown variable x by just computing the inverse of f. For example, if we knew that y = f(x) = x + 7 and that our y = 10, then x is derived from the equation 10 = x + 7, thus x = 3. However, in the real world, we can never be that sure of what f(x) really is.
There are always imprecisions in the measurement tools, human error, and a bunch of other factors that prevent y from being a deterministic (fixed) value given x. Instead, even for a known value of x, y itself has a distribution, because we aren't completely sure what it will be. This conditional distribution can also be normal, uniform, Poisson, etc. In our applications, we generally use a normal distribution, since we can be fairly sure what y is (as defined above, y is a known variable), just not 100%.
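Here is a sketch of that contrast, again in Python with NumPy (the noise level sigma = 2 is an illustrative assumption, not something from the text above): in a perfect world we invert f directly, while in the real world we can only describe p(y|x) as a distribution around f(x):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return x + 7

# Perfect world: y is exactly f(x), so we can invert it directly.
y_observed = 10
x_exact = y_observed - 7  # x = 3

# Real world: y is f(x) plus measurement noise, so even for a fixed x,
# y has a distribution: p(y | x) = Normal(f(x), sigma).
sigma = 2.0  # assumed noise level, for illustration only
x = 3.0
y_samples = rng.normal(loc=f(x), scale=sigma, size=10_000)
print(y_samples.mean())  # close to 10, but individual draws vary
```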
What we are actually interested in is p(x|y). This is called the posterior distribution of the unknown variable x given our known variable y. Luckily, there's a pretty famous formula to help us calculate it: Bayes' rule, which says that p(x|y) = p(y|x) p(x) / p(y).
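To see Bayes' rule in action, here is a minimal grid-approximation sketch in Python with NumPy (an illustration only, not how Invrea Scenarios computes posteriors internally; the uniform prior, the noise level sigma, and the grid are all assumptions carried over from the examples above). We evaluate prior times likelihood at each candidate x and normalise:

```python
import numpy as np

# Grid of candidate values for the unknown x.
xs = np.linspace(0, 100, 10_001)

# Prior p(x): uniform on [0, 100], so every grid point gets equal weight.
prior = np.ones_like(xs)

# Likelihood p(y | x): y = x + 7 plus Gaussian noise (sigma is assumed).
y_observed = 10
sigma = 2.0
likelihood = np.exp(-0.5 * ((y_observed - (xs + 7)) / sigma) ** 2)

# Bayes' rule: the posterior is proportional to prior * likelihood.
# The normalising constant p(y) cancels when we divide by the sum.
unnormalised = prior * likelihood
posterior = unnormalised / unnormalised.sum()

# The posterior concentrates near x = 3, matching the exact inverse above.
print(xs[np.argmax(posterior)])  # approximately 3.0
```

Notice that we never had to compute p(y) explicitly: because the posterior must sum to 1, dividing by the total takes care of it, which is exactly why Bayes' rule is often quoted as "posterior is proportional to prior times likelihood".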
