Imagine a factory where the widget-maker makes a stream of widgets, and the widget-tester removes the faulty ones. You don’t know what tolerance the widget tester is set to, and wish to infer it.
We can write this query for n widgets in two ways. The first samples a widget from a distribution of widgets, conditions on the widget passing the test, and then handles the remaining n-1 widgets recursively. The second samples all n widgets at the same time from a distribution of widgets and conditions on all of the widgets passing the test.
Way 1:
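A minimal WebPPL-style sketch of the recursive version. The discrete widget-size distribution, the candidate tolerance settings, and the choice of three observed widgets are illustrative assumptions.

```javascript
// Way 1 (sketch): sample one widget, condition on it passing, recurse on the rest.
var widgetPrior = function() {
  return categorical({ps: [.05, .1, .2, .3, .2, .1, .05],
                      vs: [.2, .3, .4, .5, .6, .7, .8]});
};

var oneGoodWidget = function(tolerance) {
  var widget = widgetPrior();
  condition(widget > tolerance);        // this widget passed the test
  return widget;
};

var goodWidgets = function(n, tolerance) {
  return (n == 0) ? [] :
    [oneGoodWidget(tolerance)].concat(goodWidgets(n - 1, tolerance));  // remaining n-1 widgets
};

var tolerancePosterior = Infer({method: 'enumerate'}, function() {
  var tolerance = uniformDraw([.3, .4, .5, .6]);   // the unknown tester setting
  var observed = goodWidgets(3, tolerance);        // three widgets came out of the tester
  return tolerance;
});
viz(tolerancePosterior);
```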
Way 2:
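The same query written the second way, reusing the hypothetical widgetPrior from the sketch above.

```javascript
// Way 2 (sketch): sample all n widgets up front, then condition on all of them passing.
var tolerancePosterior = Infer({method: 'enumerate'}, function() {
  var tolerance = uniformDraw([.3, .4, .5, .6]);
  var widgets = repeat(3, widgetPrior);                            // all n widgets at once
  condition(all(function(w) { return w > tolerance; }, widgets));  // every one of them passed
  return tolerance;
});
viz(tolerancePosterior);
```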
Rather than modeling the details inside the widget tester, we abstract it as a machine that passes good widgets. We don't know the exact behavior of the widget-testing machine (its tolerance), so we represent the test at this abstract level and condition on its outcome to infer what we want to know.
An agent tends to choose actions that she expects to lead to outcomes that satisfy her goals.
Let's say Sally wants to buy a cookie from a deterministic vending machine. This is how it works:
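A sketch of the deterministic machine (the state argument is a dummy; button a gives a bagel and button b gives a cookie, as in these notes).

```javascript
// Deterministic machine: 'a' always gives a bagel, 'b' always gives a cookie.
var vendingMachine = function(state, action) {
  return (action == 'a' ? 'bagel' :
          action == 'b' ? 'cookie' :
          'nothing');
};
```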
Here is how actions are chosen:
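A sketch of the action-choice model, reusing the vendingMachine above. The uniform prior over the two buttons is an assumption.

```javascript
// The agent samples a candidate action, simulates its outcome, and keeps only
// actions whose outcome satisfies her goal.
var actionPrior = function() { return uniformDraw(['a', 'b']); };

var chooseAction = function(goalSatisfied, transition, state) {
  return Infer({method: 'enumerate'}, function() {
    var action = actionPrior();
    condition(goalSatisfied(transition(state, action)));  // outcome must satisfy the goal
    return action;
  });
};

// Sally wants a cookie: which button does she press?
var goalSatisfied = function(outcome) { return outcome == 'cookie'; };
viz(chooseAction(goalSatisfied, vendingMachine, 'state'));
```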
Here, transition(state, action) is the outcome of taking a given action in a given state.
She will clearly always press b to get the cookie. Now let's make the vending machine more realistic, so that it no longer behaves deterministically:
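A sketch of the noisy machine; the .9/.1 mixing weights are illustrative.

```javascript
// Noisy machine: each button usually, but not always, gives its "intended" food.
var vendingMachine = function(state, action) {
  return (action == 'a' ? categorical({vs: ['bagel', 'cookie'], ps: [.9, .1]}) :
          action == 'b' ? categorical({vs: ['bagel', 'cookie'], ps: [.1, .9]}) :
          'nothing');
};
```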
We see Sally still presses b most of the time (but not every time). Technically, this method of making choices is not optimal, but rather it is soft-max optimal (also known as following the “Boltzmann policy”).
Here is how we would represent the whole thing:
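Putting the pieces together in one self-contained sketch, under the same assumptions as above:

```javascript
// Noisy machine plus goal-directed action choice.
var actionPrior = function() { return uniformDraw(['a', 'b']); };

var vendingMachine = function(state, action) {
  return (action == 'a' ? categorical({vs: ['bagel', 'cookie'], ps: [.9, .1]}) :
          action == 'b' ? categorical({vs: ['bagel', 'cookie'], ps: [.1, .9]}) :
          'nothing');
};

var chooseAction = function(goalSatisfied, transition, state) {
  return Infer({method: 'enumerate'}, function() {
    var action = actionPrior();
    condition(goalSatisfied(transition(state, action)));
    return action;
  });
};

// Sally wants a cookie: which button is she likely to press?
var goalSatisfied = function(outcome) { return outcome == 'cookie'; };
viz(chooseAction(goalSatisfied, vendingMachine, 'state'));
```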
Let's say we don't know what Sally wants, but we observe her pressing b. How can we infer what she wants?
Here we don't know what the goal is, so goalSatisfied becomes probabilistic instead of deterministic: we randomly sample a goal and then, for that goal, infer the best action.
We draw an inference about the goal by observing that the chosen action was b:
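A sketch of the goal inference, reusing chooseAction and the noisy vendingMachine from the sketch above; the uniform prior over the two foods is an assumption.

```javascript
// Infer Sally's goal from her observed button press. Note the nested inference:
// chooseAction runs its own Infer inside this outer Infer.
var goalPosterior = Infer({method: 'enumerate'}, function() {
  var goal = uniformDraw(['bagel', 'cookie']);        // the unknown goal
  var goalSatisfied = function(outcome) { return outcome == goal; };
  var actionDist = chooseAction(goalSatisfied, vendingMachine, 'state');
  condition(sample(actionDist) == 'b');               // we saw her press 'b'
  return goal;
});
viz(goalPosterior);
```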
Note how we are doing inference inside an inference here.
Now let's say that button b gives either of the two options with equal probability:
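The same inference with button b made uninformative on its own (a 50/50 machine for b, while button a keeps the .9/.1 weights); chooseAction is reused from above.

```javascript
// Button 'b' now gives either food with equal probability.
var vendingMachine = function(state, action) {
  return (action == 'a' ? categorical({vs: ['bagel', 'cookie'], ps: [.9, .1]}) :
          action == 'b' ? categorical({vs: ['bagel', 'cookie'], ps: [.5, .5]}) :
          'nothing');
};

var goalPosterior = Infer({method: 'enumerate'}, function() {
  var goal = uniformDraw(['bagel', 'cookie']);
  var goalSatisfied = function(outcome) { return outcome == goal; };
  condition(sample(chooseAction(goalSatisfied, vendingMachine, 'state')) == 'b');
  return goal;
});
viz(goalPosterior);
```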
Despite the fact that button b is equally likely to result in either bagel or cookie, we have inferred that Sally probably wants a cookie. This is a result of the inference implicitly taking into account the counterfactual alternatives: if Sally had wanted a bagel, she would have likely pressed button a.
Let's say we observe Sally pressing b several times. We don't know what she wants, but we do know she has some stable preference. In this case, this is how we define the goal:
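One way to sketch a persistent preference: a single unknown number that sets how likely Sally is to want a cookie on any given visit (the uniform prior on it is an assumption). This fragment gets used inside the query below.

```javascript
// An unknown, stable preference generates a (possibly different) goal on each visit.
var preference = uniform(0, 1);
var goalPrior = function() { return flip(preference) ? 'cookie' : 'bagel'; };
```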
… and this is how we condition:
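A sketch of the conditioning step, reusing chooseAction and the vendingMachine from the previous sketches; three observations and rejection sampling are arbitrary choices here.

```javascript
// Condition on three separate visits on which Sally pressed 'b'.
var goalPosterior = Infer({method: 'rejection', samples: 1000}, function() {
  var preference = uniform(0, 1);
  var goalPrior = function() { return flip(preference) ? 'cookie' : 'bagel'; };
  var makeGoal = function(food) {
    return function(outcome) { return outcome == food; };
  };
  condition(sample(chooseAction(makeGoal(goalPrior()), vendingMachine, 'state')) == 'b');
  condition(sample(chooseAction(makeGoal(goalPrior()), vendingMachine, 'state')) == 'b');
  condition(sample(chooseAction(makeGoal(goalPrior()), vendingMachine, 'state')) == 'b');
  return goalPrior();   // what is she likely to want on her next visit?
});
viz(goalPosterior);
```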
When we defined the vending machine like so:
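(This repeats the hypothetical machine from the sketches above, for reference.)

```javascript
var vendingMachine = function(state, action) {
  return (action == 'a' ? categorical({vs: ['bagel', 'cookie'], ps: [.9, .1]}) :
          action == 'b' ? categorical({vs: ['bagel', 'cookie'], ps: [.5, .5]}) :
          'nothing');
};
```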
we made the assumption that we knew how the vending machine worked. What if we don't know how it works? Then we can replace ps: [.9, .1] and ps: [.5, .5] with a distribution:
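One way to sketch this: make each button's cookie probability a parameter, to be drawn from a prior inside the query instead of fixed in advance.

```javascript
// The machine's behavior is now a function of unknown probabilities.
var makeVendingMachine = function(aCookieProb, bCookieProb) {
  return function(state, action) {
    return (action == 'a' ? (flip(aCookieProb) ? 'cookie' : 'bagel') :
            action == 'b' ? (flip(bCookieProb) ? 'cookie' : 'bagel') :
            'nothing');
  };
};
```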
Now, if we assume that Sally knows how the machine works, she does not need to Infer it herself. We can capture this by placing uncertainty on the vending machine inside the overall query but “outside” of Sally's inference:
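A self-contained sketch of the whole query. Uniform priors on the cookie probabilities and rejection sampling are arbitrary choices; the key point is where the uncertainty sits: the machine parameters are sampled in the outer Infer, while Sally's own chooseAction treats the sampled machine as known.

```javascript
var actionPrior = function() { return uniformDraw(['a', 'b']); };

var chooseAction = function(goalSatisfied, transition, state) {
  return Infer({method: 'enumerate'}, function() {
    var action = actionPrior();
    condition(goalSatisfied(transition(state, action)));
    return action;
  });
};

var makeVendingMachine = function(aCookieProb, bCookieProb) {
  return function(state, action) {
    return (action == 'a' ? (flip(aCookieProb) ? 'cookie' : 'bagel') :
            action == 'b' ? (flip(bCookieProb) ? 'cookie' : 'bagel') :
            'nothing');
  };
};

var machinePosterior = Infer({method: 'rejection', samples: 2000}, function() {
  // We don't know how the machine works...
  var aCookieProb = uniform(0, 1);
  var bCookieProb = uniform(0, 1);
  var vendingMachine = makeVendingMachine(aCookieProb, bCookieProb);
  var goal = uniformDraw(['bagel', 'cookie']);
  var goalSatisfied = function(outcome) { return outcome == goal; };
  // ...but Sally does: her choice is computed using the sampled machine.
  condition(goal == 'cookie' &&
            sample(chooseAction(goalSatisfied, vendingMachine, 'state')) == 'b');
  return bCookieProb;   // the probability that pressing 'b' gives a cookie
});
viz(machinePosterior);
```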
Observe the conditioning line in the sketch above. We are basically asking: assuming Sally knows how the machine works, wants a cookie, and is seen pressing b, what is the probability that pressing b gives a cookie?
Now imagine a vending machine that has only one button, but it can be pressed many times. We don’t know what the machine will do in response to a given button sequence. We do know that pressing more buttons is less a priori likely.
The vending machine is now defined as:
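A sketch of the new setup. The button sequences, their prior weights, and the per-sequence cookie probabilities are assumptions; chooseAction is the same model as before, just over button sequences.

```javascript
// The machine maps a button sequence ('a', 'aa', ...) to a food. Its behavior is
// unknown, so each sequence gets its own cookie probability, supplied from outside.
var makeVendingMachine = function(cookieProbs) {
  return function(state, action) {
    return flip(cookieProbs[action]) ? 'cookie' : 'bagel';
  };
};

// Pressing more buttons is a priori less likely.
var actionPrior = function() {
  return categorical({vs: ['a', 'aa'], ps: [.7, .3]});
};

var chooseAction = function(goalSatisfied, transition, state) {
  return Infer({method: 'enumerate'}, function() {
    var action = actionPrior();
    condition(goalSatisfied(transition(state, action)));
    return action;
  });
};
```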
We first condition on Sally pressing a to get a cookie:
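A sketch of the first query, reusing the definitions above; uniform priors on the unknown cookie probabilities are an assumption.

```javascript
// What do we learn about the machine from seeing Sally, who wants a cookie, press 'a'?
var singlePress = Infer({method: 'rejection', samples: 2000}, function() {
  var cookieProbs = {a: uniform(0, 1), aa: uniform(0, 1)};   // the unknown machine
  var vendingMachine = makeVendingMachine(cookieProbs);
  var goalSatisfied = function(outcome) { return outcome == 'cookie'; };
  condition(sample(chooseAction(goalSatisfied, vendingMachine, 'state')) == 'a');
  return cookieProbs['a'];    // how likely is a single press to give a cookie?
});
viz(singlePress);
```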
Then we compare it with Sally pressing aa to get a cookie:
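The same query with the observation changed to a double press.

```javascript
var doublePress = Infer({method: 'rejection', samples: 2000}, function() {
  var cookieProbs = {a: uniform(0, 1), aa: uniform(0, 1)};
  var vendingMachine = makeVendingMachine(cookieProbs);
  var goalSatisfied = function(outcome) { return outcome == 'cookie'; };
  condition(sample(chooseAction(goalSatisfied, vendingMachine, 'state')) == 'aa');
  return cookieProbs['aa'];   // how likely is a double press to give a cookie?
});
viz(doublePress);
```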
Why can we draw much stronger inferences about the machine when Sally chooses to press the button twice? When Sally does press the button twice, she could have done the “easier” (or rather, a priori more likely) action of pressing the button just once. Since she doesn’t, a single press must have been unlikely to result in a cookie. This is an example of the principle of efficiency—all other things being equal, an agent will take the actions that require least effort (and hence, when an agent expends more effort all other things must not be equal).
In these examples we have seen two important assumptions combining to allow us to infer something about the world from the indirect evidence of an agent's actions. The first assumption is the principle of rational action; the second is an assumption of knowledgeability: we assumed that Sally knows how the machine works, though we don't. Thus inference about inference can be a powerful way to learn what others already know, by observing their actions.
Suppose we condition on two observations: that Sally presses the button twice, and that this results in a cookie. Then, assuming that she knows how the machine works, we jointly infer that she wanted a cookie, that pressing the button twice is likely to give a cookie, and that pressing the button once is unlikely to give a cookie.
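A sketch of the joint query, reusing makeVendingMachine, actionPrior, and chooseAction from above; the observed dispense is modeled as one more draw from the machine.

```javascript
// Joint inference from two observations: Sally pressed 'aa', and the machine
// then gave a cookie. Her goal and both cookie probabilities are inferred together.
var joint = Infer({method: 'rejection', samples: 2000}, function() {
  var cookieProbs = {a: uniform(0, 1), aa: uniform(0, 1)};
  var vendingMachine = makeVendingMachine(cookieProbs);
  var goal = uniformDraw(['bagel', 'cookie']);
  var goalSatisfied = function(outcome) { return outcome == goal; };
  condition(sample(chooseAction(goalSatisfied, vendingMachine, 'state')) == 'aa'); // she pressed 'aa'
  condition(vendingMachine('state', 'aa') == 'cookie');                            // ...and out came a cookie
  return {goal: goal, aGivesCookie: cookieProbs['a'], aaGivesCookie: cookieProbs['aa']};
});
viz.marginals(joint);
```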
The resulting marginals show the probability of a cookie given that a was pushed, and the probability of a cookie given that aa was pushed.
Notice the U-shaped distribution for the effect of pressing the button just once.
How do we explain this?
Note that the probability that she wanted a cookie is about 0.65, so there is about a 0.35 chance that she wanted a bagel instead. In the bagel branch, she chose aa even though a was the a priori easier action, so pressing aa likely gives a bagel and pressing a likely gives a cookie; this pushes the single-press cookie probability toward 1. In the cookie branch, the fact that she bothered to press twice means a single press was unlikely to give a cookie, pushing it toward 0. The mixture of these two branches produces the U shape.
This very complex (and hard to describe!) inference comes naturally from joint inference of goals and knowledge.
Key idea
We have two agents communicating with each other. Each tries to infer what the other is thinking, and they keep updating their beliefs about each other. Taken literally this is an infinite regress, so we cut the recursion off at a fixed depth.
Say we have two weighted dice, with side probabilities as in the sketch below.
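The specific weights here are illustrative assumptions; what matters for the example is that red can never come up on die A and that green is somewhat more likely on die B.

```javascript
// Two weighted three-sided dice (sides: red, green, blue).
var dieToProbs = function(die) {
  return (die == 'A' ? [0, .2, .8] :    // A never shows red
          die == 'B' ? [.1, .3, .6] :   // green is a bit more likely on B
          'unknown die');
};
```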
On each round the “teacher” pulls a die from a bag of weighted dice, and has to communicate to the “learner” which die it is by showing them faces of the die. Both players are familiar with the dice and their weights.
The teacher has a prior over sides; the learner has a prior over dice.
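A sketch of the two priors (uniform, as an assumption):

```javascript
// The teacher reasons over which side to show; the learner reasons over which die it is.
var sidePrior = function() { return uniformDraw(['red', 'green', 'blue']); };
var diePrior = function() { return uniformDraw(['A', 'B']); };
```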
A simple roll of a die:
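(Using the hypothetical dieToProbs above.)

```javascript
// Rolling a die samples one side according to that die's weights.
var roll = function(die) {
  return categorical({ps: dieToProbs(die), vs: ['red', 'green', 'blue']});
};
```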
This is how the teacher and the learner communicate and reason about each other:
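A sketch of the mutually recursive reasoning, reusing diePrior, sidePrior, and roll from above. The recursion bottoms out at depth 0, where the learner treats the shown side as a plain random roll.

```javascript
// The learner assumes the shown side was chosen by a teacher (except at depth 0);
// the teacher chooses which side to show by simulating that learner.
var learner = function(side, depth) {
  return Infer({method: 'enumerate'}, function() {
    var die = diePrior();
    condition(depth == 0 ?
              side == roll(die) :
              side == sample(teacher(die, depth)));
    return die;
  });
};

var teacher = function(die, depth) {
  return Infer({method: 'enumerate'}, function() {
    var side = sidePrior();
    condition(die == sample(learner(side, depth - 1)));
    return side;
  });
};
```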
Recall that the two dice, A and B, each have three sides (red, green, blue), weighted as in the sketch above: red never comes up on die A, and green is somewhat more likely on die B.
Now let's say the learner is shown a green side:
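Running the sketch (depth 3 is an arbitrary cutoff; any depth of at least 1 shows the effect):

```javascript
// Which die does the learner think the teacher is holding?
viz(learner('green', 3));
```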
A naive learner would guess B, since it has a higher chance of showing green. But the recursive learner infers A. Why? Because “if the teacher had meant to communicate B, they would have shown the red side, because that can never come from A.”