We learn generalized concepts naturally:
How do we build models that can learn these abstract concepts?
Each bag can learn its own categorical distribution. It explains previously observed data well but fails to generalize.
Let's say this is what we observe:
Human observation: All bags have blue as the predominant color. This is an abstract (generalized) notion of the distribution of colors in bags. The approach below does not work:
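As a minimal sketch of this per-bag approach, here is a Dirichlet-categorical version where every bag gets its own independent prior. The colors and counts are made up for illustration, not the data from the original figure:

```python
import numpy as np

colors = ["blue", "red", "green"]

# Illustrative counts (made up): plenty of draws from bags 1-2,
# a single draw from bag 3, and nothing at all from bag N.
observed = {
    "bag1": {"blue": 9, "red": 2, "green": 1},
    "bag2": {"blue": 10, "red": 1, "green": 2},
    "bag3": {"blue": 1, "red": 0, "green": 0},
    "bagN": {"blue": 0, "red": 0, "green": 0},
}

prior = np.ones(len(colors))  # each bag gets its own symmetric Dirichlet(1) prior

for bag, counts in observed.items():
    n = np.array([counts[c] for c in colors], dtype=float)
    posterior_mean = (prior + n) / (prior + n).sum()  # conjugate Dirichlet update
    print(bag, dict(zip(colors, posterior_mean.round(2).tolist())))

# bag1 and bag2 come out "mostly blue", but bag3 barely moves from the prior
# and bagN stays uniform: nothing learned about the other bags transfers to them.
```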
As you can see, it predicts the distributions of bags 3 and N poorly.
But if we try to learn a shared prototype, it works:
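A corresponding sketch of the shared-prototype idea, using the same illustrative counts: pool the evidence from all bags into one prototype and use it as the prior for any new bag. This collapses the hierarchy to its simplest form; the full model keeps per-bag distributions drawn around the shared prototype.

```python
import numpy as np

colors = ["blue", "red", "green"]

# Same illustrative counts as above.
observed = {
    "bag1": {"blue": 9, "red": 2, "green": 1},
    "bag2": {"blue": 10, "red": 1, "green": 2},
    "bag3": {"blue": 1, "red": 0, "green": 0},
}

# Learn a single shared prototype by pooling the counts from every bag.
pooled = np.zeros(len(colors))
for counts in observed.values():
    pooled += np.array([counts[c] for c in colors], dtype=float)

prototype = (1.0 + pooled) / (1.0 + pooled).sum()  # Dirichlet(1) prior + pooled counts
print("shared prototype:", dict(zip(colors, prototype.round(2).tolist())))

# The prototype serves as the prior for a brand-new bag, so even with zero
# observations from bag N we already expect it to be mostly blue.
print("prediction for bagN:", dict(zip(colors, prototype.round(2).tolist())))
```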
It predicts the distribution of an unseen bag N very well.
Suppose that we have a number of bags that all have identical prototypes: they mix red and blue in proportion 2:1. But the learner doesn’t know this. She observes only one ball from each of N bags. What can she learn about an individual bag versus the population as a whole as the number of bags changes?
If the data comes from different bags, the generalized prototype learns well but the specific one does not:
But if all samples come from a single bag, the specific prototype learns well but the generalized one does not:
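A back-of-the-envelope simulation of this asymmetry (not the full hierarchical inference; it only contrasts how much evidence accrues to the population-level estimate versus a single bag's estimate under the two sampling schemes):

```python
import numpy as np

rng = np.random.default_rng(0)
p_red = 2 / 3  # every bag mixes red and blue in proportion 2:1

def beta_mean(red, blue):
    """Posterior mean of P(red) under a uniform Beta(1, 1) prior."""
    return (1 + red) / (2 + red + blue)

for n in [1, 5, 20, 100]:
    # Case A: one ball from each of n different bags.
    balls = rng.binomial(1, p_red, size=n)             # 1 = red, 0 = blue
    pop_est = beta_mean(balls.sum(), n - balls.sum())  # prototype sees all n balls
    bag_est = beta_mean(balls[0], 1 - balls[0])        # any single bag has only 1 ball
    # Case B: all n balls come from one single bag.
    balls_b = rng.binomial(1, p_red, size=n)
    one_bag_est = beta_mean(balls_b.sum(), n - balls_b.sum())
    print(f"n={n:3d}  A: population {pop_est:.2f}, single bag {bag_est:.2f}   "
          f"B: that bag {one_bag_est:.2f} (population still based on one bag)")
```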
Suppose that we observe that `bag1` consists of all blue marbles, `bag2` of all green marbles, `bag3` all red, and so on. This doesn’t tell us to expect a particular color in future bags, but it does suggest that bags are very regular: all bags consist of marbles of only one color.
Suppose we have the following data:
Note that we only have one sample from `bag4` and no samples from bag N. The single sample from `bag4` is orange. This can be modeled by defining our prototype as:
After observing the data, `alpha` will end up being significantly smaller than 1. This means, roughly, that the learned prototype `phi` should exert less influence on the prototype estimate for a new bag than a single observation does.
Now let's say we have the following data:
The marble color is instead variable within bags to about the same degree that it varies in the population as a whole.
In this case, `alpha` is significantly greater than 1.
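Both effects can be illustrated by putting a grid posterior over `alpha` using the Dirichlet-multinomial marginal likelihood. The sketch below is a simplification of the full model: it fixes the prototype `phi` to the pooled color frequencies and uses made-up counts, but it shows `alpha` pulled below 1 for pure-colored bags and pushed above 1 for bags whose colors vary internally.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(counts, alpha, phi):
    """Log Dirichlet-multinomial likelihood of one bag's color counts
    (the multinomial coefficient is dropped; it does not depend on alpha)."""
    a = alpha * phi
    n = counts.sum()
    return (gammaln(a.sum()) - gammaln(a.sum() + n)
            + np.sum(gammaln(a + counts) - gammaln(a)))

def posterior_over_alpha(bags, alphas):
    """Grid posterior over alpha with a flat prior on the grid points;
    phi is fixed to the pooled color frequencies for simplicity."""
    phi = bags.sum(axis=0) / bags.sum()
    logp = np.array([sum(log_marginal(b, alpha, phi) for b in bags) for alpha in alphas])
    p = np.exp(logp - logp.max())
    return p / p.sum()

alphas = np.logspace(-2, 2, 200)  # candidate alpha values from 0.01 to 100

pure = np.array([[6, 0, 0], [0, 6, 0], [0, 0, 6]])   # each bag is a single color
mixed = np.array([[2, 2, 2], [3, 1, 2], [2, 3, 1]])  # colors vary within each bag

for name, bags in [("pure-color bags", pure), ("mixed bags", mixed)]:
    post = posterior_over_alpha(bags, alphas)
    print(f"{name}: posterior mean of alpha ~ {np.dot(post, alphas):.2f}")
# Pure-color bags pull the posterior for alpha below 1;
# mixed bags push it well above 1.
```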
It is the preference to generalize a novel label for some object to other objects of the same shape, rather than, say, to objects of the same color or texture.
Let's say each object category has four attributes: 'shape', 'color', 'texture', and 'size'. Let's say the following data is observed:
Let's define the range of values for each attribute:
One needs to allow for more values along each dimension than appear in the training data so as to be able to generalize to novel shapes, colors, etc.
Here each `attr` has its own `phi` and `alpha`:
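Here is a sketch of such a program. The `alpha` values are illustrative stand-ins for what full inference would learn from training data in which shape is consistent within each category while color, texture, and size are not; the `predictive` helper and the attribute encoding are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

attributes = ["shape", "color", "texture", "size"]
n_values = 11                            # more values per attribute than seen in training
phi = np.full(n_values, 1.0 / n_values)  # prototype over attribute values (uniform here)

# Illustrative alphas: in the full model these are inferred from the training categories
# (shape is consistent within a category, the other attributes are not).
alpha = {"shape": 0.1, "color": 5.0, "texture": 5.0, "size": 5.0}

# One observed exemplar of a novel category: value 0 on every attribute.
exemplar = {attr: 0 for attr in attributes}

def predictive(attr, observed_value):
    """Posterior predictive over values of one attribute for a new instance of
    the category, after one observation, under a Dirichlet(alpha * phi) prior."""
    counts = np.zeros(n_values)
    counts[observed_value] = 1
    a = alpha[attr] * phi + counts
    return a / a.sum()

for attr in attributes:
    p = predictive(attr, exemplar[attr])
    draws = rng.choice(n_values, size=10, p=p)
    print(f"{attr:8s}: P(match exemplar) = {p[exemplar[attr]]:.2f}, "
          f"fraction of 10 draws matching = {np.mean(draws == exemplar[attr]):.1f}")
# Shape (small alpha) almost always matches the single exemplar;
# color, texture and size (larger alpha) frequently do not.
```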
The program above gives us draws from some novel category for which we’ve seen a single instance. In the experiments with children, each child had to choose one of three test objects, each of which matched the example object from the category along a different dimension.
In this study, the authors found that:
the extent to which people generalize depends on their beliefs about the homogeneity of the group that the object falls into, with respect to the property they are being asked to generalize about.
Let's say that on a new island you encounter one male member of a tribe T.
Obesity: If he is obese, how likely are other male members of tribe T to be obese?
Intuition: Not so likely, because obesity is a feature with a heterogeneous distribution within a tribe.
Skin color: If he is brown, how likely are other male members of tribe T to be brown?
Intuition: Quite likely because skin color varies across tribes but is uniform in a single tribe.
Bag: tribe
Color: obesity or skin color
Here is what they found:
Again, a compound Dirichlet-multinomial distribution was used to model this experiment.
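A sketch of that model in its simplest (two-outcome, Beta) form; the base rate `phi` and the `alpha` values below are illustrative choices, not estimates from the paper. A small `alpha` encodes the belief that a property is near-uniform within a tribe, a large `alpha` that tribes look like the population:

```python
def predict_next(alpha, phi):
    """P(next tribe member has the property | one member observed with it),
    under a Beta(alpha * phi, alpha * (1 - phi)) prior on the within-tribe rate."""
    return (alpha * phi + 1) / (alpha + 1)

phi = 0.3  # illustrative population base rate of the property

# Illustrative concentrations: a large alpha says tribes look like the population
# (the property is heterogeneous within a tribe); a small alpha says each tribe
# is nearly uniform for that property.
for prop, alpha in [("obesity (heterogeneous)", 20.0), ("skin color (homogeneous)", 0.2)]:
    print(f"{prop:26s}: P(next member shares it) = {predict_next(alpha, phi):.2f}")
# Obesity stays close to the base rate; skin color jumps toward 1 after one observation.
```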
Motivation: Humans are able to categorize objects (in a space with a huge number of dimensions) after seeing just one example of a new category. For example, after seeing a single wildebeest people are able to identify other wildebeest, perhaps by drawing on their knowledge of other animals.
Read this paper (pdf).