# Chapter 11: Hierarchical models

#### Key idea

We learn generalized concepts naturally:

- poodle, Dalmatian, Labrador → dog
- sedan, coupe, convertible, wagon → car

How do we build models that can learn these _abstract_ concepts?

### Example 1: Bags with colored balls

Each bag can learn its own categorical distribution over colors. Such a model explains previously observed data well but fails to generalize. Let's say this is what we observe:

```javascript
var observedData = [
  {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'black'},
  {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'blue'},
  {bag: 'bag2', draw: 'blue'}, {bag: 'bag2', draw: 'green'}, {bag: 'bag2', draw: 'blue'},
  {bag: 'bag2', draw: 'blue'}, {bag: 'bag2', draw: 'blue'}, {bag: 'bag2', draw: 'red'},
  {bag: 'bag3', draw: 'blue'}, {bag: 'bag3', draw: 'orange'}
]
```

![](https://github.com/vinsis/math-and-ml-notes/raw/master/images/hierarchical_models1.jpg)

Human observation: all bags have __blue__ as the predominant color. This is an abstract (generalized) notion of the distribution of colors in bags.

The approach below, where each bag is modeled independently, does not work:

![](https://github.com/vinsis/math-and-ml-notes/raw/master/images/hierarchical_models2.jpg)

As you can see, it predicts the color distributions of bag 3 and the unseen bag N poorly. But if we instead learn a shared prototype, it works:

![](https://github.com/vinsis/math-and-ml-notes/raw/master/images/hierarchical_models3.jpg)

It predicts the distribution of an unseen bag N very well.

---

### Example 2: Learning generalized vs specific prototypes

> Suppose that we have a number of bags that all have identical prototypes: they mix red and blue in proportion 2:1. But the learner doesn't know this. She observes only one ball from each of N bags. What can she learn about an individual bag versus the population as a whole as the number of bags changes?

If the data comes from different bags, the generalized prototype learns well but the specific one does not:

```javascript
var data = [{bag:'bag1', draw:'red'}, {bag:'bag2', draw:'red'}, {bag:'bag3', draw:'blue'},
            {bag:'bag4', draw:'red'}, {bag:'bag5', draw:'red'}, {bag:'bag6', draw:'blue'},
            {bag:'bag7', draw:'red'}, {bag:'bag8', draw:'red'}, {bag:'bag9', draw:'blue'},
            {bag:'bag10', draw:'red'}, {bag:'bag11', draw:'red'}, {bag:'bag12', draw:'blue'}]
```

![](https://github.com/vinsis/math-and-ml-notes/raw/master/images/hierarchical_models4.jpg)

But if all samples come from a single bag, the specific prototype learns well but the generalized one does not:

```javascript
var data = [{bag:'bag1', draw:'red'}, {bag:'bag1', draw:'red'}, {bag:'bag1', draw:'blue'},
            {bag:'bag1', draw:'red'}, {bag:'bag1', draw:'red'}, {bag:'bag1', draw:'blue'},
            {bag:'bag1', draw:'red'}, {bag:'bag1', draw:'red'}, {bag:'bag1', draw:'blue'},
            {bag:'bag1', draw:'red'}, {bag:'bag1', draw:'red'}, {bag:'bag1', draw:'blue'}]
```

![](https://github.com/vinsis/math-and-ml-notes/raw/master/images/hierarchical_models5.jpg)

Both examples share one underlying hierarchical model; a sketch of it follows below.
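In this compound Dirichlet-multinomial model, a global prototype is drawn once and each bag's color distribution is drawn around it. Below is a minimal WebPPL sketch in the spirit of the chapter's snippets; the prototype scaling constant, the sample count, and the names `makeBag`/`bagN` are illustrative assumptions rather than the chapter's exact code:

```javascript
var colors = ['black', 'blue', 'green', 'orange', 'red']

var predictives = Infer({method: 'MCMC', samples: 20000}, function(){
  // global prototype: a Dirichlet draw scaled up to act as pseudo-counts
  // (magnitude 5 is an assumption; larger values tie the bags together more tightly)
  var prototype = T.mul(dirichlet(ones([colors.length, 1])), 5)
  // each bag draws its own color distribution around the shared prototype;
  // mem ensures the same bag always maps to the same distribution
  var makeBag = mem(function(bag){
    var colorProbs = dirichlet(prototype)
    return Categorical({vs: colors, ps: colorProbs})
  })
  var obsFn = function(datum){ observe(makeBag(datum.bag), datum.draw) }
  mapData({data: observedData}, obsFn)
  // posterior predictives for a seen bag and a completely unseen bag N
  return {bag1: sample(makeBag('bag1')), bagN: sample(makeBag('bagN'))}
})
```

Run against the `observedData` of Example 1, draws for `bagN` should come out predominantly blue, which is the generalization shown in the figure above.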
---

### Learning Overhypotheses: Abstraction at the Superordinate Level

> Suppose that we observe that `bag1` consists of all blue marbles, `bag2` consists of all green marbles, `bag3` all red, and so on. This doesn't tell us to expect a particular color in future bags, but it does suggest that bags are very regular—that all bags consist of marbles of only one color.

Suppose we have the following data:

```javascript
var observedData = [
  {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'blue'},
  {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'blue'},
  {bag: 'bag2', draw: 'green'}, {bag: 'bag2', draw: 'green'}, {bag: 'bag2', draw: 'green'},
  {bag: 'bag2', draw: 'green'}, {bag: 'bag2', draw: 'green'}, {bag: 'bag2', draw: 'green'},
  {bag: 'bag3', draw: 'red'}, {bag: 'bag3', draw: 'red'}, {bag: 'bag3', draw: 'red'},
  {bag: 'bag3', draw: 'red'}, {bag: 'bag3', draw: 'red'}, {bag: 'bag3', draw: 'red'},
  {bag: 'bag4', draw: 'orange'}]
```

Note that we have only one sample from `bag4` and none from bag N.

- Having learned the overhypothesis that bags are uniform in color, we can confidently predict that further draws from `bag4` will be orange.
- For bag N, any color is equally probable.

This can be modeled by defining our prototype as:

```javascript
// the global prototype mixture:
var phi = dirichlet(ones([5, 1]))
// regularity parameter: how strongly we expect the global prototype to project
// (i.e. to determine the local prototypes):
var alpha = gamma(2, 2)
var prototype = T.mul(phi, alpha)
```

After observing the data above, the inferred `alpha` ends up significantly smaller than 1.

> This means roughly that the learned prototype in phi should exert less influence on prototype estimation for a new bag than a single observation.

Now let's say we have the following data instead:

```javascript
var observedData = [
  {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'red'}, {bag: 'bag1', draw: 'green'},
  {bag: 'bag1', draw: 'black'}, {bag: 'bag1', draw: 'red'}, {bag: 'bag1', draw: 'blue'},
  {bag: 'bag2', draw: 'green'}, {bag: 'bag2', draw: 'red'}, {bag: 'bag2', draw: 'black'},
  {bag: 'bag2', draw: 'black'}, {bag: 'bag2', draw: 'blue'}, {bag: 'bag2', draw: 'green'},
  {bag: 'bag3', draw: 'red'}, {bag: 'bag3', draw: 'green'}, {bag: 'bag3', draw: 'blue'},
  {bag: 'bag3', draw: 'blue'}, {bag: 'bag3', draw: 'black'}, {bag: 'bag3', draw: 'green'},
  {bag: 'bag4', draw: 'orange'}]
```

> The marble color is instead variable within bags to about the same degree that it varies in the population as a whole.

In this case the inferred `alpha` is significantly greater than 1. The full model wrapped around the `phi`/`alpha` fragment is sketched below.
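For completeness, here is a minimal sketch of the inference these fragments sit inside, in the style of the chapter's code; treat the sample count and the returned quantities as assumptions:

```javascript
var colors = ['black', 'blue', 'green', 'orange', 'red']

var results = Infer({method: 'MCMC', samples: 30000}, function(){
  // the global prototype mixture:
  var phi = dirichlet(ones([colors.length, 1]))
  // regularity parameter: how strongly the global prototype projects onto bags
  var alpha = gamma(2, 2)
  var prototype = T.mul(phi, alpha)
  // each bag's color distribution is drawn around the global prototype
  var makeBag = mem(function(bag){
    var colorProbs = dirichlet(prototype)
    return Categorical({vs: colors, ps: colorProbs})
  })
  var obsFn = function(datum){ observe(makeBag(datum.bag), datum.draw) }
  mapData({data: observedData}, obsFn)
  // inspect alpha along with predictions for the sparse bag4 and the unseen bag N
  return {alpha: alpha, bag4: sample(makeBag('bag4')), bagN: sample(makeBag('bagN'))}
})
```

Under the first data set (uniform bags) the posterior on `alpha` concentrates below 1, so a single orange draw dominates predictions for `bag4`; under the second (mixed bags) it concentrates above 1, and predictions for a new bag revert to the population mixture.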
---

### Example: The Shape Bias

The shape bias is the preference to generalize a novel label for some object to other objects of the same shape, rather than, say, the same color or texture.

Let's say each object category has four attributes, `'shape', 'color', 'texture', 'size'`, and that the following data is observed:

```javascript
var observedData = [{cat: 'cat1', shape: 1, color: 1, texture: 1, size: 1},
                    {cat: 'cat1', shape: 1, color: 2, texture: 2, size: 2},
                    {cat: 'cat2', shape: 2, color: 3, texture: 3, size: 1},
                    {cat: 'cat2', shape: 2, color: 4, texture: 4, size: 2},
                    {cat: 'cat3', shape: 3, color: 5, texture: 5, size: 1},
                    {cat: 'cat3', shape: 3, color: 6, texture: 6, size: 2},
                    {cat: 'cat4', shape: 4, color: 7, texture: 7, size: 1},
                    {cat: 'cat4', shape: 4, color: 8, texture: 8, size: 2},
                    {cat: 'cat5', shape: 5, color: 9, texture: 9, size: 1}]
```

Let's define the range of values of the attributes:

```javascript
var values = {shape: _.range(11), color: _.range(11), texture: _.range(11), size: _.range(11)};
```

> One needs to allow for more values along each dimension than appear in the training data so as to be able to generalize to novel shapes, colors, etc.

Here each attribute has its own `phi` and `alpha`:

```javascript
// the four attribute dimensions (used by obsFn below)
var attributes = ['shape', 'color', 'texture', 'size'];

var categoryPosterior = Infer({method: 'MCMC', samples: 10000}, function(){
  // per-attribute global prototype and regularity parameter
  var prototype = mem(function(attr){
    var phi = dirichlet(ones([values[attr].length, 1]))
    var alpha = exponential(1)
    return T.mul(phi, alpha)
  })
  // per-category, per-attribute distribution drawn around that prototype
  var makeAttrDist = mem(function(cat, attr){
    var probs = dirichlet(prototype(attr))
    return Categorical({vs: values[attr], ps: probs})
  })
  var obsFn = function(datum){
    map(function(attr){observe(makeAttrDist(datum.cat, attr), datum[attr])}, attributes)
  }
  mapData({data: observedData}, obsFn)
  return {cat5shape: sample(makeAttrDist('cat5', 'shape')),
          cat5color: sample(makeAttrDist('cat5', 'color')),
          catNshape: sample(makeAttrDist('catN', 'shape')),
          catNcolor: sample(makeAttrDist('catN', 'color'))}
})
```

> The program above gives us draws from some novel category for which we've seen a single instance. In the experiments with children, they had to choose one of three choice objects which varied according to the dimension they matched the example object from the category.

---

### Example: Beliefs about Homogeneity and Generalization

[In this study](https://scholar.google.com/scholar?q=%22The%20use%20of%20statistical%20heuristics%20in%20everyday%20inductive%20reasoning.%22), the authors found that:

> to what extent people generalise depends on beliefs about the homogeneity of the group that the object falls in with respect to the property they are being asked to generalize about.

Let's say that on a new island you encounter one male member of a tribe T.

__Obesity__: If he is obese, how likely are other male members of tribe T to be obese?

__Intuition__: Not very likely, because obesity is distributed heterogeneously within a tribe.

__Skin color__: If he is brown, how likely are other male members of tribe T to be brown?

__Intuition__: Quite likely, because skin color varies across tribes but is uniform within a single tribe.

#### Analogy to bags with colored balls

- Bag: tribe
- Color: obesity or skin color

Here is what they found:

![](https://probmods.org/assets/img/nisbett_model_humans.png)

Again, a compound Dirichlet-multinomial distribution (the same `phi`/`alpha` model as above, with tribes in place of bags) was used to model this experiment.

---

### ToDo: One-shot learning of visual categories

> __Motivation__: Humans are able to categorize objects (in a space with a huge number of dimensions) after seeing just one example of a new category. For example, after seeing a single wildebeest people are able to identify other wildebeest, perhaps by drawing on their knowledge of other animals.

Read [this paper (pdf)](http://proceedings.mlr.press/v27/salakhutdinov12a/salakhutdinov12a.pdf).

---

### ToDo: Get some ideas about using overhypotheses from [this](https://sci-hub.tw/https://www.cambridge.org/core/journals/journal-of-child-language/article/variability-negative-evidence-and-the-acquisition-of-verb-argument-constructions/D62EDBFF5A8F1ACC821451FEAD3C88FB) paper.

###### tags: `probabilistic-models-of-cognition`