03 - Aggregation
===
###### tags: `Model Thinking` `Courses`

## Introduction

In complicated models, aggregation is tricky.

__Objective__: Construct toy models to understand the process of aggregation.

Philip Anderson, Nobel Prize: *More is different*. A water molecule is simple, and we can understand all of its properties. But a single molecule cannot be wet; wetness is a property of water as a whole. We can never understand wetness by studying one water molecule. The same goes for neurons and cognition. These are examples of emergent properties at the macro level.

How to think about aggregation?
* Actions (Central Limit Theorem)
* Single Rule (Game of Life)
* Family of Rules (One-Dimensional Cellular Automata models)
* Preferences (...)

Because social scientists are interested in groups of people, aggregation is important. Recall, why do we model?
* Predict points (predictability of bell curves)
* Understand data
* Understand patterns (the Game of Life)
* Understand classes of outcomes (Cellular Automata)
* Work through logic

## Central Limit Theorem

Suppose we want to model a group of people in which each member makes a decision independently of the others, and we want to understand how the group behaves as a whole. To characterize this situation, we use the concept of a *probability distribution*:

> ...a probability distribution is a mathematical function that describes the possible outcomes and the likelihood of each of those values...

##### Example
A simple example: take a family of 4. We want to know the likelihood that, on any given Saturday, $0$ members of the family go for a walk, or exactly $1$ member goes for a walk, and so on. Suppose that we keep track of the data and find the following:

<center>

| # of People | Likelihood |
|:--------:|:--------:|
| 0 | $10\%$ |
| 1 | $15\%$ |
| 2 | $40\%$ |
| 3 | $15\%$ |
| 4 | $20\%$ |

</center>

We can also represent this with a histogram:

![](https://i.imgur.com/9xfKFk0.png =450x)

What the Central Limit Theorem tells us is that if a whole bunch of people make a whole bunch of independent decisions, the distribution of the total has a nice bell-shaped curve:

> [name=MitchV34] simulate a normal from a binomial distribution.

##### Simulation
![](https://i.imgur.com/mFnGWNE.gif)

This means that the most likely outcome is the one right in the middle, which lets us predict a lot of things. We can also use the standard deviation to measure how spread out the possible outcomes are. Knowing the mean and the standard deviation, about $68\%$ of all outcomes fall within $1$ standard deviation of the mean, about $95\%$ fall within $2$ standard deviations, and so on.

![](https://i.imgur.com/JPWixhD.png =550x)

Next, we show how we can use this to predict.

##### Example
Suppose an airline flies a plane with $380$ seats. Empirical data shows that $90\%$ of ticket holders show up, so the airline sells $400$ tickets. We want to know: if the airline sells $400$ tickets, what is the likelihood that more than $380$ people show up? Here is where the model can help us. If the airline sells $400$ tickets and each ticket holder shows up with probability $0.9$ (assuming their decisions are independent), the expected number of people who show up is
$$\mu = N\cdot p = 400 \cdot 0.9 = 360$$
That is less than $380$ seats, so on average the plane is fine, but the mean alone does not tell us much. Since we are really interested in the probability that more than $380$ people show up, we need to take the standard deviation into account:
$$\sigma = \sqrt{p(1-p)\cdot N} = \sqrt{0.1 \cdot 0.9 \cdot 400}=\sqrt{36} = 6$$
Now we have a bell curve with a mean of $360$ and a standard deviation of $6$:

![](https://i.imgur.com/ABEgUeD.png =550x)

That means that $68\%$ of the time, we will be between $354$ and $366$. That's great. It means that $95\%$ of the time, we will be between $348$ and $372$; also great. And $378$ is three standard deviations above the mean, so more than $99.7\%$ of the time we will be at or below $378$. This means the airline will overbook far less than $1\%$ of the time.
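The note above asks to simulate a normal from a binomial distribution; here is a minimal sketch that also double-checks the airline numbers. It is written in Python with NumPy purely as an illustration: the variable names (`N`, `p`, `SEATS`, `TRIALS`), the random seed, and the number of simulated flights are assumptions, not part of the course material. Each simulated flight draws $400$ independent show-up decisions, and we compare the empirical mean, standard deviation, and overbooking probability with the values computed above.

```python
import numpy as np

# Parameters taken from the airline example in the text.
N = 400            # tickets sold
p = 0.9            # probability that a ticket holder shows up
SEATS = 380        # seats on the plane
TRIALS = 100_000   # number of simulated flights (arbitrary choice)

rng = np.random.default_rng(0)

# Each flight is 400 independent yes/no decisions; count how many show up.
show_ups = rng.binomial(N, p, size=TRIALS)

print("empirical mean:", show_ups.mean())   # theory: N * p = 360
print("empirical std :", show_ups.std())    # theory: sqrt(N * p * (1 - p)) = 6
print("P(more than 380 show up):", (show_ups > SEATS).mean())
```

A histogram of `show_ups` gives the same kind of bell curve shown in the simulation above, centered near $360$ with a spread of about $6$, and the estimated overbooking probability comes out well below $1\%$.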
##### Statement
__The Central Limit Theorem states the following: take a whole bunch of random variables (which could represent decisions, measurements, or something else); if those variables:__
* __are independent__
* __have finite variance (bounded variables)__

__then their sum is approximately normally distributed, which means a bell curve.__

This is useful because it means that we can predict things.

> Not everything is predictable. **For example**: stock returns are notoriously unpredictable; in that case the Central Limit Theorem fails because actions are not independent.

### Six Sigma
[Six Sigma](https://en.wikipedia.org/wiki/Six_Sigma) is a set of quality-control techniques introduced in 1986 by Motorola; it is a very famous application of the Central Limit Theorem. We know that $68\%$ of the time the outcome falls within one standard deviation, and $95\%$ of the time within $2$ standard deviations. How often will the outcome fall outside $6$ standard deviations? The answer used by the Six Sigma methodology is $3.4$ times in a million.

##### Example:
Suppose a banana seller knows that average daily banana sales are $500$, and the standard deviation is $10$. The seller wants to know how many bananas she has to stock each day so that she rarely falls short. The Six Sigma methodology tells us that
$$6\sigma = 60$$
so if the seller stocks $560$ bananas, she will fall short only $3.4$ times in a million days.

## Game Of Life
A very simple model that shows how things aggregate. It is a toy model, not "about" anything in particular. Due to [John Conway](https://en.wikipedia.org/wiki/John_Horton_Conway).

The Game of Life:
* Cells sit on a grid; each cell has 8 neighbors.
* Each cell can be alive or dead (on or off).

Simple rules (see the simulation sketch at the end of this section):
* An off cell turns on only if exactly 3 of its neighbors are on.
* An on cell stays on only if 2 or 3 of its neighbors are on.

>[name=MitchV34] Make the 1st example of the class step by step.
>[name=MitchV34] Simulate this thing in Agents.jl
>[name=MitchV34] Replicate examples of the class.
>[name=MitchV34] Explain emergence with a simple glider.

The rules can also produce configurations that grow and move, such as the R-pentomino (also called the F-pentomino), which looks almost alive. In general, we can obtain:

I) Fixed configurations (equilibrium, stable)
II) Blinkers (simple patterns)
III) Complex patterns (complexity)
IV) Chaos

What have we learned?

__Self-organization__: patterns appear without a designer.
__Emergence__: functionality appears: gliders, glider guns, counters, computers.
__Logic__: simple rules produce incredible phenomena.

__=> Simple rules produce macro complexity__
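The notes above ask to simulate this model (e.g., with Agents.jl); as a stand-in, and to keep all code in one language, here is a minimal self-contained sketch in Python/NumPy. The function name `step`, the grid size, the glider starting pattern, the wrap-around boundary, and the number of printed steps are all illustrative choices, not part of the course material.

```python
import numpy as np

def step(grid: np.ndarray) -> np.ndarray:
    """One Game of Life update on a grid of 0s (off) and 1s (on), wrapping at the edges."""
    # Count the 8 neighbors of every cell by summing shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Off cells turn on with exactly 3 live neighbors;
    # on cells stay on with 2 or 3 live neighbors.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

# A glider: a 5-cell pattern that moves diagonally across the grid.
grid = np.zeros((20, 20), dtype=int)
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[r, c] = 1

for t in range(4):
    live = [(int(r), int(c)) for r, c in zip(*np.nonzero(grid))]
    print(f"t = {t}, live cells:", live)
    grid = step(grid)
```

Printing the live cells for a few steps shows the glider reassembling itself shifted one cell diagonally every four steps, a small instance of the emergent functionality listed above.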
## Cellular Automata
The Game of Life is a particular cellular automata model. Now we will look at a simpler class of cellular automata (the original ones, which go back to John von Neumann). We are interested in predicting the *class of outcome*, in particular, how we get complex outcomes.

*Book recommendation:* *A New Kind of Science* (Stephen Wolfram).

#### Model:
Now the cells live on a one-dimensional line. The possible outcome classes are again stable (I), patterns (II), complexity (III), and chaos (IV).
* Each cell has only two neighbors (its left and right neighbors on the line).
* Time moves horizontally: each successive column shows the next period.

Note that there are only eight neighborhood configurations to consider. For clarity of exposition, let $0$ be off and $1$ be on. Each cell $X$ has one neighbor to the left and one to the right, so the possible configurations are $000, 001, 010, 011, 100, 101, 110, 111$, where the middle digit represents the state of $X$ itself. A rule specifies, for each of these 8 configurations, whether $X$ will be on or off in the next period. Every configuration must be assigned an outcome to define a rule; that makes $2^8 = 256$ possible rules, so the rule space has 256 elements. For example, listing what happens to $X$ for the configurations $000$ through $111$, the rule $10000000$ says that $X$ will be on in the next period only if it is currently off and both of its neighbors are off as well.

> [name=MitchV34] Make a table of possible configurations, possible behaviors, and number of rules.

The interesting question is why different rules produce different classes of outcomes (stable, random, or complex).

> John Wheeler, "It from Bit". He presents the idea that every item of the physical world has, at its core, a yes-or-no (binary) answer.

### Langton's $\lambda$
Langton's $\lambda$ measures the fraction of the eight configurations that map to "on" in the rule. What we see is that when $\lambda$ is $0$, nothing happens; when it is small (say only one configuration maps to on), we see mostly blinkers and simple patterns. We see *complexity* in the in-between region.

**Explain**

In markets, we have a lot of interdependence -> complexity.

What have we learned:
* Simple rules combine to form anything.
* It from bit (a profound idea).
* Complexity and randomness require interdependency.

## Aggregation of preferences
The CLT aggregates numbers (actions); the Game of Life aggregates a single rule; cellular automata aggregate a family of rules. Now we aggregate preferences, which are a different kind of mathematical object.

{%hackmd G-uuuRi2RyKS_IyjBJS3Kw %}