# Is High or Low Entropy More Intelligent?
> :warning: *I am sleep deprived and this post is probably riddled with imprecisions and unsubstantiated claims.*
It’s commonly assumed that, because the second law of thermodynamics says the universe tends toward high entropy, entropy must be synonymous with chaos and nothingness, and therefore with a lack of intelligence. I wonder if this is actually true? Is low entropy really good and high entropy really bad (with respect to order, structure, life, and intelligence)?
## College Is Hot (Or not?)
Someone told me an interesting analogy that makes a good case that high entropy is actually more intelligent: why do we go to college? College (in theory) opens doors to more opportunities in life – you get access to more jobs, more friends, and more knowledge, and all of those things open up further opportunities for a happy, healthy, and fulfilling life. On top of that, you keep virtually all of the opportunities you would have had without college! Want to work at McDonald’s? Your probability of becoming a patty flipper is about the same, if not higher, once you’re a graduate… Is this high or low entropy? Interpreted as a state space, like in a Markov decision process, you are increasing the probability of reaching certain states, giving you a larger set of options to choose from for your future trajectory.
## Caught Red-Handed
Suppose you just committed an armed robbery at a convenience store to feed your family, but you were caught and arrested in the act. There were surveillance cameras everywhere, you didn’t wear a mask, and many eyewitnesses positively identified you and will testify at your trial. You’re guaranteed to serve 5–10 years in prison unless you get lucky and somehow escape. You are in the back of the police car, handcuffed and alone. What choices do you have over the near future, i.e. the next 5–10 years? You have plenty of choices to make inside the jail, but you are confined to a certain life there. Of course, you can still move your arms, legs, vocal cords, etc. in infinite directions and take an effectively infinite number of actions, so your immediate action space is (hopefully) the same. But the set of states you can reach – going to the movies on a Friday night with friends, say – is significantly reduced by virtue of your crimes. You have shrunk your state space. Had you not committed these crimes, you would have many more states available to you and wouldn’t be so constrained – and crucially, you would still retain the ability to perform the very actions that led you to where you are now! So your 5–10 year trajectory horizon has only shrunk by making those decisions. Had you not robbed that store (or had you gotten away with it), your 5–10 year trajectory horizon would have remained effectively the same. You’ve trapped yourself in a much worse position than the one you occupied before.
## What Does It Mean to Have High Entropy in a Markov Decision Process?
From the perspective of RL, you want to explore your state space by taking actions that balance exploration and exploitation – you want high entropy in your action distribution (more uniform randomness) for exploration, but also some degree of exploitation so you don’t spend too much time in useless states (less uniform randomness – more bias). The uniform distribution has the highest possible entropy, whereas the lowest possible entropy belongs to a distribution with all of its probability mass concentrated on a single value (a Dirac delta distribution). You can think of this as the bias–variance tradeoff in machine learning. A uniform distribution has maximal variance and minimal bias – your value could be anything within the range. A Dirac delta distribution has minimal variance and maximal bias, since your value can only ever be one thing.
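To make the two extremes concrete, here’s a minimal sketch in Python (the 4-action toy space is just my own illustration) computing the Shannon entropy of a uniform distribution versus a one-hot “Dirac delta” one:

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

n = 4  # a toy discrete action space with 4 actions
uniform = np.full(n, 1.0 / n)            # maximum-entropy distribution
delta = np.array([1.0, 0.0, 0.0, 0.0])   # all mass on one action ("Dirac delta")

print(shannon_entropy(uniform))  # log(4) ≈ 1.386 nats, the maximum for n = 4
print(shannon_entropy(delta))    # 0.0, the minimum
```

Every other distribution over those four actions lands somewhere strictly between 0 and log(4).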
So is it more intelligent to have only one decision available, or an infinite (or large discrete) number of decisions? If you have the optimal policy, then there may be exactly one option: perhaps it is the Dirac delta policy where every state has exactly one action, and that action leads to the optimal outcome. An important distinction here: a Dirac delta policy is not, by itself, enough to be intelligent. You could construct a Dirac delta policy that is terrible – one that always pushes the cart left in CartPole. So determinism alone isn’t intelligence. The intelligence comes from the fact that the action you take results in the highest possible accumulation of future rewards. In the case of a uniform policy, you have all actions at your disposal with equal probability. In practice you may reach the goal just as ineffectively as the always-push-left Dirac policy, but at least you have a nonzero (however minuscule) probability of following the optimal trajectory. Also, a Dirac delta distribution can be thought of as a commitment to an action sampled from a uniform distribution: the uniform distribution contains, with equal probability, all possible Dirac delta distributions. That’s a weird thing to say, and may seem to add nothing to the conversation, but I think it’s extremely important to take note of.
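To see why that nonzero probability matters, here’s a toy experiment (a made-up 1-D chain world, nothing more) comparing a bad Dirac delta policy against a uniformly random one:

```python
import random

def run_episode(policy, n_states=5, horizon=12):
    """Walk a 1-D chain: start at state 0, goal at n_states - 1.
    Actions: -1 (left) or +1 (right), clamped to the chain.
    Returns True if the goal is ever reached."""
    s = 0
    for _ in range(horizon):
        s = min(max(s + policy(), 0), n_states - 1)
        if s == n_states - 1:
            return True
    return False

always_left = lambda: -1                   # a "Dirac delta" policy, and a bad one
uniform = lambda: random.choice([-1, +1])  # maximum-entropy policy

episodes = 10_000
print(sum(run_episode(always_left) for _ in range(episodes)) / episodes)  # 0.0: goal unreachable
print(sum(run_episode(uniform) for _ in range(episodes)) / episodes)      # small but > 0
```

The deterministic always-left policy assigns zero probability to every trajectory that reaches the goal, while the uniform policy assigns a small but positive probability to all of them, including the optimal one.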
So that’s where the exploration/exploitation and bias/variance tradeoffs come into play. There’s a reason they’re so well known in machine learning: the balance is important!
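One standard knob for striking that balance is a Boltzmann (softmax) policy with a temperature parameter – this is a common construction rather than anything from the discussion above, and the Q-values below are made up – which interpolates between the two extremes:

```python
import numpy as np

def softmax_policy(q_values, temperature):
    """Boltzmann/softmax policy over action values. As temperature -> 0
    it approaches a Dirac delta on argmax(q); as temperature -> inf it
    approaches the uniform distribution."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

q = [1.0, 2.0, 0.5, 1.5]  # made-up action values
for T in (0.05, 1.0, 100.0):
    print(T, np.round(softmax_policy(q, T), 3))
# T = 0.05 -> nearly one-hot on action 1 (exploitation)
# T = 100  -> nearly uniform (exploration)
```

Annealing the temperature from high to low over training is one classic way to shift from exploration toward exploitation.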
## Martial Intelligence
It seems you can take this idea further by analogy to positioning in martial arts: with better positions, you can inflict more damage on your opponent and better protect yourself. In a high-entropy state w.r.t. the state space, you have more power and can enact your will more thoroughly; in a low future-entropy state, you are subject to the system, and it’s within your opponent’s power to enact their will upon you. Obviously you want a distribution over actions that is more likely to put you into these advantageous positions, and you want to be able to choose the correct actions while in those positions (or in disadvantageous ones). But you want the entropy of your future to be high – in other words, you want the set of states you could occupy to be as broad as possible, so that you can choose the goal states you want to reach and find the right path to them with maximal probability.
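As a crude illustration of “breadth of reachable future states” (the gridworld and the horizon here are entirely my own invention), you can count how many cells are reachable within a few moves from an open position versus a walled-in one:

```python
from collections import deque

def reachable_within(grid, start, horizon):
    """Count distinct cells reachable from `start` within `horizon`
    up/down/left/right moves; '#' cells are walls. A rough proxy for
    how broad your future state space is from a given position."""
    rows, cols = len(grid), len(grid[0])
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        (r, c), d = frontier.popleft()
        if d == horizon:
            continue
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in seen):
                seen.add((nr, nc))
                frontier.append(((nr, nc), d + 1))
    return len(seen)

grid = [".....",
        ".###.",
        ".#.#.",
        ".###.",
        "....."]
print(reachable_within(grid, (0, 2), 4))  # open cell on the outer ring: 9 reachable cells
print(reachable_within(grid, (2, 2), 4))  # walled-in cell (the jail): just 1, itself
```

Both positions have the same immediate action space (four moves), yet wildly different numbers of reachable futures – exactly the distinction from the robbery example.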
## Conclusion
Maybe true intelligence lies in the ability to select a goal state and then apply a policy that maximizes the probability of occupying that goal state along as short a path as possible.
Well, I guess I’m right back where most of the common literature is – intelligence seems to lie somewhere in the balance between high and low entropy. If the distribution over your future trajectories has maximal entropy, you are equally likely to visit every trajectory (within the same horizon), so in practice you will never reach any particular goal state someone sets. From a similar but opposite perspective, having a Dirac delta distribution where you can only ever visit one trajectory likewise dooms you to an unintelligent existence – unless that one trajectory happens to be optimal, in which case you would reach your goal state every time. But correctly selecting the optimal trajectory out of all possible ones up front is just as unlikely as stumbling onto it with a uniformly random policy. Unless you run many episodes – in which case the uniformly random policy would obviously win eventually, and would do even better with some rules to orchestrate those trajectories toward the one that gets you closest to your goal state.
Also I need some sleep.
###### tags: `exploring-intelligence`