# Properties of Dynamic Systems Suitable for Learning the Posterior Distribution over Transition Hypergraphs

###### tags: `Mila`, `Causal Discovery`, `Hypergraph-Latent`

In the project *Learning latent hypergraphs*, we are interested in learning explanations, encoded as hypergraphs, of the transitions between discrete observations of an evolving system. These hypergraphs encode the interactions between the causal variables of the system, as well as the types of these interactions, collectively referred to as mechanisms.

Let us consider two consecutive observations $x_t$ and $x_{t+1}$ of a system $S$, and let $H_t$ be a hypergraph potentially explaining the transition from $x_t$ to $x_{t+1}$. Instead of only trying to learn the true hypergraph $H_t$ through maximum likelihood, we sample hypergraphs from the posterior distribution, which embeds all the information carried by the observation $(x_t, x_{t+1})$. That is, we sample $H_t$ from $P(H_t|x_t, x_{t+1})$:
$$
P(H_t|x_t, x_{t+1}) \propto P(H_t)\, P(x_t, x_{t+1}|H_t)
$$
Given the compositional structure of hypergraphs, and since there may be many ways of constructing them, we sample these hypergraphs from the posterior distribution using generative flow networks (GFlowNets).

Sampling from the posterior distribution can yield several different hypergraphs. However, for some of them, depending on the environment, it may be impossible to properly compute the likelihood because the mechanisms encapsulated in the sampled hypergraph are not practically feasible. We encountered this issue in a bouncing-ball environment, where we had to hard-code the likelihood; as a consequence, the likelihood did not faithfully reflect the application of the sampled mechanisms when they were not physically possible. Only the mechanisms that are physically possible given the initial state $x_t$ are reflected in the computation of $P(x_t, x_{t+1}|H_t)$.

## Deterministic and Hard-coded Environments Favour Maximum Likelihood Training

A system is hard-coded when, given an initial state $x_t$, only the mechanisms allowed by the laws governing the system's evolution are executable. A bouncing-ball environment evolving under perfectly elastic collisions is a typical case of a hard-coded system. Specifically, given the positions and velocities of the balls at time $t$, we can analytically determine whether a collision between two balls is actually possible in the system. Hence, when a sampled hypergraph contains a mechanism asserting a physically impossible collision between two balls, the likelihood associated with that hypergraph will not genuinely reflect the application of this collision, which can mislead the sampling process because the likelihood is no longer consistent with the application of the mechanisms.
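To make the notion of an analytically verifiable collision concrete, here is a minimal sketch of such a feasibility check. It assumes the balls are discs of known radii moving at constant velocity within a time step of length `dt`; the function name, signature and NumPy-based implementation are illustrative assumptions, not the actual environment code.

```python
import numpy as np

def collision_feasible(p1, v1, p2, v2, r1, r2, dt):
    """Return True if two discs moving at constant velocity can touch
    within the time window [0, dt].

    p1, v1, p2, v2: 2-D positions/velocities at time t; r1, r2: radii.
    All names are illustrative.
    """
    d = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)  # relative position
    v = np.asarray(v2, dtype=float) - np.asarray(v1, dtype=float)  # relative velocity
    R = r1 + r2

    # The centres are at distance |d + s*v| at time t + s; a collision is the
    # smallest s >= 0 solving a*s^2 + b*s + c = 0 (squared distance equals R^2).
    a = float(v @ v)
    b = 2.0 * float(d @ v)
    c = float(d @ d) - R * R

    if c <= 0.0:    # the discs already overlap at time t
        return True
    if a == 0.0:    # identical velocities: the gap never changes
        return False

    disc = b * b - 4.0 * a * c
    if disc < 0.0:  # the trajectories never bring the centres within R
        return False

    s = (-b - np.sqrt(disc)) / (2.0 * a)  # first time of contact
    return 0.0 <= s <= dt


# Approaching balls: a collision is feasible within dt.
print(collision_feasible([0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [-1.0, 0.0], 0.5, 0.5, dt=2.0))  # True
# Receding balls: no collision is possible.
print(collision_feasible([0.0, 0.0], [-1.0, 0.0], [3.0, 0.0], [1.0, 0.0], 0.5, 0.5, dt=2.0))  # False
```

A check of this kind is what makes the environment hard-coded: whenever it rules out a collision mechanism encoded in $H_t$, the hard-coded likelihood cannot faithfully apply that mechanism.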
In general, it is difficult for such a hard-coded system to produce a multimodal posterior, mainly because the mechanisms are too rigid and only permit the execution of the true mechanisms that generated the observation. This tends to yield a unimodal posterior, which is simpler to fit with maximum likelihood.

## Properties of Mechanisms for a Multimodal Posterior

* **Stochastic mechanisms**. Stochasticity in the execution of mechanisms introduces uncertainty into the outcome. This uncertainty may lead to several plausible explanations for the same observation.
* **All-time applicable mechanisms**. The main drawback of mechanisms that are not executable in all situations is that, given $x = (x_t, x_{t+1})$, the posterior distribution becomes a piecewise function of $x$; in the cases where the mechanisms encoded in $H_t$ are not realisable, the GFlowNet receives a reward resulting from the non-execution of the very mechanisms whose behaviour it is supposed to learn.
* **Simple and distinctive mechanisms**. Each system generally has its own specific dynamics. Approximating these dynamics with highly complex functions may lead to a hard-coded environment in which only a handful of mechanisms are realisable under the inherent behaviour of the system. For example, suppose we want to sample hypergraphs encapsulating the set of transformations carried out to go from one molecule to another. One may try to use very complex functions to represent the possible transformations; for instance, one mechanism transforming two atoms could be **an ionic bond followed by a covalent bond**. Such a compound mechanism tightens the constraints necessary for its realisation. However, some of these constraints can be alleviated by breaking compound mechanisms down into simple and clearly distinct functions.

In the bouncing-ball environment, we mainly identified three mechanisms: inertial motion, the ball-to-wall collision and the ball-to-ball collision. Although they appear to be simple functions, these mechanisms are highly constrained and are not executable at all times. Moreover, they are actually compositions of simpler, discernible functions, namely inertial motion, the reversal of a velocity coordinate, and changes in the amplitude and direction of a velocity. These simple functions, which are the building blocks of the three complex mechanisms above, are applicable at all times, and using them as the basis of the mechanisms would lift some of the constraints responsible for the rigidity of the studied system (a sketch of this decomposition is given below).

## Combination of Elementary Mechanisms and Complex Explanatory Structures
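As a concrete starting point, the following is a minimal sketch of how the three constrained bouncing-ball mechanisms can be rewritten as compositions of the elementary, always-applicable functions listed above. It assumes positions and velocities are NumPy vectors and that the balls have equal mass; the function names and the simplified physics are illustrative assumptions, not the actual environment code.

```python
import numpy as np

# --- Elementary functions: applicable in every state -----------------------

def inertial_step(pos, vel, dt):
    """Straight-line motion at constant velocity."""
    pos, vel = np.asarray(pos, dtype=float), np.asarray(vel, dtype=float)
    return pos + vel * dt, vel

def reverse_coordinate(vel, axis):
    """Flip the sign of one velocity coordinate."""
    vel = np.array(vel, dtype=float)
    vel[axis] = -vel[axis]
    return vel

def redirect(vel, delta):
    """Change the amplitude and direction of a velocity by adding an impulse."""
    return np.asarray(vel, dtype=float) + np.asarray(delta, dtype=float)

# --- The three constrained mechanisms, written as compositions -------------

def ball_to_wall(pos, vel, axis, dt):
    """Ball-to-wall collision: reverse one velocity coordinate, then coast."""
    return inertial_step(pos, reverse_coordinate(vel, axis), dt)

def ball_to_ball(pos1, vel1, pos2, vel2, dt):
    """Elastic ball-to-ball collision (equal masses): exchange the velocity
    components along the line of centres, then coast."""
    pos1, pos2 = np.asarray(pos1, dtype=float), np.asarray(pos2, dtype=float)
    vel1, vel2 = np.asarray(vel1, dtype=float), np.asarray(vel2, dtype=float)
    n = (pos2 - pos1) / np.linalg.norm(pos2 - pos1)   # unit vector along the line of centres
    impulse = float((vel1 - vel2) @ n) * n
    return (inertial_step(pos1, redirect(vel1, -impulse), dt),
            inertial_step(pos2, redirect(vel2, impulse), dt))
```

Because each elementary function is defined for every state, mechanisms built from them always yield a well-defined likelihood, which is precisely the property argued for above.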