###### tags: `risk measure`, `counterfactual`, `coupling`, `verification`, `calibration`, `decision`, `gametheory`, `homeostasis`, `selfconsistent`
# System calibration
## coupling, copula, calibration
I am developing verification techniques based on coupling and copula calibration concepts, whose goal is as follows:
### what?
verify system state (target state could be self-consistent (SC), optimal (OPT), monotone (MT), non-interfering (NI) etc)
### how?
using copula function structure and coupling which can balance information between joint and marginal through model bootstrap
- (OPT) iterated projection on the dual manifold, which converges to the closest point of the dual manifold (based on the Pythagorean theorem)
![](https://i.imgur.com/gY38wLX.png)
- (OPT) coupled chains that start from the typical set would have shorter meeting times than those that start from non-optimal initial points.
![](https://i.imgur.com/nMZEKyb.png)
- (OPT) Pareto-optimal points, or portfolios on the efficient frontier, would easily return to their state after a perturbation, whereas those that are not optimal are not robust to perturbation.
- (SC) iterated update from $u_1$ to $u_1'$ via $u_1 \rightarrow C(u_1, u_2) \rightarrow C(u_1, u_2)|_{u_2=1} \sim u_1'$, based on copula function definition 1-2 [here](http://www.columbia.edu/~mh2078/QRM/Copulas.pdf). An alternating update ($u_1 \rightarrow u_1'$, $u_2 \rightarrow u_2'$) is possible.
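
The (SC) update above relies on the copula boundary property $C(u_1, 1) = u_1$. A minimal sketch of that check follows, assuming a Clayton copula with `theta = 2.0`. Both the family and the parameter value are illustrative choices, not something fixed by this note.

```python
import numpy as np

def clayton_copula(u1, u2, theta=2.0):
    """Clayton copula C(u1, u2) = (u1^-theta + u2^-theta - 1)^(-1/theta), theta > 0.

    The Clayton family is an assumption made for illustration only.
    """
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)

# Grid of marginal values u1 in (0, 1).
u1 = np.linspace(0.05, 0.95, 10)

# Setting u2 = 1 marginalizes the second coordinate and should return u1.
u1_new = clayton_copula(u1, 1.0)

# Largest deviation between the updated and the original marginal values.
max_gap = float(np.max(np.abs(u1_new - u1)))
print(max_gap)
```

If `max_gap` is not at floating-point level, the function being tested is not a valid copula in its first argument, which is the kind of self-consistency failure the update is meant to expose.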
### why?
The assumption is that certain states (SC, OPT, MT?, NI?) can be attractors, i.e. system values that get close enough to the attractor values remain close even if slightly disturbed. Examples are:
- (OPT) typical set, minimum-distance points between dual manifolds, [self-consistent-field](https://aapt.scitation.org/doi/abs/10.1119/10.0002644?journalCode=ajp) (below) to solve the Schrödinger equation, and many others that use the fixed-point iteration technique.
![](https://i.imgur.com/ZhVoOOz.png)
- (SC) state where prior and data-averaged posterior have the same distribution in SBC
- need examples for MT and NI having attractor-flavor
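
The attractor assumption above can be illustrated with a toy fixed-point iteration. This is only a sketch: the update map `math.cos` and the perturbation size `0.3` are hypothetical stand-ins for a real system update, chosen because the map is a contraction near its fixed point.

```python
import math

def iterate(update, x0, n_steps=200):
    """Apply x <- update(x) repeatedly and return the final value."""
    x = x0
    for _ in range(n_steps):
        x = update(x)
    return x

# Run the iteration until it settles on the fixed point x* = cos(x*).
fixed_point = iterate(math.cos, 1.0)

# Disturb the state and iterate again: an attractor pulls the state back.
perturbed = iterate(math.cos, fixed_point + 0.3)

print(fixed_point, perturbed)
```

A state that is not an attractor would not show this behavior: after the same perturbation, the iterates would drift away instead of returning.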
### Related literature
- Barthe [here](https://youtu.be/pae2t5lPupk?t=557) introduces three relational properties of probabilistic programs, or conditions for privacy algorithms (monotonicity, stability, non-interference), and uses coupling to verify them.
- [Haykin Neural networks and learning machine](https://g.co/kgs/UgXnvh) ch.10 on "Iterative Algorithm for Computing the Optimal Manifold Representation of Data". The chapter also includes copula.
- [Computational Information Geometry with Frank Nielsen](https://www.youtube.com/watch?v=X3cBhBA1nNw&ab_channel=MLSSSydney2015)
![](https://i.imgur.com/n2sJWIJ.jpg)
- [Design of Risk Weights](https://www0.gsb.columbia.edu/faculty/pglasserman/Other/OFRdesignofriskwghts.pdf) Adaptive Risk Weights, iteration to equilibrium
## coupling, copula, calibration ends here
---
The role of #augmenting #lifting #dual
1. Explore augmenting techniques: Origin X $\xrightarrow{\text{Lift}\; f}$ Augmented (X, f(X))
- The newly created structure on the augmented space, as listed below, prompts step 2, i.e. an efficient update.
- Convex structure through dual variable: Lagrangian,
- Vector flow structure through dual variable: HMC,
- Markov structure through extra state: CVaR MDP, stopping time, augmented Neural ODE
- Sampling structure through latent var: EM, imputation
- Efficiency and diversity through copy: coupling, splitting
| algorithm | Original, <br> Aug (=Ori., New), New | New | Effect | Constrain/Invariant/Conserve | Ref |
| ----------------------- | ------------------------------------------- | ---------------------------- | ------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| SBC | $\theta, (\theta, y)$ | data | verification, calib. | $(\theta, \theta')$ symmetry | |
| Coupling | $\theta(w), (\theta_1(w), \theta_2(w))$ | rng-share process | unbias, var. reduction | | [Maximal Couplings of the Metropolis–Hastings Algorithm](http://proceedings.mlr.press/v130/wang21d/wang21d.pdf) |
| Splitting | $\theta(w), (\theta_1(w), \theta_2(w)?)$ | ? | | | |
| Copula | $x_a, (x_a, x_b)$ | param. with dependence | dependence inf, calib. | Uniform marginal | |
| Info. bottleneck | $x, (x,\tilde{x})$ | encode of the source | $p(\tilde{x}/x)$ optimal assignment | self-consistent eq. for $X \rightarrow \tilde{X},\tilde{X} \rightarrow Y$ coding <br> marginal for p$(y/\tilde{x}), p(\tilde{x})$ | [Tishby2000](https://arxiv.org/abs/physics/0004057)(Thm.4) |
| EM | $\theta, (\theta, z)$ | latent variable | sampling is easier for $p(\theta/z)$ than $p(\theta )$ | $p(\theta) =\int p(\theta/z)dz$ | |
| Imputation | $\theta, (\theta, z)$ | data augmentation | | | |
| CVaR MDP1 | $X, (X , C)$ | running cost state | | | Bäuerle(2011), Huang(2016), Miller(2017), Chow(2018), Backhoff-Veraguas(2020) from [Min20](http://www.mskyt.net/wp-content/uploads/2020/11/cvar-exec.pdf) |
| CVaR MDP2 | $X, (X , Q)$ | risk aversion quantile state | CVaR dual | | Pflug(2016), [Chow et al. (2015)](https://papers.nips.cc/paper/2015/file/64223ccf70bbb65a3a4aceac37e21016-Paper.pdf), Chapman(2018), Li(2020), [Min20](http://www.mskyt.net/wp-content/uploads/2020/11/cvar-exec.pdf) |
| Lagrangian dual | $x, (x,\lambda)$ | coeff. dual variable | dual's convex strc. | dual ineq. infsup>=supinf | |
| Aug. Lagrangian or ADMM | $x, (x, \lambda)$ or $x,z, (x,z, \lambda)$ | coeff. dual variable | convex strc.,better convergence | dual ineq. | [A generalized risk budgeting approach to portfolio construction ](http://www.columbia.edu/~mh2078/A_generalized_risk_budgeting_approach.pdf) |
| Stopping time | $X, (\alpha,X_\alpha)$ | stopping time | Markovian strc. | | Chung, Intro to prob. Ch.9 |
| HMC | $q, (p,q)$ | momentum in phase space | Vector flow structure | Hamiltonian symplectic vol. | |
| Augmented Neural ODE | $x, (x_1, x_2)$ | feature | flexible feature mapping | | [Augmented Neural ODEs](https://arxiv.org/abs/1904.01681) |
| Augmented preference | ? | augmented preference | infer preference w.o. util. ftn. | partial order | Boyd, Convex Optimization Ch.6 |
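
The "rng-share process" in the Coupling row can be sketched with two chains driven by the same random draws. This is a simple synchronous (common random numbers) coupling of an AR(1) chain, not the maximal coupling of the cited paper. The chain, the coefficient `a = 0.8`, and the starting points are assumptions for illustration.

```python
import numpy as np

def coupled_ar1(x0, y0, a=0.8, n_steps=50, seed=0):
    """Run two AR(1) chains x <- a*x + eps that share every noise draw eps.

    Returns the distance |x_t - y_t| after each step.
    """
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    gaps = []
    for _ in range(n_steps):
        eps = rng.normal()      # one draw, used by both copies
        x = a * x + eps
        y = a * y + eps
        gaps.append(abs(x - y))
    return np.array(gaps)

gaps = coupled_ar1(x0=5.0, y0=-5.0)
print(gaps[0], gaps[-1])
```

Because the noise cancels in the difference, the gap shrinks geometrically as `a ** t` times the initial gap. Under this coupling the chains get arbitrarily close but do not meet exactly. Exact meeting times require a construction such as a maximal coupling.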
2. Apply them for state verification
- Origin $X_t$ $\xrightarrow{\text{Lift}\; f}$ Augmented ($X_t, f(X_t)$) $\xrightarrow{\text{Project/Update}\; f^{-1}}$ Origin $X_{t+1}$
- similar to Source ($X_t, P$) $\xrightarrow{\text{Compressor}\; f}$ Augmented ($X_t, f(X_t)$) $\xrightarrow{\text{Decompressor}\; P(X_{t+1}|f(X_t))}$ Receiver $X_{t+1}$
| algorithm | Update, <br>Calibration target | Lift f | Projection $f^{-1}$ |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------- | ----------------------------- | --------------------------------- |
| SBC                     | $\theta_t \xrightarrow{(\theta_t, y_t)} \theta_{t+1}$,<br>joint dist.simulator                                            | $p(y/\theta)$ data simulator  | $p(\theta/y)$ posterior simulator |
| Copula | $x^a_t \xrightarrow{(x^a_t, x^b_t)} x^a_{t+1},x^b_t \xrightarrow{(x^a_t, x^b_t)} x^b_{t+1}$,<br>copula ftn. $C(x_a,x_b)$ | | |
| Info. bottleneck | $x_t \xrightarrow{(x_t, \tilde{x}_t)} x_{t+1}$,<br> maximum rate $p^{*}(\tilde{x}/x)$ | | |
| EM | $\theta_t/Y \xrightarrow{(\theta_t, z_t)/Y} \theta_{t+1}/Y$,<br> $p(\theta/Y)$ | $p(\theta / z)$ | |
| Imputation | $\theta_t \xrightarrow{(\theta_t, z_t)} \theta_{t+1}$,<br> $p(\theta)$ | $Z_{t+1} \sim P (Z / Y, θ_t)$ | $θ_{t+1} \sim P (θ / Y, Z_{t+1})$ |
| CVaR MDP1 | $X_t \xrightarrow{(X_t, C_t)} X_{t+1}$,<br>optimum | | |
| CVaR MDP2 | $X_t \xrightarrow{(X_t, Q_t)} X_{t+1}$,<br>optimum | | |
| Lagrangian dual | $x_t \xrightarrow{(x_t,\lambda_t)} x_{t+1}$,<br>optimum | $f(x) +\lambda g(x)$ | |
| Aug. Lagrangian or ADMM | $x_t \xrightarrow{(x_t, \lambda_t)} x_{t+1}$ or $x_t,z_t \xrightarrow{(x_t,z_t, \lambda_t)} x_{t+1}, z_{t+1}$,<br>optimum | | |
| HMC | $q_t \xrightarrow{(p_t,q_t)} q_{t+1}$,<br>typical set | $q \rightarrow \pi^{-1}(q)$ | |
| Augmented Neural ODE | $x^a_t \xrightarrow{(x^a_t, x^b_2)} x^a_{t+1}$ (Incorrect),<br> ? | | |
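
The Lagrangian dual row of the table above can be sketched as a lift-then-project loop. The toy problem (minimize $x^2$ subject to $1 - x \le 0$), the step size, and the function name `dual_ascent` are assumptions for illustration. The lift is $f(x) + \lambda g(x)$ as in the table.

```python
def dual_ascent(step=1.0, n_steps=100):
    """Dual ascent for: minimize x^2 subject to g(x) = 1 - x <= 0.

    Lift:    L(x, lam) = x^2 + lam * (1 - x)
    Project: x = argmin_x L(x, lam) = lam / 2
    Update:  lam <- max(0, lam + step * g(x))
    """
    lam = 0.0
    x = 0.0
    for _ in range(n_steps):
        x = lam / 2.0                            # minimize the Lagrangian in x
        lam = max(0.0, lam + step * (1.0 - x))   # ascend in the dual variable
    return x, lam

x_opt, lam_opt = dual_ascent()
print(x_opt, lam_opt)
```

For this toy problem the iterates approach $x = 1$, $\lambda = 2$, which is the constrained optimum together with its multiplier.
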
1. Do you know what it means to "augment the known preferences (6.22) with the inequality u(ak) ≤ u(al)" in Boyd's convex optimization textbook, p.341? Does augment mean sampling? If the augmented set of preferences is infeasible, it means that any concave nondecreasing utility function that is consistent with the original given consumer preference data must also satisfy u(ak) > u(al); we can then conclude that basket k is preferred to basket l without knowing the underlying utility function.
2. Do you know why the Markov chain direction is $\tilde{X} ← X ← Y$ in the information bottleneck paper (footnote 3)?
3. What do you think is the difference between splitting and coupling? Do you agree they have similar flavor; the only difference being the direction of the bifurcation --= vs =--?
I have a quick question on augmenting methodology. I would like to understand better the quote from your paper (A generalized risk budgeting approach to portfolio construction).
Quotes from papers
- "combining the augmented Lagrangian approach with MCMC sampling to generate a point in the proximity of the global optimum of the GRB problem. sample points with a higher objective function value and simultaneously drive the sample path in the direction of the feasible region using the augmented Lagrangian terms".
- by introducing an extra state variable, an optimal policy can be sufficiently characterized as a Markov process defined on this augmented state space (Min's [paper](http://www.mskyt.net/wp-content/uploads/2020/11/cvar-exec.pdf))
## augmentation techniques
Homeostasis := systems maintain stability
immune system := how to remain stable in the face of the 'extreme' situation of an acute infection. This requires understanding how the immune system operates in steady-state conditions.
- underlying mechanism belied by stability (well-functioning homeostatic immune responses are barely visible)
- consequences of #dysfunction
- resilience to #perturbation
- #robustness vs #invariant : variability within the bound, diversification gives robustness
The Review articles in this Focus describe several physiological systems in which the immune system contributes to homeostasis. In the gut, intestinal epithelial cells co-ordinate an immunological environment that enables beneficial host–commensal relationships. During wound healing in the liver, macrophages and other immune cells control the fibrotic cascade to ensure a self-limiting response. In all tissues, regulatory T cells balance adaptive immune responses and are themselves under the control of homeostatic processes that ensure context-specific activity. Tissue homeostasis also depends on the clearance of apoptotic cells from the body by phagocytic immune cells.
Ref
[Nature issue on the connection between homeostasis and immune system](https://www.nature.com/collections/mxwslsscsf)
# Calibrating system
Every proposal is an approximation to the unknown, which needs to be `measured`, `improved`, `traded-off`
efficient = lifting to time axis then marginalized out time = plug Ez to z
without an external force (guide) it could take much longer to reach the optimum
### recap from the paper
### 1. information balance between latent and non-latent
- Static self-consistency as no **information is lost in its marginalization to p(θ, y)**, just as no information is lost by **saving the upper half of a symmetric matrix**.
- counterfactual $\in \sigma(\text{parameter}) = \sigma(\text{observable}) = \sigma(\text{observable}, \text{parameter})$
- learning from generated data until information is balanced with full data
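
The information balance above can be checked numerically in the SBC sense: the prior and the data-averaged posterior should have the same distribution. Below is a minimal sketch, assuming a conjugate normal model (prior $N(0,1)$, likelihood $N(\theta,1)$, hence exact posterior $N(y/2, 1/2)$). The model and the sample size are illustrative choices, not taken from this note.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Prior draw: theta ~ N(0, 1).
theta = rng.normal(0.0, 1.0, size=n)

# Data simulator p(y / theta): y ~ N(theta, 1).
y = rng.normal(theta, 1.0)

# Posterior simulator p(theta / y): for this conjugate model, N(y/2, 1/2).
theta_post = rng.normal(y / 2.0, np.sqrt(0.5))

# Averaged over simulated data, posterior draws should match the prior.
post_mean = float(theta_post.mean())
post_var = float(theta_post.var())
print(post_mean, post_var)
```

If the posterior simulator were miscalibrated (for example, too narrow), `post_var` would deviate from the prior variance of 1. That deviation is the signal SBC looks for.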
#### Q
- test for the system optimal?
- all the counterfactuals are adapted to $\theta_{opt}$
- Self-Consistent Field (SCF) Method
- ![](https://i.imgur.com/KIgz8K8.png)
![](https://i.imgur.com/a0yF7N0.png)
![](https://i.imgur.com/s8aoO4P.png)
![](https://i.imgur.com/yAiPGsG.png)
![](https://i.imgur.com/dI7H7aW.jpg)
- Hartree required the final field, computed from the charge distribution, to be "self-consistent" with the assumed initial field.
- The wave function is too complex to be found directly, but it can be approximated by a simpler wave function. This then enables the electronic Schrödinger equation to be solved numerically. The self-consistent field method is an iterative method that involves selecting an approximate Hamiltonian, solving the Schrödinger equation to obtain a more accurate set of orbitals, and then solving the Schrödinger equation again until the results converge.
- goal: converge to Schrödinger equation solution through
i) specify class
ii) optimize (allocate, parameter estimation)
iii) compare iterations (project to $\sigma(T)$), where $T$ is the quantities of interest.
iteration is used because the joint is too complex
- convergence: $E_t[f(y, \theta, t)] = f(y, \theta)$
- same value on measurable sets
$f(\theta_1, \ldots, \theta_k, \ldots, \theta_n) = f(\theta_1, \ldots, \theta'_k, \ldots, \theta_n)$, $E_{\theta_k/\theta'_k} := E_{p(\{\theta\} \setminus \theta_k)}(\theta_1, \theta_2, \ldots, \theta_n)$
## optimal verification
algorithm to test for, or bring the system to, the system optimum
- what is the state of a ruined gambler? Send N gamblers and average their state indexed at their ruin time ($\alpha_{1..N}$) i.e. E[state[$\alpha_i$]]
- ergodic: time average on the orbit of x converges to the space average of f: $\lim _{k \rightarrow+\infty}\left(\frac{1}{k+1} \sum_{i=0}^{k} f\left(T^{i}(x)\right)\right)=\int_{X} f d \mu$
- while $f(\theta_1,..,\theta_p)$ has not converged
- replace $\theta_i$ with $E[\theta_i]$
- i <- i + 1 (mod p)
- convr's invisible hand (Adam smith) failed..
- with the help of `time` or `random` as the (n+1)th player, which gives the effect of substituting z with Ez, can the system refine itself with iterated updates and converge to the system optimum?
- genetic algorithm, multiverse, digital twin,
- EM algorithm:
generated data holds information of the model which is constructed with real data and intention. Iterated refinement the system would reach the
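
The ergodic statement in the list above (time average along an orbit equals space average) can be sketched numerically. The map (an irrational rotation of the circle), the test function, and the orbit length are all assumptions chosen for illustration. For this choice the space average $\int_0^1 \cos(2\pi x)\,dx$ is 0.

```python
import math

def time_average(f, T, x0, k):
    """Return (1 / (k + 1)) * sum_{i=0..k} f(T^i(x0))."""
    total, x = 0.0, x0
    for _ in range(k + 1):
        total += f(x)
        x = T(x)
    return total / (k + 1)

# Irrational rotation on [0, 1): ergodic with respect to Lebesgue measure.
alpha = (math.sqrt(5.0) - 1.0) / 2.0

def rotate(x):
    return (x + alpha) % 1.0

def f(x):
    return math.cos(2.0 * math.pi * x)

avg = time_average(f, rotate, x0=0.1, k=20_000)
print(avg)
```

The same helper could be pointed at any update map `T` to compare a long-run orbit average with an independently computed space average.
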
2. definition of robust prior
I am trying to argue that the widest self-consistent prior is robust;
Is robust having a unique solution under variances given certain axioms (from your paper online advertising)? What
self-consistency axiom: unique solution under symmetry, loc-scale-invariant (efficiency and linearity? - computation)
The relative entropy $D_{KL}(p \| q)$ measures <font color ="blue"> how many bits per symbol are wasted </font> by using a code whose implicit probabilities are q, when the ensemble’s true probability distribution is p.
![](https://i.imgur.com/KYhkgKd.png)
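
The "wasted bits" reading of relative entropy can be verified on a small example. The distributions `p` and `q` below are hypothetical. The sketch compares $D_{KL}(p \| q)$ with the extra expected code length incurred by using ideal code lengths $-\log_2 q$ instead of $-\log_2 p$.

```python
import math

def kl_bits(p, q):
    """Relative entropy D_KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_code_length(p, q):
    """Expected bits per symbol under p for a code with ideal lengths -log2(q)."""
    return sum(pi * -math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]   # true distribution (illustrative)
q = [0.25, 0.25, 0.5]   # distribution implied by the code (illustrative)

# Bits wasted = length with the wrong code minus length with the matched code.
wasted = expected_code_length(p, q) - expected_code_length(p, p)
print(kl_bits(p, q), wasted)
```

Both quantities agree, since the expected length under the mismatched code is the cross-entropy and the matched length is the entropy of `p`.
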
3. relation with variational free energy
3 agents with 2 dms
- motivation for change (adopt block chain)
![](https://i.imgur.com/5WDu8I4.png)
transfer welfare from the third to two dms
- (bidding - dist.robust mechanism design)
Would the transfer happen as a result of iterated updates (driven not by an overseeing force but by local entities)?
counterfactual data holds info.
- game theory