# DNBP Simplified

# Index
[TOC]

- If you are unable to understand a few terms in this hack or want to know more, refer to this hack: https://hackmd.io/@thrib513/rJ7HEsxhJg

# Markov Random Field (MRF) Representation

## Graph
- Undirected graph $G=\{V,E\}$
  - V - vertices
  - E - edges

```json=
"graph": [[0, 1, 1, 2],
          [1, 0, 2, 1]],
```
- The first row represents source nodes.
- The second row represents destination nodes.
- Each column represents an edge between two nodes.

```json=
"edge_set": [0, 2],
```
- This specifies the indices of edges that are considered for belief propagation.
- In this case, Edge 0 (0 → 1) and Edge 2 (1 → 2) are included.

```json=
"inc_nghbrs": [[1], [0, 2], [1]],
```
- This lists the neighbors of each node:
  - Node 0 → connected to Node 1 ([1]).
  - Node 1 → connected to Nodes 0 and 2 ([0, 2]).
  - Node 2 → connected to Node 1 ([1]).

- (b) is the MRF
![image](https://hackmd.io/_uploads/ByQ6Lpzaye.png)
- Each circle represents a node (random variable).
  - Grey circles ($Y$) represent observed variables.
  - White circles ($X$) represent unobserved variables.
- Each edge represents a pairwise relationship between two random variables in V.

## Joint probability distribution for Graph G
$$
P(X,Y)=\frac 1 Z \space \prod_{(s,d) \in E} \space \psi_{sd}(X_s,X_d) \space \prod_{d \in V} \space \phi_{d}(X_d,Y_d)
$$
- $X=\{X_d \mid d \in V\}$ - set of unobserved variables
- $Y=\{Y_d \mid d \in V\}$ - set of observed variables
- Z - normalizing constant
- $\phi_d$ - **Unary potential**, describes the compatibility (likelihood) of $X_d$ with $Y_d$
- $\psi_{sd}$ - **Pairwise potential**, describes the compatibility (likelihood) of neighboring variables $X_s$ and $X_d$

## Belief Propagation (BP)
- BP provides an algorithm for inferring the marginal posterior distributions $\left(P(X_d \mid Y)\right)$, known as beliefs $bel_d(X_d)$.
- BP defines a message passing algorithm for calculating the beliefs:
$$
bel_d(X_d) \propto \phi_d\left(X_d,Y_d\right) \prod_{s\in p(d)} m_{s\to d}\left(X_d\right)
\tag{eq.2}
$$
- $p(d)$ denotes the set of neighboring nodes of d.
- The message from node s (source) to node d (destination) is defined as:
$$
m_{s\to d} \left(X_d\right) = \int_{X_s} \phi_s(X_s,Y_s) \psi_{sd}(X_s,X_d) \times \prod_{u\in p(s)\setminus d} m_{u\to s}(X_s) dX_s
\tag{eq.3}
$$
- $p(s)\setminus d$ denotes the neighbors of s excluding d.
- Computing these messages exactly over a continuous space is `intractable (need efficient algorithm for message passing)`.

# Differentiable Non Parametric Belief Propagation (DNBP)
- DNBP is a new method for BP that aims to improve how uncertainty in estimates is handled by efficiently approximating marginal posterior distributions in an MRF.
- Instead of hand-designing problem-specific potential functions, `Neural Networks` are used, making the method more flexible and adaptable.
- DNBP uses an iterative, differentiable message passing scheme to infer beliefs over hidden variables in an MRF.
- DNBP approximates the belief and message in Eq. 2 & 3 at iteration t by sets of `N & M weighted particles`:
$$
\begin{align}
\text{bel}_{d}^{t}(X_d) &= \left\{ \left( \mu_d^{(i)}, w_d^{(i)} \right) \right\}_{i=1}^{N} \\
m_{s \to d}^{t} &= \left\{ \left( \mu_{sd}^{(i)}, w_{sd}^{(i)} \right) \right\}_{i=1}^{M}
\end{align}
$$
  where:
  - $\mu$ represents a particle
    - particle information is basically a `position coordinate` (x, y)
  - $w$ represents the weight, combining the various modes described below
- A **PULL strategy** is used for message passing:
  - A message $m^t_{s\to d}$ is generated by first drawing M independent samples from $bel^{t-1}_d(X_d)$, then resampling and reweighting this set (see the sketch below).
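To make the PULL strategy concrete, here is a minimal, self-contained sketch (not the repository's implementation; `pull_message`, `edge_sampler`, `src_likelihood`, and `feats` are illustrative placeholders) of how a message could be formed by sampling from the destination's previous belief and reweighting with the source's unary potential:

```python=
# Minimal sketch of the PULL strategy (not the repository's implementation):
# particles of m_{s->d}^t are drawn from bel_d^{t-1}, translated into the
# source frame with a learned pairwise (edge) sampler, and reweighted by the
# source's unary potential.
import torch

def pull_message(bel_particles, bel_weights, edge_sampler, src_likelihood, feats, M):
    # bel_particles: (N, 2) particles of bel_d^{t-1}; bel_weights: (N,), normalized
    # 1. Resample M particles from the destination's previous belief.
    idx = torch.multinomial(bel_weights, M, replacement=True)
    mu_sd = bel_particles[idx]                               # (M, 2), in d's frame
    # 2. Translate into the source frame using deltas drawn from psi_sd.
    samples_in_src = mu_sd + edge_sampler(M)                 # (M, 2)
    # 3. Score with the source unary potential phi_s(X_s, Y_s) and normalize.
    w_sd = src_likelihood(torch.cat((samples_in_src, feats.expand(M, -1)), dim=1)).squeeze(-1)
    w_sd = w_sd / w_sd.sum()
    return mu_sd, w_sd
```

In the full method the reweighting also includes a neighborhood term ($w_{neigh}$) alongside the unary term ($w_{unary}$), as detailed in the weight-type sections below.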
# Loss Function
- Generate a density estimate with weighted Gaussian kernels.
- Implemented in `density_estimation` in `diffBP/networks/dnbp_synthetic/dnbp.py` (line 645).

1. **Density Estimation Equation**
    - The density of the belief can be expressed as a mixture of Gaussians, with a component centered at each particle.
$$
l(x_i|\theta)=\sum^N_{j=1}w_{\theta,i,j}\times\frac 1 {\sigma\sqrt{2\pi}}\times \exp\left(\frac{-1} 2 \left(\frac {x_i-p_{i,j}}{\sigma}\right)^2 \right)
$$
    - Where:
      - $x_i$ is the **ground truth** (position of node $i$)
      - $i$ is the **node id**
      - $j$ is the **particle id**
      - $p_{i,j}$ are the **belief particles** of node i
      - $\theta$ represents the mode:
        - **belief weights likelihood** (belief_weights_lik (or) w_lik)
        - **message weights unary**
        - **message_weights_neigh** (message weights neighborhood)
        - **belief weights**
          - This mode is not used in the loss (although it exists in the code), as
          - belief weights = normalized(belief_weights_lik * message_weights_unary * message_weights_neigh)
      - $w_{\theta,i,j}$ is the weight (value) of mode $\theta$ for the $j^{th}$ particle of the $i^{th}$ node.
2. **Final Loss**
    - The final loss is computed per node of a sequence; it is reset to 0 before computing the loss for the next node.
    - It is the **mean over the losses of all modes**:
$$
L_{i}=\frac 1 K \sum^K_{k=1} -\log\left(l(x_i|\theta_k)+10^{-50}\right)
$$
    - Where:
      - K = 4 is the number of modes
      - k indexes the modes ($\theta$) described above in the density estimation
      - i represents the node id

## Loss function formulation in paper
- Supervised training is used.
- Objective function:
$$
\begin{align}
\overline{bel}_{d}^{t}(x_d) &= \sum_{i=1}^{N} w_{d}^{(i)} \cdot \mathcal{N}(x_d; \mu_{d}^{(i)}, \Sigma) \\
L_{d}^{t} &= -\log\left(\overline{bel}_{d}^{t}(x_{d}^{t,*})\right)
\end{align}
$$
- where,
  - t is the time/frame number
  - d represents the node id
  - i indexes the belief particles (mixture components)
  - $w_d^{(i)}$ is the weight of particle i of node d
  - $x_d^{t,*}$ is the ground truth
  - $\mathcal N$ is the Gaussian density
![image](https://hackmd.io/_uploads/By9D8ue6Jl.png)

# Particle Information

# Modes $\left (\theta\right)$ / Weight Types

## 1. belief_weights_likelihood
- belief_weights_lik
- Shape: (batch_size, num_neighbors, particle_count)
- This stores the likelihood scores used for the beliefs.
![image](https://hackmd.io/_uploads/Bycxh_la1l.png)

<details>
<summary>Initialization Code</summary>

```python=
# diffBP/networks/dnbp_synthetic/dnbp.py
self.belief_weights_lik = [torch.ones(self.batch_size,
                                      len(self.inc_nghbrs[i]),
                                      self.particle_count).to(device=self.device).type(self.type)
                           / (len(self.inc_nghbrs[i]) * self.particle_count)
                           for i in range(self.num_nodes)]
```
</details>

- Used in the **belief_update** function (a sketch follows below):
  1. Pass global features ($Y_d$) & message particles ($X_d=\mu_{sd}^{(i)}$) into the node likelihood (**Unary Potential Function** $\phi_d$) to get the message likelihoods.
  2. Normalize the message likelihoods to get belief_weights_lik.
![image](https://hackmd.io/_uploads/SksUyOlpke.png)
(from 2. **Belief Update**)
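A minimal sketch of the two steps above, assuming flattened message particles and one feature vector per batch element (shapes and the `node_likelihood` callable are illustrative, not the repository's exact interface):

```python=
# Illustrative sketch (assumed shapes and names) of the belief_update steps:
# score every incoming message particle with the node's unary likelihood
# network phi_d, then normalize the scores to obtain belief_weights_lik.
import torch

def compute_belief_weights_lik(msg_particles, glbl_feats, node_likelihood):
    # msg_particles: (batch, num_neighbors, particle_count, particle_size)
    # glbl_feats:    (batch, feat_dim) observation features Y_d
    b, n, p, s = msg_particles.shape
    flat = msg_particles.reshape(b * n * p, s)
    feats = glbl_feats.repeat_interleave(n * p, dim=0)       # align features with particles
    lik = node_likelihood(torch.cat((flat, feats), dim=1))   # unary scores, (b*n*p, 1)
    lik = lik.view(b, n, p)
    return lik / lik.sum(dim=(1, 2), keepdim=True)           # normalized belief_weights_lik
```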
## 2. message weights unary
- message_weights_unary ($w_{unary}$)
- Shape: (batch_size, num_neighbors, particle_count)
- This stores the unary potential scores of the message particles.
![image](https://hackmd.io/_uploads/SkJUjdeT1l.png)

<details>
<summary>Initialization Code</summary>

```python=
# diffBP/networks/dnbp_synthetic/dnbp.py
self.message_weights = [torch.ones(self.batch_size,
                                   len(self.inc_nghbrs[_]),
                                   self.particle_count).to(device=self.device).type(self.type)
                        / (len(self.inc_nghbrs[_]) * self.particle_count)
                        for _ in range(self.num_nodes)]
```
</details>

- Updated in **message_update**
- ![image](https://hackmd.io/_uploads/SkdSGOxTke.png =60%x)
  (from **1. Message Update**)

1. Resampling Particles
    - A fraction of the belief particles (frac_resamp) is resampled to improve performance.
    - Uses either low-variance (systematic) resampling or random resampling; a standalone sketch of the low-variance scheme is given after this section.
    - Resampled particles are perturbed by a small sampled delta to add diversity.
    - The remaining particles are replaced with new random samples.

<details>
<summary>Code</summary>

```python=
if self.mode=='low_var':
    rand_chance = ((torch.arange(resamp_particles) / float(resamp_particles)).repeat(self.batch_size, len(self.inc_nghbrs[dst_]), 1)
                   + (torch.rand(self.batch_size, len(self.inc_nghbrs[dst_]), 1) / float(resamp_particles)))
else:
    rand_chance = torch.rand(self.batch_size, len(self.inc_nghbrs[dst_]), resamp_particles)

# batch x incoming x 1 x resamp_particles
rand_chance = rand_chance.unsqueeze(2).type(self.type).to(device=self.device)
```
</details>

2. Computing Message Unary Weights ($w_{unary}$)
    - Likelihood scores for the particles are calculated using the likelihood network.
    - The scores are normalized to get probability values.

<details>
<summary>Update Code</summary>

```python=
for src_i, src_ in enumerate(self.inc_nghbrs[dst_]):
    # Determine edge index of message pass from src_->dst_
    edge_i = ((self.graph == torch.tensor([[min(src_, dst_)],
                                           [max(src_, dst_)]])).all(dim=0).nonzero().squeeze(0)
              == self.edge_set).nonzero().squeeze(0)

    # Isolate outgoing particles from src_->dst_ for w_unary calculation
    # These outgoing particles are in the frame of dst_
    msgs = self.message_particles[dst_][:, src_i].contiguous().view(-1, 1, self.particle_size)

    # Generate delta samples to translate particles from dst_ frame to src_ frame
    if self.multi_edge_samplers:
        if src_ < dst_:
            s = self.edge_samplers[edge_i][0](msgs.shape[0]*num_int)
        else:
            s = self.edge_samplers[edge_i][1](msgs.shape[0]*num_int)
    else:
        if src_ < dst_:
            s = self.edge_samplers[edge_i](msgs.shape[0]*num_int)
        else:
            s = -self.edge_samplers[edge_i](msgs.shape[0]*num_int)

    # Translate particles into src_ frame using sampled deltas
    samples = msgs + s.view(msgs.shape[0], num_int, -1)

    # Change view for feeding into network
    samples = samples.view(msgs.shape[0]*num_int, -1)

    # Concatenate features with particles and feed into likelihood network of src_
    # Remember, samples are now in src_'s frame
    # This is the key step that causes us to turn off likelihood gradients; we don't want the src_ likelihood
    # factor learning to weight particles based on feedback from dst_ ground truth
    if self.shared_feats:
        wgts = self.node_likelihoods[src_](torch.cat((samples,
                    glbl_feats.unsqueeze(1).repeat((1, self.particle_count * num_int, 1)).view(self.batch_size * self.particle_count * num_int, -1)),
                    dim=1)).view(self.batch_size, self.particle_count, num_int)
    else:
        wgts = self.node_likelihoods[src_](torch.cat((samples,
                    glbl_feats[src_].unsqueeze(1).repeat((1, self.particle_count * num_int, 1)).view(self.batch_size * self.particle_count * num_int, -1)),
                    dim=1)).view(self.batch_size, self.particle_count, num_int)

    # Average over the num_int samples
    wgts = wgts.mean(dim=-1)

    # Normalize scores of outgoing particles, these are the w_unary scores
    wgts = wgts / wgts.sum(dim=1, keepdim=True)

    # Store unary scores for use later
    self.message_weights_unary[dst_][:, src_i] = wgts
```
</details>
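For reference, here is a simplified, self-contained sketch of low-variance (systematic) resampling. It mirrors the `rand_chance` construction in the snippet above, but the selection step against the cumulative weights is paraphrased, not copied from the repository:

```python=
# Sketch of low-variance (systematic) resampling for a single particle set.
import torch

def low_variance_resample(particles, weights, n_out):
    # particles: (P, D); weights: (P,), normalized to sum to 1
    # Evenly spaced positions in [0, 1) with a single shared random offset.
    positions = (torch.arange(n_out, dtype=weights.dtype) + torch.rand(1, dtype=weights.dtype)) / n_out
    # Pick, for each position, the particle whose CDF interval contains it.
    cum_weights = torch.cumsum(weights, dim=0)
    idx = torch.searchsorted(cum_weights, positions).clamp(max=weights.shape[0] - 1)
    return particles[idx]
```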
## 3. message weights neighborhood
- message_weights_neigh
- Shape: (batch_size, num_neighbors, particle_count)
- This stores the influence of neighboring messages on each message particle.
![image](https://hackmd.io/_uploads/SJyqodlaye.png)

<details>
<summary>Initialization Code</summary>

```python=
# diffBP/networks/dnbp_synthetic/dnbp.py
self.message_weights_neigh = [torch.ones(self.batch_size,
                                         len(self.inc_nghbrs[i]),
                                         self.particle_count).to(device=self.device).type(self.type)
                              / (len(self.inc_nghbrs[i]) * self.particle_count)
                              for i in range(self.num_nodes)]
```
</details>

- Updated in **message_update**
- ![image](https://hackmd.io/_uploads/r1sqL_eTJl.png)
  (from 1. **Message Update**)
- Computing Neighbor Weights ($w_{neigh}$)
  - The effect of neighboring nodes on the message particles is computed.
  - For each neighboring node:
    - The function finds the correct edge index in the graph.
    - Translates particles into the neighbor's frame using the **Edge Densities** (**pairwise density function** $\psi_{sd}$).
    - Computes probability scores for the translated particles.
    - Normalizes and stores the final neighbor influence weights.

# Message Update
![image](https://hackmd.io/_uploads/r1VP2ch2Jg.png)

# Belief Update
![image](https://hackmd.io/_uploads/ryFf6qh3Jl.png)
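The belief update shown above combines the three weight types described in this note (belief_weights_lik, message_weights_unary, message_weights_neigh). A minimal sketch of that combination, assuming each weight tensor has shape (batch_size, num_neighbors, particle_count) and using illustrative names (not the repository code):

```python=
# Sketch of the weight combination described in the Modes section:
# belief_weights = normalize(belief_weights_lik * message_weights_unary * message_weights_neigh)
import torch

def combine_belief_weights(w_lik, w_unary, w_neigh):
    # each input: (batch_size, num_neighbors, particle_count)
    w = w_lik * w_unary * w_neigh
    return w / w.sum(dim=(1, 2), keepdim=True)   # normalize per batch element
```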