**I have done a lot of work trying to convert your notation to the adjacency matrix. Your generation model in Section 3.2 essentially generates an adjacency matrix with a particular row order. I think a good practice is to follow notations established by previous work (GraphRNN, OM, and others). If you do so, then your derivation error becomes obvious.**

We cannot use exactly the same notation as previous work (GraphRNN, OM). The previous works first use the node ordering $\sigma$ to permute the adjacency matrix and then autoregressively model the probability of the permuted adjacency matrix $p(A^{\sigma})$. In their notation, $A_{t}^{\sigma}$ represents the first $t$ rows of the permuted adjacency matrix $A^{\sigma}$. However, the permuted adjacency matrix $A^{\sigma}$ does not contain the exact node ordering information $\sigma$, since multiple node orderings can lead to the same adjacency matrix. In our model, by contrast, we predict the edge connectivity of the exact node $v_{\sigma_t}$ at each step and thus generate/denoise the graph in the exact node order $\sigma$.

An equivalent way to use the adjacency matrix is to first fix the matrix $A$ with a random row order and assign each row to a node of the graph. We then unmask the $\sigma_t$-th row of the fixed matrix $A$ at each step $t$, starting from the fully masked matrix $A_n$, so that $A_t$ denotes the matrix in which rows $\sigma_i$, $i=t,\cdots,n$, are unmasked. **However, this is exactly the same as our graph notation** --- predicting the edge connectivity of node $v_{\sigma_t}$ in $G_n$ at each step. Replacing $G_t$ with $A_t$ yields exactly the same derivation.

Why can we use the exact node ordering information while previous work cannot? In our framework, the node ordering $\sigma_{1:n}$ is sampled in the forward diffusion process, which essentially assigns a unique ID to each node of $G_0$. In the generation procedure, our starting point is the masked graph $G_n$ with $n$ nodes, and our generator network sequentially denoises node $v_{\sigma_t}$ in $G_n$ in the reverse order of the diffusion ordering used during training. In contrast, the previous works can only sequentially grow the permuted adjacency matrix, and such a permuted matrix loses the exact node ordering information.

**For example, you mention that $p(G_{1:n-1}|\sigma_{1:n})=0$ when $\sigma_{1:n} \notin \Pi(G_{1:n})$. However, with any node order $\sigma$, the generative model in section 3.2 can always generate a graph sequence $G_{1:n-1}$ but with nodes labeled according to $\sigma$.**

Yes, every node ordering $\sigma_{1:n}$ can lead to *some* graph sequence $G_{1:n}$. However, during the marginalization, one needs to integrate over all $\sigma_{1:n}$ for one *particular* graph sequence $G_{1:n}$. For $\sigma_{1:n}\notin \Pi(G_{1:n})$, we have $p(G_{1:n-1}|\sigma_{1:n})=0$, since such a node ordering cannot lead to this particular graph sequence $G_{1:n}$.

**Furthermore, you may need to explain this observation: if we use $p(G_{1:n})=(n!)p(G_{1:n},\sigma_{1:n})$, then your variational lower bound match exactly to that of (Chen et al., 2021).**

First, as we illustrated above, your marginalization and your assumption about $p(\sigma_{1:n}|G_n)$ are incorrect, so this equation does not hold: the correct marginalization is $p(G_{1:n})=\sum_{\sigma_{1:n}\in \Pi(G_{1:n})} p(G_{1:n},\sigma_{1:n})$, and the number of non-zero terms is $|\Pi(G_{1:n})|$ rather than $n!$. Second, even if this equation were correct, one still could not obtain the exact VLB of Chen et al.: the resulting bound would involve $n!$, whereas the VLB of Chen et al. requires computing $|\Pi(G_{1:n})|$.
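To make this counting argument concrete, below is a minimal sketch (on a hypothetical 4-node path graph, an illustrative choice that is not an example from the paper) showing that, in the GraphRNN/OM permuted-matrix view, many node orderings collapse to the same $A^{\sigma}$, so the number of orderings consistent with a given sequence is $|\Pi(G_{1:n})|$ and can be far smaller than $n!$:

```python
import itertools
import math
import numpy as np

# Hypothetical toy graph: a path v_1 - v_2 - v_3 - v_4 (0-indexed rows).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
n = len(A)

# GraphRNN/OM view: the ordering sigma survives only through the permuted
# matrix A^sigma with entries A[sigma_i, sigma_j]. Since each prefix
# A^sigma_t is just a leading block of A^sigma, grouping orderings by the
# full permuted matrix is the same as grouping them by the whole sequence.
groups = {}
for sigma in itertools.permutations(range(n)):
    key = A[np.ix_(sigma, sigma)].tobytes()
    groups.setdefault(key, []).append(sigma)

sizes = {len(g) for g in groups.values()}
print(f"{math.factorial(n)} orderings -> {len(groups)} distinct sequences; "
      f"group sizes = {sizes}")
# Output: 24 orderings -> 12 distinct sequences; group sizes = {2}.
# For any fixed sequence, the other 22 orderings contribute probability
# zero -- exactly the statement p(G_{1:n-1} | sigma_{1:n}) = 0 for
# sigma_{1:n} not in Pi(G_{1:n}) -- and here |Pi| = 2, not n! = 24.
```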
## Important concerns about Reviewer dB1j

Dear AC,

We wish to bring the following concerns about Reviewer dB1j to your attention. We have discussed the derivation of the VLB with Reviewer dB1j over multiple rounds and addressed every issue he/she raised. Nevertheless, he/she lowered the score to 1 without convincing reasons, which we find extremely unfair.

**"I find a critical derivation error in the submission. In particular, the submission bases its derivation on equation 8, which is from (Chen et al., 2021), but the term $p(G_{1:n},\sigma_{1:n})$ in this submission is different from that in the previous work, so the derivation is incorrect."**

We have clearly illustrated in https://openreview.net/forum?id=98J48HZXxd5&noteId=5C49W4uFKP that the reviewer performs an incorrect marginalization and thus reaches an incorrect conclusion. The major error the reviewer makes is assuming that the generation probability $p(G_{0:n-1}|\sigma_{1:n})$ is the same for all $\sigma_{1:n}$. However, during the marginalization, we need to integrate over all $\sigma_{1:n}$ for one particular graph sequence. Following Chen et al., we define $\Pi(G_{1:n})$ as the set of all node orderings that lead to the graph sequence $G_{1:n}$. When $\sigma_{1:n}\notin \Pi(G_{1:n})$, $p(G_{0:n-1}|\sigma_{1:n})=0$, since such a node ordering cannot lead to this particular graph sequence. Therefore, the reviewer's marginalization is incorrect.

**"The notation system in this submission is not well defined. It should follow previous work and use adjacency matrices to derive probabilities."**

Our notation system is well defined using the graph notation $G=(V,E)$ with node set $V=\{v_1,\cdots, v_n \}$ and edge set $E\subseteq \{(v_i,v_j)\mid v_i,v_j\in V \}$. We cannot use exactly the same notation as previous work (GraphRNN, OM). The previous works first use the node ordering $\sigma$ to permute the adjacency matrix and then autoregressively model the probability of the permuted adjacency matrix $p(A^{\sigma})$. In their notation, $A^{\sigma}_t$ represents the first $t$ rows of the permuted adjacency matrix $A^{\sigma}$. However, the permuted adjacency matrix $A^{\sigma}$ does not contain the exact node ordering information, since multiple node orderings $\sigma$ can lead to the same adjacency matrix. In our model, by contrast, we predict the edge connectivity of the exact node $v_{\sigma_t}$ at each step and thus generate the graph in the exact node order $\sigma_{1:n}$.

An equivalent way to use the adjacency matrix is to first fix the matrix $A$ with a random row order and assign each row to a node of the graph. We then unmask the $\sigma_t$-th row of the fixed matrix $A$ at each step $t$, starting from the fully masked matrix $A_n$, so that $A_t$ denotes the matrix in which rows $\{ \sigma_{i}\}_{i=t}^n$ are unmasked; see the sketch below. **However, this is exactly the same as our graph notation** --- predicting the edge connectivity of node $v_{\sigma_t}$ in $G_n$ at each step. Replacing $G_t$ with $A_t$ yields exactly the same derivation.

To summarize, we strongly believe Reviewer dB1j is biased and is not able to objectively evaluate our paper.
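For concreteness, here is a minimal sketch of the row-unmasking view described above, assuming a hypothetical 4-node graph and using `-1` as a placeholder for masked entries (both illustrative choices, not taken from the paper):

```python
import numpy as np

MASK = -1  # illustrative placeholder for a masked entry

# Fix an adjacency matrix A with an arbitrary (but fixed) row order and
# assign each row to a node; here a hypothetical 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
n = len(A)

rng = np.random.default_rng(0)
sigma = rng.permutation(n)  # diffusion ordering sigma_1, ..., sigma_n (0-indexed)

# Generation direction: start from the fully masked matrix A_n and unmask
# the sigma_t-th row at each step t, mirroring the denoising of node
# v_{sigma_t} in the graph notation (for an undirected graph, the symmetric
# column entries are implied by the revealed rows).
A_t = np.full((n, n), MASK)
for t in range(n, 0, -1):  # reverse order of the diffusion ordering
    A_t[sigma[t - 1], :] = A[sigma[t - 1], :]
    print(f"t={t}: rows {sorted(sigma[t - 1:])} unmasked\n{A_t}")
```

Each intermediate state $A_t$ records exactly which rows (i.e., which identified nodes) are unmasked, which is precisely the node ordering information that the permuted matrix $A^{\sigma}$ discards.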