# [Causal Discovery with Attention-Based Convolutional Neural Networks](https://www.mdpi.com/2504-4990/1/1/19)

*MDPI, 2019, Machine Learning and Knowledge Extraction (open-access journal)*

###### tags: `references`

1. Problem statement: Given a dataset $X$ containing $N$ observed continuous time series of the same length $T$, i.e., $X = \left\{ X_1, X_2, \dots, X_N \right\} \in \mathbb{R}^{N\times T}$, the goal is to discover the causal relationships among all $N$ time series in $X$ and the time delay between each cause and its effect.
   ![](https://i.imgur.com/6Z5BFoQ.png)
2. Workflow:
   ![](https://i.imgur.com/H5ExHiL.png)
3. Model architecture:
   ![](https://i.imgur.com/9rGMyMf.png)
4. Model implementation:
   * Based on a [Temporal Convolutional Network (TCN)](https://medium.com/@cyeninesky3/%E6%99%82%E9%96%93%E5%8D%B7%E7%A9%8D%E7%B6%B2%E7%B5%A1-tcn-%E9%97%9C%E6%96%BC%E5%BE%9E%E9%A2%A8%E6%8E%A7%E9%A0%85%E7%9B%AE%E7%95%B6%E4%B8%AD%E7%9A%84%E5%AD%B8%E7%BF%92-11693d762f5).
   * Adaptation to multivariate input: since a TCN is designed for univariate time series modeling, the study modifies it into a one-dimensional depthwise separable architecture in which the input time series stay separated. TCDF consists of one network $N_j$ per target series $X_j$, and each network has $N$ channels, one per input series.
   * Attention mechanism:
     * Each network $N_j$ has its own attention scores $a_j = \left\{ a_{1,j}, a_{2,j}, \dots, a_{N,j} \right\}$.
     * Each attention score $a_{i,j}$ is initialized to 1.
     * The learned scores are real-valued: $a_{i,j} \in \mathbb{R}$.
     * The scores are transformed into causal weights:
       $$h_{i,j}=\begin{cases}\sigma(a_{i,j}) & \text{if } a_{i,j}\geq t_j\\ 0 & \text{otherwise}\end{cases}$$
       where the threshold $t_j$ is set at the largest gap between the scores in $a_j$.
     * The set of potential causes $P_j$ contains every $X_i$ with $h_{i,j} > 0$.

   ![](https://i.imgur.com/QH0Hpk1.png)
5. Permutation importance validation: for each $X_i \in P_j$,
   * Randomly permute $X_i$ to form a new dataset $I_i$.
   * Denote by $L^1_G$ the loss in the first epoch and by $L^{final}_G$ the loss in the final epoch, where $G$ is the original dataset; $L^{final}_{I_i}$ is the corresponding final loss on $I_i$.
   * Compute $\Delta L_G = L^1_G - L^{final}_G$ and $\Delta L_{I_i} = L^1_G - L^{final}_{I_i}$.
   * If $\Delta L_{I_i} < 0.8\,\Delta L_G$, then $X_i$ is a true cause of $X_j$.
6. Delay discovery: for a true cause $X_i$ of $X_j$, the delay is derived from the $i$-th channel of network $N_j$ as follows:
   ![](https://i.imgur.com/BHbKCC4.png)
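The depthwise separable adaptation in item 4 can be illustrated with a minimal pure-Python sketch. It assumes a causal, dilated 1-D convolution as used in a TCN (each output step sees only the current and earlier values), applies a separate kernel to each input series (the depthwise step), and then mixes the channels with per-channel weights (a pointwise step). The function names, kernel values, and pointwise weights are hypothetical; this is not TCDF's actual implementation.

```python
from typing import List

Series = List[float]


def causal_dilated_conv1d(x: Series, kernel: Series, dilation: int = 1) -> Series:
    """Causal 1-D convolution: out[t] uses only x[t], x[t - d], x[t - 2d], ...

    Positions before the start of the series count as zeros (left padding),
    so the output has the same length as the input.
    """
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for m in range(k):
            # kernel[-1] weights the current value, kernel[0] the oldest one
            idx = t - (k - 1 - m) * dilation
            if idx >= 0:
                acc += kernel[m] * x[idx]
        out.append(acc)
    return out


def depthwise_separable_layer(
    series: List[Series], kernels: List[Series], pointwise: Series, dilation: int = 1
) -> Series:
    """Depthwise step keeps the series separated; pointwise step mixes channels per time step."""
    # Depthwise: series i is convolved only with its own kernel i
    channels = [causal_dilated_conv1d(x, w, dilation) for x, w in zip(series, kernels)]
    # Pointwise (hypothetical 1x1 mixing): weighted sum of the channels at each time step
    length = len(series[0])
    return [sum(p * ch[t] for p, ch in zip(pointwise, channels)) for t in range(length)]


if __name__ == "__main__":
    # A kernel [1, 0] copies the previous value, i.e. it encodes a one-step delay
    print(causal_dilated_conv1d([1, 2, 3, 4], [1, 0]))              # [0.0, 1.0, 2.0, 3.0]
    print(causal_dilated_conv1d([1, 2, 3, 4], [1, 0], dilation=2))  # [0.0, 0.0, 1.0, 2.0]
```

Stacking such layers with growing dilation (1, 2, 4, ...) widens the receptive field, and keeping the depthwise step per series is what allows channel $i$ of $N_j$ to be inspected on its own.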
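The thresholding rule for $t_j$ in item 4 can be sketched as follows, under the assumption that "largest gap" means the largest difference between consecutive scores after sorting $a_j$ in descending order, with $t_j$ being the score just above that gap. Any additional constraints the original implementation may apply are not modeled, and the function names are hypothetical. Because $\sigma(\cdot)$ is strictly positive for common choices such as a sigmoid or softmax, $h_{i,j} > 0$ exactly when $a_{i,j} \geq t_j$, so $P_j$ can be read off the raw scores.

```python
from typing import Dict, List


def largest_gap_threshold(scores: List[float]) -> float:
    """Return t_j: the score just above the largest gap between consecutive sorted scores."""
    s = sorted(scores, reverse=True)
    if len(s) < 2:
        return s[0]
    gaps = [s[k] - s[k + 1] for k in range(len(s) - 1)]
    # Index of the largest gap (the first one wins on ties)
    k_star = max(range(len(gaps)), key=lambda k: gaps[k])
    return s[k_star]


def potential_causes(a_j: Dict[str, float]) -> List[str]:
    """Candidate set P_j: every input series whose attention score reaches t_j."""
    t_j = largest_gap_threshold(list(a_j.values()))
    # Assuming sigma(.) > 0, h_ij > 0 holds exactly when a_ij >= t_j
    return [name for name, a in a_j.items() if a >= t_j]


if __name__ == "__main__":
    a_j = {"X1": 2.9, "X2": 2.6, "X3": 1.1, "X4": 0.9}  # made-up learned scores
    print(largest_gap_threshold(list(a_j.values())))  # 2.6
    print(potential_causes(a_j))                      # ['X1', 'X2']
```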
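The permutation importance validation in item 5 can be sketched as below. This assumes that $L^{final}$ on a dataset is obtained by evaluating the already trained network $N_j$ on it (one reading of the note) and that $L^1_G$ is known from training. `toy_final_loss` is a hypothetical stand-in for a trained $N_j$ that has learned $X_2[t] = X_0[t-1]$ and ignores $X_1$; all data are made up.

```python
import random
from typing import Callable, List, Sequence

Dataset = List[List[float]]  # N series, each of length T


def permute_series(data: Dataset, i: int, rng: random.Random) -> Dataset:
    """Build I_i: a copy of the dataset with series X_i randomly permuted in time."""
    permuted = [list(series) for series in data]
    rng.shuffle(permuted[i])
    return permuted


def validate_causes(
    data: Dataset,
    candidates: Sequence[int],
    loss_first_epoch: float,
    final_loss: Callable[[Dataset], float],
    factor: float = 0.8,
    seed: int = 0,
) -> List[int]:
    """Keep X_i from P_j as a true cause if Delta L_{I_i} < factor * Delta L_G."""
    rng = random.Random(seed)
    delta_g = loss_first_epoch - final_loss(data)
    true_causes = []
    for i in candidates:
        delta_i = loss_first_epoch - final_loss(permute_series(data, i, rng))
        # Permuting a real cause destroys information: the final loss stays high,
        # so Delta L_{I_i} shrinks relative to Delta L_G
        if delta_i < factor * delta_g:
            true_causes.append(i)
    return true_causes


def toy_final_loss(data: Dataset) -> float:
    """Hypothetical trained N_j for target X_2: predicts X_2[t] from X_0[t-1] only (MSE)."""
    x0, _, x2 = data
    errors = [(x2[t] - x0[t - 1]) ** 2 for t in range(1, len(x2))]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    x0 = [30.0, 0.0, 70.0, 10.0, 90.0, 40.0, 20.0, 80.0, 50.0, 60.0]
    x1 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]  # unrelated series
    x2 = [0.0] + x0[:-1]                                      # X_2[t] = X_0[t-1]
    data = [x0, x1, x2]
    print(validate_causes(data, [0, 1], loss_first_epoch=1.0, final_loss=toy_final_loss))
```

The factor 0.8 is the threshold stated in the note; exposing it as a parameter makes it easy to check how sensitive the verdict is to that choice.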