---
title: HP Discrete time formulation
---

**Objective**: To model the overall influence of each venue in a latent network of computational linguistics venues from the timestamped usages of semantic innovations, treated as cascades on this latent network.

We model a cascade as a multivariate [Hawkes process](https://mathworld.wolfram.com/HawkesProcess.html), a special type of inhomogeneous Poisson process in which the instantaneous rate of events depends on the history of past events, each of which can "excite" future events. The instantaneous rate of events in Poisson processes (and point processes in general) is determined by a time-varying intensity function.

The outline of this note is as follows.

* First, we'll start with the assumption that a set of venues (or sources) emit events whose timestamps are recorded in continuous time. We'll parameterize the intensity function with dyadic influences between the sources. We'll call this the **C**ontinuous time **V**anilla **H**awkes **P**rocess (CVHP) model.
* Next, we'll relax the assumption of continuous time and extend the vanilla HP model to discrete time. We'll call this the **D**iscrete time **V**anilla **H**awkes **P**rocess (DVHP) model.
* Then, we'll extend the DVHP model to include overall individual influence instead of dyadic influence. Effectively, this is a coarsened model since it aggregates pairwise influences. We'll call this the **D**iscrete time **C**oarsened **H**awkes **P**rocess (DCHP) model.
* Finally, we'll adjust the DCHP model to account for some venues not being active at some of the discrete times. We'll call this the **A**djusted DCHP (A-DCHP) model.

## Basic setup and notation

{%hackmd b-i7Qp9rTDaurB_IhqJ3Cw %}

## Continuous time Vanilla HP (CVHP)

In this model, each timestamp carries one and only one event. Hence, we see discrete events but in continuous time.

### Intensity

In the multivariate case, the intensity for a source $m \in \{1\dots M\}$ is defined as:

$$
\lambda_{m} (t)=\mu_{m} + \sum\limits_{\{i:t_i < t\}}^{N}\alpha_{m_i \rightarrow m}~\kappa(t - t_i),
$$

where $\mathbf{\mu} \in \mathbb{R}^{M}$ and $\mathbf{\alpha} \in \mathbb{R}^{M \times M}$ are the parameters of the model. The intensity per source $m$ can be read as the sum of a constant base rate ($\mu_m$) and the total "excitation" that the sources of past events exert on $m$ ($\alpha_{m_i \rightarrow m}$), each weighted by the time decay $\kappa(\cdot)$.

### Log likelihood

The log likelihood is given as:

$$
\begin{equation}
\mathcal{L}(\mathbf{\theta}) = \sum\limits_{i=1}^{N}\log\left(\lambda_{m_i}(t_i)\right) - \sum\limits_{m=1}^{M}\int\limits_{0}^{T}\lambda_m(s)ds,
\end{equation}
$$

where $\mathbf{\theta}=\{\mathbf{\mu},\mathbf{\alpha}\}$. The log likelihood can be thought of as a function whose value rises every time a source gets activated and falls while the sources remain inactive over the interval $[0,T)$.

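As a rough illustration of the two definitions above, the sketch below evaluates $\lambda_m(t)$ and the log likelihood directly from a list of events. It assumes an exponential kernel $\kappa(t)=e^{-t}$ (the kernel itself is defined in the setup note, so this choice is an assumption). The names `kappa`, `intensity`, and `log_likelihood`, the `alpha[source][target]` layout, and the midpoint-rule approximation of the integral are illustrative choices, not part of the model.

```python
import math


def kappa(dt):
    """Assumed decay kernel: kappa(dt) = exp(-dt)."""
    return math.exp(-dt)


def intensity(m, t, events, mu, alpha):
    """lambda_m(t): base rate plus decayed excitation from events strictly before t.

    events is a list of (t_i, m_i) pairs; alpha[m_i][m] is the influence of m_i on m.
    """
    excitation = sum(
        alpha[m_i][m] * kappa(t - t_i) for t_i, m_i in events if t_i < t
    )
    return mu[m] + excitation


def log_likelihood(events, mu, alpha, T, n_grid=2000):
    """Event term minus integrated intensity, the latter via a midpoint rule on [0, T)."""
    event_term = sum(
        math.log(intensity(m_i, t_i, events, mu, alpha)) for t_i, m_i in events
    )
    h = T / n_grid
    integral_term = 0.0
    for m in range(len(mu)):
        for k in range(n_grid):
            integral_term += intensity(m, (k + 0.5) * h, events, mu, alpha) * h
    return event_term - integral_term


# Toy usage with two sources (all values are made up for illustration).
toy_events = [(0.5, 0), (1.0, 1)]
toy_mu = [0.1, 0.2]
toy_alpha = [[0.3, 0.4], [0.5, 0.6]]
print(intensity(0, 2.0, toy_events, toy_mu, toy_alpha))
print(log_likelihood(toy_events, toy_mu, toy_alpha, T=3.0))
```

The numerical integral is only there to mirror the definition term by term; it is slow and approximate compared with a closed-form evaluation.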
For a source $m$, the integral term expands as:

$$
\begin{align}
\int\limits_{0}^T\lambda_m(s)ds&=\int\limits_{0}^T\left(\mu_{m} + \sum\limits_{\{i:t_i < s\}}^{N}\alpha_{m_i \rightarrow m}~\kappa(s - t_i)\right)ds \\
&= \int_{0}^{T}\mu_m~ds + \int_{0}^{T}\sum\limits_{\{i:t_i < s\}}^{N}\alpha_{m_i \rightarrow m} \kappa(s - t_i)ds \\
&=\mu_mT + \Lambda_m(0,T)
\end{align}
$$

**Notes**

* The overall integral $\Lambda_m(0,T)$ can be broken down into a sum of piecewise integrals bounded by the event timestamps, _i.e._,

  $$
  \Lambda_m (0,T) = \Lambda_m (0,t_1) + \Lambda_m (t_1, t_2) + \Lambda_m (t_2, t_3) + \dots + \Lambda_m (t_N,T)
  $$

* $\Lambda_m(t_{n-1}, t_{n})$, _i.e._, the integrated intensity between two consecutive timestamps, is:

  $$
  \begin{align}
  \Lambda_m (t_{n-1},t_{n}) &= \int\limits_{t_{n-1}}^{t_n}\sum\limits_{\{i:t_i < s\}}^{N}\alpha_{m_i \rightarrow m} \kappa(s - t_i)ds \\
  &= \int\limits_{t_{n-1}}^{t_n}\sum\limits_{\{i:t_i \le t_{n-1}\}}^{N}\alpha_{m_i \rightarrow m} \kappa(s - t_i)ds \\
  &= \sum\limits_{\{i:t_i \le t_{n-1}\}}^{N}\alpha_{m_i \rightarrow m}\left(\kappa(t_{n-1}-t_i) - \kappa(t_n-t_i)\right)
  \end{align}
  $$

  The last step relies on $\int\kappa(s - t_i)ds = -\kappa(s - t_i)$, which holds, e.g., for the exponential kernel $\kappa(t) = e^{-t}$.

* Adding consecutive integrated intensities leads to further simplification, canceling out all but the boundary $\kappa(\cdot)$ evaluations (with $\kappa(0) = 1$). The overall integrated intensity is given as:

  $$
  \Lambda_m(0,T) = \sum\limits_{i=1}^{N}\alpha_{m_i \rightarrow m}\left(1 - \kappa(T-t_i)\right)
  $$

Now the complete log likelihood in its glorious form can be rewritten as follows.

$$
\begin{align}
\mathcal{L}(\{m_i,t_i\}_{i=1}^{N};\mathbf{\mu},\mathbf{\alpha}) &= \sum\limits_{i=1}^{N}\log\left(\mu_{m_i} + \sum\limits_{\{j:t_j<t_i\}}^{N}\alpha_{m_j \rightarrow m_i}\kappa\left(t_i - t_j\right)\right) \\
&\quad - \sum\limits_{m=1}^{M}\left(\mu_mT + \sum\limits_{i=1}^{N}\alpha_{m_i \rightarrow m}\left(1 - \kappa(T-t_i)\right)\right)
\end{align}
$$

## Discrete time Vanilla HP (DVHP)

One assumption in CVHP is that each distinct timestamp carries only one event.
We'll now consider what happens when multiple events occur at the same timestamp. Let's assume that all events are distributed over $n$ timestamps, _i.e._, $\{ t_1,t_2,t_3,\dots,t_n \}$.

As a first cut, assume two discrete consecutive events that happen at $t_{i-1}$ and $t_i$ (so a single event at each timestamp, exactly as in the CVHP setting). The intensities for a source $m$ at these timestamps are as follows.

$$
\begin{align}
\lambda_m(t_{i-1}) &= \mu_m + \sum\limits_{t_j < t_{i-1}}\alpha_{m_j \rightarrow m}~\kappa(t_{i-1} - t_j) \\
&= \mu_m + \zeta(m,t_{i-1}) \\
\\
\lambda_m(t_{i}) &= \mu_m + \sum\limits_{t_j < t_i}\alpha_{m_j \rightarrow m}~\kappa(t_i - t_j) \\
&= \mu_m + \zeta(m,t_{i})
\end{align}
$$

Note that these equations define $\zeta(m,t)$ as the total decayed excitation on $m$ from all events before $t$. The relationship between $\zeta(m,t_{i-1})$ and $\zeta(m,t_{i})$ is given as follows:

$$
\begin{align}
\zeta(m,t_{i}) &= \sum\limits_{t_j < t_{i}}\alpha_{m_j \rightarrow m}~\kappa(t_{i} - t_j) \\
&= \left(\sum\limits_{t_j<t_{i-1}}\alpha_{m_j \rightarrow m}~\kappa(t_i-t_j)\right) + \alpha_{m_{t_{i-1}}\rightarrow m}\kappa(t_i - t_{i-1}) \\
&= \left(\sum\limits_{t_j<t_{i-1}}\alpha_{m_j \rightarrow m}~\kappa(t_i-t_{i-1}+t_{i-1}-t_j)\right) + \alpha_{m_{t_{i-1}} \rightarrow m}\kappa(t_i - t_{i-1}) \\
&= \left(\sum\limits_{t_j<t_{i-1}}\alpha_{m_j \rightarrow m}~\kappa(t_{i}-t_{i-1})\kappa(t_{i-1}-t_j)\right) + \alpha_{m_{t_{i-1}} \rightarrow m}\kappa(t_{i} - t_{i-1}) \\
&= \left(\kappa(t_i-t_{i-1})\sum\limits_{t_j<t_{i-1}}\alpha_{m_j \rightarrow m}~\kappa(t_{i-1}-t_j)\right) + \alpha_{m_{t_{i-1}} \rightarrow m}\kappa(t_i - t_{i-1}) \\
&= \kappa(t_i-t_{i-1})\zeta(m,t_{i-1}) + \alpha_{m_{t_{i-1}} \rightarrow m}\kappa(t_i - t_{i-1}) \\
&= \kappa(t_i-t_{i-1})\left(\zeta(m,t_{i-1}) + \alpha_{m_{t_{i-1}} \rightarrow m}\right)
\end{align}
$$

The fourth line uses the factorization $\kappa(a+b) = \kappa(a)\kappa(b)$, which holds for an exponential kernel. If a single event happens at $t_{i-1}$ then the above expression stands.
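
The single-event recursion can be sanity-checked numerically. The sketch below assumes $\kappa(t)=e^{-t}$ (an assumption, since the kernel is defined in the setup note) and compares the recursive update against the direct sum that defines $\zeta$. The function names, the zero-based indexing, and the `alpha[source][target]` layout are illustrative choices.

```python
import math


def kappa(dt):
    """Assumed decay kernel: kappa(dt) = exp(-dt)."""
    return math.exp(-dt)


def zeta_direct(m, i, times, sources, alpha):
    """zeta(m, t_i) as the direct sum over all events j with t_j < t_i."""
    return sum(
        alpha[sources[j]][m] * kappa(times[i] - times[j]) for j in range(i)
    )


def zeta_recursive(m, times, sources, alpha):
    """zeta(m, t_i) for every i, using
    zeta(m, t_i) = kappa(t_i - t_{i-1}) * (zeta(m, t_{i-1}) + alpha[m_{i-1}][m]).

    Assumes strictly increasing times with exactly one event per timestamp.
    """
    zetas = [0.0]  # no events precede the first timestamp
    for i in range(1, len(times)):
        decay = kappa(times[i] - times[i - 1])
        zetas.append(decay * (zetas[-1] + alpha[sources[i - 1]][m]))
    return zetas


# Toy usage: four events from two sources (values made up for illustration).
toy_times = [0.0, 0.7, 1.5, 3.2]
toy_sources = [0, 1, 1, 0]
toy_alpha = [[0.3, 0.4], [0.5, 0.6]]
for m in range(2):
    recursive = zeta_recursive(m, toy_times, toy_sources, toy_alpha)
    direct = [
        zeta_direct(m, i, toy_times, toy_sources, toy_alpha)
        for i in range(len(toy_times))
    ]
    print(m, max(abs(r - d) for r, d in zip(recursive, direct)))
```

The recursive form needs one pass over the timestamps per source, whereas the direct sum revisits every earlier event at each timestamp.
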
If multiple events have $t_{i-1}$ as their timestamp then the recursion for $\zeta$ gets modified to:

$$
\zeta (m, t_i) = \kappa(t_i - t_{i-1}) \left(\zeta(m,t_{i-1}) + \sum\limits_{t^{'}=t_{i-1}} \alpha_{m_{t^{'}}\rightarrow m}\right)
$$

* The summation of excitation accounts for all events whose timestamps are $t_{i-1}$.
* The intensity for any source $m$ can now be calculated iteratively by updating $\zeta$.
* The integral in the log likelihood can be replaced by a stepwise integral or a sum of intensities at these discrete times.

## Discrete time Coarse HP (DCHP)

We'll replace $\alpha_{m_{t^{'}}\rightarrow m}$ by $b_{m_{t^{'}}}\cdot c_{m} + s_{m_{t^{'}}}\mathbb{1}(m_{t^{'}} = m)$ so that the intensity equation becomes:

$$
\lambda_m (t_i) = \mu_m + \kappa(t_i - t_{i-1}) \left(\zeta(m,t_{i-1}) + \sum\limits_{t^{'}=t_{i-1}} \left(b_{m_{t^{'}}}\cdot c_{m} + s_{m_{t^{'}}}\mathbb{1}(m_{t^{'}} = m)\right)\right)
$$

Here, $\mathbf{b} \in \mathbb{R}^{M}$ are the influence transmission parameters; $\mathbf{c} \in \mathbb{R}^{M}$ are the influence reception parameters; and $\mathbf{s} \in \mathbb{R}^{M}$ are the self-excitation parameters.

## Adjusted Discrete time Coarse HP (A-DCHP)

Finally, it's possible that not all venues are active at all discrete times: for example, some venues, such as NAACL, are not held every third year. One way to adjust for this possibility is to define a per-source discretization.

As before, let the events be distributed over discrete timestamps $\{ t_1,t_2,t_3,\dots,t_n \}$. But let's assume that $t_{i}^{(m)}, \forall i \in \{1,\dots, n\}$ are the discrete timestamps at which the source $m$ is activated. The adjustment can be made by rewriting the way the integral is calculated in the log likelihood.
We can rewrite the integrated intensity as:

$$
\Lambda_m (0,T) = \Lambda_m (0,t_1^{(m)}) + \Lambda_m (t_1^{(m)}, t_2^{(m)}) + \Lambda_m (t_2^{(m)}, t_3^{(m)}) + \dots + \Lambda_m (t_n^{(m)},T)
$$

**Notes**:

* The crucial difference now is that the overall integral in the log likelihood is not made up of pieces evaluated at every discrete time ($t_i$) but only at the times when the source $m$ was active ($t_i^{(m)}$).
* We would also want to make the same modification to the base rate, which would change from $\mu_mT$ to $\mu_m\sum\limits_{i=1}^{n}\mathbb{1}(t_i^{(m)})$; _i.e._, the base rate contributes to the integral only at the times when the source is active.
* We calculate the integral numerically, because no closed form is available, using the iterative updates from DCHP.

###### tags: `HP models` `PhD` `css` `science of science` `semantic leadership`