**Summary of the key revisions:**
We thank all reviewers for the constructive feedback and suggestions. Below is a point-by-point response describing how these suggestions have been addressed. The key revisions include:
1. **Inclusion of three more baselines:** We have incorporated comparisons with three more baseline algorithms, namely GraphWavenet, STGODE, and LocaleGN. Frigate outperforms all of them.
2. **Inclusion of two more metrics:** For a more holistic quantification of performance, we have added RMSE and symmetric MAPE.
3. **Clarifications:** We have clarified/addressed the various misunderstandings and presentation issues.
With these changes, we hope the reviewers will find our manuscript acceptable. We would be happy to engage further with the reviewers in case there are queries/suggestions to improve our work further.
**Rating: BR**
**Q1. The authors list limitations of existing research (based on three strong hypotheses). Then, the author mentioned the characteristics of inductive bias in dynamic graph prediction applications, but there is a lack of sufficient discussion or experimental comparison of related works, such as [1] Li M, Tang Y, Ma W. Few-Sample Traffic Prediction With Graph Networks Using Locale as Relational Inductive Biases[J]. IEEE Transactions on Intelligent Transportation Systems, 2022.**
*Response:* We have added three more baselines, including the work referred to above (denoted as LocaleGN). The table below presents the results. Frigate continues to outperform all baselines. The implementations of all baselines were obtained from the respective authors.
**MAE Values:**
| **Model** | **Chengdu** | **Harbin** | **Beijing** |
|------------------|--------------------|------------------|------------------|
| Frigate | **3.547** | **73.559** | **5.651** |
| STNN [24] | 5.633 | 126.305 | 6.215 |
| DCRNN [13] | 4.893 | 124.109 | 6.122 |
| STGCN [27] | 5.712 | OOT | OOT |
| LocaleGN | 4.597 | 128.674 | 5.759 |
| GraphWavenet [20] | 5.308 | OOM | OOM |
| STGODE [6] | 4.693 | 122.493 | OOM |
In the above table, OOM denotes "Out of Memory" and OOT denotes "Out of Time". Among the newly added baselines, GraphWavenet and STGODE fail to scale to large graphs since they perform expensive matrix multiplications over the adjacency matrix of the graph. On large road networks, this imposes prohibitive demands on GPU memory. It is worth noting that the largest graphs on which GraphWavenet and STGODE are evaluated in their respective papers contain 325 and 1026 nodes, respectively.
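To make the memory argument concrete, here is a back-of-the-envelope sketch. The 325- and 1026-node sizes are from the GraphWavenet and STGODE papers; the 50,000-node network is a hypothetical city-scale example, not one of our datasets.

```python
# Memory footprint of a single dense float32 N x N adjacency matrix, as used
# by dense graph convolutions in GraphWavenet/STGODE-style architectures.
def dense_adj_gib(num_nodes: int, bytes_per_entry: int = 4) -> float:
    return num_nodes ** 2 * bytes_per_entry / 2**30

for n in [325, 1026, 50_000]:
    print(f"N = {n:>6}: {dense_adj_gib(n):8.3f} GiB")
# N =    325:    0.000 GiB  -> trivially fits on a GPU
# N =   1026:    0.004 GiB
# N =  50000:    9.313 GiB  -> a single matrix already strains a 16 GiB GPU,
#                              before activations and gradients are counted
```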
**Q2. The experiments in this paper are insufficient, even if some of the methods listed in Table 1 do not meet the conditions of inductive ability, the corresponding experimental results should be given in the experiment to facilitate the comparison of the proposed models from different perspectives.**
*Response:* As discussed above, we have now added three more baselines. The proposed algorithm Frigate continues to outperform all of them. In addition, we have now also measured accuracy using two additional metrics: RMSE and symmetric MAPE. The results are provided below. As shown, Frigate continues to outperform all baselines across all metrics.
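For clarity, the definitions we use for these metrics are given below; we adopt the common 0-200 formulation of symmetric MAPE, which is consistent with the value ranges reported in the table (sMAPE is bounded above by 200).

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},\qquad \mathrm{sMAPE}=\frac{200}{n}\sum_{i=1}^{n}\frac{\left|y_i-\hat{y}_i\right|}{\left|y_i\right|+\left|\hat{y}_i\right|}$$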
Table: Performance of Frigate and all baseline algorithms measured using RMSE and symmetric MAPE. The lowest error for each dataset is highlighted in **bold**.
| Model | Dataset | sMAPE | RMSE |
|--------------|---------|-------------------|-------------------|
| Frigate | Chengdu | **131.17 ± 3.39** | **5.934 ± 0.31** |
| Frigate | Harbin | **93.03 ± 1.89** | **105.012 ± 3.51** |
| Frigate | Beijing | **169.73 ± 0.45** | **11.872 ± 0.30** |
| LocaleGN | Chengdu | 192.80 ± 0.19 | 8.573 ± 0.33 |
| LocaleGN | Harbin | 169.54 ± 1.43 | 207.358 ± 6.58 |
| LocaleGN | Beijing | 172.73 ± 0.33 | 12.103 ± 0.03 |
| STNN | Chengdu | 199.860 ± 0.00 | 9.501 ± 0.50 |
| STNN | Harbin | 199.682 ± 0.02 | 206.912 ± 6.93 |
| STNN | Beijing | 199.990 ± 0.01 | 12.819 ± 0.27 |
| DCRNN | Chengdu | 138.171 ± 2.07 | 8.099 ± 0.48 |
| DCRNN | Harbin | 185.731 ± 0.68 | 204.750 ± 6.36 |
| DCRNN | Beijing | 195.564 ± 0.15 | 12.631 ± 0.24 |
| STGODE | Chengdu | 147.12 ± 2.25 | 7.725 ± 0.39 |
| STGODE | Harbin | 152.28 ± 1.27 | 198.811 ± 6.74 |
| STGODE | Beijing | OOM | OOM |
| GraphWavenet | Chengdu | 147.772 ± 1.81 | 8.889 ± 0.43 |
| GraphWavenet | Harbin | OOM | OOM |
| GraphWavenet | Beijing | OOM | OOM |
| STGCN | Chengdu | 198.382 ± 0.00 | 9.472 ± 0.50 |
| STGCN | Harbin | OOT | OOT |
| STGCN | Beijing | OOT | OOT |
**Q3. Some key relevant literature has been ignored, such as the following two methods MTGNN and GMSDR: [1] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Xiaojun Chang, and Chengqi Zhang. 2020. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20). [2] Dachuan Liu, Jin Wang, Shuo Shang, and Peng Han. 2022. MSDR: Multi-Step Dependency Relation Networks for Spatial-Temporal Forecasting. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22).**
*Response:* We thank the reviewer for pointing out these works. We will cite and include them in our related work discussion.
**Q4. The paper needs to be well polished, for example, Figure 6 takes up a lot of space, but there is not much information.**
*Response:* Thank you for the constructive feedback. We will present Figure 6 in a more concise form and use the reclaimed space for a more comprehensive related-work discussion and for analysis of the additional baselines discussed in Q1. In addition, we will do a thorough review of the entire paper to further improve the presentation quality.
If the reviewer has further recommendations for improving the presentation, we will be happy to engage in a discussion and incorporate all of them.
**Appeal to the reviewer:** We have added three more baselines and additional metrics to address Q1 and Q2. We will cite the referred works in Q3 and incorporate the presentation suggestions in Q4. We hope these changes will convince the reviewer of the merits of this work. **If the reviewer agrees, we humbly appeal to please consider increasing our rating.**
-------------------------
**Rating: Accept**
**The expression of the article is somewhat difficult to understand. Maybe it could be reorganized to make the article more smoothly.**
*Response:* Thank you for the constructive feedback. We will thoroughly re-evaluate the presentation and simplify the exposition.
**This paper used DCRNN, STGCN and STNN as the baselines. Are there other deep learning methods that can be also used for this task? Please give more experimental analysis.**
*Response:* To make the empirical evaluation more comprehensive, we have added three more baselines. The table below presents the results. Frigate continues to outperform all baselines. The implementations of all baselines were obtained from the respective authors.
**MAE Values:**
| **Model** | **Chengdu** | **Harbin** | **Beijing** |
|------------------|--------------------|------------------|------------------|
| Frigate | **3.547** | **73.559** | **5.651** |
| STNN [24] | 5.633 | 126.305 | 6.215 |
| DCRNN [13] | 4.893 | 124.109 | 6.122 |
| STGCN [27] | 5.712 | OOT | OOT |
| LocaleGN [citation below] | 4.597 | 128.674 | 5.759 |
| GraphWavenet [20] | 5.308 | OOM | OOM |
| STGODE [6] | 4.693 | 122.493 | OOM |
In the above table, OOM denotes "Out of Memory" and OOT denotes "Out of Time". Among the newly added baselines, GraphWavenet and STGODE fail to scale to large graphs since they perform expensive matrix multiplications over the adjacency matrix of the graph. On large road networks, this imposes prohibitive demands on GPU memory. It is worth noting that the largest graphs on which GraphWavenet and STGODE are evaluated in their respective papers contain 325 and 1026 nodes, respectively.
*LocaleGN: Li M, Tang Y, Ma W. Few-Sample Traffic Prediction With Graph Networks Using Locale as Relational Inductive Biases[J]. IEEE Transactions on Intelligent Transportation Systems, 2022.*
--------------------------
**Rating: BR (has promised to revisit score if our answers are convicing)**
**Q1. There are vulnerabilities in Lemma 2. What do identical embeddings mean in line 635? (Does it mean that embedding torchs are the same?) Why DCRNN or STGCN would not be able to distinguish between them in line 655? This sentence is hard to convince, I agree that GCN layers have an over-smoothing problem, but this is different from being indistinguishable. In addition, GCNs are over-smoothed in some large graph tasks but seem to be unexplored and well-defined in the field of traffic prediction, which further increases my suspicion. It is better to give an experiment to explain over-smoothing in traffic flow prediction task. Why is the embedding of v1 [2,2]? Why do traffic patterns have strong positional context? Many studies have found that location-independent sensor traffic is also similar. Also, the following example is not very appropriate, residential districts and offices seem to be POI.**
*Response:*
* **Identical embeddings:** Let $\mathbf{h}_v$ and $\mathbf{h}_u$ be the embeddings of nodes $v$ and $u$ respectively. Identical embeddings means $\mathbf{h}_v=\mathbf{h}_u$. ***We will state this explicitly in the revised version.***
* **Why would DCRNN and STGCN not be able to distinguish?** In Lemma 2 of GIN [21], it has been established that if the $\ell$-hop neighborhoods of two nodes $u$ and $v$ are isomorphic, then they are guaranteed to have the same embeddings, i.e., $\mathbf{h}_v=\mathbf{h}_u$, under any $\ell$-layered message-passing GNN that aggregates representations of their neighbors. Since both DCRNN and STGCN use message-passing GNNs, Lemma 2 of GIN applies to them. Consequently, they will not be able to distinguish node $v_1$ from node $v_2$ in Fig. 4, i.e., it is guaranteed that $\mathbf{h}_{v_1}=\mathbf{h}_{v_2}$. Note that this is a ***provably correct assertion*** and hence does not require an experiment to validate. Lemma 2 of GIN is a seminal result that forms the core of the pursuit to develop more expressive graph neural networks. For further details, we point to the survey at [1]. ***To ease the comprehension of this discussion, we will include these references and a more detailed version of the proof in the appendix.*** A minimal toy illustration is sketched after this list.
* **Over-smoothing:** The above result is unrelated to over-smoothing. Over-smoothing is the phenomenon where the iterative message-passing process used to update node representations leads to all nodes having similar or identical representations. We agree that over-smoothing is not common in traffic networks since traffic networks have large diameters. Over-smoothing is more common in small-world networks, where diameters are small; there, a deep GNN results in all nodes receiving similar messages and hence similar embeddings. In Fig. 7b, we empirically demonstrate that over-smoothing is not an issue for Frigate since the performance continues to improve up to 10 layers.
* **Why is the embedding of $v_1$ $[2,2]$?** First, we note that $[2,2]$ is the **Lipschitz** embedding of $v_1$ and ***not*** the GNN embedding (this is mentioned in line 658). As defined in Def. 3, the Lipschitz embedding of a node is the distance to the anchors. In this figure, the anchors are nodes $a_1$ and $a_2$ (line 657). The embedding of $[2,2]$ follows from Def. 3. We note, however, that there is an error in the Lipschitz embedding of $v_2$. It should be $[5,5]$ instead of $[3,3]$. ***We apologize for this error and will correct it in the revised version.***
* **Why do traffic patterns have strong positional context?** The traffic pattern at a node is a function of both its neighborhood traffic and local factors (presence of PoIs, width of the road, etc.), which may not be explicitly available to us. A message-passing GNN is expressive enough to model only the neighborhood factors (as outlined above via Lemma 2 of GIN). Through positional embeddings, we make the model expressive enough to also pick up local signals and to ignore or consider neighborhood impact as a function of the node position. For example, consider two nodes that have isomorphic neighborhoods, where the first node corresponds to an airport and the second to an office. With positional information, the GNN can differentiate between these nodes and learn that the position associated with the airport has a different correlation with its neighborhood than the node corresponding to the office. ***We will include this discussion to more clearly surface the importance of positional information.***
* **Empirical impact of positional information:** To substantiate the impact of positional information, we point to Fig. 7a and the associated discussion in Sec 4.5. Here, we conduct an ablation study and show that if the positional embeddings are removed, then there is a significant drop in accuracy. This result empirically substantiates the importance of capturing positional information in node embeddings.
[1] Weisfeiler and Leman go Machine Learning: The Story so far. Christopher Morris, Yaron Lipman, Haggai Maron, Bastian Rieck, Nils M. Kriege, Martin Grohe, Matthias Fey, Karsten Borgwardt. arXiv, last updated March 2023.
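To make the two points above concrete, the following is a minimal self-contained sketch on a toy path graph of our own construction (not the graph of Fig. 4):

```python
import numpy as np
import networkx as nx

# Toy example: an 8-node path graph, with node degree as the initial feature.
G = nx.path_graph(8)
h = {v: np.array([float(G.degree(v))]) for v in G}

# Two rounds of a generic message-passing update (mean over self + neighbors).
# Any aggregator of this form is covered by Lemma 2 of GIN [21].
for _ in range(2):
    h = {v: np.mean([h[u] for u in [v] + list(G[v])], axis=0) for v in G}

print(np.allclose(h[3], h[4]))  # True: nodes 3 and 4 have isomorphic 2-hop
                                # neighborhoods, so no 2-layer message-passing
                                # GNN can tell them apart.

# Lipschitz embedding (Def. 3): the vector of distances to anchor nodes.
anchors = [0, 7]
lip = {v: [nx.shortest_path_length(G, v, a) for a in anchors] for v in G}
print(lip[3], lip[4])  # [3, 4] vs. [4, 3]: the positional embeddings differ,
                       # so concatenating them restores the distinction.
```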
**Q2. There are a few vulnerabilities in Lemma 3. The author seems to only discuss the expressive ability of spatial module, however, the spatio-temporal model is the joint operation of the spatial model (GCNs) and the temporal module (TCN or LSTM). In addition, the expressiveness of the spatiotemporal model does not seem to be a simple superposition of A(temporal module )+B(spatial module), so it is not convincing to infer that the proposed model is more expressive simply based on the strong expressiveness of the spatial module for traffic prediction task. In addition, I checked the original manuscript of GIN and agreed that the strong expressiveness of GIN lies in the injection aggregation, which can better distinguish the structure of the graph, so the performance is better on the graph classification task. I am not sure that it is still true for the node flow prediction task. Can the author provide relevant references or experiments?**
*Response:* We agree with the above comment that the expressivity analysis applies only to the GNNs employed by the various algorithms. The GNN models only the spatial components, while the temporal components are modeled using auto-regressive architectures such as the GRU in DCRNN, the LSTM in Frigate, and 1-dimensional causal CNNs in STGCN. Hence, based on the higher expressivity of the GNN alone, we cannot call the entire algorithm more expressive. We realize that this means we may have over-claimed in some places. We have done a thorough review of the text, and the key points are as follows:
* Lemma 3 in itself is fine since it only proves that the proposed GNN is as powerful as 1-WL. 1-WL only characterizes the spatial component.
* Corollary 1 should be rephrased. It should be altered as follows:
> The **GNN in** Frigate is strictly more expressive than **the GNNs in** DCRNN and STGCN.
* Our claim in the abstract is fine since it says:
> We prove that the proposed GNN architecture is provably more expressive than message-passing GNNs used in state-of-the-art algorithms.
* Lines 178-181 need to be rephrased as follows:
> We establish that augmenting node representations with Lipschitz embeddings makes **the GNN** in Frigate provably more expressive **than solely message-passing GNNs used in existing forecasting methods such as DCRNN, STGCN, etc.**
* Remove the sentence "As we established...more expressive" in lines 863-864.
In terms of experiments, Table 3 shows that Frigate outperforms DCRNN, STGCN and STNN. We have now further enhanced the empirical comparison by incorporating three more baselines (LocaleGN, GraphWavenet, STGODE). The results are provided below. Note that the benefit of positional embeddings is established empirically in our ablation study (Sec 4.5).
**MAE Values:**
| **Model** | **Chengdu** | **Harbin** | **Beijing** |
|------------------|--------------------|------------------|------------------|
| Frigate | **3.547** | **73.559** | **5.651** |
| STNN [24] | 5.633 | 126.305 | 6.215 |
| DCRNN [13] | 4.893 | 124.109 | 6.122 |
| STGCN [27] | 5.712 | OOT | OOT |
| LocaleGN [citation below] | 4.597 | 128.674 | 5.759 |
| GraphWavenet [20] | 5.308 | OOM | OOM |
| STGODE [6] | 4.693 | 122.493 | OOM |
In the above table, OOM denotes "Out of Memory" and OOT denotes "Out of Time". Among the newly added baselines, GraphWavenet and STGODE fail to scale to large graphs since they perform expensive matrix multiplications over the adjacency matrix of the graph. On large road networks, this imposes prohibitive demands on GPU memory. It is worth noting that the largest graphs on which GraphWavenet and STGODE are evaluated in their respective papers contain 325 and 1026 nodes, respectively.
*LocaleGN: Li M, Tang Y, Ma W. Few-Sample Traffic Prediction With Graph Networks Using Locale as Relational Inductive Biases[J]. IEEE Transactions on Intelligent Transportation Systems, 2022.*
**Q3. The experimental results do not involve the node embedding discussion, and the previous discussion has not been fully verified. For example, I fail to get any answer about the claimed point in lines 633-635 ‘Hence, given nodes……identical embedding’.**
*Response:* We have clarified this in the response to Q1 above. We would be happy to engage with the reviewer if any further clarification is needed.
**Q4. The necessary description of the dataset is missing. The experimental results only use one evaluation metric, and the baseline for comparison is not enough only two baselines (STGCN and DCRNN), it may be because the covered traffic models are insufficient in table I. For some approaches, such as if GraphWavenet removes a graph matrix, it can meet the conditions "Partial sensing" and "Absorb network updates" in the paper, and compare with FRIGATE.**
*Response:*
* **Comparison with GraphWavenet:** As mentioned in our response to Q2, we have added GraphWavenet, LocaleGN and STGODE as baselines. The codebases of all algorithms have been obtained from the respective authors. As visible in the presented table, Frigate continues to outperform all baselines.
* **Metrics**: We have reviewed all the related work again and identified MAE, RMSE and MAPE as the metrics used to evaluate traffic-forecasting performance. We originally reported only MAE since RMSE and MAPE are highly correlated with MAE. Nonetheless, we present below the performance of Frigate and the baselines on these two additional metrics. The conclusion remains unchanged: Frigate continues to outperform all baselines across all datasets and metrics. We will include these tables in the appendix, with a reference from the caption of Table 3 in the main paper.
Table: Performance of Frigate and all baseline algorithms measured using RMSE and symmetric MAPE. The lowest error for each dataset is highlighted in bold.
| Model | Dataset | sMAPE | RMSE |
|--------------|---------|-------------------|-------------------|
| Frigate | Chengdu | **131.17 ± 3.39** | **5.934 ± 0.31** |
| Frigate | Harbin | **93.03 ± 1.89** | **105.012 ± 3.51** |
| Frigate | Beijing | **169.73 ± 0.45** | **11.872 ± 0.30** |
| LocaleGN | Chengdu | 192.80 ± 0.19 | 8.573 ± 0.33 |
| LocaleGN | Harbin | 169.54 ± 1.43 | 207.358 ± 6.58 |
| LocaleGN | Beijing | 172.73 ± 0.33 | 12.103 ± 0.03 |
| STNN | Chengdu | 199.860 ± 0.00 | 9.501 ± 0.50 |
| STNN | Harbin | 199.682 ± 0.02 | 206.912 ± 6.93 |
| STNN | Beijing | 199.990 ± 0.01 | 12.819 ± 0.27 |
| DCRNN | Chengdu | 138.171 ± 2.07 | 8.099 ± 0.48 |
| DCRNN | Harbin | 185.731 ± 0.68 | 204.750 ± 6.36 |
| DCRNN | Beijing | 195.564 ± 0.15 | 12.631 ± 0.24 |
| STGODE | Chengdu | 147.12 ± 2.25 | 7.725 ± 0.39 |
| STGODE | Harbin | 152.28 ± 1.27 | 198.811 ± 6.74 |
| STGODE | Beijing | OOM | OOM |
| GraphWavenet | Chengdu | 147.772 ± 1.81 | 8.889 ± 0.43 |
| GraphWavenet | Harbin | OOM | OOM |
| GraphWavenet | Beijing | OOM | OOM |
| STGCN | Chengdu | 198.382 ± 0.00 | 9.472 ± 0.50 |
| STGCN | Harbin | OOT | OOT |
| STGCN | Beijing | OOT | OOT |
* **Dataset description:** We humbly point to Section 4.1 that discusses the datasets. In Table 2(a), we present various statistics of the datasets used. All datasets are real-world; both the trajectories of vehicles and the road network. If the reviewer feels any particular dataset aspect is missing from our description, we request the reviewer to please share it. We will be happy to include those details. Note that the datasets are also accessible from our anonymous code repo mentioned in the paper (https://anonymous.4open.science/r/Frigate-3BA5).
**Q5. Minor point. Lipschitz Embedding and Distortion seem be not your contributions, however, they still cost a large space. The spatial module of the proposed model seems to transfer GIN.**
*Response:* Lipschitz embedding and the related theory on distortion is a mechanism to embed metric spaces into vector spaces. This is not our contribution. Our contribution lies in **(1)** recognizing Lipschitz embedding as a mechanism to encode positional information in unsupervised node embeddings, **(2)** the integration of positional embeddings with a GNN to capture both positional and neighborhood characteristics, and **(3)** the resultant consequences on the theoretical expressivity of such a GNN.
With respect to GIN, we have only used Lemma 2 from the GIN paper to establish the limits of expressivity for message-passing GNNs. **We do not use the architecture of GIN**. The proposed GNN architecture is different and makes three key innovations: **(1)** the usage of Lipschitz embeddings to characterize nodes in layer 0, **(2)** the usage of *sigmoid gating* over positional embeddings to learn the long-range paths that affect the behavior of a node (Eqs. 9-11), and **(3)** a *bi-directional convolution* operation (Eqs. 5-6 and 7-8). A schematic sketch of the gating idea is given below.
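To illustrate innovation **(2)**, the following is a schematic sketch of sigmoid gating over positional embeddings in the spirit of Eqs. 9-11; the layer wiring, dimensions, and variable names are our own simplification, not the exact Frigate layer.

```python
import torch
import torch.nn as nn

class GatedMessagePassing(nn.Module):
    """Sigmoid gating over Lipschitz positional embeddings (schematic)."""
    def __init__(self, hid_dim: int, pos_dim: int):
        super().__init__()
        self.gate = nn.Linear(pos_dim, hid_dim)  # gate computed from positions
        self.msg = nn.Linear(hid_dim, hid_dim)   # standard message transform

    def forward(self, h_nbr: torch.Tensor, lip_nbr: torch.Tensor) -> torch.Tensor:
        # h_nbr:   (num_neighbors, hid_dim) neighbor hidden states
        # lip_nbr: (num_neighbors, pos_dim) neighbor Lipschitz embeddings
        g = torch.sigmoid(self.gate(lip_nbr))    # position-dependent gate in (0, 1)
        return (g * self.msg(h_nbr)).sum(dim=0)  # gated aggregation

layer = GatedMessagePassing(hid_dim=64, pos_dim=16)
out = layer(torch.randn(4, 64), torch.randn(4, 16))  # aggregate 4 neighbors
```

The gate lets the model up- or down-weight messages as a function of where a neighbor sits in the network, rather than treating all neighbors uniformly.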
**The above weakness seem prompt me that this paper is just a naive reference to some mature solutions for graph learning. I am open to the score, if the author could adress my concerns as much as possible.**
*Response:* We thank the reviewer for the constructive feedback, which indeed helps us further improve our work. As clarified in Q5, the proposed architecture is not borrowed from GIN or any existing GNN architecture. In our humble opinion, the architecture of Frigate has several novel components that are developed specifically with traffic forecasting in mind. We substantiate the design with strong theoretical grounding in terms of expressive power, inductive ability to generalize to unseen nodes, and predictive power on nodes without any temporal history. These properties place Frigate in a unique position that most baselines do not satisfy (Table 1). Empirically, Frigate surpasses all baseline algorithms. **We hope the proposed changes, additional experiments, and clarifications will convince the reviewer of the merits of our work. If the reviewer agrees, we humbly appeal to please consider increasing the score.**
--------------------------------
**Rating: BA**
**The model should be compared with more state-of-the-arts. In the experiment, only 3 baseline models are compared. Even though some models like GMAN and STSGCN cannot deal with partial sensing and updated networks, they should also be compared with Frigate in terms of the capability of predicting without temporal history.**
*Response:* As suggested, we have added three more baselines, namely GraphWavenet, STGODE, and LocaleGN. We note that STNN, one of the baselines we compare to, has outperformed GMAN in the reported literature. Similarly, we have now included STGODE, which outperforms STSGCN. The table below presents the results. Frigate continues to outperform all baselines. The implementations of all baselines were obtained from the respective authors.
**MAE Values:**
| **Model** | **Chengdu** | **Harbin** | **Beijing** |
|------------------|--------------------|------------------|------------------|
| Frigate | **3.547** | **73.559** | **5.651** |
| STNN [24] | 5.633 | 126.305 | 6.215 |
| DCRNN [13] | 4.893 | 124.109 | 6.122 |
| STGCN [27] | 5.712 | OOT | OOT |
| LocaleGN [citation below] | 4.597 | 128.674 | 5.759 |
| GraphWavenet [20] | 5.308 | OOM | OOM |
| STGODE [6] | 4.693 | 122.493 | OOM |
In the above table, OOM denotes "Out of Memory" and OOT denotes "Out of Time". Among the newly added baselines, GraphWavenet and STGODE fail to scale to large graphs since they perform expensive matrix multiplications over the adjacency matrix of the graph. On large road networks, this imposes prohibitive demands on GPU memory. It is worth noting that the largest graphs on which GraphWavenet and STGODE are evaluated in their respective papers contain 325 and 1026 nodes, respectively.
*LocaleGN: Li M, Tang Y, Ma W. Few-Sample Traffic Prediction With Graph Networks Using Locale as Relational Inductive Biases[J]. IEEE Transactions on Intelligent Transportation Systems, 2022.*
**Why is POI information not included in this work? If POI is considered, would it contribute to the performance?**
*Response:* We agree that POI information may further benefit the results. Unfortunately, we do not have easy access to POI data for the datasets used (it is available through Google Maps, but the number of API calls we can make is limited). We will state this as an explicit direction for future work in our conclusion section.
**Appeal to the reviewer:** As suggested, we have made the empirical comparison more comprehensive by incorporating three more baselines. We hope this additional empirical data would convince the reviewer of the merits of our work. If the reviewer agrees, we humbly appeal to please consider increasing our rating.
---
**Reply to 2KPx**
**1. The technical novelty of this paper, this work is the combination of LSTM and GNN, where the node-wise correlation in GNN is instantiated with a new metric. The concern lies in whether the Lipschitz embeddings make sense in ST learning**
We present below the empirical impact of not using Lipschitz embeddings to characterize nodes (more precisely, in Eq. 12 we no longer concatenate with $\mathbf{L}_v$). There is a clear increase in MAE, indicating that the Lipschitz embedding is indeed useful. A minimal schematic of the change follows the table.
| Model | Dataset | MAE |
|-|-|-|
| Frigate | Chengdu (50%) | 3.547 |
| Frigate without Lipschitz | Chengdu (50%) | 4.775 |
| Frigate | Harbin (50%) | 73.559 |
| Frigate without Lipschitz | Harbin (50%) | 79.077 |
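For clarity, a minimal schematic of the ablated variant (tensor names and dimensions are illustrative, not the paper's):

```python
import torch

h_v = torch.randn(64)  # hidden state of node v (illustrative dimension)
L_v = torch.randn(16)  # Lipschitz embedding of node v (illustrative dimension)

z_full = torch.cat([h_v, L_v], dim=-1)  # Frigate, Eq. 12: concatenation
z_ablated = h_v                         # "without Lipschitz" ablation
```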
**2. Lacking many related literatures, especially on Ability to extrapolate from partial sensing:**
**[1] Real-time Traffic Pattern Analysis and Inference with Sparse Video Surveillance Information, International Joint Conference on Artificial Intelligence, (IJCAI), 2018.**
**[2] Towards Learning in Grey Spatiotemporal Systems: A Prophet to Non-consecutive Spatiotemporal Dynamics. arXiv:2208.08878.**
**[3] Inferring Intersection Traffic Patterns with Sparse Video Surveillance Information: An ST-GAN method., IEEE Transactions on Vehicular Technology, (TVT), 2022.**
**Above papers have discussed how to inference the missing values in road network with partial sensing data, and [2] further investigates the two non-consecutive prediction scenarios in ST learning.**
We thank the reviewer for pointing out important works that need to be considered as part of the literature survey. We highlight similarities and differences between our work and these works. Their characterization as per Table 1 of our manuscript is provided below.
* Both Frigate and [2] perform prediction when part of the temporal history is unavailable. However, [2] makes certain design choices that render it unsuitable for the studied problem scenario.
  * First, [2] maps each node to a learnable vector that semantically encodes its location. This makes [2] transductive, since it directly learns these vectors instead of utilizing shared parameters as in message-passing GNNs. The transductive nature of [2] prohibits it from functioning when new nodes are added to the road network.
  * Second, [2] can forecast only if it has visibility across all nodes of the network. Thus, while it can accommodate partial temporal history on a node, it cannot forecast on nodes with no temporal history at all (which arises when no sensor is placed on a node, or upon permanent sensor failure). We elaborate on this below the table.
* Both Frigate and [1] perform prediction on parts of the road network that may have no observations at all. While our work is based on a GNN, [1] models traffic as a multivariate normal distribution and ties the parameters of nearby road regions together; it does not utilize $k$-hop neighborhood information through message passing as Frigate does. [1] remains transductive since parameter sharing is performed only among nearby nodes; consequently, it cannot absorb network updates.
* Frigate and [3] share the approach of having a GNN followed by an LSTM. However, the GNN architecture and the overall approach are significantly different. Specifically, [3] models the GNN as an auto-encoder that completes the missing graph and is thus constrained to generate final node representations of the same dimension as the original graph, whereas Frigate works in a latent space that may carry more information. Furthermore, [3] is trained to predict only one time step into the future, while Frigate is trained to predict sequences.
| Paper | Partial sensing | Absorb network updates | Predict without temporal history |
|------|-|-|-|
|[1]|Yes|No|Yes|
|[2]|No|No|No|
|[3]|Yes|Yes|Yes|
In more detail, [2] can perform prediction only if recent information across all nodes is available such that:
1. No entry is null across any sensor.
2. The observations are consecutive up to the most recent sensing event.
In effect, if some time steps have missing values on any node, they are discarded, and prediction is made from the remaining time steps.
In contrast, Frigate performs prediction with recent information that may contain:
1. Null entries for some nodes throughout the observation history.
2. Null entries for some nodes sporadically.
This setting is more general than that of [2] and makes no such assumptions. Further, [2] performs uncertainty analysis, which Frigate does not. We also point out that although [2] uses location embeddings, these are trainable parameters attached to each node, which makes the model transductive, unlike Frigate, which is inductive.
The table above places the three suggested works within the framework of our original Table 1. Note that even though [2] can make predictions without temporal history in the restricted sense discussed above, it cannot do so in the more general setting that we target.
---
**To 2KPx**
We experimentally verify the number of isomorphic node neighborhoods encountered. We take the Chengdu dataset, since it is the smallest and computing isomorphisms is an expensive task. We manually compute the 3-hop message-passing neighborhood of every node at every timestep and count how many neighborhood pairs are isomorphic. We choose 3 hops because both DCRNN and STGCN use 3-hop neighborhoods.
We observe that within Chengdu, 4,798,453 out of a total of 7,356,672 3-hop neighborhoods are isomorphic based on the topology and the traffic volume.
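A sketch of how such a count can be obtained with networkx is shown below; grouping by Weisfeiler-Lehman hash captures exactly the neighborhoods that a message-passing GNN cannot distinguish. Our actual script may differ in details (e.g., handling directed edges and iterating over per-timestep traffic volumes).

```python
import networkx as nx
from collections import Counter

def count_indistinguishable_pairs(G: nx.Graph, volumes: dict, radius: int = 3) -> int:
    """Count node pairs whose `radius`-hop neighborhoods are equivalent under
    the 1-WL test, with traffic volume as the node label (one snapshot)."""
    hashes = []
    for v in G:
        ego = nx.ego_graph(G, v, radius=radius)  # 3-hop message-passing neighborhood
        nx.set_node_attributes(ego, {u: str(volumes[u]) for u in ego}, "vol")
        hashes.append(
            nx.weisfeiler_lehman_graph_hash(ego, node_attr="vol", iterations=radius)
        )
    # Each group of c identical hashes contributes c-choose-2 indistinguishable pairs.
    return sum(c * (c - 1) // 2 for c in Counter(hashes).values())
```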
Observations and possible explanations:
1. Nodes (junctions) in the road network have a maximum out-degree of $5$, and the average is $2.27$.
2. Traffic values ($tv$) are traffic volumes: non-negative integers, bounded above. In Chengdu, the maximum value is $78$.
3. $P(tv\le 10)\approx 84\%$ and $P(tv\le 1)\approx 53\%$.
4. There are $1510$ (out of $3193$) nodes that have $P_n(tv\le 1)>50\%$, and across all timesteps $P_t(tv\le 4)>50\%$.
5. The above suggests that a few traffic-volume values have a high chance of appearing in a message-passing tree, increasing the chance of isomorphic trees.
6. Traffic follows a power-law distribution (see the histogram table below for Chengdu). Only a few popular nodes have high traffic going through them. Most of the network is mostly empty and thus more likely to lie in the small discrete label space of $0$-$5$, somewhat similar to the setting of GIN.
|Number of nodes|Total traffic volume at node|
|-|-|
|1749|3530|
|501|10590|
|337|17650|
|271|24710|
|153|31770|
|93|38829|
|50|45889|
|21|52949|
|12|60009|
|6|67069|
---
**To W9Ap**
**Comment:**
**The authors conduct more experiments and add more baselines. The results seems promising. But for the second question (POI info), the response is not convincing. It is hard to convince people that a model without taking any POI data would outperform those taking POIs. There are a few things the authors need to address:**
*Response:* The above comment appears to stem from a misunderstanding. We have not claimed anywhere in our paper or our earlier response that the baselines use PoI data. Neither the baselines nor our algorithm use PoI data, and hence the comparison is fair. In this context, we also point out that none of the forecasting methods cited in our work, nor the additional works pointed out by other reviewers during this discussion phase, use PoI data.
**1. If this model works well without taking POIs data (at least can compete with the models which take POIs), the authors need to explain why this model can work well without POIs. Experiments may be added.**
*Response:* As clarified above, the baselines do not use PoI data either.
**2. If this model can take POI data, how? (the POIs can be found in OpenStreetMap for free). Simply saying that lacking in the access of POI data cannot convince people that the proposed model is reliable and better than the existing works which take more information like POIs. At this stage, I cannot increase the rating.**
*Response:* We have implemented a version that incorporates PoI data and studied its impact on performance.
**Methodology:** For every node in the graph, we download all PoIs within a radius of 100 meters from OpenStreetMap. In total, we obtain 37 types of PoIs. The PoI feature vector characterizing each node is the min-max normalized count vector over PoI types in its 100m neighborhood (i.e., a 37-dimensional continuous-valued feature vector). These additional features are appended to the traffic volume and Lipschitz embeddings and passed to our model. The only modification required is changing the input dimension of the GNN in the encoder.
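A minimal sketch of this feature construction is given below; the array shapes and the divide-by-zero guard are our own choices, and the OpenStreetMap download step is omitted.

```python
import numpy as np

def poi_features(poi_counts: np.ndarray) -> np.ndarray:
    """Min-max normalize raw PoI counts into [0, 1] features.

    poi_counts: (num_nodes, 37) array of counts of each PoI type within
    100 m of each node (pre-downloaded from OpenStreetMap).
    """
    lo = poi_counts.min(axis=0, keepdims=True)
    hi = poi_counts.max(axis=0, keepdims=True)
    return (poi_counts - lo) / np.maximum(hi - lo, 1)  # guard constant columns

# The resulting 37-dim vector is concatenated with each node's traffic volume
# and Lipschitz embedding before the encoder GNN; only the GNN input
# dimension changes.
```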
**Results:** The experimental results on Chengdu are presented below. We do not observe any significant deviation: while MAPE improves slightly, MAE and RMSE deteriorate slightly.
|Model configuration|Dataset|MAE|MAPE|RMSE|
|-|-|-|-|-|
|Frigate without POI|Chengdu|**3.547**|131.17|**5.934**|
|Frigate with POI|Chengdu|3.744|**130.07**|6.270|