# Research plan
## Meeting 21/11/2023
• Fixed a bug in the previous PGP implementation:
https://github.com/QingbiaoLi/gnn_trajpre/blob/main/models/aggregators/pgp.py
• Fixed a bug in the utils module that was slowing down data loading:
https://github.com/QingbiaoLi/gnn_trajpre/blob/main/models/decoders/utils.py
• Adjusted the config file for the fixed VectorNet bug:
https://github.com/QingbiaoLi/gnn_trajpre/blob/main/configs/polyline_pgp_lvm_traversal.yml
• Implemented a visualiser for the results:
https://github.com/QingbiaoLi/gnn_trajpre/blob/main/train_eval/visualizer.py

## Updates on 20 Nov 2023.
• Fixed a bug in the baseline model VectorNet (https://arxiv.org/abs/2005.04259):
https://github.com/QingbiaoLi/gnn_trajpre/blob/main/models/encoders/polyline_subgraph.py
• Lined up experiments:
| Graph-based | Agent-node Attention | Graph Network | min_ade_5 | min_ade_10 | miss_rate_5 | miss_rate_10 | Experiment ID | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [x] | [ ] | [ ] | --- | --- | --- | --- | --- | Running |
| [x] | [x] | [ ] | --- | --- | --- | --- | --- | Running |
| [x] | [x] | GNN | --- | --- | --- | --- | --- | TODO |
| [x] | [x] | GAT | 1.29 | 0.97 | 0.55 | 0.37 | --- | Finished |
| [x] | [x] | GAT-pygeo | 1.26 | 0.94 | 0.53 | 0.34 | --- | Finished |
## Meeting 13/11/2023
min_ade_5: 1.29
min_ade_10: 0.97
miss_rate_5: 0.55
miss_rate_10: 0.37
pi_bc: 1.87
| | min_ade_5 | min_ade_10 | miss_rate_5 | miss_rate_10 |
| --- | --- | --- | --- | --- |
| traj++ | 1.88 | 1.51 | 0.70 | 0.57 |
| gnn pre | 1.29 | 0.97 | 0.55 | 0.37 |
# Social trajectory prediction using Graph Neural Networks
### Problem formulation
The objective is to predict the future paths of specific vehicles by analyzing their historical trajectories alongside the surrounding environmental context.
**Input**: the past trajectory of the target vehicle and the contextual information of the scene.
* Let $s^{i}_{t} \in \mathbb{R}^6$, $s^{i}_{t} = [x^{i}, y^{i}, v^{i}, a^{i}, \omega^{i}, I^{i}]$, denote the state of agent $i$ at time $t$, comprising its two-dimensional position in the BEV plane, $\tau^{i}_{t} = [x^{i}, y^{i}]$, together with the velocity $v^{i}$, acceleration $a^{i}$, yaw rate $\omega^{i}$, and a flag $I^{i}$ indicating the agent type, pedestrian ($I^{i} = 1$) or vehicle ($I^{i} = 0$).
* Let $c^{n} = [x^{n}, y^{n}, \theta^{n}, M^{n}]$ denote a lane-centerline segment of fixed length, where each segment consists of a sequence of $N$ points. Each lane point $n$ includes its two-dimensional position $[x^{n}, y^{n}]$, the yaw $\theta^{n}$, and a one-hot semantic vector $M^{n}$ indicating whether the point is a stop line ($M = [1, 0, 0, 0]$), turn stop ($M = [0, 1, 0, 0]$), crosswalk ($M = [0, 0, 1, 0]$), or traffic light ($M = [0, 0, 0, 1]$).
**Objective**: forecast the future positions $\tau^{i}_{t}$ of the target agent for the upcoming time steps $T_{obs} < t \le T_{pred}$.
<!--
- To improve the accuracy of trajectory prediction by modelling social interaction.
- where $T_{pred} = 6$ s, with a sampling rate of 2 Hz.
-->
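As an illustrative sketch of the two input representations above, the agent state and a lane point can be packed into fixed-size vectors. The function and dictionary names here are my own, not from the repo:

```python
import numpy as np

# Hypothetical container for the agent state s_t^i = [x, y, v, a, omega, I]
# defined in the problem formulation; names are illustrative.
def make_agent_state(x, y, v, a, omega, is_pedestrian):
    """6-D state: BEV position, velocity, acceleration, yaw rate, agent-type flag."""
    return np.array([x, y, v, a, omega, float(is_pedestrian)], dtype=np.float32)

# A lane point c^n = [x, y, theta, M], with M a one-hot semantic vector over
# {stop line, turn stop, crosswalk, traffic light}, as defined above.
SEMANTICS = {"stop_line": 0, "turn_stop": 1, "crosswalk": 2, "traffic_light": 3}

def make_lane_point(x, y, theta, semantic=None):
    m = np.zeros(4, dtype=np.float32)
    if semantic is not None:
        m[SEMANTICS[semantic]] = 1.0  # one-hot semantic flag
    return np.concatenate([np.array([x, y, theta], dtype=np.float32), m])
```

A lane segment of $N$ points would then simply be an $N \times 7$ array of such lane-point vectors.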
### Methodology
We conceptualize traffic scenes as graphs, where nodes represent the agents and static objects present in the scene, and edges represent the interactions between them. Employing a universal scene representation in the form of a graph gives the model greater adaptability across diverse environments and road layouts, and lets it leverage the expressive power of Graph Neural Networks (GNNs).

**Graph definition**
Hence, we define the whole scene context as a directed heterogeneous graph $G = \{V, E, T, R\}$. Each node $v \in V$ has a type, formally defined by a mapping function $\tau(v): V \rightarrow T$, and each edge $e \in E$ has a type, formally defined by a mapping function $\phi(e): E \rightarrow R$.
Therefore, $T$ denotes the set of node types and $R$ the set of relation types. $V^{\Omega}$ is the set of nodes of type $\Omega \in T$, and
$N^{\Omega} = \{\Gamma \mid \Gamma, \Omega \in T, (\Gamma, \Omega) \in R\}$ denotes the neighborhood of $\Omega$, with $\Gamma \in N^{\Omega}$ a neighbor type of $\Omega$. The interaction from $\Gamma$ to $\Omega$ is denoted as $\langle \Gamma, \Omega \rangle$.
Notation:
* $T$ = {lane, vehicle, pedestrian},
* $R$ = {succ, prox, v2l, v2v, p2v}, defining successor, proximal, vehicle-to-lane, vehicle-to-vehicle and pedestrian-to-vehicle interactions,
* the adjacency matrix $A^{\Omega-\Gamma} \in \mathbb{R}^{|V^{\Omega}| \times |V^{\Gamma}|}$, where each element $a^{ij}$ at row $i$ and column $j$ indicates whether node $i$ and node $j$ are connected. Note that the adjacency matrix may not be square for heterogeneous relations, since $|V^{\Omega}|$ may not equal $|V^{\Gamma}|$.
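A rectangular adjacency for a heterogeneous relation such as $\langle$vehicle, lane$\rangle$ can be sketched as below. The distance-based connectivity rule and the 20 m radius are illustrative assumptions, not the repo's actual graph construction:

```python
import numpy as np

# Sketch of a rectangular adjacency A^{Omega-Gamma}: rows index nodes of the
# destination type, columns nodes of the source type, so the matrix need not
# be square. The proximity rule below is an illustrative assumption.
def build_adjacency(dst_positions, src_positions, radius=20.0):
    """a_ij = 1 if source node j lies within `radius` metres of destination node i."""
    diff = dst_positions[:, None, :] - src_positions[None, :, :]  # (|V_dst|, |V_src|, 2)
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < radius).astype(np.float32)
```

For example, 3 vehicles and 5 lane points give a $3 \times 5$ adjacency, illustrating the non-square case noted above.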
### Model Architecture
#### Scene and agent context encoding
A gated recurrent unit (GRU) is used to encode each object type. We utilize distinct encoders for the focal agent, neighboring agents (with the input features specifying the agent type as either a vehicle $Z^{v}$ or pedestrian $Z^{p}$), and lane nodes $Z^{l}$.
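As a minimal numpy sketch of what each per-type encoder does, a GRU folds a node's state sequence into a fixed-size hidden vector; the actual model would use `torch.nn.GRU`, one instance per node type, and the weights and sizes here are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One GRU step: update gate z, reset gate r, candidate state h_tilde.
def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(x @ Wz + h @ Uz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate hidden state
    return (1 - z) * h + z * h_tilde

def encode_sequence(seq, hidden_dim, seed=0):
    """Fold a (seq_len, feat) state sequence into a (hidden_dim,) encoding."""
    rng = np.random.default_rng(seed)  # random weights, for illustration only
    d = seq.shape[-1]
    params = [rng.normal(scale=0.1, size=s)
              for s in [(d, hidden_dim), (hidden_dim, hidden_dim)] * 3]
    h = np.zeros(hidden_dim)
    for x in seq:
        h = gru_step(x, h, *params)
    return h
```

Each node type (vehicle, pedestrian, lane) would get its own set of GRU weights, yielding the encodings $Z^{v}$, $Z^{p}$ and $Z^{l}$.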
Attention-based **inter-object-level aggregation** via GATv2 is used to process all the hidden representations of neighboring objects within a common semantic space.
For each canonical edge type in $E$, a subgraph is considered for the aggregation, with source $V^{\Omega}$, destination $V^{\Gamma}$ and relation $R^{\Omega-\Gamma}$. We then compute a pairwise unnormalized attention score between each pair of neighbors in the form of additive attention, and normalize the scores across the neighborhood of $\Omega$ using the softmax function,
$\alpha_{ij} = \mathrm{softmax}_{j}\left(a^{T}\, \mathrm{LeakyReLU}\left(W [z_{i} \,\|\, z_{j}]\right)\right)$
where $z$ is the node encoding, $W$ is the weight matrix that parametrizes a shared linear transformation applied to every node, $a$ defines the self-attention mechanism that indicates the relative importance of node $j$'s features to node $i$, $(\cdot)^{T}$ denotes transposition, and $\|$ is the concatenation operation.
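The score and softmax above can be sketched as follows. Shapes and names are illustrative; a real implementation would use `GATv2Conv` from PyTorch Geometric rather than this dense version:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

# GATv2-style score a^T LeakyReLU(W [z_i || z_j]), normalised with a softmax
# over each destination node's neighbourhood (dense sketch, all pairs).
def gatv2_attention(z_dst, z_src, W, a):
    # z_dst: (N, d), z_src: (M, d); W: (2d, d_h); a: (d_h,)
    n, m, d = len(z_dst), len(z_src), z_src.shape[1]
    pairs = np.concatenate(
        [np.repeat(z_dst[:, None, :], m, axis=1),
         np.broadcast_to(z_src[None, :, :], (n, m, d))], axis=-1)  # (N, M, 2d)
    scores = leaky_relu(pairs @ W) @ a            # (N, M) unnormalised scores
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    alpha = np.exp(scores)
    return alpha / alpha.sum(axis=1, keepdims=True)  # softmax over neighbours j
```

Each row of the returned matrix sums to one, giving the normalized coefficients $\alpha_{ij}$ used to weight the neighbor messages.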
**Type-level** attention via GATv2 is deployed to fuse representations from **different types** of neighboring objects. To compute the importance of the neighbor-type representations, including the self representation, for $V^{\Gamma}$, we linearly project the hidden representations from the previous step, $\{H^{\Omega-\Gamma}\} \cup \{H^{\Gamma}\}$, into keys, and $\{H^{\Gamma}\}$ into the query. The values are the outputs of the previous step. **Type-level** attention is computed as follows:
$e^{\Gamma} = \mathrm{ELU}\left([H^{\Gamma} W^{\Gamma}_{k} \,\|\, H^{\Gamma} W^{\Gamma}_{q}]\, w^{\Gamma}\right)$
$e^{\Omega-\Gamma} = \mathrm{ELU}\left([H^{\Omega-\Gamma} W^{\Gamma}_{k} \,\|\, H^{\Gamma} W^{\Gamma}_{q}]\, w^{\Gamma}\right)$
Finally, a softmax function is applied to obtain the normalized attention coefficients $\alpha^{\Gamma}_{i}$ and $\alpha^{\Omega-\Gamma}_{i}$:
$\hat{H}^{\Gamma}_{i} = \mathrm{ELU}\left(\alpha^{\Gamma}_{i} H^{\Gamma}_{i} + \sum_{\Omega \in N^{\Gamma}} \alpha^{\Omega-\Gamma}_{i} H^{\Omega-\Gamma}_{i}\right)$
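The type-level fusion above can be sketched as follows; function and parameter names are assumptions for illustration, and the per-type representations are passed as dense arrays:

```python
import numpy as np

def elu(x):
    return np.where(x > 0, x, np.exp(x) - 1.0)

# Score each neighbour-type representation (plus the self representation)
# against the query built from H^Gamma, softmax over types, then mix.
def type_level_fuse(H_self, H_types, Wk, Wq, w):
    # H_self: (N, d); H_types: list of (N, d); Wk, Wq: (d, d_h); w: (2*d_h,)
    q = H_self @ Wq  # shared query from the self representation

    def score(H):
        # e = ELU([H Wk || H_self Wq] w), one scalar per node
        return elu(np.concatenate([H @ Wk, q], axis=-1)) @ w

    scores = np.stack([score(H_self)] + [score(H) for H in H_types])  # (T+1, N)
    scores -= scores.max(axis=0, keepdims=True)
    alpha = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    stacked = np.stack([H_self] + list(H_types))        # (T+1, N, d)
    return elu((alpha[..., None] * stacked).sum(axis=0))  # (N, d) fused output
```

The softmax runs over the type axis, so each node's fused representation is a convex combination of its self representation and its per-type neighbor aggregates.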
* GATv2: https://arxiv.org/pdf/2105.14491.pdf
#### Graph traversal and decoding
The final representation of the focal agent, $\hat{H}^{v}_{0}$, is concatenated with its encoding prior to the aggregation, $Z^{v}_{0}$.
This output, together with the representation of the lane nodes $H^{l}_{0}$, is used to learn a policy for graph traversal using behavior cloning.
### Experiment
**Dataset**
In our experiments, we utilize the publicly available nuScenes dataset, a substantial dataset for autonomous driving created by the Motional team. This dataset comprises 1000 scenes, each spanning 20 seconds, and includes both ground-truth annotations and high-definition (HD) maps. These scenes were collected in both Boston and Singapore, where different traffic regulations (right-hand and left-hand traffic rules) are in effect.
In our prediction scenario, we rely on 2 seconds of historical data from dynamic agents and incorporate map features to forecast the subsequent 6 seconds. Due to the dataset's geographic diversity and its comprehensive representation of intricate situations such as turns and intersections, nuScenes stands as one of the most demanding benchmarks for prediction tasks.
**Evaluation metrics**
We conduct a quantitative assessment of the motion prediction model, focusing on nuScenes' motion prediction benchmark. Our model generates 10 predictions for the central agent, projecting their motion 6 seconds into the future at a rate of 2Hz. Additionally, it provides the probability that the agent will follow each of these trajectories. The evaluation employs several metrics from the nuScenes benchmark to gauge how closely this set of predicted trajectories aligns with the actual ground-truth trajectories:
* Minimum Average Displacement Error (minADE): the average point-wise L2 distance between a predicted trajectory and the ground-truth trajectory, minimized over the K predicted trajectories.
* Miss Rate (MR) with the best-of-K criterion: the percentage of samples for which even the best of the K predictions has a maximum point-wise L2 distance to the ground truth exceeding 2.0 meters.
* Off-road Rate: This metric quantifies the proportion of trajectories that venture off-road, meaning they extend beyond the drivable area.
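The first two metrics can be sketched for a single sample as follows, assuming predictions of shape `(K, T, 2)` and ground truth `(T, 2)`; this is an illustration, not the official nuScenes devkit implementation:

```python
import numpy as np

def min_ade(preds, gt):
    """minADE_K: average point-wise L2 error, minimized over the K predictions."""
    # preds: (K, T, 2), gt: (T, 2)
    ade = np.linalg.norm(preds - gt[None], axis=-1).mean(axis=-1)  # (K,)
    return float(ade.min())

def miss_rate(preds, gt, threshold=2.0):
    """Best-of-K miss for one sample: 1.0 if even the best prediction's
    maximum point-wise error exceeds `threshold` metres, else 0.0."""
    max_err = np.linalg.norm(preds - gt[None], axis=-1).max(axis=-1)  # (K,)
    return float(max_err.min() > threshold)
```

The benchmark value of MR is then the mean of this per-sample miss indicator over the evaluation set.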
---
**Timeline**:
- 06 Aug 2023 to 13 Aug 2023:
  - Method/approach
  - Problem formulation
- 13 Aug 2023 to 20 Aug 2023:
- 20 Aug 2023 to 27 Aug 2023:
  - Working towards the initial draft
- 27 Aug 2023 to 03 Sep 2023:
  - Initial draft
- 03 Sep 2023 onwards:
  - Towards submission
- Dataset:
  - nuScenes
- Tools: Python, PyTorch Geometric
- Metrics:
- [ ] Implementation:
  - [ ] 30 Aug 2023
- [ ] Initial result:
  - [ ] 01 Sep 2023
**Key contributions**
* Efficiency: graph memory prior instead of LSTM
https://proceedings.mlr.press/v168/morad22a/morad22a.pdf
* Interpretability: interchangeable prior

**Targeted venues**
Soft deadline:
* ICRA: Sep 15, 2023
In between:
* RA-L journal submission
Hard deadline:
* CVPR: Nov 3, 2023
Reference paper:
Sequential Neural Barriers for Scalable Dynamic Obstacle Avoidance
https://hoy021.github.io/projects/
Learning Control Admissibility Models with Graph Neural Networks for Multi-Agent Navigation
https://openreview.net/forum?id=xC-68ANJeK_
## Meeting 13/06/2023
Reading list:
VectorNet
https://arxiv.org/pdf/2005.04259.pdf
Large Scale Interactive Motion Forecasting for Autonomous Driving
https://arxiv.org/pdf/2104.10133.pdf
Deep Occupancy-Predictive Representations for Autonomous Driving
https://arxiv.org/pdf/2303.04218.pdf
DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets
https://arxiv.org/pdf/2108.09640.pdf
Wayformer: Motion Forecasting via Simple and Efficient Attention Networks
https://arxiv.org/pdf/2207.05844.pdf
https://waymo.com/open/data/motion/
## Meeting 05/05/2023
* Meeting with Daniel
* Open CARLA Challenges
* Get involved with Daniel's project
* https://github.com/opendilab/InterFuser/
* https://github.com/bradyz/2020_CARLA_challenge
* Potential noise of sensor or malfunction in real world
* Sunlight? Bright light - white
* Fog
* Have access to the Oxford Oxbotica dataset
* Meeting with Samuel
* Potential collaboration?
* Graph discovery?
## FAQ command
```
docker run --rm -it -v /Volumes:/Volumes -v $PWD:/workspace --ipc=host --runtime=nvidia trajectron /bin/bash
python process_data.py --data=/workspace/shared/divya-qingbiao-datasets/ --version="v1.0-trainval" --output_path=../processed
```
```
python train.py --eval_every 1 --vis_every 1 --conf ../experiments/nuScenes/models/robot/config.json --train_data_dict nuScenes_train_full.pkl --eval_data_dict nuScenes_val_full.pkl --offline_scene_graph yes --preprocess_workers 10 --batch_size 256 --log_dir ../experiments/nuScenes/models --train_epochs 20 --node_freq_mult_train --log_tag _robot --incl_robot_node --map_encoding
```
## Meeting 03/05/2023
* Meeting with Divya
* Question/To-Do list:
- [x] What dataset is used?
* ETH/UCY dataset
- [x] Objectives of this work towards CORL 2023/RAL:
        - whether graph-based interaction or memory can contribute to
        - transparent/logical/understandable/lower-parameter neural networks.
- Reading:
https://arxiv.org/pdf/2106.14117.pdf
        - How does it compare with a Kalman filter or a neural network?
* Repo
- [X] How to access the computation power with Tom
- hostname: farscape.eng.ox.ac.uk
- username: qingbiaoli
- password: WeekItalyWith45%
- port 22
- [x] Download dataset from traj++ (https://github.com/dthuremella/Trajectron-plus-plus/tree/eccv2020)
    - [x] Set up Weights & Biases
    - [ ] Potential extensions: chat with Ricardo about causal, vcae
## Meeting 02/05/2023
* Multi - traj
* Prediction of
* Social content
* Dataset ->
### CBG reading group
<!-- https://wiki.oxfordrobots.com/display/PP/SCAN+cluster -->
* Arrange joint meeting with Divya
    * Work together
* Relationship between agents goal assumption?
* CORL Deadline
* May 15, 2023 | Paper submission open
* June 8 | Submission and Supplementary materials deadline
* RA-L journal
* Human-centred AI workshop
#### Reading list:
* STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction
* Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks
* Sliding Sequential CVAE with Time Variant Socially-aware Rethinking for Trajectory Prediction
* Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data (the baseline method I'm using now)