# Context Model Comparison

## Summary

All three implementations expose a device-visible implicit state that serves as the default context; only rocshmem provides explicit context creation/destruction on both host and device. NVSHMEM and ISHMEM expose workgroup-scoped behaviors (team duplicates or group leader patterns) without explicit context handles. A unified spec likely needs a Tier-0 implicit-context model and a Tier-1 explicit-context model with optional per-context resources (e.g., QPs).

## Implicit contexts

| Aspect | ishmem | nvshmem | rocshmem | Notes for osm-gpu-aux |
|---|---|---|---|---|
| Exists | Yes (global device state via `global_info`) | Yes (global device state `nvshmemi_device_state_d`) | Yes (`ROCSHMEM_CTX_DEFAULT`) | Baseline should require a device-visible default context. |
| Created on host | Yes (`ishmemi_memory_init` sets `global_info`) | Yes (`nvshmemi_init_device_state`) | Yes (default context proxy + `set_internal_ctx`) | Define host init responsibility for default context. |
| GPU-visible by default | Yes (device global pointer) | Yes (device constant state) | Yes (`__device__` default ctx symbol) | Specify device visibility guarantees for default context. |
| Stored in GPU global memory | Yes (device USM allocation) | No (constant memory) | Yes (device global symbol) | Allow implementation-defined storage (global vs constant). |
| Grid-wide visibility | UNVERIFIED | Yes (global constant state) | Yes (global device symbol) | Tier-0 should require grid-wide access; note ISHMEM UNVERIFIED. |
| Customization allowed | UNVERIFIED | UNVERIFIED | No public API (UNVERIFIED) | Decide whether default context is configurable or fixed. |

## Explicit contexts

| Aspect | ishmem | nvshmem | rocshmem | Notes for osm-gpu-aux |
|---|---|---|---|---|
| Exists | UNVERIFIED (no API observed) | UNVERIFIED (no API observed) | Yes (`rocshmem_ctx_create`, `rocshmem_wg_ctx_create`) | Tier-1: explicit contexts optional; Tier-0: not required. |
| Device-visible handle | No (UNVERIFIED) | No (UNVERIFIED) | Yes (`rocshmem_ctx_t` on device) | Define handle validity on device for Tier-1. |
| Workgroup specialization | No (UNVERIFIED) | Partial (team duplicates for collectives, not contexts) | Partial (WG create; options unused UNVERIFIED) | Distinguish WG-scoped semantics vs explicit contexts. |
| Per-context resources (GDA-like) | No (UNVERIFIED) | Partial (IBGDA QPs exist, not tied to contexts) | Yes (GDA per-context QP arrays; RO per-context state) | Tier-1 can allow per-context resources; Tier-0 must not require them. |

## Granularity and lifecycle

- Granularity differences: ISHMEM/NVSHMEM provide global implicit state; NVSHMEM adds per-workgroup team duplicates; rocshmem supports global default plus explicit WG contexts.
- Lifecycle differences: ISHMEM/NVSHMEM initialize implicit state at init and tear down at finalize; rocshmem creates a device context pool at init and supports explicit create/destroy.
- Primary constraints: ISHMEM/NVSHMEM lack explicit context APIs (UNVERIFIED) and rely on implicit state; rocshmem explicit contexts are bounded by `max_num_contexts` and WG create is collective within a block.

## IPC vs GDA

- Which implementations are closer to IPC: ISHMEM uses IPC handles for symmetric heap; NVSHMEM default context is IPC-like at API level; rocshmem IPC backend is IPC-like.
- Which are closer to GDA: rocshmem GDA backend uses per-context QP arrays; NVSHMEM IBGDA provides device-side QP resources (not exposed as contexts).
- What the spec must abstract: Allow both uniform (IPC-like) contexts and resourceful (GDA-like) contexts without exposing transport-specific details in Tier-0.

## Implications for a unified spec

- Tier-0 baseline proposal: Require a device-visible implicit context with grid-wide access and a host-defined lifetime; no explicit context creation required; allow workgroup-scoped behavior without explicit handles.
- Tier-1 advanced proposal: Support explicit context creation (host + device), optional workgroup contexts, and optional per-context resources (e.g., QPs) with defined lifetime and handle validity on device.
- Required conformance tests:
  - T-CTX-001: implicit context is device-visible and usable in a kernel without explicit context creation.
  - T-CTX-002: explicit context create/destroy works on host (Tier-1).
  - T-CTX-003: workgroup context creation is collective and yields a usable handle (Tier-1).
  - T-CTX-004: per-context resource isolation (e.g., independent ordering/quiet) if per-context resources are exposed (Tier-1).
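
The Tier-0/Tier-1 split and the conformance tests above can be illustrated with a small host-only C++ model. This is a sketch under stated assumptions, not any library's implementation: `CtxPool`, `CtxHandle`, `kDefaultCtx`, `kInvalidCtx`, and `run_ctx_checks` are hypothetical names, the pool bound only loosely mirrors a `max_num_contexts`-style limit, and a per-context pending-operation counter stands in for real per-context resources such as QPs. Workgroup-collective creation (T-CTX-003) is not modeled because it needs device execution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; none of these are ishmem, NVSHMEM, or rocshmem APIs.
using CtxHandle = std::size_t;
constexpr CtxHandle kDefaultCtx = 0;                           // Tier-0 implicit context
constexpr CtxHandle kInvalidCtx = static_cast<CtxHandle>(-1);  // returned when the pool is full

struct CtxState {
    bool live = false;
    int pending_ops = 0;  // stand-in for per-context resources (e.g., a QP)
};

class CtxPool {
public:
    // Slot 0 holds the default context; max_explicit bounds Tier-1 creation.
    explicit CtxPool(std::size_t max_explicit) : slots_(max_explicit + 1) {
        slots_[kDefaultCtx].live = true;
    }

    // Tier-1: returns a fresh handle, or kInvalidCtx when the pool is exhausted.
    CtxHandle create() {
        for (CtxHandle h = 1; h < slots_.size(); ++h) {
            if (!slots_[h].live) {
                slots_[h].live = true;
                slots_[h].pending_ops = 0;
                return h;
            }
        }
        return kInvalidCtx;
    }

    // Tier-1: only live explicit contexts can be destroyed, never the default.
    bool destroy(CtxHandle h) {
        if (h == kDefaultCtx || h >= slots_.size() || !slots_[h].live) return false;
        slots_[h].live = false;
        return true;
    }

    void put(CtxHandle h) { ++at(h).pending_ops; }      // issue one operation on h
    void quiet(CtxHandle h) { at(h).pending_ops = 0; }  // complete operations on h only
    int pending(CtxHandle h) const {
        assert(h < slots_.size() && slots_[h].live);
        return slots_[h].pending_ops;
    }

private:
    CtxState& at(CtxHandle h) {
        assert(h < slots_.size() && slots_[h].live);
        return slots_[h];
    }
    std::vector<CtxState> slots_;
};

// Host-side analogues of T-CTX-001, T-CTX-002, and T-CTX-004.
bool run_ctx_checks() {
    CtxPool pool(2);

    // T-CTX-001 analogue: the default context works without any create call.
    pool.put(kDefaultCtx);
    bool ok = pool.pending(kDefaultCtx) == 1;

    // T-CTX-002 analogue: creation succeeds until the pool bound is reached.
    CtxHandle a = pool.create();
    CtxHandle b = pool.create();
    ok = ok && a != kInvalidCtx && b != kInvalidCtx && pool.create() == kInvalidCtx;

    // T-CTX-004 analogue: quiet on one context leaves the others untouched.
    pool.put(a);
    pool.put(b);
    pool.quiet(a);
    ok = ok && pool.pending(a) == 0 && pool.pending(b) == 1 &&
         pool.pending(kDefaultCtx) == 1;

    // Destroy is allowed for explicit contexts but not for the default one.
    ok = ok && pool.destroy(a) && !pool.destroy(kDefaultCtx);
    return ok;
}
```

Calling `run_ctx_checks()` from a `main` function exercises the three modeled checks. Keeping the default context in a reserved slot of the same pool is one possible design choice; it lets Tier-0 code and Tier-1 code share the same operation entry points while keeping explicit creation optional.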