# Context Model Comparison
## Summary
All three implementations expose a device-visible implicit state that serves as the default context; only rocshmem provides explicit context creation/destruction on both host and device. NVSHMEM and ISHMEM expose workgroup-scoped behaviors (team duplicates or group leader patterns) without explicit context handles. A unified spec likely needs a Tier-0 implicit-context model and a Tier-1 explicit-context model with optional per-context resources (e.g., QPs).
## Implicit contexts
| Aspect | ishmem | nvshmem | rocshmem | Notes for osm-gpu-aux |
|---|---|---|---|---|
| Exists | Yes (global device state via `global_info`) | Yes (global device state `nvshmemi_device_state_d`) | Yes (`ROCSHMEM_CTX_DEFAULT`) | Baseline should require a device-visible default context. |
| Created on host | Yes (`ishmemi_memory_init` sets `global_info`) | Yes (`nvshmemi_init_device_state`) | Yes (default context proxy + `set_internal_ctx`) | Define host init responsibility for default context. |
| GPU-visible by default | Yes (device global pointer) | Yes (device constant state) | Yes (`__device__` default ctx symbol) | Specify device visibility guarantees for default context. |
| Stored in GPU global memory | Yes (device USM allocation) | No (constant memory) | Yes (device global symbol) | Allow implementation-defined storage (global vs constant). |
| Grid-wide visibility | UNVERIFIED | Yes (global constant state) | Yes (global device symbol) | Tier-0 should require grid-wide access; the ISHMEM behavior is still UNVERIFIED. |
| Customization allowed | UNVERIFIED | UNVERIFIED | No public API (UNVERIFIED) | Decide whether default context is configurable or fixed. |
## Explicit contexts
| Aspect | ishmem | nvshmem | rocshmem | Notes for osm-gpu-aux |
|---|---|---|---|---|
| Exists | UNVERIFIED (no API observed) | UNVERIFIED (no API observed) | Yes (`rocshmem_ctx_create`, `rocshmem_wg_ctx_create`) | Tier-1: explicit contexts optional; Tier-0: not required. |
| Device-visible handle | No (UNVERIFIED) | No (UNVERIFIED) | Yes (`rocshmem_ctx_t` on device) | Define handle validity on device for Tier-1. |
| Workgroup specialization | No (UNVERIFIED) | Partial (team duplicates for collectives, not contexts) | Partial (WG create exists; its options appear unused, UNVERIFIED) | Distinguish WG-scoped semantics from explicit contexts. |
| Per-context resources (GDA-like) | No (UNVERIFIED) | Partial (IBGDA QPs exist, not tied to contexts) | Yes (GDA per-context QP arrays; RO per-context state) | Tier-1 can allow per-context resources; Tier-0 must not require them. |
## Granularity and lifecycle
- Granularity differences: ISHMEM/NVSHMEM provide global implicit state; NVSHMEM adds per-workgroup team duplicates; rocshmem supports global default plus explicit WG contexts.
- Lifecycle differences: ISHMEM/NVSHMEM initialize implicit state at init and tear down at finalize; rocshmem creates a device context pool at init and supports explicit create/destroy.
- Primary constraints: ISHMEM/NVSHMEM lack explicit context APIs (UNVERIFIED) and rely on implicit state; rocshmem explicit contexts are bounded by `max_num_contexts` and WG create is collective within a block.
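
The bounded create/destroy lifecycle described above can be sketched as a small host-side model. This is an illustration only: `ContextPool`, `CtxHandle`, and `kInvalidCtx` are hypothetical names and do not correspond to any rocshmem, NVSHMEM, or ISHMEM API. The sketch assumes a pool sized once at init whose capacity plays the role of `max_num_contexts`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a bounded explicit-context pool.
// None of these names come from a real SHMEM library.
typedef std::size_t CtxHandle;
const CtxHandle kInvalidCtx = static_cast<CtxHandle>(-1);

class ContextPool {
public:
    // The pool is sized once at init; capacity models max_num_contexts.
    explicit ContextPool(std::size_t max_num_contexts)
        : in_use_(max_num_contexts, false) {}

    // Returns a free slot, or kInvalidCtx when the pool is exhausted.
    CtxHandle create() {
        for (std::size_t i = 0; i < in_use_.size(); ++i) {
            if (!in_use_[i]) {
                in_use_[i] = true;
                return i;
            }
        }
        return kInvalidCtx;
    }

    // Releases a slot; returns false for invalid or already-free handles.
    bool destroy(CtxHandle h) {
        if (h >= in_use_.size() || !in_use_[h]) return false;
        in_use_[h] = false;
        return true;
    }

    // Number of contexts currently created and not yet destroyed.
    std::size_t live() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < in_use_.size(); ++i) {
            if (in_use_[i]) ++n;
        }
        return n;
    }

private:
    std::vector<bool> in_use_;
};
```

A spec that adopts this shape would need to state what `create` returns on exhaustion; the sentinel used here is one option, not an observed behavior of any implementation.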
## IPC vs GDA
- Which implementations are closer to IPC: ISHMEM uses IPC handles for the symmetric heap; the NVSHMEM default context is IPC-like at the API level; the rocshmem IPC backend is IPC-based by construction.
- Which are closer to GDA: rocshmem GDA backend uses per-context QP arrays; NVSHMEM IBGDA provides device-side QP resources (not exposed as contexts).
- What the spec must abstract: Allow both uniform (IPC-like) contexts and resourceful (GDA-like) contexts without exposing transport-specific details in Tier-0.
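
One way to picture the abstraction in the last bullet is a context type that hides whether it shares a transport queue (IPC-like) or owns one (GDA-like). The following is a minimal sketch under that assumption; `Runtime`, `Ctx`, and the queue counters are hypothetical stand-ins, with a private queue index standing in for a per-context QP array.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: callers see only Ctx; the queue behind it is hidden.
struct Ctx {
    std::size_t queue;  // index of the completion queue this context uses
};

class Runtime {
public:
    // Queue 0 is the shared queue used by all uniform (IPC-like) contexts.
    Runtime() : pending_(1, 0) {}

    // Uniform context: no private resources, shares queue 0.
    Ctx make_uniform_ctx() {
        Ctx c = {0};
        return c;
    }

    // Resourceful context: gets its own queue (stand-in for per-context QPs).
    Ctx make_resourceful_ctx() {
        pending_.push_back(0);
        Ctx c = {pending_.size() - 1};
        return c;
    }

    // Issue one operation on the context's queue.
    void put(Ctx c) { ++pending_[c.queue]; }

    // Complete everything outstanding on the context's queue only.
    void quiet(Ctx c) { pending_[c.queue] = 0; }

    // Operations issued on the context's queue and not yet completed.
    std::size_t pending(Ctx c) const { return pending_[c.queue]; }

private:
    std::vector<std::size_t> pending_;
};
```

In this model a `quiet` on a resourceful context leaves the shared queue untouched, while two uniform contexts observe each other's completions. Whether Tier-0 should permit that second behavior is a spec decision the sketch does not settle.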
## Implications for a unified spec
- Tier-0 baseline proposal: Require a device-visible implicit context with grid-wide access and a host-defined lifetime; no explicit context creation required; allow workgroup-scoped behavior without explicit handles.
- Tier-1 advanced proposal: Support explicit context creation (host + device), optional workgroup contexts, and optional per-context resources (e.g., QPs) with defined lifetime and handle validity on device.
- Required conformance tests:
  - T-CTX-001: implicit context is device-visible and usable in a kernel without explicit context creation.
  - T-CTX-002: explicit context create/destroy works on host (Tier-1).
  - T-CTX-003: workgroup context creation is collective and yields a usable handle (Tier-1).
  - T-CTX-004: per-context resource isolation (e.g., independent ordering/quiet) if per-context resources are exposed (Tier-1).
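
The tiering of the tests above could be driven by a capability record that each implementation reports. The sketch below is one possible reading, not part of any existing test suite: `Capabilities` and `required_tests` are hypothetical, and gating T-CTX-002 through T-CTX-004 on individual capability flags is an assumption about how "Tier-1" and "optional" would be interpreted.

```cpp
#include <string>
#include <vector>

// Hypothetical capability record an implementation would report.
struct Capabilities {
    bool explicit_contexts;      // Tier-1: create/destroy is available
    bool workgroup_contexts;     // Tier-1 option: WG-collective creation
    bool per_context_resources;  // Tier-1 option: e.g. per-context QPs
};

// Returns the conformance test IDs that apply to the given capabilities.
std::vector<std::string> required_tests(const Capabilities& caps) {
    std::vector<std::string> ids;
    ids.push_back("T-CTX-001");  // Tier-0 baseline, always required
    if (caps.explicit_contexts) ids.push_back("T-CTX-002");
    if (caps.workgroup_contexts) ids.push_back("T-CTX-003");
    if (caps.per_context_resources) ids.push_back("T-CTX-004");
    return ids;
}
```

Under this reading, an implementation with only implicit state runs T-CTX-001 alone, and each Tier-1 option adds exactly one test.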