# Week 15 — Update
## TL;DR
Refactored topic memory management to use an interned pool in zig-libp2p.
PR: https://github.com/zen-eth/eth-p2p-z/pull/83/commits/c8876911681cd479b31d8b94fd56afca8982f973
---
## What is an interned pool (short)
An interned pool stores one unique immutable instance per distinct value (usually strings or bytes). On request, it returns a handle/reference to the canonical instance:
- If the value already exists, return a reference to that instance.
- Otherwise insert it into the pool and return a reference.
This transforms repeated, equal values into identical references (pointer-equality), saving memory and speeding equality checks.
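As a minimal sketch of the idea (in Python for illustration, not the Zig implementation; `InternPool` and its method names are hypothetical):

```python
# Minimal illustrative intern pool: a dict mapping each distinct value to its
# canonical instance. Equal inputs collapse to one object, so identity checks
# (`is`) stand in for the pointer equality described above.

class InternPool:
    def __init__(self):
        self._table = {}  # value -> canonical instance

    def intern(self, value: bytes) -> bytes:
        canonical = self._table.get(value)
        if canonical is None:
            self._table[value] = value  # first sighting becomes canonical
            canonical = value
        return canonical

pool = InternPool()
a = pool.intern(b"eth/blocks")
b = pool.intern(bytes(bytearray(b"eth/blocks")))  # equal bytes, distinct object
assert a is b  # both callers now hold the same canonical instance
```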
---
## Why topics are a good fit
- Topics are immutable identifiers referenced often (publish/subscribe, mesh/fanout maps).
- Many peers use the same topic strings, causing repeated allocations and repeated byte-by-byte comparisons.
- Interning makes topic keys cheap to compare (pointer equality) and reduces total heap allocations for topic strings.
---
## Design overview
Goals:
- Provide a safe, concurrent-friendly, low-overhead pool for topic strings/byte-slices.
- Expose a compact handle type (TopicRef) that is cheap to copy and compare.
- Ensure deterministic behavior for tests (e.g., seeded RNG for candidate selection), and clear lifecycle rules (when interned entries may be reclaimed).
Basic API (conceptual):
- TopicRef = opaque handle (pointer-sized)
- interned_pool.init()
- pool.intern(value: []const u8) -> TopicRef
- pool.lookup(value: []const u8) -> ?TopicRef
- pool.release(ref: TopicRef) -> void // optional if using refcounting / weak refs
- pool.len() -> usize
- pool.for_each(callback)
Choices to make:
- Ownership model: strong references with refcount, or weak intern table (never reclaim unless explicit GC), or arena with manual clear on restart.
- Concurrency: lock-free map, striped locking, or global mutex.
- Key representation: store canonical bytes or an immutable Zig slice + length.
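One way the conceptual API above could look, sketched in Python with a refcounted ownership model (illustrative only; `TopicPool` and the `[canonical, refcount]` entry layout are assumptions, and the real pool is Zig):

```python
from typing import Callable, Optional

class TopicPool:
    """Illustrative refcounted intern table; the returned canonical
    object plays the role of TopicRef."""

    def __init__(self):
        self._entries = {}  # value -> [canonical, refcount]

    def intern(self, value: bytes) -> bytes:
        entry = self._entries.get(value)
        if entry is None:
            entry = [value, 0]
            self._entries[value] = entry
        entry[1] += 1  # caller now holds a strong reference
        return entry[0]

    def lookup(self, value: bytes) -> Optional[bytes]:
        entry = self._entries.get(value)
        return entry[0] if entry is not None else None

    def release(self, ref: bytes) -> None:
        entry = self._entries.get(ref)
        if entry is not None:
            entry[1] -= 1
            if entry[1] == 0:
                del self._entries[ref]  # last holder gone: reclaim the entry

    def __len__(self) -> int:
        return len(self._entries)

    def for_each(self, callback: Callable[[bytes], None]) -> None:
        for entry in self._entries.values():
            callback(entry[0])
```

With the never-evict ownership model instead, `release` simply becomes a no-op and entries live for the process lifetime.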
---
## Pseudocode
Interning API (conceptual):
```pseudo
// opaque handle to canonical topic data
type TopicRef = pointer

function intern(pool, value):
    hash = hash_bytes(value)
    bucket = pool.table[hash % table_size]
    lock(bucket)
    if entry := bucket.find_equal(value):
        unlock(bucket)
        return entry.ref
    else:
        canonical = allocate_copy(value)
        ref = make_topic_ref(canonical)
        bucket.insert(canonical, ref)
        unlock(bucket)
        return ref
```
Optional refcounted release:
```pseudo
function release(pool, ref):
    // decrement the refcount; when it reaches 0, remove the entry from the table and free it
```
Usage in publish/receive paths:
```pseudo
topic_ref = pool.intern(topic_bytes)
mesh_map[topic_ref] = set_of_peers
fanout_map[topic_ref] = set_of_peers
// comparison: if (topic_ref1 == topic_ref2) // pointer eq
```
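The map-key usage above, sketched in Python (illustrative; `intern` here is a bare `setdefault`-based stand-in for the pool):

```python
# Interned refs as map keys: equal topic bytes collapse to one canonical key,
# so mesh/fanout lookups hit the same entry regardless of which copy arrived.

pool = {}
def intern(value: bytes) -> bytes:
    return pool.setdefault(value, value)

mesh = {}
t1 = intern(bytes(bytearray(b"blocks")))
t2 = intern(bytes(bytearray(b"blocks")))  # equal bytes from another code path
mesh[t1] = set()
mesh[t2].add("peerA")  # same key: t1 and t2 are the same object

assert t1 is t2        # pointer-style equality
assert len(mesh) == 1  # both refs address one mesh entry
```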
---
## Sequence diagram
```mermaid
sequenceDiagram
    participant App
    participant Router
    participant Pool as interned_pool
    App->>Router: publish(topic_bytes, P)
    Router->>Pool: intern(topic_bytes) -> topicRef
    Router->>Router: use topicRef as map key (mesh/fanout)
```
---
## Memory / lifecycle considerations
Options:
- Never-evict pool: simplest; safe but entries live for process lifetime (good if number of distinct topics is bounded).
- Refcounted entries: free entries when nobody holds them (requires careful locking and atomic refcounts).
- TTL / LRU eviction: track last-use time and evict rarely-used topics (adds complexity).
- Arena & reset: allocate interned entries in an arena, free all on restart (fast but coarse).
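The TTL/LRU option above could be prototyped like this (Python sketch; `LRUInternPool` is hypothetical, and note that eviction weakens the identity guarantee: a topic interned again after eviction gets a new canonical object):

```python
from collections import OrderedDict

class LRUInternPool:
    """Illustrative capacity-bounded intern pool with LRU eviction."""

    def __init__(self, capacity: int):
        self._capacity = capacity
        self._table = OrderedDict()  # iteration order tracks recency

    def intern(self, value: bytes) -> bytes:
        if value in self._table:
            self._table.move_to_end(value)  # mark as most recently used
            return self._table[value]
        if len(self._table) >= self._capacity:
            self._table.popitem(last=False)  # evict least recently used
        self._table[value] = value
        return value

pool = LRUInternPool(capacity=2)
pool.intern(b"a")
pool.intern(b"b")
pool.intern(b"a")  # refresh "a"
pool.intern(b"c")  # capacity hit: evicts "b", the least recently used
assert list(pool._table) == [b"a", b"c"]
```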
Recommendations:
- Start simple: implement a process-lifetime pool with deduplication, guarded by a single global mutex or sharded locks. This avoids subtle bugs from refcounting and GC during the initial rollout.
- If memory proves an issue, add optional refcounting or LRU eviction behind a feature flag.
---
## Concurrency considerations
- Use sharded hash table or per-bucket mutexes to reduce contention on high throughput paths.
- Use atomic pointer swaps where possible for lock-free reads of canonical references.
- Ensure intern() is safe when two threads intern the same value concurrently (only one canonical instance must be inserted).
- Avoid long holds on global locks; prefer small critical sections around lookup/insert.
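A sharded-lock sketch of the points above (Python for illustration; the GIL makes real contention behavior differ from Zig, so treat this as structural, not a performance model):

```python
import threading

class ShardedInternPool:
    """Illustrative sharded intern pool: one lock per shard keeps
    critical sections small and reduces contention."""

    def __init__(self, shard_count: int = 16):
        self._shards = [({}, threading.Lock()) for _ in range(shard_count)]

    def intern(self, value: bytes) -> bytes:
        table, lock = self._shards[hash(value) % len(self._shards)]
        with lock:  # short critical section around lookup/insert
            canonical = table.get(value)
            if canonical is None:
                table[value] = value  # first writer wins; later racers see it
                canonical = value
            return canonical

pool = ShardedInternPool()
results = []

def worker():
    results.append(pool.intern(bytes(bytearray(b"shared-topic"))))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert all(r is results[0] for r in results)  # exactly one canonical instance
```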
---
## Implementation checklist
1. Create InternedPool module with:
- init/free functions
- intern(value) -> TopicRef
- optional lookup(value) -> ?TopicRef
- debug/metrics (pool size, hits/misses)
2. Replace topic-key usage:
- convert maps (mesh, fanout, peers.gossipsub) to use TopicRef keys
- update publish/forward/join/leave code paths to call pool.intern(topic) early
3. Add sharded locking or lightweight synchronization
4. Add debug helpers to print top-k topics by reference count (if refcounting used) or usage
5. Add feature flags for eviction/refcounting if desired
---
## Tests & benchmarks
Unit tests:
- interning identity: intern(same_bytes) returns same TopicRef (pointer equality)
- concurrent intern: N threads intern same topic concurrently => single canonical instance
- map-key behavior: using TopicRef as key in mesh/fanout yields same semantics as string keys
Integration tests:
- publish/subscribe flows still work with TopicRef keys
- memory usage: after many repeated intern() for the same topic, memory doesn't grow
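The memory-usage test above can be sketched as (Python, illustrative stand-in for the pool):

```python
# Repeated interning of the same topic must return the existing canonical
# instance and must not grow the pool.

pool = {}
def intern(value: bytes) -> bytes:
    return pool.setdefault(value, value)

first = intern(bytes(bytearray(b"hot-topic")))
for _ in range(10_000):
    assert intern(bytes(bytearray(b"hot-topic"))) is first  # identity stable
assert len(pool) == 1  # pool size bounded by distinct topics, not call count
```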
Benchmarks:
- baseline: measure allocations and CPU for publish/receive using raw strings
- optimized: measure same with interned pool
- measure latency impact and lock contention at target QPS
Leak detection:
- run with ASAN/valgrind or Zig leak detection to ensure no unintended leaks (if using refcounting, ensure release paths execute).
---
## Migration & rollout plan
1. Implement pool and unit tests behind a feature flag or compile switch.
2. Add instrumentation (hit/miss counters, pool size) and run local perf tests.
3. Replace topic key usage in a branch and run integration tests.
4. Deploy to canary or long-running test nodes and monitor memory use.
5. If stable, remove feature flag and merge.
---
## Next
Start refactoring publish and receive code paths to call pool.intern(topic) early, add the unit and concurrency tests above, and run the benchmark comparisons. After that, iterate on an eviction/refcount strategy if memory numbers indicate the need.