"Ghost in the Machine" hardware health Interface 280126

### "Ghost in the Machine"

- Project start date: 012826
- Last updated: 012926
- Repo: https://github.com/mmazco/ghost-in-the-machine
- Live: https://tensghost.up.railway.app/

A machine health interface for **Tenstorrent's Tensix core telemetry data** that shows how the hardware is performing. It translates raw telemetry (from tt-smi and [ttnn-visualizer](https://github.com/tenstorrent/ttnn-visualizer) reports) into a visual language that helps a wider audience understand how the hardware works, effectively 'humanizing' the silicon.

The prototype is a React component styled like a retro 8-bit Tamagotchi in a minimal web-based system health ds. Instead of standard hardware graphs, it uses a 'ghost' character (essentially the silicon soul) whose mood and animations are mapped directly to real-time Tenstorrent telemetry data.

#### Ghost states summary
- **Healthy:** Temp < 65°C and L1 < 75%
- **Critical:** Temp > 70°C or L1 > 85% (indicating a potential bottleneck or thermal throttling)
- **Dead:** Temp > 75°C and L1 > 90% (system failure state)

![Screenshot 2026-01-28 at 18.19.29](https://hackmd.io/_uploads/BJJTXSO8bx.png)

- **Heartbeat/energy:** tied to power consumption (watts) via tt-smi.
- **Temperature/stress:** tied to chip temperature (°C). High heat makes the ghost "sweat" or turn red.
- **The "Belly" (SRAM usage):** tied to L1 memory utilization. If a model uses a lot of on-chip memory, the ghost appears "full" or bloated.
- **The "Spark" (NoC activity):** animated "data sparkles" travel across a background grid representing the 2D torus network of Tensix cores whenever a kernel is active.

#### Essential docs
- **TT-SMI docs (MAIN):** tt-smi is the primary data source for real-time telemetry such as **temperature, power, and clock speeds**. https://docs.tenstorrent.com/tools/index.html
- **TT-NN Visualizer guide:** explains how Tenstorrent visualizes memory and operation flows across the cores; perfect for designing the background "Torus Grid." https://docs.tenstorrent.com/ttnn-visualizer/
- **Tenstorrent DevCloud:** needs access!
- **TT-Metalium key concepts:** explains the Tensix cores and the L1 (SRAM) memory hierarchy being visualized. https://docs.tenstorrent.com/tt-metal/latest/tt-metalium/index.html
- **Other:** https://github.com/tenstorrent/tt-metal

#### Demo UI prototype
![Ghost-States-Preview-01-28-2026_03_24_PM copy](https://hackmd.io/_uploads/S1-BTz_L-e.png)

Self-contained HTML file with all the animations (no server needed):
`file:///Users/maryammaz/Library/Application%20Support/Claude/local-agent-mode-sessions/e5afcf87-780f-422d-948c-005fb171f3cf/1b27ab5f-2f59-4ad8-a941-b5988b913828/local_e9605fcf-6748-434a-938d-d1160d55b9ec/outputs/GhostStatesStatic.html`

Port: http://localhost:8080/TenstorrentTamagotchi.html

![image](https://hackmd.io/_uploads/rycAZXO8Zg.png)

#### Implementation phases
**Phase 1: UI Simulation ✅ (log 012826)**
- Ghost SVG with 5 health states and visual indicators
- 12×12 Core Activity Grid with active-core highlighting
- Operation timeline playback (8 ops @ 500ms)
- Simulation controls (temp, power, L1, utilization sliders)
- Auto/Manual health mode toggle
- Toggle switch for Ghost view vs. Grid-only view

**Phase 2: Real Data Simulation ✅ (log 012926)**
- Loading real JSON telemetry reports
- Switching between Simulation and Loaded Data modes
- Dynamic health computation based on real data
- Core activity visualization from `op_timeline`
- Converted the HTML to React (Next.js), with Tailwind CSS for styling

![Tenstorrent-Tamagotchi-Ghost-in-the-Machine-01-29-2026_01_12_PM](https://hackmd.io/_uploads/HyqiTHtIWe.png)

#### User flow for loaded data
1. Upload your .json report file.
2. Data loads:
    - Drop zone turns green with "✓ Llama-7B Inference Run"
    - Shows chip type + op count (e.g., "Wormhole · 10 ops")
    - "Loaded Data ✓" button becomes active
    - Stats bar updates: TEMP, POWER, L1 from file
    - "Loaded Data" indicator appears
3. Press play:
    - Ghost animates through the 10 operations from the file
    - Each operation lights up different cores on the grid
    - Ghost health reflects the loaded temp/L1 values
4. Switch back (optional):
    - Click "Simulation" to return to slider-controlled mode
    - Or drop a new file to replace the loaded data

**Things to look out for in the demo when analyzing the ghost's health**

**Sample-report.json presents:**
- Temp: 58°C
- L1 usage: 80% (838860 / 1048576 bytes)
- Utilization: 82%

Result: **SICK**, because L1 usage (80%) exceeds the 75% healthy threshold.
*TLDR: memory pressure is high, even though the temperature is fine.*

![Tenstorrent-Tamagotchi-Ghost-in-the-Machine-01-29-2026_01_13_PM](https://hackmd.io/_uploads/B1ch6HKL-g.png)

**What's the difference between the 10 operations and the active cores?**

**Operations (10 ops)**
*The sequence of AI computations the model performs:*

| Op | Name | What it does |
| --- | --- | --- |
| 1 | embedding | Convert tokens to vectors |
| 2 | matmul | Matrix multiplication |
| 3 | layernorm | Normalize activations |
| 4 | attention | Self-attention mechanism |
| 5 | softmax | Probability distribution |
| 6 | dropout | Regularization |
| 7 | linear | Dense layer |
| 8 | gelu | Activation function |
| 9 | residual | Skip connection |
| 10 | output | Final projection |

**Active cores (varies per op)**
*The physical Tensix cores on the chip executing that operation:*

| Operation | Cores used | Pattern |
| --- | --- | --- |
| embedding | 6 cores | cluster |
| matmul | 7 cores | diagonal |
| linear | 8 cores | bottom rows |

TLDR: Operations = what's computing; cores = where it's computing on the chip.

----

#### Tenstorrent glossary
**The "Silicon Soul" (Hardware Terms)**
- **Tensix Core:** The fundamental building block of Tenstorrent's AI processors. Each core contains five "baby" RISC-V processors, a compute engine for math, and its own local memory.
- **RISC-V ("Baby Cores"):** The open-source instruction set architecture (ISA) Tenstorrent uses.
Inside each Tensix core, these five small processors manage the "to-do list": two move data in and out, while three drive the math operations.
- **L1 SRAM (On-Chip Memory):** High-speed local memory (approx. 1MB–1.5MB) located directly inside each Tensix core. It is much faster than standard DRAM because data doesn't have to leave the chip to be processed.
- **2D Torus Network (NoC):** The "highway system" on the chip. Unlike a simple grid (mesh), a torus connects opposite edges together (top to bottom, left to right), allowing data to take "shortcuts" across the chip.
- **Wormhole & Blackhole:** The names of Tenstorrent's chip generations. Wormhole is their current flagship for developers, while Blackhole is their next-generation, higher-power chip designed for massive AI clusters.

**The "Mood Engine" (Telemetry & Tools)**
- **Telemetry:** Real-time data about the hardware's "health," such as its temperature, power usage, and how hard the processors are working.
- **tt-smi (System Management Interface):** The command-line tool used to "check the pulse" of the hardware. It provides the raw numbers for the Ghost's heartbeat (power) and stress level (temp).
- **TT-NN Visualizer:** A diagnostic tool that creates "reports" on how an AI model moves through the chip. These reports can serve as the "script" for the Ghost's animations.
- **SRAM Utilization:** A metric showing how much of that fast L1 memory is currently "full" of data. In the prototype, this determines whether the Ghost looks "stuffed" or bloated.

**Developer Tooling**
- **TT-Metalium:** The "low-level" SDK for developers who want to write custom code directly for the hardware (bare-metal).
- **TT-NN:** The "high-level" Python library that feels like PyTorch, making it easy for AI researchers to run models on Tenstorrent without needing to be hardware experts.
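The thresholds in the "Ghost states summary" can be sketched as a small classifier. This is a minimal sketch, not the prototype's code: the names `GhostState`, `Telemetry`, `l1Percent`, and `classifyHealth` are hypothetical. The summary lists only three of the five health states, so this assumes that any reading that is not Healthy, Critical, or Dead falls into the SICK state seen with Sample-report.json. Dead is checked first because its condition is a stricter version of Critical's.

```typescript
// Hypothetical state names; the prototype's five states may split SICK further.
type GhostState = "HEALTHY" | "SICK" | "CRITICAL" | "DEAD";

interface Telemetry {
  tempC: number;        // chip temperature in °C
  l1UsedBytes: number;  // L1 bytes currently in use
  l1TotalBytes: number; // L1 capacity in bytes
}

// L1 utilization as a percentage of capacity.
function l1Percent(t: Telemetry): number {
  return (t.l1UsedBytes / t.l1TotalBytes) * 100;
}

function classifyHealth(t: Telemetry): GhostState {
  const l1 = l1Percent(t);
  // Dead must be checked before Critical: every Dead reading also meets Critical.
  if (t.tempC > 75 && l1 > 90) return "DEAD";
  if (t.tempC > 70 || l1 > 85) return "CRITICAL";
  if (t.tempC < 65 && l1 < 75) return "HEALTHY";
  // Assumption: everything between Healthy and Critical is SICK.
  return "SICK";
}

// Values from Sample-report.json.
const sample: Telemetry = { tempC: 58, l1UsedBytes: 838860, l1TotalBytes: 1048576 };
console.log(l1Percent(sample).toFixed(1), classifyHealth(sample)); // 80.0 SICK
```

Run on the sample values, this reproduces the documented result: 838860 / 1048576 is about 80% of L1, above the 75% healthy limit but below the 85% critical limit, so the ghost is SICK even at 58°C.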
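The wraparound described under "2D Torus Network (NoC)" can be modeled on the 12×12 Core Activity Grid, for example to route the "Spark" animation between cores. A minimal sketch: `wrap`, `torusNeighbors`, and `torusDistance` are hypothetical helpers that treat the visual grid as a torus; they do not reproduce Tenstorrent's actual NoC routing.

```typescript
const GRID_SIZE = 12; // the 12×12 Core Activity Grid from Phase 1

interface Coord { x: number; y: number; }

// Wrap an index into [0, size), including negative indices.
function wrap(i: number, size: number): number {
  return ((i % size) + size) % size;
}

// The four neighbors of a core; on a torus, edge cores link to the opposite edge.
function torusNeighbors(c: Coord, size: number = GRID_SIZE): Coord[] {
  return [
    { x: c.x, y: wrap(c.y - 1, size) }, // up
    { x: c.x, y: wrap(c.y + 1, size) }, // down
    { x: wrap(c.x - 1, size), y: c.y }, // left
    { x: wrap(c.x + 1, size), y: c.y }, // right
  ];
}

// Shortest hop count along one axis, taking the wraparound "shortcut" if shorter.
function torusAxisDistance(a: number, b: number, size: number): number {
  const d = Math.abs(a - b);
  return Math.min(d, size - d);
}

// Manhattan hop count on the torus between two cores.
function torusDistance(a: Coord, b: Coord, size: number = GRID_SIZE): number {
  return torusAxisDistance(a.x, b.x, size) + torusAxisDistance(a.y, b.y, size);
}

console.log(torusNeighbors({ x: 0, y: 0 }).map((c) => `(${c.x},${c.y})`).join(" ")); // (0,11) (0,1) (11,0) (1,0)
console.log(torusDistance({ x: 0, y: 0 }, { x: 11, y: 11 })); // 2
```

The second line shows the "shortcut": corner to opposite corner is 22 hops on a plain mesh, but only 2 once the edges wrap.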
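The Phase 1 operation timeline playback (8 ops @ 500ms) amounts to stepping through a list of ops on a fixed interval and lighting up each op's cores. A framework-free sketch, assuming a hypothetical `Op` shape with a name and the grid coordinates of its active cores (the real `op_timeline` format may differ); `opIndexAt` and `playTimeline` are illustrative names. In the React component, the stop function returned here is what a `useEffect` cleanup would call.

```typescript
// Hypothetical op shape; the actual op_timeline entries may carry more fields.
interface Op {
  name: string;
  cores: Array<[number, number]>; // (x, y) of active Tensix cores on the grid
}

const STEP_MS = 500; // one op every 500 ms, as in the Phase 1 playback

// Index of the op shown after elapsedMs, or null once playback has finished.
function opIndexAt(elapsedMs: number, opCount: number, stepMs: number = STEP_MS): number | null {
  const i = Math.floor(elapsedMs / stepMs);
  return i >= 0 && i < opCount ? i : null;
}

// Steps through the timeline, calling onStep per op and onDone at the end.
// Returns a stop function that cancels playback early.
function playTimeline(
  timeline: Op[],
  onStep: (op: Op, index: number) => void,
  onDone: () => void = () => {},
  stepMs: number = STEP_MS,
): () => void {
  let i = 0;
  const tick = (): void => {
    if (i >= timeline.length) {
      clearInterval(timer);
      onDone();
      return;
    }
    onStep(timeline[i], i);
    i += 1;
  };
  const timer = setInterval(tick, stepMs);
  tick(); // show the first op immediately instead of waiting one step
  return () => clearInterval(timer);
}
```

Keeping `opIndexAt` pure makes it easy to drive the same grid from a scrubber or a paused state, while `playTimeline` handles the live animation.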
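Step 2 of the user flow for loaded data (the drop zone turning green, the chip type and op count, the stats bar updating) implies that the dropped file is parsed and validated before the app switches to Loaded Data mode. The notes name only `op_timeline`, so every other field name below (`name`, `chip`, `temp_c`, `power_w`, `l1_used_bytes`, `l1_total_bytes`) is hypothetical, as are `parseReport` and `dropZoneLabel`; this sketches the validation pattern, not the prototype's parser.

```typescript
// Hypothetical report schema; only op_timeline is named in the notes.
interface Report {
  name: string;   // e.g. "Llama-7B Inference Run"
  chip: string;   // e.g. "Wormhole"
  temp_c: number;
  power_w: number;
  l1_used_bytes: number;
  l1_total_bytes: number;
  op_timeline: { name: string }[];
}

function isNumber(v: unknown): v is number {
  return typeof v === "number" && Number.isFinite(v);
}

// Parse dropped file text; throws a readable error instead of loading bad data.
function parseReport(text: string): Report {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) throw new Error("Report must be a JSON object");
  const r = raw as Record<string, unknown>;
  for (const key of ["temp_c", "power_w", "l1_used_bytes", "l1_total_bytes"]) {
    if (!isNumber(r[key])) throw new Error(`Missing or non-numeric field: ${key}`);
  }
  if (typeof r.name !== "string" || typeof r.chip !== "string") {
    throw new Error("name and chip must be strings");
  }
  if (!Array.isArray(r.op_timeline)) throw new Error("op_timeline must be an array");
  if ((r.l1_total_bytes as number) <= 0) throw new Error("l1_total_bytes must be positive");
  return r as unknown as Report;
}

// Drop-zone text: title line and "chip · N ops" line.
function dropZoneLabel(rep: Report): [string, string] {
  return [`✓ ${rep.name}`, `${rep.chip} · ${rep.op_timeline.length} ops`];
}
```

Failing loudly on a malformed file lets the UI stay in Simulation mode rather than animating the ghost from missing values.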
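The telemetry-to-animation bullets (heartbeat from power, redness from temperature, belly from L1 utilization) can be expressed as clamped linear mappings. A minimal sketch: `mapRange`, `toVisuals`, `beatPeriodMs`, and every numeric range here (0 to 200 W mapped to 60 to 180 BPM, 50 to 75°C mapped to redness 0 to 1, L1 0 to 100% mapped to a belly scale of 1.0 to 1.4) are hypothetical placeholders to tune against the real card, not values from tt-smi or the prototype.

```typescript
// Linearly map v from [inMin, inMax] to [outMin, outMax], clamping at the ends.
function mapRange(v: number, inMin: number, inMax: number, outMin: number, outMax: number): number {
  const t = Math.min(1, Math.max(0, (v - inMin) / (inMax - inMin)));
  return outMin + t * (outMax - outMin);
}

interface GhostVisuals {
  heartbeatBpm: number; // from power (W)
  redness: number;      // 0..1, from temperature (°C)
  bellyScale: number;   // from L1 utilization (%)
}

// All ranges below are hypothetical placeholders.
function toVisuals(powerW: number, tempC: number, l1Pct: number): GhostVisuals {
  return {
    heartbeatBpm: mapRange(powerW, 0, 200, 60, 180),
    redness: mapRange(tempC, 50, 75, 0, 1),
    bellyScale: mapRange(l1Pct, 0, 100, 1, 1.4),
  };
}

// Duration of one heartbeat in ms, e.g. for a CSS animation-duration.
function beatPeriodMs(bpm: number): number {
  return 60000 / bpm;
}

const v = toVisuals(100, 58, 80);
console.log(v.heartbeatBpm, beatPeriodMs(v.heartbeatBpm)); // 120 500
```

Clamping keeps out-of-range readings, such as a brief power spike, from pushing the animation past its limits.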