Responses are shown as dotted lines. The protocol guarantees that responses arrive in order.
We're trying to make this look like PCIe, so that moving to PCIe later will feel natural.
All messages are send-and-forget, except for memory read, which needs a response. Device<->host communication uses UART or Ethernet.
A packet has a 1-byte header that defines the packet type and length, followed by the payload.
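As a rough illustration of the framing, here is a sketch in Python. The header bit split (type in the upper nibble, length in the lower nibble) and the message-type values are assumptions for illustration; the notes above only say the header encodes type and length, so treat this as a placeholder, not the real wire format.

```python
# Minimal sketch of the packet framing: 1 header byte (type + length), then payload.
# The nibble split and the type values below are illustrative assumptions.
import struct

MEM_WRITE = 0x1   # hypothetical: send-and-forget
MEM_READ  = 0x2   # hypothetical: the only message that expects a response packet

def encode_packet(ptype: int, payload: bytes) -> bytes:
    """Pack the 1-byte header followed by the payload."""
    assert ptype < 16 and len(payload) < 16, "illustrative 4-bit fields only"
    header = (ptype << 4) | len(payload)
    return struct.pack("B", header) + payload

def decode_packet(data: bytes) -> tuple[int, bytes]:
    """Split a received packet back into (type, payload)."""
    header = data[0]
    ptype, length = header >> 4, header & 0x0F
    return ptype, data[1:1 + length]
```

Over UART or Ethernet these same bytes would simply be written to the link; only a memory-read packet would then wait for a response packet to come back.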
Ordering matters for some messages but not for others.
"Host" refers to the host driver.
"Python" refers to the tinygrad/PyTorch runtime.
There are two types of memory on the host PC.
One is tiled memory, which imitates how we'll structure memory in VRAM in the future. It's a 1 GB array (emulating 1 GB of VRAM), and tensors are stored tile-contiguous.
The other is normal row-contiguous memory with strides. This is how PyTorch and tinygrad store tensors.
Right now, to_gpu() moves a tensor from row-contiguous memory to the 1 GB tiled memory to emulate the transfer to VRAM. In the future, it will move the tensor to real VRAM.
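A rough sketch of that emulation, assuming 2D tensors, a made-up 16x16 tile, and a flat 1 GB numpy array standing in for VRAM (the real tile shape and allocator live in the host driver):

```python
# Illustrative sketch of to_gpu(): retile a row-contiguous tensor into the
# emulated 1 GB "VRAM" array. TILE size, 2D-only shapes, and the flat offset
# addressing are assumptions for this example.
import numpy as np

TILE = 16                                   # assumed square tile edge
VRAM = np.zeros(1 << 30, dtype=np.uint8)    # 1 GB array emulating VRAM

def to_gpu(t: np.ndarray, base: int = 0) -> int:
    """Copy a row-contiguous 2D tensor into tile-contiguous VRAM; return its offset."""
    h, w = t.shape
    assert h % TILE == 0 and w % TILE == 0, "illustrative: pad to the tile size in practice"
    # Group into (tile_row, tile_col) blocks, each block stored row-major,
    # and lay the blocks out one after another: tile-contiguous order.
    tiled = (t.reshape(h // TILE, TILE, w // TILE, TILE)
              .transpose(0, 2, 1, 3)
              .reshape(-1))
    raw = tiled.view(np.uint8)
    VRAM[base:base + raw.size] = raw
    return base
```

Going the other way (back to row-contiguous memory for PyTorch/tinygrad) would just invert the reshape/transpose above.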