# Inet4AI live notes
Space for live Q&A. Please use the section below each presentation for questions specific to the talks, and the wrap-up discussion section for general remarks and questions.
## Invited talk 1: Uno: A One-Stop Solution for Inter- and Intra-Datacenter Congestion Control and Reliable Connectivity
Questions:
* Do you use a specific control-flow / feedback mechanism to manage the Phantom Queues? You mentioned ECN; have you observed filtering of this signal between DCs?
* Answer: We use Phantom Queues to decide whether to ECN-mark a packet at the switch. The sender then uses the ECN marks (echoed through ACKs) to reduce its sending rate/cwnd. In our work we tried two assumptions: using "simple" switches with ECN support everywhere, including border switches, or using proper border switches without ECN support. In the latter case we resorted to latency measurements.
* Do you need extra buffering at the egress/ingress of the DC? Could you use such an additional buffer to distribute phantom queues?
* Answer: We tried both approaches: one where we use large buffers for the border switches and one where they are equivalent to "normal" DC switches. From our results, things seem to work okay even without the extra buffering capacity at the border switches; in practice, however, this is probably not easily doable currently, as most border switches have a different design. As for phantom queues, I think they help more when the buffer is small, since they give us more information than the real queue alone.
* Multipath [to be detailed]
* On fairness:
* Q: What kind of fairness do you ideally want for mixed inter- and intra-DC traffic? How does it relate to Phantom Queue dimensioning w.r.t. the physical queue size?
* A: If you could perfectly control the schedule of communication and computation, there might be no overlap between inter- and intra-DC collectives, so you would not need to handle a mixture of inter- and intra-DC flows. But you cannot assume so, and for the sake of simplicity in design, we design for fairness. Another issue is that without fairness, flows finish at different times and take time to ramp their sending rate back up. Unless you can do this very quickly, this creates a slowdown, which is why we aim for fairness in Uno.
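The phantom-queue-based ECN marking described in the answers above can be sketched as follows. This is an illustrative toy model, not Uno's implementation; the class name, the drain-rate fraction, and the marking threshold are assumptions chosen for the example.

```python
class PhantomQueue:
    """Virtual queue drained slightly below link rate; ECN-marks when the
    virtual backlog exceeds a threshold, signalling congestion before the
    real queue builds up. Senders react to the echoed marks by cutting cwnd."""

    def __init__(self, link_rate_bps, drain_rate_frac=0.95, mark_thresh_bytes=30_000):
        self.drain_rate = link_rate_bps * drain_rate_frac / 8  # bytes/sec
        self.mark_thresh = mark_thresh_bytes
        self.backlog = 0.0   # virtual bytes queued
        self.last_ts = 0.0   # seconds

    def on_packet(self, ts, pkt_bytes):
        """Return True if this packet should be ECN-marked."""
        # Drain the virtual queue for the elapsed time, then enqueue.
        self.backlog = max(0.0, self.backlog - (ts - self.last_ts) * self.drain_rate)
        self.last_ts = ts
        self.backlog += pkt_bytes
        return self.backlog > self.mark_thresh
```

Because the phantom queue drains below line rate, it starts marking while the physical queue is still short, which is why (per the answer) it adds the most information when real buffers are small.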
## AI4Net paper 1: Self-supervised Application-level Network Traffic Inversion
Questions:
* How does the sampling rate affect the results obtained with the reconstruction method?
* What are the granularity of the inversion result and the downstream tasks? A: Bytes per slotted time, say per second or per 5 minutes, to be compatible with NetFlow and sFlow input. Targeting loss and retransmission events for conference calls. (Still having doubts about downstream-task viability through reconstructing the sampled flow, though... to be confirmed.)
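The "bytes per slotted time" target mentioned in the answer can be illustrated with a minimal sketch: bin sampled flow records into fixed time slots and upscale by the inverse sampling rate, NetFlow/sFlow style. The function name and parameters are illustrative assumptions, not the paper's code.

```python
from collections import defaultdict

def bytes_per_slot(records, slot_sec=1.0, sampling_rate=1 / 1000):
    """records: iterable of (timestamp_sec, sampled_bytes) tuples.
    Upscale sampled byte counts by the inverse sampling rate and
    aggregate them into fixed-width time slots."""
    slots = defaultdict(float)
    for ts, nbytes in records:
        slots[int(ts // slot_sec)] += nbytes / sampling_rate
    return dict(slots)
```

The sampling-rate question above matters precisely here: the coarser the sampling, the noisier each slot's upscaled estimate, which is what the reconstruction method has to compensate for.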
## Net4AI paper 1: Latency-Optimal Load Balancing For Distributed MoE Inference
Questions:
* How many experts are used in the DeepSeek model you used? How does your approach scale with the number of (routed) experts in the model?
* Do you plan to distribute your large scale target cluster over long distances / in different DCs?
* This is quite latency-oriented. I noticed the expert movements are much lower than in DeepSeek, which means the memory-movement overhead is much lower; the only hint I found is the mean workload. Do you notice an improvement from a throughput perspective? In addition, how do you envision the trade-off between rebalancing frequency and throughput speedup?
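The load-balancing objective behind these questions can be sketched with a generic greedy (longest-processing-time-first) placement: the slowest GPU sets the MoE layer's latency, so heavy experts are placed first onto the least-loaded GPU. This is a textbook heuristic shown for illustration, not the paper's latency-optimal algorithm.

```python
import heapq

def place_experts(expert_loads, num_gpus):
    """expert_loads: {expert_id: expected tokens routed to that expert}.
    Returns {gpu_id: [expert_ids]}, assigning heaviest experts first to
    the currently least-loaded GPU to minimize the maximum GPU load."""
    heap = [(0.0, gpu) for gpu in range(num_gpus)]  # (current load, gpu)
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}
    for expert, load in sorted(expert_loads.items(), key=lambda kv: -kv[1]):
        cur, gpu = heapq.heappop(heap)       # least-loaded GPU so far
        placement[gpu].append(expert)
        heapq.heappush(heap, (cur + load, gpu))
    return placement
```

The rebalancing-frequency trade-off in the question then amounts to how often this placement is recomputed as the routed-token distribution drifts, versus the cost of moving expert weights between GPUs.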
## Invited talk 2: From Homogeneous to Disaggregated Architectures for Large Model Inference
Questions:
* Do you have experience distributing the KVCache over long distances (or even over the Internet)? Did you encounter specific bottlenecks in such a setting?
* In TENT, in which context are you using TCP? Did you try other transport protocols (besides RDMA)?
* A paper has suggested working on KVCache-centric networking to put KVCaching capabilities closer to the network layer, adopting an approach similar to content-centric networking. Is that a line of work you are considering?
* What types of KVCache reuse policies are you using? Prefix-based?
* On cache granularity and the possibility of going beyond prefix matching? A: Page-based, with a default page size of 64 tokens; we prefix-match as many pages as possible, and the hit ratio is calculated at page granularity. To go beyond prefix matching, CacheBlend is a starting point for stitching KV segments with partial tokens recalculated. The challenge is that inference quality drops due to the limited number of recalculated tokens. A next stage has been submitted to SIGMOD 26, with both an online and an offline part; the offline part performs preprocessing to increase the number of tokens recalculated during stitching.
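The page-granularity prefix matching described in the answer can be sketched as follows: the prompt is split into 64-token pages, each page is keyed by a fingerprint of the entire token prefix up to its boundary, and the cache is probed page by page until the first miss. The function and cache layout are illustrative assumptions, not the presented system's code.

```python
PAGE_SIZE = 64  # default page size in tokens, as stated in the answer

def prefix_hit_pages(tokens, cache):
    """cache: set of fingerprints of token prefixes at page boundaries.
    Returns the number of leading full pages whose KV entries can be
    reused; the hit ratio is then hits / (len(tokens) // PAGE_SIZE)."""
    hits = 0
    for end in range(PAGE_SIZE, len(tokens) + 1, PAGE_SIZE):
        if hash(tuple(tokens[:end])) not in cache:
            break  # prefix diverges here; later pages cannot be reused
        hits += 1
    return hits
```

Keying each page by the whole prefix (not just the page's own tokens) is what restricts reuse to exact prefix matches; relaxing that, as in the CacheBlend-style stitching mentioned above, requires recomputing some tokens at segment boundaries.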
## Net4AI paper 2: SCALE-CCL: A Scalable Collective Communication Library for Wide-Area Distributed Training
Questions:
* Did you consider propagating link metrics about WAN links during the local load balance phase?
* Did you test ScaleCCL with heterogeneous WAN links?
* How could one possibly optimize for all sorts of collectives?
## Net4AI paper 3: You've got a few GPUs, now what?! --- Experimenting with a Nano-Cluster for Distributed Training of AI Models
Questions:
* Your work seems to be the step right before federated learning frameworks (FlowerLLM comes to mind). Did you get inspiration from such systems?
## Invited talk 3: Debriefing the Open Innovation Platform for UnifiedBus
Questions:
* Is UB suited to use across inter-datacenter links? Over the Internet? Are you aware of people simulating such use cases with the simulation tools?
## Wrap-up discussion
Questions:
*