8/24 Hackathon Day3

## verbs

### Subnet Manager

![](https://i.imgur.com/JbRXapi.png)

### GID, LID, GUID

![](https://i.imgur.com/l7UgA5O.png)

### The "Verbs" library

![](https://i.imgur.com/3l387p2.png)

### Why use "Verbs"?

![](https://i.imgur.com/5xM42Xi.png)

### libibverbs

![](https://i.imgur.com/UrH4jZl.png)

### general tips on libibverbs

#### A few tips

- To use the library, include its header:
  - `#include <infiniband/verbs.h>`
- Every executable and library that uses libibverbs must be linked with `-libverbs`.
- Input structs must be zero-initialized before use:
  - e.g. with `memset`
- Most resource handles are pointers, so passing the wrong handle can cause a segmentation fault.
- When a function returns a pointer, a valid pointer means success and NULL means failure.
- When a function returns an integer, 0 means success and a nonzero value (-1 or an errno value, depending on the verb) means failure.

![](https://i.imgur.com/2v1p0Yl.png)

### Memory Region (MR)

![](https://i.imgur.com/3zFDSks.png)

### Queue Pair (QP)

![](https://i.imgur.com/7HMI7XT.png)

![](https://i.imgur.com/6tTid25.png)

### Memory Region: API

![](https://i.imgur.com/FJ4V1Pp.png)

### Connecting Queue Pairs (QPs)

![](https://i.imgur.com/Ukcp0Cb.png)

![](https://i.imgur.com/gFfs9ZM.png)

### Queue Pair: API

![](https://i.imgur.com/tq5Ujuf.png)

![](https://i.imgur.com/hit67ut.png)

##### example

![](https://i.imgur.com/z8y1h4J.png)

### Post Send Request

![](https://i.imgur.com/FshBg6G.png)

### Post Send Request: API

![](https://i.imgur.com/mfFfGrL.png)

### Polling for Work Completions

![](https://i.imgur.com/fOFfxal.png)

### Polling for Work Completions: API

![](https://i.imgur.com/rOkCkd7.png)

##### example

![](https://i.imgur.com/Vewzkup.png)

## Lecture Outline

### Multiplexing & De-multiplexing

![](https://i.imgur.com/B5qqzOH.png)

### Ping-pong Measurements

- Client
  - round-trip time: 15.7 microseconds
  - user CPU time: 100% of elapsed time
  - kernel CPU time: 0% of elapsed time
- Server
  - round-trip time: 15.7 microseconds
  - user CPU time: 100% of elapsed time
  - kernel CPU time: 0% of elapsed time

#### How to reduce 100% CPU usage

- The cause is "busy polling" while waiting for completions:
  - a tight loop on `ibv_poll_cq()`
  - burns CPU, since most calls find nothing
- Why is "busy polling" used at all?
  - such a loop is simple to write
  - it responds very quickly to a completion

#### "Busy polling" to get completions

1. start loop
2. `ibv_poll_cq()` to get any completion in the queue
3. exit loop if a completion is found
4. end loop

#### "Wait-for-event" to get completions

1. start loop
2. `ibv_req_notify_cq()` to arm the NIC to send an event on the next completion added to the CQ
3. `ibv_poll_cq()` to get any new completion that arrived between steps 2 and 4
4. exit loop if a completion is found
5. `ibv_get_cq_event()` to wait until the CA sends an event
6. `ibv_ack_cq_events()` to acknowledge the event
7. end loop

## The Eager protocol

- On the Responder: post some receive buffers.
- On the Requestor: send to the remote QP (identified by its QPN) with the SEND opcode.
  - Sends are typically "SIGNALED": when a send completes, we want a work completion to be generated.
  - Not every send has to be "signaled".
- Drawback: memcpy! (not zero-copy)

![](https://i.imgur.com/PZ7ZHqo.png)

## The Rendezvous protocol

### Rendezvous features

- Safer and more robust
- Does not require buffers
- Mainly used for transferring large messages
- More complex to program
- Can avoid copies (direct user-to-user transfer)
- Higher latency
- Main features
  - Transports: IB/RoCE, shared memory, TCP
  - Java and Python bindings
  - Seamless handling of GPU memory (NVIDIA, AMD)

![](https://i.imgur.com/qQPqR06.png)

- UCP objects
  - `ucp_context_h`: top-level context for the application.
  - `ucp_worker_h`: communication resources and progress-engine context. A possible usage is to create one worker per thread or per CPU core.
  - `ucp_listener_h`: listens for incoming connection requests on a specific port.
  - `ucp_ep_h`: connection to a remote peer, used to send and receive data.
  - `ucp_mem_h`: handle to memory allocated or registered in the local process.
  - `ucp_rkey_h`: remote memory handle. Allows access to remote memory for one-sided operations and atomics.

![](https://i.imgur.com/90zS1Ga.png)

![](https://i.imgur.com/mjI3Jry.png)

- Client/server connection establishment API

![](https://i.imgur.com/DF90adp.png)

## code

Line 622: initialize `servername` explicitly. If it is a local variable, leaving it uninitialized gives it an indeterminate value, so a later NULL check on it would be unreliable.

```diff
-char *servername;
+char *servername = NULL;
```

Run `gcc bw_template.c -libverbs`.