# 8/24 Hackathon Day3
## verbs
### Subnet Manager

### GID LID GUID

### The "Verbs" library

### Why use "Verbs"?

### libibverbs

### general tips on libibverbs
#### few tips
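- To use the library, include this header:
  - `#include <infiniband/verbs.h>`
- Every executable and library that uses libibverbs must be linked with `-libverbs`.
- Input structs must be zero-initialized.
  - `memset` can be used for this.
- Most resource handles are pointers, so using a wrong handle may cause a segmentation fault.
- When a verb returns a pointer, a valid value means success and NULL means failure.
- When a verb returns an integer, 0 means success and a nonzero value means failure (-1, or for some verbs an errno value).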

### Memory Region (MR)

### Queue Pair (QP)


### Memory Region: API

### Connecting Queue Pairs (QPs)


### Queue Pair: API


##### example

### Post Send Request

### Post Send Request: API

### Polling for Work Completion

### Polling for Work Completion: API

##### example

## Lecture Outline
### Multiplexing & De-Multiplexing

### Ping-pong Measurements
- Client
  - round-trip time 15.7 microseconds
  - user CPU time 100% of elapsed time
  - kernel CPU time 0% of elapsed time
- Server
  - round-trip time 15.7 microseconds
  - user CPU time 100% of elapsed time
  - kernel CPU time 0% of elapsed time
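The notes do not say how these numbers were collected. One plausible way, sketched here under that assumption, is to sample the wall clock with `clock_gettime()` and the process CPU times with `getrusage()` before and after the ping-pong loop. The struct and function names below are hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* One snapshot of wall-clock, user-mode and kernel-mode time, in seconds. */
struct usage_sample { double wall, user, sys; };

static double tv_to_sec(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Take a snapshot; returns 0 on success, -1 on failure. */
int take_sample(struct usage_sample *s)
{
    struct timespec ts;
    struct rusage ru;

    assert(s != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) return -1;
    if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
    s->wall = (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
    s->user = tv_to_sec(ru.ru_utime);
    s->sys  = tv_to_sec(ru.ru_stime);
    return 0;
}

/* User CPU time as a percentage of elapsed time between two snapshots. */
double user_percent(const struct usage_sample *a, const struct usage_sample *b)
{
    double elapsed = b->wall - a->wall;
    return elapsed > 0.0 ? 100.0 * (b->user - a->user) / elapsed : 0.0;
}

/* Kernel CPU time as a percentage of elapsed time between two snapshots. */
double kernel_percent(const struct usage_sample *a, const struct usage_sample *b)
{
    double elapsed = b->wall - a->wall;
    return elapsed > 0.0 ? 100.0 * (b->sys - a->sys) / elapsed : 0.0;
}

/* Average round-trip time in microseconds over `iterations` ping-pongs. */
double rtt_usec(const struct usage_sample *a, const struct usage_sample *b,
                long iterations)
{
    return (b->wall - a->wall) * 1e6 / (double)iterations;
}
```

A typical use would be `take_sample(&before)`, run the ping-pong loop N times, `take_sample(&after)`, then print the three derived values. A user percentage near 100 with a kernel percentage near 0 is the signature of a loop that spins in user space.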
#### How to reduce 100% CPU usage
- Cause is “busy polling” to wait for completions
  - in a tight loop on `ibv_poll_cq()`
  - burns CPU since most calls find nothing
- Why is “busy polling” used at all?
  - simple to write such a loop
  - gives very fast response to a completion
#### “busy polling” to get completions
1. start loop
2. ibv_poll_cq() to get any completion in queue
3. exit loop if a completion is found
4. end loop
#### “wait-for-event” to get completions
1. start loop
2. `ibv_req_notify_cq()` to arm the NIC to send an event on the next completion added to the CQ
3. `ibv_poll_cq()` to pick up any completion that arrived before the CQ was armed
4. exit loop if a completion is found
5. `ibv_get_cq_event()` to wait until the CA (channel adapter, i.e. the NIC) sends the event
6. `ibv_ack_cq_events()` to acknowledge the event
7. end loop
## The Eager protocol
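- On the Responder: post some receive buffers in advance.
- On the Requestor: use the SEND opcode to send to the remote QP (identified by its QPN).
- Sends are typically "SIGNALED": when a send completes, we want a work completion to be generated.
- Not all sends are "signaled".
- Drawback: memcpy! (not zero-copy)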

## The Rendezvous protocol
### Rendezvous feature
- Safer and more robust
- No buffering required
- Mainly used for larger messages
- More complex code
- Can avoid copies (direct user-to-user transfer)
- Higher latency
- Main features
  - Transports: IB/RoCE, Shared memory, TCP
  - Java and Python bindings
  - Seamless handling of GPU memory (NVIDIA, AMD)

- UCP objects
  - `ucp_context_h` -> Top-level context for the application.
  - `ucp_worker_h` -> Communication resources and progress engine context. A possible usage is to create one worker per thread or per CPU core.
  - `ucp_listener_h` -> Listens for incoming connection requests on a specific port.
  - `ucp_ep_h` -> Connection to a remote peer, used to send/receive data.
  - `ucp_mem_h` -> Handle to memory allocated or registered in the local process.
  - `ucp_rkey_h` -> Remote memory handle. Allows access to remote memory for one-sided operations and atomics.


- Client/server connection establishment API

## code
Line 622:
```diff
-char *servername;
+char *servername = NULL;
```
Compile with:
`gcc bw_template.c -libverbs`