---
title: RDMA programming
tags: APAC HPC-AI competition, training
---
# RDMA Programming
[TOC]
> BTN IPv6 addr.
2001:e10:6840:21:4eed:fbff:fe42:22a3
[Course Video](https://zoom.us/rec/share/pp12AKji8mpIToHfuXGDWJ5xQtzuT6a82yga_qcKmhR0_R9VWWLMvbRTLmT8vKo)
>password:2020@HPCAI
---
## RDMA flow
A basic RDMA flow between a requestor and a responder consists of:
- Handshake - exchange details between requestor and responder (mainly allocated memory addresses and access keys).
- Create a READ/WRITE/ATOMIC request on the requestor side.
- Send the request to the responder.
- Directly access the memory on the responder side.
- For READ/ATOMIC - send the data read from the responder's memory back to the requestor.
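The handshake step can be sketched in plain C: each side packs the address, length and access key of the buffer it exposes into a small fixed-size message and sends it over any out-of-band channel (a TCP socket, for example). The `mem_info` struct, its 16-byte wire layout and the function names are assumptions made for illustration; they are not part of any RDMA library.

```c
#include <stdint.h>

/* Hypothetical handshake payload: what one side tells the other
 * about a buffer it exposes for RDMA access. */
struct mem_info {
    uint64_t addr;   /* virtual address of the registered buffer */
    uint32_t len;    /* buffer length in bytes */
    uint32_t rkey;   /* remote access key for the buffer */
};

enum { MEM_INFO_WIRE_SIZE = 16 };

/* Serialize in big-endian (network) byte order, independent of host endianness. */
static void mem_info_pack(const struct mem_info *mi, uint8_t out[MEM_INFO_WIRE_SIZE])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(mi->addr >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)
        out[8 + i] = (uint8_t)(mi->len >> (24 - 8 * i));
    for (int i = 0; i < 4; i++)
        out[12 + i] = (uint8_t)(mi->rkey >> (24 - 8 * i));
}

/* Rebuild the struct from the 16 wire bytes. */
static void mem_info_unpack(const uint8_t in[MEM_INFO_WIRE_SIZE], struct mem_info *mi)
{
    mi->addr = 0;
    mi->len = 0;
    mi->rkey = 0;
    for (int i = 0; i < 8; i++)
        mi->addr = (mi->addr << 8) | in[i];
    for (int i = 0; i < 4; i++)
        mi->len = (mi->len << 8) | in[8 + i];
    for (int i = 0; i < 4; i++)
        mi->rkey = (mi->rkey << 8) | in[12 + i];
}
```

Packing byte by byte avoids depending on struct padding or host byte order, so both peers agree on the layout even if they run on different architectures.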
---
## TCP Sockets
**RDMA Protocol for Better Efficiency**
- Transport offload
- Kernel bypass
- just send the data directly to network adapter (HCA)
- RDMA & Atomic operations
- RDMA write -> no CPU intervention
(HCA : Host Channel Adapter)
TCP: App -> OS -> NIC -> (to peer) -> NIC -> OS -> App
- The data has to pass through many buffers
```
Compute
APP APP APP
User
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kernel
Device Driver
```
---
### **Traditional Hardware Access**
- Kernel role
- App interface
- Protocol stack
- Resource arbitration across apps
- there may be multiple applications you want to communicate with
- Memory management (pinning, DMA)
- Device driver role
- Hardware abstraction
### **Kernel Bypass**
- Hardware roles
- App interface
- Protocol stack
- Resource arbitration across apps
- Memory management(pinning, DMA)
**Separation of Control and Data paths**
- Data path
- Send
- Recv
- RDMA
- Completion Retrieval
- Req event
- Want to be **faster**? Don't go through the kernel!
- Control path
- Resource setup
- Memory management
Need to create a new protocol for the objects we need in `control`
---
## Introduction to RDMA
Two-sided communication (send/recv)
- Data is read on the local side
- Can be gathered from multiple buffers
- Sent over the wire as a message
- Remote side specifies where the message will be stored
- Can be scattered to multiple buffers
One-sided communication
- Local side can write data directly to remote side memory
- Can be gathered locally from multiple buffers
- Local side can read data directly from remote side memory
- Can be scattered locally to multiple buffers
- Remote side isn't aware of any activity
- It sneaks in quietly
### Q&A
1. Are the data packets sent in parallel with the acknowledgement packets that are checked for completion?
> Yes. Whenever the responder successfully receives the data, it sends the acknowledgement back immediately.
2. How does RDMA handle the case where sending packets is not successful?
> There are multiple failure types; this will be discussed later.
- out of sequence
- timeout
3. What might cause an unsuccessful packet send/recv?
> The packet is dropped, then the retransmission mechanism makes the requester send the packet again.
4. Is RDMA a kind of connectionless protocol?
> Will cover that later.
5. If the data is moved directly from memory to the network adapter, where is the RDMA protocol header added? Adding the protocol header should be done by the CPU, right?
> Good question!
> The header is not added by the CPU. We want the networking devices to do everything, so the network adapter is responsible for all that work. Will discuss that later.
---
## Zero-Copy
### Data Movement Tech
- Buffer Copy
- Zero Copy
- RDMA Read
- RDMA Write

### Typical Read zero copy
- Advertise message
- Send - Data = key, addr, len
- RDMA Read
- RDMA Response
- Send with Data
- Completion Msg(Send)

### Typical Write zero copy
- Advertise message
- Send Data = key, addr, len
- RDMA Write
- Send with Data
- Completion Msg(Send)

> APP Buf -> application buffer
### Q&A
1. During the zero-copy read operation, can local threads (of the app) still access the buffer?
> You are not supposed to modify the data.
> Once you have exposed the buffer to the network, do not modify the data from then on.

> BCopy -> Buffer copy
> ZCopy -> Zero copy
What does "ahead" refer to here?
Q: What does it mean to pipeline data?
> [I'm not sure whether this is the answer](https://datascience.stackexchange.com/questions/35801/what-is-the-meaning-of-the-term-pipeline-within-data-science)
---
## Transport Offloads
### Transport types
- RC(comparable to TCP)
- Reliable, connection-oriented transport
- Guarantees full, in-order delivery of messages and RDMA operations
- We like reliability! But sometimes you need speed more than reliability?
- Key is `quickly & faster`
- Really? except u
- UD(comparable to UDP)
- Unreliable, connectionless transport
- Best effort to deliver messages
- But sometimes we don't need reliability, depends
- DC
- Dynamically Connected, reliable and scalable transport
- Connects dynamically & ensures reliability
- Other transports also exist: UC, RD, and XRC(not that commonly used)
---
## Introduction to verbs
There is no RDMA socket!!!
### The "Verbs" Library
- Verbs is an abstract description of the RDMA functionality that is provided to applications
- Verbs typically refers to either the library or the API
- Verbs are also available in kernel space (e.g. for device drivers)
- Just like the actions RDMA does.
- Verbs can be divided into two major groups (two sets of APIs)
- Control path - manages the resources and usually **requires a context switch**
- Create
- Destroy
- Modify
- Query
- Work with events
- Data path - uses the resources to send/recv data and **doesn't require a context switch**
- Post Send
- Post Recv
- Poll CQ
- Req for completion event
### WHY use verbs?
> before we discuss the 2 types of APIs,
> let's start with a Q, why use verbs?
- Verbs is a low-level description for RDMA programming
- Verbs are close to the bare metal & provide the best performance
- Latency
- BandWidth(BW)
- Msg rate
- Verbs can be used as building blocks for any apps
- Sockets
- Storage
- Parallel computing
**Libibverbs**
> The libibverbs library enables user-space processes to use Remote Direct Memory Access (RDMA) verbs. From [IBM](https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/rdma/rdma_libibverbs.html)
> [REF.](https://www.rdmamojo.com/2012/05/18/libibverbs/)
- Libibverbs is the user-level library for RDMA apps
- Open source, available for multiple linux distributions
- There's also a kernel counterpart
- but we will focus on user space and this library
- A library that actually implements the verbs
---
## Verbs Objects
### Request-Response Objects
(Basic ones)
- Work Req (WR)
- Work items that the HW should perform
- Typically either a send req or a recv req
- Work Completion (WC)
- When a WR is completed, it may create a Work Completion, which provides info about the WR that ended
- Type, opcode, amount of data sent/received, and more
### Queue Types
- Work Queue(WQ)
- A queue which contains WRs
- Adding (pushing) a WR to a WQ is called `posting a WR`, which is how the SW does communication
- Can be either a Send or a Recv Queue (one queue for each job)
- Scheduled by the HW
- WR execution order is guaranteed within the same WQ - FIFO
- There is no guarantee about the order between different Work Queues
- Every WR that was posted is considered `outstanding` until it ends with a WC. While outstanding:
- One cannot know whether it was scheduled by the HW or not
- Send buffer(s) cannot be (re)used/freed
- Recv buffer(s) content is undetermined
- Completion Queue(CQ)
- A queue which contains WCs
- Written by the HW, queried by the SW
- Send Queue(SQ)
- A Work Queue that handles sending messages
- Every entry is called a Send Request(SR). It specifies:
- How data is used
- What memory buffers to use
- To send or receive data - depends on the opcode
- How much data is sent
- More attributes
- Adding an SR to an SQ is called `posting a Send Request`
- SR may end with a Work Completion
- Receive Queue(RQ)
- A Work Queue that handles incoming messages.
- Every entry is called a Receive Request(RR). It specifies:
- Memory buffers to be used(If RR is consumed - depends on opcode)
- Adding a RR to the RQ is called "posting a Receive Request"
- An RR always ends with a Work Completion
- This queue may send data as a response - depends on opcode
- Queue Pair(QP)
- An object which unifies both Send and Receive Queues
- Every Queue is independent
- Every QP is associated with a Partition Key(P_Key)
- Diagram
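The queue behaviour described above can be modelled with a toy simulation in plain C: software posts WRs to a FIFO work queue, a stand-in for the hardware consumes them in order and writes WCs, and software polls the completion queue. Every name here (`toy_wq`, `toy_post`, `toy_hw_process_one`, `toy_poll`) is hypothetical, and no RDMA hardware or library is involved.

```c
#include <stdint.h>
#include <stddef.h>

#define QDEPTH 4

/* Toy stand-ins for a Work Request and a Work Completion. */
struct toy_wr { uint64_t wr_id; };
struct toy_wc { uint64_t wr_id; int status; /* 0 = success */ };

/* Fixed-size FIFO rings: one for WRs, one for WCs. */
struct toy_wq { struct toy_wr e[QDEPTH]; size_t head, count; };
struct toy_cq { struct toy_wc e[QDEPTH]; size_t head, count; };

/* "Post" a WR: software appends to the tail. Returns 0, or -1 if the queue is full. */
static int toy_post(struct toy_wq *wq, uint64_t wr_id)
{
    if (wq->count == QDEPTH)
        return -1;
    wq->e[(wq->head + wq->count) % QDEPTH].wr_id = wr_id;
    wq->count++;
    return 0;
}

/* Stand-in for the hardware: consume the oldest WR and write a WC.
 * Returns 1 if a WR was processed, 0 if there was nothing to do. */
static int toy_hw_process_one(struct toy_wq *wq, struct toy_cq *cq)
{
    if (wq->count == 0 || cq->count == QDEPTH)
        return 0;
    struct toy_wc wc = { wq->e[wq->head].wr_id, 0 };
    wq->head = (wq->head + 1) % QDEPTH;
    wq->count--;
    cq->e[(cq->head + cq->count) % QDEPTH] = wc;
    cq->count++;
    return 1;
}

/* Software side: remove up to `max` completions from the CQ.
 * Returns how many were copied into `out`. */
static int toy_poll(struct toy_cq *cq, int max, struct toy_wc *out)
{
    int n = 0;
    while (n < max && cq->count > 0) {
        out[n++] = cq->e[cq->head];
        cq->head = (cq->head + 1) % QDEPTH;
        cq->count--;
    }
    return n;
}
```

Between `toy_post` and the matching completion showing up in `toy_poll`, the WR is "outstanding": software cannot tell from the queues alone whether the hardware has started on it.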

### Q&A
1. Can the work queue schedule the next WQE in the queue if the previous outstanding WQE has not yet completed (has not reached the completion queue yet)?
> The answer is yes, ...
2. If the next one is sent to the same responder, they will complete in order, right?
> yes,...
---
## Software Transport Architecture





(SGL:Scatter Gather(S/G) List)
---
## Memory Management
### Virtual Memory Management - Short Reminder
- Operating systems typically include `Virtual Memory Management`
- Hardware has a finite amount of physical memory
- Each app `sees` a range of `dedicated` virtual memory
- In short:
- The operating system manages the Virtual -> Physical mappings
- User-level software (most apps) only uses virtual memory addresses
- Some of the virtual memory range is `allocated` and available for use, but most is vacant or `free`.
- The App
### Memory Registration & Memory Regions
- RDMA requires a special kind of memory allocation
- We call it `memory registration`, since we `register` existing allocations
- Involves the driver inside the OS (`control path`)
- Registered memory has several properties:
- Protection
- Byte-level range(base-address and length)
- Permissions(local/remote, read/write)
- Memory Pinning (physical memory is 'locked')
- `Translation handle` to be used for access
- Verbs can only send from and receive into registered memory
- Memory registration typically happens once at the start of the App
### Memory Pinning
- Pinning locks the physical pages that back a registered buffer in RAM, so the adapter can DMA to and from them at any time
- Pinning memory pages has advantages
- Memory pages are always present in RAM
- Never swapped out
- Low Latency
> [the meaning of page](https://ithelp.ithome.com.tw/articles/10207797)
> [A quick guide to operating systems (in Chinese)](https://ithelp.ithome.com.tw/users/20112132/ironman/1884)
**Virtual Memory Regions**
> Previously, the user-space application would allocate a virtual memory region and register the memory using MR verbs. Registering a virtual memory region using the kernel manages the issues involved with RDMA and virtual addresses.
>
> When working with RDMA and virtual memory it is important to know that:
>
> - Virtual memory pages can be swapped and therefore should be pinned.
> - Translation to a physical address takes time to calculate.
> - Virtual memory is not contiguous in physical memory and therefore each page should be translated.
### Q&A
1. What is the best way to know how many or which pages should be pinned when registering memory?
> yes, that is correct.
2. Memory pinning is at page level, right? Can we pin an arbitrary part of a page?
> You cannot have one part of a page pinned and another part not pinned.
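Because pinning works on whole pages, the pinned range for a buffer is the buffer rounded out to page boundaries. A minimal sketch of that arithmetic, assuming a power-of-two page size (4096 is only a typical value); `page_span` and `pages_to_pin` are hypothetical names for illustration, not a verbs API:

```c
#include <stdint.h>
#include <stddef.h>

/* Page-aligned span that covers the byte range [addr, addr + len). */
struct page_span {
    uintptr_t first_page;  /* address of the first page touched */
    size_t    num_pages;   /* number of whole pages that must be pinned */
};

/* page_size must be a power of two. */
static struct page_span pages_to_pin(uintptr_t addr, size_t len, size_t page_size)
{
    struct page_span s = { 0, 0 };
    if (len == 0)
        return s;
    uintptr_t mask  = (uintptr_t)page_size - 1;
    uintptr_t first = addr & ~mask;               /* round the start down */
    uintptr_t last  = (addr + len - 1) & ~mask;   /* page holding the last byte */
    s.first_page = first;
    s.num_pages  = (size_t)((last - first) / page_size) + 1;
    return s;
}
```

For example, a 32-byte buffer that starts 16 bytes before a page boundary touches two pages, so both are pinned in full.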
---
## Verbs in short
### The Verbs API essentials
- QP - Queue Pair
- Transport endpoint - async interface
- SW posts SendWR and RecvWR
- CQ - Completion Queue
- Transport endpoint completions - async interface
- Upon WR completion - a WC is generated(not always)
- MR - Memory Region
- References registered memory
- Buffers used in send/recv-WR must point to local MR
- One-sided operations must point to remote MR
- **If you look at the complete verbs API you'll see a ton of objects - these are most important for sending and receiving messages**

---
### RDMA op-codes: Send
- The responder posts Receive Requests (before data is received)
- The requester posts a Send Request
- Only data is sent over the wire
- ACK is sent only in reliable transport types

> back to the Send WR:SS Processing again

### RDMA op-codes: Write
- The requester posts a Send Request

#### RDMA Write: Remote End



### RDMA op-codes: Read
- The requester posts a Send Request
- Only the remote memory attributes (address, key, length) are sent in the request, no data
- Responder is passive
- Data is sent from the responder
- Available only in reliable transport types

#### RDMA Read: Remote End


## Atomic
There are two types of atomic operations:
1. Fetch and Add
- The following is done atomically on a 64-bit number in the responder's memory:
- Fetch data from memory
- Add a value
- Store the new number
- Send the original value to the requester
2. Compare and Swap
- The following is done atomically on a 64-bit number in the responder's memory:
- Fetch data from memory
- Compare the data with number1
- If they are equal, store number2 in memory
- Send the original value to the requester
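The semantics of the two operations can be mirrored locally with C11 atomics. This is only an analogy on a CPU variable: in RDMA the adapter performs the operation on the responder's memory, while here `fetch_and_add` and `compare_and_swap` are hypothetical helper names wrapping `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Fetch and Add: add `value` to *target and return the value it held before. */
static uint64_t fetch_and_add(_Atomic uint64_t *target, uint64_t value)
{
    return atomic_fetch_add(target, value);
}

/* Compare and Swap: if *target == compare, store `swap`.
 * The original value is returned in both cases. */
static uint64_t compare_and_swap(_Atomic uint64_t *target, uint64_t compare, uint64_t swap)
{
    uint64_t original = compare;
    /* On mismatch the call writes the current value into `original`;
     * on match `original` already equals it. */
    atomic_compare_exchange_strong(target, &original, swap);
    return original;
}
```

In both cases the caller learns what the memory held before the operation, which is how a requester can tell whether a Compare and Swap actually took effect.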
[](https://i.imgur.com/guwVwJ6.png)
## Memory Registration?...(sorry, I forgot...
### Memory Region (MR) (@Phoebe)
> virtually contiguous memory block that was registered (for RDMA)

### Memory Region (MR): API (@Angela)

## Send Request (@Marvin)
:::danger
**Most important part!!**
:::

---
### Scatter/Gather (S/G) elements
- Every S/G refers to a Memory Region or part of it
- No S/G entries means zero-byte message
- Gather : when local data is read and sent over the wire
- Scatter : when data is received and written locally
```cpp=
struct ibv_sge {
uint64_t addr; //Start address of the memory buffer (registered memory)
uint32_t length; //Size (in bytes) of the memory buffer
uint32_t lkey; //lkey of Memory Region associated with this memory buffer
};
```
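Gather and scatter can be illustrated without any hardware: gathering concatenates several local buffers into one message, and scattering spreads a received message over several local buffers in list order. The sketch below uses a simplified `toy_sge` (pointer and length only, no `lkey` and no registration) and hypothetical `toy_gather` / `toy_scatter` helpers; with real verbs the adapter does this work, not `memcpy`.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Toy S/G element: a plain pointer and length. */
struct toy_sge {
    void  *addr;
    size_t length;
};

/* Gather: concatenate the listed buffers into `msg`.
 * Returns 0 and sets *msg_len, or -1 if the data would not fit in `cap` bytes.
 * An empty list yields a zero-byte message. */
static int toy_gather(const struct toy_sge *sgl, int n,
                      uint8_t *msg, size_t cap, size_t *msg_len)
{
    size_t off = 0;
    for (int i = 0; i < n; i++) {
        if (sgl[i].length > cap - off)
            return -1;
        memcpy(msg + off, sgl[i].addr, sgl[i].length);
        off += sgl[i].length;
    }
    *msg_len = off;
    return 0;
}

/* Scatter: spread a received message over the listed buffers in order.
 * Returns 0, or -1 if the buffers are too small for the message. */
static int toy_scatter(const uint8_t *msg, size_t msg_len,
                       const struct toy_sge *sgl, int n)
{
    size_t off = 0;
    for (int i = 0; i < n && off < msg_len; i++) {
        size_t left  = msg_len - off;
        size_t chunk = left < sgl[i].length ? left : sgl[i].length;
        memcpy(sgl[i].addr, msg + off, chunk);
        off += chunk;
    }
    return off == msg_len ? 0 : -1;
}
```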
---
### Post Send Request: API
```cpp=
int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr, struct ibv_send_wr **bad_wr);
```
- Add a linked list of Send Requests to the Send Queue
- **Warning:** `bad_wr` is mandatory; on failure it is assigned the address of the Send Request whose posting failed
```cpp=
struct ibv_send_wr {            // abridged; the real struct has more fields (e.g. the per-opcode `wr` union)
    uint64_t wr_id;             // Private context that will be available in the corresponding Work Completion
    struct ibv_send_wr *next;   // Address of the next Send Request. NULL in the last Send Request
    struct ibv_sge *sg_list;    // Array of scatter/gather elements
    int num_sge;                // Number of elements in sg_list
    enum ibv_wr_opcode opcode;  // The opcode to be used
    int send_flags;             // Send flags, a bitwise OR of the following flags:
    // IBV_SEND_FENCE: Don't process this Send Request until the previous RDMA Read and Atomic operations have completed
    // IBV_SEND_SIGNALED: Generate a Work Completion after processing of this Send Request ends
    // IBV_SEND_SOLICITED: Generate a Solicited event for this message on the remote side
    // IBV_SEND_INLINE: Allow the low-level driver to read the gather buffers at post time and place the data inline in the Send Request
    uint32_t imm_data;          // Send message with immediate data (for supported opcodes)
};
```
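The linked-list posting and the role of `bad_wr` can be shown with a toy model. Everything below (`toy_send_wr`, `toy_sq`, `toy_post_send`) is hypothetical and only imitates the shape of the call: the real `ibv_post_send()` reports failure through a non-zero return value, and this sketch simply uses -1.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy Send Request: only the fields needed to show list chaining. */
struct toy_send_wr {
    uint64_t            wr_id;
    struct toy_send_wr *next;   /* NULL in the last request */
};

/* Toy send queue with a fixed number of free slots. */
struct toy_sq { int free_slots; };

/* Post every request in the list until one fails.
 * Returns 0 if all were posted. Otherwise returns -1 and points *bad_wr
 * at the first request that was NOT posted; it and everything chained
 * after it still belong to the caller. */
static int toy_post_send(struct toy_sq *sq, struct toy_send_wr *wr,
                         struct toy_send_wr **bad_wr)
{
    *bad_wr = NULL;
    for (; wr != NULL; wr = wr->next) {
        if (sq->free_slots == 0) {
            *bad_wr = wr;
            return -1;
        }
        sq->free_slots--;
    }
    return 0;
}
```

After a failure, the caller can wait for completions to free up slots and then repost starting from `*bad_wr`, since the requests before it were already accepted.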
> It holds pointers to other work requests and to the S/G list: really important metadata.
> Pointer to pointer to pointer.
---
### Post Receive request
---
### Polling for Work Completion
- Polling for Work Completion checks if the processing of a Work Request has ended
- A Work Completion holds information about a completed Work Request
- Every Work Completion contains information about the corresponding completed Work Request
- Every Work Completion contains several attributes
- The following fields are always valid (even if the Work Completion ended with an error)
- wr_id
- status
- qp_num
- vendor_err
- The rest of the fields depend on the QP's transport type, opcode and status
- Work Completion of Send Requests:
- Mark that a Send Request was performed and its memory buffers can be (re)used
- For reliable transport QP: this means that the message was written to the remote buffers (if status is successful)
- For unreliable transport QP: this means that the message was sent from the local port
- Work Completion of Receive Requests:
- Mark that an incoming message was completed and its memory buffers can be (re)used
- Contains some attributes about the incoming message, such as size, origin, etc.
#### Polling for Work Completion: API
```cpp
/* Prototype */
int ibv_poll_cq(struct ibv_cq *cq, int num_entries, struct ibv_wc *wc);
// Reads one or more Work Completions from a CQ and removes them from the CQ
// Return value: < 0 means an error occurred; otherwise it is the number of polled Work Completions
/* Struct ibv_wc */
struct ibv_wc {
    uint64_t wr_id;             // Private context that was posted in the corresponding Work Request
    enum ibv_wc_status status;  // The status of the Work Completion
    enum ibv_wc_opcode opcode;  // The opcode of the Work Completion
    uint32_t vendor_err;        // Vendor specific
    uint32_t byte_len;          // Number of bytes that were received
    uint32_t imm_data;          // Immediate data, in network order, if the flags indicate that such exists
    uint32_t qp_num;            // The local QP number that this Work Completion ended in
    uint32_t src_qp;            // The remote QP number
    int wc_flags;               // Work Completion flags, a bitwise OR of the following flags:
    // IBV_WC_GRH: Indicator that the first 40 bytes of the receive buffer(s) contain a valid GRH
    // IBV_WC_WITH_IMM: Indicator that the received message contains immediate data
    uint16_t pkey_index;
    uint16_t slid;              // For UD QP: the source LID
    uint8_t sl;                 // For UD QP: the source Service Level
    uint8_t dlid_path_bits;     // For UD QP: the destination LID path bits
};
```
#### Typical Work Completion status
- `IBV_WC_SUCCESS`: Operation completed successfully
- `IBV_WC_LOC_LEN_ERR`: Local length error when processing SR or RR
- `IBV_WC_LOC_PROT_ERR`: Local protection error; S/G entries don't point to a valid MR
- `IBV_WC_WR_FLUSH_ERR`: Work Request flush error; it was processed when the QP was in the Error state
- `IBV_WC_RETRY_EXC_ERR`: Retry exceeded; the remote QP didn't send any ACK/NACK, even after message retransmission
- `IBV_WC_RNR_RETRY_EXC_ERR`: Receiver Not Ready retry exceeded; a message that requires a Receive Request was sent, but there isn't any RR in the remote QP, even after message retransmission
---
### General Tips on libibverbs
- Source code that uses libibverbs should include the header:
- `#include <infiniband/verbs.h>`
- Executables/libraries that work with libibverbs should be linked with:
- `-libverbs`
- All input structures should be zeroed
- Using `memset()` or structure initialization
- If the structure is extended in the future, the value zero will keep the legacy behavior.
- Most resource handles are pointers, so bad handles may cause a segmentation fault.
- Verbs that return a pointer return a valid value on success and NULL on failure.
- Verbs that return an integer return zero on success and -1 or an errno value on failure.
### Abbreviation (abbr.)
- BCopy vs ZCopy
- Queue
- Queue Pair (QP)
- Send Queue (SQ)
- Receive Queue (RQ)
- Completion Queue (CQ)
- Verbs
- libibverbs
- Hardware
- Host Channel Adapter(HCA)
- BandWidth(BW)
## Q&A
1. I wonder which MPI library has native RDMA support.
> A: All of them.
### Additional materials :heart:
- [RDMA](https://wiki.debian.org/RDMA)
- [IBM Libibverbs](https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/rdma/rdma_libibverbs.html)
- [libibverbs Doc.](https://www.rdmamojo.com/2012/05/18/libibverbs/)
- [Introduction to Programming Infiniband RDMA](https://insujang.github.io/2020-02-09/introduction-to-programming-infiniband/)
- [A quick guide to operating systems (in Chinese)](https://ithelp.ithome.com.tw/users/20112132/ironman/1884)
- -> see Day 19: [the meaning of page](https://ithelp.ithome.com.tw/articles/10207797)
---
## Actually running an RDMA program
### Environment
**Soft-RoCE** is a software implementation of the RDMA transport. It is a project that has been developed as a Github community project, with primary contributions from IBM, Mellanox, and System Fabric Works. Soft-RoCE is now ready for Linux upstream submission. Soft-RoCE leverages the same efficiency characteristics as RoCE, providing a complete RDMA stack implementation over any NIC.
Try the open source Soft-RoCE(RXE) in two VMs on your personal computer or workstation.
[HowTo Configure Soft-RoCE](https://community.mellanox.com/s/article/howto-configure-soft-roce)
[Verify that RDMA is working](https://www.rdmamojo.com/2015/01/24/verify-rdma-working/)
To identify the RDMA-capable devices in your system, run the following command: `# ibstat`
### Code
- [ ] Phoebe
```
```
- [ ] Julie
```
```
- [ ] Angela
```
```
- [ ] Johnson
```cpp
#include <iostream>
#include <string>
#include <infiniband/verbs.h>

struct ibv_context* createContext(const std::string& device_name)
{
    /* There is no way to open a device directly by name; get the list of devices first. */
    struct ibv_context* context = nullptr;
    int num_devices = 0;  // number of devices found
    struct ibv_device** device_list = ibv_get_device_list(&num_devices);
    if (device_list == nullptr) {
        std::cerr << "Unable to get the device list" << std::endl;
        return nullptr;
    }
    for (int i = 0; i < num_devices; i++) {
        /* Match the device name, then open the device and keep its context. */
        if (device_name.compare(ibv_get_device_name(device_list[i])) == 0) {
            context = ibv_open_device(device_list[i]);  // open the device
            break;
        }
    }
    /* It is important to free the device list; otherwise memory will be leaked. */
    ibv_free_device_list(device_list);
    if (context == nullptr) {
        std::cerr << "Unable to find the device " << device_name << std::endl;
    }
    return context;
}
```
- [ ] Joycelyn
```c=
// RDMA SEND
// Sets up the local resources and posts one send.
// Connecting the QP to a peer is not shown here.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <infiniband/verbs.h>

int main(void)
{
    // Getting the device list
    // ibv_get_device_list() returns an array of the RDMA devices currently available.
    struct ibv_device **dev_list;
    dev_list = ibv_get_device_list(NULL);
    if (!dev_list || !dev_list[0]) {
        fprintf(stderr, "Error, no RDMA device found\n");
        return -1;
    }

    // Opening the requested device (here: the first one in the list)
    // ibv_open_device() opens the device and creates a context for further use.
    struct ibv_context *ctx;
    ctx = ibv_open_device(dev_list[0]);
    if (!ctx) {
        fprintf(stderr, "Error, failed to open the device '%s'\n", ibv_get_device_name(dev_list[0]));
        return -1;
    }
    printf("The device '%s' was opened\n", ibv_get_device_name(ctx->device));
    ibv_free_device_list(dev_list); // devices that were opened stay usable

    // Querying the device's capabilities
    // ibv_query_device() returns the attributes of the RDMA device that is associated with a context.
    // These attributes are constant and can be used later.
    struct ibv_device_attr device_attr;
    int rc;
    rc = ibv_query_device(ctx, &device_attr);
    if (rc) {
        fprintf(stderr, "Error, failed to query the device '%s' attributes\n", ibv_get_device_name(ctx->device));
        return -1;
    }

    // Allocating a protection domain
    // ibv_alloc_pd() allocates a protection domain for an RDMA device context.
    struct ibv_pd *pd;
    pd = ibv_alloc_pd(ctx);
    if (!pd) {
        fprintf(stderr, "Error, ibv_alloc_pd() failed\n");
        return -1;
    }

    // Registering a memory region
    // ibv_reg_mr() registers a memory region associated with the protection domain
    // to allow the RDMA device to perform read/write operations.
    size_t size = 4096;          // example buffer size
    char *buf = calloc(1, size); // buffer that the send reads from
    if (!buf) {
        fprintf(stderr, "Error, failed to allocate the buffer\n");
        return -1;
    }
    struct ibv_mr *mr;
    mr = ibv_reg_mr(pd, buf, size, IBV_ACCESS_LOCAL_WRITE);
    if (!mr) {
        fprintf(stderr, "Error, ibv_reg_mr() failed\n");
        return -1;
    }

    // Creating a completion queue
    // ibv_create_cq() creates a completion queue for an RDMA device context.
    struct ibv_cq *cq;
    cq = ibv_create_cq(ctx, 100, NULL, NULL, 0);
    if (!cq) {
        fprintf(stderr, "Error, ibv_create_cq() failed\n");
        return -1;
    }

    // Creating a queue pair
    // ibv_create_qp() creates a queue pair associated with a protection domain.
    struct ibv_qp *qp;
    struct ibv_qp_init_attr qp_init_attr;
    memset(&qp_init_attr, 0, sizeof(qp_init_attr));
    qp_init_attr.send_cq = cq;
    qp_init_attr.recv_cq = cq;
    qp_init_attr.qp_type = IBV_QPT_RC;
    qp_init_attr.cap.max_send_wr = 2;
    qp_init_attr.cap.max_recv_wr = 2;
    qp_init_attr.cap.max_send_sge = 1;
    qp_init_attr.cap.max_recv_sge = 1;
    qp = ibv_create_qp(pd, &qp_init_attr);
    if (!qp) {
        fprintf(stderr, "Error, ibv_create_qp() failed\n");
        return -1;
    }

    // Creating an address vector
    // ibv_create_ah() creates an address handle associated with a protection domain.
    // Only UD QPs send through an address handle; the RC QP above does not need one.
    // dlid, sl and port are placeholders for values learned from the peer and the fabric.
    uint16_t dlid = 0;
    uint8_t sl = 0;
    uint8_t port = 1;
    struct ibv_ah *ah;
    struct ibv_ah_attr ah_attr;
    memset(&ah_attr, 0, sizeof(ah_attr));
    ah_attr.is_global = 0;
    ah_attr.dlid = dlid;
    ah_attr.sl = sl;
    ah_attr.src_path_bits = 0;
    ah_attr.port_num = port;
    ah = ibv_create_ah(pd, &ah_attr);
    if (!ah) {
        fprintf(stderr, "Error, ibv_create_ah() failed\n");
        return -1;
    }

    // Posting work requests
    // ibv_post_send() posts a linked list of work requests to the send queue of a queue pair.
    // NOTE: the QP must first be moved RESET -> INIT -> RTR -> RTS with ibv_modify_qp(),
    // using the peer's QP information exchanged out of band. That step is omitted here,
    // so on a QP that is still in RESET this call fails.
    struct ibv_sge sg;
    struct ibv_send_wr wr;
    struct ibv_send_wr *bad_wr;
    memset(&sg, 0, sizeof(sg));
    sg.addr = (uintptr_t)buf;
    sg.length = (uint32_t)size;
    sg.lkey = mr->lkey;
    memset(&wr, 0, sizeof(wr));
    wr.wr_id = 0;
    wr.sg_list = &sg;
    wr.num_sge = 1;
    wr.opcode = IBV_WR_SEND;
    wr.send_flags = IBV_SEND_SIGNALED;
    if (ibv_post_send(qp, &wr, &bad_wr)) {
        fprintf(stderr, "Error, ibv_post_send() failed\n");
        return -1;
    }

    // Polling for completion
    // ibv_poll_cq() polls work completions from a completion queue.
    struct ibv_wc wc;
    int num_comp;
    do {
        num_comp = ibv_poll_cq(cq, 1, &wc);
    } while (num_comp == 0);
    if (num_comp < 0) {
        fprintf(stderr, "ibv_poll_cq() failed\n");
        return -1;
    }
    if (wc.status != IBV_WC_SUCCESS) {
        fprintf(stderr, "Work Completion ended with error: %s\n", ibv_wc_status_str(wc.status));
        return -1;
    }

    // Releasing the resources in reverse order of creation
    ibv_destroy_ah(ah);
    ibv_destroy_qp(qp);
    ibv_destroy_cq(cq);
    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    free(buf);
    return 0;
}
```
---
[Intro to RDMA programming](https://insujang.github.io/2020-02-09/introduction-to-programming-infiniband/)
[Fundamentals of RDMA programming](https://www.opensourceforu.com/2016/09/fundamentals-of-rdma-programming/)
[RDMA Tutorial](https://github.com/jcxue/RDMA-Tutorial/wiki)
[RDMA Aware Networks Programming User Manual](https://www.mellanox.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf)