Zero Copy

tags: Note MCL Zero Copy

Scenario

Read a file from disk, then send its contents to a client over a socket.

read(file, tmp_buf, len);
write(socket, tmp_buf, len);

  • CPU copies: 2
  • DMA copies: 2
  • Mode switches (user/kernel): 4

mmap + write

Map the file into the process's address space so that one CPU copy is avoided.

tmp_buf = mmap(file, len);
write(socket, tmp_buf, len);

  • CPU copies: 1
  • DMA copies: 2
  • Mode switches (user/kernel): 4

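SIGBUS problem (unexpected memory access)

If another process truncates the file while it is mapped, touching the mapping beyond the new end of file raises SIGBUS, which kills the process by default.

Solutions:

  • Callback function: install a signal handler for SIGBUS
  • File leasing (opportunistic locking)
    • When another process tries to truncate or write to the file you are sending, the kernel sends RT_SIGNAL_LEASE to tell you the lease is being broken, and the write is interrupted before the process is killed by SIGBUS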

sendfile

Copy data directly from one fd to another inside the kernel, without passing through a user-space buffer.

sendfile(socket, file, offset, len);

  • CPU copies: 1
  • DMA copies: 2
  • Mode switches (user/kernel): 2

sendfile + DMA gather

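Only a small amount of information is copied into the socket buffer: descriptors giving the location and length of the data, not the data itself. This requires a NIC that supports gather (scatter-gather) DMA.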

sendfile(socket, file, offset, len);

  • CPU copies: 0 (strictly speaking 1, but it copies only descriptors, not the file contents)
  • DMA copies: 2
  • Mode switches (user/kernel): 2

splice

Move data between two fds through a pipe buffer set up inside the kernel.

ssize_t splice(int fd_in, off64_t *off_in, int fd_out,
                      off64_t *off_out, size_t len, unsigned int flags);

  • CPU copies: 0 (strictly speaking 2, but only pointers are copied)
  • DMA copies: 2
  • Mode switches (user/kernel): 6

Zero Copy Applications

Network packet processing

PF_RING

  • mmap
    • Maps kernel_buf to user_buf

NIC to user space:
  • CPU copies: 1

PF_RING zc

  • mmap
    • Maps RX_buffer to user_buf

NIC to user space:
  • CPU copies: 0

DPDK

  • UIO + mmap
    • UIO (userspace driver)
    • Maps RX_buffer to user_buf
  • PMD (Poll Mode Driver)
    • Disables interrupts and polls instead
  • Huge Pages
    • Fewer TLB misses

NIC to user space:
  • CPU copies: 0

Reference

Zero Copy I: User-Mode Perspective
一文带你,彻底了解,零拷贝 Zero-Copy 技术
Performance Review of Zero Copy Techniques
Linux I/O 原理和 Zero-copy 技术全面揭秘
DPDK解析