Socketpair shim idea

# Socketpair shim idea

## Introduction

```rust
pub unsafe extern "C" fn socketpair(
    domain: c_int,
    type_: c_int,
    protocol: c_int,
    socket_vector: *mut c_int
) -> c_int
// source: https://docs.rs/libc/latest/libc/fn.socketpair.html
```

- **domain:** communication domain for the socket
    - ``socketpair`` uses ``AF_UNIX`` or ``AF_LOCAL``, which are synonymous and denote Unix domain sockets.
- **type:** socket type
    - ``SOCK_STREAM``: stream socket
    - ``SOCK_DGRAM`` (not supported): datagram socket
    - ``SOCK_CLOEXEC``: causes the kernel to set the close-on-exec flag ``FD_CLOEXEC`` on the new file descriptors
    - ``SOCK_NONBLOCK`` (Linux specific): makes I/O operations on the socket non-blocking
- **protocol:**
    - only 0 is supported [^1]
- **socket_vector:** array that receives the 2 file descriptors returned by the function
- **return value:** 0 on success, or -1 on error

## Implementation

- Flag handling
    - ``SOCK_DGRAM`` is not supported. [^2]
- SocketPair design:

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::{Rc, Weak};
// `VClock` is Miri's vector clock type.

#[derive(Debug)]
struct SocketPair {
    writebuf: Weak<RefCell<Buffer>>,
    readbuf: Rc<RefCell<Buffer>>,
    is_nonblock: bool,
}

#[derive(Debug)]
struct Buffer {
    buf: VecDeque<u8>,
    clock: VClock,
    buf_has_writer: bool,
}
```

- The maximum capacity of a socket buffer is set to ``212992``. This is an arbitrary choice, because on a host system the user can usually set it to any value.
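
The `Weak`/`Rc` split in `SocketPair` can be read as follows: each end owns the buffer it reads from and only holds a weak handle to the buffer it writes into, so a dropped peer is detected when the upgrade fails. The block below is a minimal sketch of that reading, not Miri's actual code. `new_pair`, `write`, `read` and `MAX_CAPACITY` are hypothetical names, the `clock`, `buf_has_writer` and `is_nonblock` fields are left out, and a `None` return stands in for the `EPIPE` error.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::{Rc, Weak};

/// Capacity quoted in the text above (hypothetical constant name).
const MAX_CAPACITY: usize = 212992;

#[derive(Debug, Default)]
struct Buffer {
    buf: VecDeque<u8>,
}

#[derive(Debug)]
struct SocketPair {
    /// Buffer the peer reads from; weak, so it dies together with the peer.
    writebuf: Weak<RefCell<Buffer>>,
    /// Buffer this end reads from; owned by this end.
    readbuf: Rc<RefCell<Buffer>>,
}

/// Hypothetical constructor: creates two ends whose buffers are crossed.
fn new_pair() -> (SocketPair, SocketPair) {
    let buf0 = Rc::new(RefCell::new(Buffer::default()));
    let buf1 = Rc::new(RefCell::new(Buffer::default()));
    let weak0 = Rc::downgrade(&buf0);
    let weak1 = Rc::downgrade(&buf1);
    let end0 = SocketPair { writebuf: weak1, readbuf: buf0 };
    let end1 = SocketPair { writebuf: weak0, readbuf: buf1 };
    (end0, end1)
}

impl SocketPair {
    /// Queues as many bytes as fit; `None` means the peer was dropped.
    fn write(&self, data: &[u8]) -> Option<usize> {
        let peer_rc = self.writebuf.upgrade()?;
        let mut peer_buf = peer_rc.borrow_mut();
        let space = MAX_CAPACITY.saturating_sub(peer_buf.buf.len());
        let n = data.len().min(space);
        peer_buf.buf.extend(data[..n].iter().copied());
        Some(n)
    }

    /// Copies up to `out.len()` queued bytes into `out` and returns the count.
    fn read(&self, out: &mut [u8]) -> usize {
        let mut own = self.readbuf.borrow_mut();
        let n = out.len().min(own.buf.len());
        for (slot, byte) in out.iter_mut().zip(own.buf.drain(..n)) {
            *slot = byte;
        }
        n
    }
}
```

With this layout, a write on one end becomes visible to a read on the other end, and once one end is dropped, `write` on the surviving end returns `None`. Detecting a closed writer on the read side would need extra state, presumably what `buf_has_writer` is for.
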
### Mechanism of buffers

- Motivating example

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd[2]; // file descriptor pair
    char buf0, buf1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd) == -1) {
        perror("socketpair");
        exit(EXIT_FAILURE);
    }
    write(fd[0], "b", 1);
    write(fd[1], "a", 1);
    read(fd[1], &buf1, 1);
    printf("buf1 stored: %c\n", buf1); // "b" is printed
    read(fd[0], &buf0, 1);
    printf("buf0 stored: %c\n", buf0); // "a" is printed
    return 0;
}
```

We can model the bidirectional message passing of ``socketpair`` as below:

```
write to fd[0]  -----> read from fd[1]
read from fd[0] <----- write to fd[1]
```

So we can create 2 buffers, one for each direction:

```
write to fd[0]  -----> write to buffer0 ----> read from fd[1]
read from fd[0] <----- write to buffer1 <---- write to fd[1]
```

## Specification

- This specification considers:
    1. blocking and non-blocking mode
    2. cases where either end is dropped

### For non-blocking

- read
    - if write end open but no data available
        - fail with ``EWOULDBLOCK``
    - if write end closed
        - if buffer is not empty
            - read as much as we need/can and return number of bytes read
        - else (buffer is empty)
            - return 0
    - if write end open with data
        - if read size < available data
            - read as requested and return
        - if read size >= available data
            - read as much as we can and return (partial read)
- write
    - if read end open
        - if available_space == 0
            - fail with ``EAGAIN``/``EWOULDBLOCK``
        - else
            - write as much as we can and return number of bytes written
    - if read end closed
        - fail with ``EPIPE``

### For blocking

- read
    - if write end open but no data available
        - block
    - if write end closed
        - if buffer is not empty
            - read as much as we need/can and return number of bytes read
        - else (buffer is empty)
            - return 0
    - if write end open with data
        - if read size < available data
            - read as requested and return
        - if read size >= available data
            - read as much as we can and return
- write
    - if read end open
        - if available_space == 0
            - block
        - else
            - write as much as we can, then return number of bytes written
    - if read end closed
        - fail with ``EPIPE``

### Synchronisation

Because I/O is sometimes used for synchronisation, Miri needs to know about the synchronisation established through ``socketpair::read`` and ``socketpair::write``; otherwise data race detection would report false positives. [^3]

### Test

- Test the cases below in both directions
    1. read size == data available in buffer
    2. read size > data available in buffer: read all available data in the buffer
    3. write size < remaining capacity of buffer
- The tests below are under ``fail-dep`` because blocking is not supported yet
    - Write when the buffer is full
    - Read when the buffer is empty
- Threaded test
- Data race test

### Sidenote:

- In BSD-derived implementations, ``pipe()`` is implemented as a call to ``socketpair`` with type ``SOCK_STREAM``, so ``pipe()`` could potentially be completed after this.

### Idea (not part of the proposal, might come in handy in future):

> (For pipe or FIFO) If the O_NONBLOCK flag is clear, a write request may cause the thread to block, but on normal completion it shall return nbyte.
>
> If fildes refers to a socket, write() shall be equivalent to send() with no flags set.

- https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

> ssize_t send(int socket, const void *buffer, size_t length, int flags);
>
> The length of the message to be sent is specified by the length argument. If the message is too long to pass through the underlying protocol, send() shall fail and no data shall be transmitted.

- https://pubs.opengroup.org/onlinepubs/9699919799/functions/send.html

[^1]: https://github.com/rust-lang/miri/issues/3442#issuecomment-2123268607
[^2]: https://stackoverflow.com/questions/37475039/what-is-the-difference-between-type-and-protocol-in-c-socket-function
[^3]: https://github.com/rust-lang/miri/pull/3609#pullrequestreview-2072024441
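
The non-blocking rules from the Specification section can be condensed into two small decision functions. This is only a sketch under stated assumptions: `IoError`, `nonblocking_read` and `nonblocking_write` are hypothetical names, the open/closed state of the other end is passed in as a plain `bool`, and the buffer capacity is a parameter rather than the fixed value from the Implementation section.

```rust
use std::collections::VecDeque;

/// Hypothetical error type standing in for the errno values in the spec.
#[derive(Debug, PartialEq)]
enum IoError {
    WouldBlock, // EAGAIN / EWOULDBLOCK
    BrokenPipe, // EPIPE
}

/// Non-blocking read: copies up to `out.len()` bytes out of `buf`.
fn nonblocking_read(
    buf: &mut VecDeque<u8>,
    writer_open: bool,
    out: &mut [u8],
) -> Result<usize, IoError> {
    if buf.is_empty() {
        // No data: EWOULDBLOCK while a writer exists, end-of-file (0) otherwise.
        return if writer_open { Err(IoError::WouldBlock) } else { Ok(0) };
    }
    // Data available: full or partial read, regardless of the writer state.
    let n = out.len().min(buf.len());
    for (slot, byte) in out.iter_mut().zip(buf.drain(..n)) {
        *slot = byte;
    }
    Ok(n)
}

/// Non-blocking write: appends as much of `data` as fits into `buf`.
fn nonblocking_write(
    buf: &mut VecDeque<u8>,
    capacity: usize,
    reader_open: bool,
    data: &[u8],
) -> Result<usize, IoError> {
    if !reader_open {
        return Err(IoError::BrokenPipe);
    }
    let space = capacity.saturating_sub(buf.len());
    if space == 0 {
        return Err(IoError::WouldBlock);
    }
    let n = data.len().min(space);
    buf.extend(data[..n].iter().copied());
    Ok(n)
}
```

The blocking variant would differ only in the two `WouldBlock` branches, where the calling thread would be suspended instead of receiving an error.
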