
# 56. SOCKETS: INTRODUCTION

Sockets are a method of **IPC** that allow data to be exchanged between applications, either on the same host (computer) or on different hosts connected by a network.

## Overview

In a typical client-server scenario, applications communicate using sockets as follows:

- Each application creates a socket. A socket is the “apparatus” **that allows communication, and both applications require one**.
- The server binds its socket to a **well-known address (name)** so that clients can locate it.
- A socket is created using the socket() system call, which returns a **file descriptor** used to refer to the socket in subsequent system calls:

```c=
fd = socket(domain, type, protocol);
```

### Communication domains

Modern operating systems support at least the following domains:

- The `UNIX` (AF_UNIX) domain allows communication between applications **on the same host**.
- The `IPv4` (AF_INET) domain allows communication between applications running on hosts connected **via an Internet Protocol version 4** (IPv4) network.
- The `IPv6` (AF_INET6) domain allows communication between applications running on hosts connected **via an Internet Protocol version 6** (IPv6) network.

![](https://i.imgur.com/p3XoKPW.png)

### Socket types

* stream
* datagram

<img src="https://i.imgur.com/ZEdu9JA.png" width="70%">

### Stream Sockets

* operate in connected pairs
* are described as connection-oriented
* provide a **reliable**, **bidirectional**, **byte-stream** communication channel:
    - `Reliable` means that we are guaranteed that either the transmitted data will **arrive intact** at the receiving application, exactly as it was transmitted, or we will receive notification of a probable failure in transmission.
    - `Bidirectional` means that data may be transmitted in either direction **between two sockets**.
    - `Byte-stream` means that, as with pipes, there is **no concept of message boundaries**.

### Datagram Sockets

* Datagram sockets are **connectionless**: unlike a stream socket, a datagram socket doesn't need to be connected to another socket in order to be used.
* Data is exchanged in the form of messages called **datagrams**.
* **Message boundaries** are preserved.
* Data transmission is **not reliable**: messages may arrive out of order, be duplicated, or not arrive at all.

In the Internet domain, datagram sockets employ the User Datagram Protocol (UDP), and stream sockets (usually) employ the Transmission Control Protocol (TCP).

## Socket system calls

The key socket system calls are the following:

- `socket()` creates a new socket.
- `bind()` binds a socket to an address.
- `listen()` allows a stream socket to accept incoming connections from other sockets.
- `accept()` accepts a connection from a peer application on a listening stream socket, and optionally returns the address of the peer socket.
- `connect()` establishes a connection with another socket.
- Socket I/O can be performed using the conventional `read()` and `write()` system calls, or using a range of socket-specific system calls (e.g., `send()`, `recv()`, `sendto()`, and `recvfrom()`).

## Creating a Socket: socket()

The socket() system call creates a new socket. On success, socket() returns a file descriptor used to refer to the newly created socket in later system calls.

![](https://i.imgur.com/8RHA80a.png)

- `domain`: specifies the communication domain for the socket.
- `type`: specifies the socket type:
    - SOCK_STREAM, to create a stream socket,
    - SOCK_DGRAM, to create a datagram socket.
- `protocol`: always specified as 0 for the socket types we describe in this book.

## Binding a Socket to an Address: bind()

The bind() system call binds a socket to an address.

![](https://i.imgur.com/LeBFQmz.png)

- `sockfd`: file descriptor obtained from a previous call to `socket()`.
- `addr`: pointer to a structure specifying the address to which this socket is to be bound.
- `addrlen`: specifies the size of the address structure.

## Generic Socket Address Structures: struct sockaddr

- This structure serves as **a template for all of the domain-specific address** structures.
- The only purpose of this type is to **cast the various domain-specific address structures to a single type** (for example, the IPv4 and IPv6 address structures) for use as arguments in the socket system calls.

```c=
struct sockaddr {
    sa_family_t sa_family;   /* Address family (AF_* constant) */
    char        sa_data[14]; /* Socket address (size varies
                                according to socket domain) */
};
```

## Stream Sockets

The operation of stream sockets can be explained by analogy with the telephone system:

![](https://i.imgur.com/8Uw83X2.png)

### Listening for Incoming Connections: listen()

The listen() system call marks the stream socket referred to by the file descriptor **sockfd as passive**. The socket will subsequently be used to accept connections from other (active) sockets.

![](https://i.imgur.com/scNOhK9.png)

- We can't apply listen() to a connected socket.
- A client may call connect() before the server calls accept(), which results in a pending connection. The `backlog` argument allows us to limit the number of such pending connections (ex 56-2).
- Connection requests up to this limit succeed immediately. (For TCP sockets, the story is a little more complicated, as we'll see in Section 61.6.4.)

![](https://hackmd.io/_uploads/Sk0MmA3Vn.png)

### Accepting a Connection: accept()

* The `accept()` system call accepts an incoming connection on the listening stream socket.
* It creates a **new** socket, and it is this new socket that is connected to the peer socket that performed the connect().
* The returned socket is the one the server uses to communicate with the client that called connect(); the listening socket stays open to accept further connections.

![](https://i.imgur.com/8w94OU9.png)

- `addr`: points to a structure that is used to return the address of the peer socket.
- `addrlen`: a value-result argument. Before the call, it must be initialized to the size of the buffer pointed to by `addr`; on return, it is set to indicate the number of bytes of data actually copied into the buffer.

### Connecting to a Peer Socket: connect()

The connect() system call connects the active socket referred to by the file descriptor sockfd to the listening socket whose address is specified by `addr` and `addrlen`.

![](https://i.imgur.com/ahxXCoo.png)

The `addr` and `addrlen` arguments are specified in the same way as the corresponding arguments to bind().

### I/O on Stream Sockets

A pair of connected stream sockets provides a bidirectional communication channel between the two endpoints.

![](https://i.imgur.com/nwJdUd5.png)

- To perform I/O, we use the `read()` and `write()` system calls (or the socket-specific `send()` and `recv()`, which we describe in Section 61.3).
- A socket may be closed using the `close()` system call or as a consequence of the application terminating. Afterward, when the peer application uses its end of the connection:
    - `read`: the peer receives end-of-file (once all buffered data has been read).
    - `write`: the peer receives a SIGPIPE signal, and the system call fails with the error EPIPE.

### Connection Termination: close()

The usual way of terminating a stream socket connection is to call close(). If multiple file descriptors refer to the same socket, then **the connection is terminated when all of the descriptors are closed.**

## Datagram Sockets

The operation of datagram sockets can be explained by analogy with the postal system:

![](https://i.imgur.com/pndxYtc.png)

Just as with the postal system, when multiple datagrams (letters) are sent from one address to another, **there is no guarantee that they will arrive in the order** they were sent, or even arrive at all.

## Exchanging Datagrams: recvfrom() and sendto()

The `recvfrom()` and `sendto()` system calls receive and send datagrams on a datagram socket.

![](https://i.imgur.com/4Xg1tfC.png)

- The return value and the first three arguments to these system calls are the same as for `read()` and `write()`.
- `flags`: a bit mask controlling socket-specific I/O features (described in Section 61.3).
- `src_addr` and `addrlen` (recvfrom()): used to return the address of the peer socket that sent the datagram. If we are not interested in the address of the sender, then we specify both src_addr and addrlen as NULL.
- `dest_addr` and `addrlen` (sendto()): specify the socket to which the datagram is to be sent.

## Using connect() with Datagram Sockets

- After a datagram socket has been connected:
    * Datagrams can **be sent through the socket using write()** (or send()) and are automatically **sent to the same peer socket**. As with sendto(), each write() call results in a separate datagram.
    * Only datagrams sent by the peer socket may be read on the socket.
- We **no longer need to use sendto() with dest_addr and addrlen** arguments, but can instead use write().

## Code

Server:

```c=
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    /* Create the socket */
    char inputBuffer[256];
    char message[] = "Hi, this is server.\n";
    int sockfd, forClientSockfd;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("socket");
        exit(EXIT_FAILURE);
    }

    /* Bind the socket to port 8700 on all local interfaces, then listen */
    struct sockaddr_in serverInfo, clientInfo;
    socklen_t addrlen;

    memset(&serverInfo, 0, sizeof(serverInfo));
    serverInfo.sin_family = AF_INET;
    serverInfo.sin_addr.s_addr = htonl(INADDR_ANY);
    serverInfo.sin_port = htons(8700);
    if (bind(sockfd, (struct sockaddr *) &serverInfo, sizeof(serverInfo)) == -1) {
        perror("bind");
        exit(EXIT_FAILURE);
    }
    if (listen(sockfd, 5) == -1) {
        perror("listen");
        exit(EXIT_FAILURE);
    }

    while (1) {
        addrlen = sizeof(clientInfo);   /* Value-result argument: reset before each accept() */
        forClientSockfd = accept(sockfd, (struct sockaddr *) &clientInfo, &addrlen);
        if (forClientSockfd == -1) {
            perror("accept");
            continue;
        }

        send(forClientSockfd, message, sizeof(message), 0);
        ssize_t numRead = recv(forClientSockfd, inputBuffer, sizeof(inputBuffer) - 1, 0);
        if (numRead > 0) {
            inputBuffer[numRead] = '\0';    /* recv() does not null-terminate */
            printf("Get:%s\n", inputBuffer);
        }
        close(forClientSockfd);             /* Done with this client */
    }
    return 0;
}
```

Client:

```c=
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>      /* inet_addr() */

int main(void)
{
    /* Create the socket */
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("socket");
        exit(EXIT_FAILURE);
    }

    /* Connect to the server */
    struct sockaddr_in info;
    memset(&info, 0, sizeof(info));
    info.sin_family = AF_INET;
    info.sin_addr.s_addr = inet_addr("127.0.0.1");  /* localhost test */
    info.sin_port = htons(8700);
    if (connect(sockfd, (struct sockaddr *) &info, sizeof(info)) == -1) {
        perror("connect");
        exit(EXIT_FAILURE);
    }

    /* Send a message to the server and print its greeting */
    char message[] = "Hi there";
    char receiveMessage[100];
    send(sockfd, message, sizeof(message), 0);
    ssize_t numRead = recv(sockfd, receiveMessage, sizeof(receiveMessage) - 1, 0);
    if (numRead > 0) {
        receiveMessage[numRead] = '\0';     /* recv() does not null-terminate */
        printf("%s", receiveMessage);
    }
    printf("close Socket\n");
    close(sockfd);
    return 0;
}
```

# SOCKETS: UNIX DOMAIN

This chapter looks at the use of UNIX domain sockets, which allow communication between processes on the same host system.

## UNIX Domain Socket Addresses: struct sockaddr_un

In the UNIX domain, a socket address takes the form of a pathname, and the domain-specific socket address structure is defined as follows:

```c=
struct sockaddr_un {
    sa_family_t sun_family;     /* Always AF_UNIX */
    char        sun_path[108];  /* Null-terminated socket pathname */
};
```

In order to bind a UNIX domain socket to an address we:

1. initialize a `sockaddr_un` structure,
2. pass a (cast) pointer to this structure as the `addr` argument to `bind()`,
3. specify `addrlen` as the size of the structure.

```c=
const char *SOCKNAME = "/tmp/mysock";
int sfd;
struct sockaddr_un addr;

sfd = socket(AF_UNIX, SOCK_STREAM, 0);          /* Create socket */
if (sfd == -1)
    errExit("socket");

memset(&addr, 0, sizeof(struct sockaddr_un));   /* Clear structure */
addr.sun_family = AF_UNIX;                      /* UNIX domain address */
strncpy(addr.sun_path, SOCKNAME, sizeof(addr.sun_path) - 1);

if (bind(sfd, (struct sockaddr *) &addr, sizeof(struct sockaddr_un)) == -1)
    errExit("bind");
```

- When used to bind a UNIX domain socket, bind() creates an **entry in the file system**.
- The ownership of the file is determined **according to the usual rules for file creation**.
- The file is marked as a socket. When `stat()` is applied to this pathname, it returns the value **S_IFSOCK** in the file-type component of the st_mode field of the stat structure (Section 15.1).
- When listed with `ls -l`, a UNIX domain socket is shown with the **type s** in the first column.

The following points are worth noting about binding a UNIX domain socket:

- We **can't bind a socket to an existing pathname**.
- It is usual to bind a socket to an **absolute pathname**, so that the socket resides at a fixed address in the file system. **Using a relative pathname is possible, but unusual**.
- **A socket may be bound to only one pathname**; conversely, a pathname can be bound to only one socket.
- We **can't use `open()`** to open a socket.
- When the socket is no longer required, its pathname entry can (and generally should) be **removed using `unlink()`**.

## Stream Sockets in the UNIX Domain

We now present a simple client-server application that uses stream sockets in the UNIX domain.

Header:

```c=
#include <sys/un.h>
#include <sys/socket.h>
#include "tlpi_hdr.h"

#define SV_SOCK_PATH "/tmp/us_xfr"
#define BUF_SIZE 100
```

Server:

```c=
#define BACKLOG 5

int main(int argc, char *argv[])
{
    struct sockaddr_un addr;
    int sfd, cfd;
    ssize_t numRead;
    char buf[BUF_SIZE];

    sfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sfd == -1)
        errExit("socket");

    /* Construct server socket address, bind socket to it,
       and make this a listening socket */

    /* Make sure the path does not already exist */
    if (remove(SV_SOCK_PATH) == -1 && errno != ENOENT)
        errExit("remove-%s", SV_SOCK_PATH);

    memset(&addr, 0, sizeof(struct sockaddr_un));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SV_SOCK_PATH, sizeof(addr.sun_path) - 1);

    if (bind(sfd, (struct sockaddr *) &addr, sizeof(struct sockaddr_un)) == -1)
        errExit("bind");

    if (listen(sfd, BACKLOG) == -1)
        errExit("listen");

    for (;;) {          /* Handle client connections iteratively */

        /* Accept a connection. The connection is returned on a new
           socket, 'cfd'; the listening socket ('sfd') remains open
           and can be used to accept further connections. */

        cfd = accept(sfd, NULL, NULL);
        if (cfd == -1)
            errExit("accept");

        /* Transfer data from connected socket to stdout until EOF */

        while ((numRead = read(cfd, buf, BUF_SIZE)) > 0)
            if (write(STDOUT_FILENO, buf, numRead) != numRead)
                fatal("partial/failed write");

        if (numRead == -1)
            errExit("read");

        if (close(cfd) == -1)
            errMsg("close");
    }
}
```

Client:

```c=
int main(int argc, char *argv[])
{
    struct sockaddr_un addr;
    int sfd;
    ssize_t numRead;
    char buf[BUF_SIZE];

    sfd = socket(AF_UNIX, SOCK_STREAM, 0);      /* Create client socket */
    if (sfd == -1)
        errExit("socket");

    /* Construct server address, and make the connection */

    memset(&addr, 0, sizeof(struct sockaddr_un));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SV_SOCK_PATH, sizeof(addr.sun_path) - 1);

    if (connect(sfd, (struct sockaddr *) &addr, sizeof(struct sockaddr_un)) == -1)
        errExit("connect");

    /* Copy stdin to socket */

    while ((numRead = read(STDIN_FILENO, buf, BUF_SIZE)) > 0)
        if (write(sfd, buf, numRead) != numRead)
            fatal("partial/failed write");

    if (numRead == -1)
        errExit("read");

    exit(EXIT_SUCCESS);         /* Closes our socket; server sees EOF */
}
```

We begin by running the server in the background:

```shell=
$ ./us_xfr_sv > b &
[1] 9866
$ ls -lF /tmp/us_xfr                    # Examine socket file with ls
srwxr-xr-x 1 mtk users 0 Jul 18 10:48 /tmp/us_xfr=
```

We then create a test file to be used as input for the client, and run the client:

```shell=
$ cat *.c > a
$ ./us_xfr_cl < a                       # Client takes input from test file
```

At this point, the client has completed. Now we terminate the server as well, and check that the server's output matches the client's input:

```shell=
$ kill %1                               # Terminate server
[1]+  Terminated        ./us_xfr_sv >b  # Shell sees server's termination
$ diff a b
$
```

## Datagram Sockets in the UNIX Domain

Earlier, we stated that communication using **datagram sockets is unreliable.** This is the case for datagrams transferred over a network.
**However**, for UNIX domain sockets, datagram transmission is **carried out within the kernel, and is reliable.**

Server:

```c=
int main(int argc, char *argv[])
{
    struct sockaddr_un svaddr, claddr;
    int sfd, j;
    ssize_t numBytes;
    socklen_t len;
    char buf[BUF_SIZE];

    sfd = socket(AF_UNIX, SOCK_DGRAM, 0);       /* Create server socket */
    if (sfd == -1)
        errExit("socket");

    /* Construct well-known address and bind server socket to it */

    if (remove(SV_SOCK_PATH) == -1 && errno != ENOENT)
        errExit("remove-%s", SV_SOCK_PATH);

    memset(&svaddr, 0, sizeof(struct sockaddr_un));
    svaddr.sun_family = AF_UNIX;
    strncpy(svaddr.sun_path, SV_SOCK_PATH, sizeof(svaddr.sun_path) - 1);

    if (bind(sfd, (struct sockaddr *) &svaddr, sizeof(struct sockaddr_un)) == -1)
        errExit("bind");

    /* Receive messages, convert to uppercase, and return to client */

    for (;;) {
        len = sizeof(struct sockaddr_un);
        numBytes = recvfrom(sfd, buf, BUF_SIZE, 0,
                            (struct sockaddr *) &claddr, &len);
        if (numBytes == -1)
            errExit("recvfrom");

        printf("Server received %ld bytes from %s\n", (long) numBytes,
               claddr.sun_path);

        for (j = 0; j < numBytes; j++)
            buf[j] = toupper((unsigned char) buf[j]);

        if (sendto(sfd, buf, numBytes, 0, (struct sockaddr *) &claddr, len) !=
                numBytes)
            fatal("sendto");
    }
}
```

Client:

```c=
int main(int argc, char *argv[])
{
    struct sockaddr_un svaddr, claddr;
    int sfd, j;
    size_t msgLen;
    ssize_t numBytes;
    char resp[BUF_SIZE];

    if (argc < 2 || strcmp(argv[1], "--help") == 0)
        usageErr("%s msg...\n", argv[0]);

    /* Create client socket; bind to unique pathname (based on PID) */

    sfd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (sfd == -1)
        errExit("socket");

    memset(&claddr, 0, sizeof(struct sockaddr_un));
    claddr.sun_family = AF_UNIX;
    snprintf(claddr.sun_path, sizeof(claddr.sun_path),
             "/tmp/ud_ucase_cl.%ld", (long) getpid());

    if (bind(sfd, (struct sockaddr *) &claddr, sizeof(struct sockaddr_un)) == -1)
        errExit("bind");

    /* Construct address of server */

    memset(&svaddr, 0, sizeof(struct sockaddr_un));
    svaddr.sun_family = AF_UNIX;
    strncpy(svaddr.sun_path, SV_SOCK_PATH, sizeof(svaddr.sun_path) - 1);

    /* Send messages to server; echo responses on stdout */

    for (j = 1; j < argc; j++) {
        msgLen = strlen(argv[j]);       /* May be longer than BUF_SIZE */
        if (sendto(sfd, argv[j], msgLen, 0, (struct sockaddr *) &svaddr,
                   sizeof(struct sockaddr_un)) != msgLen)
            fatal("sendto");

        numBytes = recvfrom(sfd, resp, BUF_SIZE, 0, NULL, NULL);
        if (numBytes == -1)
            errExit("recvfrom");
        printf("Response %d: %.*s\n", j, (int) numBytes, resp);
    }

    remove(claddr.sun_path);            /* Remove client socket pathname */
    exit(EXIT_SUCCESS);
}
```

The following shell session log demonstrates the use of the server and client programs:

```shell=
$ ./ud_ucase_sv &
[1] 20113
$ ./ud_ucase_cl hello world             # Send 2 messages to server
Server received 5 bytes from /tmp/ud_ucase_cl.20150
Response 1: HELLO
Server received 5 bytes from /tmp/ud_ucase_cl.20150
Response 2: WORLD
$ ./ud_ucase_cl 'long message'          # Send 1 longer message to server
Server received 10 bytes from /tmp/ud_ucase_cl.20151
Response 1: LONG MESSA
$ kill %1                               # Terminate server
```

In the last invocation, the 12-byte message is truncated to 10 bytes. This is consistent with the server's receive buffer being only 10 bytes long in this run (that is, assuming `BUF_SIZE` is 10 for this example): recvfrom() silently discards the part of a datagram that does not fit in the supplied buffer.

## UNIX Domain Socket Permissions

The ownership and permissions of the socket file determine which processes are able to communicate with that socket:

- To **connect** to a UNIX domain stream socket, **write permission is required** on the socket file.
- To **send a datagram** to a UNIX domain datagram socket, **write permission is required** on the socket file.

By default, a socket is **created (by `bind()`) with all permissions granted to owner (user), group, and other.** To change this, we can precede the call to `bind()` with a call to `umask()` to disable the permissions that we do not wish to grant.

## The Linux Abstract Socket Namespace

The abstract namespace is a Linux-specific feature that allows us to bind a UNIX domain socket to a name **without that name being created in the file system**. This provides a few potential advantages:

- We don't need to worry about possible **collisions with existing names** in the file system.
- It is not necessary to unlink the socket pathname when we have finished using the socket. The abstract name is **automatically removed when the socket is closed**.
- We don't need to **create a file-system pathname** for the socket.

To create an abstract binding, we specify the first byte of the sun_path field as a null byte:

```c=
int sockfd;
struct sockaddr_un addr;

memset(&addr, 0, sizeof(struct sockaddr_un));   /* Clear address structure */
addr.sun_family = AF_UNIX;                      /* UNIX domain address */

/* addr.sun_path[0] has already been set to 0 by memset() */

strncpy(&addr.sun_path[1], "xyz", sizeof(addr.sun_path) - 2);
            /* Abstract name is "xyz" followed by null bytes */

sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
if (sockfd == -1)
    errExit("socket");

if (bind(sockfd, (struct sockaddr *) &addr, sizeof(struct sockaddr_un)) == -1)
    errExit("bind");
```

# SOCKETS: FUNDAMENTALS OF TCP/IP NETWORKS

- This chapter provides an introduction to computer networking concepts and the TCP/IP networking protocols.
- Starting in this chapter, we begin mentioning various *Request for Comments* (RFC) documents (see Section 58.7).

## Internets

- An internet connects different computer networks, allowing hosts on all of the networks to communicate with one another. A single address format is used to identify all hosts in the internet.
- Although various internetworking protocols have been devised, **TCP/IP has become the dominant protocol suite**.
- The figure below shows a simple internet. In this diagram, the machine tekapo is an **example of a router**, a computer whose function is to connect one subnetwork to another, transferring data between them.

![](https://i.imgur.com/wq2xEac.png)

## Networking Protocols and Layers

- A ***networking protocol*** is a set of rules defining how information is to be transmitted across a network.
- The TCP/IP protocol suite is a ***layered*** networking protocol:

![](https://i.imgur.com/kjowF3A.png)

- One of the notions that lends great power and flexibility to protocol layering is ***transparency***: each protocol layer shields higher layers from the operation and complexity of lower layers. As a result, it appears as if applications are communicating directly with each other via the sockets API.

### Encapsulation

Information passed from a **higher layer to a lower layer is treated as opaque data by the lower layer.** The lower layer makes no attempt to interpret it; it simply adds its own header and passes the result down to the next layer.

![](https://i.imgur.com/QrhU3RV.png)

## The Data-Link Layer

The lowest layer is the data-link layer, which consists of the device driver and the hardware interface (network card) to the underlying physical medium.

- To transfer data, the data-link layer encapsulates datagrams from the network layer into units called ***frames***.
- Each frame includes a header containing, for example, the destination address and frame size.
- This layer may perform error detection, retransmission, and flow control.

One characteristic of the data-link layer that is important for our discussion of IP is the ***maximum transmission unit (MTU).***

- A data-link layer's MTU is the upper limit that the layer places on the size of a frame.

## The Network Layer: IP

Above the data-link layer is the network layer, which is concerned with delivering ***packets*** (data) from the source host to the destination host. This layer performs a variety of tasks, including:

- breaking data into fragments small enough for transmission via the data-link layer (if necessary);
- routing data across the internet;
- providing services to the transport layer.

The version of IP that appeared in the 4.2BSD implementation was IP version 4 (**IPv4**). In the early 1990s, a revised version of IP was devised: IP version 6 (IPv6). The most notable difference between the two versions is that IPv4 identifies subnets and hosts using **32-bit addresses**, while IPv6 uses **128-bit addresses**.

### IP transmits datagrams

- IP transmits data in the form of datagrams (packets).
- An IP datagram includes a header, which ranges in size from 20 to 60 bytes.
- The header contains the address of the target host, so that the datagram can be routed through the network to its destination.
- The header also includes the originating address of the packet, so that the receiving host knows the origin of the datagram.
- An IP implementation may place an upper limit on the size of datagrams that it supports. However, every implementation must permit datagrams at least as large as IP's minimum reassembly buffer size:
    - In IPv4, this minimum is 576 bytes;
    - In IPv6, it is 1500 bytes.

### IP is connectionless and unreliable

- IP is connectionless: it doesn't provide the notion of a virtual circuit connecting two hosts.
- IP is an unreliable protocol: it makes a "best effort" to deliver datagrams, but doesn't guarantee that packets will arrive in the order they were transmitted, that they won't be duplicated, or even that they will arrive at all.
- IP doesn't provide error recovery.
- Reliability must be provided either by using a reliable **transport-layer protocol** (e.g., TCP) or within the application itself.

### IP may fragment datagrams

When an IP datagram is larger than the MTU, **IP fragments (breaks up) the datagram into suitably sized units for transmission across the network.** These fragments are then reassembled at the final destination to re-create the original datagram.

## IP Addresses

An IP address consists of two parts: a network ID, which specifies the network on which a host resides, and a host ID, which identifies the host within that network.

### IPv4 addresses

![](https://hackmd.io/_uploads/H1TyZp4E3.png)

When an organization applies for a range of IPv4 addresses for its hosts, it receives a **32-bit network address** and a corresponding **32-bit network mask**. In binary form, the mask consists of a sequence of 1s in the leftmost bits, followed by 0s. The 1s indicate which part of the address contains the **assigned network ID**, while the 0s indicate which part of the address is available to the organization to assign as **unique host IDs** on its network.

Example: ***204.152.189.0/24***

The /24 indicates that the network ID part of the assigned address consists of the leftmost 24 bits, with the remaining 8 bits specifying the host ID. An organization holding this address can assign **254 unique Internet addresses** to its computers: 204.152.189.1 through 204.152.189.254. (The host IDs with all bits 0 and all bits 1 are reserved: the first identifies the network itself and the second is the subnet broadcast address.)

Certain IPv4 addresses have special meanings. The special address **127.0.0.1** is normally defined as the loopback address, and is conventionally assigned the hostname **localhost**.
A datagram sent to this address never actually reaches the network, but instead **automatically loops back** to become input to the sending host.

![](https://hackmd.io/_uploads/H1u4VTEN3.png)

Typically, IPv4 addresses are ***subnetted***. Subnetting divides the host ID part of an IPv4 address into two parts: a subnet ID and a host ID (Figure 58-6). The combination of the network ID and the subnet ID is usually referred to as the extended network ID. Within a subnet, **the subnet mask serves the same role as described earlier for the network mask**, and we can use a similar notation to indicate the range of addresses assigned to a particular subnet.

### IPv6 addresses

The principles of IPv6 addresses are similar to IPv4 addresses. The key difference is that IPv6 addresses consist of **128 bits**, and the first few bits of the address are a ***format prefix***, indicating the address type. (We won't go into the details.)

IPv6 addresses are typically written as a series of 16-bit hexadecimal numbers separated by colons, as in the following:

`F000:0:0:0:0:0:A:1`

A single run of zero-valued groups can be abbreviated with a double colon, so the same address can be written as:

`F000::A:1`

![](https://hackmd.io/_uploads/H1GoHpEVn.png)

In order to allow IPv6 applications to communicate with hosts supporting only IPv4, IPv6 provides **so-called IPv4-mapped IPv6 addresses.**

## 58.6 The Transport Layer

There are two widely used transport-layer protocols in the TCP/IP suite:

- **User Datagram Protocol (UDP)** is the protocol used for datagram sockets.
- **Transmission Control Protocol (TCP)** is the protocol used for stream sockets.

### 58.6.1 Port Numbers

The transport layer requires a method of differentiating the applications on a host. In TCP and UDP, **this differentiation is provided by a 16-bit port number**.

#### Well-known, registered, and privileged ports

Some well-known port numbers are permanently assigned to specific applications (services). For example:

- the ssh (secure shell) daemon uses the well-known port 22,
- HTTP (the protocol used for communication between web servers and browsers) uses the well-known port 80.

Well-known ports are assigned numbers in the range **0 to 1023** by a central authority, the Internet Assigned Numbers Authority (IANA).

The range of IANA registered ports is **1024 to 49151**. (Not all port numbers in this range are registered.)

IANA specifies the ports in the range **49152 to 65535 as dynamic or private**, with the intention that these ports can be used by local applications and assigned as **ephemeral ports** (short-lived, temporary ports).

### 58.6.2 User Datagram Protocol (UDP)

UDP adds just two features to IP: **port numbers** and a **data checksum** to allow the detection of errors in the transmitted data.

### 58.6.3 Transmission Control Protocol (TCP)

TCP provides a reliable, connection-oriented, bidirectional, byte-stream communication channel between two endpoints (i.e., applications), as shown in Figure 58-8.

![](https://hackmd.io/_uploads/Hy6C3p443.png)

#### Connection establishment

- Before communication can commence, TCP establishes a communication channel between the two endpoints.

#### Packaging of data in segments

- Data is broken into segments, each of which contains a checksum to allow the detection of end-to-end transmission errors.

#### Acknowledgements, retransmissions, and timeouts

- When a TCP segment arrives at its destination without errors, the receiving TCP sends a positive **acknowledgement** to the sender, informing it of the **successfully delivered data**.
- If a segment arrives with errors, then it is discarded, and no acknowledgement is sent.
- The sender starts a timer when each segment is transmitted. If an acknowledgement is not received before the timer expires, the segment is retransmitted.

#### Sequencing

Each byte that is transmitted over a TCP connection is **assigned a logical sequence number**. This number indicates the position of that byte in the data stream for the connection. Sequence numbers serve several purposes:

- They allow TCP segments to be assembled in the correct order at the destination and then passed as a byte stream to the application layer.
- The acknowledgement sent from the receiver back to the sender can use the sequence number to identify which TCP segment was received.
- The receiver can use the sequence number to eliminate duplicate segments.

#### Flow control

Flow control prevents a fast sender from overwhelming a slow receiver.
To implement flow control, the receiving TCP maintains a **buffer for incoming data**. The TCP flow-control algorithm employs a so-called **sliding window algorithm**, which allows unacknowledged segments containing a total of up to N (the offered window size) bytes to be in transit between the sender and receiver.

#### Congestion control: slow-start and congestion-avoidance algorithms

TCP's congestion-control algorithms are designed to prevent a fast sender from overwhelming a network. TCP's congestion-control strategy employs two algorithms in combination: slow start and congestion avoidance.

- **Slow start**: causes the sending TCP to initially transmit segments at a slow rate, but allows it to exponentially increase the rate as these segments are acknowledged by the receiving TCP.
- **Congestion avoidance**: at the beginning of a connection, the sending TCP starts with a small congestion window, which limits the amount of unacknowledged data that it can transmit. As the sender receives acknowledgements from the peer TCP, the congestion window initially **grows exponentially**. However, once the congestion window reaches a certain threshold believed to be close to the transmission capacity of the network, **its growth becomes linear**, rather than exponential.