CS:APP Proxy Lab 解題筆記

# CS:APP Proxy Lab 解題筆記 ###### tags: `cs:app` 本次實驗配合課本第 10、11 及 12 章，要求學員實作一個基本的 http proxy，並逐步擴增 proxy 的功能，讓 proxy 能同時應付多個並行的 request，最後再實作簡易的 cache 機制，完善 proxy 框架 Proxy (網路代理) 是一種在瀏覽器與服務器之間作為中介的程式，其用途非常多樣，如防火牆、隱藏瀏覽器的相關訊息，或著快取曾經瀏覽過的網頁，proxy 的基本架構如下 ![](https://i.imgur.com/iNQqtUc.png) 其接收客戶端傳來的 request，解析後重新組成 header 再送給 web server，接著 proxy 接收由 server 送來的 response，最後再將 response 送回給 client 結束整個 request-response 流程，所以 proxy 在過程中同時扮演了 client 及 server 的角色 ## Part I: Implementing a sequential web proxy ### main 第一部份要求我們實作一個基本的 sequential proxy 處理 `HTTP/1.0 GET requests`，我們先參考 CMU 提供的 web server 基礎框架 `tiny.c` 寫出 proxy 處理流程的基本邏輯 - 由 command line 取得 proxy 欲監聽的 port - 建立監聽用的 socket，並嘗試使用 `bind` 告訴 os 這個 socket 將用來監聽 - `bind` 成功後，呼叫 `listen` 標示 socket 被動接收外部請求 - 執行無窮迴圈 - 呼叫 `accept` 接收客戶端傳來的請求 - 連線建立後，呼叫 `do_request` 解析並執行請求 - 完成後將連線的 `connfd` 關閉避免 fd 的洩漏問題 ```cpp int main(int argc, char **argv) { int listen_fd, conn_fd; char hostname[MAXLINE], port[MAXLINE]; socklen_t clientlen; struct sockaddr_storage clientaddr; /* check command line args */ if (argc != 2) { fprintf(stderr, "usage: %s <port>\n", argv[0]); exit(1); } /* get listen fd */ int ret, optval=1; struct addrinfo hints, *list_ptr; memset(&hints, 0, sizeof(struct addrinfo)); hints.ai_socktype = SOCK_STREAM; /* Accept connections */ hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; /* ... on any IP address */ hints.ai_flags |= AI_NUMERICSERV; /* ... using port number */ if ((ret=getaddrinfo(NULL, argv[1], &hints, &list_ptr)) != 0) { fprintf(stderr, "getaddrinfo failed (port %s): %s\n", argv[1], gai_strerror(ret)); exit(1); } struct addrinfo *ptr; for (ptr = list_ptr; ptr; ptr=ptr->ai_next) { if ((listen_fd=socket(ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol)) < 0) continue; /* Eliminates "Address already in use" error from bind */ setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, (const void *)&optval , sizeof(int)); if (bind(listen_fd, ptr->ai_addr, ptr->ai_addrlen) == 0) break; /* Success create server listen socker */ /* close fd created to prevent fd leakage */ close(listen_fd); } freeaddrinfo(list_ptr); if (!ptr) exit(1); if (listen(listen_fd, LISTENQ) < 0) { close(listen_fd); fprintf(stderr, "listen failed (port %s)\n", argv[1]); exit(1); } while (1) { clientlen = sizeof(clientaddr); conn_fd = accept(listen_fd, (SA *)&clientaddr, &clientlen); if ((ret=getnameinfo((SA *)&clientaddr, clientlen, hostname, MAXLINE, port, MAXLINE, 0)) != 0) { fprintf(stderr, "getnameinfo failed, %s\n", gai_strerror(ret)); continue; } printf("Accept connection from (%s, %s)\n", hostname, port); do_request(conn_fd); close(conn_fd); } return 0; } ``` ### do_request `do_request` 要負責 1. 解析客戶端傳來的 request 2. 依照 CMU `write-up` 中的要求，重建 header 3. 向目標服務器建立連線，並傳送 request 4. 從服務器接收 response，並將資訊回傳給客戶端 ```cpp void do_request(int clientfd) { int n_read; int port = 80; char header_to_server[MAXLINE]; char buf[MAXLINE], method[MAXLINE], uri[MAXLINE], version[MAXLINE]; char hostname[MAXLINE], path[MAXLINE]; rio_t client_myio; Rio_readinitb(&client_myio, clientfd); n_read = Rio_readlineb(&client_myio, buf, MAXLINE); if (!n_read) return; printf("%s", buf); sscanf(buf, "%s %s %s", method, uri, version); if (strcasecmp(method, "GET")) { clienterror(clientfd, method, "501", "Method Not Implemented"); return; } parse_uri(buf, hostname, path, &port); build_header(header_to_server, hostname, path, port, &client_myio); int serverfd; rio_t server_myio; serverfd = connect_server(hostname, port); if (serverfd < 0) { fprintf(stderr, "connect_server failed\n"); return; } /* Read response from server and send to client */ Rio_readinitb(&server_myio, serverfd); /* Send request to server */ Rio_writen(serverfd, header_to_server, strlen(header_to_server)); while ((n_read = Rio_readlineb(&server_myio, buf, MAXLINE)) > 0) { printf("Proxy receives %d bytes from server\n", n_read); Rio_writen(clientfd, buf, n_read); } close(serverfd); } ``` - 1. Request parse - 假設客戶端傳來的 request 為 `GET http://www.cmu.edu/hub/index.html HTTP/1.1`，我們應該要將這段文字拆為 1. method: 對應 `GET` 2. uri: 對應 `http://www.cmu.edu/hub/index.html` 3. version: 對應 `HTTP/1.1` - 判斷 `method` 是否為 `GET`，若否則結束連線 - 將 `uri` 拆為 1. hostname: 對應 `www.cmu.edu` 2. path: 對應 `/hub/index.html` - 另外 `write-up` 要求，如果 port number 在 url 中有指定的話，例如 `http://www.cmu.edu:8080/hub/index.html,`，則 port number 依指定處理，否則預設為 `80` `parse_uri` 稍微注意指標的運用即可，並不會很複雜 ```cpp void parse_uri(char *uri, char *hostname, char *path, int *port) { char *ptr, *host_ptr, *path_ptr, *port_ptr; ptr = strstr(uri, "//"); ptr = uri? ptr+2:uri; port_ptr = strstr(ptr, ":"); if (!port_ptr) /* no specific port number */ { host_ptr = strstr(ptr, "/"); if (!host_ptr) sscanf(ptr, "%s", hostname); else { *host_ptr = '\0'; sscanf(ptr, "%s", hostname); path_ptr = host_ptr; *path_ptr = '/'; sscanf(path_ptr, "%s", path); } } else /* web browser indicate port numer */ { *port_ptr = '\0'; sscanf(ptr, "%s", hostname); port_ptr += 1; sscanf(port_ptr, "%d%s", port, path); } } ``` - 2. Rebuild header 整個 header 可分為以下 7 種類型 1. request header: `GET <path> HTTP/1.0\r\n` 2. host header: `HOST: <hostname>\r\n` 3. connection header: 固定為 `Connection: close\r\n`，代表在完成一次的 request-response 後，就將連線關閉 4. proxy connection head: 固定為 `Proxy-Connection: close\r\n`，理由同上 5. user agent header: 固定為 `User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120305 Firefox/10.0.3\r\n`，提供 client 的相關資訊 6. other header: 非以上 5 種的其他 header 7. end header: `\r\n` 我們透過 while 迴圈不斷讀取 buffer 中的資訊，依照上述的類型進行分類，並將對應的內容存進對應的字串 ```cpp void build_header(char *header, char* hostname, char* path, int port, rio_t *myio) { char buf[MAXLINE], request_hrd[MAXLINE], host_hdr[MAXLINE], other_hdr[MAXLINE]; sprintf(request_hrd, "GET %s HTTP/1.0\r\n", path); sprintf(host_hdr, "HOST: %s\r\n", hostname); while (Rio_readlineb(myio, buf, MAXLINE) > 0) { if (!strcmp(buf, end_hdr)) /* EOF */ break; else if (!(strncasecmp(buf, "HOST", 4))) { /* HOST: */ strcpy(host_hdr, buf); } else if (!(strncasecmp(buf, "User-Agent", 10)) || !(strncasecmp(buf, "Proxy-Connection", 16)) || !(strncasecmp(buf, "Connection", 10))) /* headers should not be changed */ continue; else strcat(other_hdr, buf); } sprintf(header, "%s%s%s%s%s%s%s", request_hrd, host_hdr, conn_hdr, proxy_conn_hdr, user_agent_hdr, other_hdr, end_hdr); } ``` - Build connection to end server & send request 利用 hostname 及 port number 建立 client socket，基本上參考課本的範例程式碼撰寫即可 ```cpp int connect_server(char *hostname, int port) { int connfd, ret; struct addrinfo hints, *list_ptr; memset(&hints, 0, sizeof(struct addrinfo)); hints.ai_socktype = SOCK_STREAM; /* Accept connections */ hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; /* ... on any IP address */ hints.ai_flags |= AI_NUMERICSERV; /* ... using port number */ char port_str[10]; sprintf(port_str, "%d", port); if ((ret=getaddrinfo(hostname, port_str, &hints, &list_ptr)) != 0) { fprintf(stderr, "getaddrinfo failed (%s:%s): %s\n", hostname, port_str, gai_strerror(ret)); return -2; } struct addrinfo *ptr; for (ptr = list_ptr; ptr; ptr=ptr->ai_next) { if ((connfd=socket(ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol)) < 0) continue; /* Failed */ if ((connect(connfd, ptr->ai_addr, ptr->ai_addrlen)) != -1) break; /* Success */ close(connfd); } freeaddrinfo(list_ptr); if (!ptr) return -1; return connfd; } ``` ## Part II: Dealing with multiple concurrent requests 第二部分要求 proxy 能同時處理多筆連線，本文使用 `thread pool` 的方式實作 - `main` 函數的執行緒負責監聽，並將接收到的請求分派給其他執行緒執行 - 由於使用 thread pool，而非每次連線都建立一個新的執行緒，我們需要一個佇列(queue)，由主執行緒放入待執行的 client socket，並由工作執行緒(worker)從佇列中取出並執行 `main` 函數相應的改變如下，加入佇列的初始化及釋放、thread pool 的建立，將 `do_request` 改為任務放入佇列的動作。 ```cpp void main(int argc, char **argv) { ... ... queue_init(QUEUE_SIZE); pthread_t tid[NUM_THREADS]; for (int i = 0; i < NUM_THREADS; i++) { ret = pthread_create(&tid[i], NULL, thread_fn, NULL); if (ret != 0) fprintf(stderr, "pthread_create failed, idx %d\n", i); } while (1) { clientlen = sizeof(clientaddr); conn_fd = accept(listen_fd, (SA *)&clientaddr, &clientlen); if ((ret=getnameinfo((SA *)&clientaddr, clientlen, hostname, MAXLINE, port, MAXLINE, 0)) != 0) { fprintf(stderr, "getnameinfo failed, %s\n", gai_strerror(ret)); continue; } printf("Accept connection from (%s, %s)\n", hostname, port); queue_put(buf_queue, conn_fd); } queue_free(buf_queue); return 0; } ``` `do_request` 改由 worker 執行，每個執行緒都以無窮迴圈監聽佇列是否有新的任務到來。另外我們不希望使用 `pthread_join` 阻塞主執行緒回收資源，因此必需呼叫 `pthread_detach` ，讓執行緒結束後自動回收，以下節自 man ``` The pthread_detach() function marks the thread identified by thread as detached. When a detached thread terminates, its resources are automatically released back to the system without the need for another thread to join with the terminated thread. ``` ```cpp void *thread_fn(void *vargp) { int ret, conn_fd; ret = pthread_detach(pthread_self()); if (ret != 0) fprintf(stderr, "pthread_detach failed, tid: %ld\n", pthread_self()); while (1) { conn_fd = queue_get(buf_queue); do_request(conn_fd); close(conn_fd); } } ``` 由於為多執行緒的程序，參考課本 `sbuf` 包，利用 `semaphore` 實作一個簡單的 `thread-safe` 佇列 - `insert` 用來通知佇列中是否有空間可以放置新任務的信號量 - `remove` 用來通知佇列中是否有任務待執行的信號量 - `mutext` 確保同一時間只有一個執行緒能夠改變佇列的內容 ```cpp /* FIFO thread-safe queue */ typedef struct { int *buf; /* buffer */ int maxnum; /* max number of items in buffer */ int front; /* (front + 1) = first item in buffer */ int rear; /* (rear + 1) = last item in buffer */ sem_t insert; sem_t remove; sem_t mutex; } queue_t; void queue_init(int n) { buf_queue = malloc(sizeof(queue_t)); if (!buf_queue) { fprintf(stderr, "queue_init failed to allocate memory for queue_t\n"); exit(1); } buf_queue->buf = calloc(0, sizeof(int) * n); if (!buf_queue->buf) { fprintf(stderr, "queue_init failed to allocate memory for buffer\n"); exit(1); } buf_queue->maxnum = n; buf_queue->front = 0; buf_queue->rear = 0; sem_init(&buf_queue->mutex, 0, 1); sem_init(&buf_queue->insert, 0, n); sem_init(&buf_queue->remove, 0, 0); } void queue_free(queue_t *queue) { free(queue->buf); free(queue); } void queue_put(queue_t *queue, int item) { sem_wait(&queue->insert); sem_wait(&queue->mutex); queue->buf[(++queue->rear) % (queue->maxnum)] = item; sem_post(&queue->mutex); sem_post(&queue->remove); } int queue_get(queue_t *queue) { int item; sem_wait(&queue->remove); sem_wait(&queue->mutex); item = queue->buf[(++queue->front) % (queue->maxnum)]; sem_post(&queue->mutex); sem_post(&queue->insert); return item; } ``` ## Part III: Caching web objects 第三部分要求 proxy 擁有快取的功能，並以 LRU 策略進行快取替換。快取的讀取不得有 race condition 發生，對於 writer 來說不困難，使用 mutex 處理即可，但對於 reader 來說，因為同時可能有多個 readers 讀取同一個快取，但又要求使用 LRU 策略，在讀取發生時我們也必需更改快取的時間戳， race condition 處理起來比較麻煩，針對這點 CMU 在 `write-up` 中特別說明不用嚴格遵循 LRU 策略，讓同步問題比較容易處理 ``` As such, protecting accesses to the cache with one large exclusive lock is not an acceptable solution. You may want to explore options such as partitioning the cache, using Pthreads readers-writers locks, or using semaphores to implement your own readers-writers solution. In either case, the fact that you don’t have to implement a strictly LRU eviction policy will give you some flexibility in supporting multiple readers. ``` 主程式邏輯與第二部分幾乎相同，差別僅在於要先判斷 uri 是否在快取中，若在快取中則直接將內容回傳給客戶端，若不在快取中，則判斷請求的內容大小是否小於要求的 `100 KB`，若小於 `100 KB` 則以 LRU 策略替換快取內容 ```cpp /* if uri in cache, send content to client from cache */ void do_request(int clientfd) { ... ... int is_cached; is_cached = cache_srch(cache_blks, uri, clientfd); if (!is_cached) { ... ... size_t read_bytes = 0; char cache_buf[MAX_OBJECT_SIZE]; Rio_writen(serverfd, header_to_server, strlen(header_to_server)); while ((n_read = Rio_readlineb(&server_myio, buf, MAXLINE)) > 0) { printf("Proxy receives %d bytes from server\n", n_read); Rio_writen(clientfd, buf, n_read); read_bytes += n_read; if (read_bytes < MAX_OBJECT_SIZE) strcat(cache_buf, buf); } close(serverfd); /* if total size of response < 100 kb, cache content and evict with LRU*/ if (read_bytes < MAX_OBJECT_SIZE) cache_evict(cache_blks, uri, cache_buf, read_bytes); } } ``` 課本給出的參考範例為 First reader-writer problem 模型，為讀者優先的架構，我想嘗試不同於課本的方法，決定採用寫者優先的 Second reader-writer problem，首先我們建立快取的結構體 - valid: 快取是否可讀 - stamp: 時間戳 - readcnt: 讀者的計數 - writecnt: 寫者的計數 - bufsize: 快取內容的 bytes 數 - uri: 紀錄快取的 uri，用來比對請求是否在快取內 - buf: 紀錄快取的內容 - ready: 通知現在是否有寫者在寫的信號量 - reader: 讀者的 mutex - writer: 寫者的 mutex - resource: 寫入 buf 的 mutex ```cpp typedef struct { int valid; int stamp; int readcnt; int writecnt; size_t bufsize; char uri[MAXLINE]; char buf[MAX_OBJECT_SIZE]; sem_t ready; sem_t reader; sem_t writer; sem_t resource; } cache_t; ``` Second reader-writer problem 引入 `ready` 這個信號量，讀者在進入 critical section(cs) 之前，會等待 `ready` 的信號 (因為是 Semaphore，以 signal 而非 owner 解釋比較恰當)，代表一次只能有一個讀者進入，其他讀者都會被阻塞，而只要第一個寫者被通知 `ready` ，後續的寫者都可以接續進入 cs，此時讀者必需等待所有的寫者完成(還有可能被插隊)，才有機會進入 cs ，藉此達到寫者優先的目的 ```cpp #include "cache.h" void cache_init(cache_t *caches, int nblk) { for (int i = 0; i < nblk; i++) { caches[i].valid = 0; caches[i].stamp = 0; caches[i].bufsize = 0; caches[i].readcnt = 0; caches[i].writecnt = 0; sem_init(&caches[i].ready, 0, 1); sem_init(&caches[i].reader, 0, 1); sem_init(&caches[i].writer, 0, 1); sem_init(&caches[i].resource, 0, 1); } } int cache_srch(cache_t *caches, char *uri, int connfd) { int cached = 0; int to_write = 0; for (int i = 0; i < NUM_CACHE_BLK; i++) { sem_wait(&caches[i].ready); sem_wait(&caches[i].reader); caches[i].readcnt++; if (caches[i].readcnt == 1) sem_wait(&caches[i].writer); if (caches[i].valid && !strcmp(caches[i].uri, uri)) { cached = 1; to_write = 1; caches[i].stamp = 0; } else caches[i].stamp++; if (cached && to_write) { Rio_writen(connfd, caches[i].buf, caches[i].bufsize); to_write = 0; } caches[i].readcnt--; if (caches[i].readcnt == 0) sem_post(&caches[i].writer); sem_post(&caches[i].reader); sem_post(&caches[i].ready); } return cached; } void cache_evict(cache_t *caches, char *uri, char *buf, size_t nbytes) { int max_stamp = INT_MIN; int idx; for (int i = 0; i < NUM_CACHE_BLK; i++) { if (!caches[i].valid) { idx = i; break; } if (caches[i].stamp > max_stamp) { idx = i; max_stamp = caches[i].stamp; } } sem_wait(&caches[idx].writer); caches[idx].writecnt++; if (caches[idx].writecnt == 1) sem_wait(&caches[idx].ready); sem_post(&caches[idx].writer); sem_wait(&caches[idx].resource); caches[idx].valid = 1; caches[idx].stamp = 0; caches[idx].bufsize = nbytes; strcpy(caches[idx].uri, uri); strcpy(caches[idx].buf, buf); sem_post(&caches[idx].resource); sem_wait(&caches[idx].writer); caches[idx].writecnt--; if (caches[idx].writecnt == 0) sem_post(&caches[idx].ready); sem_post(&caches[idx].writer); return; } ``` - 測試結果 ``` *** Basic *** Starting tiny on 19016 Starting proxy on 27627 1: home.html Fetching ./tiny/home.html into ./.proxy using the proxy Fetching ./tiny/home.html into ./.noproxy directly from Tiny Comparing the two files Success: Files are identical. 2: csapp.c Fetching ./tiny/csapp.c into ./.proxy using the proxy Fetching ./tiny/csapp.c into ./.noproxy directly from Tiny Comparing the two files Success: Files are identical. 3: tiny.c Fetching ./tiny/tiny.c into ./.proxy using the proxy Fetching ./tiny/tiny.c into ./.noproxy directly from Tiny Comparing the two files Success: Files are identical. 4: godzilla.jpg Fetching ./tiny/godzilla.jpg into ./.proxy using the proxy Fetching ./tiny/godzilla.jpg into ./.noproxy directly from Tiny Comparing the two files Success: Files are identical. 5: tiny Fetching ./tiny/tiny into ./.proxy using the proxy Fetching ./tiny/tiny into ./.noproxy directly from Tiny Comparing the two files Success: Files are identical. Killing tiny and proxy basicScore: 40/40 *** Concurrency *** Starting tiny on port 3809 Starting proxy on port 6721 Starting the blocking NOP server on port 11337 Trying to fetch a file from the blocking nop-server Fetching ./tiny/home.html into ./.noproxy directly from Tiny Fetching ./tiny/home.html into ./.proxy using the proxy Checking whether the proxy fetch succeeded Success: Was able to fetch tiny/home.html from the proxy. Killing tiny, proxy, and nop-server concurrencyScore: 15/15 *** Cache *** Starting tiny on port 31729 Starting proxy on port 25563 Fetching ./tiny/tiny.c into ./.proxy using the proxy Fetching ./tiny/home.html into ./.proxy using the proxy Fetching ./tiny/csapp.c into ./.proxy using the proxy Killing tiny Fetching a cached copy of ./tiny/home.html into ./.noproxy Success: Was able to fetch tiny/home.html from the cache. Killing proxy cacheScore: 15/15 totalScore: 70/70 ``` :::success totalScore: 70/70 PASS ::: 完整程式碼請參考 [github 連結](https://github.com/Chang-Chia-Chi/CSAPP-Labs/tree/main/Proxy-Lab) ## Refernece 1. [Computer Systems: A Programmer’s Perspective, 3/E (CS:APP3e)](http://csapp.cs.cmu.edu/3e/labs.html)