# CS:APP Proxy Lab 解題筆記
###### tags: `cs:app`
本次實驗配合課本第 10、11 及 12 章,要求學員實作一個基本的 http proxy,並逐步擴增 proxy 的功能,讓 proxy 能同時應付多個並行的 request,最後再實作簡易的 cache 機制,完善 proxy 框架
Proxy (網路代理) 是一種在瀏覽器與服務器之間作為中介的程式,其用途非常多樣,如防火牆、隱藏瀏覽器的相關訊息,或著快取曾經瀏覽過的網頁,proxy 的基本架構如下

其接收客戶端傳來的 request,解析後重新組成 header 再送給 web server,接著 proxy 接收由 server 送來的 response,最後再將 response 送回給 client 結束整個 request-response 流程,所以 proxy 在過程中同時扮演了 client 及 server 的角色
## Part I: Implementing a sequential web proxy
### main
第一部份要求我們實作一個基本的 sequential proxy 處理 `HTTP/1.0 GET requests`,我們先參考 CMU 提供的 web server 基礎框架 `tiny.c` 寫出 proxy 處理流程的基本邏輯
- 由 command line 取得 proxy 欲監聽的 port
- 建立監聽用的 socket,並嘗試使用 `bind` 告訴 os 這個 socket 將用來監聽
- `bind` 成功後,呼叫 `listen` 標示 socket 被動接收外部請求
- 執行無窮迴圈
- 呼叫 `accept` 接收客戶端傳來的請求
- 連線建立後,呼叫 `do_request` 解析並執行請求
- 完成後將連線的 `connfd` 關閉避免 fd 的洩漏問題
```cpp
int main(int argc, char **argv)
{
int listen_fd, conn_fd;
char hostname[MAXLINE], port[MAXLINE];
socklen_t clientlen;
struct sockaddr_storage clientaddr;
/* check command line args */
if (argc != 2) {
fprintf(stderr, "usage: %s <port>\n", argv[0]);
exit(1);
}
/* get listen fd */
int ret, optval=1;
struct addrinfo hints, *list_ptr;
memset(&hints, 0, sizeof(struct addrinfo));
hints.ai_socktype = SOCK_STREAM; /* Accept connections */
hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; /* ... on any IP address */
hints.ai_flags |= AI_NUMERICSERV; /* ... using port number */
if ((ret=getaddrinfo(NULL, argv[1], &hints, &list_ptr)) != 0) {
fprintf(stderr, "getaddrinfo failed (port %s): %s\n", argv[1], gai_strerror(ret));
exit(1);
}
struct addrinfo *ptr;
for (ptr = list_ptr; ptr; ptr=ptr->ai_next) {
if ((listen_fd=socket(ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol)) < 0)
continue;
/* Eliminates "Address already in use" error from bind */
setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR,
(const void *)&optval , sizeof(int));
if (bind(listen_fd, ptr->ai_addr, ptr->ai_addrlen) == 0)
break; /* Success create server listen socker */
/* close fd created to prevent fd leakage */
close(listen_fd);
}
freeaddrinfo(list_ptr);
if (!ptr) exit(1);
if (listen(listen_fd, LISTENQ) < 0) {
close(listen_fd);
fprintf(stderr, "listen failed (port %s)\n", argv[1]);
exit(1);
}
while (1) {
clientlen = sizeof(clientaddr);
conn_fd = accept(listen_fd, (SA *)&clientaddr, &clientlen);
if ((ret=getnameinfo((SA *)&clientaddr, clientlen, hostname,
MAXLINE, port, MAXLINE, 0)) != 0) {
fprintf(stderr, "getnameinfo failed, %s\n", gai_strerror(ret));
continue;
}
printf("Accept connection from (%s, %s)\n", hostname, port);
do_request(conn_fd);
close(conn_fd);
}
return 0;
}
```
### do_request
`do_request` 要負責
1. 解析客戶端傳來的 request
2. 依照 CMU `write-up` 中的要求,重建 header
3. 向目標服務器建立連線,並傳送 request
4. 從服務器接收 response,並將資訊回傳給客戶端
```cpp
void do_request(int clientfd) {
int n_read;
int port = 80;
char header_to_server[MAXLINE];
char buf[MAXLINE], method[MAXLINE], uri[MAXLINE], version[MAXLINE];
char hostname[MAXLINE], path[MAXLINE];
rio_t client_myio;
Rio_readinitb(&client_myio, clientfd);
n_read = Rio_readlineb(&client_myio, buf, MAXLINE);
if (!n_read)
return;
printf("%s", buf);
sscanf(buf, "%s %s %s", method, uri, version);
if (strcasecmp(method, "GET")) {
clienterror(clientfd, method, "501", "Method Not Implemented");
return;
}
parse_uri(buf, hostname, path, &port);
build_header(header_to_server, hostname, path, port, &client_myio);
int serverfd;
rio_t server_myio;
serverfd = connect_server(hostname, port);
if (serverfd < 0) {
fprintf(stderr, "connect_server failed\n");
return;
}
/* Read response from server and send to client */
Rio_readinitb(&server_myio, serverfd);
/* Send request to server */
Rio_writen(serverfd, header_to_server, strlen(header_to_server));
while ((n_read = Rio_readlineb(&server_myio, buf, MAXLINE)) > 0) {
printf("Proxy receives %d bytes from server\n", n_read);
Rio_writen(clientfd, buf, n_read);
}
close(serverfd);
}
```
- 1. Request parse
- 假設客戶端傳來的 request 為 `GET http://www.cmu.edu/hub/index.html HTTP/1.1`,我們應該要將這段文字拆為
1. method: 對應 `GET`
2. uri: 對應 `http://www.cmu.edu/hub/index.html`
3. version: 對應 `HTTP/1.1`
- 判斷 `method` 是否為 `GET`,若否則結束連線
- 將 `uri` 拆為
1. hostname: 對應 `www.cmu.edu`
2. path: 對應 `/hub/index.html`
- 另外 `write-up` 要求,如果 port number 在 url 中有指定的話,例如 `http://www.cmu.edu:8080/hub/index.html,`,則 port number 依指定處理,否則預設為 `80`
`parse_uri` 稍微注意指標的運用即可,並不會很複雜
```cpp
void parse_uri(char *uri, char *hostname, char *path, int *port) {
char *ptr, *host_ptr, *path_ptr, *port_ptr;
ptr = strstr(uri, "//");
ptr = uri? ptr+2:uri;
port_ptr = strstr(ptr, ":");
if (!port_ptr) /* no specific port number */
{
host_ptr = strstr(ptr, "/");
if (!host_ptr)
sscanf(ptr, "%s", hostname);
else {
*host_ptr = '\0';
sscanf(ptr, "%s", hostname);
path_ptr = host_ptr;
*path_ptr = '/';
sscanf(path_ptr, "%s", path);
}
}
else /* web browser indicate port numer */
{
*port_ptr = '\0';
sscanf(ptr, "%s", hostname);
port_ptr += 1;
sscanf(port_ptr, "%d%s", port, path);
}
}
```
- 2. Rebuild header
整個 header 可分為以下 7 種類型
1. request header: `GET <path> HTTP/1.0\r\n`
2. host header: `HOST: <hostname>\r\n`
3. connection header: 固定為 `Connection: close\r\n`,代表在完成一次的 request-response 後,就將連線關閉
4. proxy connection head: 固定為 `Proxy-Connection: close\r\n`,理由同上
5. user agent header: 固定為 `User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120305 Firefox/10.0.3\r\n`,提供 client 的相關資訊
6. other header: 非以上 5 種的其他 header
7. end header: `\r\n`
我們透過 while 迴圈不斷讀取 buffer 中的資訊,依照上述的類型進行分類,並將對應的內容存進對應的字串
```cpp
void build_header(char *header, char* hostname, char* path, int port, rio_t *myio) {
char buf[MAXLINE], request_hrd[MAXLINE], host_hdr[MAXLINE], other_hdr[MAXLINE];
sprintf(request_hrd, "GET %s HTTP/1.0\r\n", path);
sprintf(host_hdr, "HOST: %s\r\n", hostname);
while (Rio_readlineb(myio, buf, MAXLINE) > 0) {
if (!strcmp(buf, end_hdr)) /* EOF */
break;
else if (!(strncasecmp(buf, "HOST", 4))) { /* HOST: */
strcpy(host_hdr, buf);
}
else if (!(strncasecmp(buf, "User-Agent", 10)) ||
!(strncasecmp(buf, "Proxy-Connection", 16)) ||
!(strncasecmp(buf, "Connection", 10))) /* headers should not be changed */
continue;
else
strcat(other_hdr, buf);
}
sprintf(header, "%s%s%s%s%s%s%s",
request_hrd,
host_hdr,
conn_hdr,
proxy_conn_hdr,
user_agent_hdr,
other_hdr,
end_hdr);
}
```
- Build connection to end server & send request
利用 hostname 及 port number 建立 client socket,基本上參考課本的範例程式碼撰寫即可
```cpp
int connect_server(char *hostname, int port) {
int connfd, ret;
struct addrinfo hints, *list_ptr;
memset(&hints, 0, sizeof(struct addrinfo));
hints.ai_socktype = SOCK_STREAM; /* Accept connections */
hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; /* ... on any IP address */
hints.ai_flags |= AI_NUMERICSERV; /* ... using port number */
char port_str[10];
sprintf(port_str, "%d", port);
if ((ret=getaddrinfo(hostname, port_str, &hints, &list_ptr)) != 0) {
fprintf(stderr, "getaddrinfo failed (%s:%s): %s\n", hostname, port_str, gai_strerror(ret));
return -2;
}
struct addrinfo *ptr;
for (ptr = list_ptr; ptr; ptr=ptr->ai_next) {
if ((connfd=socket(ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol)) < 0)
continue; /* Failed */
if ((connect(connfd, ptr->ai_addr, ptr->ai_addrlen)) != -1)
break; /* Success */
close(connfd);
}
freeaddrinfo(list_ptr);
if (!ptr) return -1;
return connfd;
}
```
## Part II: Dealing with multiple concurrent requests
第二部分要求 proxy 能同時處理多筆連線,本文使用 `thread pool` 的方式實作
- `main` 函數的執行緒負責監聽,並將接收到的請求分派給其他執行緒執行
- 由於使用 thread pool,而非每次連線都建立一個新的執行緒,我們需要一個佇列(queue),由主執行緒放入待執行的 client socket,並由工作執行緒(worker)從佇列中取出並執行
`main` 函數相應的改變如下,加入佇列的初始化及釋放、thread pool 的建立,將 `do_request` 改為任務放入佇列的動作。
```cpp
void main(int argc, char **argv) {
...
...
queue_init(QUEUE_SIZE);
pthread_t tid[NUM_THREADS];
for (int i = 0; i < NUM_THREADS; i++) {
ret = pthread_create(&tid[i], NULL, thread_fn, NULL);
if (ret != 0)
fprintf(stderr, "pthread_create failed, idx %d\n", i);
}
while (1) {
clientlen = sizeof(clientaddr);
conn_fd = accept(listen_fd, (SA *)&clientaddr, &clientlen);
if ((ret=getnameinfo((SA *)&clientaddr, clientlen, hostname,
MAXLINE, port, MAXLINE, 0)) != 0) {
fprintf(stderr, "getnameinfo failed, %s\n", gai_strerror(ret));
continue;
}
printf("Accept connection from (%s, %s)\n", hostname, port);
queue_put(buf_queue, conn_fd);
}
queue_free(buf_queue);
return 0;
}
```
`do_request` 改由 worker 執行,每個執行緒都以無窮迴圈監聽佇列是否有新的任務到來。另外我們不希望使用 `pthread_join` 阻塞主執行緒回收資源,因此必需呼叫 `pthread_detach` ,讓執行緒結束後自動回收,以下節自 man
```
The pthread_detach() function marks the thread identified by thread as detached.
When a detached thread terminates, its resources are automatically released back
to the system without the need for another thread to join with the terminated thread.
```
```cpp
void *thread_fn(void *vargp) {
int ret, conn_fd;
ret = pthread_detach(pthread_self());
if (ret != 0)
fprintf(stderr, "pthread_detach failed, tid: %ld\n", pthread_self());
while (1) {
conn_fd = queue_get(buf_queue);
do_request(conn_fd);
close(conn_fd);
}
}
```
由於為多執行緒的程序,參考課本 `sbuf` 包,利用 `semaphore` 實作一個簡單的 `thread-safe` 佇列
- `insert` 用來通知佇列中是否有空間可以放置新任務的信號量
- `remove` 用來通知佇列中是否有任務待執行的信號量
- `mutext` 確保同一時間只有一個執行緒能夠改變佇列的內容
```cpp
/* FIFO thread-safe queue */
typedef struct {
int *buf; /* buffer */
int maxnum; /* max number of items in buffer */
int front; /* (front + 1) = first item in buffer */
int rear; /* (rear + 1) = last item in buffer */
sem_t insert;
sem_t remove;
sem_t mutex;
} queue_t;
void queue_init(int n) {
buf_queue = malloc(sizeof(queue_t));
if (!buf_queue) {
fprintf(stderr, "queue_init failed to allocate memory for queue_t\n");
exit(1);
}
buf_queue->buf = calloc(0, sizeof(int) * n);
if (!buf_queue->buf) {
fprintf(stderr, "queue_init failed to allocate memory for buffer\n");
exit(1);
}
buf_queue->maxnum = n;
buf_queue->front = 0;
buf_queue->rear = 0;
sem_init(&buf_queue->mutex, 0, 1);
sem_init(&buf_queue->insert, 0, n);
sem_init(&buf_queue->remove, 0, 0);
}
void queue_free(queue_t *queue) {
free(queue->buf);
free(queue);
}
void queue_put(queue_t *queue, int item) {
sem_wait(&queue->insert);
sem_wait(&queue->mutex);
queue->buf[(++queue->rear) % (queue->maxnum)] = item;
sem_post(&queue->mutex);
sem_post(&queue->remove);
}
int queue_get(queue_t *queue) {
int item;
sem_wait(&queue->remove);
sem_wait(&queue->mutex);
item = queue->buf[(++queue->front) % (queue->maxnum)];
sem_post(&queue->mutex);
sem_post(&queue->insert);
return item;
}
```
## Part III: Caching web objects
第三部分要求 proxy 擁有快取的功能,並以 LRU 策略進行快取替換。快取的讀取不得有 race condition 發生,對於 writer 來說不困難,使用 mutex 處理即可,但對於 reader 來說,因為同時可能有多個 readers 讀取同一個快取,但又要求使用 LRU 策略,在讀取發生時我們也必需更改快取的時間戳, race condition 處理起來比較麻煩,針對這點 CMU 在 `write-up` 中特別說明不用嚴格遵循 LRU 策略,讓同步問題比較容易處理
```
As such, protecting accesses to the cache with one large exclusive lock is not an
acceptable solution. You may want to explore options such as partitioning the cache,
using Pthreads readers-writers locks, or using semaphores to implement your own
readers-writers solution. In either case, the fact that you don’t have to
implement a strictly LRU eviction policy will give you some flexibility in supporting
multiple readers.
```
主程式邏輯與第二部分幾乎相同,差別僅在於要先判斷 uri 是否在快取中,若在快取中則直接將內容回傳給客戶端,若不在快取中,則判斷請求的內容大小是否小於要求的 `100 KB`,若小於 `100 KB` 則以 LRU 策略替換快取內容
```cpp
/* if uri in cache, send content to client from cache */
void do_request(int clientfd) {
...
...
int is_cached;
is_cached = cache_srch(cache_blks, uri, clientfd);
if (!is_cached) {
...
...
size_t read_bytes = 0;
char cache_buf[MAX_OBJECT_SIZE];
Rio_writen(serverfd, header_to_server, strlen(header_to_server));
while ((n_read = Rio_readlineb(&server_myio, buf, MAXLINE)) > 0) {
printf("Proxy receives %d bytes from server\n", n_read);
Rio_writen(clientfd, buf, n_read);
read_bytes += n_read;
if (read_bytes < MAX_OBJECT_SIZE) strcat(cache_buf, buf);
}
close(serverfd);
/* if total size of response < 100 kb, cache content and evict with LRU*/
if (read_bytes < MAX_OBJECT_SIZE)
cache_evict(cache_blks, uri, cache_buf, read_bytes);
}
}
```
課本給出的參考範例為 First reader-writer problem 模型,為讀者優先的架構,我想嘗試不同於課本的方法,決定採用寫者優先的 Second reader-writer problem,首先我們建立快取的結構體
- valid: 快取是否可讀
- stamp: 時間戳
- readcnt: 讀者的計數
- writecnt: 寫者的計數
- bufsize: 快取內容的 bytes 數
- uri: 紀錄快取的 uri,用來比對請求是否在快取內
- buf: 紀錄快取的內容
- ready: 通知現在是否有寫者在寫的信號量
- reader: 讀者的 mutex
- writer: 寫者的 mutex
- resource: 寫入 buf 的 mutex
```cpp
typedef struct {
int valid;
int stamp;
int readcnt;
int writecnt;
size_t bufsize;
char uri[MAXLINE];
char buf[MAX_OBJECT_SIZE];
sem_t ready;
sem_t reader;
sem_t writer;
sem_t resource;
} cache_t;
```
Second reader-writer problem 引入 `ready` 這個信號量,讀者在進入 critical section(cs) 之前,會等待 `ready` 的信號 (因為是 Semaphore,以 signal 而非 owner 解釋比較恰當),代表一次只能有一個讀者進入,其他讀者都會被阻塞,而只要第一個寫者被通知 `ready` ,後續的寫者都可以接續進入 cs,此時讀者必需等待所有的寫者完成(還有可能被插隊),才有機會進入 cs ,藉此達到寫者優先的目的
```cpp
#include "cache.h"
void cache_init(cache_t *caches, int nblk) {
for (int i = 0; i < nblk; i++) {
caches[i].valid = 0;
caches[i].stamp = 0;
caches[i].bufsize = 0;
caches[i].readcnt = 0;
caches[i].writecnt = 0;
sem_init(&caches[i].ready, 0, 1);
sem_init(&caches[i].reader, 0, 1);
sem_init(&caches[i].writer, 0, 1);
sem_init(&caches[i].resource, 0, 1);
}
}
int cache_srch(cache_t *caches, char *uri, int connfd) {
int cached = 0;
int to_write = 0;
for (int i = 0; i < NUM_CACHE_BLK; i++) {
sem_wait(&caches[i].ready);
sem_wait(&caches[i].reader);
caches[i].readcnt++;
if (caches[i].readcnt == 1)
sem_wait(&caches[i].writer);
if (caches[i].valid && !strcmp(caches[i].uri, uri)) {
cached = 1;
to_write = 1;
caches[i].stamp = 0;
}
else caches[i].stamp++;
if (cached && to_write) {
Rio_writen(connfd, caches[i].buf, caches[i].bufsize);
to_write = 0;
}
caches[i].readcnt--;
if (caches[i].readcnt == 0)
sem_post(&caches[i].writer);
sem_post(&caches[i].reader);
sem_post(&caches[i].ready);
}
return cached;
}
void cache_evict(cache_t *caches, char *uri, char *buf, size_t nbytes) {
int max_stamp = INT_MIN;
int idx;
for (int i = 0; i < NUM_CACHE_BLK; i++) {
if (!caches[i].valid) {
idx = i;
break;
}
if (caches[i].stamp > max_stamp) {
idx = i;
max_stamp = caches[i].stamp;
}
}
sem_wait(&caches[idx].writer);
caches[idx].writecnt++;
if (caches[idx].writecnt == 1)
sem_wait(&caches[idx].ready);
sem_post(&caches[idx].writer);
sem_wait(&caches[idx].resource);
caches[idx].valid = 1;
caches[idx].stamp = 0;
caches[idx].bufsize = nbytes;
strcpy(caches[idx].uri, uri);
strcpy(caches[idx].buf, buf);
sem_post(&caches[idx].resource);
sem_wait(&caches[idx].writer);
caches[idx].writecnt--;
if (caches[idx].writecnt == 0)
sem_post(&caches[idx].ready);
sem_post(&caches[idx].writer);
return;
}
```
- 測試結果
```
*** Basic ***
Starting tiny on 19016
Starting proxy on 27627
1: home.html
Fetching ./tiny/home.html into ./.proxy using the proxy
Fetching ./tiny/home.html into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
2: csapp.c
Fetching ./tiny/csapp.c into ./.proxy using the proxy
Fetching ./tiny/csapp.c into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
3: tiny.c
Fetching ./tiny/tiny.c into ./.proxy using the proxy
Fetching ./tiny/tiny.c into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
4: godzilla.jpg
Fetching ./tiny/godzilla.jpg into ./.proxy using the proxy
Fetching ./tiny/godzilla.jpg into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
5: tiny
Fetching ./tiny/tiny into ./.proxy using the proxy
Fetching ./tiny/tiny into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
Killing tiny and proxy
basicScore: 40/40
*** Concurrency ***
Starting tiny on port 3809
Starting proxy on port 6721
Starting the blocking NOP server on port 11337
Trying to fetch a file from the blocking nop-server
Fetching ./tiny/home.html into ./.noproxy directly from Tiny
Fetching ./tiny/home.html into ./.proxy using the proxy
Checking whether the proxy fetch succeeded
Success: Was able to fetch tiny/home.html from the proxy.
Killing tiny, proxy, and nop-server
concurrencyScore: 15/15
*** Cache ***
Starting tiny on port 31729
Starting proxy on port 25563
Fetching ./tiny/tiny.c into ./.proxy using the proxy
Fetching ./tiny/home.html into ./.proxy using the proxy
Fetching ./tiny/csapp.c into ./.proxy using the proxy
Killing tiny
Fetching a cached copy of ./tiny/home.html into ./.noproxy
Success: Was able to fetch tiny/home.html from the cache.
Killing proxy
cacheScore: 15/15
totalScore: 70/70
```
:::success
totalScore: 70/70 PASS
:::
完整程式碼請參考 [github 連結](https://github.com/Chang-Chia-Chi/CSAPP-Labs/tree/main/Proxy-Lab)
## Refernece
1. [Computer Systems: A Programmer’s Perspective, 3/E (CS:APP3e)](http://csapp.cs.cmu.edu/3e/labs.html)