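# CS:APP 3/e Proxy Lab
contributed by < `type59ty` >
###### tags: `sysprog2018`
> [Source code](https://github.com/type59ty/proxylab)

## Preparation
- Download the [handout](http://csapp.cs.cmu.edu/3e/labs.html)
- Study the [write up](http://csapp.cs.cmu.edu/3e/proxylab.pdf)
- Study CSAPP ch 10,11,12

## References
- [Proxy server (Chinese Wikipedia)](https://zh.wikipedia.org/zh-tw/%E4%BB%A3%E7%90%86%E6%9C%8D%E5%8A%A1%E5%99%A8)
- [VBird: Proxy server, the gatekeeper of the LAN](http://linux.vbird.org/linux_server/0420squid.php)
- [CSAPP: Proxy lab](https://blog.csdn.net/u012336567/article/details/52056089)

## Introduction to proxies
A web proxy is an intermediary program between a web browser and an end server. When browsing, the user does not connect to the end server directly. Instead, the proxy receives the request, forwards it to the end server, and then relays the end server's response (e.g. the page content) back to the user's browser. The proxy therefore acts as both a client and a server.

Some network devices, such as gateways and routers, have proxy functionality built in. Proxy services are generally considered to help protect the privacy and security of network endpoints and to guard against attacks.

![](https://i.imgur.com/C1ZVXzL.png)

## Assignment requirements
Write a simple HTTP proxy that caches web content.
The lab has three parts:
1. Set up the basic proxy functionality: accept incoming connections, read and parse requests, forward the requests to web servers, read the servers' responses, and forward those responses to the corresponding clients.
   This part teaches basic HTTP operation and how to use sockets to write programs that communicate over a network.
2. Extend the proxy so that it can handle multiple connections concurrently.
3.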
   Add caching, using a simple main-memory cache to store recently accessed web content.

## Practice : echo server and client
Following pp. 663-664 of the textbook, build a simple client and server to get familiar with these functions.
The `Open_clientfd` call on line 15 establishes the connection to the server.
- echoclient.c
```c=
#include "csapp.h"

int main(int argc, char **argv)
{
    int clientfd;
    char *host, *port, buf[MAXLINE];
    rio_t rio;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <host> <port>\n", argv[0]);
        exit(0);
    }
    host = argv[1];
    port = argv[2];
    clientfd = Open_clientfd(host, port);   /* connect to the server */
    Rio_readinitb(&rio, clientfd);

    /* send each line typed on stdin, then print the line echoed back */
    while (Fgets(buf, MAXLINE, stdin) != NULL) {
        Rio_writen(clientfd, buf, strlen(buf));
        Rio_readlineb(&rio, buf, MAXLINE);
        Fputs(buf, stdout);
    }
    Close(clientfd);
    exit(0);
}
```
- echoserver.c
```c=
#include "csapp.h"

void echo(int connfd);

int main(int argc, char **argv)
{
    int listenfd, connfd;
    socklen_t clientlen;
    struct sockaddr_storage clientaddr;
    char client_hostname[MAXLINE], client_port[MAXLINE];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(0);
    }

    listenfd = Open_listenfd(argv[1]);
    while (1) {
        clientlen = sizeof(struct sockaddr_storage);
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
        Getnameinfo((SA *) &clientaddr, clientlen,
                    client_hostname, MAXLINE, client_port, MAXLINE, 0);
        printf("Connected to (%s, %s)\n", client_hostname, client_port);
        echo(connfd);
        Close(connfd);
    }
    exit(0);
}

/* read lines from the client and write each one straight back */
void echo(int connfd)
{
    size_t n;
    char buf[MAXLINE];
    rio_t rio;

    Rio_readinitb(&rio, connfd);
    while ((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0) {
        printf("server received %d bytes\n", (int)n);
        Rio_writen(connfd, buf, n);
    }
}
```
#### Usage:
Start echoserver first, specifying the port that clients will connect to
```shell
$ ./echoserver 4000
```
Then connect from the client side
```shell
$ ./echoclient hostname 4000
```
The server side will print
```shell
$ ./echoserver 4000
Connected to (localhost, 49492)
```
which indicates a successful connection. The client can then send requests; in this example the server echoes back each string the client sends
```shell
$ ./echoclient hostname 4000
hello
hello
no
no
echo
echo
```

## Part I: Implementing a sequential web proxy
- Goal: implement a sequential proxy that handles
HTTP/1.0 GET requests

![](https://i.imgur.com/xQB5sOy.png)

Following CSAPP p.667, first define the two message formats:
1. HTTP request:
    A **request line** ( line 5 ), followed by zero or more **request headers** ( line 6 ), followed by an empty text line that terminates the header list ( line 7 ).
    - request line format:
    ```
    method URI version
    ```
    - request header format:
    ```
    header-name: header-data
    ```
2. HTTP response:
    Similar to an HTTP request: a **response line** ( line 8 ), followed by zero or more **response headers** ( lines 9~13 ), followed by an empty line that terminates the header list ( line 14 ), followed by the response body ( lines 15~17 ).

:::info
Most of the structure can be modeled on the TINY Web server, p.671
:::

### Makefile
```makefile
CC = gcc
CFLAGS = -g -Wall
LDFLAGS = -lpthread

all: proxy

csapp.o: csapp.c csapp.h
	$(CC) $(CFLAGS) -c csapp.c

proxy.o: proxy.c csapp.h
	$(CC) $(CFLAGS) -c proxy.c

proxy: proxy.o csapp.o
	$(CC) $(CFLAGS) proxy.o csapp.o -o proxy $(LDFLAGS)

# … driver.sh --exclude port-for-user.pl --exclude free-port.sh --exclude ".*")

clean:
	rm -f *~ *.o proxy core *.tar *.zip *.gzip *.bzip *.gz
```
### main
```c=
#include "csapp.h"

void doit(int connfd);

int main(int argc,char **argv)
{
    int listenfd,connfd;
    socklen_t clientlen;
    char hostname[MAXLINE],port[MAXLINE];
    struct sockaddr_storage clientaddr;

    if(argc != 2){
        fprintf(stderr,"usage :%s <port> \n",argv[0]);
        exit(1);
    }

    listenfd = Open_listenfd(argv[1]);
    while(1){
        clientlen = sizeof(clientaddr);
        connfd = Accept(listenfd,(SA *)&clientaddr,&clientlen);

        /*print accepted message*/
        Getnameinfo((SA*)&clientaddr,clientlen,hostname,MAXLINE,port,MAXLINE,0);
        printf("Accepted connection from (%s %s).\n",hostname,port);

        /*sequential handle the client transaction*/
        doit(connfd);
        Close(connfd);
    }
    return 0;
}
```
### doit
```c=
/*handle the client HTTP transaction*/
void doit(int connfd)
{
    int end_serverfd;/*the end server file descriptor*/
    char buf[MAXLINE],method[MAXLINE],uri[MAXLINE],version[MAXLINE];
    char endserver_http_header [MAXLINE];
    /*store the request line arguments*/
    char hostname[MAXLINE],path[MAXLINE];
    int port;
    rio_t rio,server_rio;/*rio is client's rio,server_rio is endserver's rio*/
    Rio_readinitb(&rio,connfd);
    Rio_readlineb(&rio,buf,MAXLINE);
    sscanf(buf,"%s %s %s",method,uri,version); /*read the client request line*/

    if(strcasecmp(method,"GET")){
        printf("Proxy does not implement the method\n");
        return;
    }
    /*parse the uri to get hostname,file path ,port*/
    parse_uri(uri,hostname,path,&port);

    /*build the http header which will send to the end server*/
    build_http_header(endserver_http_header,hostname,path,port,&rio);

    /*connect to the end server*/
    end_serverfd = connect_endServer(hostname,port,endserver_http_header);
    if(end_serverfd<0){
        printf("connection failed\n");
        return;
    }

    Rio_readinitb(&server_rio,end_serverfd);
    /*write the http header to endserver*/
    Rio_writen(end_serverfd,endserver_http_header,strlen(endserver_http_header));

    /*receive message from end server and send to the client*/
    size_t n;
    while((n=Rio_readlineb(&server_rio,buf,MAXLINE))!=0)
    {
        printf("proxy received %zu bytes,then send\n",n); /*%zu matches size_t*/
        Rio_writen(connfd,buf,n);
    }
    Close(end_serverfd);
}
```
### build_http_header
```c=
/*The format strings and header keys used below (requestlint_hdr_format, host_key,
  conn_hdr, endof_hdr, ...) are globals defined elsewhere in proxy.c and are not shown here*/
void build_http_header(char *http_header, char *hostname,char *path,int port,rio_t *client_rio)
{
    char buf[MAXLINE],request_hdr[MAXLINE],other_hdr[MAXLINE],host_hdr[MAXLINE];
    /*start from empty strings: strcat and strlen below read these buffers*/
    other_hdr[0] = '\0';
    host_hdr[0] = '\0';

    /*request line*/
    sprintf(request_hdr,requestlint_hdr_format,path);

    /*get other request header for client rio and change it */
    while(Rio_readlineb(client_rio,buf,MAXLINE)>0)
    {
        if(strcmp(buf,endof_hdr)==0) break;/*end of the header list*/

        if(!strncasecmp(buf,host_key,strlen(host_key)))/*Host:*/
        {
            strcpy(host_hdr,buf);
            continue;
        }

        /*forward every header that is NOT Connection, Proxy-Connection or User-Agent;
          those three are replaced by the proxy's own values below*/
        if(strncasecmp(buf,connection_key,strlen(connection_key))
                &&strncasecmp(buf,proxy_connection_key,strlen(proxy_connection_key))
                &&strncasecmp(buf,user_agent_key,strlen(user_agent_key)))
        {
            strcat(other_hdr,buf);
        }
    }
    if(strlen(host_hdr)==0)
    {
        sprintf(host_hdr,host_hdr_format,hostname);
    }
    sprintf(http_header,"%s%s%s%s%s%s%s",
            request_hdr,
            host_hdr,
            conn_hdr,
            prox_hdr,
            user_agent_hdr,
            other_hdr,
            endof_hdr);

    return ;
}
```
### connect_endServer
```c
/*Connect to the end server*/
inline int
connect_endServer(char *hostname,int port,char *http_header){
    char portStr[100];
    sprintf(portStr,"%d",port);
    /*lowercase open_clientfd returns a negative value on failure, so the caller's
      end_serverfd<0 check can run; the Open_clientfd wrapper would exit instead*/
    return open_clientfd(hostname,portStr);
}
```
### parse_uri
```c=
/*parse the uri to get hostname,file path ,port*/
void parse_uri(char *uri,char *hostname,char *path,int *port)
{
    *port = 80;          /*default HTTP port*/
    strcpy(path,"/");    /*default path when the uri has none*/

    char* pos = strstr(uri,"//");
    pos = pos!=NULL? pos+2:uri;   /*skip "http://" if present*/

    char*pos2 = strstr(pos,":");
    if(pos2!=NULL)
    {
        /*host:port/path*/
        *pos2 = '\0';
        sscanf(pos,"%s",hostname);
        sscanf(pos2+1,"%d%s",port,path);
    }
    else
    {
        pos2 = strstr(pos,"/");
        if(pos2!=NULL)
        {
            /*host/path*/
            *pos2 = '\0';
            sscanf(pos,"%s",hostname);
            *pos2 = '/';
            sscanf(pos2,"%s",path);
        }
        else
        {
            /*host only*/
            sscanf(pos,"%s",hostname);
        }
    }
    return;
}
```

- Test
```
$ ./driver.sh
*** Basic ***
Starting tiny on 24009
Starting proxy on 2533
……
basicScore: 40/40

*** Concurrency ***
……
concurrencyScore: 0/15

*** Cache ***
……
cacheScore: 0/15

totalScore: 40/70
```

## Part II: Dealing with multiple concurrent requests
The sequential version can handle only one request at a time. Part II adds threads so that the proxy can handle multiple requests concurrently.
### main
```c=
int main(int argc,char **argv)
{
    ……
    while(1)
    {
        ……
        /*pass connfd by value inside the pointer; the (long) cast avoids an int/pointer size warning*/
        Pthread_create(&tid,NULL,thread,(void *)(long)connfd);
    }
}
```
### *thread
```c=
void *thread(void *vargp){
    int connfd = (int)(long)vargp;
    Pthread_detach(pthread_self());   /*no join needed: resources are freed on exit*/
    doit(connfd);
    Close(connfd);
    return NULL;
}
```

## Part III: Caching web objects
Add caching, using a simple main-memory cache to store recently accessed web content.
### Cache structure
```c=
typedef struct {
    char cache_obj[MAX_OBJECT_SIZE];
    char cache_url[MAXLINE];
    int LRU;
    int isEmpty;

    int readCnt;            /*count of readers*/
    sem_t wmutex;           /*protects accesses to cache*/
    sem_t rdcntmutex;       /*protects accesses to readcnt*/

    int writeCnt;
    sem_t wtcntMutex;
    sem_t queue;
} cache_block;

typedef struct {
    cache_block cacheobjs[CACHE_OBJS_COUNT];  /*ten cache blocks*/
    int cache_num;
} Cache;
```
### main
```c=
int main(int argc,char **argv)
{
    ……
    cache_init();
    ……
}
```
### doit
```c=
void doit(int connfd)
{
    ……
    char url_store[MAXLINE];   /*same size as uri, so the copy cannot overflow*/
    strcpy(url_store,uri);     /*store the original url */
    ……
    /*the uri is cached ?
    */
    int cache_index;
    if((cache_index=cache_find(url_store))!=-1){ /*in cache then return the cache content*/
        readerPre(cache_index);
        Rio_writen(connfd,cache.cacheobjs[cache_index].cache_obj,
                   strlen(cache.cacheobjs[cache_index].cache_obj));
        readerAfter(cache_index);
        cache_LRU(cache_index);
        return;
    }
    ……
    /*store it*/
    if(sizebuf < MAX_OBJECT_SIZE){
        cache_uri(url_store,cachebuf);
    }
}
```
### Cache functions
```c=
void cache_init(){
    cache.cache_num = 0;
    int i;
    for(i=0;i<CACHE_OBJS_COUNT;i++){
        cache.cacheobjs[i].LRU = 0;
        cache.cacheobjs[i].isEmpty = 1;
        Sem_init(&cache.cacheobjs[i].wmutex,0,1);
        Sem_init(&cache.cacheobjs[i].rdcntmutex,0,1);
        cache.cacheobjs[i].readCnt = 0;

        cache.cacheobjs[i].writeCnt = 0;
        Sem_init(&cache.cacheobjs[i].wtcntMutex,0,1);
        Sem_init(&cache.cacheobjs[i].queue,0,1);
    }
}

/*reader entry: the first reader locks out writers*/
void readerPre(int i){
    P(&cache.cacheobjs[i].queue);
    P(&cache.cacheobjs[i].rdcntmutex);
    cache.cacheobjs[i].readCnt++;
    if(cache.cacheobjs[i].readCnt==1) P(&cache.cacheobjs[i].wmutex);
    V(&cache.cacheobjs[i].rdcntmutex);
    V(&cache.cacheobjs[i].queue);
}

/*reader exit: the last reader lets writers in again*/
void readerAfter(int i){
    P(&cache.cacheobjs[i].rdcntmutex);
    cache.cacheobjs[i].readCnt--;
    if(cache.cacheobjs[i].readCnt==0) V(&cache.cacheobjs[i].wmutex);
    V(&cache.cacheobjs[i].rdcntmutex);
}

/*writer entry: the first waiting writer blocks new readers via queue*/
void writePre(int i){
    P(&cache.cacheobjs[i].wtcntMutex);
    cache.cacheobjs[i].writeCnt++;
    if(cache.cacheobjs[i].writeCnt==1) P(&cache.cacheobjs[i].queue);
    V(&cache.cacheobjs[i].wtcntMutex);
    P(&cache.cacheobjs[i].wmutex);
}

/*writer exit: the last writer reopens queue for readers*/
void writeAfter(int i){
    V(&cache.cacheobjs[i].wmutex);
    P(&cache.cacheobjs[i].wtcntMutex);
    cache.cacheobjs[i].writeCnt--;
    if(cache.cacheobjs[i].writeCnt==0) V(&cache.cacheobjs[i].queue);
    V(&cache.cacheobjs[i].wtcntMutex);
}

/*find url is in the cache or not */
int cache_find(char *url){
    int i;
    for(i=0;i<CACHE_OBJS_COUNT;i++){
        readerPre(i);
        int hit = (cache.cacheobjs[i].isEmpty==0) &&
                  (strcmp(url,cache.cacheobjs[i].cache_url)==0);
        readerAfter(i);   /*always release the read lock, on a hit too; the caller
                            takes its own lock, and a leaked one would block writers*/
        if(hit) break;
    }
    if(i>=CACHE_OBJS_COUNT) return -1; /*can not find url in the cache*/
    return i;
}

/*find
 the empty cache block, or the block that should be evicted*/
int cache_eviction(){
    int min = LRU_MAGIC_NUMBER;
    int minindex = 0;
    int i;
    for(i=0; i<CACHE_OBJS_COUNT; i++)
    {
        readerPre(i);
        if(cache.cacheobjs[i].isEmpty == 1){ /*choose if cache block empty */
            minindex = i;
            readerAfter(i);
            break;
        }
        if(cache.cacheobjs[i].LRU < min){    /*if not empty choose the min LRU*/
            min = cache.cacheobjs[i].LRU;    /*remember the smallest LRU seen so far*/
            minindex = i;
        }
        readerAfter(i);
    }
    return minindex;
}

/*update the LRU number except the new cache one*/
void cache_LRU(int index){
    writePre(index);
    cache.cacheobjs[index].LRU = LRU_MAGIC_NUMBER;
    writeAfter(index);

    /*age every other non-empty block by one*/
    int i;
    for(i=0; i<CACHE_OBJS_COUNT; i++) {
        if(i==index) continue;
        writePre(i);
        if(cache.cacheobjs[i].isEmpty==0){
            cache.cacheobjs[i].LRU--;
        }
        writeAfter(i);
    }
}

/*cache the uri and content in cache*/
void cache_uri(char *uri,char *buf){
    int i = cache_eviction();

    writePre(i);/*writer P*/

    /*strcpy stops at the first NUL byte, so only text objects are stored completely*/
    strcpy(cache.cacheobjs[i].cache_obj,buf);
    strcpy(cache.cacheobjs[i].cache_url,uri);
    cache.cacheobjs[i].isEmpty = 0;

    writeAfter(i);/*writer V*/

    cache_LRU(i);
}
```
- Test
```shell
$ ./driver.sh
*** Basic ***
Starting tiny on 10614
Starting proxy on 19175
1: home.html
   Fetching ./tiny/home.html into ./.proxy using the proxy
   Fetching ./tiny/home.html into ./.noproxy directly from Tiny
   Comparing the two files
   Success: Files are identical.
2: csapp.c
   Fetching ./tiny/csapp.c into ./.proxy using the proxy
   Fetching ./tiny/csapp.c into ./.noproxy directly from Tiny
   Comparing the two files
   Success: Files are identical.
3: tiny.c
   Fetching ./tiny/tiny.c into ./.proxy using the proxy
   Fetching ./tiny/tiny.c into ./.noproxy directly from Tiny
   Comparing the two files
   Success: Files are identical.
4: godzilla.jpg
   Fetching ./tiny/godzilla.jpg into ./.proxy using the proxy
   Fetching ./tiny/godzilla.jpg into ./.noproxy directly from Tiny
   Comparing the two files
   Success: Files are identical.
5: tiny
   Fetching ./tiny/tiny into ./.proxy using the proxy
   Fetching ./tiny/tiny into ./.noproxy directly from Tiny
   Comparing the two files
   Success: Files are identical.
Killing tiny and proxy
basicScore: 40/40

*** Concurrency ***
Starting tiny on port 1644
Starting proxy on port 31209
Starting the blocking NOP server on port 12414
Trying to fetch a file from the blocking nop-server
Fetching ./tiny/home.html into ./.noproxy directly from Tiny
Fetching ./tiny/home.html into ./.proxy using the proxy
Checking whether the proxy fetch succeeded
Success: Was able to fetch tiny/home.html from the proxy.
Killing tiny, proxy, and nop-server
concurrencyScore: 15/15

*** Cache ***
Starting tiny on port 24273
Starting proxy on port 12555
Fetching ./tiny/tiny.c into ./.proxy using the proxy
Fetching ./tiny/home.html into ./.proxy using the proxy
Fetching ./tiny/csapp.c into ./.proxy using the proxy
Killing tiny
Fetching a cached copy of ./tiny/home.html into ./.noproxy
Success: Was able to fetch tiny/home.html from the cache.
Killing proxy
cacheScore: 15/15

totalScore: 70/70
```
:::success
totalScore: 70/70 PASS
:::
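
### Supplement: a non-destructive URI split

The `parse_uri` step in Part I writes `'\0'` into the URI while splitting it, which is why Part III copies the URL into `url_store` before parsing. The sketch below shows one possible alternative that leaves the input untouched and checks buffer sizes. `split_uri` and its parameters are hypothetical names chosen for illustration; they are not part of the lab handout or of the original proxy.c. It assumes plain `host[:port][/path]` URIs and does not handle IPv6 literals.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the handout or the original proxy.c).
 * Splits "[scheme://]host[:port][/path]" without modifying uri.
 * Returns 0 on success, -1 if a part does not fit or the port is invalid. */
int split_uri(const char *uri, char *host, size_t hostlen,
              char *path, size_t pathlen, int *port)
{
    assert(uri && host && path && port);

    const char *p = strstr(uri, "//");
    p = (p != NULL) ? p + 2 : uri;      /* skip the scheme if present */

    size_t n = strcspn(p, ":/");        /* host ends at ':', '/' or end of string */
    if (n == 0 || n >= hostlen)
        return -1;
    memcpy(host, p, n);
    host[n] = '\0';
    p += n;

    *port = 80;                         /* default HTTP port */
    if (*p == ':') {
        char *end;
        long v = strtol(p + 1, &end, 10);
        if (end == p + 1 || v <= 0 || v > 65535)
            return -1;                  /* no digits, or out of range */
        *port = (int)v;
        p = end;
    }

    if (*p == '\0')
        p = "/";                        /* no path given: request the root */
    else if (*p != '/')
        return -1;                      /* junk after the port number */
    if (strlen(p) >= pathlen)
        return -1;
    strcpy(path, p);
    return 0;
}
```

With a helper like this, `doit` could call `split_uri(uri, hostname, sizeof hostname, path, sizeof path, &port)` and use `uri` itself as the cache key, since the string is still intact after parsing.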