# CS:APP 3/e Proxy Lab
contributed by < `type59ty` >
###### tags: `sysprog2018`
> [Source code](https://github.com/type59ty/proxylab)
## Preparation
- Download the [handout](http://csapp.cs.cmu.edu/3e/labs.html)
- Study the [write up](http://csapp.cs.cmu.edu/3e/proxylab.pdf)
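- Study CS:APP chapters 10, 11, and 12
## References
- [Proxy server (Chinese Wikipedia)](https://zh.wikipedia.org/zh-tw/%E4%BB%A3%E7%90%86%E6%9C%8D%E5%8A%A1%E5%99%A8)
- [LAN controller: the proxy server (VBird's Linux site)](http://linux.vbird.org/linux_server/0420squid.php)
- [CSAPP: Proxy lab](https://blog.csdn.net/u012336567/article/details/52056089)
## Introduction to proxies
A web proxy is an intermediary program between a web browser and an end server. When browsing, the user does not connect to the end server directly: the proxy receives the request, forwards it to the end server, and then relays the end server's response (e.g. the page content) back to the user's browser. The proxy therefore acts as both a client and a server.
Some network devices such as gateways and routers provide proxy functionality. Proxy services are generally considered to help protect the privacy and security of network endpoints and to guard against attacks.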
![](https://i.imgur.com/C1ZVXzL.png)
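## Assignment requirements
Write a simple HTTP proxy that caches web content. The lab has three parts:
1. Set up the basic proxy: accept incoming connections, read and parse requests, forward requests to web servers, read the servers' responses, and forward those responses to the corresponding clients. This part teaches basic HTTP operation and how to use sockets to write programs that communicate over the network.
2. Extend the proxy so that it can handle multiple connections concurrently.
3. Add caching: use a simple main-memory cache to keep recently accessed web content.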
## Practice : echo server and client
Following pp. 663-664 of the textbook, build a simple client and server to get familiar with these functions.
In the client, the call to `Open_clientfd` establishes the connection to the server.
- echoclient.c
```c=
#include "csapp.h"

int main(int argc, char **argv) {
    int clientfd;
    char *host, *port, buf[MAXLINE];
    rio_t rio;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <host> <port>\n", argv[0]);
        exit(0);
    }
    host = argv[1];
    port = argv[2];

    /* connect to the server, then set up buffered reads on the socket */
    clientfd = Open_clientfd(host, port);
    Rio_readinitb(&rio, clientfd);

    /* send each line typed on stdin and print the echoed reply */
    while (Fgets(buf, MAXLINE, stdin) != NULL) {
        Rio_writen(clientfd, buf, strlen(buf));
        Rio_readlineb(&rio, buf, MAXLINE);
        Fputs(buf, stdout);
    }
    Close(clientfd);
    exit(0);
}
```
- echoserver.c
```c=
#include "csapp.h"

void echo(int connfd);

int main(int argc, char **argv) {
    int listenfd, connfd;
    socklen_t clientlen;
    struct sockaddr_storage clientaddr;  /* large enough for any address type */
    char client_hostname[MAXLINE], client_port[MAXLINE];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(0);
    }

    listenfd = Open_listenfd(argv[1]);
    while (1) {
        clientlen = sizeof(struct sockaddr_storage);
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
        Getnameinfo((SA *)&clientaddr, clientlen, client_hostname, MAXLINE,
                    client_port, MAXLINE, 0);
        printf("Connected to (%s, %s)\n", client_hostname, client_port);
        echo(connfd);
        Close(connfd);
    }
    exit(0);
}

/* read lines from the client and write each one straight back */
void echo(int connfd) {
    size_t n;
    char buf[MAXLINE];
    rio_t rio;

    Rio_readinitb(&rio, connfd);
    while ((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0) {
        printf("server received %d bytes\n", (int)n);
        Rio_writen(connfd, buf, n);
    }
}
```
#### Usage:
Start echoserver first, specifying the port that clients will connect to
```shell
$ ./echoserver 4000
```
Then connect from the client side
```shell
$ ./echoclient hostname 4000
```
The server side will show
```shell
$ ./echoserver 4000
Connected to (localhost, 49492)
```
This means the connection succeeded. The client can now send requests; in this example the server echoes back each string the client sends
```shell
$ ./echoclient hostname 4000
hello
hello
no
no
echo
echo
```
## Part I: Implementing a sequential web proxy
- Goal: implement a sequential proxy that handles HTTP/1.0 GET requests
![](https://i.imgur.com/xQB5sOy.png)
Following CS:APP p. 667, this section first describes the two kinds of HTTP messages involved (line numbers refer to the session shown above):
1. HTTP request:
A **request line** (line 5), followed by zero or more **request headers** (line 6), followed by an empty text line that terminates the header list (line 7).
- Request line format:
```
method URI version
```
- Request header format:
```
header-name: header-data
```
2. HTTP response:
Similar to an HTTP request, it consists of a **response line** (line 8), followed by zero or more **response headers** (lines 9~13), followed by an empty line that terminates the header list (line 14), followed by a response body (lines 15~17).
:::info
Most of the structure can be modeled on the TINY web server (p. 671)
:::
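The request-line format above can be checked with a small parser. The following is a minimal sketch, not the lab's code: `parse_request_line` and `FIELD_MAX` are hypothetical names introduced for illustration, and the only validation assumed is that the version token begins with `HTTP/`.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 256  /* hypothetical field size for this sketch */

/* Split "method URI version" into its three fields.
 * Returns 0 on success, -1 if the line does not have that shape. */
int parse_request_line(const char *line, char method[FIELD_MAX],
                       char uri[FIELD_MAX], char version[FIELD_MAX])
{
    /* %255s bounds each field so an oversized token cannot overflow;
     * trailing "\r\n" counts as whitespace and is not stored. */
    if (sscanf(line, "%255s %255s %255s", method, uri, version) != 3)
        return -1;
    /* a well-formed version token starts with "HTTP/" */
    if (strncmp(version, "HTTP/", 5) != 0)
        return -1;
    return 0;
}
```

A caller would pass the first line read from the client and reject the request when the function returns -1, before looking at any headers.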
### Makefile
```makefile
CC = gcc
CFLAGS = -g -Wall
LDFLAGS = -lpthread

all: proxy

csapp.o: csapp.c csapp.h
	$(CC) $(CFLAGS) -c csapp.c

proxy.o: proxy.c csapp.h
	$(CC) $(CFLAGS) -c proxy.c

proxy: proxy.o csapp.o
	$(CC) $(CFLAGS) proxy.o csapp.o -o proxy $(LDFLAGS)

clean:
	rm -f *~ *.o proxy core *.tar *.zip *.gzip *.bzip *.gz
```
### main
```c=
int main(int argc, char **argv)
{
    int listenfd, connfd;
    socklen_t clientlen;
    char hostname[MAXLINE], port[MAXLINE];
    struct sockaddr_storage clientaddr;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(1);
    }

    listenfd = Open_listenfd(argv[1]);
    while (1) {
        clientlen = sizeof(clientaddr);
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);

        /* print accepted message */
        Getnameinfo((SA *)&clientaddr, clientlen, hostname, MAXLINE, port, MAXLINE, 0);
        printf("Accepted connection from (%s %s).\n", hostname, port);

        /* handle the client transaction sequentially */
        doit(connfd);
        Close(connfd);
    }
    return 0;
}
```
### doit
```c=
/* handle the client HTTP transaction */
void doit(int connfd)
{
    int end_serverfd; /* the end server file descriptor */
    char buf[MAXLINE], method[MAXLINE], uri[MAXLINE], version[MAXLINE];
    char endserver_http_header[MAXLINE];
    /* store the request line arguments */
    char hostname[MAXLINE], path[MAXLINE];
    int port;
    rio_t rio, server_rio; /* rio is the client's rio, server_rio is the end server's rio */

    Rio_readinitb(&rio, connfd);
    Rio_readlineb(&rio, buf, MAXLINE);
    sscanf(buf, "%s %s %s", method, uri, version); /* read the client request line */

    if (strcasecmp(method, "GET")) {
        printf("Proxy does not implement the method\n");
        return;
    }

    /* parse the uri to get hostname, file path, port */
    parse_uri(uri, hostname, path, &port);

    /* build the http header which will be sent to the end server */
    build_http_header(endserver_http_header, hostname, path, port, &rio);

    /* connect to the end server */
    end_serverfd = connect_endServer(hostname, port, endserver_http_header);
    if (end_serverfd < 0) {
        printf("connection failed\n");
        return;
    }
    Rio_readinitb(&server_rio, end_serverfd);

    /* write the http header to the end server */
    Rio_writen(end_serverfd, endserver_http_header, strlen(endserver_http_header));

    /* receive the response from the end server and relay it to the client */
    size_t n;
    while ((n = Rio_readlineb(&server_rio, buf, MAXLINE)) != 0) {
        printf("proxy received %zu bytes, then send\n", n); /* %zu matches size_t */
        Rio_writen(connfd, buf, n);
    }
    Close(end_serverfd);
}
```
### build_http_header
```c=
void build_http_header(char *http_header,
        char *hostname, char *path, int port, rio_t *client_rio)
{
    char buf[MAXLINE], request_hdr[MAXLINE], other_hdr[MAXLINE], host_hdr[MAXLINE];

    /* start from empty strings: both buffers are tested or appended to below */
    other_hdr[0] = '\0';
    host_hdr[0] = '\0';

    /* request line */
    sprintf(request_hdr, requestlint_hdr_format, path);

    /* read the remaining client request headers and rewrite them */
    while (Rio_readlineb(client_rio, buf, MAXLINE) > 0) {
        if (strcmp(buf, endof_hdr) == 0) break; /* blank line: end of headers */

        if (!strncasecmp(buf, host_key, strlen(host_key))) { /* Host: */
            strcpy(host_hdr, buf);
            continue;
        }

        /* forward every header except the three the proxy replaces;
           strncasecmp returns nonzero when the key does NOT match */
        if (strncasecmp(buf, connection_key, strlen(connection_key))
            && strncasecmp(buf, proxy_connection_key, strlen(proxy_connection_key))
            && strncasecmp(buf, user_agent_key, strlen(user_agent_key))) {
            strcat(other_hdr, buf);
        }
    }

    /* the client sent no Host header: build one from the parsed hostname */
    if (strlen(host_hdr) == 0) {
        sprintf(host_hdr, host_hdr_format, hostname);
    }

    sprintf(http_header, "%s%s%s%s%s%s%s",
            request_hdr,
            host_hdr,
            conn_hdr,
            prox_hdr,
            user_agent_hdr,
            other_hdr,
            endof_hdr);
    return;
}
```
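`build_http_header` relies on several string constants that these notes do not show. The sketch below gives one plausible set, assuming the fixed `User-Agent`, `Connection: close`, and `Proxy-Connection: close` values required by the lab write-up. The identifier names follow the function above, while `is_overridden_header` and `build_minimal_request` are hypothetical helpers added only to exercise the constants.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>  /* strncasecmp */

/* Assumed definitions; the names match the identifiers used in the notes. */
static const char *requestlint_hdr_format = "GET %s HTTP/1.0\r\n";
static const char *host_hdr_format = "Host: %s\r\n";
static const char *conn_hdr = "Connection: close\r\n";
static const char *prox_hdr = "Proxy-Connection: close\r\n";
static const char *user_agent_hdr =
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) "
    "Gecko/20120305 Firefox/10.0.3\r\n";
static const char *endof_hdr = "\r\n";

static const char *host_key = "Host";
static const char *connection_key = "Connection";
static const char *proxy_connection_key = "Proxy-Connection";
static const char *user_agent_key = "User-Agent";

/* Hypothetical helper: 1 if the proxy supplies its own value for this header. */
int is_overridden_header(const char *line)
{
    const char *keys[] = { host_key, connection_key,
                           proxy_connection_key, user_agent_key };
    for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
        if (strncasecmp(line, keys[i], strlen(keys[i])) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: assemble a request with only the fixed headers.
 * Returns the request length, or -1 if it did not fit in out. */
int build_minimal_request(char *out, size_t outlen,
                          const char *host, const char *path)
{
    char line[1024], hosthdr[1024];
    snprintf(line, sizeof line, requestlint_hdr_format, path);
    snprintf(hosthdr, sizeof hosthdr, host_hdr_format, host);
    int n = snprintf(out, outlen, "%s%s%s%s%s%s",
                     line, hosthdr, conn_hdr, prox_hdr,
                     user_agent_hdr, endof_hdr);
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```

Using `snprintf` with an explicit length check is a design choice of this sketch: it reports an oversized request instead of writing past the end of the buffer.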
### connect_endServer
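```c
/* Connect to the end server; returns the connected descriptor.
   Declared as a plain function: a bare `inline` definition can fail to link in C99. */
int connect_endServer(char *hostname, int port, char *http_header)
{
    char portStr[100];
    snprintf(portStr, sizeof(portStr), "%d", port); /* Open_clientfd takes the port as a string */
    return Open_clientfd(hostname, portStr);
}
```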
### parse_uri
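```c=
/* parse the uri to get hostname, file path, port */
void parse_uri(char *uri, char *hostname, char *path, int *port)
{
    *port = 80;         /* default HTTP port */
    strcpy(path, "/");  /* default path when the URI has none */

    char *pos = strstr(uri, "//");
    pos = pos != NULL ? pos + 2 : uri;  /* skip the scheme if present */

    char *pos2 = strstr(pos, ":");
    if (pos2 != NULL) {
        /* host:port[/path] */
        *pos2 = '\0';
        sscanf(pos, "%s", hostname);
        sscanf(pos2 + 1, "%d%s", port, path);
        *pos2 = ':';    /* restore the caller's string */
    } else {
        pos2 = strstr(pos, "/");
        if (pos2 != NULL) {
            /* host/path */
            *pos2 = '\0';
            sscanf(pos, "%s", hostname);
            *pos2 = '/';
            sscanf(pos2, "%s", path);
        } else {
            /* host only */
            sscanf(pos, "%s", hostname);
        }
    }
    return;
}
```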
- Test
```
$ ./driver.sh
*** Basic ***
Starting tiny on 24009
Starting proxy on 2533
……
basicScore: 40/40
*** Concurrency ***
……
concurrencyScore: 0/15
*** Cache ***
……
cacheScore: 0/15
totalScore: 40/70
```
## Part II: Dealing with multiple concurrent requests
The original sequential version can handle only one request at a time. The goal of Part II is to add threads so that the proxy can handle multiple requests concurrently.
### main
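```c=
int main(int argc,char **argv)
{
    ……
    while(1) {
        ……
        /* pass connfd by value inside the pointer; the intptr_t cast avoids
           an int-to-pointer size warning on 64-bit systems */
        Pthread_create(&tid, NULL, thread, (void *)(intptr_t)connfd);
    }
}
```
### thread
```c=
void *thread(void *vargp) {
    int connfd = (int)(intptr_t)vargp;
    Pthread_detach(pthread_self()); /* let the system reclaim the thread when it exits */
    doit(connfd);
    Close(connfd);
    return NULL;
}
```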
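Another way to hand the descriptor to the thread, used by the concurrent echo server in CS:APP chapter 12, is to copy it into a heap block that the thread frees. The sketch below illustrates that idiom with plain pthreads calls. `serve`, `worker`, `spawn_worker`, and `last_served_fd` are hypothetical names; `serve` stands in for `doit`, and the thread is left joinable here only so that the result can be checked, whereas the proxy would detach it.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for doit(): records the descriptor it was given. */
static int last_served_fd = -1;
static void serve(int connfd) { last_served_fd = connfd; }

static void *worker(void *vargp)
{
    int connfd = *(int *)vargp;  /* copy the descriptor out of the heap block */
    free(vargp);                 /* the worker owns the block, so it frees it */
    serve(connfd);
    return NULL;
}

/* Hand connfd to a new thread.
 * Returns 0 on success, -1 if malloc fails, or the pthread_create error number. */
int spawn_worker(int connfd, pthread_t *tid)
{
    int *connfdp = malloc(sizeof *connfdp);
    if (connfdp == NULL)
        return -1;
    *connfdp = connfd;           /* each thread gets its own private copy */
    int rc = pthread_create(tid, NULL, worker, connfdp);
    if (rc != 0)
        free(connfdp);           /* no thread was created to free it */
    return rc;
}
```

Giving each thread its own heap copy avoids both the integer-to-pointer cast and the race that would occur if the main loop passed the address of a single `connfd` variable that the next `Accept` overwrites.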
## Part III: Caching web objects
Add caching: use a simple main-memory cache to keep recently accessed web content.
### Cache structure
```c=
typedef struct {
    char cache_obj[MAX_OBJECT_SIZE];
    char cache_url[MAXLINE];
    int LRU;
    int isEmpty;

    int readCnt;       /* count of readers */
    sem_t wmutex;      /* protects accesses to cache */
    sem_t rdcntmutex;  /* protects accesses to readCnt */

    int writeCnt;      /* count of waiting writers */
    sem_t wtcntMutex;  /* protects accesses to writeCnt */
    sem_t queue;       /* held by writers to block new readers */
} cache_block;

typedef struct {
    cache_block cacheobjs[CACHE_OBJS_COUNT]; /* ten cache blocks */
    int cache_num;
} Cache;
```
### main
```c=
int main(int argc,char **argv)
{
……
cache_init();
……
}
```
### doit
```c=
void doit(int connfd)
{
    ……
    char url_store[MAXLINE];  /* same size as uri, so strcpy cannot overflow */
    strcpy(url_store, uri);   /* keep the original URL as the cache key */
    ……
    /* is the URL already cached? */
    int cache_index;
    if ((cache_index = cache_find(url_store)) != -1) {
        /* cache hit: send the cached content back */
        readerPre(cache_index);
        Rio_writen(connfd, cache.cacheobjs[cache_index].cache_obj,
                   strlen(cache.cacheobjs[cache_index].cache_obj));
        readerAfter(cache_index);
        cache_LRU(cache_index);
        return;
    }
    ……
    /* store the response if it fits in one cache block */
    if (sizebuf < MAX_OBJECT_SIZE) {
        cache_uri(url_store, cachebuf);
    }
}
```
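One caveat with the hit path above: `strlen` stops at the first zero byte, so an object containing embedded NUL bytes (an image, for example) would be sent truncated from the cache. A common remedy is to store the byte count next to the data and copy with `memcpy`. The following is a sketch under that assumption; `object_buf` and `object_append` are hypothetical names, and `MAX_OBJECT_SIZE` is set to the 102400 bytes used in the lab handout.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OBJECT_SIZE 102400  /* value from the lab handout */

/* Hypothetical binary-safe object buffer: data plus an explicit length. */
typedef struct {
    char   data[MAX_OBJECT_SIZE];
    size_t size;  /* number of valid bytes in data */
} object_buf;

/* Append len bytes of a response chunk.
 * Returns 0 on success, or -1 if the object would exceed MAX_OBJECT_SIZE,
 * in which case nothing is copied and the caller should skip caching. */
int object_append(object_buf *obj, const char *chunk, size_t len)
{
    if (len > MAX_OBJECT_SIZE - obj->size)
        return -1;
    memcpy(obj->data + obj->size, chunk, len);  /* copies NUL bytes too */
    obj->size += len;
    return 0;
}
```

With this layout the relay loop would call `object_append` for every chunk it forwards, and a cache hit would write exactly `size` bytes instead of relying on `strlen`.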
### Cache functions
```c=
void cache_init() {
    cache.cache_num = 0;
    int i;
    for (i = 0; i < CACHE_OBJS_COUNT; i++) {
        cache.cacheobjs[i].LRU = 0;
        cache.cacheobjs[i].isEmpty = 1;
        Sem_init(&cache.cacheobjs[i].wmutex, 0, 1);
        Sem_init(&cache.cacheobjs[i].rdcntmutex, 0, 1);
        cache.cacheobjs[i].readCnt = 0;
        cache.cacheobjs[i].writeCnt = 0;
        Sem_init(&cache.cacheobjs[i].wtcntMutex, 0, 1);
        Sem_init(&cache.cacheobjs[i].queue, 0, 1);
    }
}

/* reader entry: the first reader locks out writers */
void readerPre(int i) {
    P(&cache.cacheobjs[i].queue);
    P(&cache.cacheobjs[i].rdcntmutex);
    cache.cacheobjs[i].readCnt++;
    if (cache.cacheobjs[i].readCnt == 1) P(&cache.cacheobjs[i].wmutex);
    V(&cache.cacheobjs[i].rdcntmutex);
    V(&cache.cacheobjs[i].queue);
}

/* reader exit: the last reader lets writers in again */
void readerAfter(int i) {
    P(&cache.cacheobjs[i].rdcntmutex);
    cache.cacheobjs[i].readCnt--;
    if (cache.cacheobjs[i].readCnt == 0) V(&cache.cacheobjs[i].wmutex);
    V(&cache.cacheobjs[i].rdcntmutex);
}

/* writer entry: the first writer blocks new readers, then waits for exclusive access */
void writePre(int i) {
    P(&cache.cacheobjs[i].wtcntMutex);
    cache.cacheobjs[i].writeCnt++;
    if (cache.cacheobjs[i].writeCnt == 1) P(&cache.cacheobjs[i].queue);
    V(&cache.cacheobjs[i].wtcntMutex);
    P(&cache.cacheobjs[i].wmutex);
}

/* writer exit: the last writer readmits readers */
void writeAfter(int i) {
    V(&cache.cacheobjs[i].wmutex);
    P(&cache.cacheobjs[i].wtcntMutex);
    cache.cacheobjs[i].writeCnt--;
    if (cache.cacheobjs[i].writeCnt == 0) V(&cache.cacheobjs[i].queue);
    V(&cache.cacheobjs[i].wtcntMutex);
}
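/* return the index of the block caching url, or -1 if it is not cached */
int cache_find(char *url) {
    int i;
    for (i = 0; i < CACHE_OBJS_COUNT; i++) {
        readerPre(i);
        int hit = (cache.cacheobjs[i].isEmpty == 0) &&
                  (strcmp(url, cache.cacheobjs[i].cache_url) == 0);
        /* always release the reader lock, including on a hit; otherwise
           readCnt never returns to 0 and later writers block forever */
        readerAfter(i);
        if (hit) return i;
    }
    return -1; /* cannot find url in the cache */
}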
/* find an empty cache block, or else the block that should be evicted */
int cache_eviction() {
    int min = LRU_MAGIC_NUMBER;
    int minindex = 0;
    int i;
    for (i = 0; i < CACHE_OBJS_COUNT; i++) {
        readerPre(i);
        if (cache.cacheobjs[i].isEmpty == 1) { /* an empty block is always chosen */
            minindex = i;
            readerAfter(i);
            break;
        }
        if (cache.cacheobjs[i].LRU < min) { /* otherwise track the smallest LRU value */
            min = cache.cacheobjs[i].LRU;   /* remember it so later blocks compare against it */
            minindex = i;
        }
        readerAfter(i);
    }
    return minindex;
}
/* mark block index as most recently used and age every other block */
void cache_LRU(int index) {
    writePre(index);
    cache.cacheobjs[index].LRU = LRU_MAGIC_NUMBER;
    writeAfter(index);

    int i;
    for (i = 0; i < CACHE_OBJS_COUNT; i++) {
        if (i == index) continue; /* skip the block that was just refreshed */
        writePre(i);
        if (cache.cacheobjs[i].isEmpty == 0) {
            cache.cacheobjs[i].LRU--;
        }
        writeAfter(i);
    }
}
/* cache the uri and its content */
void cache_uri(char *uri, char *buf) {
    int i = cache_eviction();

    writePre(i);   /* writer P */
    strcpy(cache.cacheobjs[i].cache_obj, buf);
    strcpy(cache.cacheobjs[i].cache_url, uri);
    cache.cacheobjs[i].isEmpty = 0;
    writeAfter(i); /* writer V */

    cache_LRU(i);
}
```
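The functions above maintain LRU order by decrementing the counter of every other block on each access, which takes a write lock on all blocks. An alternative, sketched below as an assumption rather than the author's method, stamps each block with a global logical clock on access and evicts the block with the oldest stamp. `slot`, `slot_touch`, `pick_victim`, and `lru_clock` are hypothetical names, and the sketch omits locking: callers would need to hold whatever lock protects the cache metadata.

```c
#include <stddef.h>

/* Hypothetical per-block metadata for timestamp-based LRU. */
typedef struct {
    int           in_use;     /* 0 while the slot is empty */
    unsigned long last_used;  /* logical time of the most recent access */
} slot;

static unsigned long lru_clock = 0;  /* global logical clock, not thread-safe */

/* Mark a slot as just accessed (on both cache hits and inserts). */
void slot_touch(slot *s)
{
    s->in_use = 1;
    s->last_used = ++lru_clock;
}

/* Return the index to overwrite: the first empty slot if there is one,
 * otherwise the slot with the smallest (oldest) timestamp. */
size_t pick_victim(const slot *slots, size_t n)
{
    size_t victim = 0;
    for (size_t i = 0; i < n; i++) {
        if (!slots[i].in_use)
            return i;
        if (slots[i].last_used < slots[victim].last_used)
            victim = i;
    }
    return victim;
}
```

The trade-off is that an access updates a single slot instead of all of them, at the cost of one shared counter that must itself be protected.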
- Test
```shell
$ ./driver.sh
*** Basic ***
Starting tiny on 10614
Starting proxy on 19175
1: home.html
Fetching ./tiny/home.html into ./.proxy using the proxy
Fetching ./tiny/home.html into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
2: csapp.c
Fetching ./tiny/csapp.c into ./.proxy using the proxy
Fetching ./tiny/csapp.c into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
3: tiny.c
Fetching ./tiny/tiny.c into ./.proxy using the proxy
Fetching ./tiny/tiny.c into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
4: godzilla.jpg
Fetching ./tiny/godzilla.jpg into ./.proxy using the proxy
Fetching ./tiny/godzilla.jpg into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
5: tiny
Fetching ./tiny/tiny into ./.proxy using the proxy
Fetching ./tiny/tiny into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
Killing tiny and proxy
basicScore: 40/40
*** Concurrency ***
Starting tiny on port 1644
Starting proxy on port 31209
Starting the blocking NOP server on port 12414
Trying to fetch a file from the blocking nop-server
Fetching ./tiny/home.html into ./.noproxy directly from Tiny
Fetching ./tiny/home.html into ./.proxy using the proxy
Checking whether the proxy fetch succeeded
Success: Was able to fetch tiny/home.html from the proxy.
Killing tiny, proxy, and nop-server
concurrencyScore: 15/15
*** Cache ***
Starting tiny on port 24273
Starting proxy on port 12555
Fetching ./tiny/tiny.c into ./.proxy using the proxy
Fetching ./tiny/home.html into ./.proxy using the proxy
Fetching ./tiny/csapp.c into ./.proxy using the proxy
Killing tiny
Fetching a cached copy of ./tiny/home.html into ./.noproxy
Success: Was able to fetch tiny/home.html from the cache.
Killing proxy
cacheScore: 15/15
totalScore: 70/70
```
:::success
totalScore: 70/70 PASS
:::