2019q1 Homework5 (daemon)
===
contributed by < `martinxluptak` >

I connected to the daemon using Python and sent some data:

```
>>> tn = telnetlib.Telnet("localhost", 12345)
>>> tn.write('123'.encode('ascii'))
>>> tn.write('123'.encode('ascii'))
>>> tn.write('123'.encode('ascii'))
>>> l = tn.read_some()
>>> l
b'123\xc5e:\xcc\x91vices/pci0000:00/0000:00:1c.0/0000:01:00.0'
```

The expected contents of `l` are `123123123`. This kecho implementation keeps reading out-of-bounds memory until it happens to find an end-of-string byte, which leaks kernel memory to userland.

The implementation also interacts oddly with the shell telnet client because it treats the newline as the end of the string: when a longer message is followed by a shorter one, the stale tail of the buffer is echoed back (newline included) up to the next terminator. This is an example of the output after sending the messages `3333333` and `44444` (notice that the trailing `33` is repeated):

```
3333333
44444
33
```

I created a simple Python script to measure the performance of kecho when handling 10 clients. I measured this on my host machine using the time elapsed since the start of the Python script; I did not touch any kernel or QEMU timing utilities. The original kecho implementation handled (mean) 22000 messages per second, while my kecho implementation handled (mean) 25000 messages per second. I do not know exactly what to attribute this difference to: cmwq is a cleaner solution, and merging `set_request()` and `get_request()` into one function spares some function calls, but I am not sure this is the best way to do it. A sketch of the client script is shown below.

I used chunks of code from Kernel HTTPd to implement my own `echo_server.c` with cmwq, and I fixed the issue of reading out-of-bounds memory. I had trouble setting up networking/sockets in QEMU, so everything was tested on my host machine.
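The messages-per-second numbers above come from a client script along these general lines. This is a simplified sketch rather than the exact script I ran: the payload, the per-client message count and the port are placeholders.

```
#!/usr/bin/env python3
# Sketch of the benchmark client: 10 threads hammer the echo server and the
# script reports the aggregate message rate based on wall-clock time.
# HOST, PORT, PAYLOAD and MESSAGES_PER_CLIENT are illustrative placeholders.
import socket
import threading
import time

HOST, PORT = "localhost", 12345
CLIENTS = 10
MESSAGES_PER_CLIENT = 10000
PAYLOAD = b"benchmark\n"

def client_loop():
    with socket.create_connection((HOST, PORT)) as s:
        for _ in range(MESSAGES_PER_CLIENT):
            s.sendall(PAYLOAD)
            s.recv(len(PAYLOAD))  # wait for the echo before sending the next message

start = time.time()
threads = [threading.Thread(target=client_loop) for _ in range(CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

total = CLIENTS * MESSAGES_PER_CLIENT
print("%d messages in %.2f s -> %.0f msg/s" % (total, elapsed, total / elapsed))
```

The `echo_server.c` implementation: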
```
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/sched/signal.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/tcp.h>
#include <linux/workqueue.h>
#include <net/sock.h>

#include "fastecho.h"

#define BUF_SIZE 4096

static struct workqueue_struct *my_wq;

struct work_struct_data {
    struct work_struct my_work;
    struct socket *client;
};

/* Receive one message from the client and echo it back. */
static int handle_request(struct socket *sock, unsigned char *buf, size_t size)
{
    struct msghdr msg;
    struct kvec vec;
    int length;

    /* kvec setting */
    vec.iov_len = size;
    vec.iov_base = buf;

    /* msghdr setting */
    msg.msg_name = NULL;
    msg.msg_namelen = 0;
    msg.msg_control = NULL;
    msg.msg_controllen = 0;
    msg.msg_flags = 0;

    /* get msg: one kvec, at most size bytes */
    length = kernel_recvmsg(sock, &msg, &vec, 1, size, msg.msg_flags);
    if (length <= 0)
        return length;

    /* Terminate at the number of bytes actually received so that stale
     * bytes from a previous, longer message are never echoed again. */
    buf[length] = '\0';

    printk("msg: %s, len: %zu\n", buf, strlen(buf));

    length = kernel_sendmsg(sock, &msg, &vec, 1, strlen(buf) - 1);
    return length;
}

/* One work item per accepted connection: echo until the peer disconnects. */
static void work_handler(struct work_struct *work)
{
    unsigned char *buf;
    struct work_struct_data *wsdata = (struct work_struct_data *) work;
    int res;

    buf = kmalloc(BUF_SIZE, GFP_KERNEL);
    if (!buf) {
        printk("server: recvbuf kmalloc error!\n");
        kernel_sock_shutdown(wsdata->client, SHUT_RDWR);
        sock_release(wsdata->client);
        kfree(wsdata);
        return;
    }
    memset(buf, 0, BUF_SIZE);

    while (1) {
        res = handle_request(wsdata->client, buf, BUF_SIZE - 1);
        if (res <= 0) {
            if (res) {
                printk(KERN_ERR MODULE_NAME ": get request error = %d\n", res);
            }
            break;
        }
    }

    kfree(buf);
    kernel_sock_shutdown(wsdata->client, SHUT_RDWR);
    sock_release(wsdata->client);
    kfree(wsdata); /* the work item is not referenced after this handler */
    printk("exiting work_handler...\n");
}

int echo_server_daemon(void *arg)
{
    struct echo_server_param *param = arg;
    struct socket *sock;
    int error;

    allow_signal(SIGKILL);
    allow_signal(SIGTERM);

#ifdef HIGH_PRI
    /* Experimental: workqueue attributes only apply to unbound workqueues,
     * and my_wq has to exist before the attributes are applied to it. */
    struct workqueue_attrs *attr;
    my_wq = alloc_workqueue("my_queue", WQ_UNBOUND, 0);
    attr = alloc_workqueue_attrs(__GFP_HIGH);
    apply_workqueue_attrs(my_wq, attr);
#else
    my_wq = create_workqueue("my_queue");
#endif
    // my_wq = alloc_workqueue("my_queue", WQ_MEM_RECLAIM | WQ_HIGHPRI, 1);

    while (1) {
        struct work_struct_data *wsdata;

        /* using blocking I/O */
        error = kernel_accept(param->listen_sock, &sock, 0);
        if (error < 0) {
            if (signal_pending(current))
                break;
            printk(KERN_ERR MODULE_NAME ": socket accept error = %d\n", error);
            continue;
        }
        printk("accepted socket\n");

        if (my_wq) {
            /* Package the accepted connection as a work item. */
            wsdata = kmalloc(sizeof(struct work_struct_data), GFP_KERNEL);
            if (!wsdata) {
                printk(KERN_ERR MODULE_NAME ": work item kmalloc error!\n");
                kernel_sock_shutdown(sock, SHUT_RDWR);
                sock_release(sock);
                continue;
            }
            wsdata->client = sock;

            /* Hand the connection over to the workqueue. */
            printk("starting work\n");
            INIT_WORK(&wsdata->my_work, work_handler);
            queue_work(my_wq, &wsdata->my_work);
        }
        printk("server: accept ok, Connection Established.\n");
    }

    /* Wait for in-flight connections, then tear the workqueue down. */
    if (my_wq) {
        flush_workqueue(my_wq);
        destroy_workqueue(my_wq);
    }
    return 0;
}
```

I cloned Kernel HTTPd to inspect it. Both of its implementations use a Concurrency Managed Workqueue (cmwq), the interface the kernel itself provides for deferring and concurrently executing work items. The first implementation accepts the client on a kernel socket but forwards the request to a Python server on another port, which generates the response and sends it back through the kernel socket. Implementing a web server this way in the kernel brings little benefit, because the Python server still runs entirely in userland and is the main bottleneck in request processing. The no-Python implementation instead handles both the request and the response solely with kernel resources, which yields roughly a ten-fold performance improvement: according to the author's measurements, the no-Python implementation can handle 50000 requests per second, while the variant that proxies requests to userland caps at 5000 requests per second.
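Coming back to my kecho implementation: to confirm that the stale-buffer behaviour shown at the top no longer appears, a quick client-side check can replay the long-then-short message sequence and verify that nothing beyond the data just sent comes back. Again, this is only a sketch; the port and payloads are placeholders.

```
#!/usr/bin/env python3
# Sanity-check sketch: send a long message followed by a shorter one on the
# same connection and make sure the echo never contains stale or foreign bytes.
# HOST, PORT and the payloads are illustrative placeholders.
import socket

HOST, PORT = "localhost", 12345

with socket.create_connection((HOST, PORT)) as s:
    for payload in (b"3333333\n", b"44444\n"):
        s.sendall(payload)
        echoed = s.recv(4096)
        print(payload, "->", echoed)
        # Every echoed byte must come from the message just sent, so the echo
        # has to be a prefix of the payload.
        assert payload.startswith(echoed), echoed
```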