2022q1 Homework1 (lab0)

作業要求

實驗環境

$ cat /proc/version
Linux version 5.13.0-30-generic (buildd@lcy02-amd64-003) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #33~20.04.1-Ubuntu SMP Mon Feb 7 14:25:10 UTC 2022

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

開發記錄

Note : current progress

# Github Action
$ Run make test || (cat scripts/weeping.raw ; exit 1)
...
---     TOTAL		100/100

# Local machine
$ make test
...
+++ TESTING trace trace-14-perf:
# Test performance of insert_tail, reverse, and sort
ERROR: Time limit exceeded.  Either you are in an infinite loop, or your code is too inefficient
---     trace-14-perf   0/6
...
---     TOTAL		94/100

q_new

q_new is expected to return a list_head object that acts as the head of list.

It allocates the list_head object that points to itself, implemented in INIT_LIST_HEAD(head)

struct list_head *q_new()
{
    struct list_head *head = malloc(sizeof *head);
    if (head == NULL)
        return NULL;
    INIT_LIST_HEAD(head);
    return head;
}

q_free

q_free iterates through the queue then free all storage used by queue.

list_for_each_entry_safe(...) is used because the function would remove element of current iteration of loop. list_for_each_entry_safe(...) would have a pointer safe to store the address of entry->next, so that when entry->next points to other address, list_for_each_entry_safe can still find the address for next iteration.

q_free() intention is to free every memory space allocated by the for the queue. Which includes the list_head structure. And so, the function would iterate through the queue one by one, releasing memory of both the char * value and the memory for container itself, using q_release_element that was provided by default.

When the iterator exits the loop, every element in the queue is freed, and only the list_head or queue is remain, as the objective of the q_free() function is to release every memory space allocated for the queue, the remaining list_head structure is released as well.

Improve your writing. You should mention the approach how you release the intended resouces.

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

void q_free(struct list_head *l)
{
    if (l == NULL)
        return;
    element_t *elem, *safe;
    list_for_each_entry_safe (elem, safe, l, list) {
        q_release_element(elem);
    }
    free(l);
}

q_insert_head

q_insert_head inserts an element to head of queue, which next of head of list points to.

Linux kernel API already had list_add(...) function that does that.

q_insert_head assigns the value field then insert the element after head of list.

If any of the malloc() fails, free every previously allocated memory then return false.

bool q_insert_head(struct list_head *head, char *s)
{   
    if (head == NULL) 
        return false; 
    element_t *elem = malloc(sizeof *elem);
    if (elem == NULL)
        return false;
    elem->value = malloc(strlen(s) + 1);
    if(elem->value == NULL){
        free(elem);
        return false;
    }
    strncpy(elem->value, s, strlen(s) + 1);
    list_add(&elem->list, head);
    return true;
}

q_insert_tail

q_insert_tail inserts an element to tail of queue, which prev of head of list points to.

Linux kernel API already had list_add_tail(...) function that does that.

q_insert_tail assigns the value field then insert the element before head of list.

If any of the malloc() fails, free every previously allocated memory then return false.

bool q_insert_tail(struct list_head *head, char *s)
{
    if (head == NULL)
        return false;
    element_t *elem = malloc(sizeof *elem);
    if (elem == NULL)
        return false;
    elem->value = malloc(strlen(s) + 1);
    if(elem->value == NULL){
        free(elem);
        return false;
    }
    strncpy(elem->value, s, strlen(s) + 1);
    list_add_tail(&elem->list, head);
    return true;
}

q_remove_head

q_remove_head removes the head of queue, which next of head of list points to. Linux Kernel API has a function list_del(...) that unlinks a node from the list. q_remove_head indicates the head of queue to unlink with list_del(...).

The value of removed node would be copied to sp. As it is required to make this function works in constant time, the character is copied one at a time until it reaches end of string. After that, it would continue doing some assignment operation to ensure the loop will run for bufsize - 1 iterations no matter then length of string to be copied.

element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
    if (head == NULL || list_empty(head))
        return NULL;
    struct list_head *node = head->next;
    list_del_init(node);

    if (sp != NULL) {
        size_t len = (bufsize - 1) ^
                     (((bufsize - 1) ^
                       (strlen(list_entry(node, element_t, list)->value))) &
                      -(strlen(list_entry(node, element_t, list)->value) <
                        (bufsize - 1)));
        char *psp = list_entry(node, element_t, list)->value;
        for (int i = 0; i < bufsize - 1; ++i) {
            sp[i ^ ((i ^ len) & -(i > len))] = *(psp + (i & -(i < len)));
        }   
        sp[len] = '\0';
    }   

    return list_entry(node, element_t, list);
}

q_remove_tail

q_remove_tail removes the head of queue, which next of head of list points to. Linux Kernel API has a function list_del(...) that unlinks a node from the list. q_remove_tail indicates the tail of queue to unlink with list_del(...)

element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{   
    if (head == NULL || list_empty(head))
        return NULL; 
    struct list_head *node = head->prev;
    list_del_init(node);
    if (sp != NULL) {
        strncpy(sp, list_entry(node, element_t, list)->value, bufsize);
        sp[bufsize - 1] = '\0';
    }
    return list_entry(node, element_t, list);
}

q_size

q_size returns the number of elements in the queue.

It uses list_for_each(...) macro from Linux Kernel API that iterates through the list, which would tranverse through each element exactly once in the process.

int q_size(struct list_head *head)
{
    int size = 0;
    if (head == NULL || list_empty(head))
        return size;
    struct list_head *node;
    list_for_each (node, head) {
        ++size;
    }   
    return size;
}

q_delete_mid

q_delete_mid deletes the middle element ( take floor if q_size is odd ) of the queue.

As the queue is implemented as a circular, doubly-linked list, it is possible to find the middle element of the queue by going from both tail-to-head and head-to-tail, a step at a time.

The function would first moves forward then backward as the desired result is floor of q_size.

bool q_delete_mid(struct list_head *head)
{
    if (head == NULL || list_empty(head))
        return false;
    struct list_head *forward = NULL, *backward = NULL;
    for(forward = head->next,backward = head->prev;forward != backward;backward = backward->prev){
        forward = forward->next;
        if (forward == backward)
            break;
    }
    list_del_init(forward);
    q_release_element(list_entry(forward, element_t, list));
    return true;
}

Rewrite with for iteration. Shorten the code.

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

Question :
Why is for loop better than while loop?
Even though the code is shorter, i feel like while loop is easier to read / understand

original code

bool q_delete_mid(struct list_head *head)
{
    if (head == NULL || list_empty(head))
        return false;
    struct list_head *forward, *backward;
    forward = head->next;
    backward = head->prev;
    while (forward != backward) {
        forward = forward->next;
        if (forward == backward)
            break;
        backward = backward->prev;
    }
    list_del_init(forward);
    q_release_element(list_entry(forward, element_t, list));
    return true;
}

q_delete_dup

q_delete_dup removes every element that has one or more duplicate copy of itself.

As stated in the documentation, it is only used after sorting the queue, which means every duplicates must be consecutive.

The fuction is currently implemented with a foreach loop, which then it will check if for current and next entry is duplicate of each other, then delete if it is.
After it found a new element when deleting the duplicates, it will then delete once again for the last remaining duplicate.

In theory, we could delete all the duplicate element by detaching them from the original queue. However, as we have still have to iterate through each node to release them, there should not be much imporvement in performance.

bool q_delete_dup(struct list_head *head)
{
    if (head == NULL)
        return false;

    element_t *elem, *safe, *last = NULL;
    list_for_each_entry_safe (elem, safe, head, list) {
        if (&safe->list == head)
            break;
        if (!strcmp(elem->value, safe->value)) {
            last = safe;
            list_del_init(&elem->list);
            q_release_element(elem);
        } else {
            if (last != NULL && last == elem) {
                list_del_init(&elem->list);
                q_release_element(elem);
            }
        }
    }
    if (last == list_last_entry(head, element_t, list)) {
        list_del_init(&elem->list);
        q_release_element(elem);
    }
    return true;
}

q_swap

q_swap would swap every two adjacent elements in the queue.

The implementation is pretty straight forward as it simply tranverse through the list and relink all 6 edges connecting the two elements.

void q_swap(struct list_head *head)
{
    int phase = 0;
    struct list_head *node, *safe;
    list_for_each_safe (node, safe, head) {
        if (phase) {
            struct list_head *swap = node->prev;
            swap->prev->next = node;
            node->prev->next = safe;
            node->next = swap;
            safe->prev = swap;
            node->prev = swap->prev;
            swap->prev = node;
        }
        phase ^= 1;
    }
}

q_reverse

q_reverse reverses the queue.

Similar the rule of a LIFO structure, when repeatedly inserting an element to the head of the queue, the very first element inserted would became the last element by the time the function reaches the end of the queue.

void q_reverse(struct list_head *head)
{
    if (head == NULL || list_empty(head))
        return;
    struct list_head *node, *safe;
    list_for_each_safe (node, safe, head) {
        list_del(node);
        list_add(node, head);
    }
}

q_sort

q_sort sorts the queue.

The implementation is with a simple merge sort algorithm.
The merge sort algorithm consists of three functions, merge, merge_sort, and q_sort.

The body of merge comes mainly from lib/list_sort.c, the only modification made is on the comparison if statement.

q_sort itself only breaks the circular, doubly-linked list into a non-circular, singly-linked list before merge_sort, then connects them back into a circular, doubly-linked list after merge_sort.

The algorithm will search for the middle element then breaks the queue into 2 singly-linked list, then it will merge them recursively.

Middle of list is found by using Floyd's Tortoise and Hare algorithm which originally used as algorithm to show that list is cirular. Conveniently, the " tortoise " would stop by the middle of list at the end of the algorithm. I opt for this algorithm to find the middle node instead of the algorithm in q_delete_middle as the queue is no longer a circular linked list at this point.

Reference :

Linked List Sort
Over 1/2 of the documentation in lab0 had Floyd's Tortoise and Hare algorithm included

struct list_head *merge(struct list_head *a, struct list_head *b)
{
    struct list_head *head = NULL, **tail = &head;
    for (;;) {
        if (strcmp(list_entry(a, element_t, list)->value, 
                   list_entry(b, element_t, list)->value) <= 0) {
            *tail = a;
            tail = &a->next;
            a = a->next;
            if (!a) { 
                *tail = b;
                break;
            }
        } else {
            *tail = b;
            tail = &b->next;
            b = b->next;
            if (!b) { 
                *tail = a;
                break;
            }
        }
    }
    return head;
}

struct list_head *merge_sort(struct list_head *head)
{   
    if (head == NULL || head->next == NULL)  
        return head;

    struct list_head *left = head;
    struct list_head *right = left->next;

    while (right && right->next) {
        left = left->next;
        right = right->next->next;
    }
    right = left->next;
    left->next = NULL;
    
    left = merge_sort(head);
    right = merge_sort(right);
    
    return merge(left, right);
}       

void q_sort(struct list_head *head)
{
    if (head == NULL || list_empty(head) || list_is_singular(head))
        return;
    struct list_head *start = head->next;
    struct list_head *final = head->prev;
    start->prev = final->next = NULL;
    INIT_LIST_HEAD(head);

    head->next = merge_sort(start);
    head->next->prev = head;

    struct list_head *temp = head->next;
    for (; temp->next != NULL; temp = temp->next) {
        temp->next->prev = temp;
    }
    head->prev = temp;
    temp->next = head;
}

Indent the above source listing. Don't paste from the terminal. Instead, use proper editors which fit the use of system-wide clipboard.

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

在 `qtest` 提供新的命令 `shuffle`

In Makefile, we have -g in it which means we gdb is available to trace the program.

CFLAGS = -O1 -g -Wall -Werror -Idudect -I.

To trace how qtest actually work, I place a break point under a function qtest would invoke.

(gdb) list q_new
15      /*
16       * Create empty queue.
17       * Return NULL if could not allocate space.
18       */
19      struct list_head *q_new()
20      {
21          struct list_head *head = malloc(sizeof *head);
22          if (head == NULL)
23              return NULL;
24          INIT_LIST_HEAD(head);
(gdb) b 21
Breakpoint 1 at 0x67b8: file queue.c, line 21.

Then I run bt to print backtrace of function stack.

(gdb) bt
#0  q_new () at queue.c:21
#1  0x000055555555825b in do_new (argc=<optimized out>, argv=<optimized out>) at qtest.c:129
#2  0x00005555555592e1 in interpret_cmda (argc=argc@entry=1, argv=argv@entry=0x555555566e20) at console.c:185
#3  0x000055555555982d in interpret_cmd (cmdline=<optimized out>, cmdline@entry=0x5555555669f0 "new") at console.c:208
#4  0x000055555555a2dd in run_console (infile_name=<optimized out>) at console.c:649
#5  0x00005555555587f6 in main (argc=1, argv=0x7fffffffdec8) at qtest.c:992

In interpret_cmda, I found cmd_list that should be where the commands are initialized.

cmd_ptr next_cmd = cmd_list;

Following cmd_list, the macro ADD_COMMAND and function add_cmd is found in init_cmd.

ADD_COMMAND(help, "                | Show documentation");
ADD_COMMAND(option, " [name val]     | Display or set options");
ADD_COMMAND(quit, "                | Exit program");
ADD_COMMAND(source, " file           | Read commands from source file");
ADD_COMMAND(log, " file           | Copy output to file");
ADD_COMMAND(time, " cmd arg ...    | Time command execution");
add_cmd("#", do_comment_cmd, " ...            | Display comment");
add_param("simulation", &simulation, "Start/Stop simulation mode", NULL);
add_param("verbose", &verblevel, "Verbosity level", NULL);
add_param("error", &err_limit, "Number of errors until exit", NULL);
add_param("echo", &echo, "Do/don't echo commands", NULL);

The implementation of ADD_COMMAND and prototype of add_cmd in console.h.

/* Add a new command */
void add_cmd(char *name, cmd_function operation, char *documentation);
#define ADD_COMMAND(cmd, msg) add_cmd(#cmd, do_##cmd, msg)

Commands of operations on queue are found in console_init of qtest.c.

ADD_COMMAND(new, "                | Create new queue");
ADD_COMMAND(free, "                | Delete queue");
ADD_COMMAND(
	ih, 
	" str [n]        | Insert string str at head of queue n times. "
	"Generate random string(s) if str equals RAND. (default: n == 1)");
ADD_COMMAND(
	it, 
	" str [n]        | Insert string str at tail of queue n times. "
	"Generate random string(s) if str equals RAND. (default: n == 1)");
ADD_COMMAND(
	rh,
	" [str]          | Remove from head of queue.  Optionally compare "
	"to expected value str");
ADD_COMMAND(
	rt,
	" [str]          | Remove from tail of queue.  Optionally compare "
	"to expected value str");
ADD_COMMAND(
	rhq,
	"                | Remove from head of queue without reporting value.");
ADD_COMMAND(reverse, "                | Reverse queue");
ADD_COMMAND(sort, "                | Sort queue in ascending order");
ADD_COMMAND(
	size, " [n]            | Compute queue size n times (default: n == 1)");
ADD_COMMAND(show, "                | Show queue contents");
ADD_COMMAND(dm, "                | Delete middle node in queue");
ADD_COMMAND(
	dedup, "                | Delete all nodes that have duplicate string");
ADD_COMMAND(swap,
			"                | Swap every two adjacent nodes in queue");

In console_init, I use swap as reference to implement ADD_COMMAND for shuffle.

ADD_COMMAND(shuffle, "                | Shuffle list");

Same goes to do_shuffle, do_swap is used as reference.

static bool do_shuffle(int argc, char *argv[])
{
    if (argc != 1) {
        report(1, "%s takes no arguments", argv[0]);
        return false;
    }

    if (!l_meta.l)
        report(3, "Warning: Calling shuffle on null queue");
    error_check();

    int cnt = q_size(l_meta.l);
    if (cnt < 2)
        report(3, "Warning: Calling shuffle on single node");
    error_check();

    set_noallocate_mode(true);
    if (exception_setup(true))
        q_shuffle(l_meta.l);
    exception_cancel();
    set_noallocate_mode(false);

    bool ok = true;

    show_queue(3);
    return ok && !error_check();
}

As the header file cannot be modified, prototype of q_shuffle is included in q_test.c.

extern void q_shuffle(struct list_head *);

q_shuffle is implemented following Fisher-Yates shuffle algorithm.

void q_shuffle(struct list_head *head)
{
    int size = q_size(head);
    struct list_head *node, *end = head;
    while (size > 0) {
        node = head->next;
        int roll = rand() % size--;
        for (int i = 0; i < roll; ++i)
            node = node->next;
        list_move_tail(end->prev, node);
        list_move_tail(node, end);
        end = end->prev;
    }
}

Image from Wikipedia

Result from qtest.

cmd> new
l = []
cmd> it RAND 8
l = [tardudv ummpv mbnegzc mfsffj fhfgbs eglgvqo gtlll skhzpncj]
cmd> shuffle
l = [skhzpncj tardudv eglgvqo gtlll fhfgbs mfsffj ummpv mbnegzc]
cmd> shuffle
l = [skhzpncj eglgvqo mfsffj fhfgbs tardudv mbnegzc ummpv gtlll]
cmd> sort
l = [eglgvqo fhfgbs gtlll mbnegzc mfsffj skhzpncj tardudv ummpv]
cmd> reverse
l = [ummpv tardudv skhzpncj mfsffj mbnegzc gtlll fhfgbs eglgvqo]
cmd> sort
l = [eglgvqo fhfgbs gtlll mbnegzc mfsffj skhzpncj tardudv ummpv]
cmd> shuffle
l = [eglgvqo tardudv skhzpncj ummpv mbnegzc gtlll mfsffj fhfgbs]
cmd> shuffle
l = [ummpv skhzpncj tardudv eglgvqo fhfgbs mbnegzc gtlll mfsffj]

在 `qtest` 提供新的命令 `web`

All codes of this section comes from lab-0 description and tiny-web-server with slight modification.

I study from tiny-web-server to better grasp how a web-server works.
After reverting to the inital commit, I removed some of the codes from main function that are not essential.

main

int main(int argc, char** argv) {
    struct sockaddr_in clientaddr;
    int default_port = 9999, clientlen = sizeof clientaddr;
    int listenfd, connfd;
	
    listenfd = open_listenfd(default_port);
	
    while(1) {
        connfd = accept(listenfd, (SA *)&clientaddr, &clientlen);
        process(connfd, &clientaddr);
        close(connfd);
    }
    return 0;
}

The program would run a web-server that let its user browse his directory.

It is clear that the main function only listen the port and would accept a file descriptor that are requesting for service.
Then it leaves everything else to process.

process

void process(int fd, struct sockaddr_in *clientaddr) {
    rio_t rio;
    char buf[MAXLINE], method[MAXLINE], uri[MAXLINE], version[MAXLINE];

    rio_readinitb(&rio, fd);
    rio_readlineb(&rio, buf, MAXLINE);

    sscanf(buf, "%s %s %s", method, uri, version);

    /* read all */
    while(buf[0] != '\n' && buf[1] != '\n') { /* \n || \r\n */
        rio_readlineb(&rio, buf, MAXLINE);
    }
    char* filename = uri;
    if(uri[0] == '/') {
        filename = uri + 1;
        if (strlen(filename) == 0){
            filename = ".";
        }
    }

    struct stat sbuf;
    int status = 200, size = 0;
    int ffd = open(filename, O_RDONLY, 0);
    if(ffd <= 0) {
        status = 404;
        char *msg = "File not found";
        client_error(fd, status, "Not found", msg);
        size = strlen(msg);
    } else {
        fstat(ffd, &sbuf);
        if(S_ISREG(sbuf.st_mode)) {
            size = sbuf.st_size;
            server_static(fd, ffd ,sbuf.st_size);
        } else if(S_ISDIR(sbuf.st_mode)) {
            status = 200;
            char *msg = "Directory listing is not implemented yet";
            size = strlen(msg);
            client_error(fd, status, "OK", msg);
        } else {
            status = 400;
            char *msg = "Unknow Error";
            size = strlen(msg);
            client_error(fd, status, "Error", msg);
        }
    }
    log_access(status, size, clientaddr, filename);
    close(ffd);
}

Currently, the web-server handles request by browsing user's directory.
I modify the program slighty to turn the program into an echo web-server.

Essential parts of program

// process
void process(int fd, struct sockaddr_in *clientaddr)
{
	http_request req;
	parse_request(fd, &req);
	handle_request(fd, req.filename);
	/* ... */
	char *p = req.filename;
	while(*p && (*p) != '\0'){
		++p;
		if(*p == '/'){
			*p = ' ';
		}
	}   
}

// handle_request
void handle_request(int out_fd, char *filename)
{
	writen(out_fd, filename, strlen(filename));
	close(out_fd);
}

Result

Shell 1

curl --http0.9 localhost:9999/hello/world

Shell 2

$ ./tiny
listen on port 9999, fd is 3
accept request, fd is 4, pid is 25999
127.0.0.1:60416 200 - 'hello world' (text/plain)

I then follow the description to port tiny.c to qtest.

Using ADD_COMMAND, I moved entiretly of main function of tiny.c to do_web().

After fixing some dependencies for tiny.c and rest of the program, I call web from qtest.

<cmd> web now opens the echo web-server, however, I can no longer make a request via a console with the web-server on.

Following the description, I try to make the program respond to both stdin_fd from console and listenfd from web-server.

select() system call does exactly this, with the code from description, the program would try to respond to both stdin and socket.

if (listenfd) {
	fd_set set; 

	FD_ZERO(&set);
	FD_SET(listenfd, &set);
	FD_SET(stdin_fd, &set);
	int rv = select(listenfd + 1, &set, NULL, NULL, NULL);
	struct sockaddr_in clientaddr;
	socklen_t clientlen = sizeof clientaddr;
	int connfd;

	switch (rv) {
		case -1:
			perror("select"); /* an error occurred */
			continue;
		case 0:
			printf("timeout occurred\n"); /* an timeout occurred */
			continue;
		default:
			if (FD_ISSET(listenfd, &set)) {
				connfd = accept(listenfd, (SA *) &clientaddr, &clientlen);
				char *p = process(connfd, &clientaddr);
				strncpy(buf, p, strlen(p) + 1);
				close(connfd);
				free(p);
				return strlen(p);
			} else if (FD_ISSET(stdin_fd, &set)) {
				nread = read(l.ifd, &c, 1);
				if (nread <= 0)
					return l.len;
			}    
			break;
	}    
} else {
	nread = read(l.ifd, &c, 1);
	if (nread <= 0)
		return l.len;
}

When the web-server is not active yet, the program will execute code as like before. If web-server is active, it will use select() system call to monitor both the file descriptors as readfds.

select system call

From select(),

Note well: Upon return, each of the file descriptor sets is modified in place to indicate which file descriptors are currently "ready". Thus, if using select() within a loop, the sets must be reinitialized before each call.

First we adds file descriptor to set. Then after file descriptor returns, we can use FD_ISSET() to determine which of the file descriptor are currently "ready". Which means they are requesting for service.

After setting-up the system call, the program would send a message to the console but qtest would not read the command from the web-server.

So following the description again I made modification to run_console() to handle the request instead of handling request in do_web(). do_web() now would only open the port to listen to the webserver.

qtest can now handle request from the web-server.

Improve your writing!

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

Result

Shell 1

$ ./qtest
cmd> web
listen on port 9999, fd is 3
cmd> accept request, fd is 4, pid is 27571
127.0.0.1:60420 200 - 'new' (text/plain)

l = []
cmd> accept request, fd is 4, pid is 27571
127.0.0.1:60422 200 - 'something not a command' (text/plain)

Unknown command 'something'
cmd> accept request, fd is 4, pid is 27571
127.0.0.1:60424 200 - 'it RAND 10' (text/plain)

l = [yqllkmd mdkaoeer uvjlzx rbido bbafug yggjrhg mynrmzf qcugofg ouxir qerxyef]
cmd> accept request, fd is 4, pid is 27571
127.0.0.1:60426 200 - 'reverse' (text/plain)

l = [qerxyef ouxir qcugofg mynrmzf yggjrhg bbafug rbido uvjlzx mdkaoeer yqllkmd]
cmd> accept request, fd is 4, pid is 27571
127.0.0.1:60428 200 - 'sort' (text/plain)

l = [bbafug mdkaoeer mynrmzf ouxir qcugofg qerxyef rbido uvjlzx yggjrhg yqllkmd]
cmd> accept request, fd is 4, pid is 27571
127.0.0.1:60430 200 - 'quit' (text/plain)

Freeing queue

Shell 2

$ curl --http0.9 localhost:9999/new
I received new
$ curl --http0.9 localhost:9999/something/not/a/command
I received something/not/a/command
$ curl --http0.9 localhost:9999/it/RAND/10
I received it/RAND/10
$ curl --http0.9 localhost:9999/reverse
I received reverse
$ curl --http0.9 localhost:9999/sort
I received sort
$ curl --http0.9 localhost:9999/quit
I received quit

比較 `lib/list_sort.c` 与 `q_sort` 的效能落差

First look

Raw Data ( ms unit )

size, sort, list_sort
5000, 005, 002
10000, 010, 005
15000, 015, 009
20000, 021, 012
25000, 028, 017
30000, 033, 020
35000, 041, 024
40000, 051, 030
45000, 069, 040
50000, 077, 042
55000, 079, 046
60000, 090, 057
65000, 108, 064
70000, 121, 068
75000, 133, 083
80000, 152, 082
85000, 158, 095
90000, 179, 114
95000, 200, 117
100000, 197, 126
105000, 221, 134
110000, 232, 140
115000, 254, 157
120000, 267, 166
125000, 285, 175
130000, 303, 182
135000, 306, 190
140000, 327, 196
145000, 336, 198
150000, 347, 212
155000, 363, 222
160000, 384, 226
165000, 396, 244
170000, 409, 256
175000, 438, 263
180000, 455, 271
185000, 470, 288
190000, 467, 288
195000, 488, 301
200000, 512, 308
205000, 519, 310
210000, 547, 328
215000, 559, 333
220000, 587, 342
225000, 602, 358
230000, 618, 368
235000, 621, 372
240000, 652, 392
245000, 671, 402
250000, 687, 414
255000, 705, 424
260000, 720, 425
265000, 749, 427
270000, 740, 437
275000, 765, 423
280000, 753, 453
285000, 786, 460
290000, 779, 457
295000, 782, 468
300000, 807, 475
305000, 822, 483
310000, 884, 506
315000, 902, 528
320000, 911, 534
325000, 944, 556
330000, 928, 548
335000, 929, 551
340000, 951, 564

Data was gained by writing a simple trace file for qtest interpreter and send into a simple program as input to extract only their run time and size with time sort and size respectively.

It should be clear that list_sort outperforms q_sort at both large and small data size.

Both list_sort and q_sort uses merge sort to do the sorting, in theory, they both have the same Time Complexity

O (n l o g n)

When we draw two function

f (x) = x

and

f 2 (x) = x^{2}

on top of the graphs, where their start and end point matches,

Rate of growth ( Time Complexity ) of both q_sort and list_sort have tendency to go inbetween their

f (x)

and

f 2 (x)

counterparts, if we observe their second derivative closely.

Hence it should be safe to say that they both have time complexity

O (n) < T (n) < O (n^{2})

, which I assume is

O (n l o g n)

Hence, the disparity in performance does not come from their time complexity. Instead, they come from the hidden constant in

O (n l o g n)

First, I analyze the list_sort function.

`list_sort` 的運作原理

Full code of list_sort()

__attribute__((nonnull(2,3)))
void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp)
{
	struct list_head *list = head->next, *pending = NULL;
	size_t count = 0;

	if (list == head->prev)
		return;

	head->prev->next = NULL;

	do {
		size_t bits;
		struct list_head **tail = &pending;

		for (bits = count; bits & 1; bits >>= 1)
			tail = &(*tail)->prev;
		
		if (likely(bits)) {
			struct list_head *a = *tail, *b = a->prev;

			a = merge(priv, cmp, b, a);
			a->prev = b->prev;
			*tail = a;
		}

		list->prev = pending;
		pending = list;
		list = list->next;
		pending->next = NULL;
		count++;
	} while (list);

	list = pending;
	pending = pending->prev;
	for (;;) {
		struct list_head *next = pending->prev;

		if (!next)
			break;
		list = merge(priv, cmp, pending, list);
		pending = next;
	}
	merge_final(priv, cmp, head, pending, list);
}

__attribute__((nonnull(2,3)))

From the gcc Compiler User Guide, we know that this line simply indicates which referenced argument of a function must be non-null pointer. Along with the fact cmp does not need an addition priv data, it is safe to pass NULL as the function first argument when calling list_sort.

struct list_head *list = head->next, *pending = NULL;
size_t count = 0;

if (list == head->prev)
	return;

head->prev->next = NULL;

This segment of code simply initialize some variables of the list. It also convert the list into a null-terminated singly-linked list.
We should also note that list acts as iterator for the loop and pending acts as a sorted sublists waiting to be merged.

do {
	size_t bits;
	struct list_head **tail = &pending;

	for (bits = count; bits & 1; bits >>= 1)
		tail = &(*tail)->prev;

	if (likely(bits)) {
		struct list_head *a = *tail, *b = a->prev;

		a = merge(priv, cmp, b, a);
		a->prev = b->prev;
		*tail = a;
	}

	list->prev = pending;
	pending = list;
	list = list->next;
	pending->next = NULL;
	count++;
} while (list);

This do while loop is the main body of the function, every iteration, it would seperate an element as a new sublist out from list, then, it will merge two sublist as soon as they consists of

2^{k}, k \in Z

elements.

To trace the code line-by-line, we would first reach the bottom part of the loop as count is zero initially.

list->prev = pending;
pending = list;
list = list->next;
pending->next = NULL;
count++;

This segment of code seperate an element from list to form a new sublist. Previous sublist is not lost as prev pointer is still maintaining a "list of lists".

The first 2 iteration can represents what this segment does, as the program won't enter if (likely(bits)) branch when count is either 0 or 1.

Before count = 0

count = 0

count = 1

As the program would take the list as a singuly-linked list, head is ommited in the graphical representation.
Any edges that are not showing up can be assume as it is pointing to NULL.

struct list_head **tail = &pending;

for (bits = count; bits & 1; bits >>= 1)
	tail = &(*tail)->prev;

The program will always try to merge the first two sublist with equal number of elements starting from pending. The very last sublist is at the previous most node starting from pending. What would be indicated by tail is the list about to be merge, with tail->prev.

From the comment of the original source of code, we could get a hint of what bits & 1 is trying to do.

Each time we increment count, we set one bit (bit k) and clear bits k-1 .. 0. Each time this happens (except the very first time for each bit, when count increments to
$2^{k}$ ), we merge two lists of size
$2^{k}$ into one list of size
$2^{k + 1}$ .

The comment states that the merge only NOT happen when count increments to

2^{k}

. Which is when count would have value of

2^{k} - 1

, and the set bit(s) must always be consecutive and located at the less-significant portion of the binary value. As the "count" of pending elements had

2^{0}

2^{1}

2^{2}

. . .

2^{k}

elements, no sublist have the same number of elements.

How many time to iterate through the list of lists though, is following the value of count. However, as the nearest sublist to pending with same elements in it have to be selected, the carry isn't added to directly, and would only add up when there is nothing else by the right to its digit number to add. When this happens, the column would be the one merging.This number happens to be the first clear bit of count counting from the least significant bit.

Example, from

8

(

0 b 1000

)

\begin{array}{ccc} 0 & 1 & 0 & 0 & 0 \\ + & 1 \\ 1 \end{array}

\begin{array}{ccc} 0 & 1 & 0 & 0 & 1 \\ + & 1 \\ 1 \end{array}

\begin{array}{ccc} 0 & 1 & 0 & 1 & 0 \\ + & 1 \\ 1 \end{array}

\begin{array}{ccc} 0 & 1 & 0 & 1 & 1 \\ + & 1 \\ 1 \end{array}

\begin{array}{ccc} 0 & 1 & 1 & 0 & 0 \\ + & 1 \\ 1 \end{array}

\begin{array}{ccc} 0 & 1 & 1 & 0 & 1 \\ + & 1 \\ 1 \end{array}

\begin{array}{ccc} 0 & 1 & 1 & 1 & 0 \\ + & 1 \\ 1 \end{array}

Every digit with carry align with the least most significant clear bit, with one exception of the addition of least most significant bit.

These carry at
$d t h$ digit means the
$d t h$ sublist from pending should be the tail

\begin{array}{ccc} 0 & 1 & 1 & 1 & 1 \\ + & 1 \\ 1 \end{array}

The only exception that don't merge is when the digit is first set.

if (likely(bits)) {
	struct list_head *a = *tail, *b = a->prev;

	a = merge(priv, cmp, b, a);
	a->prev = b->prev;
	*tail = a;
}

likely can be seen as a macro that compiler optimization to do branch prediction, provided the probability of how 'likely' it is not

0

This segment of code mainly merges two lists tail and tail->prev, which is sublist with the same number of element.

list = pending;
pending = pending->prev;
for (;;) {
	struct list_head *next = pending->prev;

	if (!next)
		break;
	list = merge(priv, cmp, pending, list);
	pending = next;
}

This segment is reached after list reach the end of list. It merges all of pending list to the final sorted list except the last pending.

merge_final(priv, cmp, head, pending, list);

This function would merge the list one final time. Quoting from the comments in the source.

Combine final list merge with restoration of standard doubly-linked list structure. This approach duplicates code from merge(), but runs faster than the tidier alternatives of either a separate final prev-link restoration pass, or maintaining the prev links throughout.

This functions rebuilds the list into the original circular, doubly-linked list.

效能分析

2^{k + 1}

2^{k}

(cache, unbalanced compares)
uses prev

As stated before, both q_sort and list_sort uses merge sort to sort the queue. And both of them are optimal mergesort, that is , in worst case, they did the minimal number of comparison.

worst-case number of comparison

A fact about minimal number of comparison that can be performed while executing merge sort is

(\sum_{l \in t r e e} h (l)) - (n - 1)

, where the external path lengths

h (l)

is minimal for each leaves.

binary tree of q_sort

binary tree of list_sort

Both tree, even though has a different structure, is optimal merge sort.

We uses number of element to compare to represent each node of tree, and merge sort can be seen as optimal when their external path length is always minimal.

q_sort would always split the list in half in every iteration of sorting. Which resembles how a balanced binary tree works, and from its definition, the external path length of a balanced binary tree is always minimal. Hence, q_sort is an optimal merge sort.

Referenced from commit message of list_sort.c commit 043b3f7, queue merge sort

list_sort could be seen as a variant of queue merge sort, which resembles Huffman encoding, where all of its leaf is 1. By induction, it is proven that the external path length of the merge tree is minimal. Hence, list_sort is an optimal merge sort.

Screenshot to show list_sort is a variant of queue_merge_sort

With equal number of comparison, q_sort does have a major flaw. That is, the algorithm to find the merge point is very inefficient.

q_sort would use fast and slow to tranverse from head of list, taking

T (n) = n / 2 = O (n)

, the time complexity is linear time.

list_sort however, would tranverse through the pending list, taking

T (n) = l o g_{2} n = O (l o g n)

, the time complexity is logarithmitic.

From what I've red from the documentation of list_sort, they did take into consideration of the size of cache. So long as the cache can fit in

3 * 2^{k}

elements, the algorithm can avoid cache thrashing.

q_sort however, does not take cache size into consideration at all. When the element is large enough, the CPU utilization may be lower as q_sort would take in a large portion of memory at a time, slowing down the system as the program is waiting for I/O. Then CPU would give even more CPU time resource to the program, making the program even slower.

開啟 `Address Sanitizer` 修正 `qtest` 執行過程中的錯誤

From lab0 requirements, it is stated that calling help command in qtest interpreter will trigger error with Address Sanitizer, however, I wasn't able to reproduce the error in my machine.

However, Address Sanitizer did report another error that isn't caught by the compiler and cppcheck.

Which is called heap-buffer-overflow.

Log

=================================================================                                                                                                                                   [1/1969]
==12654==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6040000001fe at pc 0x7f5a91a76480 bp 0x7ffeddf02680 sp 0x7ffeddf01e28
READ of size 1024 at 0x6040000001fe thread T0
    #0 0x7f5a91a7647f  (/lib/x86_64-linux-gnu/libasan.so.5+0x9b47f)
    #1 0x55f95ab81a7d in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:34
    #2 0x55f95ab81a7d in q_remove_head /home/user/Desktop/linux2022/lab0-c/queue.c:112
    #3 0x55f95ab7c44b in do_remove /home/user/Desktop/linux2022/lab0-c/qtest.c:395
    #4 0x55f95ab7c699 in do_rh /home/user/Desktop/linux2022/lab0-c/qtest.c:454
    #5 0x55f95ab7ed33 in interpret_cmda /home/user/Desktop/linux2022/lab0-c/console.c:189
    #6 0x55f95ab7f6d2 in interpret_cmd /home/user/Desktop/linux2022/lab0-c/console.c:212
    #7 0x55f95ab80f4b in run_console /home/user/Desktop/linux2022/lab0-c/console.c:658
    #8 0x55f95ab7de77 in main /home/user/Desktop/linux2022/lab0-c/qtest.c:1030
    #9 0x7f5a916c10b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #10 0x55f95ab7aced in _start (/mnt/lda/linux2022/lab0-c/qtest+0x8ced)

0x6040000001fe is located 0 bytes to the right of 46-byte region [0x6040000001d0,0x6040000001fe)
allocated by thread T0 here:
    #0 0x7f5a91ae8bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
    #1 0x55f95ab8128f in test_malloc /home/user/Desktop/linux2022/lab0-c/harness.c:138

SUMMARY: AddressSanitizer: heap-buffer-overflow (/lib/x86_64-linux-gnu/libasan.so.5+0x9b47f) 
Shadow bytes around the buggy address:
  0x0c087fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c087fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c087fff8000: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
  0x0c087fff8010: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
  0x0c087fff8020: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
=>0x0c087fff8030: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00[06]
  0x0c087fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==12654==ABORTING

From the Log, we can determine that the overflow came from memcpy of q_remove_head.

READ of size 1024 at 0x6040000001fe thread T0
    #0 0x7f5a91a7647f  (/lib/x86_64-linux-gnu/libasan.so.5+0x9b47f)
    #1 0x55f95ab81a7d in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:34
    #2 0x55f95ab81a7d in q_remove_head /home/user/Desktop/linux2022/lab0-c/queue.c:112
    #3 0x55f95ab7c44b in do_remove /home/user/Desktop/linux2022/lab0-c/qtest.c:395
    #4 0x55f95ab7c699 in do_rh /home/user/Desktop/linux2022/lab0-c/qtest.c:454
    #5 0x55f95ab7ed33 in interpret_cmda /home/user/Desktop/linux2022/lab0-c/console.c:189
    #6 0x55f95ab7f6d2 in interpret_cmd /home/user/Desktop/linux2022/lab0-c/console.c:212
    #7 0x55f95ab80f4b in run_console /home/user/Desktop/linux2022/lab0-c/console.c:658
    #8 0x55f95ab7de77 in main /home/user/Desktop/linux2022/lab0-c/qtest.c:1030
    #9 0x7f5a916c10b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #10 0x55f95ab7aced in _start (/mnt/lda/linux2022/lab0-c/qtest+0x8ced)

My implementation of memcpy would always tries to copy bufsize - 1 characters to sp, but the size allocated for list_entry(...,element_t,...)->value is not always bufsize - 1, in fact, it would be bufsize - 1 in size only at few circustances. Hence, heap-buffer-overflow occurs.

memcpy(sp, list_entry(node, element_t, list)->value, buf_size - 1);
sp[buf_size - 1] = '\0';

A simple fix is to simply replace buf_size - 1 with the smaller entry of buf_size - 1 and size of list_entry(...,element_t,...)->value.

size_t len = (bufsize - 1) ^ (((bufsize - 1) ^ (strlen(list_entry(node, element_t, list)->value))) & -(strlen(list_entry(node, element_t, list)->value) < (bufsize - 1)));
memcpy(sp, list_entry(node, element_t, list)->value, len);
sp[len] = '\0';

運用 `Valgrind` 排除 `qtest` 實作的記憶體錯誤

When using Valgrind to run qtest directly with $ valgrind ./qtest, there would not be any error messages shown by Valgrind. However, when running it with Makefile $ make valgrind, the error messages would show up.

==13243== 5 bytes in 1 blocks are still reachable in loss record 1 of 2
==13243==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==13243==    by 0x4A5250E: strdup (strdup.c:42)
==13243==    by 0x111F4F: linenoiseHistoryAdd (linenoise.c:1276)
==13243==    by 0x112C71: linenoiseHistoryLoad (linenoise.c:1365)
==13243==    by 0x10DA17: main (qtest.c:1019)
==13243== 
==13243== 160 bytes in 1 blocks are still reachable in loss record 2 of 2
==13243==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==13243==    by 0x111F0F: linenoiseHistoryAdd (linenoise.c:1264)
==13243==    by 0x112C71: linenoiseHistoryLoad (linenoise.c:1365)
==13243==    by 0x10DA17: main (qtest.c:1019)
==13243==

After some attempts to recreate the error, I found out that the error would only show up only when qtest is trying to interpret command from other source file. And the error messages are clearly associated with linenoiseHistory.

When I was trying to implement the web-server, I learnt that linenoise provides some tool that allows a more friendly interface for reading user-input. Which they are not needed when reading from a external file.

The problem came from the program somehow still allocate memory for history even though linenoise is not needed at all.

Following the backtrace, I discover that the main function from qtest.c would still call linenoiseHistoryLoad() which would then call linenoiseHistoryAdd() that would allocate memory.

Normally, the allocated memory would be freed in run_console() of console.c. However, this is not the case, as run_console() would test for external file with has_infile and will never call freeHistory().

My fix is to simply add a boolean guard has_infile to stop qtest.c from calling any of the linenoise function if it is reading from external file.

if(!has_infile){
	linenoiseSetCompletionCallback(completion);
	linenoiseHistorySetMaxLen(HISTORY_LEN);
	linenoiseHistoryLoad(HISTORY_FILE);
}

〈Dude, is my code constant time?〉

t-distribution

Central Limit Theorem and normal distribution assumed standard deviation is known. This assumption is not reasonable in real scenarios however. In said scenarios, is where t-distribution could be useful.

t-Distribution estimate the standard deviation of population with standard deviation of sample, if sample size is large enough, the distribution will not differs by alot from the standard normal distribution.

A statistics called degree of freedom is used to measure how close t-Distribution is to normal distribution. Where more distribution means larger sample size.

When we look at the shape of t-Distribution, we can see that the tail is thicker at lower degree of freedom, as the confidence value is based on area of distribution (p = 0.95), when the tail is thicker, the threshold to reject is higher, which means its harder to reject the value.

This is useful especially when we don't have a large sample size (data), as the null hypothesis is less likely to be rejected before we got a larger sample size, unless the program is very confident of the result.

dudect

The program would perform a leakage detection test, then test if the execution time is statistically different. If they are, the code is not constant time. The tool is said to not rely on static analysis but on statistical analysis, and don't need to model underlying hardware to do the test.

measure time

fix-vs-random leakage detection

a class of input data is fixed into constant value, and other is randomly chosen.

constant value

// in prepare_inputs
randombytes(input_data, n_measure * chunk_size); 
for (size_t i = 0; i < n_measure; i++) {
	classes[i] = randombit();
	if (classes[i] == 0)
		memset(input_data + (size_t) i * chunk_size, 0, chunk_size);
}

random value

for (size_t i = 0; i < N_MEASURE; ++i) {
	/* Generate random string */
	randombytes((uint8_t *) random_string[i], 7); 
	random_string[i][7] = 0;
}

extract cycle count directly from cycle counters provided by modern CPUs.

lower end CPU would use external equipment or high-resolution timers.

extraction of x86 cycle counter

unsigned int hi, lo;                                                                                                                                                                                    
__asm__ volatile("rdtsc\n\t" : "=a"(lo), "=d"(hi));
return ((int64_t) lo) | (((int64_t) hi) << 32);

apply post-processing

cropping

the program would discard measurement larger than a fixed, class-independent threshold.

I don't see cropping at report() and t-test only test for first-order statistics

Higher-order preprocessing

the program would apply some higher order pre-processing to mimick higher-order attacks.

Apply statistical test

t-test

check ttest.c

uses Welch's t-test, failure implies first-order timing information leakage.

Problems

False positives

The paper did include False positives as a topic in the discussion section.

When I test for constant time with trace-17, I don't seem to pass the test on my local machine, however the code never fail the test at Github Action. Implying either false positive at github action virtual ware, or the test depends on the underlying hardware (which defeats the point of this paper is no need to model underlying hardware to measure leakage).

2022q1 Homework1 (lab0)

實驗環境

開發記錄

q_new

q_free

q_insert_head

q_insert_tail

q_remove_head

q_remove_tail

q_size

q_delete_mid

q_delete_dup

q_swap

q_reverse

q_sort

在 qtest 提供新的命令 shuffle

在 qtest 提供新的命令 web

比較 lib/list_sort.c 与 q_sort 的效能落差

First look

list_sort 的運作原理

效能分析

開啟 Address Sanitizer 修正 qtest 執行過程中的錯誤

運用 Valgrind 排除 qtest 實作的記憶體錯誤

〈Dude, is my code constant time?〉

t-distribution

dudect

measure time

apply post-processing

Apply statistical test

Problems

False positives

Read more

2022q1 quiz15

linux internals 2022

2022-04-04 cantfindagoodname

2022q1 Homework3 (fibdrv)

在 `qtest` 提供新的命令 `shuffle`

在 `qtest` 提供新的命令 `web`

比較 `lib/list_sort.c` 与 `q_sort` 的效能落差

`list_sort` 的運作原理

開啟 `Address Sanitizer` 修正 `qtest` 執行過程中的錯誤

運用 `Valgrind` 排除 `qtest` 實作的記憶體錯誤