2019q1 Homework1 (lab0)

# 2019q1 Homework1 (lab0) contributed by <`johnnylord` > ###### tags: `sysprog2019` ## Implementation of queue ### Design of the queue data structure As the docs said, `q_insert_tail` and `q_size` should operate in time ==O(1)==, I need to **add additional field** to track the tail and the size of queue. ```clike /* Queue structure */ typedef struct { list_ele_t *head; // Track the head of queue list_ele_t *tail; // Track the tail of queue unsigned int size; // Track the size of queue } queue_t; ``` ### q_new() Create a new, empty queue. ```clike queue_t *q_new() { queue_t *q = malloc(sizeof(queue_t)); // If there is no free space for malloc if (!q) { return NULL; } q->head = NULL; q->tail = NULL; q->size = 0; return q; } ``` When construct or new a brand new queue, **make sure you initialize all the fields in it**. Otherwise, there is a big chance that other functions(e.g `q_insert_head`, `q_insert_tail`, etc) might use the wrong value of these fields. ### q_free() Free all storage used by a queue ```clike void q_free(queue_t *q) { list_ele_t *last = NULL; // Check if q can be freed or not if (!q) { return; } /* Free all the list elements, including their strings */ for (list_ele_t *ptr = q->head; ptr != NULL; last = ptr, ptr = ptr->next) { if (last) { free(last); } free(ptr->value); } // Free the last element left by above loop if exist if (last) { free(last); } free(q); } ``` As **the space of the string** of each list element is allocated from the **heap**, I have to free the space of string **before** freeing the list element. Otherwise, there will be lots of **memory leak** in the heap space. ### q_insert_head() Attempt to insert a new element at the head of the queue (LIFO discipline). ```clike bool q_insert_head(queue_t *q, char *s) { list_ele_t *newh; char *value; // Check if q is valid queue or not if (!q) { return false; } // Allocate memory for new element if (!(newh = malloc(sizeof(list_ele_t)))) { return false; } if (!(value = malloc(sizeof(char) * (strlen(s) + 1)))) { free(newh); return false; } // copy string into value strcpy(value, s); newh->value = value; // Update linked list newh->next = q->head; q->head = newh; if (!q_size(q)) { q->tail = newh; } q->size++; return true; } ``` Thanks to the static analysis feature provided by `cppcheck`, it helps me find sections that cause memory leak. Because I've already allocated some space at Line 12 in the source code, I need to free the space if there is any exception occurs in the following section. ```clike=16 if (!(value = malloc(sizeof(char) * (strlen(s) + 1)))) { free(newh); return false; } ``` ### q_insert_tail() Attempt to insert a new element at the tail of the queue. ```clike bool q_insert_tail(queue_t *q, char *s) { list_ele_t *newh; char *value; // Check if q is valid queue or not if (!q) { return false; } // Allocate memory for new element if (!(newh = malloc(sizeof(list_ele_t)))) { return false; } if (!(value = malloc(sizeof(char) * (strlen(s) + 1)))) { free(newh); return false; } // copy string into value strcpy(value, s); newh->value = value; // Update linked list newh->next = NULL; if (!q_size(q)) { q->head = q->tail = newh; } else { q->tail->next = newh; q->tail = newh; } q->size++; return true; } ``` There is no big difference between `q_insert_tail` and `q_insert_head`. Just be careful when dealing with the update of the linked list. As I've kept the information of tail of the queue in `q->tail`, I can append the new element directly next to `q->tail` without traversing the queue all the way from `q->head` to the end of the queue. ### q_remove_head() Attempt to remove the element at the head of the queue. ```clike bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { list_ele_t *remove_ele; if (!q || !q_size(q)) { return false; } // Update linked list remove_ele = q->head; q->head = q->head->next; // Copy removed element's value to sp if (sp) { strncpy(sp, remove_ele->value, bufsize - 1); sp[bufsize - 1] = '\0'; } // Free the removed element free(remove_ele->value); free(remove_ele); q->size--; return true; } ``` Just write the code as the docs said. Easy~ ### q_size() Compute the number of elements in the queue. ```clike int q_size(queue_t *q) { if (!q) { return 0; } return q->size; } ``` As I keep the `size` information in the queue data structure itself, I don't have to traverse the queue and count the element. Just return the size information of the queue. Easiest part~~ **However, I think the return type of this function should be `unsigned int`. The size of the queue won't be negative.** ### q_reverse() Reorder the list so that the queue elements are reversed in order. ```clike void q_reverse(queue_t *q) { list_ele_t *last, *ptr, *next, *tmp; if (!q || !q_size(q)) { return; } // Reverse the linked-list order last = NULL; ptr = q->head; next = q->head->next; while (ptr != NULL) { ptr->next = last; last = ptr; ptr = next; if (next) { next = next->next; } } // Update the head and tail fields of the q tmp = q->head; q->head = q->tail; q->tail = tmp; return; } ``` ## How qtest works Here is the entry point of `qtest`. First of all, do associated action based on the options and arguments user type in. ```clike #define BUFSIZE 256 int main(int argc, char *argv[]) { /* To hold input file name */ char buf[BUFSIZE]; char *infile_name = NULL; char lbuf[BUFSIZE]; char *logfile_name = NULL; int level = 4; int c; while ((c = getopt(argc, argv, "hv:f:l:")) != -1) { switch (c) { case 'h': usage(argv[0]); break; case 'f': strncpy(buf, optarg, BUFSIZE); buf[BUFSIZE - 1] = '\0'; infile_name = buf; break; case 'v': level = atoi(optarg); break; case 'l': strncpy(lbuf, optarg, BUFSIZE); buf[BUFSIZE - 1] = '\0'; logfile_name = lbuf; break; default: printf("Unknown option '%c'\n", c); usage(argv[0]); break; } } // Skip ... } ``` > Reference > `$ man 3 getopt` Then, it does some essential initialization(e.g queue, console, etc). ```clike int main(int argc, char *argv[]) { // Skip ... queue_init(); init_cmd(); console_init(); // Skip ... } ``` * **queue_init()** Initialize the essential global variable for queue and register two signal handlers. ```clike /* Signal handlers */ void sigsegvhandler(int sig) { trigger_exception( "Segmentation fault occurred. You dereferenced a NULL or invalid " "pointer"); } void sigalrmhandler(int sig) { trigger_exception( "Time limit exceeded. Either you are in an infinite loop, or your " "code is too inefficient"); } static void queue_init() { fail_count = 0; q = NULL; signal(SIGSEGV, sigsegvhandler); signal(SIGALRM, sigalrmhandler); } ``` > Use `sigaction` instead of `signal` as the docs said below :::danger Portability The only portable use of signal() is to set a signal's disposition to SIG_DFL or SIG_IGN. **The semantics when using signal() to establish a signal handler vary across systems** (and POSIX.1 explicitly permits this variation); do not use it for this purpose. **POSIX.1 solved the portability mess by specifying sigaction(2)**, which provides explicit control of the semantics when a signal handler is invoked; use that interface instead ofsignal(). > Reference > `man 2 signal` > Section 'Portability' ::: * **init_cmd()** Setup the commands that can be used in the console. ```clike // In console.c void init_cmd() { cmd_list = NULL; param_list = NULL; err_cnt = 0; quit_flag = false; add_cmd("help", do_help_cmd, " | Show documentation"); add_cmd("option", do_option_cmd, " [name val] | Display or set options"); add_cmd("quit", do_quit_cmd, " | Exit program"); add_cmd("source", do_source_cmd, " file | Read commands from source file"); add_cmd("log", do_log_cmd, " file | Copy output to file"); add_cmd("time", do_time_cmd, " cmd arg ... | Time command execution"); add_cmd("#", do_comment_cmd, " ... | Display comment"); add_param("verbose", &verblevel, "Verbosity level", NULL); add_param("error", &err_limit, "Number of errors until exit", NULL); add_param("echo", &echo, "Do/don't echo commands", NULL); #if 0 add_param("megabytes", &mblimit, "Maximum megabytes allowed", NULL); add_param("seconds", &timelimit, "Maximum seconds allowed", change_timeout); #endif init_in(); init_time(&last_time); first_time = last_time; } ``` Here's the data type of some variables used in `init_cmd`. As they are all `static` variable, these variables are only visible in `console.c`. ```clike // In console.c static cmd_ptr cmd_list = NULL; static param_ptr param_list = NULL; static int err_cnt = 0; static bool quit_flag = false; ``` ```clike // In console.h /* Implementation of simple command-line interface */ /* Each command defined in terms of a function */ typedef bool (*cmd_function)(int argc, char *argv[]); /* Information about each command */ /* Organized as linked list in alphabetical order */ typedef struct CELE cmd_ele, *cmd_ptr; struct CELE { char *name; cmd_function operation; char *documentation; cmd_ptr next; }; ``` We can see that the type of `cmd_list` is `cmd_ptr`. And `cmd_ptr` is an alias of `struct CELE*`. When look into the `struct CELE` further, it's a one way linked-list, and each element stores the information of its command metadata(e.g `name` and `documentation`) and its associated **function's address**. > Here is the circumstance that we can do sorting on the linked-list(if the `cmd_list` is out of order), as the `cmd_list` is sorted in alphabetical order. ==Jserv teacher asked student, "Why we need to do sorting on linked-list?"== Let's take a look at how `add_cmd` work... ```clike /* Add a new command */ void add_cmd(char *name, cmd_function operation, char *documentation) { cmd_ptr next_cmd = cmd_list; cmd_ptr *last_loc = &cmd_list; while (next_cmd && strcmp(name, next_cmd->name) > 0) { last_loc = &next_cmd->next; next_cmd = next_cmd->next; } cmd_ptr ele = (cmd_ptr) malloc_or_fail(sizeof(cmd_ele), "add_cmd"); ele->name = name; ele->operation = operation; ele->documentation = documentation; ele->next = next_cmd; *last_loc = ele; } ``` Here is a interesting part, the function use double pointer, `cmd_ptr*`, to keep track of the last location(either the address of **head of the list** or the address of `next` field inside the structure). **In this way, the code can be written in a more general way**. If I implements this, I might write the code in this way... I have to use `if` statement to determine the last location, and take associated action. (Not general) ```clike void add_cmd(char *name, cmd_function operation, char *documentation) { cmd_ptr prev = NULL; cmd_ptr ptr = cmd_list; while (ptr && strcmp(name, ptr->name) > 0) { prev = ptr; ptr = ptr->next; } cmd_ptr ele = (cmd_ptr) malloc_or_fail(sizeof(cmd_ele), "add_cmd"); ele->name = name; ele->operation = operation; ele->documentation = documentation; ele->next = ptr; if (pre) pre->next = ele; else cmd_list = ele; } ``` ### Console section The technique that it used to deal with the I/O is something called [RIO(Robust I/O)](https://hackmd.io/s/H1TtmVTTz). RIO is a buffered IO. One of the advantage of buffered I/O is that it can decrease the number of time it calls system call(e.g read, write, etc) Here is a simple demo that show the difference when we used buffered I/O and unbuffered I/O. * **buffered version** ```clike // buffered.c # include <stdio.h> int main(int argc, char *argv[]) { /* Buffered input */ printf("a.."); printf("b.."); printf("c.."); printf("d.."); fflush(NULL); // or use printf("\n"); return 0; } ``` ``` $ gcc -o buffered buffered.c $ strace -e write ./buffered write(1, "a..b..c..d..", 12a..b..c..d..) = 12 ``` * **non-buffered version** ```clike // nonbuffered.c # include <stdio.h> int main(int argc, char *argv[]) { /* nonbuffered input */ printf("a..\n"); printf("b..\n"); printf("c..\n"); printf("d..\n"); return 0; } ``` ``` $ gcc -o nonbuffered nonbuffered.c $ strace -e write ./nonbuffered write(1, "a..\n", 4a.. ) = 4 write(1, "b..\n", 4b.. ) = 4 write(1, "c..\n", 4c.. ) = 4 write(1, "d..\n", 4d.. ) = 4 ``` I use `strace` tool to analyze the system call the program calls. If I don't buffered the input, like `nonbuffered.c`, there will be lots of I/O system call. > The more system calls, the more context-switch. And in consequence, slow down the program. Here is the data structure of RIO. ```clike #define RIO_BUFSIZE 8192 typedef struct RIO_ELE rio_t, *rio_ptr; struct RIO_ELE { int fd; /* File descriptor */ int cnt; /* Unread bytes in internal buffer */ char *bufptr; /* Next unread byte in internal buffer */ char buf[RIO_BUFSIZE]; /* Internal buffer */ rio_ptr prev; /* Next element in stack */ }; rio_ptr buf_stack; ``` In the process, all the opened test files will have their own RIO data structure. `fd` will keep the file descriptor of the file, and `buf` is the buffer for its file. And here is how the program utilize this data structure ```clike= /* Read command from input file. When hit EOF, close that file and return NULL */ static char *readline() { int cnt; char c; char *lptr = linebuf; if (buf_stack == NULL) return NULL; for (cnt = 0; cnt < RIO_BUFSIZE - 2; cnt++) { if (buf_stack->cnt <= 0) { /* Need to read from input file */ buf_stack->cnt = read(buf_stack->fd, buf_stack->buf, RIO_BUFSIZE); buf_stack->bufptr = buf_stack->buf; if (buf_stack->cnt <= 0) { /* Encountered EOF */ pop_file(); if (cnt > 0) { /* Last line of file did not terminate with newline. */ /* Terminate line & return it */ *lptr++ = '\n'; *lptr++ = '\0'; if (echo) { report_noreturn(1, prompt); report_noreturn(1, linebuf); } return linebuf; } else return NULL; } } /* Have text in buffer */ c = *buf_stack->bufptr++; *lptr++ = c; buf_stack->cnt--; if (c == '\n') break; } if (c != '\n') { /* Hit buffer limit. Artificially terminate line */ *lptr++ = '\n'; } *lptr++ = '\0'; if (echo) { report_noreturn(1, prompt); report_noreturn(1, linebuf); } return linebuf; } ``` Basically, this function will contiuously read content from the file until it encounter `\n` or reach `EOF`... ```clike=15 /* Need to read from input file */ buf_stack->cnt = read(buf_stack->fd, buf_stack->buf, RIO_BUFSIZE); buf_stack->bufptr = buf_stack->buf; ``` And then it take out each character in the buffered, and process then. ```clike static char *readline() { int cnt; char c; char *lptr = linebuf; if (buf_stack == NULL) return NULL; for (cnt = 0; cnt < RIO_BUFSIZE - 2; cnt++) { if (buf_stack->cnt <= 0) { /* Need to read from input file */ buf_stack->cnt = read(buf_stack->fd, buf_stack->buf, RIO_BUFSIZE); buf_stack->bufptr = buf_stack->buf; // skip ... } /* Have text in buffer */ c = *buf_stack->bufptr++; *lptr++ = c; buf_stack->cnt--; if (c == '\n') break; } // Skip ... return linebuf; } ``` After reading a line, we need to interpret the line. The process will then call some functions in order `...` -> `parse_args` -> `interpret_cmda` -> `...`. The `parse-args` will transform the line into a list of arguments. And send the number of arguments(`argc`) and the list of arguments(`argv`) to `interpret_cmda` for further processing. Inside `interpret_cmda`... ```clike= /* Execute a command that has already been split into arguments */ static bool interpret_cmda(int argc, char *argv[]) { if (argc == 0) return true; /* Try to find matching command */ cmd_ptr next_cmd = cmd_list; bool ok = true; while (next_cmd && strcmp(argv[0], next_cmd->name) != 0) next_cmd = next_cmd->next; if (next_cmd) { ok = next_cmd->operation(argc, argv); if (!ok) record_error(); } else { report(1, "Unknown command '%s'", argv[0]); record_error(); ok = false; } return ok; } ``` We can see that it will try to find the function named `argv[0]` in the `cmd_list`, and call the function. If the return value or the value of `ok` is not true, then it will update the global variable `err_cnt` in `record_error()`. ```clike=7 while (next_cmd && strcmp(argv[0], next_cmd->name) != 0) next_cmd = next_cmd->next; if (next_cmd) { ok = next_cmd->operation(argc, argv); if (!ok) record_error(); } ``` ```clike void record_error() { err_cnt++; if (err_cnt >= err_limit) { report(1, "Error limit exceeded. Stopping command execution"); quit_flag = true; } } ``` Then it will check if there is any error(`err_cnt` should be `0`) during the process and return associated value. ```clike bool run_console(char *infile_name) { if (!push_file(infile_name)) { report(1, "ERROR: Could not open source file '%s'", infile_name); return false; } while (!cmd_done()) { cmd_select(0, NULL, NULL, NULL, NULL); } return err_cnt == 0; } ``` Finally, the process return `1` if there is any error that occurs during the process, and `0` otherwise. ```clike #define BUFSIZE 256 int main(int argc, char *argv[]) { /* To hold input file name */ char buf[BUFSIZE]; char *infile_name = NULL; char lbuf[BUFSIZE]; char *logfile_name = NULL; int level = 4; int c; // Skip ... bool ok = true; ok = ok && run_console(infile_name); ok = ok && finish_cmd(); return ok ? 0 : 1; } ``` ## How automatic grading system works By seeing the `Makefile`, I can tell that it executes the `driver.py` script in the scripts/ folder. ```make test: qtest scripts/driver.py scripts/driver.py ``` Thte `Tracer` stores the information about each test case, and store the score we can get. ```python= class Tracer: traceDirectory = "./traces" qtest = "./qtest" command = qtest verbLevel = 0 autograde = False useValgrind = False traceDict = { 1 : "trace-01-ops", 2 : "trace-02-ops", 3 : "trace-03-ops", 4 : "trace-04-ops", 5 : "trace-05-ops", 6 : "trace-06-string", 7 : "trace-07-robust", 8 : "trace-08-robust", 9 : "trace-09-robust", 10 : "trace-10-malloc", 11 : "trace-11-malloc", 12 : "trace-12-malloc", 13 : "trace-13-perf", 14 : "trace-14-perf", 15 : "trace-15-perf" } traceProbs = { 1 : "Trace-01", 2 : "Trace-02", 3 : "Trace-03", 4 : "Trace-04", 5 : "Trace-05", 6 : "Trace-06", 7 : "Trace-07", 8 : "Trace-08", 9 : "Trace-09", 10 : "Trace-10", 11 : "Trace-11", 12 : "Trace-12", 13 : "Trace-13", 14 : "Trace-14", 15 : "Trace-15" } maxScores = [0, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7] # ... ``` This script will try to run each test case and figure out the score we get. ```python def run(self, tid = 0): # Skip ... for t in tidList: tname = self.traceDict[t] if self.verbLevel > 0: print("+++ TESTING trace %s:" % tname) ok = self.runTrace(t) maxval = self.maxScores[t] tval = maxval if ok else 0 print("---\t%s\t%d/%d" % (tname, tval, maxval)) score += tval maxscore += maxval scoreDict[t] = tval print("---\tTOTAL\t\t%d/%d" % (score, maxscore)) # Skip ... ``` ```python def runTrace(self, tid): # Skip ... fname = "%s/%s.cmd" % (self.traceDirectory, self.traceDict[tid]) vname = "%d" % self.verbLevel clist = [self.command, self.qtest, "-v", vname, "-f", fname] try: retcode = subprocess.call(clist) except Exception as e: print("Call of '%s' failed: %s" % (" ".join(clist), e)) return False return retcode == 0 ``` ## Things that I'm still working on... - Learn how to use `gdb` to trace the code and analyze the behaviour during the process > Inspired by [ztex](https://hackmd.io/s/BkYsI4FBN)'s hackmd - Learn `valgrind` to analyze memory usage > Inspired by [Naetw](https://hackmd.io/s/BysQssYHN)'s hackmd