Try   HackMD

2019q1 Homework1 (lab0)

contributed by <johnnylord >

tags: sysprog2019

Implementation of queue

Design of the queue data structure

As the docs said, q_insert_tail and q_size should operate in time O(1), I need to add additional field to track the tail and the size of queue.

/* Queue structure */
typedef struct {
    list_ele_t *head; // Track the head of queue
    list_ele_t *tail; // Track the tail of queue
    unsigned int size; // Track the size of queue
} queue_t;

q_new()

Create a new, empty queue.

queue_t *q_new()
{
    queue_t *q = malloc(sizeof(queue_t));

    // If there is no free space for malloc
    if (!q) {
        return NULL;
    }

    q->head = NULL;
    q->tail = NULL;
    q->size = 0;
    return q;
}

When construct or new a brand new queue, make sure you initialize all the fields in it. Otherwise, there is a big chance that other functions(e.g q_insert_head, q_insert_tail, etc) might use the wrong value of these fields.

q_free()

Free all storage used by a queue

void q_free(queue_t *q)
{
    list_ele_t *last = NULL;

    // Check if q can be freed or not
    if (!q) {
        return;
    }

    /* Free all the list elements, including their strings */
    for (list_ele_t *ptr = q->head; ptr != NULL; last = ptr, ptr = ptr->next) {
        if (last) {
            free(last);
        }
        free(ptr->value);
    }

    // Free the last element left by above loop if exist
    if (last) {
        free(last);
    }

    free(q);
}

As the space of the string of each list element is allocated from the heap, I have to free the space of string before freeing the list element. Otherwise, there will be lots of memory leak in the heap space.

q_insert_head()

Attempt to insert a new element at the head of the queue (LIFO discipline).

bool q_insert_head(queue_t *q, char *s)
{
    list_ele_t *newh;
    char *value;

    // Check if q is valid queue or not
    if (!q) {
        return false;
    }

    // Allocate memory for new element
    if (!(newh = malloc(sizeof(list_ele_t)))) {
        return false;
    }

    if (!(value = malloc(sizeof(char) * (strlen(s) + 1)))) {
        free(newh);
        return false;
    }

    // copy string into value
    strcpy(value, s);
    newh->value = value;

    // Update linked list
    newh->next = q->head;
    q->head = newh;
    if (!q_size(q)) {
        q->tail = newh;
    }

    q->size++;
    return true;
}

Thanks to the static analysis feature provided by cppcheck, it helps me find sections that cause memory leak. Because I've already allocated some space at Line 12 in the source code, I need to free the space if there is any exception occurs in the following section.

if (!(value = malloc(sizeof(char) * (strlen(s) + 1)))) { free(newh); return false; }

q_insert_tail()

Attempt to insert a new element at the tail of the queue.

bool q_insert_tail(queue_t *q, char *s)
{
    list_ele_t *newh;
    char *value;

    // Check if q is valid queue or not
    if (!q) {
        return false;
    }

    // Allocate memory for new element
    if (!(newh = malloc(sizeof(list_ele_t)))) {
        return false;
    }

    if (!(value = malloc(sizeof(char) * (strlen(s) + 1)))) {
        free(newh);
        return false;
    }

    // copy string into value
    strcpy(value, s);
    newh->value = value;

    // Update linked list
    newh->next = NULL;
    if (!q_size(q)) {
        q->head = q->tail = newh;
    } else {
        q->tail->next = newh;
        q->tail = newh;
    }

    q->size++;
    return true;
}

There is no big difference between q_insert_tail and q_insert_head. Just be careful when dealing with the update of the linked list. As I've kept the information of tail of the queue in q->tail, I can append the new element directly next to q->tail without traversing the queue all the way from q->head to the end of the queue.

q_remove_head()

Attempt to remove the element at the head of the queue.

bool q_remove_head(queue_t *q, char *sp, size_t bufsize)
{
    list_ele_t *remove_ele;

    if (!q || !q_size(q)) {
        return false;
    }

    // Update linked list
    remove_ele = q->head;
    q->head = q->head->next;

    // Copy removed element's value to sp
    if (sp) {
        strncpy(sp, remove_ele->value, bufsize - 1);
        sp[bufsize - 1] = '\0';
    }

    // Free the removed element
    free(remove_ele->value);
    free(remove_ele);
    q->size--;
    return true;
}

Just write the code as the docs said. Easy~

q_size()

Compute the number of elements in the queue.

int q_size(queue_t *q)
{
    if (!q) {
        return 0;
    }

    return q->size;
}

As I keep the size information in the queue data structure itself, I don't have to traverse the queue and count the element. Just return the size information of the queue. Easiest part~~
However, I think the return type of this function should be unsigned int. The size of the queue won't be negative.

q_reverse()

Reorder the list so that the queue elements are reversed in order.

void q_reverse(queue_t *q)
{
    list_ele_t *last, *ptr, *next, *tmp;

    if (!q || !q_size(q)) {
        return;
    }

    // Reverse the linked-list order
    last = NULL;
    ptr = q->head;
    next = q->head->next;

    while (ptr != NULL) {
        ptr->next = last;
        last = ptr;
        ptr = next;
        if (next) {
            next = next->next;
        }
    }

    // Update the head and tail fields of the q
    tmp = q->head;
    q->head = q->tail;
    q->tail = tmp;
    return;
}

How qtest works

Here is the entry point of qtest. First of all, do associated action based on the options and arguments user type in.

#define BUFSIZE 256

int main(int argc, char *argv[])
{
    /* To hold input file name */
    char buf[BUFSIZE];
    char *infile_name = NULL;
    char lbuf[BUFSIZE];
    char *logfile_name = NULL;
    int level = 4;
    int c;

    while ((c = getopt(argc, argv, "hv:f:l:")) != -1) {
        switch (c) {
        case 'h':
            usage(argv[0]);
            break;
        case 'f':
            strncpy(buf, optarg, BUFSIZE);
            buf[BUFSIZE - 1] = '\0';
            infile_name = buf;
            break;
        case 'v':
            level = atoi(optarg);
            break;
        case 'l':
            strncpy(lbuf, optarg, BUFSIZE);
            buf[BUFSIZE - 1] = '\0';
            logfile_name = lbuf;
            break;
        default:
            printf("Unknown option '%c'\n", c);
            usage(argv[0]);
            break;
        }
    }
    
    // Skip ...

}

Reference
$ man 3 getopt

Then, it does some essential initialization(e.g queue, console, etc).

int main(int argc, char *argv[]) {
    
    // Skip ...
    
    queue_init();
    init_cmd();
    console_init();
    
    // Skip ...
}
  • queue_init()
    Initialize the essential global variable for queue and register two signal handlers.
/* Signal handlers */
void sigsegvhandler(int sig)
{
    trigger_exception(
        "Segmentation fault occurred.  You dereferenced a NULL or invalid "
        "pointer");
}

void sigalrmhandler(int sig)
{
    trigger_exception(
        "Time limit exceeded.  Either you are in an infinite loop, or your "
        "code is too inefficient");
}


static void queue_init()
{
    fail_count = 0;
    q = NULL;
    signal(SIGSEGV, sigsegvhandler);
    signal(SIGALRM, sigalrmhandler);
}

Use sigaction instead of signal as the docs said below

Portability
The only portable use of signal() is to set a signal's disposition to SIG_DFL or SIG_IGN.
The semantics when using signal() to establish a signal handler vary across systems (and POSIX.1 explicitly permits this variation); do not use it for this purpose.

POSIX.1 solved the portability mess by specifying sigaction(2), which provides explicit control of the semantics when a signal handler is invoked; use that interface instead ofsignal().

Reference
man 2 signal > Section 'Portability'

  • init_cmd()
    Setup the commands that can be used in the console.
// In console.c

void init_cmd()
{
    cmd_list = NULL;
    param_list = NULL;
    err_cnt = 0;
    quit_flag = false;
    add_cmd("help", do_help_cmd, "                | Show documentation");
    add_cmd("option", do_option_cmd,
            " [name val]     | Display or set options");
    add_cmd("quit", do_quit_cmd, "                | Exit program");
    add_cmd("source", do_source_cmd,
            " file           | Read commands from source file");
    add_cmd("log", do_log_cmd, " file           | Copy output to file");
    add_cmd("time", do_time_cmd, " cmd arg ...    | Time command execution");
    add_cmd("#", do_comment_cmd, " ...            | Display comment");
    add_param("verbose", &verblevel, "Verbosity level", NULL);
    add_param("error", &err_limit, "Number of errors until exit", NULL);
    add_param("echo", &echo, "Do/don't echo commands", NULL);
#if 0
    add_param("megabytes", &mblimit, "Maximum megabytes allowed", NULL);
    add_param("seconds", &timelimit, "Maximum seconds allowed",
              change_timeout);
#endif
    init_in();
    init_time(&last_time);
    first_time = last_time;
}

Here's the data type of some variables used in init_cmd. As they are all static variable, these variables are only visible in console.c.

// In console.c
static cmd_ptr cmd_list = NULL;
static param_ptr param_list = NULL;
static int err_cnt = 0;
static bool quit_flag = false;
// In console.h
/* Implementation of simple command-line interface */

/* Each command defined in terms of a function */
typedef bool (*cmd_function)(int argc, char *argv[]);

/* Information about each command */
/* Organized as linked list in alphabetical order */
typedef struct CELE cmd_ele, *cmd_ptr;
struct CELE {
    char *name;
    cmd_function operation;
    char *documentation;
    cmd_ptr next;
};

We can see that the type of cmd_list is cmd_ptr. And cmd_ptr is an alias of struct CELE*. When look into the struct CELE further, it's a one way linked-list, and each element stores the information of its command metadata(e.g name and documentation) and its associated function's address.

Here is the circumstance that we can do sorting on the linked-list(if the cmd_list is out of order), as the cmd_list is sorted in alphabetical order. Jserv teacher asked student, "Why we need to do sorting on linked-list?"

Let's take a look at how add_cmd work

/* Add a new command */
void add_cmd(char *name, cmd_function operation, char *documentation)
{
    cmd_ptr next_cmd = cmd_list;
    cmd_ptr *last_loc = &cmd_list;
    while (next_cmd && strcmp(name, next_cmd->name) > 0) {
        last_loc = &next_cmd->next;
        next_cmd = next_cmd->next;
    }
    cmd_ptr ele = (cmd_ptr) malloc_or_fail(sizeof(cmd_ele), "add_cmd");
    ele->name = name;
    ele->operation = operation;
    ele->documentation = documentation;
    ele->next = next_cmd;
    *last_loc = ele;
}

Here is a interesting part, the function use double pointer, cmd_ptr*, to keep track of the last location(either the address of head of the list or the address of next field inside the structure). In this way, the code can be written in a more general way.

If I implements this, I might write the code in this way I have to use if statement to determine the last location, and take associated action. (Not general)

void add_cmd(char *name, cmd_function operation, char *documentation)
{
    cmd_ptr prev = NULL;
    cmd_ptr ptr = cmd_list;
    
    while (ptr && strcmp(name, ptr->name) > 0) {
        prev = ptr;
        ptr = ptr->next;
    }
    
    cmd_ptr ele = (cmd_ptr) malloc_or_fail(sizeof(cmd_ele), "add_cmd");
    ele->name = name;
    ele->operation = operation;
    ele->documentation = documentation;
    ele->next = ptr;
    
    if (pre)
        pre->next = ele;
    else
        cmd_list = ele;
}

Console section

The technique that it used to deal with the I/O is something called RIO(Robust I/O). RIO is a buffered IO. One of the advantage of buffered I/O is that it can decrease the number of time it calls system call(e.g read, write, etc)

Here is a simple demo that show the difference when we used buffered I/O and unbuffered I/O.

  • buffered version
// buffered.c
# include <stdio.h>

int main(int argc, char *argv[])
{
    /* Buffered input */
    printf("a..");
    printf("b..");
    printf("c..");
    printf("d..");
    fflush(NULL); // or use printf("\n");
    return 0;
}
$ gcc -o buffered buffered.c
$ strace -e write ./buffered

write(1, "a..b..c..d..", 12a..b..c..d..)            = 12
  • non-buffered version
// nonbuffered.c
# include <stdio.h>

int main(int argc, char *argv[])
{
    /* nonbuffered input */
    printf("a..\n");
    printf("b..\n");
    printf("c..\n");
    printf("d..\n");
    return 0;
}
$ gcc -o nonbuffered nonbuffered.c
$ strace -e write ./nonbuffered

write(1, "a..\n", 4a..
)                    = 4
write(1, "b..\n", 4b..
)                    = 4
write(1, "c..\n", 4c..
)                    = 4
write(1, "d..\n", 4d..
)                    = 4

I use strace tool to analyze the system call the program calls. If I don't buffered the input, like nonbuffered.c, there will be lots of I/O system call.

The more system calls, the more context-switch. And in consequence, slow down the program.

Here is the data structure of RIO.

#define RIO_BUFSIZE 8192
typedef struct RIO_ELE rio_t, *rio_ptr;

struct RIO_ELE {
    int fd;                /* File descriptor */
    int cnt;               /* Unread bytes in internal buffer */
    char *bufptr;          /* Next unread byte in internal buffer */
    char buf[RIO_BUFSIZE]; /* Internal buffer */
    rio_ptr prev;          /* Next element in stack */
};

rio_ptr buf_stack;

In the process, all the opened test files will have their own RIO data structure. fd will keep the file descriptor of the file, and buf is the buffer for its file.

And here is how the program utilize this data structure

/* Read command from input file. When hit EOF, close that file and return NULL */ static char *readline() { int cnt; char c; char *lptr = linebuf; if (buf_stack == NULL) return NULL; for (cnt = 0; cnt < RIO_BUFSIZE - 2; cnt++) { if (buf_stack->cnt <= 0) { /* Need to read from input file */ buf_stack->cnt = read(buf_stack->fd, buf_stack->buf, RIO_BUFSIZE); buf_stack->bufptr = buf_stack->buf; if (buf_stack->cnt <= 0) { /* Encountered EOF */ pop_file(); if (cnt > 0) { /* Last line of file did not terminate with newline. */ /* Terminate line & return it */ *lptr++ = '\n'; *lptr++ = '\0'; if (echo) { report_noreturn(1, prompt); report_noreturn(1, linebuf); } return linebuf; } else return NULL; } } /* Have text in buffer */ c = *buf_stack->bufptr++; *lptr++ = c; buf_stack->cnt--; if (c == '\n') break; } if (c != '\n') { /* Hit buffer limit. Artificially terminate line */ *lptr++ = '\n'; } *lptr++ = '\0'; if (echo) { report_noreturn(1, prompt); report_noreturn(1, linebuf); } return linebuf; }

Basically, this function will contiuously read content from the file until it encounter \n or reach EOF

/* Need to read from input file */ buf_stack->cnt = read(buf_stack->fd, buf_stack->buf, RIO_BUFSIZE); buf_stack->bufptr = buf_stack->buf;

And then it take out each character in the buffered, and process then.

static char *readline()
{
    int cnt;
    char c;
    char *lptr = linebuf;

    if (buf_stack == NULL)
        return NULL;

    for (cnt = 0; cnt < RIO_BUFSIZE - 2; cnt++) {
        if (buf_stack->cnt <= 0) {
            /* Need to read from input file */
            buf_stack->cnt = read(buf_stack->fd, buf_stack->buf, RIO_BUFSIZE);
            buf_stack->bufptr = buf_stack->buf;
            
            // skip ...
        }
        /* Have text in buffer */
        c = *buf_stack->bufptr++;
        *lptr++ = c;
        buf_stack->cnt--;
        if (c == '\n')
            break;
    }

    // Skip ...

    return linebuf;
}

After reading a line, we need to interpret the line. The process will then call some functions in order ... -> parse_args -> interpret_cmda -> ....

The parse-args will transform the line into a list of arguments. And send the number of arguments(argc) and the list of arguments(argv) to interpret_cmda for further processing.

Inside interpret_cmda

/* Execute a command that has already been split into arguments */ static bool interpret_cmda(int argc, char *argv[]) { if (argc == 0) return true; /* Try to find matching command */ cmd_ptr next_cmd = cmd_list; bool ok = true; while (next_cmd && strcmp(argv[0], next_cmd->name) != 0) next_cmd = next_cmd->next; if (next_cmd) { ok = next_cmd->operation(argc, argv); if (!ok) record_error(); } else { report(1, "Unknown command '%s'", argv[0]); record_error(); ok = false; } return ok; }

We can see that it will try to find the function named argv[0] in the cmd_list, and call the function. If the return value or the value of ok is not true, then it will update the global variable err_cnt in record_error().

while (next_cmd && strcmp(argv[0], next_cmd->name) != 0) next_cmd = next_cmd->next; if (next_cmd) { ok = next_cmd->operation(argc, argv); if (!ok) record_error(); }
void record_error()
{
    err_cnt++;
    if (err_cnt >= err_limit) {
        report(1, "Error limit exceeded.  Stopping command execution");
        quit_flag = true;
    }
}

Then it will check if there is any error(err_cnt should be 0) during the process and return associated value.

bool run_console(char *infile_name)
{
    if (!push_file(infile_name)) {
        report(1, "ERROR: Could not open source file '%s'", infile_name);
        return false;
    }
    while (!cmd_done()) {
        cmd_select(0, NULL, NULL, NULL, NULL);
    }
    return err_cnt == 0;
}

Finally, the process return 1 if there is any error that occurs during the process, and 0 otherwise.

#define BUFSIZE 256

int main(int argc, char *argv[])
{
    /* To hold input file name */
    char buf[BUFSIZE];
    char *infile_name = NULL;
    char lbuf[BUFSIZE];
    char *logfile_name = NULL;
    int level = 4;
    int c;

    // Skip ...

    bool ok = true;
    ok = ok && run_console(infile_name);
    ok = ok && finish_cmd();
    return ok ? 0 : 1;
}

How automatic grading system works

By seeing the Makefile, I can tell that it executes the driver.py script in the scripts/ folder.

test: qtest scripts/driver.py
	scripts/driver.py

Thte Tracer stores the information about each test case, and store the score we can get.

class Tracer: traceDirectory = "./traces" qtest = "./qtest" command = qtest verbLevel = 0 autograde = False useValgrind = False traceDict = { 1 : "trace-01-ops", 2 : "trace-02-ops", 3 : "trace-03-ops", 4 : "trace-04-ops", 5 : "trace-05-ops", 6 : "trace-06-string", 7 : "trace-07-robust", 8 : "trace-08-robust", 9 : "trace-09-robust", 10 : "trace-10-malloc", 11 : "trace-11-malloc", 12 : "trace-12-malloc", 13 : "trace-13-perf", 14 : "trace-14-perf", 15 : "trace-15-perf" } traceProbs = { 1 : "Trace-01", 2 : "Trace-02", 3 : "Trace-03", 4 : "Trace-04", 5 : "Trace-05", 6 : "Trace-06", 7 : "Trace-07", 8 : "Trace-08", 9 : "Trace-09", 10 : "Trace-10", 11 : "Trace-11", 12 : "Trace-12", 13 : "Trace-13", 14 : "Trace-14", 15 : "Trace-15" } maxScores = [0, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7] # ...

This script will try to run each test case and figure out the score we get.

    def run(self, tid = 0):
        
        # Skip ...
        
        for t in tidList:
            tname = self.traceDict[t]
            if self.verbLevel > 0:
                print("+++ TESTING trace %s:" % tname)
            ok = self.runTrace(t)
            maxval = self.maxScores[t]
            tval = maxval if ok else 0
            print("---\t%s\t%d/%d" % (tname, tval, maxval))
            score += tval
            maxscore += maxval
            scoreDict[t] = tval
        print("---\tTOTAL\t\t%d/%d" % (score, maxscore))
        
        # Skip ...
    def runTrace(self, tid):
        # Skip ...
        
        fname = "%s/%s.cmd" % (self.traceDirectory, self.traceDict[tid])
        vname = "%d" % self.verbLevel
        clist = [self.command, self.qtest, "-v", vname, "-f", fname]
        try:
            retcode = subprocess.call(clist)
        except Exception as e:
            print("Call of '%s' failed: %s" % (" ".join(clist), e))
            return False
        return retcode == 0

Things that I'm still working on

  • Learn how to use gdb to trace the code and analyze the behaviour during the process

Inspired by ztex's hackmd

  • Learn valgrind to analyze memory usage

Inspired by Naetw's hackmd