Try   HackMD

2024q1 Homework1 (lab0)

contributed by < YangYeh-PD >

Development Environment

$ gcc --version
gcc (Ubuntu 13.2.0-4ubuntu3) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
    CPU family:          6
    Model:               142
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            10
    CPU(s) scaling MHz:  81%
    CPU max MHz:         3400.0000
    CPU min MHz:         400.0000
    BogoMIPS:            3600.00

Queue implementations

The task is not simply to build APIs for queue operations, but rather to execute the fundamental operations adhering to the specified queue interface. Pay attention to how you express this.

To construct APIs for queue, we first need to know the definitions and how it should be constructed. Consider the definition of the element element_t in a queue

typedef struct {
    char *value;
    struct list_head list;
} element_t;

where we can store a string by the character pointer value, and list_head is defined in header file <list.h>

struct list_head {
    struct list_head *prev;
    struct list_head *next;
};






Doubly Linked List



node1

prev

next



node2

prev

next



node1:c->node2





node2:c->node1





node3

prev

next



node2:c->node3





node3:c->node2





Thus, the implementations of a node are

  1. Declare a pointer of charater to store a string.
  2. Embedding the whole list_head into the structure.

I came up with the wrong queue picture, here is the correct one.ChenYang YehSat, Feb 24, 2024 19:41 PM







queue


cluster node2



cluster queue 2



cluster queue 3



cluster queue 1



cluster node1




node0_2

prev

next



node3_2

prev

next



node0_2:c->node3_2





node0_1

head



node0

prev

next



node0_1:c->node0





node0_3

size



node0_4

id



node1_1

value



node1_2

prev

next



node1_2:c->node0





node2_1

value



node2_2

prev

next



node2_2:c->node0





node3_2:c->node0_2





node4_2

prev

next



node3_2:c->node4_2





node3_1

head



node3_3

size



node3_4

id



node4_2:c->node3_2





node4_1

head



node4_3

size



node4_4

id



node0:c->node1_2





node0:c->node2_2





where the head of the queue is defined as

typedef struct {
    struct list_head *q;
    struct list_head chain;
    int size;
    int id;
} queue_contex_t;
  • q : the pointer of the linked list defined in <lish.h> that is embedded in queue_contex_t.
  • chain : used to chain different queues.
  • size : a integer that stores the size of the queue.
  • id : the identification number.

Here is the list of implementations (queue.c) of queue that support both FIFO and LIFO.

Instead of focusing on the LIFO and FIFO properties, you should explain why the queue_context_t was designed with its specific members in a C structure.

Thus, queue_contex_t not only stores queue basic information, but also connects each independent queues into a chain, which helps us manipulate data.ChenYang YehSun, Feb 25, 2024 03:16 AM

q_new()

Address your motivations and considerations here.

Since we import <lish.h>, we can use the function in the library rather than make the new one, or operate by myself.

Inspired by <csotaku0926>

struct list_head *q_new()
{
+   struct list_head *head = malloc(sizeof(struct list_head));
-   queue_contex_t *queue = malloc(sizeof(queue_contex_t));
+   if (!head) {
        return NULL;
    }
-   queue->q->next = queue->q;
-   queue->q->prev = queue->q;
+   INIT_LIST_HEAD(head);
    return head;
}

q_free()

Address your motivations and considerations here.

Because we need to free all the memories of the queue, we need to know the pointer of each node, not only the pointer to the structure member list_head.
The macro container_of() is defined in <lish.h> which calculates address of structure that contains address pointer. Using macro list_for_each_entry_safe() to make code less complicated.
Since the member of struct list_head *p is a pointer itself, we need to pass the indirect pointer &head which represent the address of the pointer head.

    struct list_head *head = l;
+   l = l->next;  
-   l = l->next;  
-   while (l->next != head) {
-       free(container_of(l->next, element_t, list));
-       l = l->next;
+       q_release_element(container_of(l->prev, element_t, list));
-       q_release_element(container_of(l->prev, element_t, list));
-   }
-   free(container_of(head, queue_contex_t, chain); 
+   free(container_of(&head, queue_contex_t, q));
-   free(container_of(&head, queue_contex_t, q));
+   l = l->next;
-   l = l->next;
+   q_release_element(container_of(l->prev, element_t, list));
-   q_release_element(container_of(l->prev, element_t, list));
+   if (!l) {
+       return;
+   }
+   element_t *curr, *next;
+   list_for_each_entry_safe (curr, next, l, lsit) {
+       list_del(&curr->list);
+       q_release_element(curr);
+   }
+   free(l);
+   return;

q_insert_head()

Address your motivations and considerations here.

We can incorporate whether head == NULL or the failure memory allocation in the same if statement. And again, we use the API list_add() instead of adding by myself.

Well, actually we cannot, since it will cause memory leak when first allocating memory to node and when head == NULLChenYang YehSat, Mar 2, 2024 12:50 PM

We need to allocated memory and copy the string into s and cannot directly assign the pointer to value. Note that we store strlen(s) into len to reduce the function call strlen().
If node is not added into the list successfully, we need to free the space allocated by node, or it will again cause memory leak.

bool q_insert_head(struct list_head *head, char *s)
{
+   if (!head) {
+       return false;
+   }
    element_t *node = malloc(sizeof(element_t));
-   if (!head || !node) {
+   if (!node)
        return false;
    }
-   (node->list).next = head->next;
-   (node->list).prev = head;
-   head->next = &(node->list);
-   ((node->list).next)->prev = &(node->list);
-   node->value = s;
+   size_t len = strlen(s);
+   node->value = malloc((len + 1) * sizeof(char));
+   if(!(node->value)) {
-      q_release_element(node);
+       free(node);
+       return false;
+   }
-   memcpy(node->value, s, strlen(s));
-   *(node->value + strlen(s)) = '\0';
+   list_add(&node->list, head);
+   if (!node->list.next || !node->list.prev) {
+       q_release_element(node);
+       return false;
+   }
+   strncpy(node->value, s, len + 1);
    return true;
}

Inspired by <marvin0102>

Since the member `list` in `element_t` is ==not a pointer==, we have to use **dot (.) operator** to select the member.
  • line 49 and 50 add the node into the queue by modifying its prev and next
  • line 51 and 52 modify next of the head and prev of the next node to the inserted node.

Given that the preceding is merely a brief segment of code, it's not necessary to indicate specific line numbers for reference. You may simply emphasize the code of interest by incorporating it directly into the ongoing discussion for clarity.

Always write meaningful and informative technical reports with great fluency.

To ensure this report is both informative and flows well, it's important to start by outlining the motivation, considerations, and examples for the suggested functions before compiling a list of code snippets.

Guide to Technical Report Writing

Thanks!ChenYang YehSat, Feb 24, 2024 08:10 AM

q_remove_head()

According to the definition in ISO/IEC 9899 §7.21.2.1, memcpy() only copies n characters, which means we still need to add \0 at the end of the string.

The memcpy function copies n characters from the object pointed to by s2 into the object pointed to s1. If copying takes place between objects that overlap, the behavior is undefined.

element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
-   if(!head || !(head->next)) {
+   if(!head || list_empty(head)) {
        return NULL;
    }
-   element_t *del_node = container_of(head->next, element_t, list);
+   element_t *del_node = list_entry(head->next, element_t, list);
+   char *del_str = del_node->value;    
    list_del_init(head->next);
    if(sp) {
+       size_t del_len = strlen(del_str);
+       size_t len = (del_len > (bufsize - 1)) ? bufsize - 1 : del_len;
-       memcpy(sp, del_str, len);
+       strncpy(sp, del_str, len);
        *(sp + len) = '\0';
    }
    return del_node;
}

But why don't we use strcpy() instead of memcpy()?
In ISO/IEC 9899 §7.21.2.4, strcpy() copies the string s2 into s1 without checking whether the memory allocated in s1 is enough, which may cause undefined behavior.

The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1.

In §6.5.10 footnote 92)

If prior invalid pointer operations (such as accesses outside array bounds), produced undefined behavior.

However, it still cannot pass make test normally.

ERROR: Removed value dolphinU��� != expected value dolphin
ERROR: Removed value bearU��� != expected value bear

I don't know why I should add -5 behind of the strlen(), but it works.
ChenYang YehSun, Feb 25, 2024 00:48 AM

Remove the hacky code and utilize the tools to resolve memory issues.

The problem is solved. It raised from memcpy() in insert_head() and insert_tail(), because I forgot to add '\0' at the end of the string.
ChenYang YehTue, Feb 27, 2024 06:19 AM

q_delete_mid()

I figured out three different ways for implementation.

  1. We defined fast and slow pointers. The fast pointer moving 2 nodes at once, while the slow one moving 1 node only.






Circular Linked List



node1

1



node2

2



node1->node2





node2->node1





node3

3



node2->node3





node3->node2





node4

4



node3->node4





node4->node3





node5

5



node4->node5





node5->node4





node0__
fast




node0__->node1__






node1__->node2__





node0_
slow




node0_->node1_





Eventually, once the fast pointer come back to the head, the slow pointer would point to the middle point. But what is the time complexity?
Since the fast pointer traverse 2 nodes and the slow one traverse a node each time, the time complexity would be

O(n)+O(n2)=O(32n).
2. We also can use q_size() to get the total size of queue, then divided it by 2 and traverse to the destination point. Since q_size() takes
O(n)
time, and traversing takes
O(n2)
, the time complexity would also be
O(n)+O(n2)=O(32n).

3. Last but not least, we can make best use of the property of circular doubly linked list. We define left and right pointers, then traverse from head in different ways until they meet, then the meeting point will definitely be the midpoint of the queue.
(Since both left and right pointers have traversed half of the queue.)







%0


cluster node4



cluster node2



cluster node1




node3_1

value



node3_2

prev

next



node0

prev

next



node3_2:c->node0





node0:c->node3_2





node2_2

prev

next



node0:c->node2_2





node2_1

value



node2_2:c->node0





node1_2

prev

next



node2_2:c->node1_2





node1_1

value



node1_2:c->node2_2





node0__
right




node0__->node1__





node0_
left





node1_->node0_





bool q_delete_mid(struct list_head *head)
{
    // https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/
    if (!head || list_empty(head)) {
        return false;
    }
    struct list_head *right = head->next;
    struct list_head *left = head->prev;
    while (!((left == right) || ((left->prev) == right))) {
        right = right->next;
        left = left->prev;
    }
    list_del(left);
    q_release_element(container_of(left, element_t, list));
    return true;
}

Enforce the consistent style when you are going to list the code snip.

Since both left and right pointers take only

O(n2)

the time complexity would be

O(n2)+O(n2)=O(n).

Check this note and identify the performance concerns by measurements.

Done!ChenYang YehThu, Feb 29, 2024 07:32 AM

The operational mechanism behind the processor

In programs with good temporal locality, a memory location referenced once may be referred to multiple times shortly thereafter. In programs with good spatial locality, if a memory location is referenced, the program is likely to reference nearby memory locations in subsequent operations.

From the perspective of spatial locality, the storage locations of nodes are not regular in the linked list, meaning each node is scattered in memory, resulting in poor spatial locality. Therefore, it can be inferred that the spatial locality performance of all algorithms above is the same.

From the view of temporal locality, in the "fast and slow pointer" approach, before the loop concludes and after the fast pointer has completed its traversal of the first half of the linked list, the slow pointer has accessed the same node. Thus time interval between two accesses of the same node is relatively shorter. However, in "single pointer" approach, the second access to the same node occurs after the completion of the first traversal. The time interval between two accesses of the same node is relatively longer.

Method Temporal Locality Spatial Locality
Single pointer shorter same
Fast and slow pointer longer same

Thus, "Fast and slow pointer" is more cache friendly than single pointer, since it does not frequently trigger cache eviction by changing the contents in memory and reduces the occurrence of cache misses.

TODO: Use perf to show preliminary measurements.

Test the cache performance of both algorithms by perf

Consider a singly linked list defined as

typedef struct list {
    int value;
    unsigned long buffer1, buffer2, buffer3;
    struct list *next;
} list

TODO: Increase the number of members in the structure above to render cache misses more expensive and enable straightforward comparisons between different approaches.

After we construct the list, we can find the midpoint by both single or fast-slow pointer as we have mentioned above. The followings are c implementations respectively.
Single porinter

list *list_mid(list *head)
{
    int size = 0;
    list *traverse = head;
    while (traverse) {
        size++;
        traverse = traverse->next;
    }
    traverse = head;
    for (int i = 0; i < size/2; i++) {
        traverse = traverse->next;
     }
     return traverse;
}

Fast-slow pointer

list *list_mid(list *head) 
{
    list *slow = head;
    list *fast = slow->next;
    
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    return slow->next;
}

Here are the results of cache reference and cache miss obtained through perf testing with different input sizes.

Cache Reference
Cache Miss

In most cases, the version employing the fast-slow pointer tends to have fewer cache references and cache misses compared to the single pointer version.

q_delete_dup()

We first define two pointers to element_t, reference and traverse, where reference = head->next and traverse = reference->next at the beginning, which means traverse is two steps ahead of the head.







Circular Linked List



head_

prev

next



node1

1



head_:c->node1





node1->head_





node2

1



node1->node2





node2->node1





node3

1



node2->node3





node3->node2





node4

1



node3->node4





node4->node3





node5

2



node4->node5





node5->node4





head
head



head->head_





reference
reference



reference->node1





traverse
traverse



traverse->node2





If traverse == reference, then traverse will keep traversing until it reach the node where traverse != reference.







Circular Linked List



head_

prev

next



node1

1



head_:c->node1





node1->head_





node2

1



node1->node2





node2->node1





node3

1



node2->node3





node3->node2





node4

1



node3->node4





node4->node3





node5

2



node4->node5





node5->node4





head
head



head->head_





reference
reference



reference->node1





traverse
traverse



traverse->node5





Then, reference will move ahead and delete the previous node until it reach the traverse node.







Circular Linked List



head_

prev

next



node1

2



head_:c->node1





node1->head_





head
head



head->head_





reference
reference



reference->node1





traverse
traverse



traverse->node1





bool q_delete_dup(struct list_head *head)
{
    if (!head || list_empty(head)) {
        return false;
    }
+   struct list_head *reference = head->next, *traverse, *delete;
-   struct list_head *traverse;
    while ((reference->next != head) && (reference != head)) {
        traverse = reference->next;
        char *str_ref = container_of(reference, element_t, list)->value;
        char *str_tra = container_of(traverse, element_t, list)->value;
        if (!strcmp(str_ref, str_tra)) {
            while (!(traverse == head) && !strcmp(str_ref, str_tra)) {
                traverse = traverse->next;
+               str_tra = container_of(traverse, element_t, list)->value;
            }
            while (reference != traverse) {
+               delete = reference;
                reference = reference->next;
-               list_del(reference->prev);
+               list_del(delete);
-               q_release_element(container_of(reference->prev, element_t, list));
+               q_release_element(container_of(delete, element_t, list));
            }
        } else {
            reference = traverse;
        }
    }

But there is a error message in test trace 06.

ERROR: Attempted to free unallocated block. Address = 0x5555555555555555

This may result from deleting the head node by q_release_element. But I have checked my code and ensured that such the behaviors is avoided. So I still cannot figure it out.

The question has been resolved. There have two problems in the original code.

  1. I forget to renew str_tra when traverse to the next node.
  2. Because I delete the previous node of the reference node, if the deleted node is coincidentally the first node of the queue, then we call the function
    q_release_element(container_of(reference->prev, element_t, list))
    is equivalent to
    q_release_element(container_of(head, element_t, list))
    then the segmentation fault occured. Thus, we need the pointer delete to store the deleted node, then free it.
    ChenYang YehThu, Feb 29, 2024 07:02 AM

q_reverseK()

This took me very long time to deal with it. I devided this question into three different parts.

  1. Check if the number of node is
    Nk
    .
    It is reletively simple, we use HEAD to record the starting point, and TAIL to traverse the queue to ensure that the queue is long enough for reverse. To avoid the special case of the first node, we declare TAIL as the indirect pointer to the next;
struct list_head **TAIL = &(head->next);






Circular Linked List


cluster node2



cluster node1



cluster node3




node0

prev

next



node1_2

prev

next



node0:c->node1_2





node1_1

3



node1_2:c->node0





node2_2

prev

next



node1_2:c->node2_2





node2_1

5



node2_2:c->node1_2





node3_2

prev

next



node2_2:c->node3_2





node3_1

8



node3_2:c->node2_2





head
head



head->node0





HEAD
HEAD



HEAD->node1_1





TAIL
TAIL



TAIL->node2_2:next





        // Check if the number of nodes is < k
        struct list_head *HEAD = *TAIL;
        for (int count = 0; count < k; count++) {
            if (*TAIL == head) {
                return;
            }
            TAIL = &(*TAIL)->next;
        }

  1. Delete the node from list and store it to stack.
    We use HEAD declare before to delete each node and add it into the stack we just declared.






Circular Linked List


cluster node1



cluster node2




node0

prev

next



node1_2

prev

next



node0:c->node1_2





node1_1

5



node1_2:c->node0





node2_2

prev

next



node1_2:c->node2_2





node2_1

3



node2_2:c->node1_2





stack
stack



stack->node0











Circular Linked List


cluster node3




node0

prev

next



node3_2

prev

next



node0:c->node3_2





node3_1

8



node3_2:c->node0





head
head



head->node0





HEAD
HEAD



HEAD->node3_1





        // Delete the nodes from the list
        struct list_head stack;
        INIT_LIST_HEAD(&stack);
        for (int count = 0; count < k; count++) {
            struct list_head *delete = HEAD;
            HEAD = HEAD->next;
            list_del(delete);
            list_add(delete, &stack);
        }

  1. Last but not least, we use list_splice() to concatenate the list together.
    (It will be automatically in reverse order because of the LIFO property of stack.)






Circular Linked List


cluster node3



cluster node1



cluster node2




node0

prev

next



node1_2

prev

next



node0:c->node1_2





node1_1

5



node1_2:c->node0





node2_2

prev

next



node1_2:c->node2_2





node2_1

3



node2_2:c->node1_2





node3_2

prev

next



node2_2:c->node3_2





node3_1

8



node3_2:c->node2_2





head
head



head->node0





HEAD
HEAD



HEAD->node3_1





TAIL
TAIL



TAIL->node2_2:next





        // Insert the list in reverse order
        list_splice(&stack, HEAD->prev);
        TAIL = &HEAD->prev->next;

Can you shorten your code by facilitating List APIs?

I can't. Since I don't want to iterate the whole queue, so I cannot use such a macro of list_for_each_entry(_safe). I need to write the loop by myself.ChenYang YehSat, Mar 2, 2024 13:36 PM

q_sort()

I use classic merge sort as implementation.

  1. Find the middle point of the list
    We can use the algorithm in q_delete_mid() to find the midpoint.
    struct list_head *right = head->next;
    struct list_head *left = head->prev;
    while (!((left == right) || ((left->prev) == right))) {
        right = right->next;
        left = left->prev;
    }
  1. Break into 2 lists (head and stack)
    After we find the midpoint of the list, we need to divided the list into 2 parts, which is called head and stack in my code.
    struct list_head stack;
    INIT_LIST_HEAD(&stack);
    for (struct list_head *traverse = head->next, *safe = traverse->next;
        traverse != left; traverse = safe, safe = safe->next) {
        list_del(traverse);
        list_add_tail(traverse, &stack);
    }

    q_sort(head, descend);
    q_sort(&stack, descend);
  1. Merge two lists
    Last but not least, we merge two lists into head in ascending or decending order.
    struct list_head *head_tra = head->next, *stack_del;
    while (head_tra != head && !list_empty(&stack)) {
        char *str_head = list_entry(head_tra, element_t, list)->value;
        char *str_stack = list_entry(stack.next, element_t, list)->value;
        if (descend ? (strcmp(str_head, str_stack) > 0)
                    : (strcmp(str_head, str_stack) < 0)) {
            head_tra = head_tra->next;
        } else {
            stack_del = stack.next;
            list_del(stack_del);
            list_add(stack_del, head_tra->prev);
        }
    }
    while (!list_empty(&stack)) {
        stack_del = stack.next;
        list_del(stack_del);
        list_add(stack_del, head_tra->prev);
    }

q_merge()

I merge all of queue by the merging technique above. However, since I merge every queue into the first one in order, so the size of queues at the end of the algorithm may be quite unbalanced.







queue


cluster queue 5



cluster queue 3



cluster queue 2



cluster queue 1



cluster queue 6



cluster queue 4




node1_2

prev

next



node2_2

prev

next



node1_2:c->node2_2





node1_1

head



node4_1

head



node1_1->node4_1





node1_3

size



node1_4

queue 1



node2_2:c->node1_2





node3_2

prev

next



node2_2:c->node3_2





node2_1

head



node2_3

size



node2_4

queue 2



node3_2:c->node2_2





node3_1

head



node3_3

size



node3_4

queue 3



node4_2

prev

next



node5_2

prev

next



node4_2:c->node5_2





node6_1

head



node4_1->node6_1





node4_3

size



node4_4

queue 1



node5_2:c->node4_2





node5_1

head



node5_3

size



node5_4

queue 3



node6_2

prev

next



node6_3

size



node6_4

queue 1



    queue_contex_t *first = list_entry(head->next, queue_contex_t, chain);
    if (head->next->next == head) {
        return q_size(first->q);
    }
    queue_contex_t *traverse =
        list_entry((first->chain).next, queue_contex_t, chain);
    while (&traverse->chain != head) {
        struct list_head *first_tra = first->q->next;
        while (!list_empty(traverse->q) && first_tra != first->q) {
            char *first_str = list_entry(first_tra, element_t, list)->value;
            char *traverse_str =
                list_entry(traverse->q->next, element_t, list)->value;
            if (descend ? (strcmp(first_str, traverse_str) > 0)
                        : (strcmp(first_str, traverse_str) < 0)) {
                first_tra = first_tra->next;
            } else {
                struct list_head *delete = traverse->q->next;
                list_del(delete);
                list_add(delete, first_tra->prev);
            }
        }
        while (!list_empty(traverse->q)) {
            struct list_head *delete = traverse->q->next;
            list_del(delete);
            list_add(delete, first_tra->prev);
        }
        traverse = list_entry((traverse->chain).next, queue_contex_t, chain);
    }
    return q_size(first->q);

Fisher-Yates Shuffle

The Fisher-Yates Shuffle, also known as the Knuth Shuffle, is an algorithm used for randomly shuffling the elements of an array or a list. It was first proposed by Ronald Fisher and Frank Yates in 1938 and later popularized by Donald Knuth in his book "The Art of Computer Programming."

The basic idea of the Fisher-Yates Shuffle is to iterate through the array from the first element to the last, moving each element with a randomly selected element to the tail. This process continues until the entire array is shuffled.







shuffle



node6_1

1



node6_3

3



node6_1->node6_3





node6_2

2



node6_5

5



node6_3->node6_5





node6_4

4



node6_4->node6_1





node6_5->node6_2





tail5

tail



node5_2

2



tail5->node5_2





node5_1

1



node5_3

3



node5_1->node5_3





node5_4

4



node5_2->node5_4





node5_5

5



node5_3->node5_5





node5_4->node5_1





tail4

tail



node4_5

5



tail4->node4_5





node4_1

1



node4_3

3



node4_1->node4_3





node4_2

2



node4_2->node4_5





node4_4

4



node4_4->node4_1





node4_5->node4_4





tail3

tail



node3_5

5



tail3->node3_5





node3_1

1



node3_2

2



node3_3

3



node3_2->node3_3





node3_3->node3_5





node3_4

4



node3_4->node3_1





node3_5->node3_4





tail2

tail



node2_5

5



tail2->node2_5





node2_1

1



node2_2

2



node2_1->node2_2





node2_3

3



node2_2->node2_3





node2_3->node2_5





node2_4

4



node2_5->node2_4





tail1

tail



node1_5

5



tail1->node1_5





node1_1

1



node1_2

2



node1_1->node1_2





node1_3

3



node1_2->node1_3





node1_4

4



node1_3->node1_4





node1_4->node1_5





q_shuffle()

I added the shuffle command in qtest.c and implemented q_shuffle() using the Fisher-Yates method in queue.c.

bool q_shuffle(struct list_head *head) 
{
    if (!head || list_empty(head)) {
        return false;
    } 
  
    /* Set seed */
    srand(time(NULL));
    int len = q_size(head);

    while (len > 0) {
        /* Select a random entry */
        int number = rand() % len;
        struct list_head *traverse = head;
        for (; number >= 0; number--) {
            traverse = traverse->next;
        }
         
        /* Move it to the tail */
        list_del(traverse);
        list_add_tail(traverse, head);
        len--;
        }
        return true;
    }       

and here is the testing result in qtest.

cmd> sort
l = [alpha bravo charlie delta echo foxtrot golf hotel]
cmd> shuffle
l = [charlie alpha hotel golf bravo echo foxtrot delta]

Testing the Randomness of Shuffling

To test the randomness of shuffling, I designed an experiment for the Fisher-Yates Shuffle.
Using the Fisher-Yates Shuffle, I changed the order of the string "abc".

/* The number of {abc, acb, bac, cab, bca, cba} */
int str_count[] = {0, 0, 0, 0, 0, 0};

After shuffling, I calculated the occurrences of the 6 possible different permutations. The experiment was repeated 90,000 times.

int main(int argc, char *argv[]) 
{
    /* Initialize str to "abc" */
    char str[3 + 1];
    strcpy(str, "abc");
    str[4 - 1] = '\0';

    /* Set Seed */
    srand(time(NULL));

    /* Test 100000 times */
    for (int i = 0; i < 90000; i++) {
        str_shuffle(str);

        /* Add it to the counter */
        if (!strcmp(str, "abc")) {
            str_count[0]++;
        } else if (!strcmp(str, "acb")) {
            str_count[1]++;
        } else if (!strcmp(str, "bac")) {
            str_count[2]++;
        } else if (!strcmp(str, "cab")) {
            str_count[3]++;
        } else if (!strcmp(str, "bca")) {
            str_count[4]++;
        } else {
            str_count[5]++;
        }

        /* reset the string */
        strcpy(str, "abc");
        str[4 - 1] = '\0';            
    }

    /* Print the counter */
    char *str_const[] = {"abc", "acb", "bac", "cab", "bca", "cba"};
    for (int i = 0; i < 6; i++) {
        printf("# of %s : %d\n", str_const[i], str_count[i]);
    }
     
    return 0;
}

The implementation of shuffle() is

bool str_shuffle(char *str) {
  if (!str) {
    return false;
  }

  int len = strlen(str);

  while (len > 0) {
    /* Select a random character */
    int random = rand() % len;
    char final = str[random];

    /* Move the character to the tail */
    for (char traverse = str[random + 1]; traverse != '\0';
         ++random, traverse = str[random + 1]) {
      str[random] = traverse;
    }
    str[random] = final;
    len--;
  }
  return true;
}

and here is the result

# of abc : 14962
# of acb : 14999
# of bac : 14958
# of cab : 15090
# of bca : 15019
# of cba : 14972

shuffle

Chi-Square Test

In statistics, after shuffling, we hope that the probability of each permutation appearing is equal.
That is, if we denote by

Xi a random variable representing any possible permutation in
i
th experiment, then
XiiidU(α,β).

where
U(α,β)
is uniform distribution. (
α
= "abc",
β
= "cba")

We use hypothesis testing to verify whether if

Xi is uniformly distributed or not. Assume that

{H0:P(Xi)=16,i=1,2,3,4,5,6Ha:at least one P(Xi) unequal.

If the hypothesis

H0 is true (cannot be rejected), then the random variable
Qk1=1kXinpi0npi0

has an approximate chi-square distribution with
k1
degrees of freedom.

Reference: Introduction to Mathematical Statistics by Hogg, McKean and Craig 8/e P.286

Setting the significance value

α=0.05, and
npi0=nP(Ai)=9000016=15000
, then
Q61=Q5=(1496215000)215000+(1499915000)215000+(1495815000)215000+(1509015000)215000+(1501915000)215000+(1497215000)2150000.830

Chi-square Distribution

Since

χ(X0.95,r=5)11.070, and
Q5=0.830<11.070=χ(0.95,5)
, we cannot reject
H0
and hence pass chi-square test.

Shannon Entropy

We expect that a sorted array or a list, after shuffling, will exhibit no discernible pattern in its arrangement, indicating a sufficiently high level of randomness (or maximum entropy).
We first show that if

XiiidU(α,β), then the Shannon Entropy reaches at maximum. Then we check whether the entropy of Fisher-Yates Shuffle is match the one of uniform distribution.

The Shannon Entropy is defined as

S=ikpilogpi=P(xi)logP(xi)dx.

Here,

logpi=lnpi=logepi.

We first let

ϕ(x)=xlogx, then
S(p)=ikϕ(pi),i=1,2,...k
. Since
d2ϕ(x)dx2=ddx[logx+1]=1x>0,

by Jensen's Inequality, for each convex function,
ϕ(pik)ϕ(pi)kϕ(pik)S(p)kS(p)kϕ(pik)=k(pik)log(pik).

Since
pi=1
, it follows that
S(p)log(1k)=logk, where kconstant.

And note that in uniform distribution,
S=ikpilogpi=1kiklog(1k)=log(1k)=logk,

Therefore, under uniform distribution, the entropy is at its maximum value.

Reference:
Proving that Shannon Entropy is maximal for the uniform distribution using convexity.

In our experiment, since there are six possible string permutations, the probability of each string appearing under a uniform distribution is

P(Xi)=16, so the ideal entropy is
Si=i=1616log16=log61.791759

The experimental entropy is
Se=[1496290000log(1496290000)+1499990000log(1499990000)+1495890000log(1495890000)][1509090000log(1509090000)+1501990000log(1501990000)+1497290000log(1497290000)]=0.895166+8965901.791756

Which is very close to
Si
.


Tic-tac-toe Incorporation

Commands

To implement ttt command into qtest, I have done some updates and modifications in the branch of project and qtest.c respectively, including

Add ADD_COMMAND() in console_init().

static void console_init()
{
    ...
+   ADD_COMMAND(ttt, "Activate tic-tac-toe", "");
    ...
}

ADD_COMMAND() is a macro which defined in console.h.

void add_cmd(char *name, cmd_func_t operation, char *summary, char *parameter);
#define ADD_COMMAND(cmd, msg, param) add_cmd(#cmd, do_##cmd, msg, param)

Obviously, add_cmd() helps us add commands into qtest.

Create do_ttt() function.

static bool do_ttt();

Just copy the main function into do_ttt, then add functions definition of

  • record_move()
  • print_moves()
  • get_input()
    above do_ttt(), or it will cause implicit function definition.

Add all of relevant files
Here is the structure of my current project.

$ tree
.
├── console.c
├── console.h
├── CONTRIBUTING.md
├── dudect
│   ├── constant.c
│   ...
├── game.c
├── game.h
...
├── Makefile
├── mcts.c
├── mcts.h
├── mt19937-64.c
├── mt19937-64.h
├── negamax.c
├── negamax.h
├── qtest
├── qtest.c
...
├── scripts
│   ├── aspell-pws
│   ...
├── shannon_entropy.c
├── traces
│   ├── trace-01-ops.cmd
│   ...
├── util.h
├── web.c
├── web.h
├── zobrist.c
├── zobrist.h

I tried to include mcts.[ch], negamax.[ch] and util.h into my own created directory agents, however, it didn't work with makefile, as now I save them in the same hierarchy.ChenYang YehFri, Mar 8, 2024 3:17 AM

Modify Makefile
Since all of files dependency and compile message has been constructed in original Makefile, all we need to do is add .o file in OBJS.

OBJS := qtest.o report.o console.o harness.o queue.o \
    ...
+   zobrist.o negamax.o mcts.o mt19937-64.o game.o \