Assignment 1: RISC-V Assembly and Instruction Pipeline

Contributed by <chihenliu>

Introduction

Linked list

A linked list is a common data structure.

As shown in the two diagrams above, a linked list uses nodes to record, represent, and store data. Each node has three components: data, a pointer, and its own address. Each node's pointer holds the address of the next node, and the chain continues until a pointer is NULL, which marks the end of this simple linked list. Traversing a list of N nodes takes O(N) time.
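As a minimal sketch of the traversal described above, the following C fragment walks a list from its head until it reaches NULL. The names ListNode and list_length are illustrative and do not appear in the original program.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type mirroring the description above. */
typedef struct ListNode {
    uint32_t data;          /* payload stored in this node */
    struct ListNode *next;  /* address of the next node, NULL at the end */
} ListNode;

/* Visit every node exactly once, so the cost is O(N) for N nodes. */
size_t list_length(const ListNode *head)
{
    size_t count = 0;
    for (const ListNode *cur = head; cur != NULL; cur = cur->next)
        count++;
    return count;
}
```

Because every node is reachable only through its predecessor, there is no shortcut to the k-th node; the loop has to follow each next pointer in turn.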

Count leading zeros

Count leading zeros (CLZ) counts the consecutive zero bits of a binary number, starting from the most significant bit (MSB) and moving right until the first '1' is encountered.
Example (16-bit): 0000000000000010 has 14 leading zeros.
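The definition can be checked with a straightforward bit-by-bit loop. This is only a reference sketch, not the method used later in this report; the function name clz_naive and its width parameter (assumed to be between 1 and 32) are illustrative.

```c
#include <stdint.h>

/* Count zero bits from the MSB of a `width`-bit value down to the first 1.
 * `width` is an illustrative parameter and must be in the range 1..32. */
unsigned clz_naive(uint32_t x, unsigned width)
{
    unsigned count = 0;
    for (unsigned bit = width; bit-- > 0; ) {
        if (x & (UINT32_C(1) << bit))
            break;          /* first '1' found, stop counting */
        count++;
    }
    return count;
}
```

Calling clz_naive(0x0002, 16) reproduces the 16-bit example above and yields 14; with width 32 the same value yields 30.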

Motivation

Before taking this course, I had no prior knowledge of data structures, and the linked list was the first one I learned about. I therefore wanted to implement a 32-bit count-leading-zeros operation over a linked list in RISC-V to understand it further.

Implementation


C code

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>  // using malloc functions

// define the linked-list node structure
typedef struct Node {
    uint32_t data; // 32 bits unsigned integers
    struct Node* next;
} Node;

// count the leading zeros of a 32-bit unsigned integer
uint16_t count_leading_zeros(uint32_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);

    /* count ones (population count) */
    x -= ((x >> 1) & 0x55555555);
    x = ((x >> 2) & 0x33333333) + (x & 0x33333333);
    x = ((x >> 4) + x) & 0x0f0f0f0f;
    x += (x >> 8);
    x += (x >> 16);

    return (32 - (x & 0x3f)); // the popcount can be 32 (MSB set), so keep 6 bits; 0x1f would turn 32 into 0
}

// compute the CLZ of every node in the linked list and return the sum
uint64_t sum_of_leading_zeros(Node* head)
{
    uint64_t sum = 0;

    while (head != NULL) {
        uint16_t leadingZeros = count_leading_zeros(head->data);
        sum += leadingZeros;
        head = head->next;
    }

    return sum;
}

int main()
{
    // create a simple linked list
    Node* head = NULL;
    Node* node1 = malloc(sizeof(Node));
    node1->data = 23;
    node1->next = NULL;
    head = node1;

    Node* node2 = malloc(sizeof(Node));
    node2->data = 15;
    node2->next = NULL;
    node1->next = node2;

    Node* node3 = malloc(sizeof(Node));
    node3->data = 8;  // third value of test case 1 (23, 15, 8)
    node3->next = NULL;
    node2->next = node3;

    // calculate the sum of leading zeros over all list nodes (32-bit unsigned integers)
    uint64_t totalLeadingZeros = sum_of_leading_zeros(head);
    printf("Sum of Leading Zeros: %llu\n", (unsigned long long) totalLeadingZeros);

    // release linked list node memory
    while (head != NULL) {
        Node* temp = head;
        head = head->next;
        free(temp);
    }

    return 0;
}

First, I defined the structure of a linked list and created a simple linked list with three nodes. Then, I used a 32-bit CLZ (Count Leading Zeros) function to calculate the total sum of leading zeros for the three nodes in the linked list.

Case 1:
Input list_1: 23, 15, 8
Output 1: 83

Case 2:
Input list_2: 10, 32, 56
Output 2: 80

Case 3:
Input list_3: 89, 125, 256
Output 3: 73

Table

| Case | Input | Output |
|------|-------|--------|
| 1 | 23, 15, 8 | 83 |
| 2 | 10, 32, 56 | 80 |
| 3 | 89, 125, 256 | 73 |

Assembly Code(RISC-V)

The following is the RISC-V implementation of the 32-bit count-leading-zeros operation.
I have implemented it as a function.

clz:
addi sp,sp,-16
sw ra,0(sp)
sw s0,4(sp)   
sw s1,8(sp)      #Prologue
sw s2,12(sp)

#s0 is x
add s0,x0,a0

#x|=(x>>1)
srli t0,s0,1
or s0,s0,t0

#x|=(x>>2)
srli t0,s0,2
or s0,s0,t0

#x|=(x>>4)
srli t0,s0,4
or s0,s0,t0

#x|=(x>>8)
srli t0,s0,8
or s0,s0,t0

#x|=(x>>16)
srli t0,s0,16
or s0,s0,t0

#x -= ((x>>1) & 0x55555555)
li t1,0x55555555
srli t0,s0,1
and t0,t0,t1
sub s0,s0,t0


#x = ((x>>2) & 0x33333333)+(x &0x33333333)
li t1,0x33333333
srli t0,s0,2
and t0,t0,t1
and t1,s0,t1
add s0,t0,t1

#x = ((x>>4) + x) & 0x0f0f0f0f
srli t0,s0,4
add t0,t0,s0
li t1,0x0f0f0f0f
and s0,t0,t1

#x += (x>>8) 
srli t0,s0,8
add s0,t0,s0

#x += (x>>16) 
srli t0,s0,16
add s0,t0,s0

#32 - (x & 0x3f): the popcount can be 32, so keep 6 bits
li a0,32
andi t0,s0,0x3f
sub a0,a0,t0

lw ra,0(sp)
lw s0,4(sp)      #Epilogue
lw s1,8(sp)
lw s2,12(sp)
addi sp,sp,16

jr ra
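To make the two phases of the routine above easier to follow, here is a minimal C sketch that separates them: first every bit below the highest set bit is filled with 1s, then the 1s are counted. The helper names smear_right, popcount32 and clz32 are illustrative and not part of the original code. The sketch masks the final count with 0x3f so that a population count of 32 (an input whose MSB is set) is preserved.

```c
#include <stdint.h>

/* Phase 1: copy the highest set bit into every lower bit position. */
uint32_t smear_right(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x;
}

/* Phase 2: parallel (SWAR) population count of a 32-bit word. */
unsigned popcount32(uint32_t x)
{
    x -= (x >> 1) & 0x55555555;                      /* 2-bit sums */
    x = ((x >> 2) & 0x33333333) + (x & 0x33333333);  /* 4-bit sums */
    x = ((x >> 4) + x) & 0x0f0f0f0f;                 /* 8-bit sums */
    x += x >> 8;
    x += x >> 16;
    return x & 0x3f;                                 /* total is 0..32 */
}

/* Leading zeros = 32 minus the number of bits at or below the highest 1. */
unsigned clz32(uint32_t x)
{
    return 32 - popcount32(smear_right(x));
}
```

For the input 23 (binary 10111), smear_right returns 0x1f, popcount32 returns 5, and clz32 returns 27.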
    

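This function walks the words stored at list until it reaches a zero word, calls clz on each one, and returns the sum of the leading-zero counts (three nodes in these test cases). The data at list therefore has to end with a 0 terminator.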

sum_clz_zeros:
  addi sp, sp, -16
  sw ra, 0(sp)
  sw s0, 4(sp)      #Prologue
  sw s1, 8(sp)
  sw s2, 12(sp)

  li s0, 0
    
  la s1, list  #load address for list

 loop:
  lw t0, 0(s1) 
  beqz t0,done
  mv a0, t0
  mv t2,a0         
  jal ra, clz       
  add s0, s0, a0
  addi s1,s1,4
  j loop
 

 
 done:
    mv a0,s0    #s0->a0
    lw ra,0(sp)
    lw s0,4(sp)
    lw s1,8(sp)   #Epilogue
    lw s2,12(sp)
    addi sp,sp,16
    jr ra
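The control flow of sum_clz_zeros can be modelled in C as a loop over a zero-terminated array of words. This is a sketch under that assumption; sum_clz_until_zero and clz32_loop are hypothetical names, and the simple loop-based CLZ stands in for the clz routine.

```c
#include <stdint.h>

/* Simple stand-in for the clz routine: count zeros from bit 31 downwards. */
static unsigned clz32_loop(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t mask = UINT32_C(1) << 31; mask != 0 && !(x & mask); mask >>= 1)
        n++;
    return n;
}

/* Mirrors the assembly loop: load a word, stop on 0, otherwise add its CLZ
 * and advance by one word (4 bytes). */
uint32_t sum_clz_until_zero(const uint32_t *words)
{
    uint32_t sum = 0;
    while (*words != 0) {
        sum += clz32_loop(*words);
        words++;
    }
    return sum;
}
```

With the words 10, 32, 56 followed by a 0 terminator the function returns 28 + 26 + 26 = 80. One consequence of using 0 as the sentinel is that a node holding the value 0 cannot be stored in the list.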
 

Full RISC-V code

.data
     list: .word 10,32,56,0   # test case 2; the trailing 0 terminates the loop in sum_clz_zeros
 .text    

# compute the 32-bit unsigned CLZ of each of the three nodes in list, then their sum
main:

  #load address for list 
  la  s0,list
  
  lw  a0,0(s0)   
  call clz
  jal ra, print_result
  
  

  lw a0,4(s0)
  call clz
  jal ra, print_result

  lw a0,8(s0)
  call clz
  jal ra, print_result

  #calculate Sum of three node clz leading zeros
  call sum_clz_zeros
  jal ra,print_result
  
  j exit_program   
     
print_result:


li a7,1
ecall
jr ra



clz:
addi sp,sp,-16
sw ra,0(sp)
sw s0,4(sp)   
sw s1,8(sp)      #Prologue
sw s2,12(sp)

#s0 is x
add s0,x0,a0

#x|=(x>>1)
srli t0,s0,1
or s0,s0,t0

#x|=(x>>2)
srli t0,s0,2
or s0,s0,t0

#x|=(x>>4)
srli t0,s0,4
or s0,s0,t0

#x|=(x>>8)
srli t0,s0,8
or s0,s0,t0

#x|=(x>>16)
srli t0,s0,16
or s0,s0,t0

#x -= ((x>>1) & 0x55555555)
li t1,0x55555555
srli t0,s0,1
and t0,t0,t1
sub s0,s0,t0


#x = ((x>>2) & 0x33333333)+(x &0x33333333)
li t1,0x33333333
srli t0,s0,2
and t0,t0,t1
and t1,s0,t1
add s0,t0,t1

#x = ((x>>4) + x) & 0x0f0f0f0f
srli t0,s0,4
add t0,t0,s0
li t1,0x0f0f0f0f
and s0,t0,t1

#x += (x>>8) 
srli t0,s0,8
add s0,t0,s0

#x += (x>>16) 
srli t0,s0,16
add s0,t0,s0

#32 - (x & 0x3f): the popcount can be 32, so keep 6 bits
li a0,32
andi t0,s0,0x3f
sub a0,a0,t0

lw ra,0(sp)
lw s0,4(sp)      #Epilogue
lw s1,8(sp)
lw s2,12(sp)
addi sp,sp,16

jr ra

sum_clz_zeros:
  addi sp, sp, -16
  sw ra, 0(sp)
  sw s0, 4(sp)      #Prologue
  sw s1, 8(sp)
  sw s2, 12(sp)

  li s0, 0
    
  la s1, list  #load address for list

 loop:
  lw t0, 0(s1) 
  beqz t0,done
  mv a0, t0
  mv t2,a0         
  jal ra, clz       
  add s0, s0, a0
  addi s1,s1,4
  j loop
 

 
 done:
    mv a0,s0    #s0->a0
    lw ra,0(sp)
    lw s0,4(sp)
    lw s1,8(sp)   #Epilogue
    lw s2,12(sp)
    addi sp,sp,16
    jr ra
 



exit_program:
    # list lives in the static .data segment, so there is no memory to free;
    # treating its values as node pointers would read and write invalid addresses
    li a7,10        # exit system call
    ecall
    

For each test case, I check the leading-zero count that clz returns for each node and then sum the counts.

| Output_case1 | Output_case2 | Output_case3 |
|--------------|--------------|--------------|
| 83 | 80 | 73 |

5-stage Pipeline Analysis

5-stage pipeline generated by Ripes


The above is my analysis of the pipeline within the main label

IF stage

  • The instruction jalr x1, x1, 72 sets the PC (program counter) to x1 + 72 and stores the address of the following instruction in x1. Together with the preceding auipc it implements the call pseudo-instruction, which is how functions are called.

  • The program counter value 0x00000014 refers to the address of the next instruction.

  • jalr is an I-type instruction.

  • According to the RISC-V green card, jalr performs R[rd] = PC + 4; PC = (R[rs1] + imm) & ~1, that is, the least significant bit of the target address is set to zero.

| Imm | Opcode | Funct3 |
|-----|--------|--------|
| imm[11:0] | 1100111 | 000 |

  • If no branch or jump is taken, the next PC is PC + 4.
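The jalr semantics listed above can be written as a small C model. This is a sketch, not Ripes output: jalr_model and JalrResult are hypothetical names, and the example values assume that the jalr sits at address 0x10 (consistent with the next-instruction address 0x14 quoted above) and that x1 holds 0x0C from the preceding auipc.

```c
#include <stdint.h>

/* Result of executing one jalr: the value written to rd and the new PC. */
typedef struct {
    uint32_t link;     /* R[rd] = PC + 4 */
    uint32_t next_pc;  /* PC = (R[rs1] + imm) with bit 0 cleared */
} JalrResult;

JalrResult jalr_model(uint32_t pc, uint32_t rs1_value, int32_t imm)
{
    JalrResult r;
    r.link = pc + 4;
    r.next_pc = (rs1_value + (uint32_t)imm) & ~UINT32_C(1);
    return r;
}
```

Under those assumptions, jalr_model(0x10, 0x0C, 72) gives a link value of 0x14 and a jump target of 0x54.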

ID stage

  • The instruction in this stage is auipc x1, 0x0. It adds the upper immediate (0x0 shifted left by 12 bits) to the current PC and writes the result to x1; it does not change the PC itself. It is the first half of the call pseudo-instruction, and the jalr that follows uses x1 as its base address.
  • auipc is a U-type instruction.
  • According to the RISC-V green card, auipc performs R[rd] = PC + {imm, 12'b0}.
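A one-line C model makes the green card formula concrete. The function name auipc_model is illustrative, and the example addresses are assumptions chosen for the sketch rather than values read from the simulator.

```c
#include <stdint.h>

/* R[rd] = PC + {imm, 12'b0}: the 20-bit immediate fills bits 31..12. */
uint32_t auipc_model(uint32_t pc, uint32_t imm20)
{
    return pc + (imm20 << 12);
}
```

With an immediate of 0x0 the result is simply the address of the auipc itself, for example auipc_model(0x0C, 0x0) is 0x0C. With an immediate of 0x10000 and a PC of 0 the result is 0x10000000.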

EX stage

  • The instruction lw x10, 0(x8) reads the word at the memory address held in x8 and stores it in x10.

  • In this stage the ALU takes two operands, the base address from x8 and the immediate 0, and adds them to form the memory address for lw x10, 0(x8).

  • lw is an I-type instruction. According to the RISC-V green card it performs R[rd] = {32'bM[](31), M[R[rs1]+imm](31:0)}, and its core instruction format is imm[11:0], rs1, funct3, rd, opcode.
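The I-type field layout named above (imm[11:0], rs1, funct3, rd, opcode) can be illustrated with a small decoder. This is a sketch: ITypeFields and decode_itype are hypothetical names, and the encoding 0x00042503 for lw x10, 0(x8) was worked out by hand from that layout rather than copied from Ripes.

```c
#include <stdint.h>

/* Fields of an I-type instruction, from the most to the least significant bits. */
typedef struct {
    int32_t  imm;     /* bits 31..20, sign-extended */
    unsigned rs1;     /* bits 19..15 */
    unsigned funct3;  /* bits 14..12 */
    unsigned rd;      /* bits 11..7  */
    unsigned opcode;  /* bits 6..0   */
} ITypeFields;

ITypeFields decode_itype(uint32_t insn)
{
    ITypeFields f;
    uint32_t raw_imm = insn >> 20;               /* 12-bit immediate */
    f.imm    = (int32_t)(raw_imm ^ 0x800) - 0x800; /* sign-extend from bit 11 */
    f.rs1    = (insn >> 15) & 0x1f;
    f.funct3 = (insn >> 12) & 0x7;
    f.rd     = (insn >> 7) & 0x1f;
    f.opcode = insn & 0x7f;
    return f;
}
```

Decoding 0x00042503 yields imm = 0, rs1 = 8, funct3 = 2, rd = 10 and opcode = 0x03, which matches lw x10, 0(x8).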

MEM stage

  • addi x8, x8, 0 is an I-type instruction. According to the RISC-V green card it performs R[rd] = R[rs1] + imm.

| Imm | Opcode | Funct3 |
|-----|--------|--------|
| imm[11:0] | 0010011 | 000 |

  • addi is an arithmetic instruction that adds an immediate value, so it does not access memory in this stage.

WB stage

  • In this stage, the auipc x8, 0x10000 instruction writes its result, PC + (0x10000 << 12), into register x8.

CPU analysis

Conclusion

This first assignment took me quite a while, because I was constantly switching between C and RISC-V assembly. Implementing the CLZ function deepened my understanding of RISC-V instructions and also showed me clear gaps in my background knowledge, which I need to spend more time on for this course. Thanks to the fellow students who discussed the work with me; those discussions also made me aware of how complex it is to recreate a linked list and manage memory in RISC-V.

Reference

Assignment1: RISC-V Assembly and Instruction Pipeline
The RISC-V Instruction Set Manual Volume I: Unprivileged ISA
RISC-V Assembly Programmer's Manual
Linked List: Intro
RISC-V Datapath Part4: Pipeline
RISC-V Greensheet
Find first set
