Selected subject: Palindrome detection with counting leading zeros.
Contributed by Ray Huang (黃柏叡).
Acknowledgment (unordered):
A palindrome number is a number that reads the same backward for forward. For example,
With the similar idea, padaray and I defines a palindrome bit sequence (abbreviated as palindrome) as follows. Given a bit sequence without leading zero, saying
For example, given
In contrast, number
The following pseudo-code can solve the problem in the intuitive way.
&
from the most significant bit to the lease significant bit until some bit is set (1), and the number of bit-wise AND minus one is the number of leading zeros." A special case is that if The 2nd step takes
In a 64-bit environment, the 3rd step manipulates on 64 bits. But looking closely for the definition of palindrome given fixed number of maximal bits (64 in this case), padaray (2023) optimize this step by only reversing one half of the given number. That is, he optimized this step to have
However, in this way, the 3rd step still takes
Reference: Lab 2 (rv32emu) provided by System Software Programming (2023).
wget
command. Do the same thing for the SHA256 checksum.
wget https://github.com/xpack-dev-tools/riscv-none-elf-gcc-xpack/releases/download/v13.2.0-2/xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz
wget https://github.com/xpack-dev-tools/riscv-none-elf-gcc-xpack/releases/download/v13.2.0-2/xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz.sha
cat xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz.sha
# output: 52545afb900200fbf65fe05f7ce7090b8a42c64091f4f5d43cae6bb68ea2434a xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz
sha256sum xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz
# output: 52545afb900200fbf65fe05f7ce7090b8a42c64091f4f5d43cae6bb68ea2434a xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz
tar zxf xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz
echo 13.2.0-2 > xpack-riscv-none-elf-gcc-13.2.0-2/version.txt
mkdir -p ~/app
mv xpack-riscv-none-elf-gcc-13.2.0-2 ~/app/riscv-none-elf-gcc
PATH
environment variable.
echo "export PATH=\"$HOME/app/riscv-none-elf-gcc/bin:\$PATH\"" >> ~/.bashrc
source ~/.bashrc
PATH
properly.
riscv-none-elf-gcc -v
# expected output:
# ... (omitted)
# gcc version 13.2.0 (xPack GNU RISC-V Embedded GCC x86_64)
sudo apt update && sudo apt install -y libsdl2-dev libsdl2-mixer-dev make
mkdir ~/ca
git clone https://github.com/sysprog21/rv32emu ~/ca/rv32emu
cd ~/ca/rv32emu
make
make check
# expected output:
# Running hello.elf ... [OK]
# Running puzzle.elf ... [OK]
# Running pi.elf ... [OK]
hello.elf
by rv32emu.
build/rv32emu build/hello.elf
# expected output:
# Hello World!
# Hello World!
# Hello World!
# Hello World!
# Hello World!
# inferior exit code 0
-d
: Display the assembler mnemonics for the machine instructions.)
riscv-none-elf-objdump -d build/hello.elf
build/hello.elf: file format elf32-littleriscv
Disassembly of section .text:
00000000 <_start>:
0: 00000293 li t0,0
4: 00500313 li t1,5
8: 0040006f j c <_start+0xc>
c: 00000013 nop
00000010 <loop>:
10: 02628063 beq t0,t1,30 <end>
14: 04000893 li a7,64
18: 00100513 li a0,1
1c: 03c00593 li a1,60
20: 00d00613 li a2,13
24: 00000073 ecall
28: 00128293 add t0,t0,1
2c: fe5ff06f j 10 <loop>
00000030 <end>:
30: 05d00893 li a7,93
34: 00000513 li a0,0
38: 00000073 ecall
-h
: Display the ELF file header)
riscv-none-elf-readelf -h build/hello.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x0
Start of program headers: 52 (bytes into file)
Start of section headers: 4532 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 7
Section header string table index: 6
riscv-none-elf-size build/hello.elf
text data bss dec hex filename
73 0 0 73 49 build/hello.elf
zicsr
add. The result is __riscv_zicsr
.
echo | riscv-none-elf-gcc -march=rv32i -E -dM - > tmp-oritinal.txt
echo | riscv-none-elf-gcc -march=rv32i_zicsr -E -dM - > tmp-csr.txt
diff tmp-oritinal.txt tmp-csr.txt
# result:
# 232a233
# > #define __riscv_zicsr 2000000
_zicsr
. And the final compiling target is required to be without _zicsr
. To share the code in the two cases above, I have to detect if __riscv_zicsr
is defined in C with the following illustration. (Note: .ifdef
doesn't work in assembly.)
#ifdef __riscv_zicsr
extern uint64_t get_instret();
#endif // __riscv_zicsr
m
. The result comes with multiple defines, and I will consider only __riscv_m
in my code.
echo | riscv-none-elf-gcc -march=rv32i -E -dM - > tmp-oritinal.txt
echo | riscv-none-elf-gcc -march=rv32im -E -dM - > tmp-muldiv.txt
diff tmp-oritinal.txt tmp-muldiv.txt
# result:
# 126a127
# > #define __riscv_mul 1
# 152a154
# > #define __riscv_muldiv 1
# 212a215
# > #define __riscv_m 2000000
# 303a307
# > #define __riscv_div 1
printf
(in standard library <stdio.h>
) to assembly, I will loose the ability to link the object file in which implements printf
. As a result, I utilize the ecall to write
(with similar definition of the write command in Linux.) To convert integers to characters arrays, I need the support of modulo and division from the m
extension for convenience, although modulo and division can be implemented by integer operations.You can get the complete code from the repository coding-ray/2023-ca-hw-2 on GitHub.
The code of this version is in the directory archived/v1-base.
This version is the baseline version, without any optimization from me. In this version, I refactored the code from padaray's work. The primary principle is that after my refactoring, the algorithm should be the same as the original work, including the instructions.
In addition, I did the following refactoring:
gcc
with flags -Wall -Wextra
. (c908804)palindrome_detected
with is_palindrome
. The latter one is in the format "verb_noun." (d53f444)is_palindrome
. Specifically, move the function call of count_leading_zeros
to the first line of is_palindrome
. (d53f444)x <op>= y
(from x = x <op> y
). (d53f444)src/
. (4d17e98)The code of this version is in the directory archived/v2-optimized-reversing.
The code to reverse 64 bits is as follows before optimization. If the number of bits is
uint64_t tmp_a = a;
uint64_t reverse_a = 0;
for (int i = 0; i < 64; i++) {
reverse_a <<= 1;
reverse_a |= tmp_a & 1;
tmp_a >>= 1;
}
reverse_a >>= n_left_shift;
In this version, I apply the algorithm of reversing bits in place from Douglas W. Jones (2018), sth and Ry- (2010, 2020).
To reverse 16 bits in place, Douglas W. Jones (2018) wrote the following code in C.
unsigned int reverse( unsigned int x ){
x = ((x >> 1) & 0x55555555) + ((x & 0x55555555) << 1);
x = ((x >> 2) & 0x33333333) + ((x & 0x33333333) << 2);
x = ((x >> 4) & 0x0F0F0F0F) + ((x & 0x0F0F0F0F) << 4);
x = ((x >> 8) & 0x00FF00FF) + ((x & 0x00FF00FF) << 8);
x = ((x >> 16) ) + ((x ) << 16);
return x;
}
The basic idea is as follows.
>>
and AND &
operations pick up bits with interleaving of one bit, from the most-significant bit of x
. And then shift the picked bits right by one bit. That is, saying that x
has bits 31, 30, 29, +
operator takes bits 31, 29, 27, x
, and put them to positions of bits 30, 28, 26, +
operator pick up the rest of bits, and shift them left by one bit. That is, the bits 30, 28, x
are moved to the positions of bits 31, 29, +
operation is actually OR |
operation for the two operands. So after this operation, the bits in the pair of two bits are swapped. Bits abcd
are badc
afterwards.badc
become dcba
later.0x0000FFFF
, the result is the same.With the same idea, I wrote the following code.
// file header
#define INTERLEAVE_BY_01 0x5555555555555555
#define INTERLEAVE_BY_02 0x3333333333333333
#define INTERLEAVE_BY_04 0x0F0F0F0F0F0F0F0F
#define INTERLEAVE_BY_08 0x00FF00FF00FF00FF
#define INTERLEAVE_BY_16 0x0000FFFF0000FFFF
// function body with `r` defined previously
r = ((r >> 1) & INTERLEAVE_BY_01) + ((r & INTERLEAVE_BY_01) << 1);
r = ((r >> 2) & INTERLEAVE_BY_02) + ((r & INTERLEAVE_BY_02) << 2);
r = ((r >> 4) & INTERLEAVE_BY_04) + ((r & INTERLEAVE_BY_04) << 4);
r = ((r >> 8) & INTERLEAVE_BY_08) + ((r & INTERLEAVE_BY_08) << 8);
r = ((r >> 16) & INTERLEAVE_BY_16) + ((r & INTERLEAVE_BY_16) << 16);
r = (r >> 32) + (r << 32);
// finish reversing
With this optimization, the algorithm improves time complicity form
With different optimization levels of gcc
, the comparison between versions 1 and 2 is as follows.
Performance evaluation (number of CSR cycles) of the four test cases. These four tests are the same across all versions in the "archived" directory. These four tests are inputting 0, 1, 0x00000C0000000003 and 0x0F000000000000F0 to the is_palindrome
function, respectively. And across the versions, all the results are verified before any evaluation.
In the following table, for the cells that contain two numbers, the first number is the result from v1, and the latter is from v2. Since optimization level of -Ofast
and -O3
yield the same result for all tests in all versions, the former one is not included in the tables in this report.
Opt. | # of inst. ret. | Case 1 | Case 2 | Case 3 | Case 4 |
---|---|---|---|---|---|
-O0 |
720/727 | 3674/756 | 3653/735 | 0/0 | 3769/851 |
-O1 |
734/734 | 923/235 | 0/0 | 0/0 | 0/0 |
-O2 |
734/734 | 920/235 | 0/0 | 0/0 | 0/0 |
-O3 |
730/734 | 894/223 | 0/0 | 0/0 | 0/0 |
-Os |
784/734 | 131/260 | 0/129 | 0/0 | 0/0 |
Program size evaluation is as follows. The numbers are all the number of bytes of the entire program in decimal.
Opt. | V1 size | V2 size | V2/V1 Ratio |
---|---|---|---|
-O0 |
10428 | 11384 | 109% |
-O1 |
8796 | 9068 | 103% |
-O2 |
8792 | 9064 | 103% |
-O3 |
9124 | 9396 | 103% |
-Os |
8776 | 9032 | 103% |
The code of this version is in the directory archived/v3-write.
In this version, I replace function calls of printf
with write
, for the sake of linking error for the object file that contains printf
when I want to link my object files. In detail, I didn't successfully link /usr/lib/x86_64-linux-gnu/libc.a
or /usr/lib/x86_64-linux-gnu/libc.so
. (I am glad of it.)
Moreover, in this version, I made the following features:
main
function work with rv32emu, replace jr ra
or ret
in the end of the main function with the following assembly
li a7, 93 # "exit" syscall
add a0, x0, 0 # Use 0 return code
ecall # invoke syscall to terminate the program
write
function in rv32emu, in assembly files, replace call write
with li a7, 64
and ecall
.call __udivdi3
with divu a0,a0,a2
, and call __umoddi3
with remu a0,a0,a2
.The results of this version show that given almost the similar printed messages to the stdout, the size of the ELF in this version is much smaller than that in v2.
In the following table, for the cells that have two numbers, the first one the the result of v2, and the latter is of v3.
With the optimization level -Os
, the compiler emits errors for the undefined reference to memcpy
, __ashldi3
, __lshrdi3
. However, I didn't even call these functions in C, so I just get rid of the result for this optimization level.
Opt. | # of inst. ret. | Case 1 | Case 2 | Case 3 | Case 4 |
---|---|---|---|---|---|
-O0 |
727/34 | 756/756 | 735/735 | 0/0 | 851/752 |
-O1 |
734/38 | 235/235 | 0/0 | 0/0 | 0/0 |
-O2 |
734/37 | 235/235 | 0/0 | 0/0 | 0/0 |
-O3 |
734/42 | 223/223 | 0/0 | 0/0 | 0/0 |
Program size evaluation is as follows.
Opt. | V2 size | V3 size | V3/V2 ratio |
---|---|---|---|
-O0 |
11384 | 3324 | 29% |
-O1 |
9068 | 1012 | 11% |
-O2 |
9064 | 1008 | 11% |
-O3 |
9396 | 1340 | 14% |