Improve JPEG Encode (tweaked for rv32emu)

contributed by < JoshuaLee0321, Paintakotako >

About JPEG

JPEG is a compression algorithm, not a file format.

How to determine a file is a `jpeg` format?

Almost every binary file contains a couple of markers (or headers).

They are very crucial for making sense of a file and are used by programs like file (on Mac/Linux) to tell us details about a file. These markers define where some specific information in a file is stored.

For example:

ELF executables start with the byte 7F followed by "ELF" (7F 45 4C 46).
HTTP/2 connections are opened with the preface 0x505249202a20485454502f322e300d0a0d0a534d0d0a0d0a, or PRI *
HTTP/2.0\r\n\r\nSM\r\n\r\n. The preface is designed to avoid the processing of frames by servers and intermediaries which support earlier versions of HTTP but not 2.0.
GIF image files have the ASCII code for GIF89a (47 49 46 38 39 61) or GIF87a (47 49 46 38 37 61)
JPEG image files begin with FF D8 and end with FF D9.

References: Magic_number_(programming)

So, if we want to encode a pictrue into JPEG, we must follow the abovementioned format.

JPEG workflow

The JPEG standard basically has four different operational modes:

Baseline Sequential DCT-based Coding
Progressive DCT-based Coding
Lossless Coding
Hierarchical Coding

The commonly referred JPEG compression mode is essentially the Baseline Sequential DCT-based Coding mode. In this mode, compression is based on the Sequential Discrete Cosine Transform (DCT), which is widely used for encoding JPEG images.

Major steps of JPEG encoding:

DCT (Discrete Cosine Transformation)
Quantization
Zigzag Scan
DPCM on DC component
RLE on AC Components
Entropy Coding

JPEG steps

RGB color model

An RGB triplet (r,g,b) represents the three-dimensional coordinate of the point of the given color within the cube or its faces or along its edges.

Color Space Conversion

On this step the colors of the RGB coordinates are converted to
$Y C_{b} C_{r}$ . During this conversion we take three standard channels (RGB) and convert them into a different representation whichis based on a luminance channel and two opposing color channels. These three channels are typically less interdependent then the RGB channels. That allows us to store them with different resolution. If the image is CMYK or grayscale, it is processed without any color conversion.

basically is to convert RBG into less data

The conversion formula is as follows:

$Y = 0.299 R + 0.587 G + 0.114 B$

$C_{b} = - 0.168 R - 0.331 G + 0.499 B$

$C_{r} = 0.5 R - 0.419 G - 0.081 B$

Chroma Subsampling

It is widely known that we are much more sensitive to changes in luminance than in chromainance. It means thatneglect larger changes in the chrominance without affecting our perception of the image.

Discrete Cosine Transform (DCT)

The discrete cosine transform (DCT) helps separate the image into parts (or spectral sub-bands) of differing importance (with respect to the image's visual quality). The DCT is similar to the discrete Fourier transform: it transforms a signal or image from the spatial domain to the frequency domain

Before we apply JPEG compression to an image, it converts the image into chunks of 8x8 blocks of pixels.

Ref

Run

Before running, we need to generate excutable file (ELF).

The following process can generate ELF step by step

Debugging

Follow the RISCV-asm-manual-pseudo-ops, the code which was written by the auther has some syntax error.

Assembly to object file

riscv-none-elf-as -R -march=rv32i -mabi=ilp32 -o hello.o hello.S

-R option: fold data section into text section

We can use the following assembly code to observe difference between none folding data and folding data.

# RISC-V assembly program to print "Hello World!" to stdout.

.org 0
# Provide program starting address to linker
.global _start

/* newlib system calls */
.set SYSEXIT,  93
.set SYSWRITE, 64

.section .rodata
str: .ascii "Hello World!\n"
     .set str_size, .-str    # .-: computes the length of given string size

# data section
.data 
    w1: .word 0x55555555
    w2: .word 0x33333333
    w3: .word 0x0f0f0f0f
    w4: .word 0x0000007f
    test: .word 0x00000001

.text
_start:
    li t0, 0
    li t1, 5

    # dummy test for jal instruction
.L1:
    jal x0, .L2
.L2:
    nop

loop:
    beq t0, t1, end

    li a7, SYSWRITE     # "write" syscall
    li a0, 1            # 1 = standard output (stdout)
    la a1, str          # load address of hello string
    li a2, str_size     # length of hello string
    ecall               # invoke syscall to print the string
    addi t0, t0, 1
    j loop

end:
    li a7, SYSEXIT      # "exit" syscall
    add a0, x0, 0       # Use 0 return code
    ecall               # invoke syscall to terminate the program

.org, .set, .-str: These instuction was not found in RISCV-asm manual, but in GNU assembler manual.

$ riscv-none-elf-as -R -march=rv32i -mabi=ilp32 -o hello.o hello.S 
$ riscv-none-objdump -d hello.o > h1.txt
$ riscv-none-elf-as -march=rv32i -mabi=ilp32 -o hello.o hello.S 
$ riscv-none-objdump -d hello.o > h2.txt

$ diff h1.txt h2.txt

> 
> 00000040 <w1>:
>   40: 55555555                .word   0x55555555
> 
> 00000044 <w2>:
>   44: 33333333                .word   0x33333333
> 
> 00000048 <w3>:
>   48: 0f0f0f0f                .word   0x0f0f0f0f
> 
> 0000004c <w4>:
>   4c: 0000007f                .word   0x0000007f
> 
> 00000050 <test>:
>   50: 00000001                .word   0x00000001

The .data section is inserted into the .text section.

Generating ELF file

Using Makefile can help us generate ELF efficeintly, using the following code:

# Created by Paintakotako on 2024/1/7.

# AS: assembler, convert assembly code to object file.
AS = riscv-none-elf-as

# CC: compiler, convert C code to object file.
CC = riscv-none-elf-gcc

# march: machine architecture, using rv32im, because mul and div are needed.
CFLAGS = -march=rv32im -mabi=ilp32 

# wildcard: find all files with specific suffix, *.s means all files with suffix .s
ASSEMBLY_FILES = $(wildcard *.asm)
OBJECT_FILES = $(ASSEMBLY_FILES:.asm=.o)

all: main.elf

# Rule to generate .o file from .s file
%.o: %.asm
	@$(AS) $(CFLAGS) -o $@ $<

main.elf: $(OBJECT_FILES)
	@$(CC) -g -o $@ $^
	@echo "Generate main.elf successfully!"
	@rm -f *.o
.PHONY: all clean
clean:
	@rm -f *.o *.elf *.bin *.hex

After generating ELF file, we can disassemble the elf file in order to find out whether all .asm file is linked correctly

$ riscv-none-objdump -d main.elf

By observing the disassemly code, we find that all external symbols are correctly inserted into the ELF file, which meets out expectations.

Tweaked for rv32emu

Running on rv32emu, reading the docs section, RISC-V calling conventions is slightly different to the origin version of RISC-V, when we try to run raw riscv version assembly on emulator, warning such like unknown syscall 0 arise.

So, we need to modify origin code in order to run correctly on rv32emu.

Print with rv32emu

Using the syscall provided for RISC-V does not work on rv32emu, so we need to modify the original code in order to get our output correctly.

Problems with expanding macro

When expanding a macro, we discovered that labels defined within the macro are redundantly expanded. This leads to confusion during the assembling stage, as it is uncertain which macro to jump to. The assembler then signals a duplicate label definition. To address this issue, we need to move the labels defined within the macro to the external scope to resolve the problem of redundant expansion.

Details can refer to this commit.

Performance analyze

Using system call clock_gettime in order to evaluate each functions eclapes cycles.

+ 	.equ CLKGET, 403
+ gettime:
+      addi sp,sp,-32
+	sw	ra,28(sp)
+	mv	a1,sp
+	li	a0,0
+	li	a7, CLKGET

+	call printFunct
+	lw	a1,8(sp)
+	lui	a0,%hi(.LC1)
+	addi	a0,a0,%lo(.LC1)
+	call	printf
+	lw	a0,8(sp)
+	lw	ra,28(sp)
+	addi	sp,sp,32
+	jr	ra
+	.size	gettime, .-gettime
+	.align	1
+	.globl	main
+	.type	main, @function

Details can refer to this commit

After implementing rdcycle function call, we can get the following result:

Function	Clock cycyles
rgb2y	1430
dct_c	700
dct_r	684
nq_y	919
nzigzag	144
rle	10138
nhuffman_y	5109
norganize	2486

Total cycles elapsed: 27330

Improve performance

Show me more about optimizations.

Reference

Introduction to JPEG

Implementation by Python

The 101 of ELF files on Linux: Understanding and Analysis