# Improve JPEG Encode (tweaked for rv32emu) contributed by < [`JoshuaLee0321`](https://github.com/JoshuaLee0321/JPEG_Encoder.git), [`Paintakotako`](https://github.com/Paintako) > # About JPEG **JPEG is a compression algorithm, not a file format.** ## How to determine a file is a `jpeg` format? Almost every binary file contains a couple of markers (or headers). They are very crucial for making sense of a file and are used by programs like file (on Mac/Linux) to tell us details about a file. These markers define where some specific information in a file is stored. For example: * ELF executables start with the byte `7F` followed by "ELF" (7F 45 4C 46). * HTTP/2 connections are opened with the preface `0x505249202a20485454502f322e300d0a0d0a534d0d0a0d0a`, or `PRI *` * `HTTP/2.0\r\n\r\nSM\r\n\r\n`. The preface is designed to avoid the processing of frames by servers and intermediaries which support earlier versions of HTTP but not 2.0. * GIF image files have the ASCII code for `GIF89a` (47 49 46 38 39 61) or `GIF87a` (47 49 46 38 37 61) * JPEG image files begin with `FF D8` and end with `FF D9`. References: [Magic_number_(programming)](https://en.wikipedia.org/wiki/Magic_number_(programming)) So, if we want to encode a pictrue into JPEG, we must follow the abovementioned format. ## JPEG workflow > The JPEG standard basically has four different operational modes: * Baseline Sequential DCT-based Coding * Progressive DCT-based Coding * Lossless Coding * Hierarchical Coding > The commonly referred JPEG compression mode is essentially the Baseline Sequential DCT-based Coding mode. In this mode, compression is based on the Sequential Discrete Cosine Transform (DCT), which is widely used for encoding JPEG images. > Major steps of JPEG encoding: * DCT (Discrete Cosine Transformation) * Quantization * Zigzag Scan * DPCM on DC component * RLE on AC Components * Entropy Coding ## JPEG steps 0. [RGB color model](https://en.wikipedia.org/wiki/RGB_color_model) > An RGB triplet (r,g,b) represents the `three-dimensional coordinate` of the point of the given color within the cube or its faces or along its edges. 1. [Color Space Conversion](https://en.wikipedia.org/wiki/YCbCr) > On this step the colors of the `RGB` coordinates are converted to $YC_bC_r$. During this conversion we take three standard channels (`RGB`) and convert them into a different representation whichis based on a luminance channel and two opposing color channels. These three channels are typically less interdependent then the RGB channels. That allows us to store them with different resolution. If the image is `CMYK` or `grayscale`, it is processed without any color conversion. * basically is to convert RBG into less data > The conversion formula is as follows: > $Y = 0.299R + 0.587G + 0.114B$ > $C_b = -0.168R - 0.331G + 0.499B$ > $C_r = 0.5R - 0.419G - 0.081B$ 2. Chroma Subsampling > It is widely known that we are much more sensitive to changes in luminance than in chromainance. It means thatneglect larger changes in the chrominance without affecting our perception of the image. ## Discrete Cosine Transform (DCT) > The discrete cosine transform (DCT) helps separate the image into parts (or spectral sub-bands) of differing importance (with respect to the image's visual quality). The DCT is similar to the discrete Fourier transform: it transforms a signal or image from the spatial domain to the frequency domain Before we apply JPEG compression to an image, it converts the image into chunks of 8x8 blocks of pixels. [Ref](https://users.cs.cf.ac.uk/Dave.Marshall/Multimedia/node231.html#sec:DCT) # Run Before running, we need to generate excutable file (ELF). The following process can generate ELF step by step ## Debugging Follow the [RISCV-asm-manual-pseudo-ops](https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops), the code which was written by the auther has some syntax error. ## Assembly to object file ```sh riscv-none-elf-as -R -march=rv32i -mabi=ilp32 -o hello.o hello.S ``` * -R option: [fold data section](https://en.wikipedia.org/wiki/Code_folding) into text section We can use the following assembly code to observe difference between none folding data and folding data. ```asm # RISC-V assembly program to print "Hello World!" to stdout. .org 0 # Provide program starting address to linker .global _start /* newlib system calls */ .set SYSEXIT, 93 .set SYSWRITE, 64 .section .rodata str: .ascii "Hello World!\n" .set str_size, .-str # .-: computes the length of given string size # data section .data w1: .word 0x55555555 w2: .word 0x33333333 w3: .word 0x0f0f0f0f w4: .word 0x0000007f test: .word 0x00000001 .text _start: li t0, 0 li t1, 5 # dummy test for jal instruction .L1: jal x0, .L2 .L2: nop loop: beq t0, t1, end li a7, SYSWRITE # "write" syscall li a0, 1 # 1 = standard output (stdout) la a1, str # load address of hello string li a2, str_size # length of hello string ecall # invoke syscall to print the string addi t0, t0, 1 j loop end: li a7, SYSEXIT # "exit" syscall add a0, x0, 0 # Use 0 return code ecall # invoke syscall to terminate the program ``` `.org`, `.set`, `.-str`: These instuction was not found in [RISCV-asm manual](https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md), but in [GNU assembler manual](https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_node/as_toc.html). ```bash $ riscv-none-elf-as -R -march=rv32i -mabi=ilp32 -o hello.o hello.S $ riscv-none-objdump -d hello.o > h1.txt $ riscv-none-elf-as -march=rv32i -mabi=ilp32 -o hello.o hello.S $ riscv-none-objdump -d hello.o > h2.txt $ diff h1.txt h2.txt > > 00000040 <w1>: > 40: 55555555 .word 0x55555555 > > 00000044 <w2>: > 44: 33333333 .word 0x33333333 > > 00000048 <w3>: > 48: 0f0f0f0f .word 0x0f0f0f0f > > 0000004c <w4>: > 4c: 0000007f .word 0x0000007f > > 00000050 <test>: > 50: 00000001 .word 0x00000001 ``` The `.data` section is inserted into the `.text` section. ## Generating ELF file Using Makefile can help us generate ELF efficeintly, using the following code: ```make # Created by Paintakotako on 2024/1/7. # AS: assembler, convert assembly code to object file. AS = riscv-none-elf-as # CC: compiler, convert C code to object file. CC = riscv-none-elf-gcc # march: machine architecture, using rv32im, because mul and div are needed. CFLAGS = -march=rv32im -mabi=ilp32 # wildcard: find all files with specific suffix, *.s means all files with suffix .s ASSEMBLY_FILES = $(wildcard *.asm) OBJECT_FILES = $(ASSEMBLY_FILES:.asm=.o) all: main.elf # Rule to generate .o file from .s file %.o: %.asm @$(AS) $(CFLAGS) -o $@ $< main.elf: $(OBJECT_FILES) @$(CC) -g -o $@ $^ @echo "Generate main.elf successfully!" @rm -f *.o .PHONY: all clean clean: @rm -f *.o *.elf *.bin *.hex ``` After generating ELF file, we can disassemble the elf file in order to find out whether all `.asm` file is linked correctly ```bash $ riscv-none-objdump -d main.elf ``` By observing the disassemly code, we find that all external symbols are correctly inserted into the ELF file, which meets out expectations. # Tweaked for [rv32emu](https://github.com/sysprog21/rv32emu) Running on [rv32emu](https://github.com/sysprog21/rv32emu), reading the docs section, `RISC-V calling conventions` is slightly different to the origin version of RISC-V, when we try to run raw riscv version assembly on emulator, warning such like `unknown syscall 0` arise. So, we need to modify origin code in order to run correctly on `rv32emu`. ## Print with rv32emu Using the syscall provided for RISC-V does not work on rv32emu, so we need to modify the original code in order to get our output correctly. ## Problems with expanding macro When expanding a macro, we discovered that labels defined within the macro are redundantly expanded. This leads to confusion during the assembling stage, as it is uncertain which macro to jump to. The assembler then signals a duplicate label definition. To address this issue, we need to move the labels defined within the macro to the external scope to resolve the problem of redundant expansion. Details can refer to this [commit](https://github.com/lorenz369/JPEG_Encoder/commit/2057a54fac1603a45f08decd3e9a88c9ba18a2bb). # Performance analyze Using system call [clock_gettime](https://linux.die.net/man/3/clock_gettime) in order to evaluate each functions eclapes cycles. ```diff + .equ CLKGET, 403 + gettime: + addi sp,sp,-32 + sw ra,28(sp) + mv a1,sp + li a0,0 + li a7, CLKGET + call printFunct + lw a1,8(sp) + lui a0,%hi(.LC1) + addi a0,a0,%lo(.LC1) + call printf + lw a0,8(sp) + lw ra,28(sp) + addi sp,sp,32 + jr ra + .size gettime, .-gettime + .align 1 + .globl main + .type main, @function ``` Details can refer to this [commit](https://github.com/lorenz369/JPEG_Encoder/commit/b91ddf64516dbebf66a443362adcfefe3b95f285) After implementing rdcycle function call, we can get the following result: | Function | Clock cycyles | | -------- | -------- | | rgb2y | 1430 | | dct_c | 700 | | dct_r | 684 | | nq_y | 919 | | nzigzag | 144 | | rle | 10138 | | nhuffman_y | 5109 | | norganize | 2486 | Total cycles elapsed: 27330 # Improve performance :::warning Show me more about optimizations. ::: # Reference [Introduction to JPEG](http://twins.ee.nctu.edu.tw/courses/soclab_04/lab_hw_pdf/proj1_jpeg_introduction.pdf) [Implementation by Python](https://yasoob.me/posts/understanding-and-writing-jpeg-decoder-in-python/#huffman-encoding) [The 101 of ELF files on Linux: Understanding and Analysis](https://linux-audit.com/elf-binaries-on-linux-understanding-and-analysis/#elf-header)