# RVV-accelerated Image Codec
> 洪至謙, 曾遠哲
## What is RVV?
The RISC-V Vector Extension (RVV) is a key component of the RISC-V instruction set architecture, providing efficient vector computation capabilities.
### Scalable Vector Length
- Supports various vector register lengths (VLEN), allowing flexibility across different hardware platforms.
- Dynamically sets the vector length using the `vsetvli` instruction.
- Enables the same code to run vector operations of varying lengths.
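The strip-mining pattern that `vsetvli` enables can be modelled in plain C. The sketch below is a scalar emulation, not RVV code: `VLMAX_MODEL`, `model_vsetvl` and `strip_mined_add` are hypothetical names, and the model only captures the basic rule that each pass is granted at most the hardware maximum (real hardware is allowed some latitude in how it splits the last passes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware's maximum vector length in elements. */
#define VLMAX_MODEL 16

/* Models the basic contract of vsetvli: grant at most VLMAX elements per pass. */
static size_t model_vsetvl(size_t avl) {
    return avl < VLMAX_MODEL ? avl : VLMAX_MODEL;
}

/* Adds two byte arrays strip by strip and returns the number of passes taken. */
static size_t strip_mined_add(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    size_t passes = 0;
    assert(dst != NULL && a != NULL && b != NULL);
    while (n > 0) {
        size_t vl = model_vsetvl(n);        /* elements handled in this pass */
        for (size_t i = 0; i < vl; i++) {   /* stands in for one vector add */
            dst[i] = (uint8_t)(a[i] + b[i]);
        }
        dst += vl;
        a += vl;
        b += vl;
        n -= vl;
        passes++;
    }
    return passes;
}
```

The loop body never mentions the array length directly, which is why the same code can run unchanged on hardware with a different maximum vector length.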
### Vector Operation Instructions
- Supports basic arithmetic operations.
- Bitwise operations.
- Load/store instructions.
- Vector reduction operations.
- Vector masking operations.
## What is QOI?
QOI (Quite OK Image Format) is a lossless image compression format designed for simplicity and speed.
- Speed: compared with `stb_image_write` and `stb_image`, QOI offers 20x-50x faster encoding and 3x-4x faster decoding.
- Supports RGB and RGBA: handles images with and without an alpha channel.
### QOI file structure
1. Header (14 bytes):
   1. magic bytes (`"qoif"`): this string identifies the file as a valid QOI file.
   2. image width
   3. image height
   4. number of channels
      - **3** indicates that the image uses the RGB color mode.
      - **4** indicates that the image uses the RGBA color mode.
   5. colorspace info
      - **0** indicates sRGB.
      - **1** indicates linear RGB.
```c
qoi_header {
char magic[4]; // magic bytes "qoif"
uint32_t width; // image width in pixels (BE)
uint32_t height; // image height in pixels (BE)
uint8_t channels; // 3 = RGB, 4 = RGBA
uint8_t colorspace; // 0 = sRGB with linear alpha
// 1 = all channels linear
};
```
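As a rough illustration of the 14-byte layout above, the following sketch serializes and parses the header with explicit big-endian byte handling. The names `header_fields`, `write_header` and `read_header` are hypothetical helpers, not part of `qoi.h`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QOI_HEADER_BYTES 14

/* Hypothetical helper type holding the decoded header fields. */
typedef struct {
    uint32_t width;
    uint32_t height;
    uint8_t channels;
    uint8_t colorspace;
} header_fields;

/* Stores a 32-bit value most significant byte first (big-endian). */
static void put_be32(uint8_t *out, uint32_t v) {
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Reads a big-endian 32-bit value. */
static uint32_t get_be32(const uint8_t *in) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Writes magic, width, height, channels and colorspace into 14 bytes. */
static void write_header(uint8_t out[QOI_HEADER_BYTES], const header_fields *h) {
    assert(h->channels == 3 || h->channels == 4);
    memcpy(out, "qoif", 4);
    put_be32(out + 4, h->width);
    put_be32(out + 8, h->height);
    out[12] = h->channels;
    out[13] = h->colorspace;
}

/* Returns 1 and fills *h when the magic bytes match, 0 otherwise. */
static int read_header(const uint8_t in[QOI_HEADER_BYTES], header_fields *h) {
    if (memcmp(in, "qoif", 4) != 0) {
        return 0;
    }
    h->width = get_be32(in + 4);
    h->height = get_be32(in + 8);
    h->channels = in[12];
    h->colorspace = in[13];
    return 1;
}
```

Writing the bytes one at a time keeps the result independent of the host's endianness, which matters because RISC-V is normally little-endian while the header fields are big-endian.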
Images are encoded row by row, left to right, top to bottom.
The encoder and decoder both start with `{r:0, g:0, b:0, a:255}` as the previous pixel value.
The image is complete once all $\text{width} \times \text{height}$ pixels have been filled.
Each pixel is encoded as one of the following:
1. a run of the previous pixel (run-length encoding)
2. an index into an array of previously seen pixels
3. a difference to the previous pixel value in `r,g,b` (only when the difference is very small, e.g. in images with anti-aliasing)
4. full `r,g,b` or `r,g,b,a` values
Note: the color channels are assumed not to be premultiplied with the alpha channel.
The array of previously seen pixels has 64 entries; each pixel is stored at the position given by:
$$ \text{index_position} = (r \times 3 + g \times 5 + b \times 7 + a \times 11) \% 64$$
:::info
This is a simple hash function that is cheap to compute and aims to keep collisions low.
:::
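The index formula can be written directly in C. This is a minimal sketch; the `pixel` type and the `index_position` function name are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pixel type for this example. */
typedef struct { uint8_t r, g, b, a; } pixel;

/* Position of a pixel in the 64-entry array of previously seen pixels. */
static unsigned index_position(pixel p) {
    unsigned pos = (p.r * 3u + p.g * 5u + p.b * 7u + p.a * 11u) % 64u;
    assert(pos < 64u);
    return pos;
}
```

For the start pixel `{0, 0, 0, 255}` the sum is 255 × 11 = 2805, and 2805 % 64 = 53. The pixel `{64, 0, 0, 255}` also lands on 53, so collisions do occur; in that case the newer pixel simply overwrites the older entry.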
Every chunk starts with a 2-bit or 8-bit tag, followed by a number of data bits.
All chunks are byte aligned (the bit length of every chunk is divisible by 8).
All values encoded in the data bits have their most significant bit on the left.
The 8-bit tags have precedence over the 2-bit tags.
(A decoder must check for the presence of an 8-bit tag first.)
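The precedence rule can be sketched as a small classifier that tests the two 8-bit tags before masking out the 2-bit tag. The enum and function names are hypothetical; the tag values are the ones defined for the chunk types in this document.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the six chunk types. */
typedef enum { OP_INDEX, OP_DIFF, OP_LUMA, OP_RUN, OP_RGB, OP_RGBA } chunk_kind;

/* Classifies the first byte of a chunk; 8-bit tags are checked first. */
static chunk_kind classify_tag(uint8_t b1) {
    if (b1 == 0xfe) return OP_RGB;
    if (b1 == 0xff) return OP_RGBA;
    switch (b1 & 0xc0) {            /* keep only the top two bits */
    case 0x00: return OP_INDEX;
    case 0x40: return OP_DIFF;
    case 0x80: return OP_LUMA;
    default:   return OP_RUN;
    }
}

/* Total size of a chunk in bytes, including the tag byte. */
static int chunk_size(chunk_kind k) {
    assert(k >= OP_INDEX && k <= OP_RGBA);
    switch (k) {
    case OP_RGB:  return 4;
    case OP_RGBA: return 5;
    case OP_LUMA: return 2;
    default:      return 1;
    }
}
```

If the order of the checks were reversed, `0xfe` and `0xff` would be misread as runs, because their top two bits are also `11`.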
2. Data Chunks:
- QOI_OP_RGBA/QOI_OP_RGB
QOI_OP_RGB
|Byte[0]|Byte[1]|Byte[2]|Byte[3]|
|---|---|---|---|
|7 6 5 4 3 2 1 0|$7 \dots 0$|$7 \dots 0$|$7 \dots 0$|
|1 1 1 1 1 1 1 0|red|green|blue|
QOI_OP_RGBA
|Byte[0]|Byte[1]|Byte[2]|Byte[3]|Byte[4]|
|---|---|---|---|---|
|7 6 5 4 3 2 1 0|$7 \dots 0$|$7 \dots 0$|$7 \dots 0$|$7 \dots 0$|
|1 1 1 1 1 1 1 1|red|green|blue|alpha|
- QOI_OP_INDEX:
|Byte[0]|-|-|-|-|-|-|-|
|---|---|---|---|---|---|---|---|
|7|6|5|4|3|2|1|0|
|0|0|index|-|-|-|-|-|
- QOI_OP_DIFF:
|Byte[0]|-|-|-|-|-|-|-|
|---|---|---|---|---|---|---|---|
|7|6|5|4|3|2|1|0|
|0|1|dr|-|dg|-|db|-|
2-bit tag b01
2-bit red channel difference from the previous pixel -2..1
2-bit green channel difference from the previous pixel -2..1
2-bit blue channel difference from the previous pixel -2..1
:::info
The differences are applied to the current channel values with wraparound arithmetic.
E.g.:
1 - 2 -> 255
255 + 1 -> 0
Values are stored as unsigned integers with a bias of 2.
E.g.:
-2 -> 0 (b00)
1 -> 3 (b11)
:::
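The bias and wraparound rules for `QOI_OP_DIFF` can be illustrated with a small encode/decode pair. This is a sketch under the assumption that only the three colour channels are passed in; `diff_encode` and `diff_decode` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 and writes the chunk byte if all three deltas fit in -2..1. */
static int diff_encode(const uint8_t prev[3], const uint8_t cur[3], uint8_t *out) {
    int8_t d[3];
    for (int i = 0; i < 3; i++) {
        d[i] = (int8_t)(uint8_t)(cur[i] - prev[i]);   /* wraparound difference */
        if (d[i] < -2 || d[i] > 1) {
            return 0;
        }
    }
    /* tag b01, then dr, dg, db stored with a bias of 2 */
    *out = (uint8_t)(0x40 | ((d[0] + 2) << 4) | ((d[1] + 2) << 2) | (d[2] + 2));
    return 1;
}

/* Applies a QOI_OP_DIFF byte to the previous pixel in place. */
static void diff_decode(uint8_t b1, uint8_t px[3]) {
    assert((b1 & 0xc0) == 0x40);
    px[0] = (uint8_t)(px[0] + ((b1 >> 4) & 0x03) - 2);
    px[1] = (uint8_t)(px[1] + ((b1 >> 2) & 0x03) - 2);
    px[2] = (uint8_t)(px[2] + (b1 & 0x03) - 2);
}
```

With a previous pixel of `(1, 255, 10)` and a current pixel of `(255, 0, 10)`, the differences are -2, +1 and 0, which are stored as `b00`, `b11` and `b10`, giving the byte `0x4e`.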
- QOI_OP_LUMA:
|Byte[0]|-|-|-|-|-|-|-|
|---|---|---|---|---|---|---|---|
|7|6|5|4|3|2|1|0|
|1|0|diff green|-|-|-|-|-|
|Byte[1]|-|-|-|-|-|-|-|
|---|---|---|---|---|---|---|---|
|7|6|5|4|3|2|1|0|
|dr-dg|-|-|-|db-dg|-|-|-|
2-bit tag b10
6-bit green channel difference from the previous pixel -32..31
4-bit red channel difference minus green channel difference -8..7
4-bit blue channel difference minus green channel difference -8..7
:::info
The `green` channel difference indicates the general direction of change and is encoded in 6 bits.
The `red` and `blue` channels (`dr` and `db`) base their differences on the green channel difference.
I.e.:
dr_dg = (cur_px.r - prev_px.r) - (cur_px.g - prev_px.g)
db_dg = (cur_px.b - prev_px.b) - (cur_px.g - prev_px.g)
The differences are applied to the current channel values with wraparound arithmetic.
E.g.:
10 - 13 -> 253
250 + 7 -> 1
Values are stored as unsigned integers with a bias of 32 for the green channel and a bias of 8 for the red and blue channel.
:::
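The two-byte `QOI_OP_LUMA` layout can be sketched the same way. The function names are hypothetical and only the three colour channels are handled, on the assumption that the caller has already checked that alpha is unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 and writes two chunk bytes if the deltas fit the LUMA ranges. */
static int luma_encode(const uint8_t prev[3], const uint8_t cur[3], uint8_t out[2]) {
    int8_t dr = (int8_t)(uint8_t)(cur[0] - prev[0]);   /* wraparound differences */
    int8_t dg = (int8_t)(uint8_t)(cur[1] - prev[1]);
    int8_t db = (int8_t)(uint8_t)(cur[2] - prev[2]);
    int8_t dr_dg = (int8_t)(dr - dg);
    int8_t db_dg = (int8_t)(db - dg);
    if (dg < -32 || dg > 31 || dr_dg < -8 || dr_dg > 7 || db_dg < -8 || db_dg > 7) {
        return 0;
    }
    out[0] = (uint8_t)(0x80 | (dg + 32));                 /* tag b10, green bias 32 */
    out[1] = (uint8_t)(((dr_dg + 8) << 4) | (db_dg + 8)); /* red/blue bias 8 */
    return 1;
}

/* Applies a QOI_OP_LUMA chunk to the previous pixel in place. */
static void luma_decode(const uint8_t in[2], uint8_t px[3]) {
    assert((in[0] & 0xc0) == 0x80);
    int dg = (in[0] & 0x3f) - 32;
    px[0] = (uint8_t)(px[0] + dg - 8 + ((in[1] >> 4) & 0x0f));
    px[1] = (uint8_t)(px[1] + dg);
    px[2] = (uint8_t)(px[2] + dg - 8 + (in[1] & 0x0f));
}
```

Going from `(100, 100, 100)` to `(115, 110, 108)` gives dg = 10, dr_dg = 5 and db_dg = -2, so the chunk bytes are `0xaa` and `0xd6`.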
- QOI_OP_RUN:
|Byte[0]|-|-|-|-|-|-|-|
|---|---|---|---|---|---|---|---|
|7|6|5|4|3|2|1|0|
|1|1|run|-|-|-|-|-|
:::warning
The run-length is stored with a bias of -1.
Note that the runlengths 63 and 64 (b111110 and b111111) are illegal as they are occupied by the `QOI_OP_RGB` and `QOI_OP_RGBA` tags.
:::
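A minimal sketch of the run-length rule: a long run is split into chunks of at most 62 pixels, and each length is stored minus one. `RUN_MAX`, `emit_runs` and `run_length` are hypothetical names used only in this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RUN_MAX 62   /* 63 and 64 would collide with the RGB/RGBA tags */

/* Emits QOI_OP_RUN bytes for `count` repeats of the previous pixel.
   Returns the number of bytes written to `out`. */
static size_t emit_runs(uint8_t *out, size_t count) {
    size_t n = 0;
    assert(out != NULL);
    while (count > 0) {
        size_t run = count < RUN_MAX ? count : RUN_MAX;
        out[n++] = (uint8_t)(0xc0 | (run - 1));   /* stored with a bias of -1 */
        count -= run;
    }
    return n;
}

/* Recovers the run length (1..62) from a QOI_OP_RUN byte. */
static unsigned run_length(uint8_t b1) {
    assert((b1 & 0xc0) == 0xc0 && b1 < 0xfe);
    return (b1 & 0x3fu) + 1u;
}
```

A run of 100 identical pixels therefore becomes two bytes: `0xfd` (62 pixels) followed by `0xe5` (38 pixels).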
3. End Marker (8 bytes): seven `0x00` bytes followed by a single `0x01` byte.
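A quick way to sanity-check an encoded buffer is to look for the 8-byte end marker at its tail. The byte pattern below is the same one the encoder source in this document stores in `qoi_padding`; the helper name `has_end_marker` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same byte pattern as qoi_padding in the encoder source. */
static const uint8_t end_marker[8] = {0, 0, 0, 0, 0, 0, 0, 1};

/* Returns 1 if the last 8 bytes of the buffer are the end marker. */
static int has_end_marker(const uint8_t *data, size_t size) {
    assert(data != NULL);
    if (size < sizeof(end_marker)) {
        return 0;
    }
    return memcmp(data + size - sizeof(end_marker), end_marker, sizeof(end_marker)) == 0;
}
```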
## QOI encoder - 曾遠哲
:::danger
The following code is untested.
The program cannot read the binary file when run under the QEMU emulator. The cause is not yet clear: QEMU is used in user mode rather than system mode, so there should be no fully isolated hardware environment separating the program from the host's files.
:::
### Baseline implementation
`qoi.h`
```c
// qoi.h
#ifndef QOI_H
#define QOI_H
#ifdef __cplusplus
extern "C" {
#endif
#define QOI_SRGB 0 // Standard RGB colorspace with linear alpha
#define QOI_LINEAR 1 // All channels are linear
// Description of the image - width, height, channels, and colorspace
typedef struct {
unsigned int width;
unsigned int height;
unsigned char channels; // 3 = RGB, 4 = RGBA
unsigned char colorspace; // 0 = sRGB, 1 = linear
} qoi_desc;
// Core encoding function: converts raw pixels to QOI format
void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len);
// Core decoding function: converts QOI format back to raw pixels
void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels);
// File handling convenience functions
int qoi_write(const char *filename, const void *data, const qoi_desc *desc);
void *qoi_read(const char *filename, qoi_desc *desc, int channels);
#ifdef __cplusplus
}
#endif
#endif // QOI_H
#ifdef QOI_IMPLEMENTATION
// Include necessary headers
#include <stdlib.h>
#include <string.h>
// If stdio functions are needed
#ifndef QOI_NO_STDIO
#include <stdio.h>
#endif
// Allow custom memory management
#ifndef QOI_MALLOC
#define QOI_MALLOC(sz) malloc(sz)
#define QOI_FREE(p) free(p)
#endif
// Allow custom array zeroing
#ifndef QOI_ZEROARR
#define QOI_ZEROARR(a) memset((a),0,sizeof(a))
#endif
// Chunk type tags
#define QOI_OP_INDEX 0x00 // 00xxxxxx - 6-bit index into color array
#define QOI_OP_DIFF 0x40 // 01xxxxxx - 2-bit RGB channel differences
#define QOI_OP_LUMA 0x80 // 10xxxxxx - Larger RGB differences
#define QOI_OP_RUN 0xc0 // 11xxxxxx - Run of pixels
#define QOI_OP_RGB 0xfe // 11111110 - Full RGB values
#define QOI_OP_RGBA 0xff // 11111111 - Full RGBA values
#define QOI_MASK_2 0xc0 // Mask for 2-bit tag
// Hash function for the color index array
#define QOI_COLOR_HASH(C) (C.rgba.r*3 + C.rgba.g*5 + C.rgba.b*7 + C.rgba.a*11)
// Magic bytes for file identification
#define QOI_MAGIC \
(((unsigned int)'q') << 24 | ((unsigned int)'o') << 16 | \
((unsigned int)'i') << 8 | ((unsigned int)'f'))
#define QOI_HEADER_SIZE 14
// Maximum image size (400 million pixels) for safety
#define QOI_PIXELS_MAX ((unsigned int)400000000)
// Union for RGBA pixel manipulation
typedef union {
struct { unsigned char r, g, b, a; } rgba;
unsigned int v;
} qoi_rgba_t;
// End-of-stream marker
static const unsigned char qoi_padding[8] = {0,0,0,0,0,0,0,1};
// Helper functions for handling 32-bit values
static void qoi_write_32(unsigned char *bytes, int *p, unsigned int v) {
bytes[(*p)++] = (0xff000000 & v) >> 24;
bytes[(*p)++] = (0x00ff0000 & v) >> 16;
bytes[(*p)++] = (0x0000ff00 & v) >> 8;
bytes[(*p)++] = (0x000000ff & v);
}
static unsigned int qoi_read_32(const unsigned char *bytes, int *p) {
unsigned int a = bytes[(*p)++];
unsigned int b = bytes[(*p)++];
unsigned int c = bytes[(*p)++];
unsigned int d = bytes[(*p)++];
return a << 24 | b << 16 | c << 8 | d;
}
// The core encoding function
void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len) {
int i, max_size, p, run;
int px_len, px_end, px_pos, channels;
unsigned char *bytes;
const unsigned char *pixels;
qoi_rgba_t index[64];
qoi_rgba_t px, px_prev;
// Validate input parameters
if (data == NULL || out_len == NULL || desc == NULL ||
desc->width == 0 || desc->height == 0 ||
desc->channels < 3 || desc->channels > 4 ||
desc->colorspace > 1 ||
desc->height >= QOI_PIXELS_MAX / desc->width)
{
return NULL;
}
// Calculate maximum possible size
max_size = desc->width * desc->height * (desc->channels + 1) +
QOI_HEADER_SIZE + sizeof(qoi_padding);
// Allocate output buffer
bytes = (unsigned char *) QOI_MALLOC(max_size);
if (!bytes) {
return NULL;
}
// Write file header
p = 0;
qoi_write_32(bytes, &p, QOI_MAGIC);
qoi_write_32(bytes, &p, desc->width);
qoi_write_32(bytes, &p, desc->height);
bytes[p++] = desc->channels;
bytes[p++] = desc->colorspace;
// Initialize encoding state
pixels = (const unsigned char *)data;
QOI_ZEROARR(index);
run = 0;
px_prev.rgba.r = 0;
px_prev.rgba.g = 0;
px_prev.rgba.b = 0;
px_prev.rgba.a = 255;
px = px_prev;
// Calculate pixel parameters
px_len = desc->width * desc->height * desc->channels;
px_end = px_len - desc->channels;
channels = desc->channels;
// Main encoding loop
for (px_pos = 0; px_pos < px_len; px_pos += channels) {
// Read pixel values
px.rgba.r = pixels[px_pos + 0];
px.rgba.g = pixels[px_pos + 1];
px.rgba.b = pixels[px_pos + 2];
if (channels == 4) {
px.rgba.a = pixels[px_pos + 3];
}
// Check for run of identical pixels
if (px.v == px_prev.v) {
run++;
if (run == 62 || px_pos == px_end) {
bytes[p++] = QOI_OP_RUN | (run - 1);
run = 0;
}
}
else {
// End any current run
if (run > 0) {
bytes[p++] = QOI_OP_RUN | (run - 1);
run = 0;
}
// Check index for previously seen pixel
int index_pos = QOI_COLOR_HASH(px) % 64;
if (index[index_pos].v == px.v) {
bytes[p++] = QOI_OP_INDEX | index_pos;
}
else {
// Store pixel in index
index[index_pos] = px;
// Check if we can encode a small difference
if (px.rgba.a == px_prev.rgba.a) {
signed char vr = px.rgba.r - px_prev.rgba.r;
signed char vg = px.rgba.g - px_prev.rgba.g;
signed char vb = px.rgba.b - px_prev.rgba.b;
signed char vg_r = vr - vg;
signed char vg_b = vb - vg;
if (vr > -3 && vr < 2 &&
vg > -3 && vg < 2 &&
vb > -3 && vb < 2)
{
// Small difference - use QOI_OP_DIFF
bytes[p++] = QOI_OP_DIFF |
((vr + 2) << 4) |
((vg + 2) << 2) |
(vb + 2);
}
else if (vg_r > -9 && vg_r < 8 &&
vg > -33 && vg < 32 &&
vg_b > -9 && vg_b < 8)
{
// Larger difference - use QOI_OP_LUMA
bytes[p++] = QOI_OP_LUMA | (vg + 32);
bytes[p++] = ((vg_r + 8) << 4) | (vg_b + 8);
}
else {
// Full RGB values needed
bytes[p++] = QOI_OP_RGB;
bytes[p++] = px.rgba.r;
bytes[p++] = px.rgba.g;
bytes[p++] = px.rgba.b;
}
}
else {
// Alpha changed - need full RGBA
bytes[p++] = QOI_OP_RGBA;
bytes[p++] = px.rgba.r;
bytes[p++] = px.rgba.g;
bytes[p++] = px.rgba.b;
bytes[p++] = px.rgba.a;
}
}
}
px_prev = px;
}
// Write end marker
for (i = 0; i < (int)sizeof(qoi_padding); i++) {
bytes[p++] = qoi_padding[i];
}
*out_len = p;
return bytes;
}
// Core decoding function implementing the inverse operations
void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels) {
const unsigned char *bytes;
unsigned int header_magic;
unsigned char *pixels;
qoi_rgba_t index[64];
qoi_rgba_t px;
int px_len, chunks_len, px_pos;
int p = 0, run = 0;
// Input validation
if (data == NULL || desc == NULL ||
(channels != 0 && channels != 3 && channels != 4) ||
size < QOI_HEADER_SIZE + (int)sizeof(qoi_padding))
{
return NULL;
}
// Parse header
bytes = (const unsigned char *)data;
header_magic = qoi_read_32(bytes, &p);
desc->width = qoi_read_32(bytes, &p);
desc->height = qoi_read_32(bytes, &p);
desc->channels = bytes[p++];
desc->colorspace = bytes[p++];
// Validate header
if (desc->width == 0 || desc->height == 0 ||
desc->channels < 3 || desc->channels > 4 ||
desc->colorspace > 1 ||
header_magic != QOI_MAGIC ||
desc->height >= QOI_PIXELS_MAX / desc->width)
{
return NULL;
}
// Set output channels
if (channels == 0) {
channels = desc->channels;
}
// Allocate pixel buffer
px_len = desc->width * desc->height * channels;
pixels = (unsigned char *) QOI_MALLOC(px_len);
if (!pixels) {
return NULL;
}
// Initialize decoder state
QOI_ZEROARR(index);
px.rgba.r = 0;
px.rgba.g = 0;
px.rgba.b = 0;
px.rgba.a = 255;
// Main decoding loop
chunks_len = size - (int)sizeof(qoi_padding);
for (px_pos = 0; px_pos < px_len; px_pos += channels) {
if (run > 0) {
run--;
}
else if (p < chunks_len) {
int b1 = bytes[p++];
if (b1 == QOI_OP_RGB) {
px.rgba.r = bytes[p++];
px.rgba.g = bytes[p++];
px.rgba.b = bytes[p++];
}
else if (b1 == QOI_OP_RGBA) {
px.rgba.r = bytes[p++];
px.rgba.g = bytes[p++];
px.rgba.b = bytes[p++];
px.rgba.a = bytes[p++];
}
else if ((b1 & QOI_MASK_2) == QOI_OP_INDEX) {
px = index[b1];
}
else if ((b1 & QOI_MASK_2) == QOI_OP_DIFF) {
px.rgba.r += ((b1 >> 4) & 0x03) - 2;
px.rgba.g += ((b1 >> 2) & 0x03) - 2;
px.rgba.b += ( b1 & 0x03) - 2;
}
else if ((b1 & QOI_MASK_2) == QOI_OP_LUMA) {
int b2 = bytes[p++];
int vg = (b1 & 0x3f) - 32;
px.rgba.r += vg - 8 + ((b2 >> 4) & 0x0f);
px.rgba.g += vg;
px.rgba.b += vg - 8 + (b2 & 0x0f);
}
else if ((b1 & QOI_MASK_2) == QOI_OP_RUN) {
run = (b1 & 0x3f);
}
index[QOI_COLOR_HASH(px) % 64] = px;
}
// Write pixel values
pixels[px_pos + 0] = px.rgba.r;
pixels[px_pos + 1] = px.rgba.g;
pixels[px_pos + 2] = px.rgba.b;
if (channels == 4) {
pixels[px_pos + 3] = px.rgba.a;
}
}
return pixels;
}
// File handling functions if stdio is enabled
#ifndef QOI_NO_STDIO
// File I/O functions continued...
int qoi_write(const char *filename, const void *data, const qoi_desc *desc) {
FILE *f = fopen(filename, "wb");
int size, err;
void *encoded;
if (!f) {
return 0;
}
// Encode the pixel data into QOI format
encoded = qoi_encode(data, desc, &size);
if (!encoded) {
fclose(f);
return 0;
}
// Write the encoded data to file
fwrite(encoded, 1, size, f);
fflush(f);
err = ferror(f);
fclose(f);
QOI_FREE(encoded);
return err ? 0 : size;
}
void *qoi_read(const char *filename, qoi_desc *desc, int channels) {
FILE *f = fopen(filename, "rb");
int size, bytes_read;
void *pixels, *data;
if (!f) {
return NULL;
}
// Get file size
fseek(f, 0, SEEK_END);
size = ftell(f);
if (size <= 0 || fseek(f, 0, SEEK_SET) != 0) {
fclose(f);
return NULL;
}
// Read entire file into memory
data = QOI_MALLOC(size);
if (!data) {
fclose(f);
return NULL;
}
// Read file content and decode
bytes_read = fread(data, 1, size, f);
fclose(f);
pixels = (bytes_read != size) ? NULL : qoi_decode(data, bytes_read, desc, channels);
QOI_FREE(data);
return pixels;
}
#endif /* QOI_NO_STDIO */
#endif /* QOI_IMPLEMENTATION */
```
The driver below reads a PNG file (using `stb_image.h`) and converts it to QOI format, calling the RVV routine in `vec.S`.
`main.c`
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>
#define STB_IMAGE_IMPLEMENTATION
#define STBI_ONLY_PNG
#define STBI_NO_LINEAR
#include "stb_image.h"
#define QOI_IMPLEMENTATION
#include "qoi.h"
struct rgba_pixel {
unsigned char r, g, b, a;
};
void encode_pixels_rvv(unsigned char *out, const struct rgba_pixel *pixels, int n);
static unsigned char* read_file(const char* filename, size_t* size_out) {
FILE* f = fopen(filename, "rb");
if (!f) {
fprintf(stderr, "Failed to open %s: %s\n", filename, strerror(errno));
return NULL;
}
struct stat st;
if (fstat(fileno(f), &st) != 0) {
fprintf(stderr, "Failed to stat %s: %s\n", filename, strerror(errno));
fclose(f);
return NULL;
}
unsigned char* buffer = malloc(st.st_size);
if (!buffer) {
fprintf(stderr, "Failed to allocate %ld bytes\n", (long)st.st_size);
fclose(f);
return NULL;
}
size_t bytes_read = fread(buffer, 1, st.st_size, f);
if (bytes_read != (size_t)st.st_size) {
fprintf(stderr, "Failed to read file: expected %ld bytes, got %ld\n",
(long)st.st_size, (long)bytes_read);
free(buffer);
fclose(f);
return NULL;
}
fclose(f);
*size_out = st.st_size;
return buffer;
}
static int write_file(const char* filename, const unsigned char* data, size_t size) {
FILE* f = fopen(filename, "wb");
if (!f) {
fprintf(stderr, "Failed to create %s: %s\n", filename, strerror(errno));
return 0;
}
size_t written = fwrite(data, 1, size, f);
if (written != size) {
fprintf(stderr, "Failed to write file: expected %ld bytes, wrote %ld\n",
(long)size, (long)written);
fclose(f);
return 0;
}
fclose(f);
return 1;
}
int main(int argc, char **argv) {
if (argc != 3) {
fprintf(stderr, "Usage: %s <input.png> <output.qoi>\n", argv[0]);
return 1;
}
printf("Reading input file: %s\n", argv[1]);
int width, height, channels;
if (!stbi_info(argv[1], &width, &height, &channels)) {
fprintf(stderr, "Failed to read PNG header: %s\n", stbi_failure_reason());
return 1;
}
printf("Image: %dx%d, %d channels\n", width, height, channels);
channels = 4; // Force RGBA
unsigned char *png_data = stbi_load(argv[1], &width, &height, NULL, channels);
if (!png_data) {
fprintf(stderr, "Failed to load PNG: %s\n", stbi_failure_reason());
return 1;
}
int pixel_count = width * height;
struct rgba_pixel *pixels = malloc(pixel_count * sizeof(struct rgba_pixel));
if (!pixels) {
fprintf(stderr, "Failed to allocate pixel buffer\n");
stbi_image_free(png_data);
return 1;
}
// Convert to RGBA struct format
for (int i = 0; i < pixel_count; i++) {
pixels[i].r = png_data[i * 4 + 0];
pixels[i].g = png_data[i * 4 + 1];
pixels[i].b = png_data[i * 4 + 2];
pixels[i].a = png_data[i * 4 + 3];
}
stbi_image_free(png_data);
unsigned char *processed = malloc(pixel_count * sizeof(struct rgba_pixel));
if (!processed) {
fprintf(stderr, "Failed to allocate processing buffer\n");
free(pixels);
return 1;
}
printf("Processing pixels with RVV...\n");
encode_pixels_rvv(processed, pixels, pixel_count);
qoi_desc desc = {
.width = width,
.height = height,
.channels = 4,
.colorspace = QOI_SRGB
};
int qoi_size;
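// Encode the original pixels: `processed` only holds the RVV pre-pass output
// (hash values and run flags), not pixel data.
void *qoi_data = qoi_encode(pixels, &desc, &qoi_size);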
if (!qoi_data) {
fprintf(stderr, "QOI encoding failed\n");
free(pixels);
free(processed);
return 1;
}
printf("Writing output file: %s\n", argv[2]);
if (!write_file(argv[2], qoi_data, qoi_size)) {
free(pixels);
free(processed);
free(qoi_data);
return 1;
}
free(pixels);
free(processed);
free(qoi_data);
printf("Conversion successful\n");
return 0;
}
```
RVV QOI encoder implementation
`vec.S`
```c
# vec.S - RISC-V Vector Extension (RVV 1.0) pre-pass for QOI encoding
# Untested sketch. For every pixel it computes the QOI index hash and a flag
# that tells whether the pixel equals its predecessor (a run candidate).
# Assemble with a V-enabled -march (e.g. rv64gcv).
# Register Conventions:
# a0 = output buffer pointer
# a1 = input pixel array pointer (packed r,g,b,a bytes)
# a2 = remaining pixels counter
# t0 = vector length after vsetvli
# t1 = scratch (constants and addresses)
# t3-t6 = r,g,b,a of the pixel that precedes the current strip
# v0 = run detection mask
# v4-v7 = R,G,B,A components of the current strip
# v8-v11 = previous pixel values for comparison
# v12-v13 = temporary masks
# v16 = pixel hash results, v17 = temporary, v18 = run flags as bytes
    .text
    .balign 4
    .global encode_pixels_rvv
# void encode_pixels_rvv(unsigned char *out, const struct rgba_pixel *pixels, int n)
# Output per strip of vl pixels: vl hash bytes followed by vl run-flag bytes (0 or 1).
encode_pixels_rvv:
    blez a2, done                  # Nothing to do for n <= 0
    # QOI starts with {r:0, g:0, b:0, a:255} as the previous pixel
    li t3, 0
    li t4, 0
    li t5, 0
    li t6, 255
process_loop:
    # Set vector length based on remaining pixels
    vsetvli t0, a2, e8, m1, ta, ma # 8-bit elements
    # De-interleave RGBA with a segment load: v4=R, v5=G, v6=B, v7=A
    vlseg4e8.v v4, (a1)
    # Calculate QOI hash: (r*3 + g*5 + b*7 + a*11) % 64
    # 8-bit wraparound is harmless here because 64 divides 256.
    # The multipliers must be in scalar registers holding the constants.
    li t1, 3
    vmul.vx v16, v4, t1            # v16 = r * 3
    li t1, 5
    vmul.vx v17, v5, t1            # g * 5
    vadd.vv v16, v16, v17
    li t1, 7
    vmul.vx v17, v6, t1            # b * 7
    vadd.vv v16, v16, v17
    li t1, 11
    vmul.vx v17, v7, t1            # a * 11
    vadd.vv v16, v16, v17
    # Perform modulo 64 (using AND since 64 is power of 2)
    li t1, 63                      # 63 does not fit the 5-bit immediate of vand.vi
    vand.vx v16, v16, t1           # v16 contains final hash values
    # Build the predecessor of every pixel: slide up by one element and
    # insert the pixel carried over from the previous strip at element 0
    vslide1up.vx v8, v4, t3
    vslide1up.vx v9, v5, t4
    vslide1up.vx v10, v6, t5
    vslide1up.vx v11, v7, t6
    # Detect runs: all four components must match the predecessor
    vmseq.vv v0, v4, v8            # Compare R components
    vmseq.vv v12, v5, v9           # Compare G components
    vmand.mm v0, v0, v12
    vmseq.vv v12, v6, v10          # Compare B components
    vmand.mm v0, v0, v12
    vmseq.vv v12, v7, v11          # Compare A components
    vmand.mm v0, v0, v12
    # Expand the bit mask to one byte per pixel (0 or 1)
    vmv.v.i v18, 0
    vmerge.vim v18, v18, 1, v0
    # Store hash values, then run flags, for the C code to process
    vse8.v v16, (a0)
    add t1, a0, t0
    vse8.v v18, (t1)
    # Advance the input pointer by vl * 4 bytes (RGBA)
    slli t1, t0, 2
    add a1, a1, t1
    # Carry the last pixel of this strip into the next iteration
    lbu t3, -4(a1)
    lbu t4, -3(a1)
    lbu t5, -2(a1)
    lbu t6, -1(a1)
    # Advance the output pointer past the hash values and the run flags
    add a0, a0, t0
    add a0, a0, t0
    # Update remaining pixel count and continue if there are more pixels
    sub a2, a2, t0
    bnez a2, process_loop
done:
    ret
# Helper sketches (not called above). They assume vl and vtype are already set
# and that v4-v7 hold the current pixels and v8-v11 the preceding pixels.
compute_differences:
    # Compute differences between consecutive pixels (8-bit wraparound)
    vsub.vv v20, v4, v8            # R differences
    vsub.vv v21, v5, v9            # G differences
    vsub.vv v22, v6, v10           # B differences
    vsub.vv v23, v7, v11           # A differences
    ret
detect_small_diffs:
    # Check if differences are within small range (-2 to 1)
    vmsle.vi v12, v20, 1           # Check upper bound for R
    vmsgt.vi v13, v20, -3          # Check lower bound for R
    vmand.mm v12, v12, v13         # Combine R bounds
    # Repeat for G (v21) and B (v22)...
    ret
```
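Because the assembly is untested, a scalar reference is handy for comparing results. The sketch below models what the vector pre-pass is meant to compute for each pixel: the index hash and a flag saying whether the pixel equals its predecessor. The function name `prepass_reference` and the use of two separate output arrays are assumptions made for clarity and do not mirror the assembly's output buffer layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rgba_pixel {
    unsigned char r, g, b, a;
};

/* Scalar model of the RVV pre-pass: hash[i] is the QOI index position of
   pixel i, same[i] is 1 when pixel i equals pixel i-1 (or the QOI start
   pixel {0,0,0,255} for i == 0) and 0 otherwise. */
static void prepass_reference(uint8_t *hash, uint8_t *same,
                              const struct rgba_pixel *px, size_t n) {
    struct rgba_pixel prev = {0, 0, 0, 255};
    assert(hash != NULL && same != NULL && px != NULL);
    for (size_t i = 0; i < n; i++) {
        hash[i] = (uint8_t)((px[i].r * 3u + px[i].g * 5u +
                             px[i].b * 7u + px[i].a * 11u) % 64u);
        same[i] = (uint8_t)(px[i].r == prev.r && px[i].g == prev.g &&
                            px[i].b == prev.b && px[i].a == prev.a);
        prev = px[i];
    }
}
```

Running both versions on the same pixels and comparing the outputs element by element would be one way to validate the vector code once it runs under the emulator.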
## QOI decoder - 洪至謙
```c
#include <stdlib.h> // malloc, free
#include <string.h> // memset
// Zero-fill an array
#ifndef QOI_ZEROARR
#define QOI_ZEROARR(a) memset((a),0,sizeof(a))
#endif
#define QOI_OP_DIFF 0x40 /* 01xxxxxx */
#define QOI_OP_LUMA 0x80 /* 10xxxxxx */
#define QOI_OP_INDEX 0x00 /* 00xxxxxx */
#define QOI_OP_RUN 0xc0 /* 11xxxxxx */
#define QOI_OP_RGB 0xfe /* 11111110 */
#define QOI_OP_RGBA 0xff /* 11111111 */
#define QOI_MASK_2 0xc0 /* 11000000 */
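#define QOI_COLOR_HASH(C) (C.rgba.r*3 + C.rgba.g*5 + C.rgba.b*7 + C.rgba.a*11)
// Constants used by qoi_decode (same values as in qoi.h)
#define QOI_MAGIC \
(((unsigned int)'q') << 24 | ((unsigned int)'o') << 16 | \
((unsigned int)'i') << 8 | ((unsigned int)'f'))
#define QOI_HEADER_SIZE 14
#define QOI_PIXELS_MAX ((unsigned int)400000000)
// malloc/free wrappers (the RISC-V assembly port never frees memory)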
#ifndef QOI_MALLOC
#define QOI_MALLOC(sz) malloc(sz)
#define QOI_FREE(p) free(p)
#endif
static const unsigned char qoi_padding[8] = {0,0,0,0,0,0,0,1};
// Union: the struct and the integer share the same 4 bytes, so a pixel can be
// accessed per channel or compared as a single 32-bit value.
typedef union {
struct { unsigned char r, g, b, a; } rgba;
unsigned int v;
} qoi_rgba_t;
// Read 4 bytes at bytes[*p] as a big-endian 32-bit value and advance *p
static unsigned int qoi_read_32(const unsigned char *bytes, int *p) {
unsigned int a = bytes[(*p)++];
unsigned int b = bytes[(*p)++];
unsigned int c = bytes[(*p)++];
unsigned int d = bytes[(*p)++];
return a << 24 | b << 16 | c << 8 | d;
}
typedef struct {
unsigned int width;
unsigned int height;
unsigned char channels;
unsigned char colorspace;
} qoi_desc;
void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels) {
const unsigned char *bytes;
unsigned int header_magic;
unsigned char *pixels;
qoi_rgba_t index[64];
qoi_rgba_t px;
int px_len, chunks_len, px_pos;
int p = 0, run = 0;
// Reject NULL pointers, invalid channel counts, and inputs too small
// to hold the header plus the end marker
if (
data == NULL || desc == NULL ||
(channels != 0 && channels != 3 && channels != 4) ||
size < QOI_HEADER_SIZE + (int)sizeof(qoi_padding)
) {
return NULL;
}
// Treat the input as a byte stream
bytes = (const unsigned char *)data;
// Parse the QOI header
header_magic = qoi_read_32(bytes, &p);
desc->width = qoi_read_32(bytes, &p);
desc->height = qoi_read_32(bytes, &p);
desc->channels = bytes[p++];
desc->colorspace = bytes[p++];
// Return NULL if the header is invalid
if (
desc->width == 0 || desc->height == 0 ||
desc->channels < 3 || desc->channels > 4 ||
desc->colorspace > 1 ||
header_magic != QOI_MAGIC ||
desc->height >= QOI_PIXELS_MAX / desc->width
) {
return NULL;
}
if (channels == 0) {
channels = desc->channels;
}
// Compute the output size in bytes and allocate the pixel buffer
px_len = desc->width * desc->height * channels;
pixels = (unsigned char *) QOI_MALLOC(px_len);
if (!pixels) {
return NULL;
}
QOI_ZEROARR(index);
px.rgba.r = 0;
px.rgba.g = 0;
px.rgba.b = 0;
px.rgba.a = 255;
// Decode chunk by chunk; the trailing 8-byte end marker is not chunk data
chunks_len = size - (int)sizeof(qoi_padding);
for (px_pos = 0; px_pos < px_len; px_pos += channels) {
if (run > 0) {
run--;
}
else if (p < chunks_len) {
int b1 = bytes[p++];
if (b1 == QOI_OP_RGB) {
px.rgba.r = bytes[p++];
px.rgba.g = bytes[p++];
px.rgba.b = bytes[p++];
}
else if (b1 == QOI_OP_RGBA) {
px.rgba.r = bytes[p++];
px.rgba.g = bytes[p++];
px.rgba.b = bytes[p++];
px.rgba.a = bytes[p++];
}
else if ((b1 & QOI_MASK_2) == QOI_OP_INDEX) {
px = index[b1];
}
else if ((b1 & QOI_MASK_2) == QOI_OP_DIFF) {
px.rgba.r += ((b1 >> 4) & 0x03) - 2;
px.rgba.g += ((b1 >> 2) & 0x03) - 2;
px.rgba.b += ( b1 & 0x03) - 2;
}
else if ((b1 & QOI_MASK_2) == QOI_OP_LUMA) {
int b2 = bytes[p++];
int vg = (b1 & 0x3f) - 32;
px.rgba.r += vg - 8 + ((b2 >> 4) & 0x0f);
px.rgba.g += vg;
px.rgba.b += vg - 8 + (b2 & 0x0f);
}
else if ((b1 & QOI_MASK_2) == QOI_OP_RUN) {
run = (b1 & 0x3f);
}
index[QOI_COLOR_HASH(px) % 64] = px;
}
pixels[px_pos + 0] = px.rgba.r;
pixels[px_pos + 1] = px.rgba.g;
pixels[px_pos + 2] = px.rgba.b;
if (channels == 4) {
pixels[px_pos + 3] = px.rgba.a;
}
}
return pixels;
}
```
RISC-V
```c
# void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels);
# Untested sketch for RV32IM (ilp32 calling convention) that mirrors the C decoder.
# Arguments: a0 = data, a1 = size, a2 = desc, a3 = channels
# s0-s3 = px.rgba.r, px.rgba.g, px.rgba.b, px.rgba.a
# s4 = output write pointer (pixels + px_pos)
# s5 = input read pointer (bytes + p)
# s6 = end of chunk data (bytes + chunks_len)
# s7 = desc, s8 = channels, s9 = end of the pixel buffer
# s10 = run, s11 = start of the pixel buffer (return value)
.equ QOI_OP_INDEX, 0x00
.equ QOI_OP_DIFF, 0x40
.equ QOI_OP_LUMA, 0x80
.equ QOI_OP_RUN, 0xc0
.equ QOI_OP_RGB, 0xfe
.equ QOI_OP_RGBA, 0xff
.equ QOI_MASK_2, 0xc0
    .bss
    .balign 4
index: .space 256                  # index[64], stored as r,g,b,a bytes
    .text
    .globl qoi_decode
qoi_decode:
    addi sp, sp, -64
    sw ra, 60(sp)
    sw s0, 56(sp)
    sw s1, 52(sp)
    sw s2, 48(sp)
    sw s3, 44(sp)
    sw s4, 40(sp)
    sw s5, 36(sp)
    sw s6, 32(sp)
    sw s7, 28(sp)
    sw s8, 24(sp)
    sw s9, 20(sp)
    sw s10, 16(sp)
    sw s11, 12(sp)
    li s11, 0                      # return NULL unless decoding succeeds
    # Input validation
    beqz a0, decode_done
    beqz a2, decode_done
    li t0, 22                      # QOI_HEADER_SIZE + sizeof(qoi_padding)
    blt a1, t0, decode_done
    mv s5, a0                      # read pointer
    add s6, a0, a1
    addi s6, s6, -8                # chunks end before the 8-byte padding
    mv s7, a2
    mv s8, a3
    # header_magic = qoi_read_32(bytes, &p);
    jal ra, qoi_read_32
    li t0, 0x716f6966              # "qoif"
    bne a0, t0, decode_done
    # desc->width = qoi_read_32(bytes, &p);
    jal ra, qoi_read_32
    sw a0, 0(s7)
    beqz a0, decode_done
    # desc->height = qoi_read_32(bytes, &p);
    jal ra, qoi_read_32
    sw a0, 4(s7)
    beqz a0, decode_done
    lbu t0, 0(s5)
    sb t0, 8(s7)                   # desc->channels = bytes[p++]
    lbu t1, 1(s5)
    sb t1, 9(s7)                   # desc->colorspace = bytes[p++]
    addi s5, s5, 2
    # Validate header fields
    addi t2, t0, -3
    li t3, 1
    bgtu t2, t3, decode_done       # desc->channels must be 3 or 4
    bgtu t1, t3, decode_done       # desc->colorspace must be 0 or 1
    lw t2, 0(s7)
    li t3, 400000000               # QOI_PIXELS_MAX
    divu t3, t3, t2
    lw t2, 4(s7)
    bgeu t2, t3, decode_done       # height >= QOI_PIXELS_MAX / width
    # if (channels == 0) { channels = desc->channels; }
    bnez s8, channels_set
    mv s8, t0
channels_set:
    addi t2, s8, -3
    li t3, 1
    bgtu t2, t3, decode_done       # channels must be 3 or 4
    # px_len = desc->width * desc->height * channels;
    lw t0, 0(s7)
    lw t1, 4(s7)
    mul t0, t0, t1
    mul a0, t0, s8
    mv s9, a0                      # keep px_len across the call
    jal ra, QOI_MALLOC
    beqz a0, decode_done
    mv s11, a0                     # pixels (return value)
    mv s4, a0                      # write pointer
    add s9, a0, s9                 # end of the pixel buffer
    # QOI_ZEROARR(index);
    la t0, index
    addi t1, t0, 256
zero_index:
    sw zero, 0(t0)
    addi t0, t0, 4
    bltu t0, t1, zero_index
    # px = {0, 0, 0, 255}; run = 0
    li s0, 0
    li s1, 0
    li s2, 0
    li s3, 255
    li s10, 0
loop_start:
    bgeu s4, s9, decode_done       # if px_pos >= px_len, end
    beqz s10, process_chunks       # run == 0: read the next chunk
    addi s10, s10, -1              # run--
    j write_pixel
process_chunks:
    bgeu s5, s6, write_pixel       # no chunk data left: repeat the last pixel
    lbu t0, 0(s5)                  # b1 = bytes[p++]
    addi s5, s5, 1
    li t1, QOI_OP_RGB
    beq t0, t1, handle_rgb
    li t1, QOI_OP_RGBA
    beq t0, t1, handle_rgba
    andi t1, t0, QOI_MASK_2
    beqz t1, handle_index          # QOI_OP_INDEX is 0x00
    li t2, QOI_OP_DIFF
    beq t1, t2, handle_diff
    li t2, QOI_OP_LUMA
    beq t1, t2, handle_luma
    j handle_run
handle_rgb:
    lbu s0, 0(s5)                  # px.rgba.r = bytes[p++]
    lbu s1, 1(s5)                  # px.rgba.g = bytes[p++]
    lbu s2, 2(s5)                  # px.rgba.b = bytes[p++]
    addi s5, s5, 3
    j update_index
handle_rgba:
    lbu s0, 0(s5)
    lbu s1, 1(s5)
    lbu s2, 2(s5)
    lbu s3, 3(s5)                  # px.rgba.a = bytes[p++]
    addi s5, s5, 4
    j update_index
handle_index:
    slli t1, t0, 2                 # b1 * 4 (the tag bits are 00)
    la t2, index
    add t2, t2, t1
    lbu s0, 0(t2)                  # px = index[b1]
    lbu s1, 1(t2)
    lbu s2, 2(t2)
    lbu s3, 3(t2)
    j update_index
handle_diff:
    srli t1, t0, 4
    andi t1, t1, 0x03
    add s0, s0, t1
    addi s0, s0, -2                # px.rgba.r += ((b1 >> 4) & 0x03) - 2
    srli t1, t0, 2
    andi t1, t1, 0x03
    add s1, s1, t1
    addi s1, s1, -2                # px.rgba.g += ((b1 >> 2) & 0x03) - 2
    andi t1, t0, 0x03
    add s2, s2, t1
    addi s2, s2, -2                # px.rgba.b += (b1 & 0x03) - 2
    j wrap_rgb
handle_luma:
    lbu t1, 0(s5)                  # b2 = bytes[p++]
    addi s5, s5, 1
    andi t2, t0, 0x3f
    addi t2, t2, -32               # vg = (b1 & 0x3f) - 32
    add s1, s1, t2                 # px.rgba.g += vg
    addi t2, t2, -8                # vg - 8
    srli t3, t1, 4
    andi t3, t3, 0x0f
    add s0, s0, t2
    add s0, s0, t3                 # px.rgba.r += vg - 8 + ((b2 >> 4) & 0x0f)
    andi t3, t1, 0x0f
    add s2, s2, t2
    add s2, s2, t3                 # px.rgba.b += vg - 8 + (b2 & 0x0f)
wrap_rgb:
    andi s0, s0, 0xff              # keep channels in 0..255 (wraparound)
    andi s1, s1, 0xff
    andi s2, s2, 0xff
    j update_index
handle_run:
    andi s10, t0, 0x3f             # run = b1 & 0x3f
update_index:
    # index[QOI_COLOR_HASH(px) % 64] = px;
    li t1, 3
    mul t1, s0, t1                 # r * 3
    li t2, 5
    mul t2, s1, t2                 # g * 5
    add t1, t1, t2
    li t2, 7
    mul t2, s2, t2                 # b * 7
    add t1, t1, t2
    li t2, 11
    mul t2, s3, t2                 # a * 11
    add t1, t1, t2
    andi t1, t1, 63                # % 64
    slli t1, t1, 2                 # 4 bytes per entry
    la t2, index
    add t2, t2, t1
    sb s0, 0(t2)
    sb s1, 1(t2)
    sb s2, 2(t2)
    sb s3, 3(t2)
write_pixel:
    sb s0, 0(s4)                   # px.rgba.r
    sb s1, 1(s4)                   # px.rgba.g
    sb s2, 2(s4)                   # px.rgba.b
    li t0, 4
    bne s8, t0, next_pixel         # if channels == 3, skip alpha
    sb s3, 3(s4)                   # px.rgba.a
next_pixel:
    add s4, s4, s8                 # px_pos += channels
    j loop_start
decode_done:
    mv a0, s11                     # pixels, or NULL on failure
    lw ra, 60(sp)
    lw s0, 56(sp)
    lw s1, 52(sp)
    lw s2, 48(sp)
    lw s3, 44(sp)
    lw s4, 40(sp)
    lw s5, 36(sp)
    lw s6, 32(sp)
    lw s7, 28(sp)
    lw s8, 24(sp)
    lw s9, 20(sp)
    lw s10, 16(sp)
    lw s11, 12(sp)
    addi sp, sp, 64
    ret
QOI_MALLOC:
    # a0 = size. Assumption: a libc malloc is linked in; this replaces the
    # original raw ecall, whose semantics depend on the execution environment.
    tail malloc
# QOI_FREE:
# ret
qoi_read_32:
    # Reads a big-endian 32-bit value at s5, advances s5 by 4, returns it in a0
    lbu a0, 0(s5)
    lbu t0, 1(s5)
    slli a0, a0, 8
    or a0, a0, t0                  # a << 8 | b
    lbu t0, 2(s5)
    slli a0, a0, 8
    or a0, a0, t0                  # ... << 8 | c
    lbu t0, 3(s5)
    slli a0, a0, 8
    or a0, a0, t0                  # a << 24 | b << 16 | c << 8 | d
    addi s5, s5, 4
    ret
```
RISC-V with vector extension
```c
# Vector registers used:
# v0-v3: RGBA components
# v4: temporary calculations
# v8-v11: for index operations
.data
QOI_OP_DIFF: .word 0x40
QOI_OP_LUMA: .word 0x80
QOI_OP_INDEX: .word 0x00
QOI_OP_RUN: .word 0xc0
QOI_OP_RGB: .word 0xfe
QOI_OP_RGBA: .word 0xff
QOI_MASK_2: .word 0xc0
.align 4
qoi_padding:
.byte 0, 0, 0, 0, 0, 0, 0, 1
.bss
index: .space 256 # index[64]
.text
qoi_decode:
# Configure vector unit
li t0, 32 # Request up to 32 elements (bytes) per pass
vsetvli t1, t0, e8, m1, ta, ma # 8-bit elements, single vector register; t1 = granted vl
# Save original arguments
mv s5, a0 # Save data pointer
mv s6, a1 # Save size
mv s7, a2 # Save desc pointer
mv s8, a3 # Save channels
# Read header as before
jal qoi_read_32
mv t1, a0
# Process pixels in vector mode
process_pixels_vector:
# Load multiple pixels into vector registers
# A segment load de-interleaves packed r,g,b,a bytes: v0=R, v1=G, v2=B, v3=A
vlseg4e8.v v0, (s4)
handle_diff_vector:
# Vector version of diff handling
# (vand.vi only accepts a 5-bit immediate and RVV has no vsub.vi,
# so the mask goes through a scalar register and -2 is added instead)
li t2, 0x3f
vand.vx v4, v0, t2 # Mask for diff
vadd.vi v4, v4, -2 # Subtract 2
vadd.vv v0, v0, v4 # Add to red channel
vand.vx v4, v1, t2
vadd.vi v4, v4, -2
vadd.vv v1, v1, v4 # Add to green channel
vand.vx v4, v2, t2
vadd.vi v4, v4, -2
vadd.vv v2, v2, v4 # Add to blue channel
QOI_COLOR_HASH_vector:
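# Vector version of color hash: (r*3 + g*5 + b*7 + a*11) % 64
li t2, 3
vmul.vx v8, v0, t2 # r * 3 (vmul has no immediate form)
li t2, 5
vmul.vx v9, v1, t2 # g * 5
li t2, 7
vmul.vx v10, v2, t2 # b * 7
li t2, 11
vmul.vx v11, v3, t2 # a * 11
vadd.vv v8, v8, v9 # Add components
vadd.vv v8, v8, v10
vadd.vv v8, v8, v11
li t2, 63
vand.vx v8, v8, t2 # % 64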
# Store results back, re-interleaving the channels
vsseg4e8.v v0, (s4) # R, G, B, A from v0-v3
process_chunks_vector:
# Load chunk of bytes into vector register
vsetvli t0, a1, e8, m1, ta, ma # Set vector length for byte operations
vle8.v v4, (s5) # Load chunk of bytes
# Check for different opcodes in parallel
li t2, 0xc0
vand.vx v5, v4, t2 # Apply QOI_MASK_2 to all elements
# Create masks for different opcodes (values above 15 need the .vx form)
li t2, 0xfe
vmseq.vx v6, v4, t2 # Mask for QOI_OP_RGB
li t2, 0xff
vmseq.vx v7, v4, t2 # Mask for QOI_OP_RGBA
vmseq.vi v8, v5, 0x00 # Mask for QOI_OP_INDEX
li t2, 0x40
vmseq.vx v9, v5, t2 # Mask for QOI_OP_DIFF
li t2, 0x80
vmseq.vx v10, v5, t2 # Mask for QOI_OP_LUMA
li t2, 0xc0
vmseq.vx v11, v5, t2 # Mask for QOI_OP_RUN
# Handle RGB chunks
vcompress.vm v12, v0, v6 # Gather RGB chunks
vrgather.vv v0, v12, v6 # Load R components
vrgather.vv v1, v12, v6 # Load G components
vrgather.vv v2, v12, v6 # Load B components
# Handle RGBA chunks
vcompress.vm v12, v0, v7 # Gather RGBA chunks
vrgather.vv v0, v12, v7 # Load R components
vrgather.vv v1, v12, v7 # Load G components
vrgather.vv v2, v12, v7 # Load B components
vrgather.vv v3, v12, v7 # Load A components
# Handle INDEX chunks
vcompress.vm v12, v0, v8 # Gather INDEX chunks
vsll.vi v13, v12, 2 # Multiply by 4 for index lookup
la t2, index # The base address must be in a register
vluxei8.v v14, (t2), v13 # Load from index array
# Handle DIFF chunks (similar to original but vectorized)
vcompress.vm v12, v0, v9
vsrl.vi v15, v12, 4 # (b1 >> 4) & 0x03
vand.vi v15, v15, 0x03
vadd.vi v15, v15, -2 # -2 (there is no vsub.vi)
vadd.vv v0, v0, v15 # Add to R
vsrl.vi v15, v12, 2 # (b1 >> 2) & 0x03
vand.vi v15, v15, 0x03
vadd.vi v15, v15, -2
vadd.vv v1, v1, v15 # Add to G
vand.vi v15, v12, 0x03 # b1 & 0x03
vadd.vi v15, v15, -2
vadd.vv v2, v2, v15 # Add to B
# Continue to main_loop_vector
main_loop_vector:
# Vector version of main processing loop
vsetvli t0, a1, e8, m1, ta, ma # Set vector length based on remaining pixels
vle8.v v0, (s5) # Load chunk of pixels
# Process op codes in vector mode
li t2, 0xc0
vand.vx v4, v0, t2 # Mask for op codes
li t2, 0x40
vmseq.vx v0, v4, t2 # Check for QOI_OP_DIFF
li t2, 0x80
vmseq.vx v1, v4, t2 # Check for QOI_OP_LUMA
vmseq.vi v2, v4, 0x00 # Check for QOI_OP_INDEX
# Parallel processing based on op codes
vrgather.vi v8, v0, 0 # Gather DIFF operations
vrgather.vi v9, v1, 0 # Gather LUMA operations
vrgather.vi v10, v2, 0 # Gather INDEX operations
# Continue with the rest of the decoder logic
ret
# Helper functions remain mostly unchanged
qoi_read_32:
addi sp, sp, -16
sw ra, 12(sp) # save ra
sw t5, 8(sp) # save p
mv t5, s5 # p = data pointer
# Could potentially use vector load for 4 bytes at once
# but keeping scalar for header reading since it's not performance critical
lbu t0, 0(t5) # t0 = data[*p] (unsigned load, so no sign bits leak into the OR below)
addi t5, t5, 1 # p++
mv t1, t0 # q = first byte
lbu t0, 0(t5) # t0 = data[*p]
addi t5, t5, 1 # p++
mv t2, t0 # b = second byte
lbu t0, 0(t5) # t0 = data[*p]
addi t5, t5, 1 # p++
mv t3, t0 # c = third byte
lbu t0, 0(t5) # t0 = data[*p]
addi t5, t5, 1 # p++
mv t4, t0 # d = fourth byte
# Combine bytes into 32-bit value
slli t1, t1, 24 # q << 24
slli t2, t2, 16 # b << 16
slli t3, t3, 8 # c << 8
or t1, t1, t2 # combine q and b
or t1, t1, t3 # combine with c
or t1, t1, t4 # combine with d to get final 32-bit value
mv a0, t1 # move result to return register
# Restore stack and return
lw ra, 12(sp)
lw t5, 8(sp)
addi sp, sp, 16
ret
QOI_MALLOC:
li a7, 214 # sbrk syscall number
mv a0, a1 # move size to a0
ecall # make syscall
ret # return allocated address in a0
```
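The `QOI_COLOR_HASH_vector` step works on 8-bit elements, so every multiply and add wraps at 256. The sketch below (plain Python, with function names invented for this note) illustrates why that is still enough: 64 divides 256, so masking the wrapped sum with `0x3f` gives the same index as the full-precision formula from the QOI specification.

```python
def qoi_hash(r, g, b, a):
    """Index position as defined by the QOI specification."""
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64


def qoi_hash_u8_lanes(r, g, b, a):
    """Same hash, but every intermediate is truncated to 8 bits,
    the way e8 vector elements wrap on overflow."""
    acc = (r * 3) & 0xFF
    acc = (acc + ((g * 5) & 0xFF)) & 0xFF
    acc = (acc + ((b * 7) & 0xFF)) & 0xFF
    acc = (acc + ((a * 11) & 0xFF)) & 0xFF
    return acc & 0x3F


# The decoder's initial pixel {r:0, g:0, b:0, a:255}
print(qoi_hash(0, 0, 0, 255), qoi_hash_u8_lanes(0, 0, 0, 255))  # 53 53
```

This means the hash does not need widening to 16-bit elements, which would halve the number of pixels processed per vector register.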
## RISC-V Vector Extension in 32-bit for the encoder and decoder in Quite OK Image Format
### Enhanced version
## Environment Installation
#### **Careful! This method only works on Ubuntu 22.04 or later. On 20.04 you may lose several days to a failed installation, just like I did QQ (by 至謙).**
0. If your environment is brand new, you must first update it to get the required prerequisites.
Update
```bash
sudo apt update && sudo apt upgrade -y
```
Install some prerequisites (Ubuntu)
```bash
sudo apt-get install autoconf automake autotools-dev curl python3 python3-pip python3-tomli libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build git cmake libglib2.0-dev libslirp-dev
```
1. Clone the riscv-gnu-toolchain repo from the official GitHub repo.
```bash
mkdir tool && cd tool
git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git --recursive
```
2. Enter the folder `riscv-gnu-toolchain`, create and enter a folder called `build`, and then configure the build (install prefix, target architecture, and ABI).
```bash
cd riscv-gnu-toolchain
mkdir build && cd build
../configure --prefix=$HOME/riscv-gnu-toolchain/build --with-arch=rv32gcv --with-abi=ilp32d --enable-multilib
```
3. Start to compile the 32-bit RISC-V GNU Toolchain
:::warning
This step takes a while on my shabby ASUS Mini PN41. (Around 3 hours)
:::
```bash
make
```
4. Setting up the path permanently
```bash
echo 'export PATH=$PATH:~/riscv-gnu-toolchain/build/bin' >> ~/.bashrc
source ~/.bashrc
```
5. Now, still inside the `build` folder, compile the QEMU that ships with riscv-gnu-toolchain.
:::warning
Please keep working in the directory `where_you_git_clone_repo/riscv-gnu-toolchain/build`. The Makefile generated by `configure` lives there, so `make build-qemu` fails anywhere else.
:::
```bash
make build-qemu
```
6. Here is a test program for the vector extension. Save it as `vector-test.s`.
```c
.section .text
.global _start
_start:
# Initialize stack pointer
lui sp, %hi(stack_top)
addi sp, sp, %lo(stack_top)
# Print start message
lui a0, %hi(msg_start)
addi a0, a0, %lo(msg_start)
jal print_string
# Configure the vector unit for exactly 4 elements, so a larger VLEN cannot read past the 4-word arrays
li t1, 4
vsetvli t0, t1, e32, m1, ta, ma # SEW=32, LMUL=1, tail agnostic, mask agnostic
# Load vector register v0 with data
la a0, vector_data
vle32.v v0, (a0) # Load 32-bit elements into v0
# Add 1 to each element
li t0, 1
vadd.vx v0, v0, t0 # Add scalar t0 to each element
# Store result back to memory
la a0, vector_result
vse32.v v0, (a0) # Store 32-bit elements from v0
# Print results
la s0, vector_result # Load result address
li s1, 4 # Counter for 4 numbers
print_loop:
# Print "Element "
lui a0, %hi(msg_result)
addi a0, a0, %lo(msg_result)
jal print_string
# Load and print original value
lui a0, %hi(msg_orig)
addi a0, a0, %lo(msg_orig)
jal print_string
la t0, vector_data
la t2, vector_result # Load address of vector_result
sub t1, s0, t2 # Now subtract registers
add t0, t0, t1
lw a0, 0(t0)
jal print_num
# Print arrow
lui a0, %hi(msg_arrow)
addi a0, a0, %lo(msg_arrow)
jal print_string
# Load and print result value
lw a0, 0(s0)
jal print_num
# Print newline
lui a0, %hi(msg_newline)
addi a0, a0, %lo(msg_newline)
jal print_string
# Move to next number
addi s0, s0, 4
addi s1, s1, -1
bnez s1, print_loop
# Print completion message
lui a0, %hi(msg_done)
addi a0, a0, %lo(msg_done)
jal print_string
# Exit success
li a0, 0
li a7, 93
ecall
# Print string function - expects pointer in a0
print_string:
addi sp, sp, -4
sw ra, 0(sp)
mv t0, a0
1: lbu t1, 0(t0)
beqz t1, 2f
addi t0, t0, 1
j 1b
2: sub t0, t0, a0
mv a2, t0
mv a1, a0
li a0, 1
li a7, 64
ecall
lw ra, 0(sp)
addi sp, sp, 4
ret
# Print number function - expects number in a0
print_num:
addi sp, sp, -32 # 16-byte digit buffer at 0(sp), saved registers above it
sw ra, 28(sp)
sw s0, 24(sp)
sw s1, 20(sp)
sw s2, 16(sp)
mv s0, a0 # Save original number
li s1, 10 # Divisor
# Handle negative numbers
bgez s0, positive
neg s0, s0
li t0, '-'
sb t0, 0(sp)
li a0, 1
mv a1, sp
li a2, 1
li a7, 64
ecall
positive:
# Convert number to string, filling the buffer from its end
# so that the most significant digit is printed first
addi s2, sp, 16 # One past the end of the buffer
mv t0, s2 # Current buffer position
digit_loop:
remu t1, s0, s1 # Get remainder (current digit)
addi t1, t1, '0' # Convert to ASCII
addi t0, t0, -1 # Move buffer pointer back
sb t1, 0(t0) # Store digit
divu s0, s0, s1 # Divide number by 10
bnez s0, digit_loop
# Print the number
mv a1, t0 # Address of the first digit
sub a2, s2, t0 # Calculate length
li a0, 1
li a7, 64
ecall
# Restore registers and return
lw ra, 28(sp)
lw s0, 24(sp)
lw s1, 20(sp)
lw s2, 16(sp)
addi sp, sp, 32
ret
.section .rodata
msg_start:
.string "Starting vector test...\n"
msg_done:
.string "\nVector operations completed.\n"
msg_result:
.string "Element "
msg_orig:
.string "Original: "
msg_arrow:
.string " -> Result: "
msg_newline:
.string "\n"
.section .data
.align 4
vector_data:
.word 1, 2, 3, 4 # Pre-initialized input data
.align 4
vector_result:
.word 0, 0, 0, 0 # Space for results
.section .bss
.align 4
.space 4096 # Stack
stack_top:
```
7. Assemble and link it
```bash
riscv32-unknown-elf-as -march=rv32gcv_zba vector-test.s -o vector-test.o
riscv32-unknown-elf-ld -nostdlib vector-test.o -o vector-test
```
8. Now! It's time to witness the miracle.
```bash
qemu-riscv32 -cpu rv32,v=true,zba=true,vlen=128 ./vector-test
```
9. Result
```bash
Starting vector test...
Element Original: 1 -> Result: 2
Element Original: 2 -> Result: 3
Element Original: 3 -> Result: 4
Element Original: 4 -> Result: 5
Vector operations completed.
```
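The four-element result follows from how `vsetvli` picks the vector length. As a rough illustration, the Python sketch below models `vl = min(AVL, VLMAX)` with `VLMAX = VLEN / SEW * LMUL`. This is a simplification: the RVV specification allows other choices when AVL lies between VLMAX and 2 * VLMAX, and only integer LMUL is modelled here. The function names are invented for this note.

```python
def vlmax(vlen_bits, sew_bits, lmul=1):
    """Maximum number of elements in one vector register group (integer LMUL only)."""
    return vlen_bits // sew_bits * lmul


def vsetvli(avl, vlen_bits, sew_bits, lmul=1):
    """Simplified model of the granted vector length: min(AVL, VLMAX)."""
    return min(avl, vlmax(vlen_bits, sew_bits, lmul))


def vadd_vx_stripmined(data, scalar, vlen_bits=128, sew_bits=32):
    """Add a scalar to every element, processing vl elements per loop iteration."""
    out = []
    i = 0
    while i < len(data):
        vl = vsetvli(len(data) - i, vlen_bits, sew_bits)
        out.extend((x + scalar) & 0xFFFFFFFF for x in data[i:i + vl])
        i += vl
    return out


print(vlmax(128, 32))                        # 4
print(vadd_vx_stripmined([1, 2, 3, 4], 1))   # [2, 3, 4, 5]
```

With `vlen=128`, 32-bit elements and LMUL=1, one register holds exactly the four words in `vector_data`, so the test needs a single pass. Longer arrays would need the strip-mining loop shown in `vadd_vx_stripmined`.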
## PNG to binary
### What is PNG?
PNG (Portable Network Graphics) is a raster graphics file format that uses lossless compression, designed to replace the older GIF format.
#### PNG Format
- 1-bit: Black-and-white images.
- 2/4/8-bit: Palette-based images (up to 256 colors).
- 24-bit: True color images (16,777,216 possible colors).
- 32-bit: True color + alpha channel (supports transparency).
#### Structure of a PNG File
- File Header (Signature): `89 50 4E 47 0D 0A 1A 0A`
- PNG files are composed of multiple chunks, each including:
    - Length (4 bytes): Length of the chunk's data.
    - Type (4 bytes): Name of the chunk (e.g., IHDR).
    - Data (variable length): Actual content of the chunk.
    - CRC (4 bytes): Checksum for verifying data integrity.
- Key Chunk Types:
    - IHDR (Image Header):
        - Defines basic attributes like width, height, color depth, compression method, etc.
    - PLTE (Palette):
        - Palette data (used for palette-based images only).
    - IDAT (Image Data):
        - Contains compressed pixel data. Multiple IDAT chunks can be present.
    - IEND (Image End):
        - Marks the end of the file, with no data.
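
To make the chunk layout concrete, here is a small standard-library sketch that builds a 1x1 PNG in memory and walks its chunks. `make_chunk` and `iter_chunks` are helper names made up for this note, not part of any library; per the PNG specification, the CRC is computed over the chunk type and data, not the length field.

```python
import struct
import zlib

PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def make_chunk(ctype, data):
    """Length + type + data + CRC (the CRC covers type and data only)."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def iter_chunks(png):
    """Yield (type, data) for each chunk, checking the signature and every CRC."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack_from(">I", png, pos + 8 + length)
        if zlib.crc32(ctype + data) & 0xFFFFFFFF != crc:
            raise ValueError(f"bad CRC in {ctype!r}")
        yield ctype.decode("ascii"), data
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)


# Hand-built 1x1 RGB image (one red pixel), used only as test input
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # width, height, bit depth 8, color type 2 (RGB)
idat = zlib.compress(b"\x00\xff\x00\x00")             # filter byte 0, then R, G, B
png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

chunks = list(iter_chunks(png))
width, height = struct.unpack(">II", chunks[0][1][:8])
print([t for t, _ in chunks], width, height)  # ['IHDR', 'IDAT', 'IEND'] 1 1
```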
Here is a Python script that converts a PNG file to a raw binary file, and back again for verification.
```python
import argparse
import struct

import numpy as np
from PIL import Image


def png_to_binary(input_png, output_binary):
    """
    Convert a PNG file to a binary format with the following structure:
    - First 4 bytes: width (big-endian uint32)
    - Next 4 bytes: height (big-endian uint32)
    - Next 1 byte: channels (uint8)
    - Remaining bytes: pixel data in row-major order
    """
    try:
        # Open and read the PNG file
        with Image.open(input_png) as img:
            # Convert to RGB or RGBA if not already
            if img.mode not in ['RGB', 'RGBA']:
                img = img.convert('RGB')
            # Get image dimensions and channel count
            width, height = img.size
            channels = len(img.getbands())  # 3 for RGB, 4 for RGBA
            # Convert image to a (height, width, channels) uint8 array
            img_array = np.array(img).astype(np.uint8)
            # Write to binary file
            with open(output_binary, 'wb') as f:
                # Write header information
                f.write(struct.pack('>I', width))     # Big-endian uint32
                f.write(struct.pack('>I', height))    # Big-endian uint32
                f.write(struct.pack('B', channels))   # uint8
                # Write pixel data in C (row-major) order
                f.write(img_array.tobytes(order='C'))
        return True, f"Successfully converted {input_png} to {output_binary}"
    except FileNotFoundError:
        return False, f"Error: Input file {input_png} not found"
    except Exception as e:
        return False, f"Error during conversion: {str(e)}"


def binary_to_png(input_binary, output_png):
    """
    Convert our binary format back to PNG to verify the conversion worked correctly.
    """
    try:
        with open(input_binary, 'rb') as f:
            # Read header
            width = struct.unpack('>I', f.read(4))[0]
            height = struct.unpack('>I', f.read(4))[0]
            channels = struct.unpack('B', f.read(1))[0]
            # Read pixel data
            mode = 'RGBA' if channels == 4 else 'RGB'
            size = width * height * channels
            data = f.read(size)
        # Convert to numpy array and reshape
        img_array = np.frombuffer(data, dtype=np.uint8)
        img_array = img_array.reshape((height, width, channels))
        # Create and save image
        img = Image.fromarray(img_array, mode)
        img.save(output_png)
        return True, f"Successfully converted {input_binary} to {output_png}"
    except FileNotFoundError:
        return False, f"Error: Input file {input_binary} not found"
    except Exception as e:
        return False, f"Error during conversion: {str(e)}"


def main():
    parser = argparse.ArgumentParser(description='Convert between PNG and binary format')
    parser.add_argument('input_file', help='Input file path')
    parser.add_argument('output_file', help='Output file path')
    parser.add_argument('--to-png', action='store_true',
                        help='Convert from binary to PNG (default is PNG to binary)')
    args = parser.parse_args()
    if args.to_png:
        success, message = binary_to_png(args.input_file, args.output_file)
    else:
        success, message = png_to_binary(args.input_file, args.output_file)
    print(message)
    return 0 if success else 1


if __name__ == "__main__":
    # Propagate the status code so that failures are visible to the shell
    raise SystemExit(main())
```
Here is how you can use this Python script.
```bash
python3 png2bin.py "input_file_name" "output_file_name"
```
Example
```bash
python3 png2bin.py A.png A.bin
```
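For reference when reading the `.bin` file from assembly, the layout written by `png_to_binary` is a 9-byte header followed by the raw pixels. The sketch below reproduces that layout with only the standard library; `pack_image` and `unpack_image` are illustrative helpers, not part of `png2bin.py`.

```python
import struct

# width and height as big-endian uint32, channels as uint8, no padding
HEADER = struct.Struct(">IIB")


def pack_image(width, height, channels, pixels):
    """Build the header followed by row-major pixel bytes."""
    if len(pixels) != width * height * channels:
        raise ValueError("pixel data does not match the header")
    return HEADER.pack(width, height, channels) + bytes(pixels)


def unpack_image(blob):
    """Split a blob back into (width, height, channels, pixel bytes)."""
    width, height, channels = HEADER.unpack_from(blob, 0)
    pixels = blob[HEADER.size:]
    if len(pixels) != width * height * channels:
        raise ValueError("truncated or oversized pixel data")
    return width, height, channels, pixels


blob = pack_image(2, 1, 3, [255, 0, 0, 0, 0, 255])  # one red pixel, one blue pixel
print(blob.hex(" "))  # 00 00 00 02 00 00 00 01 03 ff 00 00 00 00 ff
```

The first pixel therefore starts at byte offset 9, which is the offset a decoder or encoder written in assembly would add to the file's base address.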
## QOI Format Verification
Here is an online QOI viewer. You may drag and drop a QOI format image to test the result.
[QOI Viewer - The Brain Dump - GitHub Pages](https://floooh.github.io/qoiview/qoiview.html)
Original Page

Drag and drop the image `dice.qoi` from `floooh`'s GitHub repository.

You may download the QOI format image from `floooh`'s GitHub using the following link.
[qoiview](https://github.com/floooh/qoiview/tree/main)
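
Before dropping an encoder's output into the viewer, a quick structural check can catch header mistakes early. The following is a minimal sketch based on the 14-byte header and the 8-byte end marker (`0, 0, 0, 0, 0, 0, 0, 1`) described earlier. `check_qoi` is a hypothetical helper; it does not decode any pixels.

```python
import struct

QOI_MAGIC = b"qoif"
QOI_END_MARKER = bytes([0, 0, 0, 0, 0, 0, 0, 1])
# magic, width (BE), height (BE), channels, colorspace = 14 bytes
QOI_HEADER = struct.Struct(">4sIIBB")


def check_qoi(blob):
    """Return (width, height, channels, colorspace) or raise ValueError."""
    if len(blob) < QOI_HEADER.size + len(QOI_END_MARKER):
        raise ValueError("file too short")
    magic, width, height, channels, colorspace = QOI_HEADER.unpack_from(blob, 0)
    if magic != QOI_MAGIC:
        raise ValueError("bad magic bytes")
    if channels not in (3, 4) or colorspace not in (0, 1):
        raise ValueError("bad channels or colorspace")
    if not blob.endswith(QOI_END_MARKER):
        raise ValueError("missing end marker")
    return width, height, channels, colorspace


# Hand-built 1x1 RGBA image: header, one QOI_OP_RGBA chunk, end marker
sample = (QOI_HEADER.pack(QOI_MAGIC, 1, 1, 4, 0)
          + bytes([0xFF, 10, 20, 30, 255]) + QOI_END_MARKER)
print(check_qoi(sample))  # (1, 1, 4, 0)
```

For a file on disk, the same function can be called on the result of `open(path, "rb").read()`.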
## Reference
1. [QOI official website](https://qoiformat.org/)
2. [QOI specification](https://qoiformat.org/qoi-specification.pdf)
3. [Building an ENCODER for the "Quite OK Image Format" (QOI) - from YT](https://www.youtube.com/watch?v=GgsRQuGSrc0)
4. [Building an DECODER for QOI Images (Quite OK Image Format)](https://www.youtube.com/watch?v=5bWopQj-oQs&list=PLP29wDx6QmW4bMSK8a7rZnhPo4pjqVdQT)
5. [RVV in QEMU setting tutorial](https://github.com/brucehoult/rvv_example)
6. [libpng_rvv: A RISC-V Vector Optimized libpng](https://github.com/mschlaegl/libpng_rvv-doc)
7. [Simple RISC-V Vector example in 64 bit](https://github.com/brucehoult/rvv_example)
8. [QOI Viewer - The Brain Dump - GitHub Pages](https://floooh.github.io/qoiview/qoiview.html)
9. [qoiview](https://github.com/floooh/qoiview/tree/main)
10. [riscv-gnu-toolchain](https://github.com/riscv-collab/riscv-gnu-toolchain)