洪至謙, 曾遠哲
The RISC-V Vector Extension is a key component of the RISC-V instruction set architecture, providing efficient vector computation capabilities.
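Its vsetvli instruction sets the vector length and element width at run time, so the same loop works for any hardware vector length.
QOI (Quite OK Image Format) is a lossless image compression format designed for simplicity and speed.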
Speed: encoding is about 20x-50x faster than stb_image_write, and decoding about 3x-4x faster than stb_image.
Supports RGB and RGBA: Handles images with and without an alpha channel.
qoi_header {
char magic[4]; // magic bytes "qoif"
uint32_t width; // image width in pixels (BE)
uint32_t height; // image height in pixels (BE)
uint8_t channels; // 3 = RGB, 4 = RGBA
uint8_t colorspace; // 0 = sRGB with linear alpha
// 1 = all channels linear
};
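The struct above describes the layout on disk, but a C compiler typically pads it to 16 bytes and stores width and height in host byte order, so writing it with a single fwrite is not portable. A minimal sketch, assuming a hypothetical helper named write_qoi_header, that serializes the 14 header bytes field by field with big-endian width and height:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v as 4 big-endian bytes. */
static void put_be32(uint8_t *dst, uint32_t v) {
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/* Read 4 big-endian bytes. */
static uint32_t get_be32(const uint8_t *src) {
    return (uint32_t)src[0] << 24 | (uint32_t)src[1] << 16 |
           (uint32_t)src[2] << 8 | (uint32_t)src[3];
}

/* Hypothetical helper: magic(4) width(4) height(4) channels(1) colorspace(1). */
static void write_qoi_header(uint8_t out[14], uint32_t width, uint32_t height,
                             uint8_t channels, uint8_t colorspace) {
    assert(channels == 3 || channels == 4);
    assert(colorspace <= 1);
    memcpy(out, "qoif", 4);
    put_be32(out + 4, width);
    put_be32(out + 8, height);
    out[12] = channels;
    out[13] = colorspace;
}
```

For a 640x480 RGBA image the width field becomes the bytes 00 00 02 80, independent of the host's endianness.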
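Images are encoded row by row, left to right, top to bottom.
The encoder and decoder start with {r:0, g:0, b:0, a:255} as the previous pixel value.
An image is complete when all width * height pixels have been covered.
Pixels are encoded as a run of the previous pixel, an index into an array of previously seen pixels, a difference to the previous pixel's r,g,b values, or full r,g,b or r,g,b,a values.
Note: the color channels are assumed not to be premultiplied with the alpha channel.
Both the encoder and the decoder keep a zero-initialized array of 64 previously seen pixel values; each pixel is stored at index_position = (r * 3 + g * 5 + b * 7 + a * 11) % 64.
This is a simple, cheap hash that spreads colors across the 64 slots; it does not prevent collisions, and a colliding pixel simply overwrites the older entry.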
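The index position follows directly from the formula. A minimal sketch (the function name qoi_index_position is hypothetical; the arithmetic is the same as QOI_COLOR_HASH in qoi.h):

```c
#include <assert.h>
#include <stdint.h>

/* Slot of a pixel in the 64-entry array of previously seen pixels. */
static unsigned qoi_index_position(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    return (r * 3u + g * 5u + b * 7u + a * 11u) % 64u;
}
```

For example, the start pixel {0,0,0,255} maps to slot 53, while {0,0,0,0} and {64,0,0,0} both map to slot 0, which shows that distinct colors can collide.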
Every chunk starts with a 2-bit or 8-bit tag, followed by a number of data bits.
All chunks are byte aligned (the bit length of every chunk is divisible by 8).
All data bits are stored with the most significant bit on the left.
The 8-bit tags have precedence over the 2-bit tags.
(A decoder must check for the presence of an 8-bit tag first.)
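The precedence rule matters because 0xfe and 0xff also begin with the 2-bit run tag b11. A minimal classification sketch; the enum and function names are hypothetical, while the tag values match the QOI_OP_* constants in qoi.h:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical chunk kinds for illustration. */
typedef enum { OP_INDEX, OP_DIFF, OP_LUMA, OP_RUN, OP_RGB, OP_RGBA } qoi_op;

/* Classify the first byte of a chunk: 8-bit tags before 2-bit tags. */
static qoi_op qoi_classify(uint8_t b1) {
    if (b1 == 0xfe) return OP_RGB;
    if (b1 == 0xff) return OP_RGBA;
    switch (b1 & 0xc0) {
    case 0x00: return OP_INDEX;
    case 0x40: return OP_DIFF;
    case 0x80: return OP_LUMA;
    default:   return OP_RUN;   /* 0xc0 .. 0xfd */
    }
}
```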
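QOI_OP_RGB
Byte[0] | Byte[1] | Byte[2] | Byte[3] |
---|---|---|---|
7 6 5 4 3 2 1 0 | 7 .. 0 | 7 .. 0 | 7 .. 0 |
1 1 1 1 1 1 1 0 | red | green | blue |
8-bit tag b11111110, followed by the 8-bit red, green and blue channel values.
The alpha value remains unchanged from the previous pixel.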
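QOI_OP_RGBA
Byte[0] | Byte[1] | Byte[2] | Byte[3] | Byte[4] |
---|---|---|---|---|
7 6 5 4 3 2 1 0 | 7 .. 0 | 7 .. 0 | 7 .. 0 | 7 .. 0 |
1 1 1 1 1 1 1 1 | red | green | blue | alpha |
8-bit tag b11111111, followed by the 8-bit red, green, blue and alpha channel values.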
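QOI_OP_INDEX
Byte[0] | - | - | - | - | - | - | - |
---|---|---|---|---|---|---|---|
7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
0 | 0 | index | - | - | - | - | - |
2-bit tag b00, followed by a 6-bit index into the array of previously seen pixels: 0..63.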
QOI_OP_DIFF
Byte[0] | - | - | - | - | - | - | - |
---|---|---|---|---|---|---|---|
7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
0 | 1 | dr | - | dg | - | db | - |
2-bit tag b01
2-bit red channel difference from the previous pixel -2..1
2-bit green channel difference from the previous pixel -2..1
2-bit blue channel difference from the previous pixel -2..1
The alpha value remains unchanged from the previous pixel.
The differences to the current channel values use a wraparound operation (modulo 256).
E.g.:
1 - 2 -> 255
255 + 1 -> 0
Values are stored as unsigned integers with a bias of 2.
E.g.:
-2 -> 0 (b00)
1 -> 3 (b11)
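A minimal sketch of QOI_OP_DIFF on both sides, assuming pixels are passed as 4-byte {r, g, b, a} arrays (all function names are hypothetical). Differences are wrapped into -128..127, so a step from 255 to 0 counts as +1:

```c
#include <assert.h>
#include <stdint.h>

/* Channel difference with wraparound, mapped to -128..127. */
static int channel_diff(uint8_t cur, uint8_t prev) {
    int d = (cur - prev + 256) % 256;
    return d >= 128 ? d - 256 : d;
}

/* Encoder side: returns 1 and stores the QOI_OP_DIFF byte in *out when alpha is
 * unchanged and every r, g, b difference is in -2..1; returns 0 otherwise. */
static int qoi_try_diff(const uint8_t prev[4], const uint8_t cur[4], uint8_t *out) {
    int dr = channel_diff(cur[0], prev[0]);
    int dg = channel_diff(cur[1], prev[1]);
    int db = channel_diff(cur[2], prev[2]);
    if (cur[3] != prev[3]) return 0;
    if (dr < -2 || dr > 1 || dg < -2 || dg > 1 || db < -2 || db > 1) return 0;
    /* 2-bit tag b01, then each difference stored with a bias of 2 */
    *out = (uint8_t)(0x40 | (dr + 2) << 4 | (dg + 2) << 2 | (db + 2));
    return 1;
}

/* Decoder side: the conversion to uint8_t wraps the result mod 256. */
static void qoi_apply_diff(uint8_t px[4], uint8_t b1) {
    px[0] = (uint8_t)(px[0] + ((b1 >> 4) & 3) - 2);
    px[1] = (uint8_t)(px[1] + ((b1 >> 2) & 3) - 2);
    px[2] = (uint8_t)(px[2] + (b1 & 3) - 2);
    /* alpha (px[3]) is unchanged */
}
```

For prev = {1, 10, 255, 255} and cur = {255, 9, 0, 255}, the differences are -2, -1 and +1, giving the byte 0x47 (b01 00 01 11).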
QOI_OP_LUMA
Byte[0] | - | - | - | - | - | - | - |
---|---|---|---|---|---|---|---|
7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
1 | 0 | diff green | - | - | - | - | - |
Byte[1] | - | - | - | - | - | - | - |
---|---|---|---|---|---|---|---|
7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
dr-dg | - | - | - | db-dg | - | - | - |
2-bit tag b10
6-bit green channel difference from the previous pixel -32..31
4-bit red channel difference minus green channel difference -8..7
4-bit blue channel difference minus green channel difference -8..7
The green channel indicates the general direction of change and is encoded in 6 bits. The red and blue channels (dr and db) base their differences on the green channel difference.
I.e.:
dr_dg = (cur_px.r - prev_px.r) - (cur_px.g - prev_px.g)
db_dg = (cur_px.b - prev_px.b) - (cur_px.g - prev_px.g)
The differences to the current channel values use a wraparound operation (modulo 256).
E.g.:
10 - 13 -> 253
250 + 7 -> 1
Values are stored as unsigned integers with a bias of 32 for the green channel and a bias of 8 for the red and blue channels.
The alpha value remains unchanged from the previous pixel.
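A matching sketch for QOI_OP_LUMA, under the same assumptions (hypothetical names, 4-byte pixel arrays). The range checks mirror qoi_encode and the update mirrors the QOI_OP_LUMA branch of qoi_decode:

```c
#include <assert.h>
#include <stdint.h>

/* Channel difference with wraparound, mapped to -128..127. */
static int channel_delta(uint8_t cur, uint8_t prev) {
    int d = (cur - prev + 256) % 256;
    return d >= 128 ? d - 256 : d;
}

/* Encoder side: returns 1 and writes the two QOI_OP_LUMA bytes when alpha is
 * unchanged, dg is in -32..31 and dr - dg, db - dg are in -8..7. */
static int qoi_try_luma(const uint8_t prev[4], const uint8_t cur[4], uint8_t out[2]) {
    int dg = channel_delta(cur[1], prev[1]);
    int dr_dg = channel_delta(cur[0], prev[0]) - dg;
    int db_dg = channel_delta(cur[2], prev[2]) - dg;
    if (cur[3] != prev[3]) return 0;
    if (dg < -32 || dg > 31 || dr_dg < -8 || dr_dg > 7 || db_dg < -8 || db_dg > 7)
        return 0;
    out[0] = (uint8_t)(0x80 | (dg + 32));               /* tag b10, bias 32 */
    out[1] = (uint8_t)((dr_dg + 8) << 4 | (db_dg + 8)); /* bias 8 each */
    return 1;
}

/* Decoder side: same arithmetic as qoi_decode, wrapped mod 256. */
static void qoi_apply_luma(uint8_t px[4], uint8_t b1, uint8_t b2) {
    int vg = (b1 & 0x3f) - 32;
    px[0] = (uint8_t)(px[0] + vg - 8 + ((b2 >> 4) & 0x0f));
    px[1] = (uint8_t)(px[1] + vg);
    px[2] = (uint8_t)(px[2] + vg - 8 + (b2 & 0x0f));
}
```

For prev = {10, 250, 30, 255} and cur = {15, 1, 40, 255}: dg = 7 (250 + 7 wraps to 1), dr_dg = 5 - 7 = -2 and db_dg = 10 - 7 = 3, so the chunk is 0xA7 0x6B.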
QOI_OP_RUN
Byte[0] | - | - | - | - | - | - | - |
---|---|---|---|---|---|---|---|
7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
1 | 1 | run | - | - | - | - | - |
2-bit tag b11, followed by a 6-bit run-length repeating the previous pixel: 1..62.
The run-length is stored with a bias of -1.
Note that the run-lengths 63 and 64 (b111110 and b111111) are illegal, as they are occupied by the QOI_OP_RGB and QOI_OP_RGBA tags.
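Because one chunk holds at most 62 pixels, longer runs have to be split. A minimal sketch (hypothetical names) of emitting and reading QOI_OP_RUN chunks, using the same 62-pixel limit as qoi_encode:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emit QOI_OP_RUN chunks for `count` repeats of the previous pixel.
 * Returns the number of bytes written to out. */
static size_t qoi_emit_runs(uint8_t *out, size_t count) {
    size_t n = 0;
    while (count > 0) {
        size_t run = count > 62 ? 62 : count;      /* 63/64 would clash with RGB/RGBA */
        out[n++] = (uint8_t)(0xc0 | (run - 1));    /* bias of -1 */
        count -= run;
    }
    return n;
}

/* Decoder side: number of pixels covered by a QOI_OP_RUN byte. */
static int qoi_run_length(uint8_t b1) {
    assert((b1 & 0xc0) == 0xc0 && b1 < 0xfe);
    return (b1 & 0x3f) + 1;
}
```

A run of 100 identical pixels becomes 0xfd (62 pixels) followed by 0xe5 (38 pixels).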
The following code is untested.
The program cannot read the binary file when run in the QEMU emulator, and the cause is not yet clear. Note that we run QEMU in user mode rather than system mode, so there is no fully isolated hardware environment separating the guest program from the host.
qoi.h
// qoi.h
#ifndef QOI_H
#define QOI_H
#ifdef __cplusplus
extern "C" {
#endif
#define QOI_SRGB 0 // Standard RGB colorspace with linear alpha
#define QOI_LINEAR 1 // All channels are linear
// Description of the image - width, height, channels, and colorspace
typedef struct {
unsigned int width;
unsigned int height;
unsigned char channels; // 3 = RGB, 4 = RGBA
unsigned char colorspace; // 0 = sRGB, 1 = linear
} qoi_desc;
// Core encoding function: converts raw pixels to QOI format
void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len);
// Core decoding function: converts QOI format back to raw pixels
void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels);
// File handling convenience functions
int qoi_write(const char *filename, const void *data, const qoi_desc *desc);
void *qoi_read(const char *filename, qoi_desc *desc, int channels);
#ifdef __cplusplus
}
#endif
#endif // QOI_H
#ifdef QOI_IMPLEMENTATION
// Include necessary headers
#include <stdlib.h>
#include <string.h>
// If stdio functions are needed
#ifndef QOI_NO_STDIO
#include <stdio.h>
#endif
// Allow custom memory management
#ifndef QOI_MALLOC
#define QOI_MALLOC(sz) malloc(sz)
#define QOI_FREE(p) free(p)
#endif
// Allow custom array zeroing
#ifndef QOI_ZEROARR
#define QOI_ZEROARR(a) memset((a),0,sizeof(a))
#endif
// Chunk type tags
#define QOI_OP_INDEX 0x00 // 00xxxxxx - 6-bit index into color array
#define QOI_OP_DIFF 0x40 // 01xxxxxx - 2-bit RGB channel differences
#define QOI_OP_LUMA 0x80 // 10xxxxxx - Larger RGB differences
#define QOI_OP_RUN 0xc0 // 11xxxxxx - Run of pixels
#define QOI_OP_RGB 0xfe // 11111110 - Full RGB values
#define QOI_OP_RGBA 0xff // 11111111 - Full RGBA values
#define QOI_MASK_2 0xc0 // Mask for 2-bit tag
// Hash function for the color index array
#define QOI_COLOR_HASH(C) (C.rgba.r*3 + C.rgba.g*5 + C.rgba.b*7 + C.rgba.a*11)
// Magic bytes for file identification
#define QOI_MAGIC \
(((unsigned int)'q') << 24 | ((unsigned int)'o') << 16 | \
((unsigned int)'i') << 8 | ((unsigned int)'f'))
#define QOI_HEADER_SIZE 14
// Maximum image size (400 million pixels) for safety
#define QOI_PIXELS_MAX ((unsigned int)400000000)
// Union for RGBA pixel manipulation
typedef union {
struct { unsigned char r, g, b, a; } rgba;
unsigned int v;
} qoi_rgba_t;
// End-of-stream marker
static const unsigned char qoi_padding[8] = {0,0,0,0,0,0,0,1};
// Helper functions for handling 32-bit values
static void qoi_write_32(unsigned char *bytes, int *p, unsigned int v) {
bytes[(*p)++] = (0xff000000 & v) >> 24;
bytes[(*p)++] = (0x00ff0000 & v) >> 16;
bytes[(*p)++] = (0x0000ff00 & v) >> 8;
bytes[(*p)++] = (0x000000ff & v);
}
static unsigned int qoi_read_32(const unsigned char *bytes, int *p) {
unsigned int a = bytes[(*p)++];
unsigned int b = bytes[(*p)++];
unsigned int c = bytes[(*p)++];
unsigned int d = bytes[(*p)++];
return a << 24 | b << 16 | c << 8 | d;
}
// The core encoding function
void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len) {
int i, max_size, p, run;
int px_len, px_end, px_pos, channels;
unsigned char *bytes;
const unsigned char *pixels;
qoi_rgba_t index[64];
qoi_rgba_t px, px_prev;
// Validate input parameters
if (data == NULL || out_len == NULL || desc == NULL ||
desc->width == 0 || desc->height == 0 ||
desc->channels < 3 || desc->channels > 4 ||
desc->colorspace > 1 ||
desc->height >= QOI_PIXELS_MAX / desc->width)
{
return NULL;
}
// Calculate maximum possible size
max_size = desc->width * desc->height * (desc->channels + 1) +
QOI_HEADER_SIZE + sizeof(qoi_padding);
// Allocate output buffer
bytes = (unsigned char *) QOI_MALLOC(max_size);
if (!bytes) {
return NULL;
}
// Write file header
p = 0;
qoi_write_32(bytes, &p, QOI_MAGIC);
qoi_write_32(bytes, &p, desc->width);
qoi_write_32(bytes, &p, desc->height);
bytes[p++] = desc->channels;
bytes[p++] = desc->colorspace;
// Initialize encoding state
pixels = (const unsigned char *)data;
QOI_ZEROARR(index);
run = 0;
px_prev.rgba.r = 0;
px_prev.rgba.g = 0;
px_prev.rgba.b = 0;
px_prev.rgba.a = 255;
px = px_prev;
// Calculate pixel parameters
px_len = desc->width * desc->height * desc->channels;
px_end = px_len - desc->channels;
channels = desc->channels;
// Main encoding loop
for (px_pos = 0; px_pos < px_len; px_pos += channels) {
// Read pixel values
px.rgba.r = pixels[px_pos + 0];
px.rgba.g = pixels[px_pos + 1];
px.rgba.b = pixels[px_pos + 2];
if (channels == 4) {
px.rgba.a = pixels[px_pos + 3];
}
// Check for run of identical pixels
if (px.v == px_prev.v) {
run++;
if (run == 62 || px_pos == px_end) {
bytes[p++] = QOI_OP_RUN | (run - 1);
run = 0;
}
}
else {
// End any current run
if (run > 0) {
bytes[p++] = QOI_OP_RUN | (run - 1);
run = 0;
}
// Check index for previously seen pixel
int index_pos = QOI_COLOR_HASH(px) % 64;
if (index[index_pos].v == px.v) {
bytes[p++] = QOI_OP_INDEX | index_pos;
}
else {
// Store pixel in index
index[index_pos] = px;
// Check if we can encode a small difference
if (px.rgba.a == px_prev.rgba.a) {
signed char vr = px.rgba.r - px_prev.rgba.r;
signed char vg = px.rgba.g - px_prev.rgba.g;
signed char vb = px.rgba.b - px_prev.rgba.b;
signed char vg_r = vr - vg;
signed char vg_b = vb - vg;
if (vr > -3 && vr < 2 &&
vg > -3 && vg < 2 &&
vb > -3 && vb < 2)
{
// Small difference - use QOI_OP_DIFF
bytes[p++] = QOI_OP_DIFF |
((vr + 2) << 4) |
((vg + 2) << 2) |
(vb + 2);
}
else if (vg_r > -9 && vg_r < 8 &&
vg > -33 && vg < 32 &&
vg_b > -9 && vg_b < 8)
{
// Larger difference - use QOI_OP_LUMA
bytes[p++] = QOI_OP_LUMA | (vg + 32);
bytes[p++] = ((vg_r + 8) << 4) | (vg_b + 8);
}
else {
// Full RGB values needed
bytes[p++] = QOI_OP_RGB;
bytes[p++] = px.rgba.r;
bytes[p++] = px.rgba.g;
bytes[p++] = px.rgba.b;
}
}
else {
// Alpha changed - need full RGBA
bytes[p++] = QOI_OP_RGBA;
bytes[p++] = px.rgba.r;
bytes[p++] = px.rgba.g;
bytes[p++] = px.rgba.b;
bytes[p++] = px.rgba.a;
}
}
}
px_prev = px;
}
// Write end marker
for (i = 0; i < (int)sizeof(qoi_padding); i++) {
bytes[p++] = qoi_padding[i];
}
*out_len = p;
return bytes;
}
// Core decoding function implementing the inverse operations
void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels) {
const unsigned char *bytes;
unsigned int header_magic;
unsigned char *pixels;
qoi_rgba_t index[64];
qoi_rgba_t px;
int px_len, chunks_len, px_pos;
int p = 0, run = 0;
// Input validation
if (data == NULL || desc == NULL ||
(channels != 0 && channels != 3 && channels != 4) ||
size < QOI_HEADER_SIZE + (int)sizeof(qoi_padding))
{
return NULL;
}
// Parse header
bytes = (const unsigned char *)data;
header_magic = qoi_read_32(bytes, &p);
desc->width = qoi_read_32(bytes, &p);
desc->height = qoi_read_32(bytes, &p);
desc->channels = bytes[p++];
desc->colorspace = bytes[p++];
// Validate header
if (desc->width == 0 || desc->height == 0 ||
desc->channels < 3 || desc->channels > 4 ||
desc->colorspace > 1 ||
header_magic != QOI_MAGIC ||
desc->height >= QOI_PIXELS_MAX / desc->width)
{
return NULL;
}
// Set output channels
if (channels == 0) {
channels = desc->channels;
}
// Allocate pixel buffer
px_len = desc->width * desc->height * channels;
pixels = (unsigned char *) QOI_MALLOC(px_len);
if (!pixels) {
return NULL;
}
// Initialize decoder state
QOI_ZEROARR(index);
px.rgba.r = 0;
px.rgba.g = 0;
px.rgba.b = 0;
px.rgba.a = 255;
// Main decoding loop
chunks_len = size - (int)sizeof(qoi_padding);
for (px_pos = 0; px_pos < px_len; px_pos += channels) {
if (run > 0) {
run--;
}
else if (p < chunks_len) {
int b1 = bytes[p++];
if (b1 == QOI_OP_RGB) {
px.rgba.r = bytes[p++];
px.rgba.g = bytes[p++];
px.rgba.b = bytes[p++];
}
else if (b1 == QOI_OP_RGBA) {
px.rgba.r = bytes[p++];
px.rgba.g = bytes[p++];
px.rgba.b = bytes[p++];
px.rgba.a = bytes[p++];
}
else if ((b1 & QOI_MASK_2) == QOI_OP_INDEX) {
px = index[b1];
}
else if ((b1 & QOI_MASK_2) == QOI_OP_DIFF) {
px.rgba.r += ((b1 >> 4) & 0x03) - 2;
px.rgba.g += ((b1 >> 2) & 0x03) - 2;
px.rgba.b += ( b1 & 0x03) - 2;
}
else if ((b1 & QOI_MASK_2) == QOI_OP_LUMA) {
int b2 = bytes[p++];
int vg = (b1 & 0x3f) - 32;
px.rgba.r += vg - 8 + ((b2 >> 4) & 0x0f);
px.rgba.g += vg;
px.rgba.b += vg - 8 + (b2 & 0x0f);
}
else if ((b1 & QOI_MASK_2) == QOI_OP_RUN) {
run = (b1 & 0x3f);
}
index[QOI_COLOR_HASH(px) % 64] = px;
}
// Write pixel values
pixels[px_pos + 0] = px.rgba.r;
pixels[px_pos + 1] = px.rgba.g;
pixels[px_pos + 2] = px.rgba.b;
if (channels == 4) {
pixels[px_pos + 3] = px.rgba.a;
}
}
return pixels;
}
// File handling functions if stdio is enabled
#ifndef QOI_NO_STDIO
// File I/O functions continued...
int qoi_write(const char *filename, const void *data, const qoi_desc *desc) {
FILE *f = fopen(filename, "wb");
int size, err;
void *encoded;
if (!f) {
return 0;
}
// Encode the pixel data into QOI format
encoded = qoi_encode(data, desc, &size);
if (!encoded) {
fclose(f);
return 0;
}
// Write the encoded data to file
fwrite(encoded, 1, size, f);
fflush(f);
err = ferror(f);
fclose(f);
QOI_FREE(encoded);
return err ? 0 : size;
}
void *qoi_read(const char *filename, qoi_desc *desc, int channels) {
FILE *f = fopen(filename, "rb");
int size, bytes_read;
void *pixels, *data;
if (!f) {
return NULL;
}
// Get file size
fseek(f, 0, SEEK_END);
size = ftell(f);
if (size <= 0 || fseek(f, 0, SEEK_SET) != 0) {
fclose(f);
return NULL;
}
// Read entire file into memory
data = QOI_MALLOC(size);
if (!data) {
fclose(f);
return NULL;
}
// Read file content and decode
bytes_read = fread(data, 1, size, f);
fclose(f);
pixels = (bytes_read != size) ? NULL : qoi_decode(data, bytes_read, desc, channels);
QOI_FREE(data);
return pixels;
}
#endif /* QOI_NO_STDIO */
#endif /* QOI_IMPLEMENTATION */
Read a PNG file (using stb_image.h), run the RVV helper from vec.S over its pixels, and write the image in QOI format (using qoi_encode from qoi.h).
main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>
#define STB_IMAGE_IMPLEMENTATION
#define STBI_ONLY_PNG
#define STBI_NO_LINEAR
#include "stb_image.h"
#define QOI_IMPLEMENTATION
#include "qoi.h"
struct rgba_pixel {
unsigned char r, g, b, a;
};
void encode_pixels_rvv(unsigned char *out, const struct rgba_pixel *pixels, int n);
static unsigned char* read_file(const char* filename, size_t* size_out) {
FILE* f = fopen(filename, "rb");
if (!f) {
fprintf(stderr, "Failed to open %s: %s\n", filename, strerror(errno));
return NULL;
}
struct stat st;
if (fstat(fileno(f), &st) != 0) {
fprintf(stderr, "Failed to stat %s: %s\n", filename, strerror(errno));
fclose(f);
return NULL;
}
unsigned char* buffer = malloc(st.st_size);
if (!buffer) {
fprintf(stderr, "Failed to allocate %ld bytes\n", (long)st.st_size);
fclose(f);
return NULL;
}
size_t bytes_read = fread(buffer, 1, st.st_size, f);
if (bytes_read != (size_t)st.st_size) {
fprintf(stderr, "Failed to read file: expected %ld bytes, got %ld\n",
(long)st.st_size, (long)bytes_read);
free(buffer);
fclose(f);
return NULL;
}
fclose(f);
*size_out = st.st_size;
return buffer;
}
static int write_file(const char* filename, const unsigned char* data, size_t size) {
FILE* f = fopen(filename, "wb");
if (!f) {
fprintf(stderr, "Failed to create %s: %s\n", filename, strerror(errno));
return 0;
}
size_t written = fwrite(data, 1, size, f);
if (written != size) {
fprintf(stderr, "Failed to write file: expected %ld bytes, wrote %ld\n",
(long)size, (long)written);
fclose(f);
return 0;
}
fclose(f);
return 1;
}
int main(int argc, char **argv) {
if (argc != 3) {
fprintf(stderr, "Usage: %s <input.png> <output.qoi>\n", argv[0]);
return 1;
}
printf("Reading input file: %s\n", argv[1]);
int width, height, channels;
if (!stbi_info(argv[1], &width, &height, &channels)) {
fprintf(stderr, "Failed to read PNG header: %s\n", stbi_failure_reason());
return 1;
}
printf("Image: %dx%d, %d channels\n", width, height, channels);
channels = 4; // Force RGBA
unsigned char *png_data = stbi_load(argv[1], &width, &height, NULL, channels);
if (!png_data) {
fprintf(stderr, "Failed to load PNG: %s\n", stbi_failure_reason());
return 1;
}
int pixel_count = width * height;
struct rgba_pixel *pixels = malloc(pixel_count * sizeof(struct rgba_pixel));
if (!pixels) {
fprintf(stderr, "Failed to allocate pixel buffer\n");
stbi_image_free(png_data);
return 1;
}
// Convert to RGBA struct format
for (int i = 0; i < pixel_count; i++) {
pixels[i].r = png_data[i * 4 + 0];
pixels[i].g = png_data[i * 4 + 1];
pixels[i].b = png_data[i * 4 + 2];
pixels[i].a = png_data[i * 4 + 3];
}
stbi_image_free(png_data);
unsigned char *processed = malloc(pixel_count * sizeof(struct rgba_pixel));
if (!processed) {
fprintf(stderr, "Failed to allocate processing buffer\n");
free(pixels);
return 1;
}
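printf("Processing pixels with RVV...\n");
// RVV pass over the RGBA pixels: writes per-pixel metadata (QOI index
// hashes and run detection) into `processed`; see vec.S
encode_pixels_rvv(processed, pixels, pixel_count);
qoi_desc desc = {
.width = width,
.height = height,
.channels = 4,
.colorspace = QOI_SRGB
};
int qoi_size;
// Encode the RGBA pixels themselves; `processed` holds metadata, not pixels
void *qoi_data = qoi_encode(pixels, &desc, &qoi_size);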
if (!qoi_data) {
fprintf(stderr, "QOI encoding failed\n");
free(pixels);
free(processed);
return 1;
}
printf("Writing output file: %s\n", argv[2]);
if (!write_file(argv[2], qoi_data, qoi_size)) {
free(pixels);
free(processed);
free(qoi_data);
return 1;
}
free(pixels);
free(processed);
free(qoi_data);
printf("Conversion successful\n");
return 0;
}
RVV helper for the QOI encoder
vec.S
# vec.S - RISC-V Vector Extension (RVV 1.0) helper for QOI encoding
# void encode_pixels_rvv(unsigned char *out, const struct rgba_pixel *pixels, int n)
# Output (2 * n bytes):
#   out[0 .. n-1]  = index hash of each pixel, (r*3 + g*5 + b*7 + a*11) % 64
#   out[n .. 2n-1] = 1 if the pixel equals its predecessor (run candidate), else 0;
#                    pixel 0 is compared with the QOI start value {0, 0, 0, 255}
# Leaf function using only caller-saved registers, so it needs no stack frame
# and works unchanged on RV32 and RV64.
# Register conventions:
#   a0 = hash output pointer      a3 = run-flag output pointer
#   a1 = input pixel pointer      a2 = remaining pixels
#   a4 = byte stride (4)          t1 = vector length after vsetvli
#   t3-t6 = hash constants 3, 5, 7, 11
#   a5, a6, a7, t0 = previous pixel r, g, b, a (carried across strips)
#   v1-v4 = R, G, B, A components  v5-v8 = previous-pixel components
#   v9 = hash, v10 = scratch mask, v0 = run mask, v11 = run flags as bytes
.text
.balign 4
.global encode_pixels_rvv
encode_pixels_rvv:
    blez    a2, done                # nothing to do for n <= 0
    add     a3, a0, a2              # run flags follow the n hash bytes
    li      a4, 4                   # each component is 4 bytes apart in the struct
    li      t3, 3
    li      t4, 5
    li      t5, 7
    li      t6, 11
    li      a5, 0                   # previous pixel starts as {0, 0, 0, 255}
    li      a6, 0
    li      a7, 0
    li      t0, 255
process_loop:
    # Set vector length based on remaining pixels (8-bit elements)
    vsetvli t1, a2, e8, m1, ta, ma
    # Load RGBA components using strided loads
    vlse8.v v1, (a1), a4            # R
    addi    t2, a1, 1
    vlse8.v v2, (t2), a4            # G
    addi    t2, a1, 2
    vlse8.v v3, (t2), a4            # B
    addi    t2, a1, 3
    vlse8.v v4, (t2), a4            # A
    # QOI hash in 8-bit arithmetic: wrapping mod 256 is safe because 64 divides 256
    vmul.vx  v9, v1, t3             # r * 3
    vmacc.vx v9, t4, v2             # += g * 5
    vmacc.vx v9, t5, v3             # += b * 7
    vmacc.vx v9, t6, v4             # += a * 11
    li      t2, 63
    vand.vx v9, v9, t2              # % 64 (63 does not fit the 5-bit vand.vi immediate)
    vse8.v  v9, (a0)                # store hash values
    # Predecessor of element i is element i-1; element 0 takes the scalar carry
    vslide1up.vx v5, v1, a5
    vslide1up.vx v6, v2, a6
    vslide1up.vx v7, v3, a7
    vslide1up.vx v8, v4, t0
    # Detect runs: all four components equal to the predecessor
    vmseq.vv v0, v1, v5             # compare R
    vmseq.vv v10, v2, v6            # compare G
    vmand.mm v0, v0, v10
    vmseq.vv v10, v3, v7            # compare B
    vmand.mm v0, v0, v10
    vmseq.vv v10, v4, v8            # compare A
    vmand.mm v0, v0, v10
    # Masks are bit-packed; expand to one 0/1 byte per pixel before storing
    vmv.v.i    v11, 0
    vmerge.vim v11, v11, 1, v0
    vse8.v  v11, (a3)               # store run flags
    # Carry the last pixel of this strip into the next iteration
    addi    t2, t1, -1
    slli    t2, t2, 2
    add     t2, a1, t2
    lbu     a5, 0(t2)
    lbu     a6, 1(t2)
    lbu     a7, 2(t2)
    lbu     t0, 3(t2)
    # Advance pointers and the remaining count
    slli    t2, t1, 2               # bytes consumed = vl * 4 (RGBA)
    add     a1, a1, t2
    add     a0, a0, t1
    add     a3, a3, t1
    sub     a2, a2, t1
    bnez    a2, process_loop
done:
    ret
# Optional helpers (not called above). They assume the register layout of
# process_loop: v1-v4 = current components, v5-v8 = previous components.
compute_differences:
    vsub.vv v12, v1, v5             # R differences (wrap mod 256)
    vsub.vv v13, v2, v6             # G differences
    vsub.vv v14, v3, v7             # B differences
    vsub.vv v15, v4, v8             # A differences
    ret
detect_small_diffs:
    # QOI_OP_DIFF range -2..1, compared as signed 8-bit values
    vmsle.vi v20, v12, 1            # dr <= 1 (there is no vmslt.vi)
    vmsgt.vi v21, v12, -3           # dr > -3
    vmand.mm v20, v20, v21          # combine R bounds
    # Repeat for G and B...
    ret
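To check the RVV helper under QEMU, its results can be compared against a scalar reference. A minimal sketch, assuming the hash values and run flags are kept in two separate n-byte arrays (the function name encode_pixels_ref is hypothetical and not part of the original code):

```c
#include <assert.h>
#include <stdint.h>

struct rgba_pixel { unsigned char r, g, b, a; };

/* hash[i] = QOI index position of pixels[i];
 * same[i] = 1 if pixels[i] equals its predecessor, using the QOI start
 *           value {0, 0, 0, 255} as the predecessor of pixels[0]. */
static void encode_pixels_ref(uint8_t *hash, uint8_t *same,
                              const struct rgba_pixel *pixels, int n) {
    struct rgba_pixel prev = {0, 0, 0, 255};
    for (int i = 0; i < n; i++) {
        const struct rgba_pixel *p = &pixels[i];
        hash[i] = (uint8_t)((p->r * 3 + p->g * 5 + p->b * 7 + p->a * 11) % 64);
        same[i] = p->r == prev.r && p->g == prev.g && p->b == prev.b && p->a == prev.a;
        prev = *p;
    }
}
```

Running both versions on a few test images and comparing the arrays with memcmp is a cheap way to catch stride or register mistakes in the assembly.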
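// Headers and constants this excerpt needs (same values as in qoi.h)
#include <stdlib.h>
#include <string.h>
#define QOI_MAGIC \
(((unsigned int)'q') << 24 | ((unsigned int)'o') << 16 | \
((unsigned int)'i') << 8 | ((unsigned int)'f'))
#define QOI_HEADER_SIZE 14
#define QOI_PIXELS_MAX ((unsigned int)400000000)
// Zero a whole array (sizeof needs a real array, not a pointer)
#ifndef QOI_ZEROARR
#define QOI_ZEROARR(a) memset((a),0,sizeof(a))
#endif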
#define QOI_OP_DIFF 0x40 /* 01xxxxxx */
#define QOI_OP_LUMA 0x80 /* 10xxxxxx */
#define QOI_OP_INDEX 0x00 /* 00xxxxxx */
#define QOI_OP_RUN 0xc0 /* 11xxxxxx */
#define QOI_OP_RGB 0xfe /* 11111110 */
#define QOI_OP_RGBA 0xff /* 11111111 */
#define QOI_MASK_2 0xc0 /* 11000000 */
#define QOI_COLOR_HASH(C) (C.rgba.r*3 + C.rgba.g*5 + C.rgba.b*7 + C.rgba.a*11)
// Allocation hooks (the RISC-V assembly port only allocates and never frees)
#ifndef QOI_MALLOC
#define QOI_MALLOC(sz) malloc(sz)
#define QOI_FREE(p) free(p)
#endif
static const unsigned char qoi_padding[8] = {0,0,0,0,0,0,0,1};
// Union: rgba and v share the same 4 bytes, so a pixel can be accessed per channel or compared as one 32-bit value
typedef union {
struct { unsigned char r, g, b, a; } rgba;
unsigned int v;
} qoi_rgba_t;
// Read 4 bytes at *p as a big-endian 32-bit value and advance *p by 4
static unsigned int qoi_read_32(const unsigned char *bytes, int *p) {
unsigned int a = bytes[(*p)++];
unsigned int b = bytes[(*p)++];
unsigned int c = bytes[(*p)++];
unsigned int d = bytes[(*p)++];
return a << 24 | b << 16 | c << 8 | d;
}
typedef struct {
unsigned int width;
unsigned int height;
unsigned char channels;
unsigned char colorspace;
} qoi_desc;
void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels) {
const unsigned char *bytes;
unsigned int header_magic;
unsigned char *pixels;
qoi_rgba_t index[64];
qoi_rgba_t px;
int px_len, chunks_len, px_pos;
int p = 0, run = 0;
// Reject invalid arguments
if (
data == NULL || desc == NULL ||
(channels != 0 && channels != 3 && channels != 4) ||
size < QOI_HEADER_SIZE + (int)sizeof(qoi_padding)
) {
return NULL;
}
// Interpret the input buffer as bytes
bytes = (const unsigned char *)data;
// Parse the 14-byte QOI header
header_magic = qoi_read_32(bytes, &p);
desc->width = qoi_read_32(bytes, &p);
desc->height = qoi_read_32(bytes, &p);
desc->channels = bytes[p++];
desc->colorspace = bytes[p++];
// Reject malformed headers
if (
desc->width == 0 || desc->height == 0 ||
desc->channels < 3 || desc->channels > 4 ||
desc->colorspace > 1 ||
header_magic != QOI_MAGIC ||
desc->height >= QOI_PIXELS_MAX / desc->width
) {
return NULL;
}
if (channels == 0) {
channels = desc->channels;
}
// Output buffer size in bytes
px_len = desc->width * desc->height * channels;
pixels = (unsigned char *) QOI_MALLOC(px_len);
if (!pixels) {
return NULL;
}
QOI_ZEROARR(index);
px.rgba.r = 0;
px.rgba.g = 0;
px.rgba.b = 0;
px.rgba.a = 255;
// Chunk data ends where the 8-byte end marker begins
chunks_len = size - (int)sizeof(qoi_padding);
for (px_pos = 0; px_pos < px_len; px_pos += channels) {
if (run > 0) {
run--;
}
else if (p < chunks_len) {
int b1 = bytes[p++];
if (b1 == QOI_OP_RGB) {
px.rgba.r = bytes[p++];
px.rgba.g = bytes[p++];
px.rgba.b = bytes[p++];
}
else if (b1 == QOI_OP_RGBA) {
px.rgba.r = bytes[p++];
px.rgba.g = bytes[p++];
px.rgba.b = bytes[p++];
px.rgba.a = bytes[p++];
}
else if ((b1 & QOI_MASK_2) == QOI_OP_INDEX) {
px = index[b1];
}
else if ((b1 & QOI_MASK_2) == QOI_OP_DIFF) {
px.rgba.r += ((b1 >> 4) & 0x03) - 2;
px.rgba.g += ((b1 >> 2) & 0x03) - 2;
px.rgba.b += ( b1 & 0x03) - 2;
}
else if ((b1 & QOI_MASK_2) == QOI_OP_LUMA) {
int b2 = bytes[p++];
int vg = (b1 & 0x3f) - 32;
px.rgba.r += vg - 8 + ((b2 >> 4) & 0x0f);
px.rgba.g += vg;
px.rgba.b += vg - 8 + (b2 & 0x0f);
}
else if ((b1 & QOI_MASK_2) == QOI_OP_RUN) {
run = (b1 & 0x3f);
}
index[QOI_COLOR_HASH(px) % 64] = px;
}
pixels[px_pos + 0] = px.rgba.r;
pixels[px_pos + 1] = px.rgba.g;
pixels[px_pos + 2] = px.rgba.b;
if (channels == 4) {
pixels[px_pos + 3] = px.rgba.a;
}
}
return pixels;
}
RISC-V
# Scalar RISC-V port of qoi_decode (RV32IM or RV64IM, Linux ABI with libc).
# void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels);
# qoi_desc layout: width @0 (u32), height @4 (u32), channels @8 (u8), colorspace @9 (u8)
# Assemble as a .S file so the C preprocessor expands REG_S, REG_L and SZREG.
# Memory comes from libc malloc instead of a raw brk syscall.
# Do not link this together with qoi.h built with QOI_IMPLEMENTATION:
# both define qoi_decode.
#if __riscv_xlen == 64
#define REG_S sd
#define REG_L ld
#define SZREG 8
#else
#define REG_S sw
#define REG_L lw
#define SZREG 4
#endif
# Registers in the decode loop:
#   s0 = bytes, s1 = chunks_len, s3 = channels, s4 = px_len, t3 = &index[0]
#   a0 = pixels, a1 = px_pos, a2 = p, a3 = run
#   a4, a5, a6, a7 = px.rgba.r, g, b, a; t0 = b1; t1, t2, t4 = scratch
    .bss
    .balign 4
qoi_index:
    .space 256                      # qoi_rgba_t index[64], stored as r, g, b, a bytes

    .text
    .balign 4
    .global qoi_decode
qoi_decode:
    addi  sp, sp, -48
    REG_S ra, 0*SZREG(sp)
    REG_S s0, 1*SZREG(sp)
    REG_S s1, 2*SZREG(sp)
    REG_S s2, 3*SZREG(sp)
    REG_S s3, 4*SZREG(sp)
    REG_S s4, 5*SZREG(sp)
    mv    s0, a0                    # bytes = data
    mv    s1, a1                    # size
    mv    s2, a2                    # desc
    mv    s3, a3                    # requested channels
    # Input validation
    beqz  s0, fail
    beqz  s2, fail
    li    t0, 22                    # QOI_HEADER_SIZE + sizeof(qoi_padding)
    blt   s1, t0, fail
    beqz  s3, 1f                    # channels == 0 is allowed
    li    t0, 3
    beq   s3, t0, 1f
    li    t0, 4
    bne   s3, t0, fail
1:
    # Parse header
    mv    a0, s0
    call  read_be32                 # header_magic
    li    t0, 0x716f6966            # QOI_MAGIC, "qoif"
    bne   a0, t0, fail
    addi  a0, s0, 4
    call  read_be32
    sw    a0, 0(s2)                 # desc->width
    mv    s4, a0
    addi  a0, s0, 8
    call  read_be32
    sw    a0, 4(s2)                 # desc->height
    lbu   t0, 12(s0)
    sb    t0, 8(s2)                 # desc->channels
    lbu   t1, 13(s0)
    sb    t1, 9(s2)                 # desc->colorspace
    # Validate header
    beqz  s4, fail                  # width == 0
    beqz  a0, fail                  # height == 0
    li    t2, 3
    bltu  t0, t2, fail              # desc->channels < 3
    li    t2, 4
    bgtu  t0, t2, fail              # desc->channels > 4
    li    t2, 1
    bgtu  t1, t2, fail              # colorspace > 1
    li    t2, 400000000             # QOI_PIXELS_MAX
    divu  t2, t2, s4
    bgeu  a0, t2, fail              # height >= QOI_PIXELS_MAX / width
    bnez  s3, 2f
    mv    s3, t0                    # channels = desc->channels
2:
    # px_len = width * height * channels; pixels = malloc(px_len)
    mul   s4, s4, a0
    mul   s4, s4, s3
    mv    a0, s4
    call  malloc
    beqz  a0, fail
    # QOI_ZEROARR(index)
    la    t3, qoi_index
    mv    t1, t3
    addi  t2, t3, 256
3:
    sw    zero, 0(t1)
    addi  t1, t1, 4
    bltu  t1, t2, 3b
    # Decoder state
    addi  s1, s1, -8                # chunks_len = size - sizeof(qoi_padding)
    li    a1, 0                     # px_pos
    li    a2, 14                    # p = QOI_HEADER_SIZE
    li    a3, 0                     # run
    li    a4, 0                     # px = {0, 0, 0, 255}
    li    a5, 0
    li    a6, 0
    li    a7, 255
decode_loop:
    bge   a1, s4, epilogue          # all pixels written; a0 = pixels
    beqz  a3, read_chunk
    addi  a3, a3, -1                # run--
    j     write_pixel
read_chunk:
    bge   a2, s1, write_pixel       # no chunk data left: repeat px
    add   t1, s0, a2
    lbu   t0, 0(t1)                 # b1 = bytes[p++]
    addi  a2, a2, 1
    li    t2, 0xfe                  # 8-bit tags are checked first
    beq   t0, t2, op_rgb
    li    t2, 0xff
    beq   t0, t2, op_rgba
    srli  t2, t0, 6                 # 2-bit tag
    beqz  t2, op_index              # b00
    li    t4, 1
    beq   t2, t4, op_diff           # b01
    li    t4, 2
    beq   t2, t4, op_luma           # b10
    andi  a3, t0, 0x3f              # b11: QOI_OP_RUN, run = b1 & 0x3f
    j     update_index
op_rgb:
    add   t1, s0, a2
    lbu   a4, 0(t1)
    lbu   a5, 1(t1)
    lbu   a6, 2(t1)
    addi  a2, a2, 3
    j     update_index
op_rgba:
    add   t1, s0, a2
    lbu   a4, 0(t1)
    lbu   a5, 1(t1)
    lbu   a6, 2(t1)
    lbu   a7, 3(t1)
    addi  a2, a2, 4
    j     update_index
op_index:
    slli  t1, t0, 2                 # px = index[b1]
    add   t1, t1, t3
    lbu   a4, 0(t1)
    lbu   a5, 1(t1)
    lbu   a6, 2(t1)
    lbu   a7, 3(t1)
    j     update_index
op_diff:
    srli  t1, t0, 4                 # r += ((b1 >> 4) & 3) - 2
    andi  t1, t1, 3
    addi  t1, t1, -2
    add   a4, a4, t1
    srli  t1, t0, 2                 # g += ((b1 >> 2) & 3) - 2
    andi  t1, t1, 3
    addi  t1, t1, -2
    add   a5, a5, t1
    andi  t1, t0, 3                 # b += (b1 & 3) - 2
    addi  t1, t1, -2
    add   a6, a6, t1
    j     wrap_rgb
op_luma:
    add   t1, s0, a2
    lbu   t4, 0(t1)                 # b2 = bytes[p++]
    addi  a2, a2, 1
    andi  t1, t0, 0x3f
    addi  t1, t1, -32               # vg = (b1 & 0x3f) - 32
    add   a5, a5, t1                # g += vg
    addi  t1, t1, -8                # vg - 8
    srli  t2, t4, 4                 # (b2 >> 4) & 0x0f, no mask needed since b2 < 256
    add   t2, t2, t1
    add   a4, a4, t2                # r += vg - 8 + ((b2 >> 4) & 0x0f)
    andi  t2, t4, 0x0f
    add   t2, t2, t1
    add   a6, a6, t2                # b += vg - 8 + (b2 & 0x0f)
wrap_rgb:
    andi  a4, a4, 0xff              # keep channels in 0..255 (wraparound)
    andi  a5, a5, 0xff
    andi  a6, a6, 0xff
update_index:
    li    t1, 3                     # index[(r*3 + g*5 + b*7 + a*11) % 64] = px
    mul   t1, a4, t1
    li    t2, 5
    mul   t2, a5, t2
    add   t1, t1, t2
    li    t2, 7
    mul   t2, a6, t2
    add   t1, t1, t2
    li    t2, 11
    mul   t2, a7, t2
    add   t1, t1, t2
    andi  t1, t1, 63
    slli  t1, t1, 2
    add   t1, t1, t3
    sb    a4, 0(t1)
    sb    a5, 1(t1)
    sb    a6, 2(t1)
    sb    a7, 3(t1)
write_pixel:
    add   t1, a0, a1
    sb    a4, 0(t1)
    sb    a5, 1(t1)
    sb    a6, 2(t1)
    li    t2, 4
    bne   s3, t2, skip_alpha        # channels == 3: no alpha byte
    sb    a7, 3(t1)
skip_alpha:
    add   a1, a1, s3                # px_pos += channels
    j     decode_loop
fail:
    li    a0, 0
epilogue:
    REG_L ra, 0*SZREG(sp)
    REG_L s0, 1*SZREG(sp)
    REG_L s1, 2*SZREG(sp)
    REG_L s2, 3*SZREG(sp)
    REG_L s3, 4*SZREG(sp)
    REG_L s4, 5*SZREG(sp)
    addi  sp, sp, 48
    ret

# Leaf helper: a0 = address; returns the big-endian 32-bit value at a0 in a0.
# Clobbers t0 and t1.
read_be32:
    lbu   t0, 0(a0)
    slli  t0, t0, 24
    lbu   t1, 1(a0)
    slli  t1, t1, 16
    or    t0, t0, t1
    lbu   t1, 2(a0)
    slli  t1, t1, 8
    or    t0, t0, t1
    lbu   t1, 3(a0)
    or    a0, t0, t1
    ret
RISC-V with vector extension
# Vector registers used:
# v0-v3: RGBA components
# v4: temporary calculations
# v8-v11: for index operations
.data
QOI_OP_DIFF: .word 0x40
QOI_OP_LUMA: .word 0x80
QOI_OP_INDEX: .word 0x00
QOI_OP_RUN: .word 0xc0
QOI_OP_RGB: .word 0xfe
QOI_OP_RGBA: .word 0xff
QOI_MASK_2: .word 0xc0
.align 4
qoi_padding:
.byte 0, 0, 0, 0, 0, 0, 0, 1
.bss
index: .space 256 # index[64]
.text
qoi_decode:
# Configure vector unit
li t0, 32 # Set vector length to 32 bytes
vsetvli t1, t0, e8, m1 # 8-bit elements, single vector register
# Save original arguments
mv s5, a0 # Save data pointer
mv s6, a1 # Save size
mv s7, a2 # Save desc pointer
mv s8, a3 # Save channels
# Read header as before
jal qoi_read_32
mv t1, a0
# Process pixels in vector mode
process_pixels_vector:
# Load multiple pixels into vector registers
vle8.v v0, (s4) # Load R components
vle8.v v1, (s4) # Load G components
vle8.v v2, (s4) # Load B components
vle8.v v3, (s4) # Load A components
handle_diff_vector:
# Vector version of diff handling
vand.vi v4, v0, 0x3f # Mask for diff
vsub.vi v4, v4, 2 # Subtract 2
vadd.vv v0, v0, v4 # Add to red channel
vand.vi v4, v1, 0x3f
vsub.vi v4, v4, 2
vadd.vv v1, v1, v4 # Add to green channel
vand.vi v4, v2, 0x3f
vsub.vi v4, v4, 2
vadd.vv v2, v2, v4 # Add to blue channel
QOI_COLOR_HASH_vector:
# Vector version of color hash
vmul.vi v8, v0, 3 # r * 3
vmul.vi v9, v1, 5 # g * 5
vmul.vi v10, v2, 7 # b * 7
vmul.vi v11, v3, 11 # a * 11
vadd.vv v8, v8, v9 # Add components
vadd.vv v8, v8, v10
vadd.vv v8, v8, v11
# Store results back
vse8.v v0, (s4) # Store R components
vse8.v v1, (s4) # Store G components
vse8.v v2, (s4) # Store B components
vse8.v v3, (s4) # Store A components
process_chunks_vector:
# Load chunk of bytes into vector register
vsetvli t0, a1, e8, m1 # Set vector length for byte operations
vle8.v v4, (s5) # Load chunk of bytes
# Check for different opcodes in parallel
vandi.v v5, v4, 0xC0 # Apply QOI_MASK_2 to all elements
# Create masks for different opcodes
vmseq.vi v6, v4, 0xFE # Mask for QOI_OP_RGB
vmseq.vi v7, v4, 0xFF # Mask for QOI_OP_RGBA
vmseq.vi v8, v5, 0x00 # Mask for QOI_OP_INDEX
vmseq.vi v9, v5, 0x40 # Mask for QOI_OP_DIFF
vmseq.vi v10, v5, 0x80 # Mask for QOI_OP_LUMA
vmseq.vi v11, v5, 0xC0 # Mask for QOI_OP_RUN
# Handle RGB chunks
vcompress.vm v12, v0, v6 # Gather RGB chunks
vrgather.vv v0, v12, v6 # Load R components
vrgather.vv v1, v12, v6 # Load G components
vrgather.vv v2, v12, v6 # Load B components
# Handle RGBA chunks
vcompress.vm v12, v0, v7 # Gather RGBA chunks
vrgather.vv v0, v12, v7 # Load R components
vrgather.vv v1, v12, v7 # Load G components
vrgather.vv v2, v12, v7 # Load B components
vrgather.vv v3, v12, v7 # Load A components
# Handle INDEX chunks
vcompress.vm v12, v0, v8 # Gather INDEX chunks
vsll.vi v13, v12, 2 # Multiply by 4 for index lookup
vluxei8.v v14, (index), v13 # Load from index array
# Handle DIFF chunks (similar to original but vectorized)
vcompress.vm v12, v4, v9 # Gather DIFF chunk bytes (v4 holds the loaded chunk bytes)
vsrl.vi v15, v12, 4 # (b1 >> 4) & 0x03; logical shift because the bytes are unsigned
vand.vi v15, v15, 0x03
vadd.vi v15, v15, -2 # -2 (RVV has no vsub.vi, so add -2)
vadd.vv v0, v0, v15 # Add to R
vsrl.vi v15, v12, 2 # (b1 >> 2) & 0x03
vand.vi v15, v15, 0x03
vadd.vi v15, v15, -2
vadd.vv v1, v1, v15 # Add to G
vand.vi v15, v12, 0x03 # b1 & 0x03
vadd.vi v15, v15, -2
vadd.vv v2, v2, v15 # Add to B
# Continue to main_loop_vector
main_loop_vector:
# Vector version of main processing loop
vsetvli t0, a1, e8, m1 # Set vector length based on remaining pixels
vle8.v v0, (s5) # Load chunk of pixels
# Process op codes in vector mode
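li t1, 0xc0
vand.vx v4, v0, t1 # Mask for op codes (0xc0 does not fit vand.vi's 5-bit immediate)
li t2, 0x40
vmseq.vx v5, v4, t2 # Check for QOI_OP_DIFF (mask goes to v5 so the chunk bytes in v0 survive)
li t2, 0x80
vmseq.vx v6, v4, t2 # Check for QOI_OP_LUMA
vmseq.vi v7, v4, 0 # Check for QOI_OP_INDEX
# Parallel processing based on op codes
vcompress.vm v8, v0, v5 # Gather DIFF operations (vrgather.vi with index 0 would only broadcast element 0)
vcompress.vm v9, v0, v6 # Gather LUMA operations
vcompress.vm v10, v0, v7 # Gather INDEX operations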
# Continue with the rest of the decoder logic
ret
# Helper functions remain mostly unchanged
qoi_read_32:
addi sp, sp, -16
sw ra, 12(sp) # save ra
sw t5, 8(sp) # save p
mv t5, s5 # p = data pointer
# Could potentially use vector load for 4 bytes at once
# but keeping scalar for header reading since it's not performance critical
lbu t0, 0(t5) # t0 = data[p] (lbu zero-extends; lb would sign-extend bytes >= 0x80)
addi t5, t5, 1 # p++
mv t1, t0 # a = first byte
lbu t0, 0(t5) # t0 = data[p]
addi t5, t5, 1 # p++
mv t2, t0 # b = second byte
lbu t0, 0(t5) # t0 = data[p]
addi t5, t5, 1 # p++
mv t3, t0 # c = third byte
lbu t0, 0(t5) # t0 = data[p]
addi t5, t5, 1 # p++
mv t4, t0 # d = fourth byte
# Combine bytes into 32-bit value
slli t1, t1, 24 # a << 24
slli t2, t2, 16 # b << 16
slli t3, t3, 8 # c << 8
or t1, t1, t2 # combine a and b
or t1, t1, t3 # combine with c
or t1, t1, t4 # combine with d to get final 32-bit value
mv a0, t1 # move result to return register
# Restore stack and return
lw ra, 12(sp)
lw t5, 8(sp)
addi sp, sp, 16
ret
QOI_MALLOC:
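mv t0, a1 # t0 = requested size in bytes
li a7, 214 # Linux RISC-V brk syscall (214 is brk, not sbrk; it takes an address, not a size)
li a0, 0 # brk(0) returns the current program break
ecall
mv t1, a0 # t1 = old break = start of the new block
add a0, a0, t0 # Ask for old break + size
li a7, 214
ecall # brk(new_end); on failure the old break is returned unchanged
mv a0, t1 # Return start of the allocated block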
ret # return allocated address in a0
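To check the vector routines above, it helps to have a scalar reference to compare their output against. The following is a minimal Python sketch, not part of the original decoder, that classifies chunk bytes the way process_chunks_vector intends (8-bit tags before 2-bit tags), applies a QOI_OP_DIFF chunk, and computes QOI_COLOR_HASH both exactly and with SEW=8 style 8-bit wraparound. The function names classify, apply_diff, color_hash and color_hash_e8 are hypothetical.

```python
# Scalar reference model for checking the vectorized QOI chunk handling.
# classify, apply_diff, color_hash and color_hash_e8 are hypothetical helper names.

QOI_OP_INDEX = 0x00  # 00xxxxxx
QOI_OP_DIFF = 0x40   # 01xxxxxx
QOI_OP_LUMA = 0x80   # 10xxxxxx
QOI_OP_RUN = 0xC0    # 11xxxxxx
QOI_OP_RGB = 0xFE    # 11111110
QOI_OP_RGBA = 0xFF   # 11111111
QOI_MASK_2 = 0xC0    # selects the 2-bit tag

TAG_NAMES = {QOI_OP_INDEX: "INDEX", QOI_OP_DIFF: "DIFF",
             QOI_OP_LUMA: "LUMA", QOI_OP_RUN: "RUN"}


def classify(b1: int) -> str:
    """Name the opcode of a chunk's first byte, checking the 8-bit tags first."""
    if b1 == QOI_OP_RGB:
        return "RGB"
    if b1 == QOI_OP_RGBA:
        return "RGBA"
    return TAG_NAMES[b1 & QOI_MASK_2]


def apply_diff(px, b1):
    """Apply a QOI_OP_DIFF chunk (dr, dg, db in -2..1) to an (r, g, b, a) pixel."""
    r, g, b, a = px
    r = (r + ((b1 >> 4) & 0x03) - 2) & 0xFF  # channels wrap around like uint8
    g = (g + ((b1 >> 2) & 0x03) - 2) & 0xFF
    b = (b + (b1 & 0x03) - 2) & 0xFF
    return (r, g, b, a)


def color_hash(px):
    """QOI_COLOR_HASH: (r * 3 + g * 5 + b * 7 + a * 11) % 64."""
    r, g, b, a = px
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64


def color_hash_e8(px):
    """The same hash with every product and sum truncated to 8 bits, as SEW=8 code does."""
    acc = 0
    for value, k in zip(px, (3, 5, 7, 11)):
        acc = (acc + ((value * k) & 0xFF)) & 0xFF
    return acc & 63  # 256 is a multiple of 64, so the truncation does not change the result


if __name__ == "__main__":
    prev = (0, 0, 0, 255)  # initial previous pixel of the encoder/decoder
    print(classify(0x40), apply_diff(prev, 0x40))  # DIFF (254, 254, 254, 255)
    print(color_hash(prev), color_hash_e8(prev))   # 53 53
```

The equality of color_hash and color_hash_e8 is the reason byte-wide (e8) vector multiplies and adds are enough for the hash, as long as the final result is masked to 6 bits.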
sudo apt update && sudo apt upgrade -y
Install some prerequisites (Ubuntu)
sudo apt-get install autoconf automake autotools-dev curl python3 python3-pip python3-tomli libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build git cmake libglib2.0-dev libslirp-dev
mkdir tool && cd tool
git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git --recursive
Enter riscv-gnu-toolchain, create and enter a folder called build, and then configure what we need to compile in the Makefile.
cd riscv-gnu-toolchain
mkdir build && cd build
../configure --prefix=$HOME/riscv-gnu-toolchain/build --with-arch=rv32gcv --with-abi=ilp32d --enable-multilib
This step takes a while: around 3 hours on my shabby ASUS Mini PC PN41.
make
echo 'export PATH=$PATH:~/riscv-gnu-toolchain/build/bin' >> ~/.bashrc
source ~/.bashrc
Then compile QEMU from riscv-gnu-toolchain so that it is compatible with the toolchain. Please keep working in the directory where_you_git_clone_repo/riscv-gnu-toolchain/build, or the build will fail.
make build-qemu
vector-test.s
.section .text
.global _start
_start:
# Initialize stack pointer
lui sp, %hi(stack_top)
addi sp, sp, %lo(stack_top)
# Print start message
lui a0, %hi(msg_start)
addi a0, a0, %lo(msg_start)
jal print_string
# Initialize vector configuration with explicit configuration
vsetvli t0, x0, e32, m1, ta, ma # SEW=32, LMUL=1, tail agnostic, mask agnostic
# Load vector register v0 with data
la a0, vector_data
vle32.v v0, (a0) # Load 32-bit elements into v0
# Add 1 to each element
li t0, 1
vadd.vx v0, v0, t0 # Add scalar t0 to each element
# Store result back to memory
la a0, vector_result
vse32.v v0, (a0) # Store 32-bit elements from v0
# Print results
la s0, vector_result # Load result address
li s1, 4 # Counter for 4 numbers
print_loop:
# Print "Result: "
lui a0, %hi(msg_result)
addi a0, a0, %lo(msg_result)
jal print_string
# Load and print original value
lui a0, %hi(msg_orig)
addi a0, a0, %lo(msg_orig)
jal print_string
la t0, vector_data
la t2, vector_result # Load address of vector_result
sub t1, s0, t2 # Now subtract registers
add t0, t0, t1
lw a0, 0(t0)
jal print_num
# Print arrow
lui a0, %hi(msg_arrow)
addi a0, a0, %lo(msg_arrow)
jal print_string
# Load and print result value
lw a0, 0(s0)
jal print_num
# Print newline
lui a0, %hi(msg_newline)
addi a0, a0, %lo(msg_newline)
jal print_string
# Move to next number
addi s0, s0, 4
addi s1, s1, -1
bnez s1, print_loop
# Print completion message
lui a0, %hi(msg_done)
addi a0, a0, %lo(msg_done)
jal print_string
# Exit success
li a0, 0
li a7, 93
ecall
# Print string function - expects pointer in a0
print_string:
addi sp, sp, -4
sw ra, 0(sp)
mv t0, a0
1: lbu t1, 0(t0)
beqz t1, 2f
addi t0, t0, 1
j 1b
2: sub t0, t0, a0
mv a2, t0
mv a1, a0
li a0, 1
li a7, 64
ecall
lw ra, 0(sp)
addi sp, sp, 4
ret
# Print number function - expects number in a0
print_num:
addi sp, sp, -32 # Keep sp 16-byte aligned
sw ra, 28(sp)
sw s0, 24(sp)
sw s1, 20(sp)
sw s2, 16(sp)
sw s3, 12(sp)
mv s0, a0 # Save original number
li s1, 10 # Divisor
addi s2, sp, 12 # End of the 12-byte digit buffer at 0(sp)..11(sp), below the saved registers
# Handle negative numbers
bgez s0, positive
neg s0, s0
li t0, '-'
li a0, 1
mv a1, sp
sb t0, 0(a1)
li a2, 1
li a7, 64
ecall
positive:
# Convert number to string, filling the buffer from its end so the digits come out in order
mv t0, s2 # Current buffer position (one past the last digit)
digit_loop:
remu t1, s0, s1 # Get remainder (current digit); unsigned so INT_MIN also works
addi t1, t1, '0' # Convert to ASCII
addi t0, t0, -1 # Move buffer pointer back
sb t1, 0(t0) # Store digit
divu s0, s0, s1 # Divide number by 10
bnez s0, digit_loop
# Print the number
mv a1, t0 # First (most significant) digit
sub a2, s2, t0 # Calculate length
li a0, 1
li a7, 64
ecall
# Restore registers and return
lw ra, 28(sp)
lw s0, 24(sp)
lw s1, 20(sp)
lw s2, 16(sp)
lw s3, 12(sp)
addi sp, sp, 32
ret
.section .rodata
msg_start:
.string "Starting vector test...\n"
msg_done:
.string "\nVector operations completed.\n"
msg_result:
.string "Element "
msg_orig:
.string "Original: "
msg_arrow:
.string " -> Result: "
msg_newline:
.string "\n"
.section .data
.align 4
vector_data:
.word 1, 2, 3, 4 # Pre-initialized input data
.align 4
vector_result:
.word 0, 0, 0, 0 # Space for results
.section .bss
.align 4
.space 4096 # Stack
stack_top:
riscv32-unknown-elf-as -march=rv32gcv_zba vector-test.s -o vector-test.o
riscv32-unknown-elf-ld -nostdlib vector-test.o -o vector-test
qemu-riscv32 -cpu rv32,v=true,zba=true,vlen=128 ./vector-test
Starting vector test...
Element Original: 1 -> Result: 2
Element Original: 2 -> Result: 3
Element Original: 3 -> Result: 4
Element Original: 4 -> Result: 5
Vector operations completed.
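Why does one vle32.v / vadd.vx / vse32.v sequence handle all four words? With rs1 = x0 and rd ≠ x0, vsetvli sets vl to VLMAX = LMUL × VLEN / SEW, which is 1 × 128 / 32 = 4 under -cpu rv32,v=true,vlen=128. When a real element count is passed instead (as main_loop_vector does with a1), the loop has to be strip-mined. The Python sketch below models both cases; vlmax, vsetvli_vl and strip_mine_add are hypothetical names, and min(AVL, VLMAX) is one legal vl choice (the specification allows other values when VLMAX < AVL < 2 × VLMAX).

```python
# Model of how vsetvli picks vl, and of a strip-mined loop like main_loop_vector.
# vlmax, vsetvli_vl and strip_mine_add are hypothetical helper names.
from fractions import Fraction


def vlmax(vlen_bits, sew_bits, lmul=Fraction(1)):
    """VLMAX = LMUL * VLEN / SEW (LMUL may be fractional, e.g. Fraction(1, 2) for mf2)."""
    return int(Fraction(lmul) * vlen_bits / sew_bits)


def vsetvli_vl(avl, vlen_bits, sew_bits, lmul=Fraction(1)):
    """One legal vl choice: min(AVL, VLMAX). rs1 = x0 (with rd != x0) acts like an unbounded AVL."""
    return min(avl, vlmax(vlen_bits, sew_bits, lmul))


def strip_mine_add(data, scalar, vlen_bits=128, sew_bits=32):
    """Add `scalar` to every 32-bit element, vl elements per iteration."""
    out = []
    i = 0
    while i < len(data):
        vl = vsetvli_vl(len(data) - i, vlen_bits, sew_bits)  # vsetvli t0, a1, e32, m1
        chunk = data[i:i + vl]                                # vle32.v
        out.extend((x + scalar) & 0xFFFFFFFF for x in chunk)  # vadd.vx, wrapping at 32 bits
        i += vl                                               # advance by vl elements
    return out


if __name__ == "__main__":
    print(vsetvli_vl(float("inf"), 128, 32))  # 4, as in vsetvli t0, x0, e32, m1 with vlen=128
    print(strip_mine_add([1, 2, 3, 4], 1))    # [2, 3, 4, 5], matching the output above
```

Changing vlen_bits to 256 in the model halves the number of loop iterations for the same data, which is the point of writing vector code against vl instead of a fixed width.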
PNG (Portable Network Graphics) is a raster graphics file format that uses lossless compression, designed to replace the older GIF format.
Here is the Python script (png2bin.py) that converts a PNG file into a binary file.
import numpy as np
from PIL import Image
import struct
import argparse
import sys


def png_to_binary(input_png, output_binary):
    """
    Convert a PNG file to a binary format with the following structure:
    - First 4 bytes: width (big-endian uint32)
    - Next 4 bytes: height (big-endian uint32)
    - Next 1 byte: channels (uint8)
    - Remaining bytes: pixel data in row-major order
    """
    try:
        # Open and read the PNG file
        with Image.open(input_png) as img:
            # Convert to RGB if not already RGB or RGBA
            if img.mode not in ['RGB', 'RGBA']:
                img = img.convert('RGB')
            # Get image dimensions and channel count
            width, height = img.size
            channels = len(img.getbands())  # 3 for RGB, 4 for RGBA
            # Convert image to numpy array of shape (height, width, channels)
            img_array = np.array(img)
        # Write to binary file
        with open(output_binary, 'wb') as f:
            # Write header information
            f.write(struct.pack('>I', width))  # Big-endian uint32
            f.write(struct.pack('>I', height))  # Big-endian uint32
            f.write(struct.pack('B', channels))  # uint8
            # Write pixel data: flatten in C (row-major) order as unsigned bytes
            f.write(img_array.astype(np.uint8).tobytes('C'))
        return True, f"Successfully converted {input_png} to {output_binary}"
    except FileNotFoundError:
        return False, f"Error: Input file {input_png} not found"
    except Exception as e:
        return False, f"Error during conversion: {str(e)}"


def binary_to_png(input_binary, output_png):
    """
    Convert our binary format back to PNG to verify the conversion worked correctly.
    """
    try:
        with open(input_binary, 'rb') as f:
            # Read header
            width = struct.unpack('>I', f.read(4))[0]
            height = struct.unpack('>I', f.read(4))[0]
            channels = struct.unpack('B', f.read(1))[0]
            # Read pixel data
            size = width * height * channels
            data = f.read(size)
        # Convert to numpy array and reshape
        img_array = np.frombuffer(data, dtype=np.uint8)
        img_array = img_array.reshape((height, width, channels))
        # Create and save image; Pillow infers 'RGB' or 'RGBA' from the array shape
        img = Image.fromarray(img_array)
        img.save(output_png)
        return True, f"Successfully converted {input_binary} to {output_png}"
    except FileNotFoundError:
        return False, f"Error: Input file {input_binary} not found"
    except Exception as e:
        return False, f"Error during conversion: {str(e)}"


def main():
    parser = argparse.ArgumentParser(description='Convert between PNG and binary format')
    parser.add_argument('input_file', help='Input file path')
    parser.add_argument('output_file', help='Output file path')
    parser.add_argument('--to-png', action='store_true',
                        help='Convert from binary to PNG (default is PNG to binary)')
    args = parser.parse_args()
    if args.to_png:
        success, message = binary_to_png(args.input_file, args.output_file)
    else:
        success, message = png_to_binary(args.input_file, args.output_file)
    print(message)
    return 0 if success else 1


if __name__ == "__main__":
    # Propagate the return value as the process exit status
    sys.exit(main())
Here is how to use this Python script:
python3 png2bin.py "input_file_name" "output_file_name"
Example
python3 png2bin.py A.png A.bin
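The file written by png2bin.py starts with a 9-byte header (big-endian uint32 width and height, then a uint8 channel count) followed by the raw pixels, so its size must be 9 + width × height × channels bytes. Reading a big-endian word is the same a << 24 | b << 16 | c << 8 | d combination that qoi_read_32 performs in assembly. The stdlib-only sketch below checks a buffer against that layout; read_u32_be and read_bin_header are hypothetical helpers, not part of the original script.

```python
# Stdlib-only check of the png2bin.py binary layout.
# read_u32_be and read_bin_header are hypothetical helpers, not part of the original script.
import struct

HEADER = struct.Struct(">IIB")  # width, height: big-endian uint32; channels: uint8 (9 bytes)


def read_u32_be(buf, p):
    """Same combination as qoi_read_32: a << 24 | b << 16 | c << 8 | d; returns (value, p + 4)."""
    a, b, c, d = buf[p], buf[p + 1], buf[p + 2], buf[p + 3]  # bytes index as unsigned ints, like lbu
    return (a << 24) | (b << 16) | (c << 8) | d, p + 4


def read_bin_header(blob):
    """Validate a png2bin.py buffer and return (width, height, channels)."""
    width, height, channels = HEADER.unpack_from(blob, 0)
    if channels not in (3, 4):
        raise ValueError(f"unexpected channel count {channels}")
    expected = HEADER.size + width * height * channels
    if len(blob) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(blob)}")
    return width, height, channels


if __name__ == "__main__":
    # A hypothetical 2x1 RGB image: one red pixel, one green pixel.
    blob = HEADER.pack(2, 1, 3) + bytes([255, 0, 0, 0, 255, 0])
    print(read_bin_header(blob))  # (2, 1, 3)
    print(read_u32_be(blob, 0))   # (2, 4)
```

For a real file, pass its contents in, e.g. read_bin_header(open("A.bin", "rb").read()); a ValueError points to a truncated or mis-sized conversion.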
Here is an online QOI viewer. You may drag and drop a QOI format image to test the result.
QOI Viewer - The Brain Dump - GitHub Pages
Original Page
Drag and drop the image dice.qoi from floooh's GitHub repository.
You may download the QOI format image from floooh's GitHub using the following link.