Linux 核心專題: 全向量視窗系統實作

執行人: marvin0102

Reviewed by `popo8712`

在將 twin 移植到 SDL 的過程中，您提到需要注意 SDL_Renderer 的雙緩衝特性。能否詳細說明在處理雙緩衝時遇到的具體問題以及您採取了哪些措施來解決這些問題？

由於 SDL_Renderer 處理雙緩衝時所使用的暫存記憶體 backbuffer 並不穩定，因此有可能會出現資料遺失的情況。

Reviewed by `164253`

在 sin 與 cos 的計算中，是否可以直接對
$\frac{π}{2}, \frac{π}{4} \dots$ 的 sin 和 cos 值預先計算並打表，也可以避免多次使用半角公式的精度誤差。

是可以的，透過建表的方式可以有效地減少逼近時的運算次數，但我們並不能因為降低精准度誤差的需求而將表的大小無線擴增，因此逼近運算仍是必須的。

Reviewed by `cheezad`

在 X11 運作流程最下面的流程用圖表示會更好

會再做修正

專案簡介

全向量的圖形 (vector graphics) 處理系統專題，希望可以理解相關 rect clipping 相關演算法以及精進 fixed point 的操作並且改進，最終利用 Linux framebuffer 和 input 子系統來展現整個視窗系統。

類似 Cairo API 風格的向量繪圖函式庫，後續的改進:

不依賴 libm，完全用 fixed point 建構
整合進 twin
以 perf 一類的工具找出效能瓶頸，並提出改進方案
在 Linux framebuffer 展現，搭配 input 子系統處理鍵盤/滑鼠事件

TODO: 回顧 2016 年的研究成果

論文解讀:

副產品:

SVGirl

TODO: 將 twin 移植到 SDL 作為模擬和開發用途

github
原本的 twin 依賴 libX11，一旦移植到 SDL，則可用於多種圖形系統

X11 運作流程

在將 twin 移植到 SDL 前，需先了解 twin 中 libX11 的運作流程。關於 twin 的運作流程可以參考以 Clock 應用為例追蹤 TWIN 的運作流程。

結構體

在 twin_x11 中利用結構體 twin_x11_t 紀錄 libX11 畫面顯示所需的資料。其中 screen 為紀錄 twin 更新內容的大小及位置，其餘包括視窗 win、將更新內容寫入視窗的暫存圖像 image 等等。

typedef struct _twin_x11 {
    twin_screen_t   *screen;
    Display	    *dpy;
    Window	    win;
    GC		    gc;
    Visual	    *visual;
    int		    depth;
    XImage	    *image;
    int		    image_y;
} twin_x11_t;

建立視窗與顯示流程

函式 twin_x11_create_ext 負責建立視窗。並且將函式 twin_x11_work 加入任務排程中。

twin_x11_create_ext (Display *dpy, int width, int height, int handle_events)
{
    ...
    //將事件處理函式 twin_x11_read_events 加入任務排程中
    if (handle_events)
	    twin_set_file (twin_x11_read_events,
		      ConnectionNumber (dpy),
		      TWIN_READ,
		      tx);
    //將畫面更新函式 twin_x11_work 加入任務排程中
    twin_set_work (twin_x11_work, TWIN_WORK_REDISPLAY, tx);
    ...
    //建立視窗
    tx->win = XCreateWindow (dpy, RootWindow (dpy, scr),
			     0, 0, width, height, 0,
			     tx->depth, InputOutput,
			     tx->visual, CWBackPixmap|CWEventMask, &wa);

    return tx;
}

在 twin 需要顯示畫面的時候，會呼叫函式 twin_x11_work，透過 twin_x11_work 將更新的內容顯示於視窗中。
函式 twin_x11_work 接著呼叫函式 twin_screen_update 更新內容，函式 twin_screen_update 首先利用函式 _twin_x11_put_begin 建立 twin 欲更新內容同等範圍的 XImage tx->image 並且利用函式 _twin_x11_put_span 逐一將更新內容寫入 tx->image 中。
最後利用函式 XPutImage 將 tx->image 顯示於指定區域。顯示完成後清空 output buffer 完成一次顯示。

而事件處理則是由函式 twin_x11_read_events 負責處理，並且在函式 twin_x11_create_ext 中被加入任務佇列中。

流程為 twin_x11_create_ext

\to

twin_x11_work

\to

twin_screen_update _twin_x11_put_begin

\to

_twin_x11_put_span XPutImage

\to

XFlush。

SDL 結構體

透過 pixels 作為將更新內容寫入視窗的暫存圖像 buffer ，之後將 pixels 的內容複製到 texture 再傳入 render 作算繪。

注意用語！參見資訊科技詞彙翻譯

注意 : 不能夠直接對 SDL_Renderer render 作寫入，因為 render double buffering 的特性，使得 render 中不一定會保存上一次寫入的資料。因此，除非每一次都會算繪整個畫面，不然直接寫入 render 容易造成未預期行為。

typedef struct _twin_x11 {
    twin_screen_t   *screen;
    SDL_Window	*win;
    int         dpy;
    int		    depth;
    int         *pixels;
    SDL_Renderer    *render;
    SDL_Texture     *texture;
    twin_coord_t     width;
    twin_coord_t     height;
    int		    image_y;
} twin_x11_t;

SDL

實際的運作流程與 x11 類似，主要差別在於，x11 函式庫中 XPutImage 會自行處理 double buffering ，而 SDL 中，雖然 SDL_Renderer 也具有 double buffering 的特性，但並不會儲存目前顯示的狀態與畫面資料，因此需要我們自己管理與更新畫面。

pixels

我們使用 pixels 紀錄畫面資料，首先依照視窗大小為 pixels 配置記憶體

twin_x11_create_ext (int width, int height, int handle_events)
{
    ...
    tx->pixels = malloc(width * height * sizeof(uint32_t));
    memset(tx->pixels, 255, width * height * sizeof(uint32_t));
    ...
}

在算繪畫面時，相比於直接繪製於 SDL_Renderer render，我們先將所需畫面繪製於 pixels 中，再透過 SDL_Texture text 將 pixels 的內容轉入 render 中，再作算繪。

static void
_twin_x11_put_span (twin_coord_t    left,
		    twin_coord_t    top,
		    twin_coord_t    right,
		    twin_argb32_t   *pixels,
		    void	    *closure)
{
    ...
    for (ix = left; ix < right; ix++)
    {
		twin_argb32_t	pixel = *pixels++;
		...
		tx->pixels[iy*tx->screen->width + ix] = pixel;
    }
    if ((top + 1 - tx->image_y) == tx->height)
    {
		SDL_UpdateTexture(tx->texture, NULL, tx->pixels, tx->screen->width * sizeof(uint32_t));
        SDL_RenderCopy(tx->render, tx->texture, NULL, NULL);
        SDL_RenderPresent(tx->render);
    }
}

如果直接將畫面寫入 render 會發生什麼事？

static void
_twin_x11_put_span (twin_coord_t    left,
		    twin_coord_t    top,
		    twin_coord_t    right,
		    twin_argb32_t   *pixels,
		    void	    *closure)
{
    ...

-		tx->pixels[iy*tx->screen->width + ix] = pixel;
+       SDL_SetRenderDrawColor(renderer, piexl>>16&0xff, piexl>>8&0xff, pixel&0xff, 255);
+       SDL_RenderDrawPoint(renderer, ix, iy);

    ...

-    	SDL_UpdateTexture(tx->texture, NULL, tx->pixels, tx->screen->width * sizeof(uint32_t));
-       SDL_RenderCopy(tx->render, tx->texture, NULL, NULL);
    ...
}

SDL_Render 的運作原理為，SDL_Render render 中會持有一個 backbuffer ，當我們透過 SDL API 對 render 做更新時，實際上就是更改 backbuffer 的內容，最後要作為畫面顯示時，render 會將自己所持有的 backbuffer 與目前顯示的 backbuffer 交換，交換後的 backbuffer 繪作為一張完整的圖片更新於畫面，而 render 會持有就畫面的 backbuffer。

但 SDL 並不會保證 render 所持有的 backbuffer 與畫面顯示的 backbuffer 在程式結束前都不會被重新配置，如果我們將更新的內容直接寫入 render 中，由於 rect cliping 演算法的關係，我們的每一次更新並不會重新寫入整個畫面，而是指更新內容有所改動的部分，因此背景畫面就會的不穩定，甚至有丟失的風險。

考慮以下修改:

diff --git a/Makefile b/Makefile
index 674c196..5fcc422 100644
--- a/Makefile
+++ b/Makefile
@@ -62,7 +62,7 @@ all: $(PRODUCTS) xdemo
 xdemo: $(PRODUCTS) $(DEMO_OBJS) $(X_BACKEND_OBJS)
        $(CC) $(CFLAGS) -o $@ \
                $(DEMO_OBJS) $(X_BACKEND_SRCS) $(PRODUCTS) \
-               -l SDL2-2.0.0 -l SDL2_image-2.0.0 -l SDL2_ttf-2.0.0
+               $(shell sdl2-config --libs)
 
 fbdemo: $(PRODUCTS) $(DEMO_OBJS) $(FB_BACKEND_OBJS)
        $(CC) $(CFLAGS) -o $@ \
@@ -76,7 +76,8 @@ CFLAGS = \
        -I include \
        -I backend \
        -I apps \
-       -g -DEZXML_NOMMAP
+       -g -DEZXML_NOMMAP \
+       $(shell sdl2-config --cflags)
 
 %.o: %.c
        $(CC) $(CFLAGS) -o $@ -c $< -lm
diff --git a/backend/twin_x11.h b/backend/twin_x11.h
index 3a093d4..a272f96 100644
--- a/backend/twin_x11.h
+++ b/backend/twin_x11.h
@@ -12,8 +12,8 @@
 // #include <X11/Xlib.h>
 // #include <X11/Xutil.h>
 // #include <X11/Xatom.h>
-#include <SDL2/SDL.h>
-#include <SDL2/SDL_render.h>
+#include <SDL.h>
+#include <SDL_render.h>
 
 typedef struct _twin_x11 {
     twin_screen_t   *screen;

考慮到 Linux framebuffer 和 SDL 程式碼的行為，應參照 backend/twin_fbdev.c 的程式碼，留意 _IMMEDIATE_REFRESH，使二者行為更一致。

fbdev 的實作中，由於 glibc (或其他 libc) 通常會針對硬體架構提供最佳化的 memcpy，因此我們可改寫如下:

--- a/backend/twin_fbdev.c
+++ b/backend/twin_fbdev.c
@@ -53,8 +53,7 @@ static void _twin_fbdev_put_span (twin_coord_t    left,
 
        dest = (unsigned int *)(tf->fb_ptr + top * tf->fb_fix.line_length);
        dest += left;
-       while(width--)
-               *(dest++) = *(pixels++);
+       memcpy(dest, pixels, width * sizeof(unsigned int));
 }
 
 static twin_bool_t twin_fbdev_apply_config(twin_fbdev_t *tf)

TODO: 將 libcg 移植到 twin

選定展示用的程式和 API 集合

理解 libcg 的矩陣運算

cairo 是一個用於提供向量圖形繪圖的函式庫，Cairo 提供在多個背景下做二維空間的繪圖，進階的更可以使用硬體加速功能。

libcg 為開放原始碼的 2D 向量圖型函式庫，原始程式碼約 6000 行，API 介面模仿 cairo，因此我們可以透過 cairo API 了解 libcg 的矩陣運算功能。

data struct

cg_matrix_t 保存矩陣變換，例如縮放、旋轉、剪切或這些變換的組合。

struct cg_matrix_t {
	double a; double b;
	double c; double d;
	double tx; double ty;
};

函式 cg_matrix_map_point 將給定之點 (x,y) 做矩陣變換，該點的轉換如下:

x_new = a * x + b * y + tx;
y_new = c * x + d * y + ty;

矩陣運算

cg_matrix_translate

void cg_matrix_translate(struct cg_matrix_t * m, double tx, double ty)

將 (tx, ty) 的平移應用於變換矩陣中。新的矩陣變換效果為先將座標平移 (tx, ty) ，再將原本的變換效果應用於座標。

cg_matrix_scale

void cg_matrix_scale(struct cg_matrix_t * m, double sx, double sy)

將 (sx, sy) 的縮放應用於變換矩陣中。新的矩陣變換效果為先將座標縮放 (sx, sy) ，再將原本的變換效果應用於座標。

cg_matrix_rotate

void cg_matrix_rotate(struct cg_matrix_t * m, double r)

將 r 的旋轉應用於變換矩陣中。新的矩陣變換效果為先將座標旋轉弧度 r ，再將原本的變換效果應用於座標。
其中， r 為旋轉角度，以弧度為單位。

cg_matrix_multiply

void cg_matrix_multiply(struct cg_matrix_t * m, struct cg_matrix_t * m1, struct cg_matrix_t * m2)

將 m1 和 m2 中的變換矩陣相乘並將結果儲存在矩陣 m 中。所得變換的效果是先將 m1 中的變換套用到座標，然後將 m2 中的變換套用到座標。

指標 m 可以與 m1 或 m2 相同。

cg_matrix_invert

void cg_matrix_invert(struct cg_matrix_t * m)

更改矩陣 m 為其反矩陣。並非所有變換矩陣都有反矩陣；如果矩陣的行列式為 0 ，則該矩陣沒有反矩陣，輸出矩陣不變。

將 double (64-bit, at least) 換成 fixed point (32-bit)

Optimize 2D line drawing for RV32IM using fixed-point arithmetic

定點數運算中， Q notation 是一種指定二進位定點數參數的方法。

Qf：稱作「Q 格式」，Q15 表示 15 個小數位。這樣的記法存在歧義，因為沒有包含字長，但實際使用時一般可以根據目標處理器平台判斷字長為 16 位元或者 32 位元。
Qm.f：無歧義的 Q 格式，因為整個字是二補數，所以符號位元長度可以根據其它資訊推導而來。例如 Q1.30 表示該數有 1 個整數位、30 個小數位，是 32 位元二補數。

sqrt 的計算

此處，利用第三周測驗一的方法實作，並且轉為定點數。
實作如下：

q_fmt fixed_sqrt(q_fmt x)
{
    if (x <= 1 << Q) /* Assume x is always positive */
        return x;

    q_fmt z = 0;
    for (q_fmt m = 1UL << ((31 - __builtin_clz(x)) & ~1UL); m; m >>= 2) {
        int b = z + m;
        z >>= 1;
        if (x >= b)
            x -= b, z += m;
    }
    z = z << Q / 2;
    return z;
}

缺點是會損失

Q / 2

的精度，改良版本為:

static inline q_fmt sqrtq(q_fmt x){
    if(x <= 0) return 0;
    if(x < I2Q(1) + (1<<(Q/2-1)) && x > I2Q(1) - (1<<(Q/2-1))) return I2Q(1);

    int offset = 0;
    for(q_fmt i = x; !(0x40000000 & i); i <<= 1){
        ++offset;
    }
    offset = (offset & ~1);
    x <<= offset;

    offset >>= 1;
    offset -= (Q >> 1);

    q_fmt z = 0;
    for (q_fmt m = 1UL << ((31 - __builtin_clz(x)) & ~1UL); m; m >>= 2) {
        int b = z + m;
        z >>= 1;
        if (x >= b)
            x -= b, z += m;
    }

    return (offset >= 0)? z >> offset : z << (-offset) ;
}

sin 與 cos 的計算

我們知道 sin(

\frac{π}{2}

)、cos(

\frac{π}{2}

) 分別為 1 和 0 ，因此我們可以透過半角公式求得弧度

\frac{π}{4}

、

\frac{π}{8}

、

\frac{π}{16}

的 sin 與 cos 值。

s i n (\frac{A}{2}) = \pm \sqrt{\frac{1 - c o s (A)}{2}}

c o s (\frac{A}{2}) = \pm \sqrt{\frac{1 + c o s (A)}{2}}

static inline q_fmt cosHalf(q_fmt cosx)
{
    return sqrtq((I2Q(1) + cosx) >> 1);
}
static inline q_fmt sinHalf(q_fmt cosx)
{
    return sqrtq((I2Q(1) - cosx) >> 1);
}

接著透過和差公式計算給定的弧度，將弧度 radius 除以

\frac{π}{2}

並取 4 的餘數即可得到象限 region ， radius 取

\frac{π}{2}

的餘數即一象限內的角度 theta 。
接著利用和差公式陸續以弧度

\frac{π}{4}

\frac{π}{8}

\frac{π}{16}

… 逼近 theta ，最後透過象限 region 判斷 sin 與 cos 的正負。

和差公式:

s i n (x + y) = s i n (x) c o s (y) + c o s (x) s i n (y)

c o s (x + y) = c o s (x) c o s (y) - s i n (x) s i n (y)

static inline void sinAndcos(q_fmt radius, q_fmt *sin_t, q_fmt *cos_t)
{
    int region = (radius / (PI_Q >> 1)) & 0b11;

    // theta must be pi/2 to 0 and start from x-axis
    q_fmt theta = radius % (PI_Q >> 1);
    if (region & 0b1) theta = (PI_Q >> 1) - theta;

    // start from cos(pi/2) and sin(pi/2)
    radius = PI_Q >> 1;
    q_fmt cos_rad = 0;
    q_fmt sin_rad = I2Q(1);

    // start from cos(0) and sin(0)
    *cos_t = I2Q(1);
    *sin_t = 0;

    while(radius > 0){
        if(radius <= theta){
            theta -= radius;
            q_fmt tmp = q_mul(*cos_t, cos_rad) - q_mul(*sin_t, sin_rad);
            *sin_t = q_mul(*sin_t, cos_rad) + q_mul(*cos_t, sin_rad);
            *cos_t = tmp;
        }
        if(theta == 0) break;
        radius >>= 1;
        sin_rad = sinHalf(cos_rad);
        cos_rad = cosHalf(cos_rad);
    }

    if(region == 0b10 || region == 0b01) *cos_t *= -1;
    if(region & 0b10) *sin_t *= -1;
}

矩陣運算後的誤差

依序執行
cg_translate(ctx, 128.0, 128.0);
cg_rotate(ctx, 45 * M_PI / 180);
cg_scale(ctx, 256.0 / img->width, 256.0 / img->height);
cg_translate(ctx, -0.5 * img->width, -0.5 * img->height);
後的結果

double

matrix0 a , b : 1.000000 , 0.000000
matrix0 c , d : 0.000000 , 1.000000
matrix0 tx, ty: 0.000000 , 0.000000
matrix1 a , b : 1.000000 , 0.000000
matrix1 c , d : 0.000000 , 1.000000
matrix1 tx, ty: 128.000000 , 128.000000
matrix2 a , b : 0.707107 , 0.707107
matrix2 c , d : -0.707107 , 0.707107
matrix2 tx, ty: 128.000000 , 128.000000
matrix3 a , b : 1.414214 , 1.414214
matrix3 c , d : -1.414214 , 1.414214
matrix3 tx, ty: 128.000000 , 128.000000
matrix4 a , b : 1.414214 , 1.414214
matrix4 c , d : -1.414214 , 1.414214
matrix4 tx, ty: 128.000000 , -53.019336

q_fmt

matrix0 a , b : 1.000000 , 0.000000
matrix0 c , d : 0.000000 , 1.000000
matrix0 tx, ty: 0.000000 , 0.000000
matrix1 a , b : 1.000000 , 0.000000
matrix1 c , d : 0.000000 , 1.000000
matrix1 tx, ty: 128.000000 , 128.000000
matrix2 a , b : 0.707092 , 0.707092
matrix2 c , d : -0.707092 , 0.707092
matrix2 tx, ty: 128.000000 , 128.000000
matrix3 a , b : 1.414185 , 1.414185
matrix3 c , d : -1.414185 , 1.414185
matrix3 tx, ty: 128.000000 , 128.000000
matrix4 a , b : 1.414185 , 1.414185
matrix4 c , d : -1.414185 , 1.414185
matrix4 tx, ty: 128.000000 , -53.015625

在計算弧度的時候需要考慮精度問題，如:

\frac{45^{\circ} \times π}{180^{\circ}}

，如果先計算

\frac{π}{180^{\circ}}

在計算

45^{\circ} \times r a d i u s

會因為先除在乘而失去一些精度，造成誤差擴大。

TODO: 撰寫效能評比程式並定位出效能瓶頸

TODO: 確保 twin + libcg 均可採用定點數

TODO: 將上述成果整合到 Linux framebuffer

搭配 /dev/event* 作為輸入