
Preface

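While I was learning about coroutines from Jserv, he planned to demo cserv in class but had to drop it because the server he was using was AArch64. Full of overconfidence, I thought: isn't that just a bit of assembly grunt work? Something this simple I can knock out in minutes. So I eagerly grabbed the code and started changing it. Little did I know this was the start of slacking off at work and losing sleep after work. These are my notes.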

Code Trace

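There really isn't much to change: only the coroutine context switch has to be ported from x64 assembly to aarch64 assembly. What context_switch does is about 87% similar to what the Linux kernel does in __switch_to_asm. Easy, then: just copy cpu_switch_to, the kernel's aarch64 counterpart, and be done with it. Clever me copied it over, and this is what I got: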

[Image not available]

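It looked like it was running, but my CPU was pegged, and wget localhost:8081 simply hung without printing anything. Everything looked fine under gdb too, so I could not see where the problem was. I quietly went back to x64 and only then realized that, because I had not first understood how this project behaves, I had gdb set up wrong, which is why the problem stayed hidden. The project forks as many processes as there are CPUs, and gdb follows the parent process by default. To trace this problem you need set follow-fork-mode child; only then do you see that the child processes die with SIGSEGV partway through execution. Once you know there is a segmentation fault, you can use backtrace to confirm which path execution took and where it died.
//TODO: add a screenshot of the bt output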
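To make the fork behavior concrete, here is a minimal sketch of the situation described above: a parent forks workers, each worker dies with SIGSEGV, and the parent itself never faults, which is why a debugger that only follows the parent sees nothing wrong. The function name count_segv_workers is hypothetical and not part of cserv, and raise(SIGSEGV) is only a stand-in for the real crash inside the context switch.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from cserv): fork `nworkers` children that each
 * die with SIGSEGV, and return how many the parent saw killed by that
 * signal. Returns -1 if fork or waitpid fails. */
int count_segv_workers(int nworkers)
{
    assert(nworkers >= 0);
    int killed = 0;

    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;

        if (pid == 0) {
            /* Child: stand-in for a worker crashing in its context switch. */
            signal(SIGSEGV, SIG_DFL);
            raise(SIGSEGV);
            _exit(0); /* not reached when the default action kills us */
        }

        /* Parent: keeps running normally; only the wait status reveals
         * that the child was terminated by a signal. */
        int status = 0;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
            killed++;
    }
    return killed;
}
```

Calling count_segv_workers(2) from a small main and running it under gdb with the default settings shows the same effect as in the note: the parent finishes without any fault being reported in it, while with set follow-fork-mode child gdb stops inside the first crashing child.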
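After going around in circles, I still had to understand exactly how things work on x86_64 in order to understand how it died. Below is my understanding of coro_stack after tracing from coro_stack_init all the way to coro_routine_proxy.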

coro_stack

    void coro_stack_init(struct context *ctx,
                         struct coro_stack *stack,
                         coro_routine routine,
                         void *args)
    {
        ctx->sp = (void **) stack->ptr;
        *--ctx->sp = (void *) routine;
        *--ctx->sp = (void *) args;
        *--ctx->sp = (void *) coro_routine_entry;
        ctx->sp -= NUM_SAVED;
    }

From coro_stack_init, I infer that the coro_stack built for each process looks like this:

[Image not available: diagram of the coro_stack layout]
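Since the diagram is missing, the layout can be reproduced with a small sketch that applies the same pushes as coro_stack_init to a plain array. Everything prefixed with demo_ is hypothetical and only for illustration; struct context is reduced to the single sp field used here, and NUM_SAVED is 6 as in the x86_64 code. From the highest address down, the stack holds routine, args, coro_routine_entry, and then six slots reserved for the saved registers, with sp pointing at the lowest one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_SAVED 6 /* six callee-saved registers, as in the x86_64 source */

/* Reduced stand-in for the project's struct context (assumption). */
struct context {
    void **sp;
};

/* Placeholders for the real routine and the assembly entry point. */
void demo_routine(void *args) { (void) args; }
void demo_entry(void) {}

/* Same pushes as coro_stack_init, applied to a caller-provided stack top.
 * Returns how many slots are in use afterwards. */
size_t demo_stack_init(struct context *ctx, void **top,
                       void *routine, void *args, void *entry)
{
    assert(ctx != NULL && top != NULL);
    ctx->sp = top;
    *--ctx->sp = routine;
    *--ctx->sp = args;
    *--ctx->sp = entry;
    ctx->sp -= NUM_SAVED; /* room for the registers context_switch pops */
    return (size_t) (top - ctx->sp);
}

/* Print the layout from the highest used slot down to sp[0]. */
void demo_print_layout(size_t used)
{
    for (size_t i = used; i-- > 0;) {
        const char *what = "saved register slot";
        if (i == used - 1)
            what = "routine";
        else if (i == used - 2)
            what = "args";
        else if (i == used - 3)
            what = "coro_routine_entry";
        printf("sp[%zu]  %s\n", i, what);
    }
}
```

With a 16-slot array as the stack, demo_stack_init leaves nine slots in use: sp[0] to sp[5] for the registers, sp[6] for coro_routine_entry, sp[7] for args, and sp[8] for routine.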

    __asm__(
        ".text\n"
        ".globl context_switch\n"
        "context_switch:\n"
    #if defined(__x86_64__)
    #define NUM_SAVED 6
        "push %rbp\n"
        "push %rbx\n"
        "push %r12\n"
        "push %r13\n"
        "push %r14\n"
        "push %r15\n"
        "mov %rsp, (%rdi)\n"
        "mov (%rsi), %rsp\n"
        "pop %r15\n"
        "pop %r14\n"
        "pop %r13\n"
        "pop %r12\n"
        "pop %rbx\n"
        "pop %rbp\n"
        "pop %rcx\n"
        "jmp *%rcx\n"
    #else
    #error "unsupported architecture"
    #endif
    );

Then, from lines 21 and 22 of context_switch (pop %rcx followed by jmp *%rcx), I infer that the address %rcx ends up holding is the coro_routine_entry stored in coro_stack.

    __asm__(
        ".text\n"
        "coro_routine_entry:\n"
    #if defined(__x86_64__)
        "pop %rdi\n"
        "pop %rcx\n"
        "call *%rcx\n"
    #else
    #error "unsupported architecture"
    #endif
    );

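Finally, from line 5 of coro_routine_entry (pop %rdi), I infer that %rdi receives the args stored in coro_stack, and that execution then jumps to coro_routine_proxy, which was registered in coro_stack beforehand (the routine slot in the figure).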

So the execution flow is context_switch -> coro_routine_entry -> coro_routine_proxy.
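The flow above can be walked through in portable C by simulating what the two assembly stubs pop from a freshly initialized stack. This is only a sketch under assumptions: demo_proxy and demo_first_switch are hypothetical names, the "stack" is an ordinary array of pointers, and converting between void * and a function pointer is done the same way the original code does it (accepted on POSIX systems, not strict ISO C).

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SAVED 6 /* registers popped by context_switch on x86_64 */

typedef void (*demo_routine_fn)(void *);

/* Hypothetical stand-in for coro_routine_proxy: bumps a counter so the
 * caller can see that it ran. */
void demo_proxy(void *args)
{
    *(int *) args += 1;
}

/* Simulate the first switch into a new coroutine. `sp` is the value
 * coro_stack_init left in ctx->sp. Returns sp after all pops. */
void **demo_first_switch(void **sp)
{
    /* context_switch: the six pops restore %r15..%rbp. On first entry the
     * slots hold no meaningful values, so they are simply skipped. */
    sp += NUM_SAVED;

    /* pop %rcx; jmp *%rcx: on the real stack this slot holds
     * coro_routine_entry, so control continues there. */
    void *entry = *sp++;
    (void) entry;

    /* coro_routine_entry: pop %rdi (args becomes the first argument),
     * pop %rcx (routine), call *%rcx. */
    void *args = *sp++;
    demo_routine_fn routine = (demo_routine_fn) *sp++;
    assert(routine != NULL);
    routine(args);

    return sp;
}
```

After the call, sp is back at the top of the stack, which matches the picture: the three values pushed by coro_stack_init are consumed exactly once, by two different stubs.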


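So here is the question: why take such a long detour, jumping from place to place?
After reading Coroutine in Depth - Context Switch, it all made sense. I really admire whoever came up with this design.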

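Once the whole flow is understood, replicating it is easy: just make sure the registers that the ARM calling convention requires to be preserved are saved on the stack.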
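As a rough aid for sizing the saved area, the sketch below lists the registers a callee must preserve under the AArch64 procedure call standard (AAPCS64) and computes the stack space they would need. It rests on assumptions: the names demo_callee_saved, demo_saved_slots, and demo_frame_bytes are hypothetical, x30 is included because a context switch needs it as the resume address, and whether an actual port has to save d8 to d15 depends on whether the coroutines use floating point.

```c
#include <assert.h>
#include <stddef.h>

/* Registers preserved across calls under AAPCS64. x29 is the frame
 * pointer and x30 the link register; d8-d15 are the low 64 bits of
 * v8-v15. Including x30 and d8-d15 here is an assumption of this sketch. */
static const char *const demo_callee_saved[] = {
    "x19", "x20", "x21", "x22", "x23", "x24", "x25", "x26", "x27", "x28",
    "x29", "x30",
    "d8",  "d9",  "d10", "d11", "d12", "d13", "d14", "d15",
};

/* Number of 8-byte slots needed for the list above. */
size_t demo_saved_slots(void)
{
    return sizeof demo_callee_saved / sizeof demo_callee_saved[0];
}

/* Bytes to reserve on the stack, rounded up to a multiple of 16 because
 * sp must stay 16-byte aligned whenever it is used to access memory. */
size_t demo_frame_bytes(void)
{
    size_t bytes = demo_saved_slots() * 8;
    size_t aligned = (bytes + 15) & ~(size_t) 15;
    assert(aligned % 16 == 0);
    return aligned;
}
```

Under these assumptions the saved area is 20 slots, or 160 bytes, which would be the aarch64 counterpart of NUM_SAVED in this sketch.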

issue

During development I ran into all kinds of odd behavior, big and small, that did not match what I expected. Recording it here.

linker

[Image not available: src/coro/switch.i (left) next to the output of objdump -S switch.o (right)]

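On the left is src/coro/switch.i, the intermediate file with the assembly produced during compilation; on the right is the output of objdump -S switch.o.

The odd part is that line 56 on the left shows the GOT entry being looked up is coro_routine_entry (which matches the source code), but once it has been turned into an object file, the label being fetched is shown as context_switch.

My current guess is that, because coro_routine_entry is not declared .globl, the linker somehow ended up using context_switch, the symbol just above coro_routine_entry, as the jump target when linking symbols.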

Usage Knowledge

x86_64 vs aarch64

instruction set:
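Because x86_64 is a CISC instruction set, many of its assembly instructions have to be built from a combination of instructions on aarch64, and push and pop are among them.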
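To show what "built from a combination" means for push and pop, here is a small C model of a descending stack. On A64 the usual replacement is a store or load with pre- or post-indexed addressing on sp, typically on register pairs so that sp stays 16-byte aligned. The names demo_stack, demo_push_pair, and demo_pop_pair are hypothetical; the A64 mnemonics in the comments are the instructions the C statements are meant to mirror, not code taken from the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a full-descending stack of 64-bit slots. */
struct demo_stack {
    uint64_t *sp;
};

/* x86_64: push %a ; push %b  (two instructions, 8 bytes each)
 * A64:    stp xA, xB, [sp, #-16]!
 * Pre-indexed store pair: sp is lowered by 16 first, then xA is stored
 * at [sp] and xB at [sp + 8]. */
void demo_push_pair(struct demo_stack *s, uint64_t a, uint64_t b)
{
    s->sp -= 2;
    s->sp[0] = a;
    s->sp[1] = b;
}

/* x86_64: pop %b ; pop %a
 * A64:    ldp xA, xB, [sp], #16
 * Post-indexed load pair: xA is loaded from [sp] and xB from [sp + 8],
 * then sp is raised by 16. */
void demo_pop_pair(struct demo_stack *s, uint64_t *a, uint64_t *b)
{
    assert(a != NULL && b != NULL);
    *a = s->sp[0];
    *b = s->sp[1];
    s->sp += 2;
}
```

Working in pairs is a design choice forced by the architecture rather than by style: a single 8-byte push would leave sp misaligned for the next access through it, so ports of x86_64 push/pop sequences usually regroup the registers two at a time.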

gdb debug

Debugging Forks
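When tracing a program that forks, remember to set follow-fork-mode (for example, set follow-fork-mode child).
Examining the Symbol Table
Quickly find the addresses of the symbols you want to trace, which makes setting breakpoints easier.
peda:
A handy GDB plugin that lets you quickly see all kinds of information, including the stack, registers, and source code (requires compiling with -g).
peda-arm
A version that supports aarch64.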

Home Work

Use uftrace to verify performance.

reference

Coroutine in Depth - Context Switch
A64 general instructions in alphabetical order
Declaring Attributes of Functions
GDB Basic Command
Evolution of the x86 context switch in Linux